OUT OF THE BOX OPTIMIZATION USING THE PARAMETER-LESS POPULATION PYRAMID

By

Brian W. Goldman

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Computer Science - Doctor Of Philosophy

2015

ABSTRACT

OUT OF THE BOX OPTIMIZATION USING THE PARAMETER-LESS POPULATION PYRAMID

By

Brian W. Goldman

The Parameter-less Population Pyramid (P3) is a recently introduced method for performing evolutionary optimization without requiring any user-specified parameters. P3's primary innovation is to replace the generational model with a pyramid of multiple populations that are iteratively created and expanded. In combination with local search and advanced crossover, P3 scales to problem difficulty, exploiting previously learned information before adding more diversity.

Across seven problems, each tested using on average 18 problem sizes, P3 outperformed all five advanced comparison algorithms. This improvement includes requiring fewer evaluations to find the global optimum and achieving better fitness when using the same number of evaluations. Using both algorithm analysis and comparison we show P3's effectiveness is due to its ability to properly maintain, add, and exploit diversity.

Unlike the best comparison algorithms, P3 was able to achieve this quality without any problem-specific tuning. Thus, unlike previous parameter-less methods, P3 does not sacrifice quality for applicability. Therefore we conclude that P3 is an efficient, general, parameter-less approach to black-box optimization that is more effective than existing state-of-the-art techniques.

Furthermore, P3 can be specialized for gray-box problems, which have known, limited, non-linear relationships between variables. Gray-Box P3 leverages the Hamming-Ball Hill Climber, an exceptionally efficient form of local search, as well as a novel method for performing crossover using the known variable interactions. In doing so Gray-Box P3 is able to find the global optimum of large problems in seconds, improving over Black-Box P3 by up to two orders of magnitude.

TABLE OF CONTENTS

LIST OF FIGURES

Chapter 1  Introduction
Chapter 2  Comparison Optimizers
  2.1 Random Restart Hill Climber
  2.2 (1 + (λ, λ))
  2.3 Hierarchical Bayesian Optimization Algorithm
  2.4 Parameter-less hBOA
  2.5 Linkage Tree Genetic Algorithm
  2.6 Comparison Algorithm Parameter Tuning
Chapter 3  Parameter-less Population Pyramid
Chapter 4  Problem Descriptions
  4.1 Single Instance Problems
  4.2 Randomly Generated Problem Classes
Chapter 5  Benchmarking P3
  5.1 Finding the Global Optimum
    5.1.1 Quantitative Comparison
    5.1.2 Local Search
    5.1.3 Model Building
    5.1.4 P3
  5.2 Fitness Over Time
    5.2.1 Quantitative Comparison
    5.2.2 Local Search
    5.2.3 Model Building
    5.2.4 P3
  5.3 Computational Expenses
    5.3.1 Operation Counting
    5.3.2 Wall Clock Performance
      5.3.2.1 Model Building
      5.3.2.2 P3
Chapter 6  Internal Workings
  6.1 Population Sizing
    6.1.1 Problem Instance versus Problem Class
    6.1.2 Fast versus Optimal
  6.2 Inner Workings Specific to P3
    6.2.1 Crossover
    6.2.2 Pyramid
      6.2.2.1 Deceptive Step Trap
Chapter 7  New Problem Domain: Gray-Box Optimization
  7.1 Problems in this domain
  7.2 Formal Requirements
  7.3 Efficient Local Search
  7.4 Efficient Hamming Ball Search
  7.5 Tournament Uniform Crossover: TUX
Chapter 8  Gray-Box P3
Chapter 9  Benchmarking Gray-Box P3
  9.1 The Effect of Radius
  9.2 Fitness Over Time
  9.3 Scalability
  9.4 Discussion
Chapter 10  Understanding When P3 Excels
  10.1 Big Valley
    10.1.1 Quickly Finding All Local Optima
    10.1.2 Looking at Problems
  10.2 Pyramid Levels
Chapter 11  Conclusions and Future Work
APPENDIX
BIBLIOGRAPHY

LIST OF FIGURES

Figure 2.1: Hill climbing algorithm used to improve randomly generated solutions until no single bit change results in a fitness improvement.

Figure 2.2: Algorithm describing how LTGA creates clusters using Equation 2.2 for a population. unmerged and useful are ordered sets of sets of gene loci.

Figure 2.3: Algorithm describing how clusters are used to perform crossover.

Figure 2.4: Optimal population sizes found using bisection on each size for each problem.

Figure 3.1: One iteration of P3 optimization.
pyramid is an ordered set of populations and hashset is a set of all solutions in pyramid.

Figure 5.1: Comparison of the median number of evaluations to reach the global optimum for the six different optimization methods with respect to problem size. If the median run did not reach the global optimum no data element is shown. Results given on a log-log scale.

Figure 5.2: Compares the median best fitness reached during search for each of the six optimization methods.

Figure 5.3: Estimated computation costs incurred by model rebuilding (Figure 5.3a) and repeated donations (Figure 5.3b) per evaluation as problem size increases.

Figure 5.4: Comparison of the median number of seconds to reach the global optimum for the six different optimization methods with respect to problem size. If the median run did not reach the global optimum no data element is shown. Results given on a log-log scale.

Figure 6.1: The total number of solutions stored by P3 when the global optimum is found. In Figure 6.1b the red "+" indicates LTGA's tuned population size.

Figure 6.2: Distribution of evaluations required to reach the global optimum for P3 and LTGA on the largest size of each problem.

Figure 6.3: Comparison of how reducing LTGA's population size affects the median best fitness reached during search.

Figure 6.4: For each problem Figure 6.4a shows the proportion of P3 evaluations spent on crossovers and Figure 6.4b shows the percentage of fitness-improving crossover evaluations.

Figure 6.5: For each problem Figure 6.5a shows the number of solutions stored in each level of the pyramid and Figure 6.5b shows the percentage of fitness-improving crossover evaluations at each level.
Figure 7.1: Algorithm used to efficiently determine the change in fitness associated with each potential move from a given solution.

Figure 7.2: Algorithm for updating stored information related to a solution when making a move.

Figure 7.3: Algorithm to recursively find all connected induced subgraphs of size r or fewer.

Figure 7.4: One iteration of TUX optimization. T is an ordered list of solutions, each position of which could be empty, awaiting a crossover partner.

Figure 9.1: Comparison of how radius affects solution quality at termination. For NKq-Landscapes N = 6,000 and K = 4 and for Ising Spin Glasses N = 6,084. Range of radius values limited by memory constraints.

Figure 9.2: Time required for Gray-Box P3 to reach the global optimum of Nearest Neighbor NKq instances with N = 6,000.

Figure 9.3: Comparison of solution quality during optimization on a log-log scale for different algorithms. For NKq-Landscapes N = 6,000 and K = 4 and for Ising Spin Glasses N = 6,084. Each algorithm uses its best-found r value.

Figure 9.4: Comparison of Gray-Box P3's solution quality during optimization on a log-log scale for different r values. For NKq-Landscapes N = 6,000 and K = 4 and for Ising Spin Glasses N = 6,084.

Figure 9.5: Comparison of how each algorithm's time required to reach the best fitness found scales with problem size on Nearest Neighbor NKq with K = 4. With N = 1000 Gray-Box P3 is 375x faster than Black-Box P3. HBHC was only successful on the smallest problem size.

Figure 9.6: Comparison of how each algorithm's time required to find the best fitness found scales with problem size on Ising Spin Glass. With N = 2025 Gray-Box P3 is 4.6x faster than Black-Box P3. HBHC was only successful on the smallest problem size.

Figure 9.7: Relative qualities of each method as problem size increases on Unrestricted NKq with K = 4.

Figure 10.1: Example change of enumeration ordering. The gray loci represent all dependencies for some move m_i. By reordering, m_i's lowest index dependency improves from 2 to 4.
Figure 10.2: Algorithm to find all local optima of a given gray-box problem. MakeMove, described in Figure 7.2, flips bit index and updates the fitness effect delta of making all moves dependent on index. move_bin stores moves based on their lowest index dependency.

Figure 10.3: Location and quality of local optima in comparison to the global optimum with N = 30 and k = 5.

Figure 10.4: Location and quality of local optima in comparison to the global optima for a representative Nearest Neighbor NKq problem with N = 60 and k = 2.

Figure 10.5: Location and quality of local optima in comparison to the global optima for a representative Unrestricted NKq problem with N = 60 and k = 2.

Figure 10.6: Location and quality of local optima in comparison to the global optima for a representative Ising Spin Glass problem with N = 36.

Figure 10.7: Location and quality of local optima in comparison to the global optima for a representative MAX-SAT problem with N = 36.

Figure 10.8: Distribution of local optima stored at each level of Gray-Box P3 in relation to the global optimum on the Deceptive Step Trap problem, N = 6000 and traps of size 5.

Figure 10.9: Distribution of local optima stored at each level of Gray-Box P3 in relation to the best found by the run on a Nearest Neighbor NKq problem, N = 6000 and K = 4.

Figure 10.10: Distribution of local optima stored at each level of Gray-Box P3 in relation to the best found by the run on an Unrestricted NKq problem, N = 6000 and K = 4.

Figure 10.11: Distribution of local optima stored at each level of Gray-Box P3 in relation to the best found by the run on an Ising Spin Glass, N = 6084.

Figure 10.12: Distribution of local optima stored at each level of Gray-Box P3 in relation to the best found by the run on a MAX-SAT problem, N = 6000.

Chapter 1  Introduction

A primary purpose of evolutionary optimization is to efficiently find good solutions to challenging real-world problems with minimal prior knowledge about the problem itself.
This driving goal has created search algorithms that can escape user bias to create truly novel results, sometimes publishable or patentable in their own right [18]. While it is not possible for any algorithm to do better than random search across all possible problems [38], effectiveness can be achieved by assuming the search landscape has structure and then biasing the algorithm toward exploiting that structure.

In evolutionary optimization, and genetic algorithms (GAs) in particular, search is often biased through parameters. This can be beneficial as it allows practitioners to inject their knowledge about the shape of the search landscape into the algorithm. However, the quality of solutions found, and the speed at which they are found, is strongly tied to setting these parameters correctly [8]. As such, either expert knowledge or expensive parameter tuning [13] are required to leverage this feature to its fullest potential. Furthermore, parameters such as population size, mutation rate, crossover rate, tournament size, etc. usually have no clear relationship to the problem being solved, meaning even domain experts may not understand how the parameters will interact with the problem or with each other. To further complicate matters, there is mounting evidence that parameter values should change during search [11, 19].

There have been periodic efforts to reduce or remove the need for parameter tuning. [27] introduced self-adaptive parameters, in which parameter values were included in each solution's genome and themselves underwent evolution. This allowed the search process itself to optimize some of its own parameters, resulting in a reduced need for expert tuning. [15] was able to design an entirely parameter-less GA by leveraging schema theory and parallel populations. Unfortunately these methods were provably less efficient than directly setting the parameters to optimal values [24].
One area that has been effective at reducing the number of algorithm parameters is model based search. [22]'s Hierarchical Bayesian Optimization Algorithm (hBOA) and [33]'s Linkage Tree Genetic Algorithm (LTGA) both require only a single parameter: population size. [25] leveraged model building to create a fully parameter-less algorithm, but it is restricted to only order-k, fully decomposable, noiseless problems. Most recently we introduced the Parameter-less Population Pyramid (P3) [9]. This method uses a pyramid structure of populations to combine model based search with local search to achieve parameter-less optimization. Initial results suggest that, unlike previous parameter-less methods, P3 is more efficient than current state-of-the-art parameterized search algorithms. In this work we shall: extend these results to cover more comparison algorithms; compare both efficiency in reaching the global optimum and intermediate fitnesses; analyze algorithm complexity; and provide more in depth analysis of P3 itself.

Chapter 2  Comparison Optimizers

In order to fully understand the effectiveness of P3, we compare it with five advanced algorithms that have related features to P3. The Random Restart Hill Climber defined by [9] was chosen as an efficient form of repeated local search. As P3 combines this hill climber with crossover, comparing with local search alone shows the advantages of P3's overall approach. The (1 + (λ, λ)) algorithm [5] is the current best theory supported simple genetic algorithm and its method of crossover is in some sense a macro-mutation just as in P3. hBOA and Parameter-less hBOA are advanced model building search techniques that are effective at learning complex problem structure, designed to achieve similar goals as P3's linkage learning but using very different methods. Finally LTGA represents the current state-of-the-art in black-box search and is the origin of P3's linkage learning and crossover methods.
Only hBOA and LTGA require any parameters, with each of these only requiring a population size. This makes knowing the optimal behavior of these algorithms much more tractable. All of the algorithms are also gene order independent, fitness scale invariant, and unbiased. This means, for any problem, the order in which problem variables appear in the genome can be changed without changing the behavior of the search. The fitness can also be manipulated in any fashion as long as the rank ordering of solutions is unchanged. These algorithms are also unaffected by the meaning assigned to each bit, such that inverting a predetermined random subset of genes before evaluation will not impact search efficiency. Our implementations of all of these algorithms as well as all of the population size information, raw results, and processing scripts are available from our website.¹

 1: procedure Hill-Climber
 2:   options ← [0 ... N − 1]
 3:   tried ← ∅
 4:   while |tried| < |options| do
 5:     for all index ∈ shuffled(options) do
 6:       if index ∉ tried then
 7:         Flip bit index in solution
 8:         if solution's fitness increased then
 9:           tried ← ∅
10:         else
11:           Revert change
12:         end if
13:         tried ← tried ∪ {index}
14:       end if
15:     end for
16:   end while
17: end procedure

Figure 2.1: Hill climbing algorithm used to improve randomly generated solutions until no single bit change results in a fitness improvement.

2.1 Random Restart Hill Climber

Perhaps the simplest black-box search heuristic is stochastic local search, also known as hill climbing. This optimization technique focuses on improving a single solution until it reaches a local optimum. Here we use the first-improvement hill climber defined by [9] and given in Figure 2.1. This algorithm works by flipping each bit in a random order, keeping modifications when fitness is improved, until single bit flips cannot result in further fitness improvements.

The hill climber requires an amortized cost of O(1) operations per evaluation. In order to terminate, at least one evaluation must be performed for each of the N bits in the solution.
As such any operation that happens only once per search can be amortized over at least N evaluations, covering the initialization of options on Line 2. Line 6, which prevents wasted evaluations, can be called at most twice per evaluation: once to add index into tried and once to prevent index from being unnecessarily evaluated again. The only way three or more calls could happen is if no fitness improvement was made for the entire previous iteration, which contradicts the loop invariant.

¹ https://github.com/brianwgoldman/FastEfficientP3

Due to its nature, this hill climber cannot escape basins of attraction. Once a solution is reached such that none of the single bit neighbors are fitness improvements, search stops. Thus this algorithm requires a restart mechanism to solve multimodal problems. We have chosen here to naively restart search from a random solution whenever a local optimum is found. This ensures that on all landscapes there is always a non-zero probability of search finding the global optimum.

2.2 (1 + (λ, λ))

[5] presented the first genetic algorithm to provably show the advantages of performing crossover on the problem known as OneMax. This comparatively simple algorithm maintains only a single individual and a self-controlled parameter λ.

Each iteration, the number of bits to flip is chosen from the binomial distribution b ∼ B(N, λ/N), where N is the number of bits in the genome. Next, ⌊λ⌋ offspring are produced by flipping b bits. The best mutant then produces ⌊λ⌋ offspring via uniform crossover with the original parent, such that each gene comes from the mutant with probability 1/λ. In the original algorithm the best offspring produced by crossover then replaces the original parent if its fitness is no worse. The λ parameter, which is initialized to 1, is decreased if the offspring replaced its parent and increased otherwise.

The original formulation was designed specifically for unimodal landscapes and as such was not directly suitable for multimodal problems. [9] extended (1 + (λ, λ)) to include random restarts. As search stagnates, the λ parameter increases in value. Eventually this results in λ ≥ N, causing mutation to always flip all bits of the individual. As this prevents any future improvement, whenever λ ≥ N search is restarted from a random solution with λ reset to 1.
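To make the iteration structure concrete, the following is a minimal Python sketch of the mutation and crossover phases described above, assuming a fitness function over bit lists; the λ adaptation and restart logic, along with the efficiency modifications, are omitted, and the function name is my own.

```python
import random

def one_plus_lambda_lambda_step(parent, fitness, lam, rng=random):
    """One (1 + (lambda, lambda)) iteration (sketch): sample b ~ B(N, lam/N),
    create floor(lam) mutants each flipping b random bits, then cross the
    best mutant with the parent, taking each gene from the mutant with
    probability 1/lam.  Returns the better of parent and best offspring."""
    n = len(parent)
    b = sum(rng.random() < lam / n for _ in range(n))  # binomial sample
    # Mutation phase: floor(lam) offspring, each flipping the same b bits count.
    mutants = []
    for _ in range(int(lam)):
        child = parent[:]
        for i in rng.sample(range(n), b):
            child[i] ^= 1
        mutants.append(child)
    best_mutant = max(mutants, key=fitness)
    # Crossover phase: uniform crossover biased toward the parent.
    offspring = []
    for _ in range(int(lam)):
        child = [m if rng.random() < 1 / lam else p
                 for m, p in zip(best_mutant, parent)]
        offspring.append(child)
    best = max(offspring, key=fitness)
    return best if fitness(best) >= fitness(parent) else parent
```

On OneMax (fitness = number of ones) this step never decreases fitness, since the parent is kept whenever no offspring matches it.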
A few other efficiency modifications were also made. If there is a tie in crossover offspring fitness, whichever has a larger Hamming distance from the parent is retained. This encourages drifting across plateaus. The "mod" control strategy proposed by [5] was not used as it conflicted with the random restart strategy. If a crossover individual is identical to either of its parents, it is not evaluated. If mutation produces an offspring that is better than the best crossover offspring, it is used to compare against the original parent.

2.3 Hierarchical Bayesian Optimization Algorithm

[22] used statistical principles in combination with a decision tree structure to create the Hierarchical Bayesian Optimization Algorithm (hBOA). This method creates a model of epistatic relationships between genes which is then used to stochastically generate new solutions. Each generation a binary tournament with replacement is used to select µ solutions from the population. These solutions are then used to build the model, which in turn is used to generate µ new solutions. The new solutions are then integrated into the population using restricted tournament replacement.

Conceptually, the model built by hBOA is trying to infer rules of the form "Given that this subset of genes are set to these values, how frequently is gene x_i set to value v?" This can be represented using a directed acyclic decision forest, with each tree in the forest representing one gene in the solution. In the decision tree T_i, which is used to set the value of gene x_i, each internal node represents previous decisions on how to set some other gene x_j, with the children of that node representing how the decision was made. The leaves of each tree give the probability that x_i should be set to one of the possible gene values.

The forest is constructed iteratively, with each tree initially containing a single leaf and with each leaf storing a pointer for each selected solution. Each iteration the algorithm considers all possible ways of splitting an existing leaf using another gene x_j, such that solutions in the leaf are moved to the newly created leaves based on their value for x_j. The general goal is to separate the solutions such that all solutions with x_j = 0 move to one leaf while solutions with x_j = 1 move to the other.
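A small sketch may make this splitting step concrete. The function below, whose name and structure are my own, partitions the solutions at a leaf of tree T_i by gene x_j, counts how often x_i takes each value in the parent leaf and both children, and applies my reading of the Bayesian acceptance inequality of Equation 2.1; treat the threshold form as an assumption of this sketch.

```python
from math import factorial as fact
from math import log2

def accept_split(solutions, i, j, mu):
    """Sketch of hBOA's leaf split test: partition the solutions at a
    leaf of tree T_i by gene x_j, then check whether the gain in the
    Bayesian score (Equation 2.1) beats the complexity threshold.
    mu is the number of selected solutions."""
    l1 = [s for s in solutions if s[j] == 0]   # leaf l'
    l2 = [s for s in solutions if s[j] == 1]   # leaf l''

    def m(leaf, v):
        # m_i(v, leaf): how many solutions in this leaf have x_i = v
        return sum(1 for s in leaf if s[i] == v)

    gain = (fact(len(solutions) + 1) / (fact(m(solutions, 0)) * fact(m(solutions, 1)))
            * fact(m(l1, 0)) * fact(m(l1, 1)) / fact(len(l1) + 1)
            * fact(m(l2, 0)) * fact(m(l2, 1)) / fact(len(l2) + 1))
    return 2 ** (0.5 * log2(mu)) < gain
```

When x_i and x_j are perfectly correlated across the selected solutions the split is accepted; when they are statistically independent the gain falls below the threshold and the split is rejected.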
This goal is formalized using model scoring from Bayesian statistics. In its raw form this almost always creates near infinitesimal results, calculating fractions that include factorials of µ and products over N. However, through algebraic manipulation discussed in the Appendix, we derived a simplified form shown in Equation 2.1. Here l is a leaf in tree i, with l′ and l″ the results of splitting l. m_i(l) is the number of solutions that reach l and m_i(x_i, l) is the number of solutions that reach l with the given value for x_i. If no proposed split satisfies the inequality, iteration stops. If multiple splits do, whichever maximizes the right side is chosen.

    2^{0.5 log₂ µ} < [(m_i(l) + 1)! / (m_i(0, l)! m_i(1, l)!)] · [(m_i(0, l′)! m_i(1, l′)! m_i(0, l″)! m_i(1, l″)!) / ((m_i(l′) + 1)! (m_i(l″) + 1)!)]    (2.1)

Initially there are Θ(N²) possible ways to split existing leaves, as each of the N single node trees can be split by any of the other N − 1 genes. Each iteration a new edge is added to the decision forest, meaning some of the previously tested splits can no longer be used. For instance, if T_i, which is used to decide the value of x_i, is split using the value of x_j, T_j can no longer be split using x_i. As a split creates two new leaves, O(N) new potential splits must also be tested. Equation 2.1 parses all solutions that reach a leaf to count gene frequencies, requiring Θ(µ) time. The number of total leaves created depends heavily on the problem and µ. However, assuming no splits are accepted or that the cost of testing all future splits is less than the initial Θ(N²), constructing the model requires Θ(µN²) time. Each model is used to generate µ solutions, leading to a cost per evaluation of Θ(N²).

To generate a solution, the value of each gene x_i is set using its corresponding decision tree T_i. Because the forest is directed acyclic, there must be an ordering of T_i such that, before T_i is executed, all x_j it uses to make decisions have already been set. As such, previous decisions made by other trees are used to follow each T_i until a leaf is reached. The value of x_i is then set based on the probability that other solutions reached that leaf with each value of x_i.
To perform replacement, hBOA uses restricted tournament replacement. After each new solution is generated and evaluated, a set of w solutions are chosen at random from the population, where w = min{N, µ/20}. From this set the solution which is most genetically similar to the offspring is chosen. If the offspring is at least as fit as the chosen solution, it replaces the chosen solution in the population. Otherwise the offspring is discarded. This method is designed to preserve genetic diversity as only genetically similar solutions compete on fitness.

hBOA is designed to work with large population sizes, resulting in a large number of evaluations per generation. As hBOA utilizes explicit diversity maintenance, standard methods for determining convergence are not considered very accurate. Therefore the authors suggest that an hBOA run should be terminated after performing generations equal to N. Like other model based techniques, hBOA has few parameters. There is no mutation or crossover, and modeling does not rely on any explicit parameters. Solution selection, generation, and replacement are all derived from the population size, which must be set by the user.

2.4 Parameter-less hBOA

Using the methods first introduced by [15] for the Parameter-less GA, [23] created Parameter-less hBOA which automatically scales its population size to fit the problem. This is done by maintaining a list of concurrent populations using exponentially scaled population sizes.
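The scheduling behind this scheme, in which each population performs two generations for every one performed by the next larger population created on demand, can be sketched as follows; the function name and return values are my own, and this is an illustration of the scheme from [15] rather than a full implementation.

```python
def parameterless_hboa_schedule(top_level_steps, mu0=10):
    """Sketch of the exponentially scaled population schedule: population
    i (size mu0 * 2**i) performs two generations for every one performed
    by population i + 1, which is created on demand.  Returns the order
    in which populations run and the population sizes created."""
    gens = []           # generations performed so far by each population
    order = []          # which population ran at each point

    def run(i):
        if i == len(gens):
            gens.append(0)          # create population i on demand
        gens[i] += 1
        order.append(i)
        if gens[i] % 2 == 0:        # every second generation of i
            run(i + 1)              # triggers one generation of i + 1

    for _ in range(top_level_steps):
        run(0)
    sizes = [mu0 * 2 ** i for i in range(len(gens))]
    return order, sizes
```

A useful property to check is the one stated above: every population receives the same total number of evaluations, since each runs half as often as its predecessor but is twice as large.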
A run of Parameter-less hBOA starts with a single population of size µ₀, conventionally set to µ₀ = 10. After two generations are performed, a new population of size µ₁ = 2µ₀ is created and performs a generation. Evolution then continues with the µ₀ population performing two generations for each one performed by µ₁. Each time population µ_i performs its second generation a new population µ_{i+1} = 2µ_i is created, which performs generations half as often as µ_i. In this way an infinite number of parallel populations can be simulated, with each population receiving the same number of total evaluations.

In all other aspects each population is identical to an hBOA population using a fixed µ. No search information is shared among populations, and each search is independently terminated. As such Parameter-less hBOA cannot perform better than hBOA using the optimal population size for a given instance, as it must also spend evaluations on populations of different sizes. This inefficiency is bounded by a log multiple of the total number of evaluations [24].

2.5 Linkage Tree Genetic Algorithm

[33] introduced the Linkage Tree Genetic Algorithm (LTGA) which automatically detects and exploits problem epistasis by examining pairwise gene entropy. Due to its enhanced ability to preserve high fitness gene subsets, LTGA was able to outperform state-of-the-art GAs across many benchmarks. Since its introduction, many variants of LTGA have been proposed [34, 12] so for clarity we have chosen the version presented by [35] as our model.

LTGA's effectiveness comes from its method of performing crossover. Instead of blindly mixing genes between parents, LTGA attempts to preserve important interrelationships between genes. Before performing any crossovers in a generation, LTGA first builds a set of hierarchical gene clusters that are then used to dictate how genes are mixed during crossover.
Figure 2.2 provides the agglomerative method LTGA uses to create gene clusters. This algorithm creates a tree of sets using pairwise gene entropy, such that the leaves of the tree contain a single gene and internal nodes are the union of their children's sets. These sets are then used by crossover to specify epistatic relationships that should be preserved. The process begins by creating the set of sets unmerged that tracks all top-level clusters. Initially unmerged contains single member sets for each gene. After each iteration the two sets with the minimum average pairwise distance (given in Equation 2.2) are merged to create a single cluster. This process is repeated until only a single set remains in unmerged, which contains

 1: procedure Cluster-Creation
 2:   unmerged ← {{0}, {1}, {2}, ..., {N − 1}}
 3:   useful ← unmerged
 4:   while |unmerged| > 1 do
 5:     C_i, C_j ← argmin_{C_i, C_j ∈ unmerged} D(C_i, C_j)
 6:     unmerged ← unmerged − {C_i, C_j} + {C_i ∪ C_j}
 7:     useful ← useful + {C_i ∪ C_j}
 8:     if D(C_i, C_j) = 0 then
 9:       useful ← useful − {C_i, C_j}
10:     end if
11:   end while
12:   Order useful based on last merged first
13:   Remove largest cluster from useful; return useful
14: end procedure

Figure 2.2: Algorithm describing how LTGA creates clusters using Equation 2.2 for a population. unmerged and useful are ordered sets of sets of gene loci.

all of the genes in the genome.
    D(C_i, C_j) = (1 / (|C_i| |C_j|)) Σ_{c_i ∈ C_i} Σ_{c_j ∈ C_j} [2 − (H(c_i) + H(c_j)) / H(c_i ∪ c_j)]    (2.2)

    H(c) = −Σ_{s ∈ S} p_c(s) log(p_c(s))    (2.3)

Throughout this process useful tracks the set of all gene clusters that should be preserved for use by crossover. This set begins with all genes in separate clusters, and each time a new cluster is created it is added to useful. However, not all clusters are necessarily worth keeping. For instance, in all versions of LTGA the cluster containing all genes is removed from useful, as preserving all genes during crossover can only create clones. [35] extended this removal to include any unsupported subsets. If the pairwise distance between two clusters is 0, this means there are no individuals in the population that disrupt the relationships between the two clusters. Therefore, when performing crossover, there is no reason to believe a fitness improvement can be achieved by breaking the stored pattern. As such a cluster is only kept if its direct superset has a non-zero distance. As a final step, Line 12 reorders useful such that clusters appear in reversed order from which they were added to useful.

 1: procedure Cluster-Usage
 2:   for all C_i ∈ useful do
 3:     d ← rand_choice(P)
 4:     Copy d's gene values for C_i into solution
 5:     if solution was changed then
 6:       if solution's fitness decreased then
 7:         Revert changes
 8:       end if
 9:     end if
10:   end for
11: end procedure

Figure 2.3: Algorithm describing how clusters are used to perform crossover.

Thus the most linked clusters and those containing single genes appear at the end of the returned list.

[35]'s version of LTGA does not use the entire population when determining pairwise entropy. Instead, binary tournament is used to select half of the population. This is done to ensure the model is built using only high-quality solutions, even during the first generation.
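The entropy and distance measures of Equations 2.2 and 2.3 can be sketched directly; the function names are my own, solutions are assumed to be bit lists, and the guard for constant loci (where H = 0) is a convention of this sketch rather than something stated in the equations.

```python
from math import log2

def entropy(population, cluster):
    """H(c) from Equation 2.3: entropy of the joint value patterns the
    population exhibits over the gene loci in cluster."""
    patterns = {}
    for sol in population:
        key = tuple(sol[i] for i in cluster)
        patterns[key] = patterns.get(key, 0) + 1
    total = len(population)
    return -sum(n / total * log2(n / total) for n in patterns.values())

def distance(population, ci, cj):
    """D(C_i, C_j) from Equation 2.2: average over locus pairs of
    2 - (H({a}) + H({b})) / H({a, b}).  Independent genes score near 1,
    perfectly linked genes score 0."""
    total = 0.0
    for a in ci:
        for b in cj:
            h_ab = entropy(population, [a, b])
            if h_ab == 0:
                continue  # constant loci: treated as distance 0 here
            total += 2 - (entropy(population, [a]) + entropy(population, [b])) / h_ab
    return total / (len(ci) * len(cj))
```

For example, in a population where two loci always agree the distance between their singleton clusters is 0, which is exactly the condition Line 8 of Figure 2.2 uses to prune unsupported subsets from useful.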
In order to efficiently perform clustering, a pairwise gene frequency table is constructed from the selected solutions. To calculate Equation 2.2, Equation 2.3 is called for each gene (H(c_i)) and pair of genes (H(c_i ∪ c_j)). Extracting this information requires O(µN²) time, where µ is the population size and N is the genome size. The process of converting this pairwise frequency information into clusters can be achieved in O(N²) using the bookkeeping methods presented by [14]. This cost is performed only once per generation, and is then used to perform approximately O(µN) crossover evaluations. As a result, the amortized cost of LTGA's model building is O(N).

Figure 2.3 describes how the identified clusters are used by crossover to preserve gene linkage while still exploring the search space. Unlike more traditional crossover methods, LTGA crosses each individual with the entire population. Also, to produce a single offspring, multiple evaluations of the fitness function are performed.

During each generation, every individual in the population undergoes crossover. In a single crossover event, each cluster of genes C_i in useful is applied as a crossover mask. A random donor d is chosen from the entire population (not just the model selected population), and d's genes for C_i are copied into the working solution. If a modification is made, an evaluation is then performed. If the crossover resulted in no worse fitness then the changes are kept, which allows for neutral drift across plateaus. The resulting solution, which must be at least as fit as its parent, is then copied into the next generation.

In total each individual can cause up to |useful| evaluations. If all clusters were kept, even those deemed unhelpful, and all donations were evaluated, even those that did not change any genes, then Cluster-Usage would perform exactly 2N − 2 evaluations for each of the µ solutions in the population. This provides the amortizing evaluations required to make clustering only O(N) operations per evaluation. However, by skipping some evaluations, it is possible that clustering may be super-linear.
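The crossover loop of Figure 2.3 can be sketched in Python as follows, assuming clusters are lists of gene indices and solutions are bit lists; the function name and signature are my own.

```python
import random

def cluster_crossover(solution, fitness, useful, population, rng=random):
    """Sketch of Figure 2.3: apply each gene cluster in useful as a
    crossover mask, copying the cluster's values from a random donor
    and keeping the change only if fitness did not decrease."""
    current = fitness(solution)
    for cluster in useful:
        donor = rng.choice(population)
        old = [solution[i] for i in cluster]
        new = [donor[i] for i in cluster]
        if old == new:
            continue                      # no change: skip the evaluation
        for i, v in zip(cluster, new):
            solution[i] = v
        f = fitness(solution)
        if f < current:                   # strictly worse: revert
            for i, v in zip(cluster, old):
                solution[i] = v
        else:
            current = f                   # ties kept to drift on plateaus
    return solution, current
```

Note that, as described above, ties are kept rather than reverted, so the returned solution is always at least as fit as the one passed in.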
LTGA has no explicit form of diversity control and has no method for introducing new genetic information once the population has converged. Therefore an LTGA run is considered converged when two consecutive populations contain the same unique solutions.

By design, LTGA only has a single parameter: population size. LTGA uses no mutation, and crossover is defined in terms of the clustering algorithm. Selection between generations is fully elitist and embedded in the crossover, with selection of model building solutions fixed to a binary tournament. Neither Cluster-Creation nor Cluster-Usage rely on parameter values. LTGA does not provide any method for controlling or setting the population size, relying instead on a fixed user-specified size.

2.6 Comparison Algorithm Parameter Tuning

(a) Tuned Population Sizes for hBOA  (b) Tuned Population Sizes for LTGA
Figure 2.4: Optimal population sizes found using bisection on each size for each problem.

While four of the six algorithms in our experiments do not require any user-specified parameters, hBOA and LTGA both use a population size parameter. To ensure these techniques are not unfairly handicapped, we extensively tuned each using the bisection method [29] to determine the optimal population size for each problem size. Extended by [12], this method iteratively doubles the population size until some success criterion is met and then
performs bisection between the smallest successful and the largest unsuccessful sizes. In this way the minimum population size that meets the success criterion is found. [9] proposed a success criterion of performing r successful runs in a row, such that the expected failure rate can be bounded above by 3/(r + 1) [17]. As P3 and the other three algorithms do not prematurely converge, we chose r = 100 to similarly ensure hBOA and LTGA almost never do. As bisection can make infinitely large population sizes, any run that had not found the global optimum after 100 million evaluations or 128 computing hours was considered unsuccessful.

Figure 2.4 shows the results from performing bisection on all problems to be used as comparison benchmarks in Chapter 5. In general hBOA required population sizes that were at least an order of magnitude larger than LTGA. Due to runtime and memory overhead, finding the optimal value for hBOA was much less tractable than for LTGA on moderate to large problem sizes. LTGA's population size also grew significantly slower than hBOA's as problem size increased, especially on the two Trap problems and Rastrigin. Both algorithms were ineffective on MAX-SAT, with neither able to tune to problem sizes over 60 bits. This is likely due to the fact that some randomly generated MAX-SAT landscapes are quite flat and highly deceptive [26].

While not currently treated as a parameter, we also performed preliminary tests of integrating hill climbing into LTGA and hBOA, as this is the procedure used in P3. To match P3, we applied first-improvement hill climbing to each algorithm's initial population. We then performed bisection on the modified algorithms for the largest problem sizes where hBOA without hill climbing was effective. We found that in general both methods performed worse when combined with hill climbing, in some cases up to an order of magnitude worse. There were three exceptions: both improved on MAX-SAT and hBOA improved on Rastrigin. In all cases the inclusion of hill climbing did not result in either algorithm outperforming P3 in terms of evaluations required to reach the global optimum. As such, all further experiments use the unmodified, published versions.
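The doubling-then-bisection search described above can be sketched as a small routine over a success predicate; the predicate (for instance, "r runs in a row reached the global optimum") and the function name are placeholders of this sketch, which assumes the predicate is monotone in population size.

```python
def bisection_population_size(success, start=1):
    """Sketch of the bisection method [29]: double the population size
    until success(size) holds, then binary search between the largest
    failing and the smallest succeeding size for the minimum that works."""
    size = start
    while not success(size):
        size *= 2
    low, high = size // 2, size      # low failed (or is below start)
    while high - low > 1:
        mid = (low + high) // 2
        if success(mid):
            high = mid
        else:
            low = mid
    return high
```

Because each probe doubles the size, only O(log s) probes are needed to bracket the minimum successful size s, and O(log s) more to pin it down exactly.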
Chapter 3
Parameter-less Population Pyramid

[9] introduced the Parameter-less Population Pyramid (P3) as a method for performing optimization that does not require the user to provide any parameters. This is achieved by combining efficient local search with the model building methods of LTGA using an iteratively constructed hierarchy of populations.

The high level algorithm of P3 is presented in Figure 3.1. Unlike more traditional GAs, P3 does not follow a generational model. Instead, it maintains an iteratively expanding pyramid of expanding populations. Each iteration, a new random solution is generated. This solution is brought to a local optimum using the hill climbing algorithm in Figure 2.1. If that local optimum has not yet been added to any level of the pyramid, the solution is added to the lowest population P_0.

Next, the solution is iteratively improved by applying LTGA's crossover algorithm (Figure 2.3) with each population P_i in the pyramid. If this process results in a strict fitness improvement and has created a solution not yet stored in the pyramid, that new solution is added to the next highest pyramid level P_{i+1}. If P_{i+1} does not yet exist, it is created. In this way populations in the pyramid expand over time, and the number of populations stored increases over time. Initially the pyramid contains no solutions or populations, meaning the user does not need to specify a population size.

To accommodate P3's unique population structure, some of LTGA's clustering procedures were modified. In LTGA, clusters are identified at the start of each generation and are

1: procedure Iterate-P3
2:   Create random solution
3:   Apply hill climber
4:   if solution ∉ hashset then
5:     Add solution to P_0 and rebuild P_0's model
6:     Add solution to hashset
7:   end if
8:   for all P_i ∈ pyramid do
9:     Mix solution with P_i
10:    if solution's fitness has improved then
11:      if solution ∉ hashset then
12:        Add solution to P_{i+1} and rebuild P_{i+1}'s model
13:        Add solution to hashset
14:      end if
15:    end if
16:  end for
17: end procedure

Figure 3.1: One iteration of P3 optimization.
pyramid is an ordered set of populations and hashset is a set of all solutions in pyramid.

used to create all offspring in that generation. As P3 does not perform serial generations, P3 instead rebuilds the model each time a solution is added to a population. Furthermore, unlike our chosen variant of LTGA, all solutions in the population are used to generate the model, not just the winners of a binary tournament. We can do this because even the worst solutions in the pyramid are already high quality due to the previous application of local search. Using local search in LTGA was examined by [3] and found to provide no significant improvement. A likely cause was that that study applied local search to every solution, not just the initial population, resulting in significant overhead.

Beyond the changes in population structuring, P3 modifies LTGA's version of Cluster-Creation and Cluster-Usage. P3 changes Line 12 in Figure 2.2 from last merged first ordering to smallest first ordering. This method applies gene clusters during crossover based on how many genes are in each cluster (ties are broken randomly), and not on how tightly linked those genes are. [12] found that this alternative was better at preserving diversity, and therefore required smaller populations.

P3 also modified Line 3 in Figure 2.3. Instead of choosing a single genetic donor for each cluster, P3 iterates over the population in a random order until a solution in the population is found that has at least one gene different for that cluster of genes from the improving solution. This process increases the likelihood of an evaluation being performed for every cluster, and helps test rare gene patterns in the population.
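The two modifications described above, smallest-first cluster ordering and random-order donor search, can be sketched as follows. This is a simplified illustration under assumed data structures (lists of bits, clusters as index lists), not P3's actual implementation; `evaluate` counts as one evaluation in the text's accounting.

```python
import random


def mix(solution, fitness, population, clusters, evaluate, rng=random):
    """Sketch of P3's modified Cluster-Usage: cross `solution` with one
    pyramid population, applying clusters smallest first and searching the
    population in random order for a donor that differs in the cluster."""
    # Smallest-first ordering: fewest genes first, ties broken randomly.
    ordered = sorted(clusters, key=lambda c: (len(c), rng.random()))
    for cluster in ordered:
        donors = list(population)
        rng.shuffle(donors)
        for donor in donors:
            # Only a donor differing in at least one cluster gene is useful.
            if any(solution[g] != donor[g] for g in cluster):
                trial = list(solution)
                for g in cluster:
                    trial[g] = donor[g]
                trial_fitness = evaluate(trial)
                # Elitist acceptance: keep the donation only on no fitness loss.
                if trial_fitness >= fitness:
                    solution, fitness = trial, trial_fitness
                break  # only the first differing donor is tried per cluster
    return solution, fitness
```

Because acceptance requires no fitness loss, the returned fitness can never drop below the input fitness, matching the elitism described in Chapter 3.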
In LTGA the cost of rebuilding the model is O(N²) as it must collect pairwise gene frequency information for all solutions in the population. P3 does not store a single population, and it does not have a fixed size for any population. However, each time a solution is added to a population, it requires O(N²) time to update the table of pairwise frequencies and another O(N²) time to rebuild the linkage model. The model is then used immediately to perform up to one evaluation for each of the up to 2N − 2 clusters. Just as in LTGA, if no evaluation shortcuts were made, P3 has an amortized modeling cost of O(N) per evaluation. While P3 does rebuild the model more frequently per solution in the population, it also performs a number of local search evaluations that are quite efficient, meaning theoretical comparisons of their speed are difficult to perform. As a final note, P3's repeated attempts to find a useful donation make it less likely than LTGA to skip evaluations, but it has an added cost to find these donations. Repeated donations could require as many as O(μ) attempts per evaluation, but experimental evidence suggests that this operation actually saves more overhead than it costs by increasing the number of evaluations per model rebuild.

Each of the pieces of the P3 algorithm was selected not just for its standalone efficacy, but for the ways in which they interact. By using the hill climber to optimize randomly generated solutions, the underlying pairwise relationships in the problem are exposed. As a result, detecting clusters for use by crossover is much more effective. The crossover operator is extremely elitist, as each gene donation must result in no fitness loss, and a solution must strictly improve to be added to the next level of the pyramid. This is balanced by continual integration of new randomly generated, then locally optimized, solutions. Furthermore, each random restart decreases the probability of spurious linkages caused by shared ancestry. This diversity is further preserved by applying gene clusters in smallest first order during crossover, as this reduces the probability of genetic hitchhikers.
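The O(N²) frequency-table update described above can be sketched as follows. The nested-dictionary layout is an assumption for illustration, not P3's actual data structure.

```python
def empty_table(n):
    """Pairwise table for N genes: table[i][j] maps a bit pair to a count."""
    return {i: {j: {} for j in range(i + 1, n)} for i in range(n)}


def update_pairwise_frequencies(table, solution):
    """Add one binary solution's contribution to the pairwise frequency table.

    Touching every pair of the N genes is what makes each update O(N^2).
    """
    n = len(solution)
    for i in range(n):
        for j in range(i + 1, n):
            pair = (solution[i], solution[j])
            table[i][j][pair] = table[i][j].get(pair, 0) + 1
```

Rebuilding the linkage tree from such a table is a separate O(N²) pass, which is why each addition to a population costs O(N²) overall.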
Other algorithms have proposed using multiple concurrent populations. [16] had a hierarchy of populations with solutions periodically advancing upward. This allows for continuous integration of diversity as the lowest population is reseeded with random solutions. However, this method resulted in increased parameterization, as not only was a population size required, but also new parameters for how frequently generations advanced between levels and how many total levels to have. [15] used multiple independent populations of different sizes as a method for removing the population size parameter, but doing so was provably less efficient than using an optimal population size as no information is shared between the populations.

Chapter 4
Problem Descriptions

4.1 Single Instance Problems

Understanding how a stochastic search algorithm will behave on arbitrary and complex search landscapes can be exceedingly difficult. Therefore a common practice for algorithm understanding is to perform search on well defined, well understood landscapes. To be of interest these landscapes need to represent interesting and important aspects of real-world problems.

One such landscape is the Deceptive Trap problem [8]. In this landscape the genome is broken up into k bit non-overlapping subproblems referred to as traps. Each subproblem is scored using Equation 4.1, where t is the number of bits in the trap set to 1. The global optimum in each trap is a string of all 1s, while all other solutions lead to a local optimum of all 0s. This problem tests an algorithm's ability to overcome k sized deception and is commonly used to determine how effective crossover is at preserving building blocks. Any crossover event that mixes bits from different parents in the same trap will likely result in that trap being optimized to the local optimum. For our experiments we chose k = 7 to ensure highly deceptive traps.

    trap(t) = { k − 1 − t   if t < k
              { k           if t = k        (4.1)

However, if epistasis is set such that each gene depends on the K directly following it in the genome, the solution can be found in polynomial time [39]. These Nearest Neighbor NK landscapes are therefore ideal for search algorithm testing. For all of our experiments using Nearest Neighbor NK we fixed K = 5 to ensure highly rugged landscapes.
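The Deceptive Trap scoring from Equation 4.1, summed over non-overlapping k-bit traps, can be sketched as follows; the function names are illustrative.

```python
def trap(t, k=7):
    """Score one trap given t, the number of its k bits set to 1 (Eq. 4.1)."""
    return k if t == k else k - 1 - t


def deceptive_trap_fitness(genome, k=7):
    """Sum trap scores over consecutive non-overlapping k-bit subproblems."""
    assert len(genome) % k == 0
    return sum(trap(sum(genome[i:i + k]), k) for i in range(0, len(genome), k))
```

Note the deception: within a trap, every bit flipped from 1 to 0 increases the score until the all-0s local optimum at k − 1, while only the all-1s string reaches the true optimum of k.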
[30] presents a combinatorial benchmark problem derived from physics: Ising Spin Glasses. A spin glass is defined by a weighted graph of interaction terms between vertices. Each gene assigns a value to each vertex, with the fitness calculated by Equation 4.4. In this equation, E is the set of all edges, e_ij is the edge weight connecting vertex i to vertex j, and x_i and x_j are the gene values for vertex i and j. Optimal fitness is when this sum is minimized.

    Σ_{e_ij ∈ E} x_i · e_ij · x_j        (4.4)

Similar to NK Landscapes, the general class is NP-Hard to optimize, but the 2D ±J subset of Ising Spin Glasses can be polynomially solved.[1] In this subset the graph is restricted to be a two-dimensional torus, edge weights are randomly set to either −1 or 1, and vertex values must be −1 or 1.

As our final class of randomly generated problems we chose the Maximum Satisfiability (MAX-SAT) problem. Related to the more common 3-SAT problem, a MAX-SAT instance is defined by a set of three-term clauses. Each term is a randomly chosen variable, which may also be negated. A clause scores a 1 if and only if at least one term in the clause evaluates to true. In order to make MAX-SAT instances with a known global optimum, [9] proposed constructing clauses around a fixed solution. In this way the signs of the terms are set to ensure the target solution satisfies the clause. To ensure each problem is challenging we chose a clause-to-variable ratio of 4.27 [31].

[1] http://www.informatik.uni-koeln.de/spinglass/

Chapter 5
Benchmarking P3

5.1 Finding the Global Optimum

Figure 5.1 shows the median number of evaluations required by each of the six algorithms to find the global optimum for multiple sizes of each problem. Each data point in Figure 5.1 represents the median of 100 runs, where unsuccessful runs are treated as requiring more evaluations than any successful run. If the median run was not successful no point is shown.
Medians are used as the data is not normally distributed, and because it allows for more meaningful comparison between techniques with different success rates. For LTGA, the maximum problem size used for each problem was set to the largest size for which we could feasibly determine the optimal population size. For hBOA, results on many large problems are not shown due to the extreme computational cost required to optimally determine the population size.

5.1.1 Quantitative Comparison

Of the 130 tested configurations, P3 found the global optimum using the least median evaluations on 114. The largest problem size for any problem where P3 was not the most efficient was 49 bits, with P3 achieving the best results on all 92 larger configurations. hBOA, LTGA, and Parameter-less hBOA only outperform P3 on the smallest 5, 4, and 1 Deceptive Step Trap instances, respectively. Random Restart Hill Climbing outperforms P3 on the smallest 3 Nearest Neighbor NK instances and the smallest Ising Spin Glass. (1+(λ,λ)) has the most success outperforming P3, doing so on the 5 smallest Deceptive Traps, 3 smallest Deceptive Step Traps, and 2 smallest Rastrigin instances. The likelihood that P3 would achieve these pairwise results assuming its median result is actually worse is p < 10^−15 according to the binomial test. Pairwise comparison of LTGA and P3 on the largest problem size using the Mann-Whitney U test results in p < 10^−5 for all problems.

Figure 5.1: Comparison of the median number of evaluations to reach the global optimum for the six different optimization methods with respect to problem size. If the median run did not reach the global optimum no data element is shown. Results given on a log-log scale.
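The binomial test used above can be sketched directly: under the null hypothesis that P3's median result is actually worse, each configuration would favor P3 with probability at most one half, so the reported p-value is a binomial tail probability.

```python
from math import comb


def binomial_tail(wins, trials):
    """P(X >= wins) for X ~ Binomial(trials, 1/2), the null of a sign test."""
    return sum(comb(trials, k) for k in range(wins, trials + 1)) / 2 ** trials


# P3 had the least median evaluations on 114 of the 130 configurations.
p_value = binomial_tail(114, 130)  # well below the reported 10^-15 bound
```

Computing the tail exactly with integer binomial coefficients avoids the floating-point underflow a naive product of probabilities would hit at these magnitudes.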
5.1.2 Local Search

The Random Restart Hill Climber and (1+(λ,λ)) are both relatively effective on small problem sizes. This is especially true for the three randomly generated problem classes. These problems may contain relatively few local optima or just be exceptionally difficult for the model-based algorithms. On Deceptive Trap and Deceptive Step Trap using 4 or fewer traps, (1+(λ,λ)) performs significantly better than any other algorithm. We believe this is because (1+(λ,λ)) is able to overcome deception by probabilistically flipping entire traps. This ability also leads (1+(λ,λ)) to outperform the Random Restart Hill Climber on all problems except Nearest Neighbor NK.

On larger problem sizes, the ability for local search to reach the global optimum quickly diminishes. Only on MAX-SAT are these optimizers competitive at larger tested problem sizes. However, we believe this is because the largest tested MAX-SAT was an order of magnitude smaller than the largest size tested for most other problems. As the problem size increases the number of local optima increases exponentially, which explains why Random Restart Hill Climbing was unable to scale. For larger problems it also becomes increasingly unlikely for (1+(λ,λ)) to make the right combination of changes required to reach the global optimum. This behavior causes high variance in success rate, as evidenced by the occasional successes on large Deceptive Trap problems.

5.1.3 Model Building

Only techniques that explicitly built models of gene epistasis were able to solve the largest problem instances. On single-instance problems LTGA was more effective than hBOA, with hBOA outperforming LTGA on Nearest Neighbor NK and Ising Spin Glasses. This may be caused by the differences in modeling method: unlike the single-instance problems, gene epistasis in the randomly generated problem classes cannot be perfectly represented with a linkage tree.
Considering how different hBOA and LTGA are in performing optimization, it is somewhat surprising how similar their results are on HIFF. However, both techniques rely on populations large enough to support the diversity required to reach the global optimum and to model epistasis. Both techniques also only rebuild models once per generation. As the subproblems of HIFF are nested, it is unlikely that either technique can accurately model higher-order epistasis before solving lower order subproblems. Therefore both methods require one generation per subproblem order.

5.1.4 P3

Unlike the other model-based methods, P3 generally outperforms both the Random Restart Hill Climber and (1+(λ,λ)) even on small problem sizes. Unlike the other local search methods, P3 outperforms LTGA and hBOA even on large problem sizes. This implies that P3 is gaining the benefits of each, leveraging local search to solve easy problems and model building to solve harder ones.

Furthermore, the interaction between these two optimization tools explains some of the reason P3 outperforms each method alone. On Deceptive Trap P3's use of hill climbing ensures all traps are immediately optimized, allowing for perfect linkage detection and high-quality donation. On HIFF local search solves all pairwise subproblems, saving P3 a generation over LTGA and hBOA. In comparison P3 is only a slight improvement on Deceptive Step Trap, which is less amenable to local search.

5.2 Fitness Over Time

For some applications, finding the global optimum is less important than finding good solutions quickly. Therefore we examine this behavior in Figure 5.2. At regular intervals during optimization Figure 5.2 shows the median of the best fitnesses found at that point of search across 100 runs. Figure 5.2 shows the largest problem size for which we were able to successfully gather results for all six algorithms, but the trends shown are representative of all larger problem sizes. The maximum reporting interval is set to include the slowest P3 run to reach the global optimum.

5.2.1 Quantitative Comparison

Of 181 sample points, P3 had the highest median fitness in 121. In pairwise competition, (1+(λ,λ)) was the most likely to outperform P3, doing so on 50 sample points.
LTGA, hBOA, and Parameter-less hBOA were the next best, outperforming P3 on 27, 20, and 18 sample points, respectively. Random Restart Hill Climbing almost never outperformed P3, doing so only 9 times. The likelihood that P3 would achieve these pairwise results assuming its median result is actually worse is p < 10^−9 according to the binomial test.

5.2.2 Local Search

Perhaps the most striking result is the quality of (1+(λ,λ)). Until quite far into search this method performs better than both LTGA and hBOA. Given sufficient evaluations (1+(λ,λ)) also outperforms Random Restart Hill Climbing on all 7 problems. For brief periods in the middle of search it performs the best of all techniques on: Deceptive Trap; Deceptive Step Trap; HIFF; Ising Spin Glass; and MAX-SAT problems. (1+(λ,λ))'s ability to efficiently incorporate gene modifications of larger than one bit allows it to overcome the deception and plateaus in Deceptive Trap and Deceptive Step Trap, solve medium-sized subproblems in HIFF, flip the signs on multiple adjacent bits in Ising Spin Glass, and cross plateaus in MAX-SAT. However, this method is slow in reaching the global optima in many of these problems, which causes it to eventually be overtaken by the model building techniques.

Figure 5.2: Compares the median best fitness reached during search for each of the six optimization methods.

5.2.3 Model Building

Both hBOA and LTGA are marked by periods of little improvement followed by rapid improvement. In hBOA this is taken to the extreme, with all fitness improvement made at the very end of search. In both cases this is caused by model building. Before the model is accurate little improvement is made. Once it is accurate, fitness improves dramatically.
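One iteration of the (1+(λ,λ)) idea discussed above can be sketched as below. This is a simplified illustration under assumptions (fixed λ, no self-adjustment, illustrative names), not the tuned implementation benchmarked here: λ mutants each flip ℓ ~ Binomial(n, λ/n) bits, and the best mutant is then crossed back toward the parent, taking each mutant bit with probability 1/λ.

```python
import random


def one_plus_lambda_lambda_step(x, fitness, lam, rng):
    """One simplified (1+(lambda,lambda)) iteration on a bit list `x`."""
    n = len(x)
    ell = sum(rng.random() < lam / n for _ in range(n))  # ell ~ Binomial(n, lam/n)
    # Mutation phase: lam offspring, each flipping ell distinct random bits.
    best_mut, best_mut_fit = None, None
    for _ in range(lam):
        child = list(x)
        for i in rng.sample(range(n), ell):
            child[i] = 1 - child[i]
        f = fitness(child)
        if best_mut is None or f > best_mut_fit:
            best_mut, best_mut_fit = child, f
    # Crossover phase: bias each bit back toward the parent with prob 1 - 1/lam.
    best_cross, best_cross_fit = x, fitness(x)
    for _ in range(lam):
        child = [m if rng.random() < 1 / lam else p for p, m in zip(x, best_mut)]
        f = fitness(child)
        if f >= best_cross_fit:  # elitist acceptance
            best_cross, best_cross_fit = child, f
    return best_cross, best_cross_fit
```

The crossover bias back toward the parent is what lets a large multi-bit modification, such as a whole flipped trap, survive only when it helps.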
At 58% of the recording intervals hBOA has the worst fitness of any solver. Most of the exceptions occur when hBOA is still evaluating its initial population, allowing this random search to temporarily surpass the local search methods. After N evaluations, however, hBOA and LTGA both fall behind until their models begin to improve. Parameter-less hBOA reaches intermediate fitnesses faster than hBOA, doing so on 62% of intervals, as its models begin to optimize earlier than hBOA's. However, this trend is reversed after a sufficient number of evaluations, most clearly on Deceptive Step Trap and Ising Spin Glasses, as hBOA's tuned population overtakes Parameter-less hBOA's parallel populations.

On every problem LTGA passes through distinct periods: fitness plateau, near instantaneous improvement, fitness plateau, and improvement to the global optimum. The early period corresponds with initialization of the population, with the first fitness gain achieved immediately upon completing the first generation. When using an inaccurate model, LTGA's mixing strategy performs a sort of less effective local search. Subsequent generations then make only minor fitness improvements. Once the model becomes accurate and the probability of a crossover using high-quality genetic material increases sufficiently, LTGA enters a second period of rapid improvement.

5.2.4 P3

The integration of hill climbing into P3 makes it strictly better than using hill climbing alone. Early in optimization P3 and the Random Restart Hill Climber have effectively identical quality. This is because P3 performs the same evaluations as the Hill Climber for the first two restarts. Once P3 begins performing crossover it immediately improves over the Hill Climber. In 95% of intervals P3 had a fitness at least as high as Hill Climbing. As such P3 is better than a simple hill climber regardless of how long each technique is run and irrespective of how high quality the solution found has to be.
Unlike the model-based methods, which struggle until model accuracy improves, P3's iterative solution integration allows it to improve much more quickly. This behavior exists in most problems, but is easiest to understand on Deceptive Trap. On this problem, P3 immediately brings all traps to local optima, equaled only by the Random Restart Hill Climber in quality. In comparison LTGA must evaluate the entire population and perform multiple generations to reach similar quality. P3 is able to immediately integrate optimal versions of each trap into a single individual as they are found by local search, resulting in smoother fitness improvement than LTGA.

5.3 Computational Expenses

While it is common in evolutionary computation to assume the evaluation function will dominate algorithm complexity, in some domains this will not be true. Model-based methods are especially likely to violate this norm. Therefore, in order to assess P3's quality in solving problems with efficient fitness functions, we provide data on both its algorithmic complexity and wall clock time.

5.3.1 Operation Counting

When discussing the asymptotic complexity of P3 in Chapter 3, two aspects eluded precise analysis: how expensive is model rebuilding and how many gene donations are made. Figure 5.3 provides some insight into how often these two aspects of the algorithm are utilized.

Figure 5.3a reports in an algorithmic sense how expensive model rebuilding is for search.

Figure 5.3: Estimated computation costs incurred by model rebuilding (Figure 5.3a) and repeated donations (Figure 5.3b) per evaluation as problem size increases. (a) Rebuilds; (b) Donations.
In order to calculate this value we recorded how many times search rebuilt the model during each run. Figure 5.3a shows the estimated ratio of model rebuilding cost (N² per rebuild) over evaluation cost (N per evaluation). If the cost of model building scaled linearly with evaluations, the relationship plotted for each problem should be asymptotically constant. For Nearest Neighbor NK, Ising Spin Glasses, and Rastrigin this is the case. For both Trap problems and HIFF there is slow growth in the ratio. The problem sizes used for MAX-SAT were not sufficient to accurately gauge the asymptotic behavior. Together this suggests that while the cost of building the model is almost linear per evaluation, it can grow slowly. However, even in the worst case (HIFF) this growth was no more than twice the algorithmic cost of an evaluation even using 2048 bits.

When applying a crossover subset, P3 tries random donors from the population until one is found with at least one bit different from the improving solution. In theory this can result in up to O(μ) operations. Figure 5.3b examines the observed average number of donations per evaluation performed. Ising Spin Glass, HIFF, and Rastrigin all achieve effectively constant behavior here, implying repeated donation does not impact the asymptotic runtime of P3. Both Trap functions and Nearest Neighbor NK increase in number of donations as problem size increases, potentially increasing algorithmic costs. An important note is that each donation may range in size from a single bit up to N − 1. However, repeated donation attempts are far more likely to happen with smaller clusters. As such this may cause some super-linear growth in P3, but it is unlikely to be very high.
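The ratios plotted in Figure 5.3 can be estimated from simple run counters; the counter names here are illustrative, not P3's actual instrumentation.

```python
def model_cost_per_evaluation(rebuilds, evaluations, n):
    """Estimated model-rebuild cost relative to evaluation cost (Fig. 5.3a).

    Each rebuild is charged N^2 operations and each evaluation N operations,
    so an asymptotically flat curve means modeling stays linear per evaluation.
    """
    return (rebuilds * n * n) / (evaluations * n)


def donations_per_evaluation(donation_attempts, evaluations):
    """Average donor attempts per evaluation performed (Fig. 5.3b)."""
    return donation_attempts / evaluations
```

For example, a run with 10 rebuilds and 1,000 evaluations at N = 100 charges exactly one evaluation's worth of modeling work per evaluation.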
5.3.2.1ModelBuilding hBOAandParameter-lesshBOAperformmuchworsewhenusingwall clocktimeas theunitofcomparisonthanwhenusingevaluations.Thismakessens eashBOA'smodel buildingrequires( N 2 )timeperevaluationwhile,underreasonableassumptions,P3and LTGArequire O ( N )timeperevaluation.ThispenaltyismostclearonIsingSpinGlass wherehBOAgoesfrombeingslightlymoreecientthanLTGAintermso fevaluationsto threeordersofmagnitudeworseintermsofseconds.AsP3andLT GArequireasimilar asymptoticcomplexityperevaluationastheHillClimberand(1+( ; )),nosimilarchange inorderingoccurs. 5.3.2.2P3 WhenLTGAisoptimallytunedtoasingle-instanceproblemwithanecien tevaluation functionitcanndtheglobaloptimumfasterthanP3intermsofwall clocktime.However, onrandomlygeneratedproblemclassesP3'secientuseofevaluatio nsisenoughtoovertake LTGA. 32 10-2 100102104101.5 102102.5 Deceptive Trap 10-2 100102101.5 102102.5 Deceptive Step Trap 10-3 10-2 10-1 100101102101.5 102102.5 103HIFF10-4 10-2 100102101101.5 102102.5 Rastrigin 10-2 100102104101101.5 102102.5 Nearest Neighbor NK10-4 10-2 100102104101.5 102102.5 Ising Spin Glass10-4 10-2 100102101101.2 101.4 101.6 101.8 MAX-SAT Optimizer Hill Climber 1+(Lambda,Lambda) hBOA Parameter-less hBOA LTGA P3Problem Size Median Seconds to SuccessFigure5.4:Comparisonofthemediannumberofsecondstoreachth eglobaloptimumfor thesixderentoptimizationmethodswithrespecttoproblemsize.I fthemedianrundid notreachtheglobaloptimumnodataelementisshown.Resultsgiven onalog-logscale. 
On the four single-instance problems LTGA not only finds the global optimum using less wall clock time, the factor of speedup increases as problem length does. Naively this suggests LTGA is achieving a lower order of complexity. However, for these experiments LTGA is growing at sub-linear time per evaluation, which is not asymptotically stable due to (at minimum) the time required to perform an evaluation. We suspect that the true cause is that N is small enough to be overshadowed by lower order polynomial terms. For example, LTGA requires O(N/μ) time per evaluation to rebuild the linkage model from the frequency table. As a result, for small μ, model building, and not extracting pairwise frequency, can dominate runtime.

When applied to randomly generated problem classes, the differences in P3 and LTGA's evaluation complexity dominate runtime complexity. Similar to Figure 5.1, the amount of speedup P3 achieves over LTGA increases with problem size on Nearest Neighbor NK, Ising Spin Glasses, and MAX-SAT.

Across both types of problems we find that P3's time per evaluation grows approximately linearly. As such, we conclude that P3 requires asymptotically similar amounts of time per evaluation as the other efficient techniques.

Chapter 6
Internal Workings

6.1 Population Sizing

A major advantage of P3 is that it does not require the user to set a population size parameter. Beyond making P3 easier to apply, this also conveys two additional advantages: diversity scaled to initialization and no need to sacrifice intermediate fitness for eventual optimality.

Figure 6.1a shows how the number of total solutions stored in the pyramid changes as problem size increases, similar to Figure 2.4 for hBOA and LTGA's tuned population sizes. As expected, the number of concurrently stored solutions increases as problem difficulty increases, with the exact behavior dependent on the problem landscape. Figure 6.1b examines how the number of solutions stored is distributed on the largest problem sizes. Here we see that the behavior depends on the type of problem. On single-instance problems the variance in P3's stored solution count is relatively low, and the count is generally higher than optimally tuned LTGA's population size.
On randomly generated problem classes P3 has a much higher variance in stored solutions, but in general requires smaller sizes than LTGA.

6.1.1 Problem Instance versus Problem Class

Our procedure for tuning LTGA and hBOA outlined in Section 2.6 involved finding the optimal population size for each class of problem. For real-world black-box optimization this is realistically the best either algorithm could hope for, as tuning to a problem instance or population initialization involves repeatedly solving the problem being tuned. This limitation does not exist in parameter-less methods, which scale their diversity based on the problem instance without needing to solve that instance repeatedly.

Figure 6.1: The total number of solutions stored by P3 when the global optimum is found. (a) As problem size increases; (b) largest problem size. In Figure 6.1b the red "+" indicates LTGA's tuned population size.

To achieve high success rates on randomly generated problem classes, LTGA and hBOA must use a population which is large enough to solve the hardest instances in that class.
Therefore these methods will have population sizes larger than necessary to solve easier instances in the class. Even on single-instance problems, both methods will require population sizes large enough to ensure the worst random initialization is diverse enough to solve the problem, which may be much larger than the best random initialization.

Figure 6.2 highlights how this can affect the required number of evaluations to reach the global optimum, showing the distribution of results when solving the largest size of each problem. On each problem except Ising Spin Glass and MAX-SAT, LTGA has a much smaller difference between its best and worst runs. This makes sense as LTGA uses the same population size regardless of instance and because its search progresses generationally. In contrast P3 has a much higher split, with many runs finishing very quickly. On all problems except Deceptive Step Trap, P3's upper quartile is lower than LTGA's lower quartile. Furthermore, on Deceptive Trap, HIFF, and Ising Spin Glasses, P3's worst run is better than LTGA's best run. For Nearest Neighbor NK, most of P3's runs finish much faster than the fastest LTGA runs. However, some of P3's outliers take approximately as long as LTGA's tuned performance. This supports the hypothesis that P3 is able to scale its diversity not just to the problem class, but to the problem instance or even problem initialization, something wholly infeasible for tuned population sizing to do.

Figure 6.2: Distribution of evaluations required to reach the global optimum for P3 and LTGA on the largest size of each problem.
This tuning distinction is also apparent when comparing Parameter-less hBOA with hBOA in Figure 5.1. While generally performing worse than hBOA, the difference between the two algorithms is smallest on randomly generated problem classes. On MAX-SAT, Parameter-less hBOA actually outperformed both hBOA and LTGA, likely due to its ability to scale diversity to the problem instance instead of the entire problem class.

6.1.2 Fast versus Optimal

In Section 5.2 we examined intermediate fitness qualities of LTGA and hBOA when using population sizes tuned to reach the global optimum. As a result, both were exceptionally ineffective at quickly reaching high-quality solutions. This is because unlike P3, these methods have an explicit tradeoff between optimal performance and intermediate performance caused by their population size parameter.

Figure 6.3 examines the effect of population size on LTGA's intermediate fitness by reducing LTGA's population size to one tenth of the tuned value. The two problems shown are representative of the behavior of using a smaller population size on the other five problems. Reducing the population size caused LTGA to improve earlier but plateau at lower fitnesses. This caused LTGA's success rate to drop from 100 to 0 on Deceptive Step Trap and from 98 to 68 on Nearest Neighbor NK. Even when using a reduced population size for LTGA, P3 still achieved a fitness at least as high as LTGA at 80% of intervals. The likelihood that P3 would achieve these pairwise results assuming its median result is actually worse is p < 10^−15 according to the binomial test.

Figure 6.3: Comparison of how reducing LTGA's population size affects the median best fitness reached during search. (a) Deceptive Step Trap 203; (b) Nearest Neighbor NK 200.

6.2 Inner Workings Specific to P3

While analysis of optimization speed is useful from a practitioner standpoint, doing so provides little insight into algorithm behavior. To better understand how P3 works in detail we present here a look at some internal features specific to P3.
6.2.1 Crossover

Figure 6.4a shows the proportion of evaluations P3 spends on crossover, as opposed to hill climbing, and Figure 6.4b shows what percentage of crossover evaluations resulted in a fitness improvement. Together these figures provide some insight into the role of crossover within P3. The behavior for each is clearly problem dependent and generally asymptotically stable as problem size increases.

Figure 6.4: For each problem, Figure 6.4a shows the proportion of P3 evaluations spent on crossover and Figure 6.4b shows the percentage of fitness-improving crossover evaluations. (a) Crossover Proportion; (b) Crossover Success.

When solving problems where epistasis can be effectively detected and represented by a linkage tree, P3 tends to spend fewer evaluations performing crossover and each crossover is more likely to be successful. Deceptive Trap and Rastrigin are the easiest problems in which to capture epistasis, with local search quickly reducing pairwise entropy in each. These are also the problems where P3 uses the fewest evaluations on crossover and has the highest success rates for crossover. At the other extreme are Nearest Neighbor NK and Ising Spin Glasses, which both have overlapping linkage not representable by a linkage tree. These problems have the highest crossover usage and lowest crossover success of any problem except Deceptive Step Trap. While Deceptive Step Trap's epistasis can be accurately modeled by a linkage tree, the exponential number of plateaus makes detecting gene linkage very challenging.
P3's crossover success rates are lower than LTGA's, but both produce the same ordering of the problems when ranked by crossover success, with Nearest Neighbor NK the least successful and Deceptive Trap the most successful problem. Counterintuitively, the use of hill climbing on the initial population reduces P3's crossover success, not because it reduces model quality or the donation pool, but because it is much more challenging to improve locally optimal solutions than randomly generated ones. LTGA's crossover benefits from application to unoptimized solutions, which makes its aggregate crossover success incomparable to P3's.

Even when crossover success rates are quite low, such as Nearest Neighbor NK's 0.007% success, the results from Section 5.1 and Section 5.2 show how critical this small percentage is to optimization. Without crossover, P3's performance would be identical to the Random Restart Hill Climber, which was unable to solve even moderately-sized problems and quickly fell behind P3 in intermediate fitness quality. Therefore even infrequently successful crossover donations are critical to success. This does, however, suggest a potential avenue for future improvement by using more successful modeling and donation algorithms.

6.2.2 Pyramid

Another feature unique to P3 is the shape and size of the population pyramid constructed for each problem. Figure 6.5a shows the number of solutions stored at each level of the pyramid for the largest tested problem sizes. Each point is the median size across 100 runs. If a run did not store any solutions at a level it is treated as 0. No point is drawn if the median run had 0 solutions stored at that level. While pyramid size is affected by problem size, the overall shape is not. As such the behavior shown in Figure 6.5a is representative of that for all other tested problem sizes.
With the exception of the dip in Deceptive Step Trap, all of the pyramids show a monotonic reduction in size as the level increases. This is because a solution must be a strict fitness improvement over its previous version to be added to a higher level, which becomes less likely each time the solution improves.

Figure 6.5: For each problem, Figure 6.5a shows the number of solutions stored in each level of the pyramid and Figure 6.5b shows the percentage of fitness-improving crossover evaluations at each level. (a) Population Size; (b) Crossover Success.

[20] found theoretical evidence and [11] found empirical evidence that the optimal population size decreases each generation of traditional evolutionary search. By decreasing in size, P3 implicitly stores more diversity in low levels and focuses search around high-quality solutions at higher levels. In comparison, LTGA and hBOA suboptimally use a fixed population size at each generation.

Figure 6.5b examines how crossover success changes at different levels of the pyramid. At low levels, success gets progressively lower as solution quality increases faster than the model's ability to improve solutions. At higher levels modeling becomes more accurate and donations contain higher frequencies of high-quality building blocks, resulting in increased crossover success. The highest level of most problems has a low crossover success rate, as solutions crossing with that level have already been improved by previous operations to the point where the only improvement would be to create the global optimum, which can only happen once.
6.2.2.1 Deceptive Step Trap

The number of solutions stored at the fourth level of Deceptive Step Trap is significantly lower than that of the third or fifth levels, breaking the decreasing trend of the other six problems. Figure 6.5b has a similar aberration, with crossover success dropping to 0.0004% before rebounding and following the more common trajectory. This behavior exists in all other problem sizes tested, with the dips occurring at exactly the same level.

This behavior is rooted in the peculiar nature of this landscape. After local search, all traps in all solutions have a total number of 1 bits equal to either 0, 1, 3, 5, or 7, as these correspond to the local optima when using k = 7 and s = 2. Crossover easily overcomes the two-bit plateaus, and as a result solutions in the second level generally do not contain the lowest fitness local optima (5 bits set) and the third level has few traps set to the next worst local optima (3 bits set). As a result, solutions that reach the third level can only be improved by replacing the deceptive local optima (0 and 1 bits set) with the global optimum (7 bits set). The global optimum is rare in the population, and with 8 ways to represent local optima, linkage learning is inaccurate. Therefore it is unlikely for crossover to be successful, meaning few solutions will be added to the fourth level. Solutions that do improve by definition must have a higher frequency of optimal trap settings, meaning level four's model will be more accurate and donations are more likely to contain optimal trap values. Thus the level size and crossover success rates increase after contracting around level four.

Chapter 7

New Problem Domain: Gray-Box Optimization

All of the work presented so far has focused on the black-box optimization domain. These problems are characterized by a complete lack of available problem-specific information. A black-box algorithm can only propose a solution and measure the quality of that solution, using only that information to inform its search. Optimizers that are successful in this domain generalize well as there are minimal requirements to apply them to a new problem.
However, for many real-world applications there is potentially more information available. The other end of the spectrum is white-box optimization, in which the algorithm knows the exact problem class it is trying to solve. This domain is dominated by problem-specific search heuristics [6, 32, 39] that leverage all aspects of the problem to achieve efficiency. This makes such algorithms very specific, such that outside of their target domain they either cannot be applied or their application has no guarantee of search quality. As such, each must also be designed by hand for each new problem class, requiring deep understanding of the problem and the time to develop the algorithm.

In between those two domains there is another domain: gray-box optimization. In this domain some features general to multiple problem classes are exploited, beyond just an evaluation function. The goal is to create optimization methods that benefit from these features without those methods becoming specialized to a small set of problem classes. In addition to the black-box evaluation function, we add in two more exploitable features accessible to search algorithms: known variable epistasis and partial function evaluation.

7.1 Problems in this Domain

To understand the types of problems in this domain, let us first examine a simple artificial problem we'll call 3-Equal. The quality of a solution is determined by how often three consecutive bits are set to the same value. As a result, the maximum fitness is N (all zero or all one) and the minimum fitness is 0.

The 3-Equal problem has both known epistasis and partial function evaluation. From the definition we know that each bit is epistatically linked to the two variables that precede it and the two that follow it. There are no other non-linear relationships with that bit. This also means it is possible to evaluate the fitness contribution of that bit knowing only the values of at most four others, regardless of the size of N. As we'll discuss in Section 7.3 this has enormous implications on search efficiency.
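As an illustration, here is a minimal Python sketch of 3-Equal. The circular wrapping of the three-bit windows is an assumption made for concreteness (it is what gives exactly N windows and hence a maximum fitness of N); the list-of-bits representation is likewise chosen only for this sketch.

```python
def full_fitness(bits):
    # 3-Equal: one point per window of three consecutive equal bits.
    # Windows are assumed to wrap circularly, giving exactly N windows
    # and the maximum fitness of N described in the text.
    n = len(bits)
    return sum(bits[i] == bits[(i + 1) % n] == bits[(i + 2) % n]
               for i in range(n))

def delta_for_flip(bits, i):
    # Fitness change from flipping bit i, touching only the three
    # windows that contain bit i -- O(1) work regardless of N.
    n = len(bits)
    starts = [(i - 2) % n, (i - 1) % n, i]

    def window(b, s):
        return b[s] == b[(s + 1) % n] == b[(s + 2) % n]

    flipped = list(bits)
    flipped[i] ^= 1
    return (sum(window(flipped, s) for s in starts)
            - sum(window(bits, s) for s in starts))
```

`delta_for_flip` demonstrates partial function evaluation: only the at most three windows containing bit i need to be rescored, never the whole string.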
Every problem discussed in Chapter 4 except HIFF fits into this domain. Deceptive Trap and Deceptive Step Trap both can have known epistasis (which bits are in which traps) and partial reevaluation (score a single trap), the same as Rastrigin (separable). All NK problems where K is much less than N have a knowable epistasis table, and the fitness contribution of each subfunction is calculable without evaluating the entire string. Similarly, MAX-SAT's clause list specifies epistasis, with each clause independently evaluable. Ising Spin Glasses are even simpler, with each edge in the graph evaluable using only two problem variables.

Beyond MAX-SAT and Ising Spin Glasses, which are interesting real-world problems unto themselves, many NP-Hard real-world problems can be expressed in these two requirements. This is especially true of graph problems.

Dominating Set: Find a minimum vertex set such that all vertices are either in the set or adjacent to something in the set. The fitness of a vertex is determined by whether it is in the set (add to set size) and whether it is dominated (add to undominated size). Epistasis is the adjacency set for each vertex, plus itself. Used in wireless sensor networks [40].

Vertex Cover: Find a minimum vertex set such that all edges are incident on at least one vertex in the set. Epistasis and fitness are calculated using each vertex (add to set size) and each edge (add to uncovered size), independently. Used in network security [28].

Max-Cut: Find the set of vertices that maximizes the number of edges incident on exactly one vertex in the set. Epistasis and fitness are calculated using each edge independently. Used in VLSI design [7].

Set Cover: Given a universe of elements and a set of subsets of that universe, find a minimum set of subsets whose union recreates the entire universe. Each element in the universe is epistatically linked with the subsets that contain that element (add to uncovered size), and each subset also contributes directly to fitness (add to set size). Used in computational biology [21].
Zero-One Linear Programming: Given a set of constraints, maximize an objective function, all of which are linear combinations of variables. Each constraint creates an epistatic relationship between the variables in that constraint, and fitness is a linear combination of variables allowing for partial evaluation. Used in power systems [1].

7.2 Formal Requirements

In order to draw conclusions about search efficiency, it is necessary to make the features of the target problem domain more explicit.

The overall quality of a solution must be equal to the sum of applying all subfunctions to that solution, where a subfunction can be any mapping from a subset of problem variables to a measure of quality. As a consequence each subfunction must be independently evaluable. The mapping of variables to subfunctions (epistasis) must also be known, and the maximum number of variables participating in each subfunction is represented by the variable k. To achieve maximum efficiency k should be constant with regard to problem size. The cost of evaluating a subfunction should also be bounded by some function b(k). For example, on MAX-SAT k is the clause size, with b(k) linear in clause size, and the fitness equal to the sum of all clause terms. On NK, k = K + 1 as each subfunction in NK depends on a variable and K others.

As the mapping of variables to subfunctions is known, it is also possible to calculate c, which is the maximum number of subfunctions any variable participates in. Algorithms are most efficient when c is constant with respect to problem size. For example, for all Nearest Neighbor NK landscapes all variables participate in exactly c = k = K + 1 subfunctions.
Randomly generated MAX-SAT instances have no guarantee that c is constant, but the expected number of subfunctions in which a variable appears is equal to the clause-to-variable ratio times the clause size, or 12.81. Assuming that all subfunctions use k variables and each variable appears in c subfunctions provides a bound of cN/k on the total number of subfunctions.

7.3 Efficient Local Search

Section 2.1 presented an efficient method for performing local search for the black-box domain. Each potential bit flip, referred to here as a move, requires the entire solution to be reevaluated, taking Θ(N) time. Furthermore, each time a fitness-improving move is made all previously tested moves must be tested again. As a result, improving a random solution to be a local optimum using this algorithm requires between Ω(N^2) and O(IN^2), where I is the number of improving moves. The lower bound is achieved when the while loop in Figure 2.1 runs a constant number of times. For instance, on OneMax, Deceptive Trap, Deceptive Step Trap, and HIFF all possible single bit improvements are found during the first pass through the loop. The worst case is when each loop is expected to only make a single improving move, which causes all N moves to be reevaluated once per improvement, such as on the Leading Ones problem.

1: procedure InitializeDelta(solution)
2:   fitness ← 0
3:   ∀m, delta[m] ← 0
4:   for all s ∈ subfunctions do
5:     pre_move ← f_s(solution)
6:     fitness ← fitness + pre_move
7:     for all m ∈ affected_moves(s) do
8:       post_move ← f_s(solution ⊕ m)
9:       delta[m] ← delta[m] + (post_move − pre_move)
10:    end for
11:  end for
12: end procedure

Figure 7.1: Algorithm used to efficiently determine the change in fitness associated with each potential move from a given solution.
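Figure 7.1 can be transcribed into Python as follows. The representation of a subfunction as a pair of (variable indices, callable) and the assumption that the moves affected by a subfunction are exactly the variables it reads are choices made here for illustration; they stand in for the dissertation's f_s and known epistasis.

```python
def initialize_delta(solution, subfunctions):
    # Compute the solution's fitness and, for every single-bit move m,
    # delta[m]: the fitness change that flipping bit m would cause.
    # Each subfunction is an (indices, func) pair: an illustrative
    # stand-in for f_s and the known variable-to-subfunction mapping.
    fitness = 0
    delta = [0] * len(solution)
    for indices, func in subfunctions:
        pre_move = func(tuple(solution[i] for i in indices))
        fitness += pre_move
        for m in indices:  # affected_moves(s): the variables s reads
            post_move = func(tuple(solution[i] ^ (1 if i == m else 0)
                                   for i in indices))
            delta[m] += post_move - pre_move
    return fitness, delta
```

Each subfunction is evaluated once for the fitness total plus once per variable it reads, matching the "k + 1 times" accounting in the text.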
Local search in the gray-box domain can be significantly more efficient [37]. This is due to two consequences of the domain: the fitness effect of making a move can be calculated in O(1) time, not O(N), and the number of moves that must be reevaluated per improvement is O(1), not O(N). This results in local search requiring O(N + I) time. If I is within a constant factor of N, this means generating random local optima is at most a constant factor slower than generating random solutions in the search space.

To achieve this performance, this local search technique begins by determining the change in fitness caused by making each potential move m, denoted as delta[m]. Figure 7.1 calculates the delta for each m starting at a given solution, as well as the fitness of the solution. Here, f_s evaluates the subfunction s on the given solution, and solution ⊕ m is the result of making move m on solution. For each of the cN/k subfunctions, InitializeDelta must determine the fitness of that subfunction before and after making each of the k moves that overlap that subfunction. Combined, this results in less than 2cN·b(k) operations, which is O(N) assuming c and b(k) do not grow with N. Also, this procedure calls f_s only k + 1 times more than is required to find the fitness of the solution itself.

When performing hill climbing, only moves m such that delta[m] > 0 are fitness-improving moves. Initially all moves are considered open, meaning that they could potentially be fitness improvements. During each iteration a random move m is chosen from open, and delta[m] is checked. If m is a fitness-improving move, MakeMove(m), shown in Figure 7.2, is called.

1: procedure MakeMove(m)
2:   fitness ← fitness + delta[m]
3:   for all s ∈ affected_subfunctions(m) do
4:     pre_both ← f_s(solution)
5:     just_m ← f_s(solution ⊕ m)
6:     for all m′ ∈ affected_moves(s) do
7:       just_m′ ← f_s(solution ⊕ m′)
8:       post_both ← f_s(solution ⊕ m ⊕ m′)
9:       delta[m′] ← delta[m′] − (just_m′ − pre_both) + (post_both − just_m)
10:      open(m′)
11:    end for
12:  end for
13:  close(m)
14:  solution ← solution ⊕ m
15: end procedure

Figure 7.2: Algorithm for updating stored information related to a solution when making a move.
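The update in Figure 7.2 can be sketched in Python as below. The (indices, callable) subfunction representation and the precomputed `subs_of` mapping (variable to the subfunctions that read it) are illustrative assumptions; note that when m′ = m the update rule itself negates delta[m], which is exactly the post-flip value.

```python
def make_move(m, solution, subfunctions, subs_of, delta, open_moves):
    # Flip bit m, incrementally updating every affected delta value.
    # subs_of[v] lists the subfunctions reading variable v, precomputed
    # from the known epistasis, so the work here is independent of N.
    gain = delta[m]
    for s in subs_of[m]:
        indices, func = subfunctions[s]
        vals = {i: solution[i] for i in indices}
        pre_both = func(tuple(vals[i] for i in indices))
        just_m = func(tuple(vals[i] ^ (i == m) for i in indices))
        for m2 in indices:  # includes m itself, which negates delta[m]
            just_m2 = func(tuple(vals[i] ^ (i == m2) for i in indices))
            post_both = func(tuple(vals[i] ^ ((i == m) ^ (i == m2))
                                   for i in indices))
            delta[m2] += (post_both - just_m) - (just_m2 - pre_both)
            open_moves.add(m2)
    open_moves.discard(m)  # close(m): m was just flipped
    solution[m] ^= 1
    return gain
```

Only moves sharing a subfunction with m are touched, so for constant c, k, and b(k) the cost per move is O(1) as the text argues.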
MakeMove updates the fitness and delta values to reflect the change in the solution. The fitness of the solution after making the move does not require any calls to f_s as delta[m] already stores the change in fitness. This process requires updating all of the delta values for the k moves that interact with each of the c subfunctions affected by m. This update replaces outdated information that used the original solution (just_m′ − pre_both) with how much the move improves over the new solution (post_both − just_m). As delta[m′] has updated, m′ could potentially become a fitness-improving move and is therefore added into open. As m was just flipped, it cannot be a fitness improvement and is therefore removed from open. In total this requires less than 4ck·b(k) time, which is O(1) assuming c, k, and b(k) do not scale with N.

Each time an improving move is found at most ck additional moves are added to open. If open ever becomes empty a local optimum has been reached. Therefore the number of times delta[m] is checked is no more than N + Ick. As a result the total cost of improving a random solution until it reaches a local optimum is only O(N + I).

As an interesting addition, this method can actually be used to efficiently perform approximate steepest ascent hill climbing [37]. Instead of choosing moves from open at random, the moves are binned using their delta values. Moves are then chosen randomly from the highest quality non-empty bin, and changes in delta can cause moves to change bins. Assuming the total number of bins does not increase with N, both steps can be done in O(1) time. However, at least for MAX-SAT, there is evidence that when used in combination with subsequent search heuristics the less greedy, first-improvement algorithm was more effective.

7.4 Efficient Hamming Ball Search

Another consequence of the gray-box domain is that increasing the hamming-ball radius of local search becomes more tractable [4]. A hamming ball is the collection of all solutions within a given hamming distance, or radius, from a given solution. Instead of improving solutions until no single bit flip is a fitness improvement, the Hamming-Ball Hill Climber (HBHC) finds solutions which cannot be improved by flipping r or fewer bits.
In a black-box setting, verifying that no r-bit flip can improve a solution requires testing all (n choose r) neighbors. This quickly becomes intractable as r increases. However, in the gray-box domain not all combinations need to be tested. Consider that if two variables do not participate in the same subfunction, the relationship between their effects is, by definition, additive. As such there is no way for flipping both together to be a fitness improvement without flipping one of them being a fitness improvement. Therefore it is not necessary to try all possible r-sized subsets of the solution.

Consider the 3-Equal problem. If there are at least two loci between x_i and x_j then they do not share a common subfunction. Therefore, the change in fitness resulting from flipping both is equal to delta[x_i] + delta[x_j]. If delta[x_i] ≤ 0 and delta[x_j] ≤ 0 then delta[x_i, x_j] ≤ 0. Now consider a solution such that the first half is set to 0 and the second half is set to 1. Even though c = 3 there is no way to improve this solution without simultaneously flipping all 0's to 1's or vice versa. Any smaller flip will not be a fitness improvement.

1: procedure ConnectedInducedSubgraphs
2:   closed ← ∅
3:   found ← []
4:   for all v ∈ V do
5:     closed ← closed ∪ {v}
6:     found ← found + CISG(v, ∅, closed, ∅)
7:   end for
8:   return found
9: end procedure
10: procedure CISG(v, subset, closed, open)
11:   subset′ ← subset ∪ {v}
12:   found ← [subset′]
13:   if |subset′| ≥ r then return found
14:   end if
15:   closed_here ← ∅
16:   open′ ← open ∪ adjacent(v)
17:   for all v′ ∈ open′ such that v′ ∉ closed do
18:     closed_here ← closed_here ∪ {v′}
19:     closed ← closed ∪ {v′}
20:     recurse ← CISG(v′, subset′, closed, open′)
21:     found ← found + recurse
22:   end for
23:   closed ← closed − closed_here
24:   return found
25: end procedure

Figure 7.3: Algorithm to recursively find all connected induced subgraphs of size r or fewer.
To determine which of the (n choose r) flips must be checked, consider a graph where each vertex is a variable in the solution. An edge exists between two vertices if and only if the corresponding variables participate in at least one subfunction together. Restated, there is only an edge if the two variables have a direct, non-linear relationship. The maximum degree of a vertex in this graph is c(k − 1), making it sparse for sufficiently large N. If a subset of vertices is connected then it is possible that flipping all of those variables together will result in a fitness improvement even when flipping any subset of the subset will not. However, if the subset is not connected then each component of the subset can be tested independently.

In order to determine which moves in the hamming ball must be evaluated, we developed ConnectedInducedSubgraphs, given in Figure 7.3. CISG is a recursive helper function that finds all subgraphs that contain a given subset and a given vertex v, while excluding any other vertices added to closed. To find all subgraphs, CISG is called once for each vertex in the graph, such that closed contains all previously searched vertices, and subset = open = ∅. In the initial call all desired subgraphs that contain v are found, which is why v remains in closed to prevent duplicate subgraphs from being returned.

At each recursive level CISG expands open to include any vertices adjacent to v in the graph. By construction this means that open′ contains all possible ways of adding a single vertex to the current subset′. As each v′ is tested it is temporarily added to closed to prevent recursive calls from adding it again to subset′.

When applied to the sparse graphs inherent in the gray-box domain, this algorithm requires O(r!(ck)^r N) time, which reduces to O(N). The time spent in each call is dominated by the loop over open′. In the worst case, open′ increases in size by the full adjacency of v, which is bounded by ck. This creates a worst-case complexity for a single top-level call of ∏_{i=1}^{r} i·ck = r!(ck)^r. This must be called once for each of the N variables, resulting in O(r!(ck)^r N).
As this algorithm finds all connected subsets in O(N) time for a fixed r, the number of moves that must be tested to determine if a solution is an r-bit local optimum is O(N). This means that while on Nearest Neighbor NK with N = 8,000, K = 5, and r = 3 the black-box method would require 85 billion checks, the gray-box method requires only 248,000. Even when allowing connections to be completely random, which results in c increasing with N, the gray-box method still only requires approximately 375 million checks, two-and-a-half orders of magnitude less than assuming a black box.

From these conclusions it is possible to modify the hill climber presented in Section 7.3 to efficiently find r-bit local optima [4]. The only change is that instead of having a move and delta for each bit, there must be a move and delta for each connected induced subgraph of r or fewer bits. The efficiency analysis is unchanged with the exception that the constant increases exponentially with r as the number of delta values that must be updated each move increases. Still, with c, r, and k constant with respect to N, it is possible to find r-bit hamming ball local optima in O(N + I) time.

7.5 Tournament Uniform Crossover: TUX

Hamming-Ball Hill Climbing is not sufficient to efficiently find the global optimum on problems with even moderate epistasis [4]. This is because, like all random restart hill climbers, it relies on random initialization to fall inside the global optimum's basin of attraction.

1: procedure Iterate-TUX
2:   Create random solution
3:   Hamming-Ball Hill Climb solution
4:   for all T_i ∈ T do
5:     if T_i is empty then
6:       T_i ← solution
7:       return
8:     end if
9:     Cross solution with T_i to create 2^(i+1) offspring
10:    Hamming-Ball Hill Climb each offspring
11:    solution ← best of offspring, solution, and T_i
12:    T_i ← empty
13:  end for
14:  Add solution to end of T
15: end procedure

Figure 7.4: One iteration of TUX optimization. T is an ordered list of solutions, each position of which could be empty, awaiting a crossover partner.
To remedy this limitation, we set out to develop a minimally complex memetic algorithm to help increase this probability. Figure 7.4 presents the Tournament Uniform Crossover (TUX) algorithm, which combines simplistic selection with equal probability uniform crossover, the most basic unbiased crossover, to generate starting solutions likely to be in the global optimum's basin of attraction.

Conceptually TUX iteratively builds a structure similar to a single-elimination bracket for solutions. Each "match" in the tournament takes in two candidate solutions and produces offspring via uniform crossover, applying hill climbing to each offspring, with the "winner" being the best of all those solutions. The tournament "bracket" is constructed iteratively, storing an initially empty list of solutions T, such that |T| is equal to the height of the tournament. Iterative construction is possible because, when a sub-bracket is complete, only a single solution emerges and solutions only need to be stored until their partner is found. TUX is fully elitist but does not prematurely converge. This is because search continuously integrates new randomly generated solutions through other parts of the bracket. Whenever the top of the current bracket is reached, TUX doubles the size of the virtual tournament.

When crossing solutions at T_i, TUX produces 2^(i+1) offspring. This relationship ensures that in total all levels of the tournament, including random initialization, perform the same number of hill climbing steps. It also shifts the focus of search toward areas expected to be of higher fitness. The expectation is also that it becomes progressively harder to improve solutions the higher up the tournament you advance, so more attempts are necessary to create new useful solutions.

The primary advantages of TUX are that it does not introduce any new parameters (though it still requires an r for the hill climber) and is relatively simple to implement. Even so it allows for learning from previous local optima and, as Chapter 9 will show, it is quite effective at optimization.
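One iteration of Figure 7.4 can be sketched in Python as follows. The `fitness` and `hill_climb` parameters stand in for the problem evaluation and the Hamming-Ball Hill Climber, which are assumed to be supplied by the caller; `None` marks an empty bracket position.

```python
import random

def uniform_crossover(a, b):
    # Equal-probability uniform crossover: each bit from either parent.
    return [x if random.random() < 0.5 else y for x, y in zip(a, b)]

def iterate_tux(T, n, fitness, hill_climb):
    # One TUX iteration (Figure 7.4). T is the bracket: T[i] holds the
    # winner of a sub-bracket of height i, or None while awaiting a
    # crossover partner.
    solution = hill_climb([random.randint(0, 1) for _ in range(n)])
    for i in range(len(T)):
        if T[i] is None:
            T[i] = solution
            return
        offspring = [hill_climb(uniform_crossover(solution, T[i]))
                     for _ in range(2 ** (i + 1))]
        # Fully elitist: the winner is the best of offspring and parents.
        solution = max(offspring + [solution, T[i]], key=fitness)
        T[i] = None
    T.append(solution)  # top of bracket reached: the tournament doubles
```

The bracket behaves like a binary counter: each call either fills an empty slot or carries a winner upward, so solutions are stored only until their partner arrives.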
Chapter 8

Gray-Box P3

P3 represents a natural method for integrating the HBHC into a global optimization algorithm as P3 already utilizes local search. While HBHC's inclusion adds a parameter, Section 9.1 and Section 9.2 provide evidence that r can be fixed to 1, preserving the parameter-less nature of P3. Beyond using HBHC, there are also a number of ways in which P3 can be made more efficient by leveraging the additional information available in the gray-box domain.

First and foremost, linkage learning is no longer necessary. By definition the direct non-linear relationships between variables are known. As a result, Gray-Box P3 does not need to store the pairwise frequency information, reducing its required memory from O(N^2) to O(N). Instead, we have developed a method for creating a linkage tree that learns clusters from the same dependency graph defined in Section 7.4. The goal is for each cluster to be a connected induced subgraph, with the size of clusters mirroring those produced by the agglomerative linkage learning process normally used with P3. To form a single cluster, a random graph search is performed from a random starting vertex until a desired number of unique vertices have been explored. Cluster sizes are set recursively. For each cluster of size l > 1 a cluster of size a and a cluster of size l − a are also created, with a chosen uniformly from the range [1..l − 1]. This recursive process begins by initializing l = N and splitting to form the first two clusters.

This linking algorithm has a number of useful properties. First, it creates exactly 2N − 2 clusters, distributed in size similarly to the black-box clustering algorithm. Performing a random graph search to find l unique vertices requires O(lck) time, meaning cluster creation is optimally efficient. The cluster splitting process has identical properties to random pivot quicksort, meaning the sum of cluster sizes is O(N log N) in the average case. This efficiency allows new clusters to be created before every mixing event, unlike Black-Box P3 where they are created only when new solutions are added to the population. For simplicity the clusters are shuffled after they are created, not sorted on size like in Black-Box P3.
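The two pieces of this clustering method can be sketched in Python as below. This is a minimal illustration, not the dissertation's implementation: the adjacency-dictionary graph and the frontier-list random search are representational assumptions.

```python
import random

def cluster_sizes(l):
    # Recursive cluster sizing: a cluster of size l > 1 spawns clusters
    # of size a and l - a, with a uniform in [1, l - 1]. Starting from
    # l = N this creates exactly 2N - 2 cluster sizes below the root,
    # the same count as a binary tree with N leaves minus its root.
    if l <= 1:
        return []
    a = random.randint(1, l - 1)
    return [a, l - a] + cluster_sizes(a) + cluster_sizes(l - a)

def random_cluster(adjacent, size):
    # Grow one cluster as a connected induced subgraph: a random graph
    # search from a random start until `size` unique vertices are found.
    start = random.choice(list(adjacent))
    cluster = {start}
    frontier = list(adjacent[start])
    while frontier and len(cluster) < size:
        v = frontier.pop(random.randrange(len(frontier)))
        if v in cluster:
            continue
        cluster.add(v)
        frontier.extend(adjacent[v])
    return cluster
```

Because every vertex enters the cluster through an edge from a vertex already inside, the result is connected by construction, matching the goal stated above.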
Beyond efficiency, there are good reasons to believe these clusters will be useful for search. The closer two variables are in the dependency graph, the more likely they are to appear in the same cluster. All variables on average are expected to appear in at least one cluster, but variables that are central in the graph will appear in more clusters than those on the periphery. The more paths of a given length between two variables, the higher the probability of them being in the same cluster. Unlike Black-Box P3, this linkage tree does not require clusters to be nested, allowing more diversity in the types of clusters appearing in a single tree.

In effect the clusters are sampling moves that the HBHC would make if r ≥ l. For any solution in the search space that is not globally optimal, there must exist some move that will improve its quality. However, this move may be arbitrarily large and it is intractable to test all possible moves of even moderate size. By sampling from all possible large moves, we can maintain tractability while gaining a potentially non-zero probability of improvement. As clusters are used to donate values between solutions, these moves are always in the direction of previously found high-quality solutions. This assumes that the density of high-quality solutions is higher than average between good solutions.

Another efficiency gain is that each time a donation is made, only the affected part of the solution's fitness needs to be recalculated. As a result, the number of subfunction evaluations required to determine the change in fitness is only O(l). This also allows Gray-Box P3 to efficiently reapply hill climbing after each donation, as only affected moves need to be rechecked. As a result, a donation and its resulting modifications from hill climbing are kept only if the new local optimum is at least as fit as the solution before the donation occurred. All combined, a single donation plus returning to a local optimum requires O(l + I) time, while just the donation in Black-Box P3 requires O(N).
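The partial reevaluation of a donation can be sketched as follows, reusing the illustrative (indices, callable) subfunction representation from the earlier sketches; `subs_of` is the assumed precomputed map from each variable to the subfunctions that read it.

```python
def donate(recipient, donor, cluster, subfunctions, subs_of):
    # Apply a crossover donation with partial reevaluation: copy the
    # donor's values for the variables in `cluster` and compute the
    # fitness change by rescoring only subfunctions that read a changed
    # variable -- O(l) subfunctions for a cluster of size l, not O(N).
    changed = [v for v in cluster if recipient[v] != donor[v]]
    affected = {s for v in changed for s in subs_of[v]}
    before = sum(subfunctions[s][1](
        tuple(recipient[i] for i in subfunctions[s][0])) for s in affected)
    for v in changed:
        recipient[v] = donor[v]
    after = sum(subfunctions[s][1](
        tuple(recipient[i] for i in subfunctions[s][0])) for s in affected)
    return after - before
```

The returned fitness change lets the caller decide whether to keep or revert the donation, mirroring the keep-if-at-least-as-fit rule described above.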
Chapter 9

Benchmarking Gray-Box P3

To compare these gray-box optimization techniques we have chosen NKq-Landscapes [4] and Ising Spin Glasses [30]. NKq-Landscapes create a collection of randomly generated problem instances given a higher-level problem class description. Each instance is described by a series of N subfunctions, each corresponding to a variable in the solution. This subfunction uses its variable and K other variables in the solution to calculate a fitness value. Fitness values are represented as a randomly generated lookup table, such that table entries are integers in the range [0..q − 1]. As each subfunction reads K + 1 variables, the table's size and q are set to 2^(K+1). The quality of a solution is equal to the sum of the values returned by these subfunctions.

In this work we consider two methods for choosing the K variables each subfunction depends on: Nearest Neighbor NKq and Unrestricted NKq. In Nearest Neighbor NKq each variable depends on the K variables that sequentially follow it in the solution, with dependencies wrapping around the end of the solution. Landscapes of this form can be solved in polynomial time [39], allowing comparisons of how quickly each optimization algorithm can find a global optimum. Nearest Neighbor NKq also ensures that c = k = K + 1 and that both c and k do not increase as N increases, meaning the efficiency conclusions made in Section 7.4 are applicable.

Unrestricted NKq landscapes draw the K dependencies at random without replacement. For K > 1 it is NP-Hard to find the global optimum of these landscapes. This also means that while k remains fixed, the maximum number of subfunctions a variable appears in (c) can increase as N increases. As a result, some of the efficiency claims in Section 7.4 may not be applicable.
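A minimal Python sketch of a Nearest Neighbor NKq instance follows; the big-endian lookup-table indexing is an assumption made for this sketch (the text specifies only a table of size 2^(K+1) with entries in [0, q − 1]).

```python
import random

def make_nearest_neighbor_nkq(n, K, q):
    # Build a Nearest Neighbor NKq instance: subfunction i reads
    # variable i and the K variables that follow it, wrapping around,
    # with fitness values drawn uniformly from [0, q - 1].
    tables = [[random.randrange(q) for _ in range(2 ** (K + 1))]
              for _ in range(n)]
    neighborhoods = [[(i + j) % n for j in range(K + 1)]
                     for i in range(n)]
    return neighborhoods, tables

def nkq_fitness(solution, neighborhoods, tables):
    # Sum of independent table lookups; each subfunction is evaluable
    # on its own, as the gray-box domain requires.
    total = 0
    for nb, table in zip(neighborhoods, tables):
        index = 0
        for v in nb:  # pack the K + 1 read bits into a table index
            index = (index << 1) | solution[v]
        total += table[index]
    return total
```

With this construction every variable appears in exactly c = K + 1 neighborhoods, matching the c = k = K + 1 property stated above.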
Ising Spin Glasses are a type of MAX-CUT problem relevant in statistical physics. Each spin glass encodes spins (vertices) and their relationships (edges) with the goal of assigning each spin a direction that minimizes relationship energy. Just as with NKq-Landscapes, Ising Spin Glasses as a whole are NP-Hard, but the 2D ±J subset is polynomially solvable [30].¹ In this subset the graph is a 2D toroidal grid, with edge weights of ±1. In gray-box terms, problems of this subset have k = 2 and c = 4 regardless of N.

For each problem class tested, we generated 50 instances. Extreme problem sizes were chosen to stress each algorithm. Each method was run once on each instance, and limited to 3 hours of computation and 4GB of memory. Runs were performed on 2.5GHz Intel Xeon E5-2670v2 processors using the C++11 code available from our website.² Each time the run achieved a new best fitness we recorded the current amount of processing time used. Timing includes the discovery of subgraphs to allow for comparison between different radius values. When reporting the "best" fitness for an instance we mean the best fitness found by any method before the time limit is reached. On all Nearest Neighbor NKq instances the "best" fitness is also the global optimum, verified using dynamic programming. For Ising Spin Glass the "best" fitness was the global optimum 44 out of 50 times.

All figures report the median, upper and lower quartiles for either percentage error or seconds to reach the best. A run's percentage error is equal to the difference between its fitness and the best, divided by the best. When reporting seconds to reach the best fitness, any run that did not find the best fitness is treated as slower than any run that did. If the median run was unsuccessful, no data point is drawn.
¹ http://www.informatik.uni-koeln.de/spinglass/
² https://github.com/brianwgoldman/GrayBoxOptimization/releases

Figure 9.1: Comparison of how radius affects solution quality at termination. For NKq-Landscapes N = 6,000 and K = 4 and for Ising Spin Glasses N = 6,084. Range of radius values limited by memory constraints.

9.1 The Effect of Radius

The only algorithm parameter in the HBHC, TUX, and Gray-Box P3 is the radius of the hamming ball. Therefore our first experiments are designed to determine the effect of this parameter on solution quality.

Figure 9.1 shows the effect on final solution fitness as r increases. As expected from [4], the HBHC obtains higher quality as r increases, with the magnitude of the improvement decreasing. TUX has a similar relationship and outperforms HBHC on all three problems for all r values. Regardless of r, Gray-Box P3 outperforms both, with almost all r values reaching the same best fitness. For Nearest Neighbor NKq, Gray-Box P3 finds the global optimum in every run for r < 4, with only a single unsuccessful run at r = 4.

To further examine the effect of r on Gray-Box P3, Figure 9.2 shows the number of seconds required to reach the global optimum. Setting r = 1 was the most efficient configuration for all K > 1, supporting the trend that Gray-Box P3 works best with small r values. With K = 1, the landscape is smooth enough that with a sufficiently high r the HBHC is able to find the global optimum.

Figure 9.2: Time required for Gray-Box P3 to reach the global optimum of Nearest Neighbor NKq instances with N = 6,000.

9.2 Fitness Over Time

Reaching high-quality solutions quickly can sometimes be more important than reaching the global optimum eventually. Figure 9.3 shows how solution quality progresses for each algorithm. HBHC and TUX have significant early delays caused by their high r values. Larger r's require a large amount of initial partial evaluation before performing hill climbing.
After one full iteration HBHC effectively stalls, with TUX continuing to improve. Both are eclipsed by Gray-Box P3, which quickly descends to the global optimum, outperforming HBHC and TUX at every time point.

Figure 9.3: Comparison of solution quality during optimization on a log-log scale for different algorithms. For NKq-Landscapes N = 6,000 and K = 4 and for Ising Spin Glasses N = 6,084. Each algorithm uses its best-found r value.

Figure 9.4 further illustrates the effect of r on Gray-Box P3. On both Nearest Neighbor NKq and Ising Spin Glasses, increasing r does not change the shape of the curve. Instead, the quality reached is simply time shifted, such that given more time higher r values will reach the same quality. As a result, for these problems we conclude that higher r values simply add more expense for no overall gain. On Unrestricted NKq this relationship is less certain, with r = 1 potentially having a different, and worse, shape than r > 1. However, due to memory and time restrictions it is difficult to know if this trend continues.

9.3 Scalability

Perhaps the most critical test of an optimization algorithm's quality is how it scales as problem difficulty increases. To test this behavior, we ran all three algorithms using the optimal r values determined experimentally in Section 9.1, varying N from 200 to 10,000 for NKq and 196 to 6,084 for Ising Spin Glass. In these plots we also include the black-box version of P3 to show the efficiency gains available from using gray-box information.
Figure 9.5 and Figure 9.6 show how long each algorithm required to reach the best overall quality found on Nearest Neighbor NKq and Ising Spin Glasses, respectively. For Nearest Neighbor NKq the best found is the global optimum for all runs of all lengths, while for Ising Spin Glasses the best quality found by any method was worse than the global optimum in 7 runs of N = 4,096 and 21 runs of N = 6,084. The median run of the HBHC was unable to reach the best fitness for any problems tested using more than 200 bits. TUX performed somewhat better, reaching the best fitness more than half of the time on problem sizes up to N = 800 and N = 625 for Nearest Neighbor NKq and Ising Spin Glass, respectively.

[Figure 9.4: three panels (Nearest Neighbor NKq, Unrestricted NKq, 2D Ising Spin Glass) plotting error against seconds on a log-log scale, one line per radius value from 1 through 6.]

Figure 9.4: Comparison of Gray-Box P3's solution quality during optimization on a log-log scale for different r values. For NKq-Landscapes N = 6,000 and K = 4 and for Ising Spin Glasses N = 6,084.

Black-Box P3, which does not utilize partial reevaluation or the HBHC, was able to consistently reach the best fitness until it hit the memory limit on N = 2,000 for NKq and N = 2,916 for Ising Spin Glass. This limitation is due to Black-Box P3's O(N^2) memory requirements.

For all sizes of both problems, Gray-Box P3 was the fastest to reach the best fitness. On Nearest Neighbor NKq the improvement is substantial, with no alternative finishing within two orders of magnitude. On its largest successful problem size, the mean time to completion for Black-Box P3 was 375 times slower than Gray-Box P3. Using the Mann-Whitney test to compare their runtimes results in p < 10^-16. This is especially impressive considering previous work has shown Black-Box P3 is faster to reach the global optimum than other leading black-box methods [10]. Applying regression, we estimate that Black-Box P3's time to global optimum on Nearest Neighbor NKq is O(N^2.75) while Gray-Box P3's is O(N^1.98).
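Scaling exponents of this kind come from fitting a line to runtime versus problem size on a log-log scale. The stdlib-only sketch below illustrates that style of fit on synthetic data; it is our illustration, not the dissertation's actual regression code:

```python
import math

def fit_power_law(sizes, times):
    """Least-squares fit of log(time) = a + b*log(N); returns (c, b)
    such that time is approximately c * N**b."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return math.exp(a), b

# Synthetic check: data generated with exponent 2.75 is recovered.
sizes = [200, 400, 800, 1600, 3200]
times = [1e-6 * n ** 2.75 for n in sizes]
c, exponent = fit_power_law(sizes, times)
print(round(exponent, 2))  # 2.75
```

On real runtimes the fitted exponent is only an estimate, since constant factors and measurement noise perturb the log-log line.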
The results on Ising Spin Glass are similar, with a less extreme difference between Black-Box P3 and Gray-Box P3. In general, Gray-Box P3 is the fastest technique to find the global optimum by an order of magnitude, with Black-Box P3's mean run finishing 4.6 times slower than Gray-Box P3 on N = 2,025. Using the Mann-Whitney test to compare their runtimes results in p < 10^-14. The regression line suggests that while Gray-Box P3 scales at O(N^3.35), Black-Box P3 scales at O(N^3.05).

[Figure 9.5: seconds to reach the best fitness found, plotted against problem length on a log-log scale for HBHC, TUX, Gray-Box P3, and Black-Box P3.]

Figure 9.5: Comparison of how each algorithm's time required to reach the best fitness found scales with problem size on Nearest Neighbor NKq with K = 4. With N = 1000 Gray-Box P3 is 375x faster than Black-Box P3. HBHC was only successful on the smallest problem size.

[Figure 9.6: seconds to reach the best fitness found, plotted against problem length on a log-log scale for HBHC, TUX, Gray-Box P3, and Black-Box P3.]

Figure 9.6: Comparison of how each algorithm's time required to find the best fitness found scales with problem size on Ising Spin Glass. With N = 2025 Gray-Box P3 is 4.6x faster than Black-Box P3. HBHC was only successful on the smallest problem size.

[Figure 9.7: error at termination plotted against problem length for HBHC, TUX, Gray-Box P3, and Black-Box P3.]

Figure 9.7: Relative qualities of each method as problem size increases on Unrestricted NKq with K = 4.

As Unrestricted NKq does not have a known global optimum, and the different algorithms rarely found the same best fitness, Figure 9.7 compares the error for each technique at termination. Gray-Box P3 in general finds the best fitness, with TUX occasionally performing better. When N > 2,000, Gray-Box P3 finds better quality solutions than all other methods for every instance. Using the Mann-Whitney test to compare the fitness of Gray-Box P3 and TUX when N = 10,000 results in p < 10^-16. HBHC and Black-Box P3 only reach similar qualities as TUX and Gray-Box P3 when N = 200, doing so in 4 and 7 runs, respectively. As the problem size increases, TUX begins to fall behind Gray-Box P3, with the HBHC stabilizing at about 2.5% worse than the best found. Here Black-Box P3 performs worse than the other techniques, falling further behind as the problem size increases.
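The significance claims in this section rely on the Mann-Whitney U test, which compares two samples by rank rather than by value and so makes no normality assumption about runtimes. A minimal stdlib sketch of the U statistic (our illustration; a real analysis would use a library routine that also computes the p-value):

```python
def mann_whitney_u(a, b):
    """U statistic for sample a versus sample b: the number of pairs
    (x, y) with x < y, counting ties as one half. A maximal value of
    len(a) * len(b) means every value in a is below every value in b."""
    u = 0.0
    for x in a:
        for y in b:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical runtimes: every "fast" run beats every "slow" run,
# so U takes its maximal value of 3 * 3 = 9.
fast = [1.0, 1.2, 1.4]
slow = [5.0, 6.0, 7.0]
print(mann_whitney_u(fast, slow))  # 9.0
```

This O(len(a) * len(b)) pairwise form is fine for the small run counts used here; rank-based implementations are preferred for large samples.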
9.4 Discussion

In line with previous work, we have found that HBHC cannot effectively find global optima on problems with even moderate epistasis. In general, it also obtains almost no improvement in fitness after only a few restarts. We designed TUX as a simplistic way of choosing restart locations based on previously found local optima. Unlike HBHC alone, TUX was able to continue improving given more time, finding global optima on problems three times as large.

TUX's effectiveness is likely due to the HBHC acting as a super repair operator for uniform crossover. Given a sufficiently large r, the HBHC can return sections of the crossover offspring to either parent's original version of a given subfunction. The HBHC is elitist, meaning there is a bias toward returning to the better of the two parents' versions. Furthermore, by being so disruptive, uniform crossover potentially allows the HBHC to also find unrelated improvements.

While TUX improves over plain HBHC, Gray-Box P3 is required to perform truly successful global optimization. Gray-Box P3 replaces naive local search with the HBHC and utilizes known non-linear relationships instead of statistical linkage learning. On NKq-Landscapes this drastically improves search effectiveness. A major source of this improvement is likely how difficult it is for Black-Box P3 to learn linkage relationships on these landscapes. Furthermore, Gray-Box P3 can perform partial reevaluation and efficient hill climbing during the mixing phase.

Gray-Box P3's success is less dramatic on Ising Spin Glasses. While it still outperformed all competitors, Black-Box P3 may actually scale better to larger problems. One explanation for this deviation is that Ising Spin Glasses require more exploration of equal fitness plateaus. For instance, in Figure 9.3 and Figure 9.4 there is a significant pause in improvement when Gray-Box P3 reaches the second-best fitness in the landscape. Nothing in its design suggests that Gray-Box P3 should be more effective at neutral drift. Another potential issue is that on these landscapes the importance of each non-linear relationship may be detectably unequal.
As a result, black-box linkage learning may better cluster variables that have meaningful impact on fitness, while Gray-Box assumes all are equally important. A useful direction for future work would be to explore methods of performing efficient learning on top of the known variable interactions.

Somewhat surprising is the difference in behavior between the polynomially solvable problems and Unrestricted NKq. While Black-Box P3 performed well in the former, it was the least successful in the latter. The optimal radius for Gray-Box P3 also shifted from 1 to 2. A potential explanation is that in both Ising Spin Glass and Nearest Neighbor NKq the number of unique variables within a given radius from a given variable is significantly lower than in the worst case. This explains why on Unrestricted NKq even moderately high r values hit our memory limit. For Black-Box P3 this may also be causing increased difficulty in linkage learning as variables become indirectly dependent on much larger sets. Furthermore, Black-Box P3 may be benefiting from an increased rate of duplicated dependencies on Nearest Neighbor NKq not present in Unrestricted NKq.

While the inclusion of HBHC into Gray-Box P3 introduces a parameter, it requires trivial configuration. In the worst case there may be a handful of r values to test. Furthermore, our evidence suggests setting r = 1 is quite powerful, with higher values likely to be only a time shift in quality. This is in contrast to r's role in HBHC, where low r values are never expected to reach the same quality as higher r values. Therefore we conclude that Gray-Box P3 maintains the out-of-the-box quality of Black-Box P3, while drastically improving efficiency for this new domain of problems.

Chapter 10

Understanding When P3 Excels

The ruggedness and high dimensionality of most interesting landscapes makes them challenging to visualize or otherwise analyze. However, doing so can be helpful in quantifying the difficulty of a problem, and in deciding how best to design algorithms to deal with those difficulties.
Similarly, knowing which characteristics favor a particular algorithm can help researchers choose the algorithm most likely to perform well on their problem. To further this end, we explore the landscapes used in previous sections to understand what makes a landscape suited for P3 optimization.

10.1 Big Valley

One method for visualizing the global structure of a landscape is to examine the relationship between local optima [2]. In its original form, this process involves generating thousands of random solutions and then applying local search to each. This information is then displayed in two-dimensional plots: distance from the nearest global optimum and fitness difference from the global optima. For a number of interesting problems, a "Big Valley" of local optima exists, such that the higher a solution's fitness is, the closer it is in representation space to the global optimum. This result suggests that focusing search around known high-quality solutions increases the likelihood of finding even higher-quality solutions. This relationship is an underlying assumption of all evolutionary-based search methods, including P3.

Figure 10.1: Example change of enumeration ordering. The gray loci represent all dependencies for some move m_i. By reordering, m_i's lowest index dependency improves from 2 to 4.

We have set out to extend this method of landscape visualization by considering not just a sample of local optima, but all local and global optima in the landscape. When considered as a black-box, finding all local optima requires each solution to be enumerated and all of its neighbors to be checked for fitness improvements. This enumeration is prohibitively expensive for even trivial landscapes, requiring Θ(N · 2^N) operations. However, by leveraging the gray-box domain, some efficiency improvements can be made.

10.1.1 Quickly Finding All Local Optima

For gray-box problems, we can determine the set of all fitness-improving moves from a given solution in O(N) time, and this set can be updated in O(1) time when flipping a single bit. Therefore, checking if each new solution is locally optimal requires O(1) time and the overall enumeration process requires O(2^N).
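The key to the O(1) update is that a gray-box fitness is a sum of subfunctions over few variables, so flipping one bit can only change the deltas of moves that share a subfunction with it. The following toy sketch (ours, with a made-up chain of XNOR subfunctions; not the dissertation's data structures) illustrates the bookkeeping for radius-1 moves:

```python
# Toy gray-box problem: fitness counts equal adjacent bits, i.e. a
# chain of XNOR subfunctions, each reading exactly two variables.
def make_chain(n):
    return [(i, i + 1) for i in range(n - 1)]

def move_delta(x, touching, bit):
    """Fitness change from flipping `bit`, evaluating only the
    subfunctions that read it (partial evaluation)."""
    def contrib():
        return sum(1 for (i, j) in touching[bit] if x[i] == x[j])
    before = contrib()
    x[bit] ^= 1
    after = contrib()
    x[bit] ^= 1
    return after - before

n = 6
subfns = make_chain(n)
touching = {b: [s for s in subfns if b in s] for b in range(n)}
x = [0, 1, 0, 0, 1, 1]
deltas = [move_delta(x, touching, b) for b in range(n)]

# Flip bit 1; only moves sharing a subfunction with bit 1 (bits 0,
# 1, and 2) can change, so only those deltas are recomputed.
x[1] ^= 1
for b in (0, 1, 2):
    deltas[b] = move_delta(x, touching, b)
print(deltas)  # [-1, -2, -2, 0, 0, -1]
```

With bounded epistasis each bit touches a constant number of subfunctions, so each per-flip update is constant time, matching the O(1) claim above.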
Due to the limited non-linearity of the gray-box domain, it is possible to exclude large parts of the search space without missing any local optima. Consider the representation presented in the top of Figure 10.1. In a black-box domain, enumeration would progress as a binary counter, treating index zero (symbol A in the genome) as the least significant bit. This ensures that before changing index i, all possible settings of index 0 through i - 1 have been tested. The gray-box domain makes it possible to skip combinations which cannot be local optima. In Figure 10.1 there exists a move m_i which is a fitness improvement when enumeration starts (all variables set to 0). Due to the known relationships between variables, we know that the quality of m_i only depends on variables C, E, F, and H. Therefore, until one of those four variables is modified, the solution cannot be a local optimum. As a result, solutions 00000000, 10000000, 01000000, and 11000000 cannot be local optima and can therefore be skipped during enumeration. More generally, if at any point during enumeration there exists a fitness-improving move, no local optima can exist until at least one dependency of that move is modified.

This knowledge can be effectively exploited to skip parts of the enumeration, as shown in Figure 10.2. Before starting, each move is put into a table move_bin based on that move's lowest index dependency. This is the first index which can be modified by enumeration to change the fitness effect of making that move. In order to determine how much of the enumeration can be skipped, we must find the highest index in move_bin which contains a fitness-improving move, as done by FindNextIndex. If no move is fitness-improving, then a local optimum has been found.
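The binning-and-skipping idea can be sketched as follows. This is our own simplified reconstruction, not the code of Figure 10.2: for clarity it recomputes move deltas from scratch at each visited solution rather than maintaining them incrementally, and it expresses the skip as jumping the binary counter to the next value in which some bit at or above the chosen bin changes.

```python
def all_local_optima(n, subfns):
    """Enumerate all single-bit-flip local optima (maximization) of a
    gray-box function given as (variables, lookup-table) subfunctions,
    skipping counter ranges that provably contain no local optima."""
    touching = {b: [s for s in subfns if b in s[0]] for b in range(n)}
    # A flip of bit b depends only on variables read alongside b;
    # low[b] is that move's lowest-index dependency (its bin).
    low = {b: min(v for (vs, _) in touching[b] for v in vs)
           for b in range(n)}

    def delta(x, b):  # fitness change of flipping bit b (partial eval)
        def part():
            return sum(tbl[tuple(x[v] for v in vs)]
                       for (vs, tbl) in touching[b])
        before = part()
        x[b] ^= 1
        after = part()
        x[b] ^= 1
        return after - before

    found, v = [], 0
    while v < 2 ** n:
        x = [(v >> i) & 1 for i in range(n)]  # index 0 least significant
        improving = [b for b in range(n) if delta(x, b) > 0]
        if not improving:
            found.append(tuple(x))
            v += 1
        else:
            # While that move stays improving, no solution changing only
            # bits below h can be a local optimum: jump past all of them.
            h = max(low[b] for b in improving)
            v = ((v >> h) + 1) << h
    return found

# Toy problem: fitness counts equal adjacent bits (a chain of XNORs).
xnor = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}
subfns = [((i, i + 1), xnor) for i in range(3)]
print(all_local_optima(4, subfns))
# [(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)]
```

Choosing the highest qualifying bin maximizes the jump, since every counter value that differs only in bits below h leaves the improving move untouched.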
The AllLocalOptima algorithm in Figure 10.2 works by repeatedly calling FindNextIndex and adding 1 to the resulting index position's value. Initially AllLocalOptima uses FindNextIndex to check all moves (it initializes index to N - 1). If at any point FindNextIndex returns -1 then no move is fitness-improving and the current solution is added to the list of local optima, found. AllLocalOptima then adds a 1 to the index returned by FindNextIndex, using the loop on Line 11 to perform carry operations and Line 16 to create the new 1 value. Iteration stops when the carry exceeds the solution length.

When performing subsequent checks, not all moves need to be retested for improvement. Instead, the highest index bin that must be tested is the highest index flipped by the previous iteration. This simplification is possible because the previous iteration has verified that all moves in higher index bins are not fitness-improving, and no action performed during that iteration can make them fitness-improving. Furthermore, no 1s can exist in lower index

1: procedure AllLocalOptima
2:     solution ← {0}^N
3:     found ← []
4:     index ← N - 1
5:     while index