AUTOMATEDADDITIONOFFAULT-TOLERANCEVIALAZYREPAIRANDGRACEFUL DEGRADATION By YiyanLin ADISSERTATION Submittedto MichiganStateUniversity inpartialoftherequirements forthedegreeof ComputerScience-DoctorofPhilosophy 2015 ABSTRACT AUTOMATEDADDITIONOFFAULT-TOLERANCEVIALAZYREPAIRAND GRACEFULDEGRADATION By YiyanLin Inthisdissertation,weconcentrateontheproblemofautomatedadditionoffault-tolerancethat transformsafault-intolerantprogramtobeafault-tolerantprogram.Wesolvethisproblemvia modelrepair.Modelrepairisacorrect-by-constructtechniquetoreviseanexistingmodelsothat therevisedmodelthegivencorrectnesscriteria,suchassafety,liveness,orfault-tolerance. Weconsidertwoproblemsofusingmodelrepairtoaddfault-tolerance.First,iftherepaired modelviolatestheassumptions(e.g.,partialobservability,inabilitytodetectcrashedprocesses, etc)madeintheunderlyingsystem,thenitcannotbeimplemented.Wedenotetheserequirements as realizabilityconstraints .Second,theadditionoffault-tolerancemayfailiftheprogramcannot fullyrecoveraftercertainfaultsoccur.Inthisdissertation,weproposea lazy repairapproach toaddressrealizabilityissuesinaddingfault-tolerance.Additionally,weproposeatechniqueto automaticallyaddgracefuldegradationtoaprogram,sothattheprogramcanrecoverwithpartial functionality(thatisbythedesignertobethecriticalfunctionality)iffullrecoveryis impossible. Amodelrepairtechniquetransformsamodeltoanothermodelthatanewsetofprop- erties.Suchatransformationshouldalsomaintainthemappingbetweenthemodelandtheunder- lyingprogram.Forexample,inadistributedprogram,everyprocessisrestrictedtoread(orwrite) somevariablesinotherprocesses.Amodelthatrepresentsthisprogramshouldalsodisallowthe processtoread(orwrite)thoseinaccessablevariables.Iftheseconstraintsareviolated,thenthe correspondingmodelwillbeunrealizable.Anunrealizablemodel(inthiscontext,amodelthat violatestheread/writerestrictions)maymakeitimpossibletoobtainthecorrespondingimple- mentation.Resolvingrealizabilityconstraintsincreasesthecomplexityofmodelrepair.Existing heuristic-basedapproachisdesignedandoptimizedfordistributedprograms.Weneed amoregenericmodelrepairapproachforothertypesofprograms.Hence,inthisdissertation,we proposeamodelrepairtechnique,i.e.,lazyrepair,toaddfault-tolerancetoprogramswithdifferent typesofrealizabilityconstraints.Itinvolvestwosteps.First,weonlyfocusonrepairingtoobtaina modelthatcorrectnesscriteriawhileignoringrealizabilityconstraints.Inthesecondstep, werepairthismodelfurtherbyremovingbehaviorswhileensuringthatthedesiredis preserved.Thelazyrepairapproachtheprocessofdevelopingheuristics,andprovides atradeoffintermsofthetimesavedinthestepandtheextraworkrequiredinthesecondstep. Inaddition,safetycriticalsystemssuchasairplanes,automobilesandelevatorsshouldoperate withhighdependabilityinthepresenceoffaults.Iftheoccurrenceoffaultsbreaksdownsome components,thesystemmaynotbeabletofullyrecover.Inthisscenario,thesystemcanstill operatewithremainingresourcesanddeliverpartialbutcorefunctionality,i.e.,todisplay graceful degradation .Existingmodelrepairapproachescannottransformaprogramtoprovidegraceful degradation.Inthisdissertation,weproposeatechniquetoaddfault-tolerancetoaprogramwith gracefuldegradation.Intheabsenceoffaults,suchaprogramexhibitsidealbehaviors.Inthe presenceoffaults,theprogramisallowedtorecoverwithreducedfunctionality.Thistechnique involvestwosteps.First,itautomaticallygeneratesaprogramwithgracefuldegradationbasedon theinputfault-intolerantprogram.Second,itaddsfault-tolerancetotheoutputprogramfrom step.Wedemonstratethatthistechniqueisapplicableinthecontextofhighatomicityprogramsas wellaslowatomicityprograms(i.e.,distributedprograms). TABLEOFCONTENTS LISTOFTABLES ....................................... vii LISTOFFIGURES ...................................... viii LISTOFALGORITHMS ................................... ix Chapter1Introduction .................................. 1 1.1RealizabilityConstraints...............................3 1.2GracefulDegradation.................................4 1.3Contributions.....................................7 1.4OutlineoftheDissertation..............................10 Chapter2PreliminaryConcepts ............................. 12 2.1Program........................................12 2.1.1Example:ByzantineAgreement.......................14 2.2......................................15 2.2.1Examplecont'd:ByzantineAgreement...................17 2.3FaultsandFault-tolerance...............................18 2.3.1Examplecont'd:Fault-tolerantByzantineAgreement............20 Chapter3LazyRepairforSynchronousSystems .................... 23 3.1Background......................................23 3.2InterleavingSemantics................................24 3.3SynchronousSemantics................................24 3.4PreliminaryConceptsforSynchronousPrograms..................25 3.5RealizabilityConstraintsinSynchronousProgram..................28 3.6ModelingFaultsforAddingFault-tolerancetoSynchronousPrograms.......30 3.7ProblemStatementofAddingFault-tolerancetoSynchronousPrograms......32 3.8AlgorithmofAddingFault-tolerancetoSynchronousProgram...........32 3.8.1AddingNonmaskingandFailsafeFault-tolerance..............38 3.8.2ExpedientRecoveryinthePresenceofFaults................38 3.9CaseStudy1:GroupMembershipProtocol.....................39 3.9.1Fault-intolerantGroupMembershipProtocol................40 3.9.2FaultsinGroupMembershipProtocol....................42 3.9.3SafetyofGroupMembershipProtocol.............42 3.9.4LegitimateStatesofGroupMembershipProtocol..............43 3.9.5AddingFailsafe/MaskingFault-tolerancetoGroupMembershipProtocol.44 3.10CaseStudy2:AThree-State(MSI)Write-BackInvalidationProtocol........45 3.10.1Fault-intolerantMSIProtocol.........................46 3.10.2FaultsinMSIProtocol............................48 3.10.3SafetyofMSIProtocol.....................48 iv 3.10.4LegitimateStatesofMSIProtocol......................49 3.10.5AddingMaskingFault-tolerancetoMSIProtocol..............49 3.11ExperimentResults..................................50 3.12ApplicationofOurAlgorithminExistingArchitectures/Languages.........51 3.12.1ApplicationinTTA..............................51 3.12.2ApplicationinEsterel.............................53 3.12.3ApplicationinBIP..............................54 Chapter4LazyRepairforCyber-physicalSystems ................... 55 4.1PreliminaryConceptsforCyber-physicalSystem..................56 4.2RealizabilityConstraintsinCyber-physicalPrograms................59 4.3ModelingFaultsforAddingRecoverytoCyber-physicalPrograms.........60 4.4ConstraintsforAddingRecovery...........................63 4.4.1Non-interferenceConstraints.........................64 4.4.2ConstraintsonCyber-physicalInteractions..................65 4.5HeuristicsforAddingRecoverytoCyber-physicalPrograms............68 4.6CaseStudy1:BroadcastChannel...........................71 4.7CaseStudy2:TrainSignalsController........................72 Chapter5LazyRepairforDistributedSystems ..................... 74 5.1RealizabilityConstraintsinDistributedPrograms..................74 5.1.1WriteRestrictions...............................75 5.1.2ReadRestrictions...............................76 5.2ProblemStatementofAddingFault-tolerancetoDistributedPrograms.......76 5.3AlgorithmofAddingFault-tolerancetoDistributedPrograms............77 5.4ExperimentalResults.................................80 Chapter6AddingGracefulDegradationtoHighAtomicityPrograms ........ 83 6.1PreliminaryConcepts.................................83 6.2ProblemStatementforGracefulDegradationDesign................83 6.3AlgorithmforGeneratingGracefulProgram.....................84 6.4CaseStudy:PrinterSystem..............................87 6.4.1OriginalPrinterSystemActions.......................87 6.4.2ofPrinterSystem.......................88 6.4.3GeneratingaGracefulPrinterSystemProgram...............90 Chapter7AddingGracefulDegradationtoDistributedPrograms .......... 92 7.1AlgorithmforGeneratingDistributedGracefulPrograms..............92 7.2CaseStudy:ByzantineAgreement..........................93 7.2.1ofByzantineAgreement...................94 7.2.2ActionsforByzantineAgreement......................95 7.2.3InvariantofByzantineAgreement......................96 7.2.4GeneratingGracefulByzantineAgreementProgram.............96 v Chapter8AddingFault-tolerancetoProgramwithGracefulDegradation ...... 100 8.1ProblemStatement..................................100 8.2CaseStudy(continued):Fault-tolerantPrinterSystem................104 8.3CaseStudy(continued):Fault-tolerantByzantineAgreement............105 Chapter9AddingMulti-GracefulDegradation ..................... 107 9.1ProblemStatement..................................107 9.2Algorithm.......................................109 9.3OhioCoalResearchCenterVentilationSystem...................113 9.3.1ModelingVentilationSystem.........................115 9.4ApplicationofMulti-GracefulDegradationinOCRC................118 9.4.1Faults.....................................119 9.4.2RequirementsinthepresenceofFaults....................120 9.4.3ApplicationofAlgorithm11.........................121 9.4.4ExtensiontodealwithUnsafeCentral....................124 Chapter10RelatedWork .................................. 126 10.1SynthesisTechniques.................................127 10.2RevisiononModels..................................128 10.3RevisiononPrograms.................................130 Chapter11ConclusionandFutureWork ......................... 132 11.1Conclusion......................................132 11.2FutureWork......................................135 BIBLIOGRAPHY ....................................... 138 vi LISTOFTABLES Table3.1ExperimentalResultsforGroupMembershipProtocol...........51 Table3.2ExperimentalResultsforThree-StateWrite-BackInvalidationProtocol..51 Table3.3PartofEsterelPrimitives..........................53 Table4.1Experimentalresultsforautomatedadditionoffaultrecoverytobroadcast channel....................................72 Table4.2Experimentalresultsforautomatedadditionoffaultrecoverytotrainsig- nalscontroller.................................73 Table5.1ExperimentalResultsforByzantineAgreement..............81 Table5.2ExperimentalResultsforStabilizingChain.................82 vii LISTOFFIGURES Figure1.1Fault-tolerantprogrambehavior.......................5 Figure1.2Fault-tolerantgracefuldegradationprogram.................5 Figure1.3Step1:DesignofGracefulProgram....................6 Figure1.4Step2:DesignofFault-TolerantGracefulDegradation..........6 Figure1.5Lazyrepairframework...........................7 Figure1.6Addinggracefuldegradationframework..................10 Figure2.1Fault-intolerantbyzantineagreementprogram................15 Figure2.2Faultsofbyzantineagreementprogram...................21 Figure2.3Maskingfault-tolerantbyzantineagreementprogram............21 Figure3.1Stateupdateinasynchronousprogram...................27 Figure3.2Asynchronousprogram...........................29 Figure3.3Asynchronousprogram...........................29 Figure3.4Asynchronousprogram...........................29 Figure3.5Notasynchronousprogram.........................29 Figure3.6Faultintolerantprocessorgroupmembershipprotocol...........42 Figure3.7Fault-intolerantCacheCoherenceProtocol.................48 Figure4.1Acommunicationprotocol..........................56 Figure4.2Thecommunicationprotocolwithfaults...................61 Figure4.3Thecommunicationprotocolwithfaultrecovery..............62 Figure4.4Atrainsignalcontroller............................72 Figure5.1Violatewriterestriction............................74 viii Figure5.2Violatereadrestriction............................74 Figure5.3CorrectModeling..............................74 Figure5.4Fault-intolerantstabilizingchainprogram..................81 Figure6.1GracefulPrinterSystemProgram......................91 Figure7.1ActionsforGracefulByzantineAgreementProgram............99 Figure8.1 ActionsforFault-tolerantGraceful-DegradationProgram ............106 Figure9.1OverviewofOCRCLabSafetySystem...................116 Figure9.2ActionsofFault-intolerantProgram p ...................119 Figure9.3Faults....................................120 Figure9.4Transitionsof p g 1 (exceptthoseshowninTable ?? )............123 Figure9.5Newlydiscoveredtransitionsin Part3 ...................124 ix LISTOFALGORITHMS Algorithm1AddingMaskingFault-tolerancetoaSynchronousProgram...... 33 Algorithm2Addsaferecovery............................ 33 Algorithm3Breakcycles............................... 34 Algorithm4ConstructingaSynchronousProgram.................. 35 Algorithm5AddingFastRecovery.......................... 39 Algorithm6AddingMaskingFault-tolerancetoaDistributedProgram....... 77 Algorithm7Constructadistributedprogram..................... 79 Algorithm8GracefulProgramGeneration...................... 86 Algorithm9Handlereadrestriction.......................... 93 Algorithm10AddingFault-tolerancetoGracefulProgram............... 103 Algorithm11Addingmulti-gracefuldegradation................... 111 x Chapter1 Introduction Theproblemofautomatedadditionoffault-tolerancefocusesontransformingafault-intolerant programintoafault-tolerantprogram.Existingtechniques[16,18,46,47]solvethisproblemvia modelrepair.Inmodelrepair,oneprovidesamodel(e.g.,anautomaton)representingthefault- intolerantprogrambehaviorasinput.Thismodelisrevisedsoastoensurethattherevisedmodel providesthedesiredfault-tolerancetothefaults.Moreover,thegeneratedprogramis correct-by-construction. Amodelrepairtechniquetransformsamodeltoanothermodelthatanewsetofprop- erties.Suchatransformationshouldalsomaintainthemappingbetweenthemodelandtheunder- lyingprogram.Forexample,inadistributedprogram,everyprocessisrestrictedtoread(orwrite) somevariablesinotherprocesses.Amodelthatrepresentsthisprogramshouldalsodisallowthe processtoread(orwrite)thoseinaccessiblevariables.Iftheseconstraintsareviolated,thenthe correspondingmodelwillbeunrealizable.Anunrealizablemodel(inthiscontext,amodelthat violatestheread/writerestrictions)maymakeitimpossibletoobtainthecorrespondingimplemen- tation.Forexample,ifamodelallowsoneprocesstoaccessaprivatevariableinanotherprocess, thenonewouldneedtoprovideanewcommunicationprotocolbetweenthesetwoprocesses.As aresult,thecorrectnessoftheprogrammayneedtobere-vsoastoensurethereisnodata raceordeadlockbetweenthesetwoprocesses. Resolvingrealizabilityconstraintsincreasesthecomplexityofmodelrepair[22,46].Existing techniquesreducethecomplexityusingheuristic-basedapproach[21,51].Thedevelopmentand 1 theimplementationofheuristicsarecomplicatedbythefactthat,foragivenheuristic,weneedto determinehowthatheuristicreducesthecomplexity.Furthermore,weneedtoidentifyifaheuristic issorestrictivethatitsusewillcausethealgorithmtodeclarefailureveryoften.Thecomplication ofdevelopingefheuristicsalsopreventsextendingexistingmodelrepairapproaches[46,31, 21]tobeapplicablefordifferenttypesofsystems.Hence,weneedamoregenericsolutionfor additionoffault-tolerancetoprogramsinabroaderdomain. Inadditiontoaddressingtherealizabilityissues,anotherproblemwewanttosolveinthisdis- sertationistoaddfault-tolerancewithgracefuldegradation.Complexsystemssuchasairplanes, automobiles,andelevatorshavehighdependabilityrequirement,sincesomeoftheirfunctionality issafety-critical.Thesesystemsshouldoperatewithhighdependabilityinthepresenceoffaults. Oncefaultsoccur,itmaybeimpossibleforasystemtodeliverfullfunctionality.Gracefuldegra- dationprovidesincreaseddependabilityforthesesystems.Intheabsenceoffaults,afault-tolerant systemwithgracefuldegradationbehavesnormally.Aftertheoccurrenceoffaults,suchasystem canrecovertooperatewithlimitedbutcorefunctionality,ifrecoverywithfullfunctionalityis impossible. Wecanuseaddingfault-tolerancetechniquestoaiddesignerstoobtainsuchafault-tolerant programwithgracefuldegradation.Theinputshouldbeaprogramwithasetofidealbehaviors, thatprovidefullfunctionality,aswellasasetofgracefulbehaviors,thatprovidepartialfunctional- ity.Notethatmostlegacyfault-intolerantprogramsdonotincludethesegracefulbehaviors.Andit isnotdesirabletoaskadesignertoprovidesuchaninputprogram,sinceitincreasestheoverhead forthedesigner.Hence,weneedatechniquetoautomaticallygenerateaprogramwithgraceful degradation. Next,weexplainthenotionsofrealizabilityconstraintsandgracefuldegradation. 2 1.1RealizabilityConstraints Toapplyautomatedadditionoffault-tolerance,wemustidentifytherelationbetweenthemodel andunderlyingsystem,andmakesurethattherevisedmodelcanberealizedbythesystem.Forex- ample,astateautomatonrepresentingadistributedprogrammaynotberealizedifitincludes anytransitionthatrequiresonesingleprocesstoaccessallvariablesoftheprogram.Likewise,an automatonofareal-timesystemmaynotberealizedifithasanbehaviorwheretimedoes notdiverge.Inotherwords,afterweperformrepaironaninputmodel M ,toobtainamodel M 0 , itisnecessarythat M 0 canbeconvertedintoaprogramforthegivensystem. Toillustratetheissueofrealizabilityconstraintsfurther,supposewerepairamodelrepresent- ingaprogramrunningonthetime-triggeredarchitecture(TTA[45]).Itisstraightforwardtomodel thebehavioroftheprogramusingastateautomaton.,wecanusethestatein theautomatontorepresentavaluationoftheprogramvariables.Also,wecanusetransitionsto representthevariableupdatesbetweentwostates.Inaddition,accordingtoTTAall processesmustparticipateandarerequiredtoaccomplishtheiractionswithingiventimeframe.To makesurethatamodelcanberealizedbyaTTAprogram,werequirethateachtransition thebehaviorofallprocessesinonetimeframe.Thispropertyistrueofinputmodel M sinceit isconstructedfromavalidTTAprogram.Howevertherepairedmodel M 0 cannotberealizedby aTTAprogramifitincludesanytransitionthatrepresentsonlyasubsetoftheprocessesupdating theirvariables. TheabovediscussionimpliesthattheproblemofmodelrepairforaTTAprogramrequiresthat wegenerateamodel M 0 thatnotonlythedesiredpropertiesbutalsothetiming restrictionofTTAarchitecture.Suchtimingrestrictionintroducesdependencyrelationamong transitionsduringrepair.Asaresult,theremovalofatransitioninthemodelmayleadtoremoval 3 ofothertransitionstoprocessbehaviorchange. Inthisdissertation,weproposealazyrepairframeworkforadditionoffault-tolerancetopro- gramswithrealizabilityconstraints.Thisnewapproachaimstosimplifythedevelopmentofal- gorithmbydecouplingtheprocessofresolvingrealizabilityconstraintsfromadditionoffault- tolerance.Asaresult,adesignercanreuseexistingaddingfault-tolerancetechniqueandonly focusonaddressingtherealizabilityissues. 1.2GracefulDegradation Asdiscussedbefore,onelimitationofaddingfault-toleranceisthatinmanyscenarios,itisim- possiblefortheprogramtorecovertotheoriginalprogrambehaviorafterfaultsoccur.Insuch scenarios,itisdesirablethattheprogramexhibitssomegracefullydegradedbehavior,i.e.,be- haviorthatis close toitsoriginalbehavior.Acanonicalexampleofagracefulbehaviorprogram includesthecasewheretheoriginalprogramprovidescoreservicesandauxiliaryservices.How- ever,afterrecovery,theprogrammayonlyprovidecoreservicesandfailtoprovidetheauxiliary services.Moregenerally,insuchasystem,itisexpectedthattheprogramwillsatisfyits cationundernormalcircumstances.However,uponoccurrenceoffaults,itmaysatisfy(agiven) weaker(respectively,relaxed) Theideaofsuchgracefuldegradationhasbeenintroducedin[40],whereauthorsconsiderthe toconsistofasetof n properties.Subsequently,theyconsider 2 n possible cationsforeachpossiblesubsetofthese n properties.Thus,theoretically,thereexists 2 n possible programsdependingupontheexactsubsetofpropertiesthatareexpectedtobeunder differentcircumstances.Inpractice,however,consideringsuch 2 n isunnecessary. Forexample,intheabovescenario,thereisnoneedtoconsideraprogramthatonlyprovides 4 auxiliaryservices.Likewise,thereisalsononeedtoconsideraprogramthatprovidesneitherthe corenortheauxiliaryservices.Inotherwords,intheaboveexample,weneedtoconsidertwo programs,onethatguaranteesthatboththecoreandauxiliaryservicesareprovided,andanother thatguaranteesthatatleastthecoreservicesareprovided. Tofurtherillustratetheapplicationofgracefuldegradation,weuseanelevatorsystem[64]. Thiselevatorsystemisdesignedwithacorerequirementthatallpassengersshouldbedelivered tothedestination.Additionally,itshouldsatisfyaperformancerequirementthattheelevatordoes notstopatwheretherearenopassengerrequests.Undernormalcircumstance,bothrequire- mentsshouldbeHowever,ifafaultbreaksthecontrolpanelinsidetheelevator,thenthe systemmaynotknowthelistofrequestedInthisscenario,abettersolutionistoletthe elevatormoveandstopatevery,soastomakesurepassengerscanreachtheirdestinations. Notethatinthiscase,theelevatortoleratesthefaultbyreducingitsefywhileguaranteesthe corerequirement. Figure1.1Fault-tolerantprogrambehavior. Figure1.2Fault-tolerantgraceful degradationprogram. Toillustratetheneedforfault-tolerancewithgracefuldegradationfurther,wecomparethe behaviorsofafault-tolerantprogramandafault-tolerantgracefuldegradationprogramviaFig- ure1.1andFigure1.2.Toillustrateourexample,let S (c.f.Figure1.1)denotesstatesfrom 5 whichtheprogramtheIntuitively,fault-tolerancerequiresthatiffaultsper- turbtheprogram,itrecoversbackto S (viarecoverytransitions,say t ).Hence,theprogramwill subsequentlysatisfyitsBycontrast,infault-tolerancewithgracefuldegradation,it recoversto S 0 ( S 0 S ).Ifitrecoversto S ,itwillsatisfyitsoriginalHowever, ifitrecoversto S 0 S ,itwillonlysatisfytheweakerinwhichonlypartsofthe requirementsareguaranteed.Hence,designingafault-tolerantprogramwithgracefuldegradation canbepartitionedintotwosteps, 1. designthegracefulprogramthattheweaker(c.f.Figure1.3) 2. addfault-tolerancetothatgracefulprogram(c.f.Figure1.4) Figure1.3Step1:DesignofGracefulProgram Figure1.4Step2:DesignofFault-Tolerant GracefulDegradation Thestepfocusesontheautomatedadditionofgracefuldegradationtothefault-intolerant programintheabsenceoffaults.Inparticular,thisstepbeginswithaprogram,say p ,that theoriginalandconstructsaprogram,say p g ,thattheweaker,degraded Clearly, p g cannotsatisfytheweakerinanarbitraryway.Hence,we constrainthissteptorequirethat p g hasalltransitionsofprogram p .However,itmayinclude additionaltransitionsthatallowittoexhibitbehaviorsthatonlysatisfytheweakeron. Inthesecondstep,webeginwithprogram p and p g toconstructafault-tolerantgracefully degradingprogram p f .Program p f ensuresthatintheabsenceoffaults,itbehaveslike p and, 6 hence,itwillsatisfytheoriginalintheabsenceoffaults.Moreover,iffaultsoccur then p f recoverstostatesfromwhere p g thedegraded 1.3Contributions Wepresenttwocontributionsofthisdissertationasfollows: Lazyrepairforadditionoffault-tolerance.Existingmodelrepairapproaches[21,51]trans- formtheinputprogramwhileapplyingheuristics cautiously .Duringcautiousrepair,every intermediateprogramrealizabilityconstraints.Whenthealgorithmterminates,the programisguaranteedtoberealizable.Unlikethecautiousapproach,weattempttosolvethe issueusingatwo-stepapproach(i.e., lazyrepair showninFigure1.5)illustratedasfollows, Figure1.5Lazyrepairframework 1. givenamodel M ,performrepairalgorithmtoaddpropertiestoexistingdesign(e.g. addingfault-toleranceorgracefuldegradation)andonlyresolvelimitedrealizability constraints(orevennoconstraints),inordertoobtainamodel M 00 . 2. resolvefullconstraintstoobtainthemodel M 0 thatistotallyrealizableunder systemsemantics. Inthestep,wefocusonaddingthedesiredpropertiessuchassafety,livenessorfault- tolerance.Inthisstep,weonlyconsiderasubsetofrealizabilityconstraints(possiblynone). 7 Thegoalofthisstepistoquicklyobtainamodel M 00 thatthedesired Notethatsuchaproblemcanbesolvedinpolynomialtimeinthesizeofstatespaceofinput program[46],wecanexpecttoobtain M 00 quickly.Anotheradvantageofusingtechniques in[46]isthat M 00 removesastate(respectivelyatransition)onlyifreachingthatstate (respectivelyexecutingthattransition)wouldmakeitimpossibletoaddthefault-tolerant property.Thus,itmaintainsamaximumpossiblechoicefordealingwithrealizabilitycon- straintsinthesecondstep. Inthesecondstep,weperformfurtherrepairon M 00 toobtain M 0 thatpreservestheprop- ertieswhilesatisfyingrealizabilityconstraints.Ifastate s isexcludedfrom M 00 ,thenit introducesimpossibilityofdesigningafault-tolerantprogramthatreaches s .Hence,adding newreachablestatesin M 0 neednotbeconsideredinthesecondstep. Themotivationbehindlazyrepairisthatthestepcanbemademoreefbyignoring realizabilityconstraints.Andthesecondstepcanbemademoreefbythefactthatwe onlyfocusonremovingtransitionswhilepreservingthedesiredInthecontext ofuniversalthiscanbehandledeasily.Forexample,removaloftransitions doesnotviolate(universal)safetyAnd,removaloftransitionsdoesnotviolate (universal)livenessifitdoesnotintroduceadeadlockinthestatespace. Inthisdissertation,westudythelazyrepairapproachforprogramsinthreedifferentdo- mains. Œ Synchronousprograms.Wepresentthenotionofsynchronoussemanticsformod- elingtime-triggeredsynchronousprograms.Wedeveloppolynomial-timesoundand completelazyrepairalgorithmsformodelsinsynchronoussemanticstoprovidethree differentlevelsoffault-tolerance,namely,fail-safe,nonmasking,andmasking.Ad- 8 ditionally,thealgorithmscanbeusedtoensurethatwhenfaultsstopoccurringthe programrecoverstoitssetoflegitimatestateswithinagivennumberofsteps.We illustrateouralgorithmusingaround-basedgroupmembershipprotocolaswellasa cachecoherenceprotocol.Wealsopresenttheresultsofourexperimentsontheim- plementationoftheaforementionedalgorithms.Ourapproachcanbeusedindifferent architectures/languagesusedinembeddedsystems.,weshowthatourtech- niquecanbeusedforrepairingmodelsinsynchronouslanguagessuchasEsterel[12], aswellasinTTA[45],andBIP[36]. Œ Cyber-physicalprograms.WeusetheBIPframework[11,35]tospecifycomponent- basedmodelsandproposeasetofrealizabilityconstraintstocapturecyber-physical features.Weshowthatautomatedadditionoffaultrecoverytocyber-physicalcomponent- basedmodelsaugmentedwithfaultybehaviorisNP-complete.Consequently,weintro- ducedasetofheuristicstocopewithexponentialcomplexity.Wepresentexperimental resultsofaddingrecoverytotwocyber-physicalprograms,i.e.,broadcastchannelsand trainsignalcontrollers. Œ Distributedprograms.Weproposeasetofrealizabilityconstraintstocapturefeatures indistributedprograms.Wepresentasoundandpolynomial-timealgorithmtoresolve therealizabilityconstraintsofadistributedprogram.Wepresenttheexperimentalre- sultsfortransformingtwotypesofdistributedfault-tolerantprograms,i.e.,byzantine agreementprogramsandstabilizingchainprograms. Automatedadditionofgracefuldegradation.WedevelopanAutomatedapproachtotrans- formafault-intolerantprogramtobeafault-tolerantprogramwithgracefuldegradation. Ourapproachforadditionoffault-tolerancewithgraceful-degradationconsistsoftwosteps 9 (showninFigure1.6).Thestepfocusesontheautomatedadditionofgracefuldegra- dationtothefault-intolerantprogramintheabsenceoffaults.Inparticular,afterthe step,weobtainsuchaprogramthatpreservesallthebehaviorsfromtheinputprogramand gainsasetofgracefulbehaviors.Inthesecondstep,weadaptexistingalgorithmsforadding fault-tolerancetodesignfault-tolerantprogramwithgracefuldegradation. Wedemonstratethatourapproachisapplicableforrevisingprogramsindifferentdomains. Inparticular,wepresentacasestudytoobtainagracefuldegradingprintersystem.In addition,wepresentanothercasestudytoreviseadistributedprogram,namelyabyzantine agreementprogram. Weextendthetechniqueofadditionofgracefuldegradationtosupportadditionofmulti- gracefuldegradation.Inmulti-gracefuldegradation,aprogramprovidessuccessivelyre- ducedguaranteeoforiginalrequirementsinthepresenceofincreasinglyseverefaults.We illustratethistechniqueviaacasestudyonafault-tolerantsystem,i.e.,adangerousgas detectionandventilationsystem[53]. Figure1.6Addinggracefuldegradationframework 1.4OutlineoftheDissertation Thisdissertationconsistsofthreeparts. 10 Inthepart,weinformallydescribestwomajorissuesinaddingfault-toleranceandour contributions(Chapter1).Thenweformallypresentthepreliminaryconcepts(Chapter2). Inthesecondpart,weaddresstheissuesofrealizabilityconstraintsinaddingfault-tolerance. InChapter34and5,wepresentthetechniquesoflazyrepairinaddingfault-toleranceto synchronoussystems,cyber-physicalsystemsanddistributedsystems,respectively. Inthethirdpart,wepresentthetechniqueofgeneratingafault-tolerantprogramwithgrace- fuldegradation.,Chapter6aimstosolvetheproblemofgeneratingagraceful programthatsatweakerofthesystem.InChapter7,wepresentthetech- niqueofensuringthegeneratedgracefulprogramcanberealizedbyadistributedsystem. TheninChapter8,weintroducetheapproachofaddingfault-tolerancetothegenerated distributedgracefulprogram.Finally,inChapter9,wepresentthetechniqueofadding multi-gracefuldegradation. Inthefourth,weconcludeourcontributioninthisdissertation.Additionally,wediscuss futureresearchworkinChapter11andrelatedworkinChapter10. 11 Chapter2 PreliminaryConcepts Inthischapter,wegiveformaloffundamentalconceptsinthisdissertation. cally,wethenotionsofprograms,faults,fault-tolerance.Toillustratethese concepts,weuseaByzantineAgreementprogramasarunningexample. 2.1Program Inthissection,wepresentthenotionof programs .Intuitively,wedescribeaprogramintermsofa setofvariablesandprocesses.Wespecifythebehaviorofaprocessthroughastate-transition system.Eachprocessisassignedasubsetofthesevariablesthattheprocessisabletoreadorwrite. Let V p = f v 0 ;v 1 ; ;v m g ;m 0 ,beasetofvariablesofagivenprogram p .Each variable( v i ; 0 i m )isassociatedwithadomain D i .Forsuchaprogram,a valuation isa function x : V p ! D 1 D 2 D m thatassignsavaluetoeachvariable. 2.1 (statespace) . A statespace (denotedas S p )ofaprogram p is S p = D 1 D 2 D m . 2.2 (state) . A state (denotedas s )isanelementof S p . 2.3 (transition) . A transition (denotedas ( s;s 0 ) )isanelementof S p S p . Remark. Inthisdissertation,givenatransitionoftheform ( s;s 0 ) ,wecallstate s asanoriginating stateand s 0 asaresultingstate.Andwesaytransition ( s;s 0 ) isenabledatstate s . 12 2.4 (statepredicate) . A statepredicate (denotedas S )isasubsetof S p . Remark. Inthisdissertation,astatepredicatecanalsobeexpressedintermsofabooleanex- pressionover V p .Whenthecontextisclear,weusethesetwonotionsinterchangeablyinthis dissertation. 2.5 (transitionpredicate) . A transitionpredicate (denotedas )isasubsetof S p S p . 2.6 (closure) . Astatepredicate S is closed inatransitionpredicate iff 8 ( s;s 0 ) 2 ;s 2 S ) s 0 2 S 2.7 (process) . A process j isbythetuple h R j ;W j ; j i where R j V p isa setofvariables j isallowedtoread, W j V p isasetofvariables j isallowedtowrite, j isa transitionpredicateinthestatespaceof j . 2.8 (program) . A program p isatuple h V p ;R p i ,where V p isasetofvariablesand R p isasetofprocesses. Notation. Let p = h V p ;R p i beaprogram.Theset p denotesthecollectionoftransition predicatesofallprocessesof p ,i.e., p = S j 2 R p j . 2.9 (guardedcommands) . Tocompactlyrepresentasetoftransitionsofaprocess,we utilize guardedcommands (alsocalled actions )oftheform: guard ! statement 1 ; statement 2 ; ::: ; where guard isastatepredicateoverprogramvariablesandeach statement i isanassignment operationsoverthosevariables.Thisguardedcommandcompactlyrepresentsasetoftransitions. 13 Toillustrate,assumethat guard representsasetoforiginatingstates,say S .From S ,byexecuting each statement i sequentiallyinoneatomicstep,theprogramreachesasetofresultingstates,say S 0 . 2.10 (computation) . Asequenceofstates, ˙ = h s 0 ;s 1 ; isa computation ofpro- gram p iff 8 j; 0 0::( s j 1 ;s j ) 2 ( p [ f ) . if h s 0 ;s 1 ; i isandterminatesinstate s l thentheredoesnotexiststate s suchthat ( s l ;s ) 2 p . 9 n;n 0::( 8 j;j> 0::( s j 1 ;s j ) 2 p ) Notation. Forbrevity,weuse p [] f tomean` p inthepresenceof f '.More,asequence isacomputationof p [] f iffitisacomputationof` p inthepresenceof f '.And,thetransitionsof p [] f areobtainedbytakingtheunionofthetransitionsin p andthetransitionsof f . 2.19 (fault-span) . Astatepredicate T isan f -spanofaprogram p fromstatepredicate S iff S ) T and T isclosedin p [] f . Thus,ateachstatewhereaninvariant S of p istrue,an f -span T of p from S isalsotrue.Also, like S , T isclosedin p .Moreover,ifanytransitionin f isexecutedinastatewhere T istrue, theresultingstateisalsoonewhere T istrue.Itfollowsthatforallcomputationsof p thatstartat stateswhere S aretrue, T isaboundaryinthestatespaceof p uptowhich(butnotbeyondwhich) thestateof p maybeperturbedbytheoccurrenceof f . Intheabsenceoffaults,aprogramshouldsatisfyitsInthepresenceoffaults, itmayormaynotguaranteetosatisfyallofitswhichisdeterminedbythelevel oftoleranceprovided.Withthisnotion,wethreelevelsoffault-tolerance,i.e.,fail-safe fault-tolerance,nonmaskingfault-toleranceandmaskingfault-tolerance.Intuitively,inthepres- enceoffaultsafail-safefault-tolerantprogramguaranteestosatisfyitsbutitmay notrecovertoasetoflegitimatestates.Inthepresenceoffaults,anonmaskingfault-tolerant programmaynotguaranteetosatisfyitsbutitguaranteestorecovertoasetofle- gitimatestates.Inthepresenceoffaults,amaskingfault-tolerantprogramguaranteestosatisfyits andtorecovertoasetoflegitimatestates. 19 2.20 (failsafe f -tolerantfor SPEC from S ) . p isfailsafe f -tolerantto SPEC from S iff, p SPEC in S ;and thereexists T suchthat T isan f -spanof p from S ,and p [] f maintains SPEC from T . 2.21 (nonmasking f -tolerantfor SPEC from S ) . p isnonmasking f -tolerantto SPEC from S iff, p SPEC in S ;and thereexists T suchthat T isan f -spanof p from S ,andeverycomputationof p [] f thatstarts fromastatein T hasastatein S . 2.22 (masking f -tolerantfor SPEC from S ) . p ismasking f -tolerantto SPEC from S iff, p SPEC in S ;and thereexists T suchthat T isan f -spanof p from S , p [] f maintains SPEC from T ,andevery computationof p [] f thatstartsfromastatein T hasastatein S . 2.3.1Examplecont'd:Fault-tolerantByzantineAgreement Nowweconsiderasetoffaultsthatperturbthebyzantineagreementprogram,wherethegeneral ischangedtobebyzantine,anditcanchangeitsdecisionvaluearbitrarily.Thesefaultsareshown asfollows. 20 f 1 : b:g = false ! b:g := true f 2 : b:g = true ^ d:g =0 ! d:g :=1 f 3 : b:g = true ^ d:g =1 ! d:g :=0 Figure2.2Faultsofbyzantineagreementprogram. Assumethesefaultshappenonthefault-intolerantprogramshowninFigure2.1.Inthiscase, theprogrammayreachastatewheretwonon-generalprocessesholdandwithdifferent decisions.Notethattheprograminthisstateviolate Sf ag .Inotherwords,thebyzantineagreement programwepresentinSection2.1.1isnotmaskingfault-tolerantto f 1 , f 2 and f 3 . Onepossiblemaskingfault-tolerantbyzantineagreementprogramcanbeimplementedviade- cisionwiththemajority[52].Wepresentthismaskingfault-tolerantbyzantineagreementprogram inFigure2.3.Weonlyprovideactionsforprocess j .Notethatactionsforprocesses k and l are identicaltothoseinprocess j . d:j = ?^ f:j = false ! d:j := d:g d:j =0 ^ ( d:k =0 _ d:l =0) ^ f:j = false ! f:j := true d:j =1 ^ ( d:k =1 _ d:l =1) ^ f:j = false ! f:j := true ( d:k =0 _ d:l =0) ^ f:j = false ! d:j :=0 ;f:j := true ( d:k =1 _ d:l =1) ^ f:j = false ! d:j :=1 ;f:j := true d:j =0 ^ d:k =1 ^ d:l =1 ! d:j :=1 d:j =1 ^ d:k =0 ^ d:l =0 ! d:j :=0 Figure2.3Maskingfault-tolerantbyzantineagreementprogram. ConsideringthefaultsinFigure2.2,wenowpresentafail-safefault-tolerantbyzantineagree- mentprogram.Inthisprogram,everynon-generalprocesskeepscopyingitsthedecisionfrom thegeneralprocess.Thenanon-generalprocessitsdecisiononceitthatallother processesholdthesamedecision. 21 ConsideringthefaultsinFigure2.2,wenowpresentanon-maskingfault-tolerantbyzantine agreementprogram.Inthisprogram,allnon-generalprocessescopythedecisionfromgeneraland theirdecisionsimmediately.Itthereexistsaprocessthatitdisagreeswithotherdecisions inmajority,thisprocessshouldchangeitsvaluetobesameasthemajority. 22 Chapter3 LazyRepairforSynchronousSystems Inthischapter,wepresentourcontributionsoflazyrepairtosynchronoussystems.Thischap- tercanbedividedintothreeparts.Inthepart(Section3.1toSection3.5),weintroduce thenotionsofinterleavingsemantics,synchronoussemanticsandtherealizabilityconstraintsin synchronoussemantics.Inthesecondpart,wepresenttheproblemofaddingfault-tolerancetoa synchronousprograminSection3.7.Andweproposeouralgorithmforaddingfault-tolerancein Section3.8.Finally,inthethirdpart,weillustratethetechniqueviatwocasestudys,namelya groupmembershipprotocol(Section3.9)andacachecoherenceprotocol(Section3.10). 3.1Background Onecharacteristicofembeddedsystemsistheneedtotreatphysicalrealtimeasorderquan- tity;i.e,itisnecessarythatthereisonlyonesynchronizedphysicaltimethatisavailabletoall componentsandprocesses.Subsequently,allcomponents/processesexecutetheiractionswhen thephysicaltimetickarrives(e.g.,inthetime-triggeredarchitecture(TTA)[45]orinEsterel[12] withtheprimitivestatement pause ).Usuallyapplicationsoftwaresbuiltupontime-triggeredar- chitectureisdesignedagainstamoreabstractmodel,namelyround-basedexecution.Inparticular, theprogramexecutionisdividedintoseveralrounds,agroupofprocessesgetchancestoaccom- palishactionsinoneround.Thissynchronousbehaviorisdifferentfromasynchronousbehavior previouslydiscussedindistributedsystems,whereeachprocessisallowedtoperforminarbitrary 23 speed.Theproblemofmodelrepairhasbeenstudiedpreviouslyinthecontextofinterleavingse- mantics[21,35,22,43],e.g.,indistributedsystems,wearguethatthecorrespondingsolutionsare notapplicableforseveralproblemsencounteredinembeddedsystems.,ininterleaving semantics,onlyoneofthecomponentsexecutesinagivenstep.Onthecontrary,inmanycom- monlyconsidereddistributedembeddedsystems,severalcomponentscanexecutesynchronously. Nextwediscusssuchdifferencebetweenasynchronousbehavior(i.e.,interleavingsemantics)and synchronousbehavior(i.e.,synchronoussemantics)indetail. 3.2InterleavingSemantics Interleavingsemantics [59]meansthatintheexecutionoftheprogram,ineverystep,onecom- ponent/processisnon-deterministicallychosenforexecution.Thenotionofinterleavingsemantics isinspiredbyprocessesthatcommunicateviaasynchronousmessagepassing.Ithasalsobeenex- tendedtosharedmemoryprogramswhereitisusedasanabstractiontosimplifyprogramdesign anditsproof.Insuchsharedmemoryprograms,itisanticipatedthatsomesynchronizationwould beaddedduringimplementationtoensurethatactionexecutionisserialized. 3.3SynchronousSemantics Synchronoussemantics (alsoconsideredmaximumparallelismsemantics,e.g.[37])considered inthisdissertationismotivatedbytimedexecution,whereallprocessesfollowaround-based execution.Intuitively,oneroundinthissystemcanbeviewedastwophases,areadphaseanda updatephase.Inthereadphase,eachprocessobtainsthestatesofotherprocessesitinteractswith. Then,intheupdatephase,eachprocessutilizesthisinformationtoupdateitsstate.Synchronous 24 semanticsisthususefulinpermittingaprocesstoexecuteindependently(andcoordinatingonlyon a clock-tick ).Italsocapturestheconcurrentexecutionmoreprecisely.Moreover,sinceitabstracts modelingofexplicittime,itisinreducingthecostofvandrepair.Also, synchronoussemanticsisclosertotimedexecution. However,inthecontextof synchronoussemantics ,theopportunitiesandchallengesofthe problemhasnotbeenaddressed.Repairingasynchronousmodelcreatesanewchallengeinthat removalofaprocesstransitionmayintroduceanewunexpectedbehaviortotheprogram.Thisis duetothefactthatifsomeprocess,say j ,doesnotexecuteitstransition,thenthesystemexecution includesanewglobaltransition,whereallprocessesexcept j execute.And,suchatransitionisnot partoftheoriginalprogram.Itfollowsthatsynchronoussemanticsintroducesrealizabilitycon- straints.Theseconstraintsdonotapplyininterleavingsemanticsasremovalofoneprocesstransi- tiononlyremovesthecorrespondingbehaviorfromtheoverallsystembehavior.Thisscenarioalso doesnotoccurinsystems(e.g.,[22],[36]wherecomponentsutilizeexplicitcoordination.Next, wepresentourofsynchronoussemanticsandinterleavingsemantics. 3.4PreliminaryConceptsforSynchronousPrograms Recallthatbytion2.8,thetransitionpredicate p ofaprogram p = h V p ;R p i isobtainedby takingunionofitsprocesses'transitions,i.e., p = [ j 2 R p j .Inotherword,inoneprogramtran- sitionof p ,onlyoneprocessisallowedtoexecute.However,fromtheabovediscussion,iffollow thiswearenotabletomodelsynchronousprograms,sincesynchronoussemanticper- mitsallprocessestoexecuteinoneatomicstep.Inthissection,weextendtheofprogram transitionssothatitfollowssynchronoussemantics.Additionally,forthesakeofcomparison,we alsoprovidethesameinthecontextofinterleavingsemantics. 25 3.1. Let p = h V p ;R p i beasynchronousprogram.A process j in R p isasa tuple h W j ; j i ,where W j V p ,and j S p S p . In3.1, W j denotesthevariablesthatprocess j isallowedtowrite.Inotherwords, j thefollowingcondition: ( s 0 ;s 1 ) 2 j )8 x : x 62 W j : x ( s 0 )= x ( s 1 ) Givenanytwoprocesses j and k ,werequirethat W j and W k aredisjoint.Inaddition,forall processestheirwritablevariablesareexhaustive;i.e., S l 2 R p W l = V p . Intuitively,in synchronoussemantics ,ineverystep,eachprocessmustexecuteoneofitstran- sitions(unlessthereisnotransitionforthatprocessinthegivenstate). 3.2 (synchronoussemantics) . Let p = h V p ;R p i beaprogram, j = h W j ; j i bea processin R p ,and s 0 and s 1 betwostatesin S p .Then, ( s 0 ;s 1 ) isatransitionof p in synchronous semantics iffthefollowingconditionis 8 j 2 R p ( ( 9 s j 2 S p :( s 0 ;s j ) 2 j ^8 x 2 W j : x ( s 1 )= x ( s j )) _ ( 8 s j 2 S p :( s 0 ;s j ) = 2 j ^8 x 2 W j : x ( s 0 )= x ( s 1 ))) Notations. Weuse p todenotethetransitionsofaprogram p .Fortheconvenienceofwriting,we use ( s 0 ;s 1 )= 1 j 1 k todenoteaprogramtransition,wheretransition 1 j isenabledinprocess j ,transition 1 k isenabledinprocess k ,andsoon.Ifsomeprocess,say l ,isnotinthis list,thenthestateof l remainsunchangedintransition ( s 0 ;s 1 ) .Thisispossibleonlyif l doesnot includeanytransitionthatoriginatesinstate s 0 ;i.e., 8 s 1 2 S p :( s 0 ;s 1 ) 62 l .Inaddition,we 26 introducethenotation j k ,where j k = f i 1 j i 2 k j i 1 j 2 j ^ i 2 k 2 k ^g todenoteasetofprogramtransitionsinwhicheachprocesschoosetoexecuteoneofitstransitions. Tofurtherillustrate3.2,weconsideraprogram p jk whichconsistsoftwoprocesses j and k ,twovariables v j 2 W j and v k 2 W k .Thedomainof v j and v k are f 0 ; 1 g .Thus,thestate spaceofprogram p jk is S p = f 00 ; 01 ; 10 ; 11 g ,wherethedigitsdenotethevaluesof v j and v k , respectively.Eachprocessincludestransitionsthatcheckifthevalueof v j and v k aredifferent.If theyaredifferent,thentheprocesstriestomakethemthesamebytogglingthevalueofitsvariable. Thus,thesetoftransitionsofprocesses j and k are j and k respectively,where j = f 1 j =(01 ; 11) ; 2 j =(10 ; 00) g and k = f 1 k =(01 ; 00) ; 2 k =(10 ; 11) g . Figure3.1Stateupdateinasynchronousprogram Undersynchronoussemantics,aprogramtransitionin p jk isexecutedasaresultof j and k updatingtheirvariablessimultaneouslyinoneatomicstep,asitisshowninFigure3.1.Thus, thesetofprogramtransitionsin p jk undersynchronoussemanticsare (01 ; 10) , (10 ; 01) , (00 ; 00) , (11 ; 11) . Notethatevenwithnewofsynchronousprogramtransitionsgiven,thosetions (including2.10-2.14)inChapter2requiresnochange.Alsonotethatinthe 27 abovewehaveallowedthepossibilityofnon-determinisminprocesstransitions.Thisis duetothefactthatweviewourrepairalgorithmstoidentifynon-deterministicimplementationof theprocessesintherepairedprogram.Thisnon-determinismwillprovideadesignerwithseveral choicestoimplementtherepairedprogram.Furthermore,iftheprocessesthemselvesaremade deterministic(i.e.,inanyglobalstate,eachprocesscanexecuteatmostonetransition),thenthe samepropertyholdsforprogramtransitionsinsynchronoussemantics. Wenowrewritetheforprogramtransitionsininterleavingsemanticsinition 3.3.Notethatitintroducesnodifferenceto2.8,where p = [ j 2 R p j . 3.3 (interleavingsemantics) . Let p = h V p ;R p i beaprogram, j = h W j ; j i beaprocess in R p ,and s 0 and s 1 betwostatesin S p .Then, ( s 0 ;s 1 ) isatransitionof p in interleaving semantics iffthefollowingconditionis (( s 0 ;s 1 ) 2 p ) _ (( s 0 = s 1 ) ^8 s 1 2 S p ::( s 0 ;s 1 ) = 2 p ) Remark. Notethateveniftheprocessesthemselvesaredeterministic,programtransitionsmay notbedeterministicininterleavingsemantics. Underinterleavingsemantics,thecorrespondingtransitionsof p jk are (01 ; 00) , (01 ; 11) , (10 ; 00) , (10 ; 11) , (00 ; 00) ,and (11 ; 11) . 3.5RealizabilityConstraintsinSynchronousProgram Inthissection,wepresentthenotionofrealizabilityconstraintsinsynchronousprograms.To illustrate,weconsiderfourdifferentprograms(notnecessarilysynchronousprogram)whichare showninFigure3.2,3.3,3.4and3.5.Wediscusswhethertheycanberealizedundersynchronous 28 Figure3.2A synchronous program. Figure3.3A synchronous program. Figure3.4A synchronous program. Figure3.5Not asynchronous program. semantics.Supposealltheseprogramsincludestwoprocesses j and k ,andtwovariables v j and v k .Both v j and v k arewithdomain f 0 ; 1 ; 2 g and v j 2 W j , v k 2 W k .Liketheexamplepresented inSection3.4,astateintheseprogramisrepresentedinatwo-digitform,wherethedigit representsthevalueof v j andsecondrepresentsthevalueof v k TheprograminFigure3.2isavalidsynchronousprogram,sincefollowing3.2this programcanberealizedundersynchronoussemanticswithprocessestransitionsshownbelow, j = f (00 ; 10) g k = f (00 ; 01) g (Figure3.2) Likewise,programsinFigure3.3and3.4canalsoberealizedundersynchronoussemantics, andtheirprocessestransitionsareshownasfollows, j = f (00 ; 10) g k = f (00 ; 01) ; (00 ; 02) g (Figure3.3) j = f (00 ; 10) ; (00 ; 20) g k = f (00 ; 01) ; (00 ; 02) g (Figure3.4) However,undersynchronoussemantics,processtransitionsfor j and k cannotbe fortheprogaminFigure3.5.SincethesetoftransitionsshowninFigure3.5impliesthatpro- cess j shouldincludetransitions f (00 ; 10) ; (00 ; 20) g ,andprocess k shouldincludetransitions f (00 ; 01) ; (00 ; 02) g .Itfollowsthataccordingto3.2,theprogramshouldincludefour 29 transitionswhicharesameasthoseshowninFigure3.4.Thus,theprogramshowninFigure3.5is notavalidsynchronousprogam. 3.6ModelingFaultsforAddingFault-tolerancetoSynchronous Programs Inthischapter,weaimtosolvetheproblemofaddingfault-tolerancetosynchronousprograms. However,duetothesynchronousnature,itintroducesnewchallengesofunderstandingtheeffects offaultsinsuchprograms.Henceinthissection,wediscussthevarioustypesoffaultsandhowto modeltheirexecutioninthecontextofsynchronoussemantics. Intheinterleavingsemantics,asetoffaults f foraprogram p aremodeledastransitions;i.e., f S p S p .Inotherwords,acomputationofaprograminthepresenceoffaultsisobtained byconsideringinterleavingsofprogramandfaulttransitions(see2.18).Inthecaseof programsundersynchronoussemantics,faultscanbecausedduetodifferentreasons.Forexample, considerthefollowingcases: Onecommontypeoffaultisaprocessfailure.Asanillustration,ifaprogramconsistsof n processes,thenineachprogramtransition,all n processesaresupposedtoexecute(unless somehavenovalidtransitioninthecurrentstate).However,ifoneofthemfailstoexecute, thentheresultingtransitionisonewhere n 1 processeschangetheirstate.Thetransition obtainedthuscannowbeviewedasafaulttransition.Inotherwords,eventhoughthe transitionisobtainedbyexecutionofasubsetofprocesses,thistransitionwouldbemodeled asafaulttransition. Moreover,afaulttransitioncouldexecuteconcurrentlywithoriginalprocesses.Inother 30 words,inonestep,allprocessesexecuteandconcurrently,thefaultalsoupdatessomepro- gramvariables.Inthiscase,thetransitionobtainedbyconsideringexecutionofallprocesses andthefaultwillbeconsideredasafaulttransition. Alternatively,executionoffaultscouldbeindependentofprogramexecution;i.e.,thefault occurs between twoprogramtransitions. Aswecanobservefromthesescenarios,eventhoughtheexecutionoftheprogramitselfoccurs insynchronoussemantics,aninteractionbetweenaprogramandfaultscanbeviewedintermsof interleavingsemantics.Inotherwords,faultsdonothavetosynchronizewithprogramactions. Moreover,faultscanpotentiallymodifyallprogramvariables.Furthermore,sincetheinteraction betweentheprogramandfaultsisinterleavinginnature,wecanutilizeexistingapproachesinthe context,whereprocessesintheprogramexecuteininterleavingsemantics. ,forthealternative,supposethatafaultperturbsthestateofaprogramtoa state s 0 ,fromwhichstate s 1 isreachedbyprogramtransition ( s 0 ;s 1 ) .Letthesetofvariables updatedbytransition ( s 0 ;s 1 ) be ! ( s 0 ;s 1 ) .Thisfault,thus,canbemodeledinaway,suchthat onlypartofvariablesin ! ( s 0 ;s 1 ) areupdated.Likewise,forthesecondalternative,thefaultcan bemodeled,sothatpartofthevariablesin ! ( s 0 ;s 1 ) areupdatedsameastransition ( s 0 ;s 1 ) while theremainingvariablesareupdatedwithnewvalues.Forthethirdalternative,thefaultcanupdate anyvariableswhiletheprogramtakesnotransitions. 31 3.7ProblemStatementofAddingFault-tolerancetoSynchronous Programs Inthissection,weformallytherepairproblemofaddingfault-tolerancetosynchronous programs.Givenafault-intolerantprogram,say p ,wewanttoobtainarepairedfault-tolerant program,say p 0 .Inthisdissertation,wealwaysassumethat p itsinthe absenceoffaults.Thus,arepairalgorithmshouldnotintroduceanynewbehaviorstotheprogram intheabsenceoffaults.Hence,werequirethattheinvariant I 0 therepairedfault-tolerantprogram isasubsetofthesetofinvariantoftheoriginalintolerantprogram.And,intheabsenceoffaults,a computationof p 0 thatstartsfromastatein I 0 shouldexistinthesetofcomputationsof p thatstart from I .Clearly,arepairalgorithmshouldalsoensurethattheprogramismaskingfault-tolerant. ProblemStatement: Givenaprogram p withinvariant I , SPEC , andaset f offaults,suchthat p SPEC from I ; Doesthereexistaprogram p 0 withinvariant I 0 ,suchthat: C1 : ( I 0 I ) ^ ( p 0 j I 0 p j I 0 ) C2 : p 0 ismaskingfault-tolerantto f for SPEC from I 0 . 3.8AlgorithmofAddingFault-tolerancetoSynchronousPro- gram Inthissection,weintroduceanalgorithmthatsolvestheproblemstatedinSection3.7;i.e., thealgorithmfollowslazyrepairbyaddingmaskingfault-toleranceundersynchronousseman- tics.Ouralgorithmforaddingmaskingfault-tolerance(seeAlgorithm1),consistsofthreekey 32 Algorithm1 AddingMaskingFault-tolerancetoaSynchronousProgram Input: Aprogram h V p ;R p i ,where j 2 R p is h W j ; j i ,legitimatestates I ,faults f ,safetyspeci- Sf . Output: Amaskingfault-tolerantprogramwithsetofprocessestransitions 0 j 0 k as p 0 1: p 0 ;I 0 ;T 0 := AddSafeRecovery ( p ;f;I;Sf ) //Algorithm2 2: p 0 := BreakCycles ( I 0 ;T 0 ; 0 p ) //Algorithm3 3: return ConstructSyncProgram ( T 0 ; p 0 ) //Algorithm4 steps: AddSafeRecovery (Line1), BreakCycles (Line2),and ConstructSyncProgram (Line3. Algorithm2 Addsaferecovery Function: AddSafeRecovery ( p ;f :transitions, S :statepredicate, Sf :safety 1: ms := smaloint ( X = X [f s 0 j9 s 1 :( s 0 ;s 1 ) 2 f ^ ( s 1 2 X _ ( s 0 ;s 1 ) 2 Sf ) g ) 2: mt := f ( s 0 ;s 1 ) j ( s 1 2 ms ) _ (( s 0 ;s 1 ) violates spec ) g 3: S 1 ;T 1 := S ms;true ms 4: repeat 5: T 2 ;S 2 := T 1 ;S 1 6: p 1 := p j S 1 [f ( s 0 ;s 1 ) j s 0 2 ( T 1 S 1 ) ^ s 1 2 T 1 g mt 7: T 1 := T 1 f s j S 1 isnotreachablefrom s in p 1 g 8: T 1 := laroint ( X =( X \ T 1 ) f s 0 j9 s 1 :( s 0 ;s 1 ) 2 f ^ s 1 = 2 X g ) 9: S 1 := S 1 smaloint ( X = X [f s 0 j8 s 1 2 S 1 \ T 1 :( s 0 ;s 1 ) = 2 p 1 g ) 10: if ( S 1 = fg_ T 1 = fg ) then 11: declarenosaferecoveryexists andexit 12: endif 13: until ( T 1 = T 2 ^ S 1 = S 2 ) 14: return p 1 , S 1 , T 1 Step1.Thestepaddsrecoverywhileensuringsafetyproperties.Thefunction AddSafeRecovery (whichisimplementedinAlgorithm2)ignoresthestructureofthesynchronous programandonlytreatsitasasetoftransitions.Thisfunctionreusestheapproachin Add masking ft [46] underinterleavingsemantics.Thefunctioncomputestheset ms ofstatesfromwhereoccur- renceoffaultsonlycanviolatethesafety(Line1),and,theset mt oftransitionsthat reach ms (Line2).Then,inLine6,itcomputestheset p 1 ofprogramtransitionsincludingnew recoverytransitionsthatdoesnotstartin S 1 .Usingthesetransitions,thenewfault-span T 1 (to 33 ensureclosureinLine8)andthesetoflegitimatestates S 1 (toensuredeadlockfreedominLine9) aresubsequentlyrecomputed.Whentherepeat-untilloopterminates,thefunctionensuresthat fromanystatereachedinthepresenceoffaults,itispossibletorecovertothelegitimatestates withoutviolatingsafety. Algorithm3 Breakcycles Function: BreakCycles ( S;T :statepredicate, p :transitions) // Let rank ( s;S;p ) bethelengthoftheshortestcomputationoftransitions p thatstarts from s andendsinastatein S 1: p 0 := f ( s 0 ;s 1 ) 2 p j ( s 0 2 S ) _ (( s 0 2 T ) ^ ( rank ( s 0 ;S;p ) > rank ( s 1 ;S;p ))) g 2: return p 0 Step2.Although AddSafeRecovery ensuresthatfromeverystatereachedinthepresence offaults,saferecoveryispossible,theresultingprogrammayincludecyclesoutsidethesetof legitimatestates.Hence,weusefunction BreakCycles (whichisimplementedinAlgorithm3) tobreaksuchcycles,sothattheprogramwilleventuallyrecovertoitslegitimatestates.Tothis end,weassigna rank toeachstateinthefault-span,wheretherankisthelengthoftheshortest pathfromthatstatetoastateinthesetoflegitimatestates.Afterweobtaintheranks,weonly includetransitionsofwhichtherankoftheoriginatingstateisgreaterthantherankofthetarget state(Line1). Step3.Finally,function ConstructSyncProgram ensuresthattheresultingprogramcanbe realizedundersynchronoussemantics.BeforeexplainingthedetailsinAlgorithm4,weusean exampletoillustratehowrealizabilityconstraintsintroducedinSection3.5affecttherepairfor synchronousprograms. Consideraprogram,say p jk ,withtwoprocesses,say j and k .Let s beastateofprogram p jk .Furthermore,let j (respectively, k )havetwotransitionsfrom s , 1 j and 2 j (respectively, 1 k and 2 k ).Thisimpliesthat p hasfourpossibletransitionsfromstate s ,namely, t 1 = 1 j 1 k , 34 Algorithm4 ConstructingaSynchronousProgram Function: ConstructSyncProgram ( T :statepredicate, p :transitions) 1: forall j 2 R p do 2: 0 j = S ( s 0 ;s 1 ) 2 p f ( s 0 ;s 2 ) j8 x 2 W j : x ( s 2 )= x ( s 1 ) ^8 x 62 W j : x ( s 2 )= x ( s 0 )) g 3: endfor 4: forall s 2 T do 5: trans := f ( s ;s 1 ) j ( s ;s 1 ) 2 p g 6: forall j 2 R p do 7: s j := S ( s ;s 1 ) 2 trans f ( s ;s 2 ) j8 x 2 W j : x ( s 2 )= x ( s 1 ) ^8 x= 2 W j : x ( s 2 )= x ( s ) g 8: endfor 9: A s := s 1 s 2 10: while A s * trans do 11: Let t :=( s ; r ) where t 2 ( A s trans ) 12: Let k beaprocessst. j s k j isthelargest 13: f t g k := f ( s ;s 1 ) j8 x 2 W k : x ( r )= x ( s 1 ) ^8 x= 2 W k : x ( s 1 )= x ( s ) g 14: 0 k := 0 k f t g k 15: recompute s k and A s sameasLines6-9. 16: endwhile 17: endfor 18: return 0 j 0 k as p 0 t 2 = 2 j 2 k , t 3 = 1 j 2 k and t 4 = 2 j 1 k .Supposethatwhilerepairingthisprogram,someofthese transitionsmayberemoved,e.g.,toensurethatsafetyisnotviolated.Forexample,considerthe scenariowheretransition t 4 isremoved.Whileaprograminvolvingtransitions t 1 , t 2 ,and t 3 isa valid single-process program,itcannotberealizedasamulti-processprogramundersynchronous semantics.Thisisbecausetorealizesuchaprogram, j (respectively, k )mustincludetransitions 1 j and 2 j (respectively, 1 k and 2 k ).However,if j and k includesuchtransitions,thentherepaired programwouldalsocontaintransition t 4 whichisnotacceptable. Asimilarproblemalsoariseswhileaddingtransitionstotheprogram.,consider thescenariowheretransitions t 1 , t 2 ,and t 3 areadded,but t 4 isnotadded,asitcausestheprogram toloopoutsidethesetoflegitimatestates.Clearly,insuchacase,therepairedprogramisnot realizableundersynchronoussemantics. Hence,inthisexample,ifweneedtoremovetransition t 4 = 2 j 1 k ,wehavetoupdatethe 35 progamasfollows:Wechangetransitionsofprocess j (or k )toensurethattransition t 4 cannotbe atransitionoftherepairedprogram.Thiscanbeachievedbyeitherremoving 2 j or 1 k .Here,we removethetransitionfromaprocessthathasthelargest(withatie-break)transitionsfromstate s .Wenotethatwhileremovingsuchtransitions,therewillbeatleastonetransitionthatremains fromstate s .Thisisduetothefactthatifthereisonlyonetransitionfrom s ,thenitgivesonlyone possibletransitionforeveryprocess.And,thisisrealizableundersynchronoussemantics. ,inAlgorithm4,itcomputesthesetoftransitionsofeachprocess(Line2). Then,fromeachstate s ,itensuresthatthesetofoutgoingtransitionsisrealizableinsynchronous semantics.Tothisend,wecomputetheset s j oftransitionsforeachprocess j thatoriginatefrom state s (Line7).Next,thewhile-loop(Lines6-11)considerstransitions t thatoriginatein s and areremovedintheprevioussteps(Line11).Let k betheprocesswithlargestnumberofoutgoing transitionsfromstate s (Line12).Finally,weremovetransitionsof k thatcontributein t (Lines13 and14). Theorem1. Algorithm1issoundandcomplete.And,itstimecomplexityispolynomialinthestate spaceoftheinputprogram. Algorithm1issound. Œ Proofof C1 :Ittriviallyholdsbecause AddSafeRecovery , BreakCycles and ConstructSyncProgram donotaddnew statesandtransitionsinsideinvariant I . Œ Proofof C2 :In AddSafeRecovery ,fromLine1to3,safetyisensured.Whenthe loop(Lines4to13)terminates,weknowthatnostateisremovedfromeither T 1 or S 1 inthelastiteration.Weshowthatthesynthesizedprogramismaskingfault-tolerant withinvariant S 1 andthefault-spanusedtoshowthatitisfault-tolerantis T 1 . 36 Whentheloop(Lines4to13)terminates,fromLine7,fromeverystatein T 1 ,thereis apathtosomestatesin S 1 usingtransitionsof p 1 .FromLine8, T 1 isclosedinfault f .And,fromLine9, S 1 isasubsetof T 1 .Hence,fromeverystatein T 1 thereisa computationof p 1 thatreachesastatein S 1 . Subsequently, BreakCycles breakscyclesin T 1 toensurethatstartingfromastatein T 1 , p 1 doesnotencountercycleswithoutreachingastatein S 1 .Thisensuresthatafter thisfunction,startingfromanystatein T 1 ,anycomputationof p 1 reachesastatein S 1 . Now, ConstructSyncProgram utilizestheoutputof BreakCycles toobtainthepro- gramundersynchronoussemantics.Observethat ConstructSyncProgram creates nodeadlockstate,becausetheloop(Line6toLine11)ensuresthereisatleastone transitionremainsfromanystatein T 1 ,otherwise trans (fromLine5)willbeempty. And ConstructSyncProgram createsnocyclessince,itsinputdoesnotcontainany anditonlyremovestransitionsfrom p .Thus,thesynthesizedprogramall constraintsofmaskingfault-tolerance(from2.22) Algorithm1iscomplete. WeshowthatifAlgorithm1fails,thenthecorrespondingfault-tolerantprogramdoesnot exist. Œ Proof. First,notethatonly AddSafeRecovery candeclarefailure.Ourargumentfor completenessisbasedontheobservationthatastatein S 1 (or T 1 )isremovedonlyifit hastoberemovedforobtainingamaskingfault-tolerantprogram.,onLine 1,statesin ms mustberemovedtoensurethatsafetyisnotviolatedinthepresenceof faults.And,onLine7,statesremovedfrom T 1 arestatesfromwhererecoveryinthe presenceoffaultsisimpossible.Likewise,statesremovedonLine8mustberemoved 37 toensure T 1 beingclosed f .Finally,statesremovedonLine9mustalsoberemoved toensurethat S 1 isasubsetof T 1 . TimecomplexityofAlgorithm1ispolynomialinthestatespaceoftheinputprogram. Œ Proof. Inboth AddSafeRecovery and ConstructSyncProgram alltheloopsiterate polynomialtimesinthestatespaceoftheinputprogram.Hence,thisresultfollows. 3.8.1AddingNonmaskingandFailsafeFault-tolerance Recallthatthedifferencebetweennonmaskingandmaskingfault-toleranceisthattheformerdoes notrequiresafetytobepreservedduringrecovery.ThiscanbeachievedbyusingAlgorithm1if wesetthesafetytobe ˚ .NotethatsinceAlgorithm1doesnotaddnewbehaviorsto theinputprogramintheabsenceoffaults,therepairedprogramwillcontinuetosatisfytherevised Recallthatthedifferencebetweenafail-safefault-tolerantprogramandamaskingfault-tolerant programisthattheformerdoesnotrequiretheprogramtorecovertoitslegitimatestates.Hence, toaddfailsafefault-tolerance,wesimplyneedtoremoveLine2thatinvokes BreakCycles and Line6from AddSafeRecovery thatensuresfeasibilityofrecovery. 3.8.2ExpedientRecoveryinthePresenceofFaults Oneofthepotentialadvantagesofsynchronoussemanticsisfasterrecovery(intermsofsteps) tothelegitimatestates.Thisisduetothefactthattheprocessesarecollaboratingbyexecuting ineachstep.Speci,wecanconsidertheproblem,wherethenumberofstepsinrecovery (i.e.,numberofstepsafterfaultsstop)needstobeboundedby MAX .Algorithm5ensuresthis property.,weinvoke AddSafeRecovery .Then,weidentifytherankofeachstate 38 Algorithm5 AddingFastRecovery 1: 0 p ;I 0 ;T 0 := AddSafeRecovery ( p ;f;I;Sf ) 2: while 9 s 2 T 0 ;rank ( s ) > MAX do 3: Sf := Sf [f ( ;s ) j s 2 T 0 ^ rank ( s ) > MAX g 4: 0 p ;I 0 ;T 0 := AddSafeRecovery ( p ;f;I;Sf ) 5: endwhile 6: 0 p := BreakCycles ( I 0 ;T 0 ; 0 p ) 7: S j 2 R p 0 j := ConstructSyncProgram ( T 0 ; p 0 ) inthefault-span.Ifthisrankexceeds MAX ,werevisethesafetytorequirethat suchstatescannotbereachedinthepresenceoffaults.Thealgorithmisthenrepeatedtoadd tolerancewiththisnewTheprocessisrepeateduntilnostatewithrankgreaterthan MAX exists.Atthispoint,wecanutilize BreakCycles and ConstructSyncProgram .Thus,the algorithmforprovidingexpedientrecoveryisasshowninAlgorithm5. 3.9CaseStudy1:GroupMembershipProtocol Inthecasestudy,weconsideraprotocoltoreachagreementon ProcessorGroupMembership [29].Theprotocolaimstoprovideeachprocessorwithaconsistentmembershiplistofprocessors initsprocessor-group.Inaddition,theprotocoltoleratescertainfaults(e.g.,communicationdelays, processorcrashes)inasynchronousdistributedsystem. Ourpresentationofthiscasestudyisorganizedasfollows.First,wepresentthefault-intolerant protocolastheinputforAlgorithm1.Then,weidentifythefaultsthatitissubjectto.Subsequently, weidentifythethathastobewhileaddingfault-tolerance.Finally,we describehowthefault-tolerantprotocolisdesignedusingAlgorithm1. 39 3.9.1Fault-intolerantGroupMembershipProtocol Aprocessorjoinsanexistinggroupinthreerounds.Intheround,theprocessorthatwants tojoinbroadcastsa newgroup message.Duetosynchronousnatureofthesystem,intheabsence offaults,thismessageisreceivedbyothersinthenextstep.Notethatinthesynchronoussys- temwhereeveryprocessorhasasynchronizedclock,messageisexpectedtoreachthedestination withincertaintime.Likewise,theprocessorcanwaitforanarrivalofmessagewithinabounded time.Otherwisetheprocessorregardsthemessagebeingdelayed(orlost).Thusprocessorper- formsactionsinthenextroundbasedonitsclockandthemessagesitreceived.Inthesecond round,uponreceivingthe newgroup message,allprocessorsinthegroupbroadcasta present mes- sage(toshowthewillingnessofforminganewgroupwiththenewprocessor).Finally,inthelast round,allprocessorsinthegrouprecomputetheirmembershiplistinawaythatincludeallthepro- cessorswhichbroadcast present messageandtheonebroadcasting newgroup message.Likewise, atlastround,theprocessorbroadcasting newgroup setits member tothosebroadcasting present anditself. Wedescribetheprotocolbasedonaninstancecomposedoffourprocesses,namely i , j , k ,and l .Forbrevityofthemodel,insteadofmodelingexplicitmessagepassing,weusesomeglobally readablevariablestodenoteabroadcastmessage.Thus,thevariablesofprocess j intheprogram are 1. joined :j : f true ; false g ,thatdenoteswhether j isjoinedtoagroup. 2. member :j : setofprocessorsregardedby j asmembersinitsgroup. 3. newgroup :j : f true ; false g ,thatdenoteswhether j isinterestedinjoiningthegroup. 4. present :j : f true ; false g ,thatdenoteswhether j sentapresentmessageinthecurrentround. 40 5. skipped :j : f true ; false g ,usedduetothefactthattheprocessthatsendsthenewgroup messageshouldnotexecuteinround2. TheroundsoftheprotocolareasshowninFigure3.6., Round1.Intheround,anyprocess(ifnotjoined)isallowedtosetits newgroup to true ifalltheprocesses' newgroup are false .Theguardofthisactionincludestheconstraint fi j wantstojointhegroupfl.Thisconstraintissettotruebytheapplicationusingthegroup membershipprotocol. Notethatundersynchronoussemantics,thisactionensuresthatseveralprocessescanrequest tojointhegroupatthesametime.However,untilthestatusoftheseprocessesisdetermined inround3,anotherprocesscannotsendanewgroupmessage. Round2.Next,allthejoinedprocessesshowtheirwillingnessofformingagroupby settingtheir present variableto true .Theprocessthatsentthenewgroupmessageshould notexecuteinround2.Tomodelthis,thevariable skipped :j (whichissetto true inround 1)isresetto false inround2. Round3.Finally,alltheprocessesenabledineitherround1orround2computethemem- bersasthosewhose present or newgroup istrue.Inaddition,theprocesswhose newgroup istruemarkits joined totrueand newgroup tofalsesimplydenotingthereisnoneedto broadcast newgroup messageforever.Likewise,aprocessthathasalreadyjoinedthegroup setsitbacktofalseandwaitsforotherprocessestojoin. 41 round1: ( j wantstojointhegroup ) ^: joined :j V p 2f i;j;k;l g : newgroup :p ! newgroup :j := true ; skipped :j := true ; round2: ˆ 9 p 2f i;j;k;l g : newgroup :p ^ joined :j ! present :j := true ; newgroup :j ^ skipped :j ! skipped :j := false ; round3: 8 > > > > > > > > < > > > > > > > > : : joined :j ^ newgroup :j ^: skipped :j ! joined :j := true ; member :j := f p j present :p = true g [f q j newgroup :q = true g ; newgroup :j := false ; joined :j ! member :j := f p j present :p = true g [f q j newgroup :q = true g ; present :j := false Figure3.6Faultintolerantprocessorgroupmembershipprotocol 3.9.2FaultsinGroupMembershipProtocol Inthiscasestudy,weconsiderthefault,whereaprocessfailstosendthe present messageinthe secondround.Thus,thisfaultcanbemodeledasatransitionwherethefaultyprocessdoesnotset thevariable present totruewhileothersdo.BasedonthediscussioninSection3.7,werepresent thisfaultforprocess j asfollows: f : 9 p 2f i;k;l g : newgroup :p ^ joined :j ! present :j := false ; 8 q 2f i;k;l g : if joined :q then present :q := true ; 3.9.3SafetyofGroupMembershipProtocol Thesafetyrequiresthatanyreachablestatethefollowingconstraints: 42 ( Sf a )Processorsinthegroupagreewitheachotheraboutgroupmembership. 1 Formally, 8 p :: joined:p ^ joined:q ) members:p = members:q ( Sf r )Thelistofgroupmembersisnonempty.Formally, joined:p ) members:p 6 = ; ( Sf s )Afteraprocessorjoinsagroup,itdoesnotleaveuntilafailureisdetectedora newgroup messageissent. Remark. Theoriginalprotocolastrongerversionof Sf r .,itguaranteesthat thegroupmembershipisxive.However,inthepresenceoffaults, Sf r isduetothe factthatsatisfyingthestrongerrequirementisimpossibleand,hence,Algorithm1declaresfailure. Therepairedprogramguaranteesthattheprogramcontinuestosatisfyitsexistingationin theabsenceoffaults.Hence,therepairedprogramwillensurexivityofgroupmembershipin theabsenceoffaults. 3.9.4LegitimateStatesofGroupMembershipProtocol Ifaprocesshasnotjoinedagroup,thereisnoneedtomaintainacopyofthemembersinthat group.Ifaprocess j joinsthegroup,asrequiredbysafety Sf a and Sf r , member :j shouldbeequaltoallotherprocessesinthatgroupand j itselfshouldbeinthegroup.Inaddition, ifsomejoinedprocess j broadcaststhe present message(i.e.,sets present :j totrue),thenallother 1 Observethatthisrequirementcannotbeininterleavingsemanticssincetwoprocessesthatupdategroup membershipcannotupdatetheirstateinthesamestep. 43 joinedprocessesshouldperformthesameactionaccordingtothesynchronoussemantics.Thus, thelegitimatesstatesoftheprotocolareasfollows: I =( 8 j : : joined :j ) member :j = ; ) _ (( 8 i;j : i 6 = j :( joined :j ^ joined :i ) ) ( member :i = member :j )) ^ ( 8 j : joined :j ) j 2 member :j ) ^ ( 9 j : 8 i : i 6 = j :( joined :j ^ joined :i ^ present :j ) ) present :i )) 3.9.5AddingFailsafe/MaskingFault-tolerancetoGroupMembershipPro- tocol Wenowconsidertheeffectoffault f onthefault-intolerantprogramtoexplainthechangesper- formedbyAlgorithm1whileaddingfailsafefault-tolerance.,considerthescenario, whereprocess i and j areinthegroup,andprocess k decidestojoinwhileprocess l decidesnot to.Aftertheroundwhere k setsits newgroup to true , f perturbstheprogram.Asaresult, onlyprocess i sets present :i to true .Theresultingstateoftheprogramisnotin I .Bothpro- cesses i and k onlyhaveoneactiontotakeinthenextroundwhichistocomputetheir member (denotedas 1 i and 1 k ,respectively).Basedonthe present and newgroup valuesofallprocesses, their member variablesaresetequalto f i;k g .Forprocess j ,ithastwoactionsitcanpotentially perform.Oneistoset present :j totrue(denotedas 1 j )andtheotheroneistoset member :j to be f i;k g (denotedas 2 j ).Noticethateitheroneof 1 j or 2 j shouldbeperformedsimultaneously with 1 i and 1 k .However,theprogramtransition 1 = 1 i 1 j 1 k resultsastateviolating Sf a , since member :i 6 = member :j .Thus, 1 isexcludedbyAlgorithm1.Theprogramtransition 2 = 1 i 2 j 1 k canremainintheprogramsincetheresultingstatedoesnotviolatethesafety Therefore,thefailsafefault-tolerantprogramisshownasfollows,whereround2 44 actionisrepaired: round2: 9 p 2f i;j;k;l g :: newgroup :p ^ joined :j V q 2f i;j;k;l g : present:q ! present :j := true Observethattherepairedprogramisfailsafefault-tolerant.Hence,itmaynotrecovertoa statein I .ThereareseveralpossiblerecoveryactionsthatAlgorithm1canaddtoobtainmasking fault-tolerance.Oneactionisthatwhenprocess j outthat j 62 member :j inround3,itleaves thegroupandstartstojoinagain.Thesamerecoveryactioncanbetakenwheninround3two processes p and q havethesamemembers,butonehasbroadcast present andtheotheronehas not.Thisrecoveryactionisshownasfollows. round3: 9 q : newgroup :q ^ joined :j ^: present :j ^9 p :( present :p ^ member :p = member :j ) ! joined:j := false ; member:j := ; 3.10CaseStudy2:AThree-State(MSI)Write-BackInvalida- tionProtocol Inthissection,wepresentoursecondcasestudywhereweconsideraninvalidation-basedproto- colforwritebackcaches[30].Thisprotocoldesignedtomaintainconsistentcontentsinmultiple cachesonabus-basedarchitecture.Theprotocolisvulnerabletocertainfaults,e.g.failureto updatethememorywiththevaluefromcachecontroller.ToillustratehowAlgorithm1 canaddfault-toleranceinthiscasestudy,thissectionisorganizedasfollows.First,wepresentthe inputforAlgorithm1.,weidentifythefault-intolerantprogramforcachecoherency protocol.Then,weidentifythefaultsthatitissubjectto.Subsequently,weidentifythe 45 tionthathastobewhileaddingfault-tolerance.Finally,weidentifyhowthefault-tolerant programisdesignedusingAlgorithm1. 3.10.1Fault-intolerantMSIProtocol. Inthefault-intolerantprotocol,therearemultipleprocessorsresidingonthebus.Eachofthem reads/writesmemorythroughtheirprivatecache.Theoperationsforthecacheareperformedby itscachecontroller.Inthiscasestudy,wemodeleachcachecontrollerasaprocess(see 3.1).Inotherwords,thereisoneprocessintroducedforeveryprocessorandaseparateprocessfor cachecontroller. Eachcacheblockisassignedoneofthreestates,namely Shared ( S ) , Invalid ( I ) and Modified ( M ) . Ifthecacheblockisinstate S thenitimpliesthatitscontentisreadytobereadandisconsistent withcorrespondingsharedcacheblocksinallothercaches(thathavethatblock)andisalsocon- sistentwiththeblockinmemory. Invalid denotesthatthecacheblockisnotreadyforusenow.A M blockindicatesthatitistheonlyonetothemostup-to-datecontent,whilethememory containsastalecopy.Thusweasetofvariablestodenotethecacheblockstate.Forbrevity, inthiscasestudy,weconsiderthatthereisonlyoneblockinthememoryandallthecaches.Thus forprocess j (namelyoneofthecachecontrollers), state:j : f M;S;I g ; denotestheblockstateinprocess j Eachcachecontrollercansetupataskonbus.Andthereisonlyonetaskallowedonbus.Upon interpretingthetaskonthebusandthe read=write requestsfromprocessor,thecachecontroller takesthecorrespondingaction.Forprocess j ,wethefollowingvariablestodenotetaskon thebusand read=write requestsfromprocessor. 46 task : fff w;r;f g ; f j 2 R p gg ;null g , w denotes write task, r denotes read task, f denotes flush task, null denotesthereisnotaskonbus. j isanyonprocessintheprogram.For example task = f w;j g denotesthecurrenttaskonbusis write setby j . req:j : f w;r;null g , w denotes write request, r denotes read request, null denotesthereis norequestfromprocessor. Weusenotation task=request todenotethecurrenttaskandrequestinterpretedbycache controller.Beforewemodeltheprotocolasguardedcommands,wegiveanoverviewofcache controller'sactionsuponcertain task=request .Uponreceivinga read request,controllerwith I blocksetsupa read taskonbusandfetchthedatafrommemoryforupdatingitsownblock.Then, itchangestheblockstateto S .Anyothercontrollerwith S blockobserving read tasktakesno action.However,thecontrollerwith M blockinterruptsthecurrent read taskandsetsup flush taskonbusinstead,andtransfersitsdatatothememoryandthecachewhichsetup read task before.Tomodelthis,weintroduceguardedcommandsasfollows.Forthecontrollerwhichwants toreadduetoits I block,itwilldelaythedata-copyforonestepsothatitcancheckwhetherthere is flush taskafterwards.Hence,uponreceiving read request,controllersetup read taskandsets variable skipped to true (seeaction A 1 inFigure3.7).Nextthecontrollerwith M blockchanges thetaskto flush (action A 4 ).Togetherwith A 4 ,thecontrollerusedtosetupthe read tasksets skipped backto false .Notethatthough A 4 isnotenabledinanycontroller, A 5 willbeexecuted alone.Asaresult,ifthereisnocontrollerholding M block, A 8 executessothatthecontroller fetchthedatafrommemory.Otherwise, A 6 and A 7 areperformedsimultaneouslyafter A 4 and A 5 .By A 6 ,dataistransferredfromoneholding M blocktoonerequesting read .Andby A 7 , thecontrollerwith M blockupdatesmemoryandchangestheblockstateto S .Inrestpartofthe protocol,by A 2 ,controllerwhichreceives write ,setup write taskonbus,changesblockstateto S andincrementsitsdataby1.Oncethecontrollerwith S or M blockobserves write task,by A 4 itwillsetitsblockstateto I .Notethatasetofcontrollersshouldperform A 4 simultaneously. 47 A 1 : state:j = I ^ req:j = r ! task := f r;j g ; skipped:j := true ; A 2 : req:j = w ! task := f w;j g ; state:j := M ; data:j := data:j +1; req:j := null ; A 3 :( state:j = S _ state:j = M ) ^ task = f w;k g! state:j := I ; A 4 : task = f r;k g^ state:j = M ! task := f f;j g ; A 5 : task = f r;j g^ skipped:j ! skipped:j := false ; A 6 : task = f f;k g^ state:j = I ^ req:j = r ^: skipped ! data:j := data:k ; state:j = S ; req:j := null ; A 7 : task = f f;j g^ state:j = M ! state:j := S ; data:m := data:j ; task := null ; A 8 : task = f r;j g^ req:j = r ^: skipped:j ! data:j := data:m ; state:j := S ; req:j := null Figure3.7Fault-intolerantCacheCoherenceProtocol 3.10.2FaultsinMSIProtocol. Weconsiderafaultwhereonecachecontrollerfailstoupdatememoryin A 7 .Notethatifsome controllerwantstoread,itstillgetthedatafromoneholding M block( A 6 )simultaneously.Thus, basedonthediscussioninSection3.7,weneedtomodel A 6 aspartofthefaultwhichshownas follows f : task = f;j ^ state:j = M ! state:j := S;task := null ; task = f f;k g^ state:j = I ^ req:j = r ^: skipped ! data:j := data:k ; state:j = S ; req:j := null ; 3.10.3SafetyofMSIProtocol. Thesafetyrequiresthatanyreachablestatethefollowingconstraint. Sf ag : 8 j;k;state:j = state:k = S ) data:j = data:k .Each S blockshouldholdsame datawiththoseinallother S blocks. 48 3.10.4LegitimateStatesofMSIProtocol. Initially,allcacheentriesaremarkedinvalidandthetaskisnull.Allreachablestatesfromthis statearelegitimatestates. 3.10.5AddingMaskingFault-tolerancetoMSIProtocol Considerhowthefault-intolerantprotocolcanbeperturbedby f .Supposeafaultoccurs insteadofaction A 7 ,allthecachesareupdatedbutthememorybecomesstale.Now,ifacache wantstoreadafterwards,byperforming A 1 , A 5 and A 8 ,thecachemaygetthestaledatain memorythusviolates Sf ag .InAlgorithm1, AddSafeRecovery ensuressuchastateisnotreachable(byLine1,2,4and7in AddSafeRecovery ). Inaddition,torecovertolegitimatestates, AddSafeRecovery addsthefollowingrecoveryaction (byLine8in AddSafeRecovery )insteadofperforming A 8 . task = r;j ^9 t : state:t = S ^ data:t 6 = data:m ! data:j := data:t;data:m := data:t; state:j := S;task := null Asaresult,thecomputedfault-tolerantprogramindicatesamoreconservativebehaviorinthe presenceoffaults.Beforeacachecontrollercopiesdatafrommemory,itcheckstheblockin memoryandthesharedblockinothercache,ifitisdifferentthecachecontrollershouldgetthe datafromothercache. 49 3.11ExperimentResults Inthissection,wedisscusstheimplementationofAlgorithm1anditsapplicationsonthecase studies.AlltheexperimentsinthissectionarerunonaPCwithAMDAthlonIIX42.8MHz processorwith6GBRAM.Algorithm1isimplementedsymbolicallyusingGlu/CuDDpackage [65]. Toobtainacompactrepresentation,wemodeltheprotocolsinthecasestudiesusingBDDs. Theforward/backwardreachabilitycalculationsin AddSafeRecovery (onLine1,8,9)areimple- mentedascalculations.ToimplementLine1in BreakCycle ,weusethefollowingfor- mula: p 0 = p \ (( S 1 S 0 ) [ ( S 2 S 1 ) [[ ( S n S n 1 )) Where S i isastatepredicatethatincludesstateswhoserankis i ,i.e.,theshortestpathfrom thesestatestoastateintheinvariantis i .Thus S 0 istheinvariant.Each S i isobtainedbybackward reachabilitycomputationfrominvariant. For ConstructSyncProgram ,itisnotpossibletoimplementitsymbolicallysinceweneedto checktheoutgoingtransitionsfromeachstatein T .However,weapplyotheroptimizationsto improvetheperformance.Forexample,givenstate s ,wecancompute s i and A s symbolically. Inspiteoftheseoptimizations, ConstructSyncProgram stillremainsthebottleneckofAlgo- rithm1.Thisisduetothefactthatsomeenumerationofstatesin T isessentialinimplementation of ConstructSyncProgram .Theresultsforthethreetasksinvolvedinaddingfault-toleranceareas showninTable3.1andTable3.2(N/AintheseTablesmeansthattheresultcouldnotbeobtained inonehour.) 50 AddSafeRecoveryBreakCyclesConstructSyncProgram 3 0.02s < 0.01s0.98s 4 0.15s0.13s150s 5 10s12sN/A Table3.1ExperimentalResultsforGroupMembershipProtocol AddSafeRecoveryBreakCyclesConstructSyncProgram 3 < 0.01s < 0.01s0.15s 4 < 0.01s < 0.01s0.32s 5 < 0.01s < 0.01s0.62s 10 < 0.01s < 0.01s4.27s 15 < 0.01s < 0.01s13.68s 20 < 0.01s < 0.01s33.18s 30 < 0.01s < 0.01s131s 40 < 0.01s < 0.01s185s 50 < 0.01s0.01sN/A Table3.2ExperimentalResultsforThree-StateWrite-BackInvalidationProtocol 3.12ApplicationofOurAlgorithminExistingArchitectures/Lan- guages 3.12.1ApplicationinTTA Inthe time-triggeredarchitecture (TTA)[45],thebasicapproachistoutilizeasharedbusalong withtimedivisionmultiplexing.Timedivisionmultiplexing(TDMA)ensuresthatonlyonepro- cesscanaccessthebusatatime.Moreover,sincethedatatransmittedbyanodeissentovera bus,itisreadbyallnodesinthesystem.Ouralgorithmscanbeappliedto protocols inaTTA (asopposedtothearchitectureitself).,aprotocolinTTAcanbeviewedasaprogram insynchronoussemantics.Ineachstep,thesetofprocessesupdatetheirstatesbasedonthedata transmittedonthebus.Moreover,thisexecutionisinsynchronoussemantics. 51 ThereareseveralwaysinwhichaprotocolinTTAcanbemodeled.Oneapproachistoutilize input variables.,withthisapproach,wecanmodeltheprotocol,sothatthestateof aprocessisupdated justbefore itsTDMAslot.Thisisreasonablegiventhattheprocesscan transmititsdataonlyinitsTDMAslot.So,anychangesperformedbeforewillnotbevisible untilthen.Moreover,wheneveranyprocesstransmits,thistransmitteddataisstoredinthe input variablesofotherprocesses.Usingsuchanapproachintroducesconstraintsthatmustbe bythegeneratedmodel.Forexample,onesuchconstraintisthattheprocessstatedoesnotchange whentheTDMAslotdoesnotcorrespondtoit.Anotherconstraintisabouthowinputvariables areNotethatboththeseconstraintsessentiallyidentifyasetofbadtransitionsandcan bemodeledasanauxiliarysafetyrequirementthathastobesatiduringrepair.Additionally, withthisapproach,theoriginalsafetyrequirementsmayneedtobewhileapplyingmodel repair.Toillustratethis,considerthecasewherewehavetwoprocesses j and k withvariables x:j and x:k .Ifthesafetyrequirementstatesthatthesevaluesshouldbeidenticalatalltimes,then duringrepair,thiswouldhavetobechangedsoastoaccountforthefactthat x:j (respectively, x:k )ischangedonlyintheTDMAslotof j (respectively, k ). Dependingupontheprotocolitself,however,thismaynotbenecessary.,inthe casestudyinSection3.9,wemodeledhoweachprocessupdateditsownstateuponreceivingmes- sagesrelatingtogroupmembershipupdates.Inthisapproach,eachprocessupdatesimmediately uponreceivinginformationfromothers.Moreover,ifprocessesaredeterministic,thenotherscan alsodeducethestateofothers.However,inamoregeneralscenario,onemayneedtointroduce thenotionofpublicandprivatevariableswheretheprocessupdatesitsinformationusingprivate variableswhichmaynotbevisibletoothers.Whilewedidnotaddressthisissueinthispaper, wenotethatithasbeenconsideredinthecontextofinterleavingsemanticsin[46].Moreover,the samecomplexityresultsapplyinsynchronoussemanticsaswell. 52 3.12.2ApplicationinEsterel Esterel[12]isasynchronousprogramminglanguagespeciallydesignedforreactivesystems.It utilizespreemption,parallelismandimperativeprogrammingstyle.Moreover,therearecompilers thatcantranslateEsterelprogramsintoCcode,VHDL,Verilog,etc.Ourapproachcanbeap- pliedtoasubsetofEsterelprograms.,anEsterelprogramconsistsofthefollowing primitives: nothing terminatesimmediatelyafter started. pause stopsforoneclocktick. emit S setsignal S tobepresent. loop p executes p againsoonafteritter- minates. p ; q q executesimmediatelyafter p terminates. p jj q p and q executesinparallel. trap T in p end alabeledexitblock T for p . exit T Jumptotheinnermost T -labeled exitblock. Table3.3PartofEsterelPrimitives IfagivenEsterelprogramisofthefollowingform: loop [ p 1 jj p 2 jj p 3 jj ::: ] end ,whereeach p i is oftheform q i ; pause andeach q i containsanycombinationofstatementsincluding nothing , semi- colon , signal , emit , present , trap ,and exit thenourapproachcanbeeasilyapplied., thegivenprogramcanbeconvertedtotransitionsystemseasily,evenusingautomatedtechniques. Then,ourapproachcanbeappliedtogeneraterepairedtransitionssystem.Theproblemofusing therepairedtransitionsystemstogeneratearepairedEsterelprogramisbeyondthescopeofthis paper.Wealsonotethatourapproachcanbeappliedforevensomeadditionalprograms.However, inthiscase,generatingthetransitionsystemfororiginalprogrammayrequiremanualintervention toevaluatedifferentplaceswheretheprogrammay pause forthenextcycle. 53 ConvertingtransitionsofarepairedprogramtoEsterelalsogeneratessomechallenges.Specif- ically,non-determinismfromtherepairedprogrammustberemoved.Wenote,however,thatthis canbedonewhilepreservingcorrectnessofrepairedprogram.Thisisduetothefactthatthe correctnessrequirementofrepairedprogramimposesconstraintson allcomputations .Hence,re- ducingnon-determinismdoesnotaffectit.However,theexactchoiceinreducingnon-determinism requiresmanualinterventionand/orheuristicstomaketherepairedprogram similar tothatofthe originalprogram. 3.12.3ApplicationinBIP ProcessalgebrasandsimilarformalismssuchasBIP[36]allowparametricinteractions whereseveralcomponentscanexecutesynchronizedactionsinoneatomicstep.InBIP,amodel isasasetofcomponentssynchronizedbyprimitivessuchasrendezvousandbroadcast operators.OuralgorithmcanbeappliedtoafragmentofBIPmodels.Inparticular,ifaBIP modelhasonlyrendezvousinteractionsandsuchinteractionsexpandover all components,then theresultingmodelcanbeexpressedusingthesynchronoussemanticsdescribedinSection3.4. 54 Chapter4 LazyRepairforCyber-physicalSystems 1 inChapter2areforsystemthatconsistsofasetofprocesses,whereeachprocess operatesindependently.Inacyber-physicalsystem,wehavecomponentsthatperformactions simultaneously.Inparticular,theexecutionsofincyber-physicalsystemarereliedoninteractions on2(ormore)components.Forexample,inathermo-sensitivesystem[1],toensurecritical functionality,thecontrollermodule(acybercomponent)shouldreadvoltageoutputproducedby thetemperaturesensor(aphysicalcomponent)andestimatecurrenttemperature,andtunecooling module(anotherphysicalcomponent)simultaneously.Wecannotreuseinterleavingsemanticsto modelthebehaviorofacyber-physicalsystem,sinceprocessesunderinterleavingsemanticsare allowedtoperformwithdelay.Also,wecannotreutilizesynchronoussemanticsformodeling, sinceitrequiresallprocessestoparticipateineveryprogramtransition.Hence,inSection4.1,we formalizethenotionofprograms,computations,faults,etcforcyber-physicalsystem,following presentedin[11,36].Therestofthischapterisorganizedasfollows.InSection4.2, weintroducetherealizabilityconstraintsincyber-physicalprograms.InSection4.3,weintroduce howtomodelthefaultsincyber-physicalprograms.TheninSection4.4,wediscusstheconstraints foraddrecoverytoacyber-physicalprogram.AndinSection4.5,wepresentseveralheuristics toachieveefaddingrecovery.FinallyinSection4.6andSection4.7,wepresenttwocase studys,i.e.,broadcastchannelandtrainsignalcontroller. 1 ThisisajointworkwithBorzooBonakdarpourandSandeepKulkarni 55 4.1PreliminaryConceptsforCyber-physicalSystem 4.1 (atomiccomponent) . An atomiccomponent B isalabelledtransitionsystem representedbyatuple ( Q;P; ! ;q 0 ) where Q isasetof states , P isasetof communicationports , Q P [f ˝ g Q isasetof transitions including 1. observable transitionslabelledbyports,and 2. unobservable ˝ transitions q 0 2 Q istheinitialstate. Figure4.1Acommunicationprotocol. Foranypairofstates q;q 0 2 Q andaport p 2 P [f ˝ g ,wewrite q p ! q 0 ,iff ( q;p;q 0 ) 2! . Whenthelabelisirrelevant,wesimplywrite q ! q 0 .Similarly, q p ! meansthatthereexists q 0 2 Q ,suchthat q p ! q 0 .Inthiscase,wesaythat p is enabled instate q .Figure4.1showsthreeatomic components Sender , Channel ,and Receiver .Forexample,inatomiccomponent Sender ,we have Q = f s 0 ;s 1 g , q 0 = s 0 , P = f snd ; sack g ,and ! = f ( s 0 ; snd ;s 1 ) ; ( s 1 ; sack ;s 0 ) g . 56 Inpractice,atomiccomponentsareextendedwithvariables.Eachvariablemaybeboundto aportandthroughinteractionsinvolvingthisport.Wealsoassociateaguardandan updatefunctiontoeachtransition.Aguardisapredicateonvariablesthatmustbetruetoallow theexecutionofthetransition.Anupdatefunctionisalocalcomputationtriggeredbythetransition thatthevariables.Forsimplicityandwithoutlossofgenerality,weomitthesedetailsin thisdissertation. 4.2 (componentcomputation) . A computation ofacomponent B =( Q;P; ! ;q 0 ) isa orsequenceofstates q 0 q 1 q 2 ,suchthat 1. q 0 = q 0 . 2. forall j 0 (i) q j 2 Q ,and(ii) q j ! q j +1 . Reachablestates.Let B =( Q;P; ! ;q 0 ) beacomponentand S beastatepredicatein B ;i.e., S Q .Westatepredicate Reach 1 ( S )= S [f q 0 j9 q 2 S : q ! q 0 g .Thatis, Reach 1 ( S ) canbecomputedbyidentifyingstatesthatareimmediatelyforwardreachablefromthesetofstates S .Likewise,onecancompute Reach 2 ( S )= Reach 1 ( Reach 1 ( S )) .Inastatemodel,itis straightforwardtoshowthatthereexists n 1 ,suchthat Reach n ( S )= Reach n +1 ( S ) .Wecall thisthesetof reachablestates from S anddenoteitby Reach ( S ) .Thus,thesetofreachablestates ofacomponent B =( Q;P; ! ;q 0 ) is Reach ( B )= Reach ( f q 0 g ) .Forexample,inFigure4.1,we have Reach ( Sender )= f s 0 ;s 1 g . 4.3 (interaction) . Foragivensystembuiltfromasetof m atomiccomponents f B i = ( Q i ;P i ; ! i ;q 0 i ) g m i =1 ,weassumethattheirrespectivesetsofportsarepairwisedisjoint;i.e.,for anytwo i 6 = j from f 1 ::m g ,wehave P i \ P j = ; .Wecanthereforetheset P = S m i =1 P i 57 ofallportsinthesystem.An interaction isaset a P ofports.Whenwewrite a = f p i g i 2 I ,we supposethatfor i 2 I , p i 2 P i ,where I f 1 ::m g . 4.4 (compositecomponent) . A compositecomponent isbyacomposition operatorparametrizedbyasetofinteractions 2 P . B = ( B 1 ;:::;B m ) ,isatransition system ( Q;; ! ;q 0 ) ,where Q = N m i =1 Q i , q 0 =( q 0 1 ;:::;q 0 m ) ,and ! istheleastsetoftransitions satisfyingtherule a = f p i g i 2 I 2 8 i 2 I : q i p i ! i q 0 i 8 i 62 I : q i = q 0 i ( q 1 ;:::;q m ) a ! ( q 0 1 ;:::;q 0 m ) Inacompositecomponent, ˝ -transitionsdonotsynchronizeandexecuteinaninterleavingfashion. Theinferenceruleintion4.4saysthatacompositecomponent B = ( B 1 ;:::;B m ) can executeaninteraction a 2 ,iffforeachport p i 2 a ,thecorrespondingatomiccomponent B i canexecuteatransitionlabelledwith p i ;thestatesofcomponentsthatdonotparticipateinthe interactionstayunchanged. Figure4.1illustratesacompositecomponent CCP = ( Sender ; Channel ; Receiver ) ,where = ff snd ; add 1 g ; f rem 1 ; rcv g ; f rack ; add 2 g ; f rem 2 ; sack gg .Thebehaviorofthemodelisas follows.Thecomponent Sender sendsapacketviaport snd andreceivesthecorrespondingac- knowledgementthroughport sack .Likewise, Receiver receivesthesentpacketthroughport rcv andsendsanacknowledgementthroughport rack .Byeachtransmission,component Channel addsanitemtoitssingle-spacebuffer(throughports add 1 and add 2 )andbyeachdelivery,the itemisremoved(viaports rem 1 and rem 2 ). 58 Similartocomputationsofanatomiccomponent,a computation ofacompositecomponent B = ( B 1 ;:::;B n ) isaorsequenceofstates s 0 s 1 s 2 ,suchthatforall i 0 1. s i isastateof B (i.e., s i =( q 0 ;q 1 ; q n ) ,whereforall j 2f 1 ::n g , q j 2 Q j ). 2. s i a ! s i +1 ,forsomeinteraction a 2 . Theofstatepredicate,statespaceremainsthesamebyreplacinglocalstatesofan atomiccomponentwithglobalstatesofacompositecomponent.Theset L ofgloballegitimate statesofacompositecomponent B = ( B 1 ;:::;B n ) istheconjunction ( ^ V n i =0 L i ) ,where L i isthesetoflegitimatestatesofatomiccomponent B i ,constrainedbythesetofinteractions. Observethattheconstraint in L isnecessarytoensurethatcomputationsofacompositecom- ponentarei.e.,anyarbitrarycombinationoflocallegitimatestatesdoesnotformaglobal legitimatestate,asnointeractionmaybeenabledfromthatcombination. 4.2RealizabilityConstraintsinCyber-physicalPrograms Weconsidertworealizabilityconstraintsforcyber-physicalprogramsdiscussedinthisdissertation, 1. Oncethebehaviorofaphysical-componentistheinternalstructureofthecompo- nentcannotbechanged.Forexample,inFigure4.1,given Channel asaphysicalcomponents withfourinnertransitionsamongfourstates,anyprogramthatadds(ordrops)transitions (states)inside Channel cannotberealizedasacyber-physicalprogram. 2. Foreveryinnertransitioninaphysicalcomponent,theremustexistacyber-componentin theprogramtoreacttosuchtransitionviainteraction.Forexample,inFigure4.1,wesay aprogramcannotberealizedasacyber-physicalprogramif Sender doesnotinteractwith Channel viaportsotherthan sack . 59 Inthischapter,weonlydeveloptheapproachtoproviderecoverymechanismdealingthe realizabilityconstraint.Intheexamplesconsideredinthischapter,thesecondconstraintdoesnot apply.Morestringentanalysistosatisfythesecondconstraintispartofthefuturework. 4.3ModelingFaultsforAddingRecoverytoCyber-physical Programs Inthissection,wedescribefaultmodelinthecontextofthecomponent-basedframeworkdescribed inSection4.1. Let B =( Q;P; ! ;q 0 ) beanatomiccomponent.Inordertospecifythefaultybehaviorof component B ,denoted B f ,weextendthecomponentbyintroducingnewports P f .A faulttran- sition in B f isoftheform t =( q;p;q 0 ) ,suchthat(1) t 62! ,(2) q;q 0 2 Q ,and(3) p 2 P f [f ˝ g . If p 2 P f ,wesaythatfaulttransition t is observable .Otherwise,(i.e., p = ˝ ),wesaythatfault transition t is internal .Wecalltransitionsin ! normal transitions.Now,let ! f bethesetoffault transitionsincomponent B f .Thus,weobtaincomponent B f =( Q;P [ P f ; ![! f ;q 0 ) and wecallit component B inthepresenceoffaults . Likewise,weacompositecomponentinthepresenceoffaults.Let B = ( B 1 ;:::;B m ) beacompositecomponentand B f =( [ f )( B f 1 ;:::;B f m ) bethecompositecomponentinthe presenceoffaults.Obviouslyeachinteractionin onlyconsistsofportsassociatedwithnormal transitions(called normalinteractions ).However,aninteractionin f (called faultinteraction ) consistsofatleastoneportin P f j ,where 1 j m . Theconceptofreachablestatesinamodelinthepresenceoffaultsissimilartotheonepre- sentedinSection4.1,byconsideringinternal,normal,andfaulttransitionsaswellasfaultand normalinteractions.Likewise,thenotionofcomputationscanbetriviallyextendedbyconsidering 60 theunionofnormalandfaultinteractionsandinternaltransitions. Figure4.2Thecommunicationprotocolwithfaults. Continuingourcommunicationprotocolexamplewithsomefaults(seeFigure4.2),letuscon- siderthecasewherethechannelislossyandfaultscauselossofthesentpacket(i.e.,internal transition ˝ 1 )ortheacknowledgement(i.e.,internalfaulttransition ˝ 2 ).Wedenotethismodelby CCP f =( [ f )( Sender f ; Channel f ; Receiver f ) .Components Sender and Receiver haveno faulty behavior (i.e., P f = ! f = ; ).Inotherwords, Sender f = Sender and Receiver f = Receiver .Onthe contrary,in Channel ,wehave P f = ; ,and, ! f = f c 1 ! c 0 ;c 3 ! c 0 g .Sincebothfault transitionsareinternal,wehave f = ; . Ingeneral,introducingfaultstoamodelmayresultintwoundesirablesituations: Introducingdeadlockstates.Forinstance,theoccurrenceofafaulttransitionin Channel leads CCP f toreachadeadlockstate s 1 c 0 r 0 thatthereisnointeractionorinternaltransition isenabledin s 1 c 0 r 0 . Introducingcyclesoutsidethesetofreachablestates(i.e.,livelocks). 61 4.5 (deadlock) . Let B =( Q;; ! ;q 0 ) beamodel.Wesaythatastate q 2 Q isa deadlock state,ifandonlyiftheredoesnotexist a 2 suchthat q a ! . Inordertotacklethesepitfalls,acommonpracticeinfault-tolerantsystemistoprovidethe systemwitha recoverymechanism .Roughlyspeaking,givenamodel B f ,faultrecoveryensures thatiffaultscauseasystemtoreachastatein : Reach ( B ) ,thesystemisabletoreachastatein Reach ( B ) withinanumberofsteps.Weemphasizethatinthischapter,wearenotconcerned with safety issues.Inotherwords,inthischapterweconsidertheproblemofaddingnon-masking fault-tolerancetocyber-physicalprograms. Figure4.3Thecommunicationprotocolwithfaultrecovery. 4.6. Let B f beamodelinthepresenceoffaultsand q 0 q 1 q 2 q 3 beacomputationof B f .Wesaythat B f provides faultrecovery iffwhenthereexists i 1 suchthat q i 62 Reach ( B ) , thenthereexists j i +1 ,where q j 2 Reach ( B ) . Consideringourcommunicationprotocol,onewaytoresolvethedeadlockstate s 1 c 0 r 0 isto addtherecoverymechanism,whereapacketisre-transmittedwhenthepacketoritsacknowl- edgementislostinthepreviousattempt.Thissolutionresultsinobtainingthemodel CCP 0 = 0 ( Sender 0 ; Channel ; Receiver ) inFigure4.3,where: 62 Sender 0 includesanadditionalport rec andtransition s 1 rec ! s 1 . 0 includesanadditionalrecoveryinteraction f rec ; add 1 g . Therecoveryinteraction f rec ; add 1 g isenabledwhen CCP 0 f isinthestate s 1 c 0 r 0 .Fromthis state,executinginteraction f rec ; add 1 g resultsinre-transmittingthelastpacket,whichinturn leadsthemodeltostate s 1 c 1 r 0 .Sincethisstateisin Reach ( CCP 0 ) ,weareguaranteedthatthe modelrecoversandcanresumeitsnormaloperationintheabsenceoffaults. Ourgoalinthischapteristoinvestigateautomatedapproachestosynthesizeacomponent- basedmodelthatprovidesfaultrecovery,suchas CCP 0 inFigure4.3fromagivencomponent- basedmodelinthepresenceoffaults,suchas CCP f inFigure4.2.Duetorealizabilityconstraints ofcyber-physicalsystem(discussedinSection4.2),wehavetoconsiderconstraintswhileadding recovery.Inthenextsection,wediscusstheseconstraints. 4.4ConstraintsforAddingRecovery AsdiscussedinSection4.3,developingarecoverymechanismforamodelinvolvesaugmentingthe modelwithcomputationsthatensurereachingnormaloperationofthemodelwhenfaultsoccur. Forinstance,anaiveapproachtoachievesuchrecoveryistoresetallcomponentsinthemodel (e.g.,toforceallcomponentstostartworkingfromtheirinitialstates).Now,anaturalquestionis whethersuchasolutionisacceptableforallmodelsinalldomains. Inordertoautomaticallysynthesizearecoverymechanismforamodelinthepresenceoffaults thatdoesnotsatisfytherecoveryconditionasstatedin4.6,onehastoidentifythe constraintsofsuchsynthesisdependinguponcorrectnesscriteriaandapplicationdomain: Correctness.Werequirethatadditionofrecoveryshouldnotinterferewiththenormalbe- 63 haviorofthemodelintheabsenceoffaults;i.e.,addingrecoverydoesnotresultinchanging thebehavioroftheoriginalmodelintheabsenceoffaults.ThisisdiscussedinSubsection 4.4.1. Realizability.Thesecondtypeoftheconstraintsdealswiththecasewheretheinitial modelencompassesa cyber-physicalsystem .Forinstance,asdiscussedinSection4.2,it isnotreasonable(orsometimespossible)tochangethebehaviorofphysicalprocessesin ordertoaddarecoverymechanism.Thus,ournaiveresetsolutionisimpracticalasitis oftenimpossibletoresetphysicalprocessesduringsystemexecution.Theseconstraintsare discussedinSubsection4.4.2. 4.4.1Non-interferenceConstraints Thesetofconstraintsensurethataddingrecoverytoamodelinthepresenceoffaultsdoesnot changethebehaviorofthemodelintheabsenceoffaults.Formally,let B = ( B 1 ;:::;B m ) bea modeland B f be B inthepresenceoffaults.Supposethat B 0 = 0 ( B 0 1 ;:::;B 0 m ) issynthesized from B byaddingsomerecoverymechanism;i.e.,if B 0 f reachesastatein : Reach ( B 0 ) ,thenit eventuallyreachesastatein Reach ( B 0 ) .Recallthat 0 mayincludeinteractionsthatdonotexistin .Inordertoguaranteethatsuchsynthesispreservesthenormalbehaviorofthegivenmodel,we requirethatwhen B 0 f recovers(i.e.,itreachesastatein Reach ( B 0 ) ),itbehavesthesameas B in theabsenceoffaults.More,werequirethatthesetofcomputationsof B and B 0 inthe absenceoffaultsareequivalent.Furthermore,werequirethatallatomiccomponentsin B 0 behave thesameastheircorrespondingatomiccomponentsin B .Weintroducethenotionof projection forcomponent-basedframework. 4.7. Let B =( Q;P; ! ;q 0 ) beacomponent.The projection ofasetoftransitions 64 T onastatepredicate S Q isthesetoftransitions: T j S = f q ! q 0 2 T j ( q 2 S ) ^ ( q 0 2 S ) g i.e.,thesetoftransitionsthatstartin S andendin S . Wenowformalizeourcorrectnessconstraintsasfollows: ( C 1) Forall 1 i m ,wehave Q i = Q 0 i ,and ( C 2) 0 j Reach ( B 0 ) j Reach ( B 0 ) Thetconstraintisconcernedwithatomiccomponentsin B 0 .Inparticular,constraint C 1 requiresthatthestatespaceofeachatomiccomponentiskeptunchangedduringsynthesis.Con- straint C 2 ensuresthatnonewinteractionsareaddedto B 0 intheabsenceoffaults.Thisconstraint alongwithConstraint C 1 ensurethatthesetofcomputationsof B 0 isequaltothesetofcomputa- tionsof B intheabsenceoffaults. 4.4.2ConstraintsonCyber-physicalInteractions Inthissection,wedescribetheconstraintsthathavetobemetduringadditionofrecoveryto component-basedmodelsofcyber-physicalsystems.Inparticular,ourfocusisondifferenttypes ofinteractionsinsuchsystemsandtheireffectduringsynthesis.Forsimplicity,weconsider scenarioswhereaninteractionisonlybetweentwocomponents;interactionswiththreeormore componentscanbehandledinasimilarfashionasdiscussedattheendofthisSubsection. Let B = ( B 1 ;:::;B m ) beamodel.Wepartitiontheatomiccomponentsin B intoclasses B C (cybercomponents)and B P (physicalcomponents).Suchpartitioningisnormally byamodeldesigner.Forinstance,inourcommunicationprotocol B C = f Sender ; Receiver g and 65 B P = f Channel g .Basedonthiswealsopartitioninteractionsintofoursets:cyber- to-cyber( CC ),cyber-to-physical( CP ),physical-to-cyber( PC ),andphysical-to-physical( PP ). While CC and PP interactionscanbetriviallywedistinguishbetween CP and PC interactionsbasedonwhethertheinteractionisfi initiated flbyacybercomponentoraphysical component.Weexpectthattheinitiatorofaninteractionisbythedesigner,asitdepends uponthesemanticsofthegivenaction.Forexample,inourcommunicationprotocolexample,it isexpectedthat Sender initiatesthemessagetransmissionand,hence,interaction f snd ; add 1 g isconsideredtobea CP interaction.Althoughtheissueofsymmetricinteractions,wherethe initiatorcannotbedeterminedisoutsidethescopeofthiswork,wecanconsiderittobeasetof interactions;forexample,aninteractionbetweencomponents A and B wouldbeviewedastwo interactions:onewhere A istheinitiatorandonewhere B istheinitiator. Beforewedescribethecyber-physicalconstraints,noticethatamongallinteractiontypes, CC isthemostsimpleinteraction.Itinvolvestwocomputationalcomponents.Hence,duringthe synthesisprocess,itwouldbepossibletoreviseeitherofthetwocomponentsinvolvedinthein- teractionbyaddingand/orremovingtransitionsinsideacomponent,addingand/orremovingports insideacomponent,oraddingand/orremovinginteractionsamongcomponents.Forexample,in Figure4.2,ifthereceiverinteractedwithanothercybercomponentthatprocessedthemessages, thenduringsynthesis,itwouldbepossibletomodifythereceivercomponentaswellasthecom- ponentthatprocessedthesemessages.Thus,weimposenorestrictionsover CC interactions. Thecyber-physicalconstraintsareasfollows: ( C 3) Unlike CC interactions, CP interactionsaddconstraintsduringsynthesis.Considerthein- teraction f snd ; add 1 g between Sender and Channel inFigure4.2.Thisinteractionisini- tiatedby Sender and Channel participatesinit.Since Channel isaphysicalprocess,it 66 isnotpossibletomodifythiscomponentduringrepair.Hence,anyrepairalgorithmmust keeporiginal Channel transitionsunchanged.Itcannotaddnewtransitions(e.g.,fromstate c 1 to c 3 )orremoveexistingtransitions.Likewise,therepairalgorithmcannotaddorre- moveexistingports.However,itmaybepossibletoutilizeexistingports(e.g., add 1 portin Channel )toaddnewinteractions(e.g.,recoveryinteraction f rec ; add 1 g inFigure4.3for re-transmissionby Sender ). ( C 4) In PC interactions,wehaveanissuesimilartothe CP interactions;i.e.,thephysicalcom- ponentcannotbetoadd/removetransitionsand/orports.Additionally, PC interac- tionsalsopreventremovalofcertaininteractions.Toillustratethis,considertheinteraction f rem 1 ; rcv g inFigure4.2.Itisexpectedthatwhen Channel deliversthemessage,there- ceiverisobligatedtoacceptitbyperformingitsowntransition r 0 rcv ! r 1 .Thus,inadditionto theconstraintsimposedbyphysicalcomponents,a PC interactionrequiresonetodealwith forcedinteractions ,whereanactioninonecomponentmustbeassociatedwithanactionin anothercomponent.Inotherwords,intherepairedmodel,wecannothavetransitionswhere thephysicalcomponentexecutesitstransitionsbutthecorrespondingcybercomponentdoes notexecuteitstransitions. ( C 5) Theeffectof PP interactionscanbeunderstoodbytheconstraintsinsynthesizingphysical components.,itisnotpossibletoadd/removeanysuchinteractionsand/orports. Moreover,itisnotpermissibletouseexistingportstocreatenewinteractions.(Notethat thiswaspossibleinthe CP interactions.) Finally,forinteractionsthatinvolvethreeormorecomponents,theserestrictionscanbeex- tendedinasimilarmannerbyidentifyingthe initiator componentandthenextendingabove restrictionsaccordingly.Forexample,a CP interaction(initiatedbyacybercomponentwithtwo 67 physicalcomponents),itwouldnotbepossibletomodifythephysicalcomponents.However, theportsinsidethephysicalcomponentscanbeusedtocreatenewinteractionswiththecyber component. 4.5HeuristicsforAddingRecoverytoCyber-physicalPrograms Duringtheadditionoffault-tolerancetocomponent-basedmodels,weneedtoaddrecoverytran- sitionsand/orinteractionstoensurethatthemodelrecoverstostatesthatarereachablefrominitial statesintheabsenceoffaults.In[22],theauthorsshowedthattheproblemofsynthesizingfault- tolerantCPSmodelsisNP-completewhenwewanttopreservetheefyoftheoriginalmodel. Hence,weneedtoidentifyheuristicsthatpermitefimplementationwhileensuringthatwe canthedesiredfault-tolerantcomponent-basedmodelinmanyexamples.Moreover,incases wherepreservingefyisnotpossible,wewouldliketominimizethenumberofinteractions thatdonotsatisfyconstraint C 6 .Basedonthisdiscussion,wepresentveheuristics,next.Of these,thetwopreserve C 6 .Theremainingthreeprovideatradeoffbetweenthefeasibilityto synthesizethefault-tolerantmodelanditsefy. Beforewedescribeourheuristics,weobservethe executioncost ofanyaddedinteraction.An interactionamongseveralcomponentssuffersfromahighexecutioncostsinceitrequiressynchro- nizationamongseveralcomponents.Also,ifaninteractionisaddedamongcomponentsthatwere notinteractingbefore,thentheresultingexecutioncostmaybehigh.Basedontheseobservations, weidentifyourheuristics,next. Heuristic1: OriginalInteractions. Iftwocomponentsinteractintheoriginal(input)model thenwecanhypothesizethatitispossibletoimplementtheinteractionamongthosetwocompo- 68 nentsef.Forthisreason,weidentifyanyrecoveryinteractionsthatcanbeaddedby onlyfocusingoncomponentsthatinteractedintheoriginalmodel.Asanillustration,intheexam- pleinFigure4.1,wecanaddaninteractionbetweenthepair( Sender , Channel )and( Channel , Receiver ).Ofcourse,thismightincludeadditionalrecoverytransitionsaddedinsideonecompo- nent.Also,itmayincludenewinteractionsthatutilizeexistingtransitionsand/orthenewlyadded transitions.However,thechoiceofnewtransitionsand/orinteractionsislimitedbytheconstraints ofthecyber-physicalsystem.Forexample,inaddinganewrecoveryinteractionfor Sender and Channel component,wecannotaddnewtransitionsorportsto Channel .However,existingports couldbecomposedwithnewportsintroducedin Sender . Alimitationoftheheuristicisthattherearescenarioswheretheoriginalmodelinvolves interactionbetweenseveralcomponentsalthoughthesynthesizedmodelneedstoincludeinterac- tionsthatonlyincludeasubsetofthosecomponents.Asanillustration,considertheextensionof theexampleinFigure4.1wheretherearetwochannelsandtwo(corresponding)receivers.Inthis case,theoriginalmodelwillcontaina(broadcast)interactionwherethesenderandbothchannels participate.However,thesynthesizedmodelmayalsocontainadditional(unicast)interactions wheresenderandonlyonechannelparticipate.Basedonthisobservation,weintroducethefol- lowingheuristic. Heuristic2:LimitedOriginalInteractions.Iftheoriginalmodelcontainsaninteractionthat includescomponents B 1 , B 2 ;:::;B n thenduringrecovery,weconsiderinteractionsconsistingofa subsetofcomponentsin f B 1 , B 2 ,..., B n g .Althoughthismayresultinaworstcasesituationwhere weneedtoconsideranexponentialnumberofpossibilities,wenotethatthisisexponentialinthe numberofcomponents(andnotinstatespace).Furthermore,thiscouldbeoptimizedfurtherby onlyconsideringinteractionswhereweonlyconsiderasmallsubsetofcomponents(e.g.,withtwo 69 orthreecomponents)andtheircomplements(i.e.,allcomponentsexceptthechosencomponents). Theaboveheuristicsonlypermitinteractionsamongcomponentsthatwereinteractinginthe originalmodel.Insomescenarios,itmaybenecessarytoaddfurtherinteractionsamongcom- ponentsthatdidnotinteractintheoriginalmodel.Forthisreason,weintroducethefollowing heuristic: Heuristic3:TransitiveInteraction.Ifnorecoverycouldbeaddedbyapplyingheuristics1 and2,weconsidernewinteractionsthatareobtainedbytransitiveclosureofinteractionsinthe originalmodel.,iforiginalmodelcontainsaninteractionbetweencomponents B 1 and B 2 andanotherinteractionbetweencomponents B 2 and B 3 ,thenweconsiderpossiblerecovery interactionsamongcomponents B 1 , B 2 ,and B 3 .Asanillustration,intheexampleinFigure4.1, wecanconsideraninteractionbetweenthe Sender , Channel ,and Receiver . Heuristic4:LimitedTransitiveInteraction.Forthecasewheretransitiveinteractionalso failstoaddtherequiredrecovery,weconsiderlimitedtransitiveinteractions.Forinstance,inthe exampleconsideredinthepreviousparagraph,thiswouldresultinconsiderationofaninteraction betweencomponents B 1 and B 3 . Finally,forthecasewherealltheseheuristicsfail,weconsidernewinteractionsthataredevel- opedfromscratch.,inthiscase,thenewrecoveryinteractionsmaynotberelatedto theinteractionsintheoriginalmodel. Heuristic5:HailMary.Sinceourgoalistoidentifyinteractionswithsmallnumberofcom- ponents,weconsiderinteractionsamonganysetoftwocomponents,thenanysetofthree componentsandsoon. 70 4.6CaseStudy1:BroadcastChannel BasedontheheuristicsinSection4.5,weconsiderpossiblenewinteractionsthatcanbeadded tothemodelinFigure4.1.Forthecasewherethereisonlyonechannelandonereceiver,accord- ingtoheuristic1,weconsiderpossibleinteractionsbetween( Sender , Channel )and( Channel , Receiver ).Whileweconsiderallpossibleinteractionsthatcanbeadded,weensurethatnonew transitionscanbeaddedto Channel componentandnoexistingtransitionsfrom Channel com- ponentareremoved.Basedontheserestrictions,weaddtheinteractionthatconsistsoftransition ( s 1 ;s 1 ) in Sender andthetransition ( c 0 ;c 1 ) inthe Channel .Applyingthisheuristicresultsin obtainingthemodelinFigure4.3. Forthecasewheretherearemultiplechannels,heuristic1addstheinteractiontodealwiththe casewhereallreceiverslosethemessage.Subsequently,heuristic2dealswiththecasewherea subsetofreceiverslosethemessage.First,heuristic2considersinteractionsbetweenasubsetof twocomponents.Thus,itconsidersaninteractionbetween(1)apairofchannelsand(2)between thesenderandonechannel.Ofthese,duetotherestrictionimposedonthechannelcomponent, wecannotaddanyinteractionsbetweenapairofchannels.And,itaddsaninteractionbetweenthe senderandonechannel.Sinceaddedinteractionssuftoproviderecoveryforthisexample,an exponentialblowupinthenumberofpossiblecomponentcombinationsisavoided. Theresultsforthetimerequiredforsynthesisforadifferentnumberofreceiversisasshown inTable4.1.Furthermore,basedonthediscussionabove,thetimerequiredforobtainingthefault- tolerantprogramisatmostquadraticinthenumberofcomponents.Furthermore,thetimeneeded foradditionoffaultrecoveryissmall.Forexample,evenwith1000receivers,thetimeforaddition offaultrecoveryislessthanoneminute. 71 Numberof Reachable TimeforAddition channels States ofFaultRecovery(ms) 5 10 3 48 10 10 6 58 50 10 30 149 100 10 60 489 200 10 120 2278 500 10 300 11935 1000 10 600 58763 2000 10 1200 256166 Table4.1Experimentalresultsforautomatedadditionoffaultrecoverytobroadcastchannel. 4.7CaseStudy2:TrainSignalsController Figure4.4Atrainsignalcontroller. Wealsoappliedourheuristicstoatrainsignalcontrollerexample.Inthisexample(seeFigure 4.4),atrain(physicalcomponent Train )travelsonacircularrailwaytrackcontrolledbyasequence ofsignals(cybercomponents Signal1 Signal4 ).Thetraincanpassasignalonlyifitisgreen. Eachsignaloperatesasfollows.When Train passesthesignal(say Signal3 ),itchangesphase 72 fromgreentored.Thisactionissynchronizedwiththetwoprecedingsignals(i.e., Signal2 and Signal1 ),sothattheprevioussignalturnsyellow(i.e., Signal2 )andthesignalbeforethat(i.e., Signal1 )turnsgreen.Moreover,thisactionissynchronizedwiththelocationof Train aswell(i.e., movingfromlocation3to4).Thus,bystartingfrominitialstate YRGG 3 ,wherethe4letters denotethestateofthesignalsandthelastnumberrepresentsthestateof Train ,thecomposite componentexhibitsacomputationofthefollowingform: YRGG 3 ;GYRG 4 ;GGYR 1 ;RGGY 2 ; Faultscanarbitrarilycausechangeofphasefromredtoyellowineachcomponent.Weshow thisinFigure4.4incomponent Signal3 onlyforsimplicity.Theoccurrenceofsuchafaultcauses themodeltoreachthestate GYYG 4 whichisadeadlockstate.Applyingtheheuristicsintroduced inSection4.5addstherecoveryinteraction f rec 4 ; rec 3 ; rec 2 g andthecorrespondingtransitionsin component Signal4 , Signal3 ,and Signal2 asshowninFigure4.4. ThetimeforsynthesisforthetrainexampleisasshowninTable4.2.Aswecansee,whenthe numberoftrainsisincreasedinitially,thereachablestatespaceincreases.Thiscausesanincrease inthetimeforadditionoffaultrecovery.However,afteracertainthresholdonthenumberof trains,thelevelofconcurrencydecreases.Inotherwords,onlyasmallsubsetoftrainscanmove atagiveninstance.Hence,afterthisthreshold,thetimeneededforadditionoffaultrecovery decreases. 2trains 3trains 4trains 5trains 5signals 50ms 8signals 52ms 53ms 10signals 107ms 65ms 62ms 12signals 119ms 582ms 54ms 15signals 765ms 11649ms 105ms 138ms Table4.2Experimentalresultsforautomatedadditionoffaultrecoverytotrainsignalscontroller. 73 Chapter5 LazyRepairforDistributedSystems Inthischapter,wepresentourworkforaddingfault-tolerancetodistributedprograms.Thischap- terisorganizedasfollows.InSection5.1,wetherealizabilityconstraintsofdistributed programs,i.e.,writerestrictionandreadrestriction.InSection5.2,wetheproblemstate- mentofaddingfault-tolerancetoadistributedprogram.InSection5.3,wepresentthealgorithm whichiscomposedoftwosteps.Inthestep,wereusethealgorithmin[46]toobtainahigh atomicityfault-tolerantprogram.Inthesecondstep,weperformfurtherrepairtoresolverealiz- abilityconstraintsofdistributedprograms.Finally,inSection5.4,wepresentourexperimental resultsofusingthealgorithmin5.3. 5.1RealizabilityConstraintsinDistributedPrograms Figure5.1Vi- olatewritere- striction. Figure5.2Vi- olatereadre- striction. Figure5.3 CorrectMod- eling Inthissection,wepresentthenotionofrealizabilityconstraintsindistributedprograms.To illurstrate,weconsideraprogram p = h V p ;R p i where V p = f v 0 ;v 1 ;v 2 g withdomain f 0 ; 1 g and R p = f j;k g .Process j canread v 0 ;v 1 andprocess k canread v 0 ;v 2 ,i.e., R j = f v 0 ;v 1 g and R k = f v 0 ;v 2 g .Process j canwritevariables v 1 andprocess k canwrite v 2 ,i.e., W j = f v 1 g and 74 W k = f v 2 g .Supposethatthebehaviorof j and k arethroughthefollowingactions: j :: v 0 =0 ! v 1 :=1 k :: v 0 =1 ! v 2 :=0 InFigure5.1,5.2and5.3,weshowasubsetoftransitionsof p instatewhere v 0 =0 .Nextwe evaluatewhetherthetransitionsinthesecanberealizedbyprocess j or k . ObservethatinFigure5.1,thetransition f (000 ; 011) g violateswriterestriction.Itisbecause both v 1 and v 2 areupdated,andneither j nor k hasthepermissiontowritebothvariables.Thus, anyprogramthatincludesthetransitioninFigure5.1cannotberealizedwithprocess j and k (with thegivenwriterestrictions). ObservethatinFigure5.2,thetransition f (000 ; 010) g violatesreadrestriction.Notethat j can notreadvariable v 2 ,hence j shouldnotexecuteundertheassumptionof v 2 equalsto 0 .Hence, p shouldincludeonemoretransition f (011 ; 001) g ,otherwisesuchprogramcannotberealizedby process j (withthegivenreadrestriction). Hence,aprogramincludestransitionsshowninFigure5.3canberealizedbyprocess j and k . Next,weformalizethenotionsofwriteandreadrestrictions. 5.1.1WriteRestrictions Let p = h V p ;R p i beaprogram,and j = h R j ;W j ; j i beaprocessofprogram p .Then j must excludethefollowingtransitionsduetoinabilitytowritevariablesnotin W j . write j ( W j )= f ( s 0 ;s 1 ) j9 v= 2 W j ;v ( s 0 ) 6 = v ( s 1 ) g 75 5.1.2ReadRestrictions Let p = h V p ;R p i beaprogram,and j = h R j ;W j ; j i beaprocessofprogram p .Let ( s 0 ;s 1 ) bea transitionin j .Ifvariable v isnotin R j ,then j mustincludecorrespondingtransitionsfromall states s 0 0 where s 0 0 and s 0 differonlyinthevalueof v .Let ( s 0 0 ;s 0 1 ) beonesuchtransition.Now,it mustbethecasethat s 1 and s 0 1 areidenticalexceptforthevalueof v ,and,thevalueof v mustbe identicalin s 0 0 and s 0 1 .Thus,eachtransition ( s 0 ;s 1 ) in p isassociatedwiththefollowing group predicate . group j ( s 0 ;s 1 )= f ( s 0 0 ;s 0 1 ) j ( 8 v 2 R j ;v ( s 0 )= v ( s 0 0 ) ^ v ( s 1 )= v ( s 0 1 )) ^ ( 8 v= 2 R j :: v ( s 0 0 )= v ( s 0 1 ) ^ v ( s 0 )= v ( s 1 )) g : Givenatransition,say ( s 0 ;s 1 )= f (000 ; 010) g ,ofprocess j inthepreviousexample,group predicate group j ( s 0 ;s 1 ) shouldinclude f (000 ; 010) ; (001 ; 011) g . 5.1 (transitionpredicaterealizability) . Wesayatransitionpredicate canbe realized byprocess j ,where j = h R j ;W j i ,ifandonlyifthefollowingconditionshold, \ write j ( W j )= ; .(Notviolatewriterestriction.) 8 ( s 0 ;s 1 ) 2 :: group j ( s 0 ;s 1 ) .(Notviolatereadrestriction.) 5.2ProblemStatementofAddingFault-tolerancetoDistributed Programs Inthissection,weformallytherepairproblemofadding fault-tolerance todistributedpro- grams.Givenafault-intolerantprogram,say p ,wewanttoobtainarepairedfault-tolerantpro- 76 gram,say p 0 .Inthisdissertation,wealwaysassumethat p itsspeintheabsence offaults.Thus,arepairalgorithmshouldnotintroduceanynewbehaviorstotheprograminthe absenceoffaults.Hence,werequirethattheinvariant I 0 oftherepairedfault-tolerantprogram isasubsetoftheinvariant I oftheoriginalintolerantprogram.And,intheabsenceoffaults,a computationof p 0 thatstartsfromastatein I 0 shouldexistinthesetofcomputationsof p thatstart from I .Clearly,arepairalgorithmshouldalsoensurethattheprogramisfault-tolerant. ProblemStatement: Givenaprogram p withinvariant I , SPEC , andaset f offaults,suchthat p SPEC from I ; Doesthereexistaprogram p 0 withinvariant I 0 ,suchthat: C1 : ( I 0 I ) ^ ( p 0 j I 0 p j I 0 ) C2 : p 0 isfault-tolerantto f for SPEC from I 0 . 5.3AlgorithmofAddingFault-tolerancetoDistributedPro- grams Algorithm6 AddingMaskingFault-tolerancetoaDistributedProgram Input: Aprogram h V p ;R p i ,where j 2 R p is h R j ;W j ; j i ,itstransitionpredicate p ,legitimate states I ,faults f , SPEC = h Sf;Lv i . Output: Amaskingfault-tolerantprogram p 0 1: repeat 2: p 0 ;I 0 ;T 0 := AddMasking ( p ;f;I;SPEC ) 3: p 0 := ConstructDistProgram ( p 0 ;T 0 ) //Algorithm7 4: DL = f s 0 j s 0 2 T 0 ^ V 8 s 1 2 T 0 ( s 0 ;s 1 ) = 2 p 0 g //Computedeadlockstates 5: until DL = ; 6: return p 0 Inthissection,wepresentAlgorithm6tosolvetheproblemstatedinSection5.2;i.e.,the 77 algorithmfollowslazyrepairbyaddingfault-tolerancetoadistributedprogram.Ouralgorithm showninAlgorithm6consistsoftwosteps AddMasking and ConstructDistProgram . Step1.Thestepaddsmaskingfault-tolerancetoaprogram.Weutilizethealgorithm Add masking presentedin[46]toimplementthisfunction.Notethatthisalgorithmonlyaddsfault-toleranceto highatomicityprograms,whichmeansprocesseshavenorestrictionofread/writevariablesinthe prorgam.Notethateventhoughweonlyconsidermaskingfault-toleranceinthissection,wecan reusethealgorithmsin[46]toobtainfail-safeandnon-maskingfault-tolerantprogramsaswell. Andthereisnoneedtochangetheimplementationof ConstructDistProgram . Step2.Inthesecondstep,thefunction ConstructDistProgram (seeAlgorithm7)constructsa programthatcanberealizedasadistributedprogram.BeforeexplainingAlgorithm7,we considerhowrealizabilityconstraintsintroducedinSection5.1affecttherepairofdistributed programs. Fortransitionsviolatingwriterestrictions,wehavenochoicebytoexcludethemfromthe program.Tomakesurereadrestrictionisnotviolated,wehavetocheckforeachtransition,say t ,sothatgiveneachprocess,say j ,eitherthewholegroupoftransitionsin group j ( t ) isincluded ornoneofthemisincluded.Giventhesetwoalternativestodealwithreadrestriction,wechoose toremovethewholegroupoftranstitionsratherthankeepit.Itfollowsthatin AddMasking ,it hasensuredamaximalhighatomicityprogamwithallpossiblebehaviors.Inotherwords,those states(ortransitions)removedby AddMasking cannotbekept,otherwisemaybe violated,invariantmaynotbeclosed,programmaynotrecoverinthepresenceoffaults,etc. Additionally,thefunction ConstructDistProgram hasnoguaranteeofdeadlockstatesfree- dom.Thus,ifthereisanydeadlockstategeneratedafterstep2,wehavetorepeatstep1soasto resolvethesedeadlockstates. NowwedescribeAlgorithm7linebyline.OnLine1,wecomputeatransitionpredicate to 78 Algorithm7 Constructadistributedprogram Input: Aprogram p = h V p ;R p i anditstransitionpredicate p ,astatepredicate S . Output: Transitionpredicateoflowatomicityprogram p 0 . 1: := p [f ( s;s 0 ) j s 2 ( true S ) ^ s 0 2 true g 2: p 0 := ; 3: forall j 2 R p do 4: NW j := f v j v= 2 W j ^ v 2 V p g 5: j := \f ( s;s 0 ) j V v 2 NW j v ( s )= v ( s 0 ) g 6: T j := j 7: while T j 6 = ;^ group j ( j ) * p 0 do 8: t := chooseonetransitionfrom T j 9: if group j ( t ) * j then 10: T j := T j group j ( t ) 11: else 12: G := group j ( t ) 13: forall V 2 2 R j W j do 14: T 0 := S ( s 0 ;s 1 ) 2 group j ( t ) f ( s;s 0 ) j V v 2 V p ^ v= 2 V v ( s )= v ( s 0 ) ^ v ( s 0 )= v ( s 1 ) g 15: if T 0 j then 16: G := G [ T 0 17: endif 18: endfor 19: T j := T j G 20: p 0 := p 0 [ G 21: endif 22: endwhile 23: endfor 24: return p 0 includealltransitionsin p andtransitionsthatareenabledin true S .OnLine2,weinitialize p 0 asanemptyset.Inthefor-loopfromLine3toLine23,weconstructthetransitions p 0 for thedistributedprogramwhichbothwriteandreadrestriction.OnLine4andLine5,we computesetoftransitionsas j andexcludetransitionsviolatingwriterestrictionfromit.Inthe while-loopfromLine7toLine22,weensure p 0 doesnotinludeanytransitionviolatingread restriction.FromLine7toLine22,wecheckeachtransition t in j toseewhetheragroupof transitions group j ( t ) violatesreadrestrictionofprocess j .Ifwethat group j ( t ) isnotasubset of j ,weignorethewholegroupoftransitionsfrom p 0 (seeLine9andLine10).Otherwise, 79 wecaninclude group j ( t ) aspartof p 0 .Inadditiontothat,wecanalsoincludeanothersetof transitions(constructedbyfor-loopfromLine13toLine18),suchthatthesetransitionsarein j andtheyareidenticaltooneofthetransitionsin group j ( t ) exceptforthosereadablevariablesof process j . Theorem2. Algorithm6issound. Proofof C1 :OnLine2, AddMasking ensures C1 ishold.AndonLine3, ConstructDistProgram repairs p 0 byremovingstatesandtranstitions.Hence C1 ishold. Proofof C2 :OnLine2, AddMasking ensures C2 ishold.OnLine3, ConstructDistProgram introducesnonewtransitions,meaningthatitcreatesnocyclesthatcanpreventtheprogram torecover.However,itispossiblethatsomestatesbecomedeadlockonLine3.Inthis case,weresolvedeadlockstatesbyrepeating AddMasking andthenresolverealizability constraintsin ConstructDistProgram .Hence C2 ishold. 5.4ExperimentalResults Inthissection,wepresentourexperimentalresultsforAlgorithm6.Toshowtheimprovementto existingsynthesistoolSYCRAFT[20],wechoose2widely-knownfault-tolerantprotocols,i.e., Byzantineagreementandstabilizingchain. Inparticular,forByzantineagreementprogram,wereusetheactions,andfaults presentedinChapter2.Nowweintroducethestabilizingchainprogram.Inastabilizingchain program,eachprocessmaintainsavariable,say v j forprocess j (where 0 j n ).Inourex- periment,thevariable v j iswithdomain f 0 ; 1 ; 2 ; 3 g .Thefault-intolerantstabilizingchainprogram canbedescribedusingthefollowingactions. 80 Actionsforprocess0: v 1 =1 ^ v 0 =0 ! v 0 :=2 v 1 =3 ^ v 0 =2 ! v 0 :=0 Actionsforprocessn: v n 1 =0 ^ v n =3 ! v n :=1 v n 1 =2 ^ v n =2 ! v n :=3 Actionsforprocessj,where 1 j n 1 : v j 1 = v j +1 ^ v j 1 =( v j +1) mod 4 ! v j := v j 1 Figure5.4Fault-intolerantstabilizingchainprogram. Thefaultsforthestabilizingchainprogramcanperturbaprocesstowriteitsvariablearbitrarily. NextwepresenttheexperimentalresultsforByzantineagreementprograms(showninTable5.1) andforstabilizingchainprograms(showninTable5.2).Ourexperimentalresultsshowfaster synthesistimecomparedtothecautiousapproachofSYCRAFT. Rechablestates Cautious Lazy Grouping BA 20 10 15 202 s 32 s 2s BA 25 10 19 767 s 90 s 5s BA 30 10 22 1836 s 259 s 8s BA 35 10 26 4900 s 576 s 15s BA 40 10 30 9366 s 1466 s 26s Table5.1ExperimentalResultsforByzantineAgreement 81 Rechablestates Cautious Lazy Grouping SC 10 10 15 2 s < 1 s < 1 s SC 20 10 19 N=A 2 s < 1 s SC 25 10 22 N=As 11 s < 1 s SC 26 10 26 N=A 18 s 1 s SC 27 10 30 N=A 27 s 1 s SC 28 10 30 N=A 52 s 1 s SC 29 10 30 N=A 220 s 1 s SC 30 10 30 N=A 889 s 1 s Table5.2ExperimentalResultsforStabilizingChain 82 Chapter6 AddingGracefulDegradationtoHigh AtomicityPrograms 6.1PreliminaryConcepts Ingracefuldegradation,itisexpectedthattheprogramwillsatisfyastrongerunder normalcircumstances.And,itwillsatisfyaweakerundersomeothercircumstances (e.g.,inthepresenceoffaults). 6.1 (Weaker . Let SPEC = h Sf;Lv i and SPEC r = h Sf r ;Lv r i betwo ofaprogram p .Wesaythat SPEC r isweakerthan spec ifandonlyifforanystate sequence ˙ if ˙ SPEC r then ˙ SPEC . 6.2ProblemStatementforGracefulDegradationDesign Inthissection,weformallytheproblemofgeneratingaprogramthattheweaker (cf.Problem1.3).Webeginwithaprogram p thatthe SPEC frominvariant I .Wederiveaprogram,say p g ,anditsinvariant I g suchthat p g aweaker say SPEC r from I g .Next,weconsidertherelationbetween p and p g aswellas Iand I g toidentifytheproblemstatement.Onerequirementon p g isthat p g issupposedtoadd newbehaviorsthatpotentiallyviolate SPEC whilesatisfying SPEC r .And,itisnotallowed 83 toremoveanybehaviorsof p thatsatisfy SPEC .Sincethecorrectnessof p isknownfromits invariant I ,basedonthisrequirement,itfollowsthat I shouldbeasubsetof I g .Additionally, p g cannotremoveanybehaviors(respectively,transitions)withintheoriginalinvariant I .Thus,the problemstatementisshownasfollows: ProblemStatement6.1 : Given p , I , SPEC and SPEC r suchthat p SPEC from I .Identify p g and I g suchthat: A1 : p g j I = p j I , I I g . A2 : p g SPEC r from I g . A3 : p g j I SPEC from I . Remark. Theaboveproblemstatementhasarequirementthat I beasubsetof I g .Thisre- quirementdiffersfrom[46]inthatthisenlargestheinvariantbyaddingnewstatesfromwherethe newcanbeBycontrast,in[46],theproblemstatementrequiresthatthe generatedinvariantbeasubsetoftheoriginalinvariant.Forthisreason,existingalgorithmscannot beusedtosolvetheaboveproblem. 6.3AlgorithmforGeneratingGracefulProgram Webeginwiththegivenprogram p ,itsinvariant I anditstwo(stronger) SPEC and (weaker) SPEC r .Inturn, SPEC and SPEC r areintermsofthecorrespondingsafety andlivenessOurgoalistoconstruct p g and I g suchthattheysatisfyconstraintsof ProblemStatementofSection6.2. Ourprogramalsotakesanadditionalinputwhichisastatepredicate,namely S a .Theintuition for S a arisesfromthefactthattheprogramisexpectedtosatisfythe SPEC r under 84 certainconstraints.Predicate S a isusedtocharacterizetheseconstraints.Ifsuchconstraintsare noteasily S a canbeinstantiatedtobe S p I ,i.e.,allstatesexceptthosein I . First,thealgorithmcomputes I tobethesetofstatesin S a exceptthosethatviolatesafety (i.e.,thosein Sf r bs ).Theguessfor I g ,theinvariantforthegracefulprogramisthensetto I [ I (Line2).Then,thealgorithmcomputestheguessfor p g .,in p g ,wereuse allthetransitionsin p thatbeginin I .Wealsoincludealltransitionsthatbeginin I unlessthey violatesafety Sf r bt (Line3).StartingfromLine4,thealgorithmrevisestheprogrambyensuring thelivenessisInLines6-9,weexcludethedeadlockstates,i.e.,statesfrom wherenotransitionsoriginatefrom I g .Thenwerecompute p g suchthatallthereachablestatesby p g startingfrom I 0 (or I )remainsin I 0 (or I respectively).Toensureliveness,weafunction rank thatassignseachstateanintegervaluethatrepresentsthelengthoftheshortestpathfrom thatstatetoreachatargetstatepredicate T .Then,onLines11and12,weexcludetransitions whererankdoesnotdecrease.Removalofsuchtransitionsensuresthattherewillbenocyclesthat preventtheprogramfromreaching T .Toresolvedeadlockstateswerepeattheloopstartingat Line4.Finally,thisprocessstopsonceaisreached. Theorem3. Algorithm8issound. Proof. Toprovethistheorem,weneedtoshowthatthethreeconditionsofProblem6.1are A1 : p g j I = p j I , I I g . Alltransitionsof p thatstartin I arepreservedin p g onLine1.And,noneofthemare removedonLine8or11,as p SPEC from I .Hence,thisconditionis Also,for I I g ,nostatein I isremovedatLine9(i.e.,atline10, I I g ).Furthermore, nostatein I isremovedonLine12.Hencetheconditionis 85 Algorithm8 GracefulProgramGeneration Input: statepredicate S a ,programtransitions p ,invariant I ,weakersafety Sf r (consisting of Sf r bs and Sf r bt ),liveness Lv (consistingnleads-topropertiesoftheform F i T i , i 2 1 n ). Output: gracefulprogram p g andinvariant I g withweaker 1: I := S a Sf r bs 2: I 0 := I [ I 3: p 0 := f ( s 0 ;s 1 ) j s 0 ;s 1 2 I 0 ::( s 0 2 I ^ ( s 0 ;s 1 ) 2 p ) _ ( s 0 2 I ^ ( s 0 ;s 1 ) = 2 SPEC r bt ) g 4: repeat 5: I old := I 0 ;p old := p 0 6: repeat 7: I old := I 0 8: p 0 := maxp ( p 0 ;I;I 0 ) 9: I 0 := I 0 deadlock ( I 0 ;p 0 ) 10: until I old = I 0 11: p 0 := p 0 S i 2 1 n f ( s 0 ;s 1 ) j s 0 2 I 0 ^ rank ( s 0 ; T i ;p 0 ) >rank ( s 1 ; T i ;p 0 ) ^ rank ( s 0 ; T i ;p 0 ) 6 =0 ^ rank ( s 1 ; T i ;p 0 ) 6 = 1g 12: I 0 := I 0 S i 2 1 n f s j rank ( s; T i ;p 0 )= 1^ s 2F i ^ s= 2 I g 13: until I old = I 0 ^ p old = p 0 14: return p 0 as p g , I 0 as I g if I 0 6 = ; ,otherwisedeclarenogracefulprogramgenerated. Function: maxp ( t :transitionpredicate, S 1 , S 2 , :setofstatepredicateswhere S 1 S 2 ) return f ( s 0 ;s 1 ) j t \ ( s 0 2 S 1 ) s 1 2 S 1 ) ^ ( s 0 2 S 2 ) s 1 2 S 2 ) ^g Function: deadlock ( S :statepredicate, t :transitionpredicate) return f s 0 j s 0 2 S ^ ( 8 s 1 2 S :( s 0 ;s 1 ) = 2 t ) g Function: rank ( s :state, T :statepredicate, t :transitionpredicate) return theshortestpathlengthfrom s tooneofthestatein T ,ifthepath(consistingonlytransitionsin t ) exists;or 1 ,otherwise. A2 : p g SPEC r from I g . Closureof I g followsfromLine8wheretransitionsthatcanpotentiallyviolatetheclosure areremoved. Moreover,byconstruction, p g cannotreachastatein Sf r bs andcannotexecuteatransition in Sf r bt .Finally,byLines11and12,thelivenesspropertyisalsoHence, p g SPEC r from I g . A3 : p g SPEC from I . Since p SPEC from I andnoneofthetransitionsthatbeginin I areremoved,then 86 p g j I = p j I .Hence, p g SPEC from I . 6.4CaseStudy:PrinterSystem Inthiscasestudy,wefocusontheprintersystemintroducedin[40].Intheoriginal theprintersystemisrequiredtosatisfythe FIFO However,asarguedin[40],it maybenecessarytorelaxthisrequirementundercertainconstraints.Inparticular,theweaker requirementconsideredin[40]requiresthatsomeout-of-orderprintingispermitted. 6.4.1OriginalPrinterSystemActions Theprogramin[40]consistsofseveralclientsandseveralprintservers.Usingthetransaction semantics,theclients`enqueue'theirprintrequestinasharedqueue.Theprintserversremove fromthisqueueandprintthetask.Forsimplicity,wedonotmodelthetransactionsusedforcon- currencycontrol.Furthermoreweomittheenqueueoperationsinceitdoesnotaffectthebehavior weinterestedin. Theoriginalprogram,say A ,consistsoftwoparts, 1. Once i th taskhasbeendequeued( d i =1 )andithasnotbeenprinted( p i =0 ),thenprintit ( p i :=1) . 2. Ifthe i th taskhasbeenprinted( p i =1 ),thenthenexttask( i +1 )inqueuegetachanceto bedequeued( d i +1 :=1 ). 87 Thusactionsareasfollows: d i =1 ^ p i =0 ! p i :=1 p i =1 ! d i +1 :=1 6.4.2ofPrinterSystem SafetyRecallthatthesafetyintermsofasetofbadstatesprogram shouldnotreachandasetofbadtransitionsthattheprogramshouldnotexecute.The cation Sf bs 1 isbytheoriginalprogramandrequiresthattheprintingmustoccurinorder and,hence,task j cannotbedequeueduntiltask j 1 isprinted.Thegracefulpermits thepossibilitythattask j canbedequeuedeveniftask j 1 isnotprinted.However,itrequires thattask j cannotbedequeueduntiltask j 2 isprinted.Thus,thesafety(original andgraceful)thatthestatesthatshouldnotbereachedisasfollows: Sf bs 1 = f s j9 i;j 2 1 n :: p i ( s )=0 ^ d j ( s )=1 ^ j>i g Sf bs 2 = f s j9 i;j 2 1 n :: p i ( s )=0 ^ d j ( s )=1 ^ j>i +1 g Observethat Sf bs 1 isthestrongercapturingthenotionofFIFO,and Sf bs 2 isa weakersinceitallowsatmosttwotasksbedequeuedwithoutthepending printing.Onecouldconsiderfurtherrelaxationthatallowsmoreout-of-orderprinting.Ouralgo- rithmcanbeappliedinthiscontextbyapplyingAlgorithm8totheprogramgeneratedattheend ofthissection.However,thedetailedanalysisofthisgenerationisoutsidethescopeofthispaper. Inadditiontobadstates,safetyspecanincludebadtransitionsthattheprogramis notallowedtoexecute.Thebadtransitions, Sf bt c ,describe structuralconstraints thathaveto bebytheprintersystem.Inparticular, Sf bt c statesthataprintedtaskcannotbereset 88 tounprinted.Likewise,oncethetaskhasbeendequeueditcannotbere-enqueued.And,atask cannotbeprinteduntilitisdequeued.Notethatthestructuralconstraintshavetobein theoriginalaswellasthegracefulprogram.Also,atmostonevariablecanbechangedinany transition.Thus, Sf bt c := f ( s;s 0 ) j9 i;j 2 1 n; ( p i ( s )=1 ^ p i ( s 0 )=0) _ ( d i ( s )=1 ^ d i ( s 0 )=0) _ ( d i ( s 0 )=0 ^ p i ( s 0 )=1) _ ( i 6 = j ^ d i ( s ) 6 = d i ( s 0 ) ^ d j ( s ) 6 = d j ( s 0 )) _ ( i 6 = j ^ p i ( s ) 6 = p i ( s 0 ) ^ p j ( s ) 6 = p j ( s 0 )) _ ( d i ( s ) 6 = d i ( s 0 ) ^ p j ( s ) 6 = p j ( s 0 )) g Sf bt c asetofbadtransitionsforprogramtoexcludeinordertoperformsafetyex- ecutionnomatterwhichlevelofweakerprogramisplacedin.First,intheprinter system,aprintedtaskcannotberesettounprinted.Second,oncethetaskhasbeendequeuedit cannotbere-enqueued(weincludethissinceitwillpotentiallyputtheprogramintoaloop).And third,werequirethatataskcannotbeprintedunlessithasbeendequeued. Weusesafetyfororiginalprogramas Sf bs 1 and Sf bt c ;forgracefulprogram as Sf bs 2 and Sf bt c .Notethatalthoughthesetofbadtransitionsarethesamehere,itisnota requirementforAlgorithm8. LivenessSThelivenessrequiresthateventuallyalltasksareprinted. 89 Thus, Lv := true S L ,where S L := 8 i 2 1 n;d i = p i ( s )=1 6.4.3GeneratingaGracefulPrinterSystemProgram WeinstantiateAlgorithm8withfollowinginputs: program p :isinstantiatedtobetransitionscorrespondingtoprogram A .Forsimplicity,we assumethattherearethreetasksinthesystem.Hence,thestateoftheprogramisrepresented as ( d 1 d 2 d 3 ;p 1 p 2 p 3 ) ,where d i (respectively, p i )denoteswhethertask i hasbeendequeued (respectivelyprinted). Thesafetytionisinstantiatedsothatthesetofbadstatesis Sf bs 2 andthesetof badtransitionsis Sf bt c Livenessistobe Lv . Invariant I isinstantiatedtobethestatesreachedinthecomputationof A bystartingfrom theinitialstate (000 ; 000) . Statepredicate S a isinstantiatedtobe S p I Considerthecandidatestatesgeneratedoutside I fromLine1thatalsoexcludesthoseviolate Sf bs 2 .Ingeneral,astheweakersafetyrequires,newprogramallowstwotaskscan bedequeuedregardlesseitheroneprintingornotandanewtaskcanthenbedequeuedif 90 eitheroneoftwopreviousdequeuedtasksprinting.Inotherwords, I = f (110 ; 000) ; (110 ; 010) ; (111 ; 010) ; (111 ; 011) ; (110 ; 110) ; (111 ; 100) ; (111 ; 101) ; (010 ; 000) ; (010 ; 010) ; 011 ; 010) ; (011 ; 011) g Thenewprogramtransitionsgeneratedareallpossiblepairofstatesin I plusthoserecovered from I to I providedthattheydonotviolate Sf bt c .TherearenodeadlockstatesonLine8. Moreover,sinceeverystatein I or I hasapathtoastatewherealltasksarecompleted,nostate andtransitionareremovedonLines11and12. Now,weevaluatethenewbehavioroftheprintersystemin p g providedthecontextshown in6.1.Ideally,theprogramstaysininvariant I whilesatisfying Sf bs 1 , Sf bt c and Lv . However,iftheprogramisperturbedtobeoutside I thenitwillsatisfy Sf bs 2 , Sf bt c and Lv . Figure6.1showsthebehaviorofthegracefulprogram.Notethatinthisprogram,thetasksbeing printedisnottotallyoutoforder,thoughitisnotFIFO.Forexample,task3cannotprecedestask 2providedthattask1hasnotbeenprinted. Figure6.1GracefulPrinterSystemProgram 91 Chapter7 AddingGracefulDegradationto DistributedPrograms Inthischapter,weextendthealgorithminSection6.3todistributedprogramsandpresentacase studyofaddinggracefuldegradationtoaByzantineAgreementprogram.Inparticular,Algorithm 8focusesonderivingaconcurrentgracefulprogram,wheretheprogramisabletoreadandupdate allvariablesinanatomicstep.Inapplyingthisalgorithmtodistributedprograms,itisnecessary thatthegeneratedprogramcanbeimplementedusingasetofprocesses. Themaindifferencebetweenahighatomicityprogram(i.e.,centralizedprogram)andadis- tributedprogramisthatinadistributedprogram,eachprocessmayhavesome private variablesto whichtheyhavesoleaccess.Useofsuchvariablesisessentialinensuringthatthegracefulpro- gramisimplementableinadistributedsystem.Asanillustration,ifweweretomodelprocesses thatmaybebyzantine,thentheknowledgeofwhetheraprocessisbyzantinemustnecessarilybe privatetothatprocess. 7.1AlgorithmforGeneratingDistributedGracefulPrograms Wehandlethedistributionissuesduringaddinggracefulsamewedoinaddingfault-toleranceto distributedprograms(seediscussioninChapter5).WeintroducetoAlgorithm8in ordertoapplywriteandreadrestrictions.Inparticular,writerestrictionsaremodeledassetofbad 92 transitionsthatprogramshouldexclude.Readrestrictionsareignoredinitially,i.e.,intheloop4-13 ofAlgorithm8.However,justbeforewechecktheconditiononLine13thatdetermineswhether theloopshouldterminate,weattempttoremovethecorrespondinggroupoftransitionsremoved intheloop.However,ifremovalofsometransition,say t 1 causesremovaloftransition t 2 and t 2 isatransitionin I I thenweonlyremove t 1 butnotthecorrespondinggroup.Thisisduetothat thefactthatProblemStatementinSection6.2preventsusfromremovingtransitionsin I I .Itis thereforepossiblethatthetransitionsofthegracefulprogrammaynotsatisfythereadrestrictions entirely.Basedonthisdiscussion,wepresentthefollowingalgorithmsnippet(Algorithm2)that isinsertedbetweenLine12andLine13inAlgorithm8tohandlereadrestrictions. Algorithm9 Handlereadrestriction for eachprocess j = h R j ;W j ; j i do p r := p old p 0 NW j := f v j v= 2 W j ^ v 2 V p g j := p r \f ( s;s 0 ) j V v 2 NW j v ( s )= v ( s 0 ) g for each t 2 j do p 0 := p 0 ( group j ( t ) I I ) endfor endfor 7.2CaseStudy:ByzantineAgreement Inthiscasestudy,wepresentacasestudyforbyzantineagreementproblem.Inthecanonical versionofbyzantineagreement,thereisageneralprocessandthreenon-generalprocesses.Inthis problem,inanidealscenario,thegeneralsendsadecision(either 0 or 1 )tonon-generals.The non-generalsreceivethisdecisionandtheirdecisiontobethatofthegeneral.Thisensures strongvalidity ,i.e.,thedecisionofthenon-generalisthesameasthatofthegeneraland strong agreement ,i.e.,thedecisionofthenon-generalmatchesanyothernon-general. 93 Onecanconsidersomegracefulversionsofthis.Onegracefulversionallowsanon-general tobebyzantineand,hence,allowittochangeitsdecision.Forthisgracefulversion,theprogram weakvalidity ,i.e.,thedecisionofthenon-byzantinenon-generalisthesameasthatof thegeneralprovidedthenon-generalisnon-byzantineand weakagreement ,thedecisionofthe non-generalisthesameasthatofanothernon-generalprovidedbotharenotbyzantine. Theprogramfortheidealscenariosuftodealwithensuringweakvalidityandweakagree- mentifanon-generalisbyzantine. Inthiscasestudy,webeginwiththisprogramasourinputprogramforwhichwewanttoadd gracefuldegradation.Subsequently,wederiveagracefulversionofthisprogramthatonly weakagreement .Finally,weutilizethisderivedgracefulprogramalongwiththeoriginalprogram toderiveafault-tolerancegracefuldegradationbyzantineagreementprogram,i.e.,aprogramthat 1. satis weakvalidity and weakagreement intheabsenceoffaults. 2. satis weakagreement inthepresenceoffaults,wherethefaultmakesthegeneralprocess byzantine. 7.2.1ofByzantineAgreement Asdescribedabove,thesafetyis weakvalidity (denotedas Sf bs wva )and weakagree- ment (denotedas Sf bs wag ). weakvalidity requiresnon-byzantinenon-generalstotheir decisiontobethesameasthatofthegeneral. weakagreement requiresthatthedecisionof anytwonon-byzantinenon-generalsshouldbethesame.Additionally,thenotionof requiresthatanon-generalcannotthedecision ? anditcannotchangeitafterit 94 it.Thus,thesetofbadstatesortransitionsbytheseareasfollows: Sf bs wva = 9 l 2 i;j;k :: : b:l ^ d:l 6 = d:g ^ f:l =1 Sf bs wag = 9 p;q 2 i;j;k :: : b:p ^: b:q ^ d:p 6 = d:q ^ f:p =1 ^ f:q =1 Sf bs nb = 9 l 2 i;j;k :: : b:l ^ f:l =1 ^ d:l = ? Sf bt fi = f ( s;s 0 ) j9 l 2 i;j;k :: s ( f:l )=1 ^ s ( b:l )=0 ^ ( s ( d:l ) 6 = s 0 ( d:l ) _ s 0 ( f:l )=0) g Thesafety Sf ,isintermsofthesetofbadstates, ( Sf bs wva [ Sf bs wag [ Sf bs nb ) ,andthesetofbadtransitions Sf bt fi . Finally,theliveness Lv requires(forbothoriginalandgracefulprogram)that eachnon-byzantinenon-generalitsdecision.Thisisasfollows: Lv := true ( 8 p 2 i;j;k ::( b:p =0) ! ( f:p =1)) 7.2.2ActionsforByzantineAgreement Asdiscussedabove,inthiscasestudy,theprogramdealswiththecasewhereanon-generalis byzantinebutnotthecasewhereageneralisbyzantine.Theactionsoftheoriginalprogramareas follows: d:j = ?^ f:j =0 ! d:j := d:g d:j 6 = ?^ f:j =0 ! f:j :=1 b:j ! d:j :=0 j 1 95 Thelastactionistheonethatallowsabyzantinenon-generaltochangeitsdecision.Since ProblemStatement6.1guaranteestopreserveexistingbehavior,theseactionswouldbepreserved intheprogram.Theseactionsareconcludedasenvironmentaction,i.e.,actionsthateventu- allystop.Hence,theyarenotconsideredinresolvinglivenessonLine11and12inAlgorithm8. 7.2.3InvariantofByzantineAgreement Theinvariantcanbecomputedfromtheinitialstatewherethegeneralisnotbyzantineandatmost onenon-generalmaybebyzantine.Moreover,thedecisionofeachnon-generalis ? andithasnot itsdecision.Thestatesreachedfromthisinitialstatebythecomputationoftheabove programareasfollows: I = : b:g ^ ( : b:i _: b:j ) ^ ( : b:j _: b:k ) ^ ( : b:i _: b:k ) ^ ( 8 p :: : b:p ) ( d:p = ?_ d:p = d:g )) ^ ( 8 p ::( : b:p ^ f:p ) ) ( d:p 6 = ? )) 7.2.4GeneratingGracefulByzantineAgreementProgram Thegracefulprogramisintendedforthescenarioswherethegeneralisbyzantine.Hence,the weaker SPEC r isintermsofthesetofbadstates, ( Sf bs wag [ Sf bs nb ) , andthesetofbadtransitions Sf bt fi . WenowillustrateAlgorithm8inthecontextofbyzantineagreement.Inparticular,wein- stantiateAlgorithm8with(1) p ,thetransitionsoftheoriginalprogram,(2) SPEC and SPEC r representingtheoriginalandweaker(3)livenessspe Lv ,(4)invariant I , and(5)statepredicate S a equalto b:g ^: b:i ^: b:j ^: b:k .Notethatthelastparameterisbasedon theobservationthatthegracefulprogramisintendedforscenarioswherethegeneralisbyzantine. 96 Now,weevaluateAlgorithm8ontheseinputs. Lines1-2:Identifyingasetofinvariant( I 0 ).Thealgorithmstartswithgeneratingallpossible statesininvariantsatisfyingweakersafetySinceweincludealloriginalinvariant( I ) infault-intolerantprogram,theinvariant( I )notin I isenumeratedasfollows. 1. First,observethat S a requiresthatonlythegeneralisbyzantineandnonon-generalisbyzan- tine. 2. Consideringstatesin S a further,weobserve(a) I includesallstateswhere d:i = d:j = d:k istrue.Thisisduetothefactthatinsuchstatesagreementissatiirrespectiveofvaluesof b:i;b:j;b:k;b:g:;f:i;f:j;f:k or d:g .(b)Now,considerstateswheresometwonon-generals, say i and j differ.Toensure Agreement ,insuchstatesin I ,either d:i (or d:j )shouldbe equalto ? or f:i (or f:j )mustbe 0 .(c)If I includesastatewhere f:j istruethen d:j must bedifferentfrom ? . Hence,aftercomputationbyLine1,wehave I =( b:g =1 ^ b:i = b:j = b:k =0) ^ ( 8 l 2 i;j;k : f:l =1 ! d:l 6 = ? ) ^ _ 0 B B B B B B B B B @ d:i = d:j = d:k d:i 6 = d:j ! ( d:i = ?_ d:j = ?_ f:i =0 _ f:j =0) d:i 6 = d:k ! ( d:i = ?_ d:k = ?_ f:i =0 _ f:k =0) d:j 6 = d:k ! ( d:j = ?_ d:k = ?_ f:j =0 _ f:k =0) 1 C C C C C C C C C A Line3:Identifyingasetoftransitions( p 0 )notviolatingsafetyspeciAfterenumerating thepossiblestatesin I generatedonLine1,thealgorithmgenerates p 0 byreusing p aswellas includinganytransitioninspace I I 0 thatdoesnotviolatethe SPEC r . 97 FollowingAlgorithm8asiswouldincludetransitionswhereprocess j changesthevalueof d:k .Or,itmayincludetransitionsofprocess j thatrelyonthegeneralbeingnon-byzantine. Sincethiswouldbeunacceptableinthefault-tolerantgracefuldegradationprogram,asitwould beimpossibletoimplementsuchaprogram,weutilizethestandardapproachfrom[46,51]for distributedprogramsandutilizeitinAlgorithm8. Loop4-13:Resolvingdeadlockstatesandlivenessviolation.Itisstraightforwardtoobservethat inthiscase,thefollowingtransitionscanbeincludedin p 0 :(1)Aprocessthathasnot canchangeitsdecisionto 0 or 1 non-deterministically,or(2)Aprocesscanaslongasits decisionisnot ? andanotherprocesshasnotwithadifferentdecision. Notation. Weusethesequence h x 1 ;x 2 ;x 3 ;x 4 i todenotethesetofstateswherethevalueof x 1 equals d:g ,thevalueof x 2 equals d:i ,thevalueof x 3 equals d:j andthevalueof x 4 equals to d:k .Weuse` 'todenotethatthedecisionvalueofcertainprocessisirrelevant.Forexample ; 1 ; 1 ; 1 i denotesthesetofstateswhere d:i = d:j = d:k =1 and d:g iseither 0 or 1 . Insuchaprogram,therearenodeadlockstates.However,sometransitionsneedtoberemoved duetonon-decreasingrankvalue(Line11).Inparticular,considerastatein ; 0 ; 0 ; 1 i process k hasnoteditsdecision.Furtherconsiderthetransitionwhereprocess i changesitsdecision to 1 .Thistransitionisremovedsincetherankofbothstates(numberofstepstoreachastatewhere allnon-byzantinenon-generalshaveisthesame.Inotherwords,aprocessinmajority cannotchangeitsdecisiontotheminorityprocessunlessthatminorityprocesshasits decision. Atthispoint,wereachtheendoftheloop(Line13).Sinceweneedtoconsidergrouping causedbyreadrestrictions,thisresultsinremovalofalltransitionswhereaprocessinmajority changesitsdecisiontoonethatisintheminority,irrespectiveofwhethertheminorityprocess has(Notethatthisremovaldoesnotimpacttransitionsin I ,theinvariantoftheoriginal 98 program,sincethereisnocorrespondingstatein I .) Thetransitionsofthegracefulprogramforprocess i instatesoutside I areasshowninFigure 7.1.(Recallthatthetransitionsforprocess i inside I arethesameasthatoftheoriginalprogram.): d:i = ?! d:i := d:g d:i = ?^ ( d:j = d:k = ? ) ! d:i :=0 j 1 d:i = ?^ f:i =0 ^ (( d:j 6 = d:g ^ d:j 6 = ? ) _ ( d:j 6 = d:k ^ d:k 6 = ? )) ! d:i :=0 j 1 d:i 6 = ?^ f:i =0 ^ ( d:i = d:j _ d:i = d:k ) ! f:i :=1 d:i 6 = ?^ f:i =0 ^ ( d:j = d:k = ?^ d:i = d:g ) ! f:i :=1 d:i 6 = ?^ f:i =0 ^ d:j 6 = ?^ d:i 6 = d:j ^ d:j = d:k ! d:i := d:j Figure7.1ActionsforGracefulByzantineAgreementProgram 99 Chapter8 AddingFault-tolerancetoProgramwith GracefulDegradation Inthischapter,wepresentthetechniqueofaddingfault-tolerancetothegracefulprogram.We formalizetherequirementsforaddingfault-tolerantgraceful-degradationinProblemStatement 8.1.Subsequently,wecontinuewiththetwocasestudiesinSection8.2andSection8.3byutilizing theoriginalandgracefulprogramtoobtainamaskingfault-tolerantgraceful-degradationprogram. 8.1ProblemStatement Nowweconsidertheproblemofaddingfault-tolerancetothegracefulprogram.Intuitively,the resultingfault-tolerantprogram,say p f ,isrequiredtosatisfytheoriginal(stronger)in theabsenceoffaults.However,whentheprogramperturbedbyfaults,theprogramisguaranteedto recovertostatesfromwhereittheweakerWeformallyidentifytheproblem inProblemStatement8.1. 100 ProblemStatement8.1: AdditionofFault-TolerantGraceful-Degradation Given p , SPEC , I ,suchthat p SPEC from I ; p g , SPEC r , I g ,suchthat p g SPEC r from I g ; I I g , p j I p g j I , f . Doesthereexist I f , I g f and p f suchthat B1 : I f I , p f j I f p g j I g f B2 : I g f I g , p f j I g f p g j I g f . B3 : p f SPEC from I f . B4 : p f ismasking f -tolerantfor SPEC r from I g f . Asonecanimagine,itshouldbepossibletoreuseexistingalgorithmsforaddingfault-tolerance toaddfault-tolerantgracefuldegradation.However,oneneedstoensurethatduringsuchreuse, thesynthesizedprogramtheweakerinthepresenceoffaults.Bycontrast, (assumingthatsatisfyingthestrongerwerenotfeasible),theexistingalgorithmfor addingfault-tolerancewilldeclarefailuretoaddfault-tolerance.Thereareseveralalgorithms [46,17,34]foraddingfault-toleranceintheliterature.Inthissection,weplantoutilizethemas ablackbox,i.e.,weonlyrelyontheassumptionthattheysatisfytheproblemofaddingfault- tolerance(repeatedfrom[46]). 101 ProblemStatement8.2: AdditionofFault-Tolerance Given p , SPEC , I , f suchthat p SPEC from I ; Doesthereexist I f ,and p f suchthat C1 : I f I , p f j I f p j I f C2 : p f ismasking f -tolerantfor SPEC from I f . Inordertodescribethefault-tolerantprogramsubjecttotheconstraintsinProblemStatement 8.1,wecanutilizeanyofthealgorithms[17,34,46].Weusethename Add masking todescribe suchagenericalgorithm.Sinceourreuseisblack-boxinnature,weonlyrelyontheproof(from [17,34,46])thatitProblem8.2.However,fortheconvenienceofthereader,we describethekeystepsthesealgorithms.Giventhefault-intolerantprogram,asetoffaulttransi- tions,theinvariantand Add masking ensuresallthereachablestatesby p [] f satisfysafetyandthenexcludesthecomputationthatstartsoutsidetheinvariantand cannotpossiblyreachastateintheinvariant.Inaddition, Add masking alsoresolvesdeadlock statesandremovescomputationsthatreachthedeadlockstatesifitisimpossibletoaddrecovery. Thealgorithmforaddingfault-tolerantgracefuldegradationisasshowninAlgorithm10. First,itinvokes Add masking (Line4)thattheconstraintsofProblem8.2.Theinput to Add masking consistsofthegracefulprogram p g ,faulttransition f ,invariantofthegraceful program I g ,andweaker SPEC r . Add masking returns p 0 and I 0 suchthatcon- straintsofProblem8.2areNotethat p 0 theweakerintheabsence offaults.Hence,weconsiderthecomputationsof p 0 on I \ I 0 (Line5)bycomputing I \ I 0 and p 00 j ( I \ I 0 ) ,i.e.,aprogramwhosetransitionsareasubsetofthetransitionsof p and p 0 .Sincethe transitionsofthisprogramareasubsetofthatof p ,itsatthestronger SPEC , 102 aslongasitdoesnotdeadlockinanystatein I \ I 0 .Iftherearedeadlockstatesthenthoseare removed(Loop6-11)andtheprocessisrepeated(Loop2-12). Algorithm10 AddingFault-tolerancetoGracefulProgram Input: fault-intolerantprogram p ,gracefulprogramtransitions p g ,faulttransitions f ,invariant I , gracefulinvariant I g ,weaker SPEC r . Output: maskingfault-tolerantprogram p f 1: p 0 := p g , I 0 := I g , I 00 := I 2: repeat 3: I old := I 0 , p old := p 0 4: p 0 ;I 0 := Add masking ( p 0 ;f;I 0 ;spec r ) 5: I 00 := I 00 \ I 0 , p 00 := p 0 j I 00 6: while deadlock ( I 00 ;p 00 ) 6 = ; do 7: I 0 := I 0 deadlock ( I 00 ;p 00 ) 8: I 00 := I 00 \ I 0 9: p 0 := maxp ( p 0 ;I 00 ;I 0 ) 10: p 00 := p 0 j I 00 11: endwhile 12: until I old = I 0 ^ p old = p 0 13: return p 0 as p f , I 0 as I g f ,and I 00 as I f Theorem1. Algorithm10issound. Proof. Toprovethistheorem,weneedtoshowthatthefourconditionsofProblem8.1are B1 : I f I , p f j I f p g j I g f . I f I followsfromLine1and4,thesatisfactionof C1 inProblem8.2,andthefactwedo notaddnewstatesinAlgorithm10.Sincenonewtransitionisadded,and I f I g f follows fromLine5and8.ThusweconcludeconditionB1is B2 : I g f I g , p f j I g f p g j I g f . SameastheproofforB1,B2is B3 : p f SPEC from I f . Sinceeveryreachablestatesby p SPEC ,andthereisnonewtransitionaddedin 103 obtaining p f j I f , p f j I f SPEC .ByLine9and10, I f isclosedon p f .SoB3is B4 : p f ismasking f -tolerantfor SPEC r from I g f . ConditionB4isfollowsfromLine3andfromLine4-8withcorrectinputparame- tersfor Add masking infutureiterations. 8.2CaseStudy(continued):Fault-tolerantPrinterSystem InthissectionwecontinuewiththeprintersystemfromSection6.4.Weconsiderthefaultthat dequeuesthenexttaskbeforethecurrenttaskisprinted.,thefaultactionisrepresented asfollows: f :: d i =1 ^ p i =0 ! d i +1 :=1 Next,weusethisfaultactionalongwiththeoriginalandgracefulprogramfromSection6.4 togeneratethefault-tolerantgracefuldegradationprogram:Observethatinthiscase,theinvariant ofthegracefulprogramisclosedinthefaultactions.Hence,Add maskingcantriviallysatisfythe fault-tolerancerequirementasnorecoveryactionsareneeded.And,theactionsofthefault-tolerant graceful-degradationprogramarethesameasthatofthegracefulprogram. 104 8.3CaseStudy(continued):Fault-tolerantByzantineAgree- ment Thefaultactionrelevantforthisstepistheonethatmakesthegeneralprocessbyzantine.We representthisfaultbytwoactions.Firstprocesscausesthegeneraltobebyzantineandthesecond allowsittochangeitsdecision. f :: 8 p 2 g;i;j;k : b:p ! b:g := true b:g ! d:g :=0 j 1 BasedontheconstraintsofProblem8.1,inthecontextofbyzantineagreement,theproblemof addingfault-tolerantgraceful-degradationwillresultinaprogramthathasthefollowingproperties: Intheabsenceoffaults,i.e.,whenthegeneralisnotbyzantine,itwillguaranteethat weakvalidity , weakagreement and Lv areMoreover,inthepresenceoffaults,i.e.,whenthegeneral isbyzantine, weakagreement willbeandeventuallytheprogramwillrecovertoastate whereboth weakagreement and Lv wouldbeObservethattakentogether,thisensures thattheprogramguaranteesthatifthegeneralisnotbyzantinethenallnon-generals withadecisionthatisthesameasthatofthegeneral.And,ifthegeneralisbyzantine,allnon- byzantinenon-generalswithidenticaldecision.Inotherwords,thefault-tolerantgraceful degradationprogramthetypicalrequirementsforbyzantineagreement[52]. ThegracefulprogramfromSection7.2didnotentirelyhandlethereadrestrictions.Hence, theinputprogramwouldremovecertaintransitionsthatpotentiallyviolatethereadrestrictions. Observethattheaction(whereprocess i changesfrom ? to d:g )existsin I aswellasoutside I . Hence,knowledgeof b:g isnotneededwhenexecutingthisaction.Hence,theactioncorresponding tothegroupcontainingsecondactionmustberemoved.Likewise,inthethirdaction, i canchange itsvaluetoeither 0 or 1 onlyifboth d:j and d:k aredifferentfromthegeneral.Sincethereisno 105 correspondingstatein I .Thefourthactionremainsasis.Regardingtheaction,weobserve thatifafaultoccursinastatereachedafterexecutingthatactionthenthecorrespondingstate, where d:g 6 = d:i and f:i =1 ,isoutside I g .Sincerecoveryisnotpossiblefromthisstate,this transitionwouldberemovedduringadditionoffault-tolerance.Finally,thesixthactionremains asis.Thus,theactionsofthefault-tolerantprogramareasshowninFigure8.1.Andasdiscussed above,ittherequirementsof[52].Theprogramisalsocorrectevenifthethird actionwereremoved.Itisgeneratedbyourprogramtoprovidemaximalchoicetothedesigner. d:i = ?! d:i := d:g Secondactionsubsumedbytheone. Thirdactionasbelow d:i = ?^ f:i =0 ^ (( d:j 6 = d:g ^ d:j 6 = ? ) ^ ( d:j 6 = d:k ^ d:k 6 = ? )) ! d:i :=0 j 1 d:i 6 = ?^ f:i =0 ^ ( d:i = d:j _ d:i = d:k ) ! f:i :=1 Fifthactionremoved d:i 6 = ?^ f:i =0 ^ d:j 6 = ?^ d:i 6 = d:j ^ d:j = d:k ! d:i := d:j Figure8.1 ActionsforFault-tolerantGraceful-DegradationProgram 106 Chapter9 AddingMulti-GracefulDegradation Inthischapter,weextendthetechniqueinChapter6toaddmulti-gracefuldegradation.Inmulti- gracefuldegradation,thesystemprovidessuccessivelyreducedguaranteesofrequirementsinthe presenceofincreasinglyseverefaults.Thischapterisoriganizedasfollows.InSection9.1,we presenttheproblemstatementofaddingmulti-gracefuldegradation.SubsequentlyinSection9.2, wepresentthealgorithmofaddingmult-gracefuldegradation.Finally,inSection9.3,weillustrate thetechniqueviaacasestudy,i.e.,agasdetectionandventilationsystem[53]. 9.1ProblemStatement Inthissection,weformallytheproblemofaddingmulti-gracefuldegradation.Amulti- gracefuldegradationprogramsatitsoriginalwhennofaultshaveoccurred. However,iffaultsoccurthendependingupontheirseverity,itaweaker Usingthestandardapproach(e.g.,[7])formodelingfaults,wemodelfaultsofprogram p asstate perturbations,i.e.,faultsofprogram p areasubsetof S p S p where S p isthestatespaceofpro- gram p .Tocharacterizetheseverityoffaults,amulti-gracefuldegradationprograma setoffaults,say f 1 ;f 2 ; ;f n suchthat 8 j :0