This is to certify that the thesis entitled

MECHANICAL VERIFICATION OF AUTOMATIC SYNTHESIS OF FAULT-TOLERANT PROGRAMS

presented by Borzoo Bonakdarpour has been accepted towards fulfillment of the requirements for the M.S. degree in Computer Science.

MECHANICAL VERIFICATION OF AUTOMATIC SYNTHESIS OF FAULT-TOLERANT PROGRAMS

By

Borzoo Bonakdarpour

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Computer Science

2004

ABSTRACT

MECHANICAL VERIFICATION OF AUTOMATIC SYNTHESIS OF FAULT-TOLERANT PROGRAMS

By Borzoo Bonakdarpour

Fault-tolerance is a crucial property in many computer systems. Thus, mechanical verification of algorithms associated with the synthesis of fault-tolerant programs is desirable to ensure their correctness. In this thesis, we present the mechanized verification of algorithms that automate the addition of fault-tolerance to a given fault-intolerant program using the PVS theorem prover.
By this verification, not only do we prove the correctness of the synthesis algorithms, but we also guarantee that any program synthesized by these algorithms is correct by construction. Towards this end, we formally define a framework for formal specification and verification of fault-tolerance that consists of definitions of programs, specifications, faults, and levels of fault-tolerance in an abstract way, so that they are independent of platform and architecture. We also address the problem of formal verification of automatic synthesis of multitolerance, where programs are subject to multiple classes of faults. The essence of the synthesis algorithms involves fixpoint calculations. Hence, we also develop a reusable theory for fixpoint calculations on finite sets in PVS.

Dedicated to my parents in Iran and my brother for their faith, support, and encouragement. This thesis would not exist without their love.

ACKNOWLEDGEMENTS

First of all, I thank my advisor, Dr. Sandeep Kulkarni, for offering me technical, financial, and moral support during the last two years. He introduced me to the area of mathematical approaches to automating program synthesis and transformation. He also introduced me to computer-assisted verification and, specifically, theorem proving techniques. Much of the work reported in this thesis is inspired by my discussions with him about his ideas on developing a general framework for verification of fault-tolerance properties of systems. He helped me understand what research is and how to face and solve a new problem.

My guidance committee comprised Dr. Sandeep Kulkarni, Dr. Betty Cheng, and Dr.
Charles Ofria, and I am grateful that I had access to valuable guidance from all three of them. I express my thanks to Dr. Cheng for her critical reading and interesting questions during my defense. I express my thanks to Dr. Ofria for his valuable questions and comments on the practical applications of this thesis. Also, I would like to truly thank our department's graduate director, Dr. Abdol-Hossein Esfahanian, for offering financial support through teaching assistantships for several semesters.

It has been a great pleasure to work closely with Ali Ebnenasir. He was the most accessible source for understanding the concepts of program synthesis and automatic addition of fault-tolerance. He also co-authored two papers with me. It is impossible to mention his innumerable contributions towards my work. My special thanks go to my colleagues Laura Campbell, Mahesh Arumugam, and Sascha Konrad for their comments and proofreading of my papers. I would like to use this opportunity to thank all my friends on the Michigan State University campus; because of them, my stay here has been very enjoyable and memorable.

Table of Contents

LIST OF FIGURES vii

1 Introduction 1
1.1 Automated Reasoning About Fault-Tolerance 2
1.2 Thesis Contributions 3
1.3 Organization of the Thesis 4

2 Modeling a Framework for Fault-Tolerance and Problem Statement 6
2.1 An Introduction to PVS 7
2.1.1 The PVS Specification Language 7
2.1.2 The PVS Typechecker 8
2.1.3 The PVS Theorem Prover 9
2.2 Program
10
2.3 Specification 12
2.4 Faults and Fault-Tolerance 13
2.5 Problem Statement 16

3 A Theory for Fixpoint Calculations on Finite Sets 19
3.1 Calculating the Largest Fixpoint 21
3.1.1 Specification of the Largest Fixpoint Calculation 21
3.1.2 Verification of the Properties of the Largest Fixpoint 23
3.2 Calculating the Smallest Fixpoint 27
3.2.1 Specification of the Smallest Fixpoint Calculation 28
3.2.2 Verification of the Properties of the Smallest Fixpoint 30

4 Specification and Verification of Algorithms for Synthesis of Fault-Tolerance 32
4.1 Automatic Addition of Failsafe Fault-Tolerance 34
4.1.1 Specification of the Synthesis of Failsafe Fault-Tolerance 34
4.1.2 Verification of the Synthesis of Failsafe Fault-Tolerance 38
4.2 Automatic Addition of Nonmasking Fault-Tolerance 47
4.2.1 Specification of the Synthesis of Nonmasking Fault-Tolerance 47
4.2.2 Verification of the Synthesis of Nonmasking Fault-Tolerance 49
4.3 Automatic Addition of Masking Fault-Tolerance 51
4.3.1 Specification of the Synthesis of Masking Fault-Tolerance 51
4.3.2 Verification of the Synthesis of Masking Fault-Tolerance 56
4.4 Adding Fault-Tolerance in the Low Atomicity Model 60

5 Specification and Verification of Automatic Synthesis of Multitolerance 62
5.1 Faults, Fault-Tolerance, and Multitolerance 63
5.2 The Problem of Mechanical Verification of Automatic Synthesis of Multitolerance 65
5.3 Specification and Verification of Synthesis of Nonmasking-Masking Multitolerance
66
5.4 Specification and Verification of Synthesis of Failsafe-Masking Multitolerance 70

6 Related Work 74
6.1 Challenges in Automatic Synthesis of Programs 75
6.2 Formal Verification of Fault-Tolerant Architectures 78
6.3 Mechanical Verification of Fault-Tolerance Properties of Programs 81
6.3.1 Formal Verification of Fault-Tolerant Embedded Digital Systems 81
6.3.2 Formal Verification of Fault-Tolerant Algorithms 82
6.3.3 Formal Verification of Abstractions for Fault-Tolerance 83

7 Discussion 86
8 Conclusion and Future Work 91

APPENDICES 96
A Formal Specification of the Fault-Tolerance Framework 97
B Formal Specification of the Fixpoint Theory 101
C Formal Specification of the Synthesis of Failsafe Fault-Tolerance 106
D Formal Specification of the Synthesis of Nonmasking Fault-Tolerance 112
E Formal Specification of the Synthesis of Masking Fault-Tolerance 114
F Formal Specification of the Synthesis of Multitolerance 120

LIST OF REFERENCES 125

List of Figures

3.1 The relationship between g, x, and x − g(x) 22
3.2 Largest fixpoint calculation 22
3.3 The relationship between r, x, and x ∪ r(x) 28
3.4 Smallest fixpoint calculation 29
4.1 The synthesis algorithm for adding failsafe fault-tolerance 35
4.2 The synthesis algorithm for adding nonmasking fault-tolerance 48
4.3 The synthesis algorithm for adding masking fault-tolerance 52
5.1 The synthesis algorithm for adding nonmasking-masking multitolerance 66
5.2 The synthesis algorithm for adding failsafe-masking multitolerance 70
Chapter 1

Introduction

Automated logical reasoning about computer systems, widely known as formal methods, has been successful in a number of domains. Proving properties of computer instruction sets is perhaps the most established application of formal methods in recent years [1]. Another application of formal methods is in the verification of critical systems, where strong assurance is required to gain confidence in the correctness of computer systems such as avionic and air traffic control systems.

For decades, mathematicians have worked on techniques for verifying that algorithms have the properties they are expected to have. While traditional mathematical techniques are valuable in this context, they are prone to human error. Hence, many researchers have focused on computer-aided verification techniques to build frameworks for automated logical reasoning about computer systems. Such computer-aided verification techniques allow us to obtain more confidence than manual proofs.

Automated proof systems fall into two broad categories [1]. The first can be seen as an extension of testing, where automated support is exploited to create tests that include all possible cases, thus providing a proof of correctness. Model checkers fall in this category. The second can be seen as a formalization of mathematics, where logic is used to characterize mathematical reasoning and automated formal support is used to aid the creation and checking of proofs. Theorem provers fall in this category. In this thesis, we focus on using theorem proving techniques to reason about properties of automatic synthesis of fault-tolerant programs.
1.1 Automated Reasoning About Fault-Tolerance

Fault-tolerance is the ability of programs to provide a desirable level of functionality in the presence of faults [2]. It is a necessity in most computer systems and, hence, one needs strong assurance of the fault-tolerance properties of a given system. Mechanical verification of such systems is one way to obtain a strong form of assurance. The related work in the literature has focused on verification of concrete fault-tolerant programs. For example, Owre, Rushby, Shankar, and Henke [3] present a mechanical verification of a fault-tolerant digital flight-control system. Mantel and Gärtner verify the correctness of a fault-tolerant broadcast protocol [4]. Qadeer and Shankar [5] mechanically verify the self-stabilization property of Dijkstra's mutual exclusion token ring algorithm [6]. Kulkarni, Rushby, and Shankar [7] verify the same algorithm by exploiting the theory of detectors and correctors [2].

While the verifications performed in [3-5, 7] enable us to gain confidence in the programs being verified, it is difficult to extend these verifications to other programs. A more general approach, therefore, is to verify the algorithms that generate fault-tolerant programs.

With this motivation, in this thesis, we focus on the problem of verifying algorithms that synthesize fault-tolerant programs. With such verification, we are guaranteed that all the programs generated by the synthesis algorithms indeed satisfy their fault-tolerance requirements. Towards this end, we verify the transformation algorithms presented by Kulkarni and Arora in [8, 9], and Kulkarni and Ebnenasir
in [10], using the PVS theorem prover. The algorithms in [8, 9] focus on the problem of transforming a given fault-intolerant program into a fault-tolerant program. The algorithms in [10] focus on the problem of transforming a given fault-intolerant program into a multitolerant program, where the program is subject to multiple classes of faults. To verify these algorithms, we first model a framework for fault-tolerance in PVS. This framework consists of definitions for programs, specifications, faults, and levels of fault-tolerance. Then, we verify that the programs synthesized by the algorithms are indeed fault-tolerant. By this verification, we ensure that any program synthesized by these algorithms is correct by construction and, hence, there is no need to verify the individual synthesized programs.

We note that the algorithms in [8, 9] are the basis for their extensions that deal with simultaneous occurrence of faults of multiple types [10] and with distributed programs [11, 12]. Thus, the verification of the transformation algorithms in [8, 9] is helpful in the verification of the algorithms in [10-12]. Since fixpoint calculation is at the heart of the synthesis algorithms, we also develop a theory for fixpoint calculations on finite sets in PVS. This theory is reusable for other formalizations that involve fixpoint calculations over finite sets as well.

1.2 Thesis Contributions

In this thesis, we concentrate on mechanical verification of automatic synthesis of fault-tolerant and multitolerant programs. The contributions of this thesis are as follows:

• We provide a foundation and a formal framework for modeling fault-tolerance, where programs are subject to a single class of faults, and multitolerance, where programs are subject to multiple classes of faults. This framework is designed abstractly in such a way that it is independent of specific programs and platforms.

• We develop a theory in PVS for fixpoint calculations over finite sets.
This theory is expected to be reusable elsewhere.

• We verify the correctness of the synthesis algorithms in [8, 9] for automatic addition of fault-tolerance, so that not only do we ensure their correctness, but we also guarantee that any program synthesized by the algorithms is correct by construction.

• We verify the correctness of the synthesis algorithms in [10] for automatic addition of multitolerance, so that not only do we ensure their correctness, but we also guarantee that any program synthesized by the algorithms is correct by construction.

1.3 Organization of the Thesis

The organization of the thesis is as follows. In Chapter 2, we first present an introduction to PVS. Then, we provide the formal definitions of programs, specifications, faults, and fault-tolerance, and we formally state the problem of mechanical verification of synthesis of fault-tolerant programs. In Chapter 3, we develop and verify a theory for fixpoint calculations over finite sets. In Chapter 4, based on the formal framework developed in Chapter 2 and the fixpoint calculation theory, we formalize and verify the synthesis algorithms proposed in [8] in PVS. In Chapter 5, we first state the transformation problem for synthesizing multitolerant programs. Then, we present how we extend our formal framework so that it can deal with simultaneous occurrence of faults from multiple classes. Based on the extended framework, we then mechanically verify the algorithms for synthesizing multitolerant programs. In Chapter 6, we present the related work on the problem of program synthesis and mechanical verification of fault-tolerance. In Chapter 7, we discuss different aspects of the problem of verification of synthesis of fault-tolerant programs and answer questions that have been raised about this work. Finally, we make concluding remarks and discuss future work in Chapter 8.
Chapter 2

Modeling a Framework for Fault-Tolerance and Problem Statement

In this chapter, we describe how we develop a general framework to model fault-tolerance.¹ More specifically, we give formal definitions of the elements of a fault-tolerance framework, namely programs, specifications, faults, and fault-tolerance. Based on the definitions of these elements, we formalize the synthesis algorithms of [8] in Chapter 4. Note that the definitions must be extensible, so that we can reuse them as a basis for formalizing the definitions regarding multiple classes of faults and multitolerance in Chapter 5. In addition, we also discuss how we model the definitions in PVS at an abstract level, so that they are independent of any particular program and platform.

In our framework, programs are specified in terms of their state space and their transitions. The definition of specifications is adapted from Alpern and Schneider [13]. The definitions of faults and fault-tolerance are adapted from Arora and Gouda [14] and Kulkarni [2]. We give the definition of multitolerance and its related concepts in Chapter 5 separately.

Before getting into the details of the formalization of fault-tolerance, we give an introduction to the PVS theorem prover. More specifically, we introduce the terminology that we use throughout the thesis. The reader may skip Section 2.1 if (s)he is familiar with PVS.

2.1 An Introduction to PVS

In this section, we give a brief introduction to PVS and its syntax and semantics, drawn from the PVS user manuals [15-17]. PVS (Prototype Verification System) provides an integrated environment for the development and analysis of formal specifications, and it supports a range of activities involved in creating, analyzing, modifying, managing, and documenting theories and proofs [16]. PVS runs on SUN 4 SPARC workstations using Solaris 2 or higher and on PC systems running Red Hat Linux.

¹ Appendix A contains the formal specification of this framework.
PVS is implemented in Common Lisp, but it is not necessary to know Lisp to use the system effectively. The Emacs editor provides the interface to PVS.

2.1.1 The PVS Specification Language

The specification language of PVS is built on higher-order logic [17]. Thus, functions can take functions as arguments and return them as values, and quantification can be applied to function variables. There is a set of built-in types and type constructors, as well as the notion of subtypes. We make extensive use of higher-order definitions in formalizing our framework for fault-tolerance.

Specifications are logically organized into parameterized, importable theories and datatypes. In this thesis, every synthesis algorithm is modeled in a separate theory. We also have two more theories formalizing our general framework for fault-tolerance and fixpoint calculations over finite sets.

Now, we briefly review the elements and concepts of the PVS language that we use in developing our specifications.

Types. Semantically, there are four kinds of types in PVS [17]:

• Uninterpreted types support abstraction by providing a means of introducing a type with a minimum of assumptions on the type, imposing almost no constraints on an implementation of the specification.

• Interpreted types are primarily a means for naming type expressions. Integers, Booleans, etc., are interpreted types.

• Dependent types may be defined in terms of an earlier defined type.

• Subtypes introduce a set of elements that is a subset of the set of elements of the supertype.

Our PVS specifications are founded on propositional logic, set theory, and the theory of infinite sequences. These are built-in theories, accompanied by a rich set of lemmas and theorems in the PVS prelude, that assist us in proving our theorems.

2.1.2 The PVS Typechecker

The PVS typechecker analyzes theories for semantic consistency and adds semantic information to the internal representation built by the parser [16].
The type system of PVS is not algorithmically decidable; i.e., theorem proving may be required to establish the type-consistency of a PVS specification. Hence, in order to decide type correctness, PVS needs assistance from the developer. The proof obligations that need to be proved are called Type-Correctness Conditions (TCCs).

In many cases, when a PVS specification contains similar definitions, the typechecker may generate the same proof obligations. Judgements provide a means for controlling this by allowing the developer to prove a proof obligation once and reuse the formal proof to discharge the same TCCs. This kind of judgement is called a constant judgement. There is another kind of judgement, the subtype judgement, which is useful for discharging TCCs generated for subtypes. In this thesis, since we have no subtypes in our specifications, we only state and prove constant judgements.

2.1.3 The PVS Theorem Prover

PVS provides an interactive proof checker (prover) with the ability to store and replay proofs. PVS can be instructed to perform a single proof, to rerun all the proofs in a given theory, all the proofs of all the lemmas used in a proof, or all the proofs in an entire specification. The prover maintains a proof tree, and it is the goal of the user to construct a proof tree that is complete, in the sense that all of the leaves are recognized as true. Each node of the proof tree is a proof goal that follows from its offspring nodes by means of a proof step.

The PVS proof checker employs a sequent calculus. Each proof goal of the proof tree is a sequent. Each sequent consists of a sequence of formulas called antecedents and a sequence of formulas called consequents. The intuitive interpretation of a sequent is that the conjunction of the antecedents implies the disjunction of the consequents:

(A1 ∧ A2 ∧ A3 ∧ …) ⇒ (B1 ∨ B2 ∨ B3 ∨ …)
Now, we briefly describe some of the terminology that we use throughout this thesis for the verification of our theorems:

• Since our specifications contain many recursive definitions, induction is our basic tool for reasoning about the properties of those definitions.

• Skolemization is the way to remove universal quantifiers from consequents and existential quantifiers from antecedents.

• Instantiation is the way to remove existential quantifiers from consequents and universal quantifiers from antecedents. For instance, if we are to prove

(∀x : f(x) = x + 1) ⇒ (f(p) = p + 1)

for some constant p, all we need to do is instantiate x with p.

• Case analysis of the current sequent splits the conjunctions in the consequent and the disjunctions in the antecedent into separate subgoals. For instance, (A ⇒ B ∧ C) ≡ (A ⇒ B) ∧ (A ⇒ C).

• The rule of extensionality converts a set equivalence to a propositional equivalence. For instance, A ⊆ B ≡ ∀x : (A(x) ⇒ B(x)).

• We use propositional and arithmetic simplifications to simplify sequents containing solvable propositional and arithmetic formulas.

• The GRIND strategy performs skolemization and instantiation, propositional simplification, rewriting using lemmas as rewrite rules, definition expansion, and explicit case analysis according to the case structure in the goal, and it performs many of these steps repeatedly until no further simplification is possible [15].

• The GROUND command invokes propositional simplification followed by arithmetic simplification; it is useful for obtaining simplified forms of the cases arising from propositional simplification [15].

2.2 Program

A program p is a finite set of transitions in its state space. In a program-specific approach, a state of a program p is obtained by assigning each variable of p a value from its domain.
However, as we do not deal with a specific program, in our formal specification we model the notion of state by an UNINTERPRETED TYPE in PVS (cf. Section 2.1.1). Likewise, a transition is modeled as a pair of states, which is also defined as an uninterpreted type. The programs of our interest are finite-state programs. Hence, we assume that the numbers of states and transitions are finite. We model this by two ASSUMPTIONS in PVS expressing that the types state and transition are finite.

The state space of p, denoted Sp, is the set of all possible states of p. In PVS, we model the state space by the fullset of the type state. We define the following JUDGEMENT to avoid getting repetitive type-checking proof obligations from the PVS typechecker (cf. Section 2.1.2):

Judgement 2.1: Sp is a finite set.

Proof: We discharge the judgement by considering the assumption that the type state is finite and applying the built-in finite_full lemma from the PVS prelude. The finite_full lemma proves that the fullset of a finite type is a finite set.

The type Action denotes finite sets of transitions. We model a program p by a subset of Sp × Sp. This means that p can be any constant subset of Sp × Sp and, hence, a program can be any set of transitions in its state space. A state predicate of p is a subset of Sp. In PVS, we model a state predicate, StatePred, by a finite set over states. This abstraction in the definitions of program and state predicate is necessary to ensure that the verifications are correct for any program synthesized by the algorithms verified in chapters 4 and 5.

A state predicate S is closed in the program p if and only if for all transitions (s0, s1) in p, if s0 ∈ S then s1 ∈ S. Hence, we define closure as follows:

closed(S, p) = (∀ s0, s1 | (s0, s1) ∈ p : (s0 ∈ S ⇒ s1 ∈ S)).

A sequence of states, (s0, s1, ...), is a computation of p if and only if any pair
We formalize this by a DEPENDENT TYPE as follows: Computation(Z) : TYPE = {c : sequence[state] I (Vi I i 2 0 : (oi, c,-+1) E Z)} where sequence[state] : N —> state and Z is any finite set from type of Action. A computation prefix is a sequence of states, where the first j steps are transitions in the given program: prefirr(Z,j) : TYPE 2 {c2 sequence[state] I (Vi I i Vm = (Cm+jacm+j+1) E 19- After removing the universal quantifier in the consequent, we manually instantiate n with j + m. Then, a propositional simplification discharges the theorem. Discussion. Throughout this chapter, we modeled all the elements of our formal framework abstractly. For instance, states and transitions are modeled as uninter- preted types, and program, faults, and the safety specification are modeled by ar- bitrary sets of transitions. Also, state predicate and, thus, program invariant are modeled by arbitrary subsets of the state space. This abstract modeling gives us the freedom to claim that any programs synthesized by the algorithms in chapters 4 and 5 are correct by construction and, hence, there is no need to verify the synthesized programs individually. 2.5 Problem Statement In this section, we recall (from [8]) the problem of automatic synthesis of fault- tolerance and we formally state the problem of mechanical verification of synthesis of fault-tolerant programs. As described earlier in this chapter, the fault-intolerant pro- gram is specified in terms of its state space Sp, its transitions, p, and its invariant, S. The specification provides a set of bad transitions (that should not occur in program computation). The faults, f, are specified in terms of a finite set of transitions. Like- wise, the fault-tolerant program is specified in terms of its state space Sp, its set of transitions, p’, its invariant, 3’, its specification, spec, and the type of fault-tolerance it provides. 16 Now. ' p, Note th alone to I" the same I that p sati: invariants . 
Now, we explain what it means for a fault-tolerant program p' to be derived from p. Note that this derivation is based on the premise that p' is obtained by adding fault-tolerance alone to p. Hence, we should be able to prove that in the absence of faults p' has the same behavior as p. Specifically, p' should satisfy spec from S' by simply using the fact that p satisfies spec from S. To ensure this, we consider the relation between (1) the invariants S and S', and (2) the transitions of p and p'.

• If S' contains states that are not in S then, in the absence of faults, p' will include computations that start outside S and, hence, p' contains new behaviors in the absence of faults. Therefore, we require that S' ⊆ S.

• Regarding the transitions of p and p', we focus only on the transitions of p'|S' and p|S'. If p'|S' contains a transition that is not in p|S', then p' can use this transition in a new computation in the absence of faults and, hence, we require that p'|S' ⊆ p|S'.

Based on the above two requirements, we state the transformation problem as follows (this definition will be instantiated in the obvious way depending upon the level of tolerance):

The Transformation Problem
Given p, S, spec, and f such that p satisfies spec from S.
Identify p' and S' such that:
• S' ⊆ S
• (p'|S') ⊆ (p|S')
• p' is f-tolerant to spec from S'

In order to define soundness and completeness in the context of the transformation problem, we define the corresponding decision problem (likewise, this definition will be instantiated in the obvious way depending upon the level of tolerance):

The Decision Problem
Given p, S, spec, and f such that p satisfies spec from S.
Does there exist p' and S' such that:
• S' ⊆ S
• (p'|S') ⊆ (p|S')
• p' is f-tolerant to spec from S'?

Soundness.
An algorithm for the transformation problem is sound if and only if for any given input, its output, namely p' and S', satisfies the transformation problem.

Completeness. An algorithm for the transformation problem is complete if and only if for any given input, if the answer to the decision problem is "yes", then the algorithm always finds a program p' and a state predicate S'.

In Chapter 4, our goal is to mechanically verify that the proposed algorithms in [8] are indeed sound and complete, using the PVS theorem prover. More specifically, based on the definitions in this chapter, we show that the algorithms in [8] satisfy the transformation problem. Also, we verify that the algorithms for adding failsafe and nonmasking fault-tolerance are complete.

In Chapter 5, we redefine the transformation problem for the synthesis of multitolerant programs separately. Based on the new definition, we present a new problem statement for mechanical verification of automatic synthesis of multitolerance.

Chapter 3

A Theory for Fixpoint Calculations on Finite Sets

In this chapter, we present a theory for calculating fixpoints of formulas over finite sets.¹ This theory is essential in the verification of the synthesis algorithms in chapters 4 and 5. For instance, the essence of adding failsafe and masking fault-tolerance to a given fault-intolerant program is recalculation of the invariant of the fault-intolerant program, which in turn involves calculating the fixpoint of a formula. More specifically, we calculate the fixpoint of a given formula to (i) calculate the set of states from where safety may be violated by faults alone; (ii) remove the set of states from where closure of the fault-span is violated by fault transitions; and (iii) remove deadlock states that occur in a given state predicate.

The μ-calculus theory of the PVS prelude contains general definitions of the standard fixpoint calculation; however, it is not convenient to use that theory in the context of our problem.
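Use (i) above — computing the states from where faults alone may violate safety — is a typical smallest-fixpoint calculation: start from the offending states and keep adding fault-predecessors until nothing changes. The following Python sketch only illustrates that iteration; the fault set f and the initial set bad are hypothetical stand-ins, not taken from the thesis.

```python
# Illustrative sketch (Python, not PVS) of use (i): the set of states
# from where the fault transitions alone can reach a bad state. The
# sets 'f' and 'bad' below are invented for this example.

def states_reaching_bad(bad, f):
    """Smallest superset of 'bad' closed under fault-predecessors."""
    ms = set(bad)
    while True:
        grow = {s0 for (s0, s1) in f if s1 in ms} - ms
        if not grow:       # no new states: a fixpoint has been reached
            return ms
        ms |= grow

f = {(0, 1), (1, 2), (4, 3)}       # toy fault transitions
assert states_reaching_bad({2}, f) == {0, 1, 2}
assert states_reaching_bad({3}, f) == {3, 4}
```

Because the state space is finite and the set only grows, the loop is guaranteed to terminate.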
This is due to the fact that the prelude library focuses on infinite sets and is not specialized to account for the properties of the functions used in synthesis of fault-tolerant programs. By contrast, we find that by customizing the

¹Appendix B contains the formal specification of this theory.

theory to the properties of functions used in synthesis of fault-tolerant programs, we can simplify the verification of the synthesis algorithms. Hence, in this chapter, we develop a theory for fixpoint calculations on finite sets. We also state and verify its properties. This library is expected to be reusable for other formalizations that involve fixpoint calculations on finite sets as well. Based on the definitions in this chapter, we model the synthesis algorithms for addition of fault-tolerance in chapters 4 and 5. Then, using the properties of smallest and largest fixpoints, verified in this chapter, we verify the correctness of the algorithms.

A fixpoint of a function f : X → X is any value x0 ∈ X such that f(x0) = x0. In other words, further application of f does not change its value. For instance, x0 = 1 is a fixpoint of f(x) = x² − 2x + 2. A function may have more than one fixpoint. For example, every x0 in the domain of f(x) = x is a fixpoint. The greatest lower bound of the fixpoints is called the smallest fixpoint, and the least upper bound of the fixpoints is called the largest fixpoint. In our context, the functions whose fixpoints are calculated exhibit certain characteristics. Hence, as described above, we focus on customizing the fixpoint theory based on these characteristics. In the context of finite sets, the domain and range of f, namely X, are both finite sets of finite sets.

Throughout this chapter, the variables i, j, k range over natural numbers. The variable x is any finite set of any uninterpreted type, and the variable b is any member of x.
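The definition can be illustrated directly; a minimal sketch (ours, in Python rather than PVS; the example polynomial is chosen for illustration):

```python
def is_fixpoint(f, x0):
    """x0 is a fixpoint of f exactly when applying f leaves it unchanged."""
    return f(x0) == x0

# f(x) = x**2 - 2*x + 2 has the two fixpoints 1 and 2,
# while the identity function has infinitely many.
```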
The organization of this chapter is as follows: in Section 3.1, we present how to formalize and verify the properties of the largest fixpoint calculation. We describe the same issues for the smallest fixpoint calculation in Section 3.2.

3.1 Calculating the Largest Fixpoint

As mentioned before, a function may have several fixpoints. The least upper bound of the fixpoints is called the largest fixpoint. We recall that we study the concept of fixpoint calculations in the context of finite sets. In Section 3.1.1, we describe how we formalize the calculation of the largest fixpoint in the PVS specification language. Then, in Section 3.1.2, we verify the properties of the largest fixpoint.

3.1.1 Specification of the Largest Fixpoint Calculation

One type of function used in synthesis of fault-tolerance is a decreasing function, for which the largest fixpoint is calculated. Given are an initial finite set x and a function g that calculates a subset of x. The fixpoint calculation procedure removes the states in g(x) from x and repeats this removal recursively until g(x) becomes the empty set. The function g may represent a certain property. For example, it may calculate the set of deadlock states of a given state predicate. We formalize the procedure of calculating the largest fixpoint of a formula by first defining the type DecFunc for decreasing functions such as g, where:

g : {A : finite set} → {B : finite set | B ⊆ A}.

In other words, for all finite sets x, g(x) ⊆ x (cf. Figure 3.1). With such a decreasing function, we define Dec(i, x)(g) to formalize the recursive behavior of the largest fixpoint calculation. Dec(i, x)(g) keeps removing the elements of the initial set x that the function g of type DecFunc returns at every step:

Dec(i, x)(g) = Dec(i−1, x)(g) − g(Dec(i−1, x)(g))   if i ≠ 0;
Dec(i, x)(g) = x                                     if i = 0.

Figure 3.1: The relationship between g, x, and x − g(x).

Figure 3.2: Largest fixpoint calculation.
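The remove-until-empty procedure can be mirrored executably. The following Python sketch is our illustration (the set encoding and the example transition relation are assumptions, not part of the PVS theory); the example g computes deadlock states, as suggested in the text:

```python
def dec(i, x, g):
    """Dec(i, x)(g): apply the removal step i times, starting from x.
    g must be decreasing: g(A) is a subset of A for every finite set A."""
    if i == 0:
        return x
    prev = dec(i - 1, x, g)
    return prev - g(prev)

def largest_fixpoint(x, g):
    """Keep removing g(x) until g returns the empty set; on a finite
    set this terminates after at most |x| removal steps."""
    while g(x):
        x = x - g(x)
    return x

# Example g in the spirit of the text: the deadlock states of a state
# predicate, i.e. states with no outgoing transition that stays inside it.
transitions = {(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)}
def g(a):
    return {s for s in a
            if not any(u == s and t in a for (u, t) in transitions)}
```

Starting from {1, 2, 3, 4, 5}, state 5 is removed first, which makes 4 a deadlock state in turn; the iteration stabilizes on the cycle {1, 2, 3}.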
Finally, we define the largest fixpoint as follows (cf. Figure 3.2):

LgFix(x)(g) = {b | ∀k : b ∈ Dec(k, x)(g)}

Our goal is to prove the following properties of the largest fixpoint based on the above definitions:

• g(LgFix(x)(g)) = ∅
• LgFix(x)(g) = LgFix(LgFix(x)(g))(g)

Intuitively, the first property means that further application of the Dec function does not change the value of the largest fixpoint. The second property means that LgFix is indeed the largest fixpoint. We prove the above properties in theorems 3.7 and 3.8 in Section 3.1.2.

Remark. The above definition of fixpoint, as compared to the definition of the least fixpoint in λ-calculus [18] and μ-calculus, is somewhat non-traditional. We find that this definition assists in the verification of the synthesis algorithms. For example, we apply this fixpoint calculation for removing deadlock states, where g(x) denotes the deadlock states in set x. After calculating the largest fixpoint, we need to show that no deadlock states remain in the set x. Thus, we should show g(LgFix(x)) = ∅. Moreover, if g(LgFix(x)) = ∅ then ∀i : Dec(i, LgFix(x)) = LgFix(x).

3.1.2 Verification of the Properties of the Largest Fixpoint

In this section, we first develop and prove a chain of lemmas. We use these lemmas to prove the properties of the largest fixpoint of a formula in theorems 3.7 and 3.8. We use these theorems extensively to verify the soundness and completeness of the synthesis algorithms in chapters 4 and 5.

Lemma 3.1: Any application of the Dec function decreases the initial set until it becomes the largest fixpoint. Formally,
(j < k) ⇒ (Dec(k, x)(g) ⊆ Dec(j, x)(g))
Proof. The INDUCT-AND-SIMPLIFY strategy discharges the lemma. □

Lemma 3.2: Until the fixpoint is achieved, the cardinality of Dec(j+1, x) is less than or equal to |x| − j − 1. Formally,
∀j : ((g(Dec(j, x)(g)) ≠ ∅) ⇒ (|Dec(j+1, x)(g)| ≤ |x| − j − 1))
Proof. We prove this lemma by induction on j.
In the base case, j = 0, after eliminating the quantifiers and expanding the definitions, we need to show that if g(x) is nonempty then |x − g(x)| ≤ |x| − 1. We prove this by using two predefined lemmas in the PVS prelude:

card_diff_subset: ∀y, z : finite set : ((y ⊆ z) ⇒ (|z − y| = |z| − |y|)), and
nonempty_card: ∀y : finite set : (y ≠ ∅ ⇔ |y| > 0).

After instantiations, using the facts g(x) ⊆ x and g(x) ≠ ∅, the GRIND strategy discharges the base case. For the induction step, after eliminating quantifiers and expanding definitions, we need to prove:
(g(Dec(j+1, x)(g)) ≠ ∅ ∧ |Dec(j+1, x)(g)| ≤ |x| − j − 1) ⇒ (|Dec(j+1+1, x)(g)| ≤ |x| − (j+1) − 1).
We discharge the induction step in the same way that we proved the base case. □

Lemma 3.3: If the fixpoint is reached by step j then in any subsequent step the fixpoint is maintained. Formally,
∀j : ((g(Dec(j, x)(g)) = ∅) ⇒ (∀k | k ≥ j : g(Dec(k, x)(g)) = ∅))
Proof. After skolemization to remove the universal quantifier, we proceed by induction on k. The base case, k = j = 0, is trivially true. In the induction step, we need to prove:
(g(Dec(k, x)(g)) = ∅) ⇒ (g(Dec(k+1, x)(g)) = ∅).
By expanding the definition of Dec in the consequent, Dec(k+1, x)(g) = Dec(k, x)(g) − g(Dec(k, x)(g)), and considering the antecedent we infer that g(Dec(k, x)(g)) = ∅; therefore g(Dec(k+1, x)(g)) = g(Dec(k, x)(g)), which is equal to the empty set. □

Lemma 3.4: There exists a step i such that all subsequent applications of g return the empty set. Formally,
∃i : (∀n | n ≥ i : g(Dec(n, x)(g)) = ∅)
Proof. First, we instantiate i with |x|. Then, after skolemization, we need to prove g(Dec(n, x)(g)) = ∅. Using Lemma 3.2 and instantiating j with |x|, we need to show two subgoals:
Subgoal 1: |Dec(|x|+1, x)(g)| > |x| − |x| − 1, which is trivially true.
Subgoal 2: (g(Dec(|x|, x)(g)) = ∅) ⇒ (g(Dec(n, x)(g)) = ∅). From Lemma 3.3, we know ∀j : ((g(Dec(j, x)(g)) = ∅) ⇒ (∀k | k ≥ j : g(Dec(k, x)(g)) = ∅)).
After automatic instantiations, we need to prove:
(∀k | k ≥ |x| : g(Dec(k, x)(g)) = ∅) ⇒ (g(Dec(n, x)(g)) = ∅).
Manual instantiation of k with n discharges the lemma. □

Lemma 3.5: There exists a step j where the fixpoint is achieved. Formally,
∃j : (∀k | k ≥ j : ((Dec(k, x)(g) = Dec(j, x)(g)) ∧ (g(Dec(k, x)(g)) = ∅)))
Proof. The proof of the second conjunct is exactly the same as the proof of Lemma 3.4, so we proceed with the proof of the first conjunct. From Lemma 3.4, we know the existence of a j such that ∀k | k ≥ j : g(Dec(k, x)(g)) = ∅. Using Lemma 3.4 and after skolemization, we proceed by induction on k. In the base case, k = j = 0, we need to show Dec(0, x)(g) = Dec(j, x)(g), which is trivially true. In the induction step, we need to prove:
∀i | i ≥ j : ((Dec(i, x)(g) = Dec(j, x)(g) ∧ g(Dec(i, x)(g)) = ∅) ⇒ (Dec(i+1, x)(g) = Dec(j, x)(g)))
We prove this by applying the rule of extensionality and expanding Dec(i+1, x)(g), which is equal to Dec(i, x)(g) − g(Dec(i, x)(g)). As g(Dec(i, x)(g)) = ∅, Dec(i+1, x)(g) = Dec(i, x)(g) = Dec(j, x)(g) and the proof is complete. □

Lemma 3.6: For some value j, Dec(j, x) reaches a fixpoint, and at this step the value of Dec(j, x) is equal to the largest fixpoint. Formally,
∃j : ((g(Dec(j, x)(g)) = ∅) ∧ (Dec(j, x)(g) = LgFix(x)(g)))
Proof. As in the proof of Lemma 3.5, the proof of the first conjunct is the same as the proof of Lemma 3.4. To prove the second conjunct, first, we apply the rule of extensionality to convert the set equalities to boolean equalities. Then, a propositional split generates two subgoals:
Subgoal 1: ∀b ∈ LgFix(x)(g) : b ∈ Dec(j, x)(g). First, we expand the definition of LgFix = {b | ∀k : b ∈ Dec(k, x)(g)} in the antecedent. Then, instantiating k with j proves the subgoal.
Subgoal 2: ∀b ∈ Dec(j, x)(g) : b ∈ LgFix(x)(g).
To verify this subgoal, after expanding the definition of LgFix and eliminating the universal quantifier by skolemization, we need to show: ∀b ∈ Dec(j, x)(g) : b ∈ Dec(k, x)(g). Using Lemma 3.5, we know:
∀i | i ≥ j : (Dec(i, x)(g) = Dec(j, x)(g) ∧ g(Dec(i, x)(g)) = ∅).
We instantiate i with k and, by propositional simplification through the GROUND command, we prove this subgoal. □

Theorem 3.7: Application of the function g to the largest fixpoint of a finite set returns the empty set. Formally,
g(LgFix(x)(g)) = ∅
Proof. Using Lemma 3.6, the GRIND strategy completes the proof. □

Theorem 3.8: The largest fixpoint of the largest fixpoint of a function is equal to the largest fixpoint. Formally,
LgFix(x)(g) = LgFix(LgFix(x)(g))(g)
Proof. By applying the rule of extensionality, we convert the set equality to a boolean equality. Now we need to prove two subgoals:
Subgoal 1: ∀b : (b ∈ LgFix(LgFix(x)(g))(g) ⇒ b ∈ LgFix(x)(g)). After eliminating the universal quantifier, we expand the definition of the first LgFix in the antecedent, which becomes ∀k : b ∈ Dec(k, LgFix(x)(g))(g). Then, after manual instantiation of k with 0, the GRIND strategy discharges the subgoal.
Subgoal 2: We need to show ∀b : (b ∈ LgFix(x)(g) ⇒ b ∈ LgFix(LgFix(x)(g))(g)). We expand the definition of the first LgFix in the consequent. Now we need to prove:
∀b : (b ∈ LgFix(x)(g) ⇒ ∀k : b ∈ Dec(k, LgFix(x)(g))(g)).
We proceed by induction on k. In the base case, k = 0, we need to show: ∀b : (b ∈ LgFix(x)(g) ⇒ b ∈ Dec(0, LgFix(x)(g))(g)). We prove this by expanding the definition of Dec(0, LgFix(x)(g))(g), which is equal to LgFix(x)(g). In the induction step, we need to show:
∀b : ((b ∈ LgFix(x)(g) ∧ b ∈ Dec(j, LgFix(x)(g))(g)) ⇒ b ∈ Dec(j+1, LgFix(x)(g))(g))
After eliminating the quantifiers, we expand the definition of Dec(j+1, ...) = Dec(j, ...) − g(Dec(j, ...)) in the consequent.
From Lemma 3.3, we know:
∀i : ((g(Dec(i, x)(g)) = ∅) ⇒ (∀k | k ≥ i : g(Dec(k, x)(g)) = ∅)).
We proceed using this lemma: we instantiate the set x with LgFix(x)(g) and i with 0. This generates another two subgoals:
Subgoal 2.1: ∀k : ((g(Dec(k, LgFix(x)(g))(g)) = ∅) ⇒ (g(Dec(j, LgFix(x)(g))(g)) = ∅)). Instantiating k with j and rewriting the definitions discharges the subgoal.
Subgoal 2.2: g(Dec(0, LgFix(x)(g))(g)) = ∅. Towards this end, we expand the definition of Dec(0, LgFix(x)(g))(g), which is equal to LgFix(x)(g). From Theorem 3.7, we know that g(LgFix(x)(g)) = ∅. Proving this subgoal completes the proof. □

3.2 Calculating the Smallest Fixpoint

As mentioned earlier in this chapter, the greatest lower bound of the fixpoints is called the smallest fixpoint. In Section 3.2.1, we describe how we formalize the calculation of the smallest fixpoint in the PVS specification language. Then, in Section 3.2.2, we verify the properties of the smallest fixpoint.

Figure 3.3: The relationship between r, x, and x ∪ r(x).

3.2.1 Specification of the Smallest Fixpoint Calculation

The second type of fixpoint used in fault-tolerance synthesis is that of an increasing function, for which the smallest fixpoint is calculated. Given are an initial finite set x and a function r that calculates a finite set disjoint from x. Calculation of the smallest fixpoint involves adding the states in r(x) to x recursively until r(x) becomes the empty set. The function r may represent a certain property. For instance, it may calculate the set of states that are reachable from x. We formalize the procedure of calculating the smallest fixpoint of a formula by first defining the type IncFunc for increasing functions such as r, where:

r : {A : finite set} → {B : finite set | A ∩ B = ∅}.

In other words, for all finite sets x, x ∩ r(x) = ∅ (cf. Figure 3.3).
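The dual, add-until-empty procedure admits the same kind of sketch (again in Python, as our illustration rather than the PVS text; the one-step reachability function is an assumed example):

```python
def inc(i, x, r):
    """Inc(i, x)(r): apply the addition step i times, starting from x.
    r must be increasing: r(A) is disjoint from A for every finite set A."""
    if i == 0:
        return x
    prev = inc(i - 1, x, r)
    return prev | r(prev)

def smallest_fixpoint(x, r):
    """Keep adding r(x) until r returns the empty set; on a finite
    state space this terminates."""
    while r(x):
        x = x | r(x)
    return x

# Example r in the spirit of the text: states reachable from the set in
# one step that are not already members of it.
transitions = {(1, 2), (2, 3), (3, 1), (4, 5)}
def r(a):
    return {t for (u, t) in transitions if u in a} - a
```

Starting from {1}, the iteration adds 2 and then 3, and stabilizes on the set of states reachable from 1.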
With such an increasing function, we define Inc(i, x)(r) to formalize the recursive behavior of the smallest fixpoint calculation. Inc(i, x)(r) keeps adding to the initial set x the elements that the function r of type IncFunc returns at every step:

Inc(i, x)(r) = Inc(i−1, x)(r) ∪ r(Inc(i−1, x)(r))   if i ≠ 0;
Inc(i, x)(r) = x                                     if i = 0.

Figure 3.4: Smallest fixpoint calculation.

Finally, we define the smallest fixpoint as follows (cf. Figure 3.4):

SmFix(x)(r) = {b | ∃k : b ∈ Inc(k, x)(r)}

Our goal is to prove the following properties of the smallest fixpoint based on the above definitions:

• r(SmFix(x)(r)) = ∅
• SmFix(x)(r) = SmFix(SmFix(x)(r))(r)

Intuitively, the first property means that SmFix is indeed the smallest fixpoint. The second property means that further application of the Inc function does not change the value of the smallest fixpoint. These properties are stated in theorems 3.15 and 3.16.

Remark. The above definition of fixpoint, as compared to the definition of the least fixpoint in λ-calculus and μ-calculus, is somewhat non-traditional. We find that this definition assists in the verification of the synthesis algorithms. For instance, we apply the smallest fixpoint for calculating the reachable states of an initial set from where the safety specification may be directly violated, where r(x) denotes the reverse reachable states of set x. After calculating the smallest fixpoint, we need to show that the smallest fixpoint contains all the states from where the safety specification can be violated in one or more steps of computation.

3.2.2 Verification of the Properties of the Smallest Fixpoint

In this section, similar to the largest fixpoint calculation, we develop and prove a chain of lemmas to verify the properties of the smallest fixpoint in theorems 3.15 and 3.16. We use these theorems extensively to verify the soundness and completeness of the synthesis algorithms in chapters 4 and 5.
The proofs of the mentioned lemmas and of theorems 3.15 and 3.16 are very similar to those of the corresponding lemmas and theorems in Section 3.1.2.

Lemma 3.9: Any application of the Inc function grows the initial set until it becomes the smallest fixpoint. Formally,
(k < j) ⇒ (Inc(k, x)(r) ⊆ Inc(j, x)(r))
Proof. The INDUCT-AND-SIMPLIFY strategy discharges the lemma. □

Lemma 3.10: Until the fixpoint is achieved, the cardinality of the complement of Inc(j+1, x) is less than or equal to |fullset[T]| − |x| − j − 1. Formally,
∀j : ((r(Inc(j, x)(r)) ≠ ∅) ⇒ (|fullset[T] − Inc(j+1, x)(r)| ≤ |fullset[T]| − |x| − j − 1))
Proof. The proof is similar to the proof of Lemma 3.2 in Section 3.1.2. □

Lemma 3.11: If the fixpoint is reached by step j then in any subsequent step the fixpoint is maintained. Formally,
∀j : ((r(Inc(j, x)(r)) = ∅) ⇒ (∀k | k ≥ j : r(Inc(k, x)(r)) = ∅))
Proof. The proof is similar to the proof of Lemma 3.3 in Section 3.1.2. □

Lemma 3.12: There exists a step i such that all subsequent applications of r return the empty set. Formally,
∃i : (∀k | k ≥ i : r(Inc(k, x)(r)) = ∅)
Proof. The proof is similar to the proof of Lemma 3.4 in Section 3.1.2. □

Lemma 3.13: There exists a step j where the fixpoint is achieved. Formally,
∃j : (∀k | k ≥ j : ((Inc(k, x)(r) = Inc(j, x)(r)) ∧ (r(Inc(k, x)(r)) = ∅)))
Proof. The proof is similar to the proof of Lemma 3.5 in Section 3.1.2. □

Lemma 3.14: For some value j, Inc(j, x) reaches a fixpoint, and at this step the value of Inc(j, x) is equal to the smallest fixpoint. Formally,
∃j : ((Inc(j, x)(r) = SmFix(x)(r)) ∧ (r(Inc(j, x)(r)) = ∅))
Proof. The proof is similar to the proof of Lemma 3.6 in Section 3.1.2. □

Theorem 3.15: Application of the function r to the smallest fixpoint of a finite set returns the empty set. Formally,
r(SmFix(x)(r)) = ∅
Proof. The proof is similar to the proof of Theorem 3.7 in Section 3.1.2. □

Theorem 3.16: The smallest fixpoint of the smallest fixpoint of a function is equal to the smallest fixpoint.
Formally,
SmFix(x)(r) = SmFix(SmFix(x)(r))(r)
Proof. The proof is similar to the proof of Theorem 3.8 in Section 3.1.2. □

Chapter 4

Specification and Verification of Algorithms for Synthesis of Fault-Tolerance

In this chapter, we describe the synthesis algorithms in [8] and explain how we formalize and verify them in PVS¹. As mentioned in Chapter 2, we are interested in three levels of fault-tolerance: failsafe, nonmasking, and masking. In this chapter, our focus is on the algorithms that synthesize, in the high atomicity model, fault-tolerant programs that are subject to one and only one class of faults. In this model, a program transition can read all variables as well as write all variables in one atomic step. For this model, in [8], the authors propose deterministic polynomial-time algorithms to synthesize fault-tolerant programs for the three desired levels of fault-tolerance described in Chapter 2. We will also discuss the verification of the nondeterministic synthesis algorithm, in [8], for programs in the low atomicity model, where a program cannot read or write all the variables in one atomic step.

¹Appendices C to E contain the formal specification of the algorithms.

In Section 4.1, we first describe how to formalize and verify the algorithm for synthesis of failsafe fault-tolerant programs. Then, in Section 4.2, we present the formal specification and verification of the addition of nonmasking fault-tolerance. In Section 4.3, we describe how we mechanically verify the addition of masking fault-tolerance. Finally, in Section 4.4, we argue why the verification of the nondeterministic algorithm for synthesizing programs in the low atomicity model is not quite a challenging problem.

The essence of adding failsafe and masking fault-tolerance to a given fault-intolerant program is recalculation of the invariant of the fault-intolerant program, which in turn involves calculating the fixpoint of a formula.
More specifically, we calculate the fixpoint of a given formula to (i) calculate the set of states from where safety may be violated by faults alone; (ii) remove the set of states from where closure of the fault-span is violated by fault transitions; and (iii) remove deadlock states that occur in a given state predicate. The mentioned fixpoint calculations are essential parts of recalculating the invariant and the fault-span to obtain fault-tolerant programs. Once the invariant is calculated, the set of program transitions can be calculated in a straightforward manner. We make extensive use of the theory of fixpoint calculations developed in Chapter 3 to formalize and verify the synthesis algorithms.

Throughout this chapter, the variables x, s, s0, s1 range over states. The variables i, j, k, m range over natural numbers. The variable X ranges over StatePred and the variable Z ranges over Action. As defined in Section 2.5, p and p′ are the fault-intolerant and fault-tolerant programs respectively, S and S′ are the invariants of the fault-intolerant and fault-tolerant programs respectively, T is the fault-span, f is the finite set of faults, and spec is the finite set of bad transitions that represents the safety specification.

4.1 Automatic Addition of Failsafe Fault-Tolerance

In this section, we present the formal specification and verification of the algorithm for adding failsafe fault-tolerance proposed in [8]. Along with the formal specification, we also describe the intuition behind each step of the mentioned algorithm. As for verification, we prove both the soundness and the completeness of the algorithm.

4.1.1 Specification of the Synthesis of Failsafe Fault-Tolerance

Intuitively, the main feature of a failsafe program is that it never violates the safety specification. However, it may not recover to its normal behavior.
The essence of adding failsafe fault-tolerance to a given fault-intolerant program is to remove the states from where safety may be violated by taking one or more fault transitions. We restate the algorithm Add_failsafe (from [8, 9]) in Figure 4.1. As we describe the algorithm, we explain how to formalize it in PVS.

Add_failsafe(p, f : transitions, S : state predicate, spec : specification)
{
  ms := smallest fixpoint(X = X ∪ {s0 | (∃s1 : (s0, s1) ∈ f) ∧ (s1 ∈ X ∨ (s0, s1) violates spec)});
  mt := {(s0, s1) : (s1 ∈ ms) ∨ ((s0, s1) violates spec)};
  S′ := ConstructInvariant(S − ms, p − mt);
  if (S′ = {}) declare no failsafe f-tolerant program p′ exists;
  else p′ := ConstructTransitions(p − mt, S′)
}
ConstructInvariant(S : state predicate, p : transitions)
  // Returns the largest subset of S such that computations of p within that subset are infinite
  return largest fixpoint(X = (X ∩ S) − {s0 | (∀s1 : s1 ∈ X : (s0, s1) ∉ p)})
ConstructTransitions(p : transitions, S : set of states)
{ return p − {(s0, s1) : s0 ∈ S ∧ s1 ∉ S} }

Figure 4.1: The synthesis algorithm for adding failsafe fault-tolerance.

In order to construct ms, the set of states from where safety can be violated by one or more fault transitions, first, we define msInit as the finite set of states from where safety can be violated by taking a single fault transition. Note that (s0, s1) ∈ spec means that (s0, s1) directly violates the safety specification. Formally,

msInit : StatePred = {s0 | ∃s1 : (s0, s1) ∈ f ∧ (s0, s1) ∈ spec}

Now, we define a function, RevReachStates, that calculates the state predicate from where the states of another finite set, rs, are reachable through a fault transition. Formally,

RevReachStates(rs : StatePred) : StatePred = {s0 | ∃s1 : s1 ∈ rs ∧ (s0, s1) ∈ f ∧ s0 ∉ rs}

As mentioned earlier, we use the fixpoint theory developed in Chapter 3 to formalize the synthesis algorithms. Obviously, RevReachStates identifies a finite set that is disjoint from rs. Hence, RevReachStates has the type IncFunc.
The following judgement helps the PVS type checker discharge later proof obligations:

Judgement 4.1: RevReachStates has the type IncFunc.
Proof. The GRIND strategy simply discharges the judgement. □

We use the definition of the smallest fixpoint, developed in Section 3.2.1, to define the state predicate ms. Towards this end, we take msInit as the initial set, x, and RevReachStates as the increasing function, r:

ms : StatePred = SmFix(msInit)(RevReachStates)

Then, we define the finite set of transitions, mt, that must be removed from p. These transitions are either transitions that may lead a computation to reach a state in ms, or transitions that directly violate safety:

mt : Action = {(s0, s1) | s1 ∈ ms ∨ (s0, s1) ∈ spec}

The algorithm Add_failsafe removes the set ms from the invariant S of the fault-intolerant program. It also removes mt from p. This removal may create deadlock states (cf. Section 2.4). The set of deadlock states in ds with respect to program Z is defined as follows:

DeadlockStates(Z)(ds : StatePred) : StatePred = {s0 | s0 ∈ ds ∧ (∀s1 | s1 ∈ ds : (s0, s1) ∉ Z)}

Similar to RevReachStates, we state the following judgement to help the PVS type checker discharge proof obligations in the later theorems:

Judgement 4.2: DeadlockStates(Z) has the type DecFunc.
Proof. The GRIND strategy simply discharges the judgement. □

We construct the invariant of the fault-tolerant program by removing the deadlock states, to ensure that computations of the fault-tolerant program are infinite (cf. Section 2.4).
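Continuing the sketch, DeadlockStates and the invariant construction of Figure 4.1 can be mirrored with the largest-fixpoint iteration (Python, our illustration; the example program stands in for p − mt):

```python
def deadlock_states(Z):
    """DeadlockStates(Z): states of ds with no outgoing Z-transition
    that stays inside ds.  For a fixed Z this is a decreasing function,
    mirroring Judgement 4.2."""
    def g(ds):
        return {s0 for s0 in ds
                if not any(s1 in ds for (u, s1) in Z if u == s0)}
    return g

def largest_fixpoint(x, g):
    while g(x):
        x = x - g(x)
    return x

def construct_invariant(X, Z):
    """ConstructInvariant(X, Z): the largest subset of X from which every
    computation of Z can always take another step (no deadlock states)."""
    return largest_fixpoint(X, deadlock_states(Z))
```

For example, with transitions {(1, 2), (2, 1), (2, 3), (3, 4)} restricted to {1, 2, 3}, state 3 only leads outside the predicate and is pruned, leaving the cycle {1, 2}.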
We define ConstructInvariant using the largest fixpoint, which removes the deadlock states of a given state predicate X:

ConstructInvariant(X, Z) : StatePred = LgFix(X)(DeadlockStates(Z))

We formally define the invariant of the fault-tolerant program as follows:

S′ : StatePred = ConstructInvariant(S − ms, p − mt)

We construct the finite set of transitions of the fault-tolerant program by removing the transitions that violate the closure of S′:

p′ : Action = p − mt − {(s0, s1) | (s0, s1) ∈ (p − mt) ∧ (s0 ∈ S′ ∧ s1 ∉ S′)}

Finally, we present the definitions that we use in the verification of the completeness of the algorithm Add_failsafe. First, we define what it means for a program p″ with invariant S″ to be failsafe:

IsFailsafe(S″ : StatePred, p″ : Action) : bool =
(S″ ≠ ∅) ∧ closed(S″, p″) ∧ (S″ ⊆ S) ∧ (p″|S″ ⊆ p|S″) ∧ (DeadlockStates(p″)(S″) = ∅) ∧
(∀c(p″ ∪ f) | c0 ∈ S″ : (∀i : (ci, ci+1) ∉ spec))

Observation 4.3: No state of S′ is in ms. Formally, ∀s : (s ∈ S′ ⇒ s ∉ ms).
Proof. By instantiating k with 0, propositional simplification discharges the observation. □

Theorem 4.4: The first condition of the transformation problem (cf. Section 2.5) holds. Formally,
S′ ⊆ S
Proof. Our strategy to prove this theorem is based on the fact that S′ is obtained from S by removing some states. After expanding the definitions of S′, ConstructInvariant, and LgFix, we need to prove:
∀k : (∀x : (x ∈ Dec(k, S − ms)(DeadlockStates(p − mt)) ⇒ x ∈ S)).
Towards this end, first, we instantiate k with zero. Then, after expanding the definitions, we need to prove ∀x : (x ∈ S − ms ⇒ x ∈ S), which is trivially true. □

Theorem 4.5: The second condition of the transformation problem (cf. Section 2.5) holds. Formally,
p′|S′ ⊆ p|S′
Proof. The GRIND strategy discharges the theorem. □

Theorem 4.6: S′ is closed in p′. Formally,
closed(S′, p′)
Proof. The GRIND strategy discharges the theorem. □

Lemma 4.7: ∀(s0, s1) : (((s0, s1) ∈ f ∧ s1 ∈ ms) ⇒ s0 ∈ ms)
Proof.
After expanding the definition of ms, we need to prove:
∀(s0, s1) : (((s0, s1) ∈ f ∧ s1 ∈ SmFix(msInit)(RevReachStates)) ⇒ s0 ∈ SmFix(msInit)(RevReachStates)).
We know that ms is the smallest fixpoint obtained from msInit by adding new reverse reachable states via fault transitions. Hence, recalculation of ms does not change its contents (cf. Theorem 3.16). Thus, if s1 is in ms, s0 should already be in ms as well. We proceed by applying Theorem 3.16. After instantiating x with msInit and r with RevReachStates in Theorem 3.16, we expand the definition of the first SmFix in
SmFix(SmFix(msInit)(RevReachStates))(RevReachStates) = SmFix(msInit)(RevReachStates).
Then, we prove the lemma by one step of manual calculation of Inc, showing that if s1 is in Inc(SmFix(...)) then s0 is in it as well. □

Remark. Note that Lemma 4.7 is one of the instances where the formalization of the fixpoint calculation in Chapter 3 is used. More specifically, using Theorem 3.16, we could show that ms contains all the states that may lead a computation to a state that violates safety, and that further calculation of ms does not change its contents. In other words, ms is a smallest fixpoint.

Lemma 4.8: In the absence of faults, all computations of p − mt that start from a state in S′ are infinite. Formally,
DeadlockStates(p − mt)(S′) = ∅
Proof. First, we expand the definitions of S′ and ConstructInvariant. Then, we need to prove:
DeadlockStates(p − mt)(LgFix(S − ms)(DeadlockStates(p − mt))) = ∅.
Using Theorem 3.7, we instantiate x with S − ms and g with DeadlockStates(p − mt) to complete the proof. □

Theorem 4.9: In the absence of faults, all computations of p′ that start from a state in S′ are infinite. Formally,
DeadlockStates(p′)(S′) = ∅
Proof. In Lemma 4.8, we showed that all computations of p − mt that start from a state in S′ are infinite. Now, we need to show that all computations of p − mt, after removing the transitions that violate the closure of S′, are still infinite.
Obviously, the removal of such transitions does not affect the deadlock states, because the source of a transition that violates the closure must already have been removed during the removal of deadlock states. Hence, the mechanical verification only involves a sequence of expansions and propositional simplifications. □

Remark. Note that Theorem 4.9 is another instance where the formalization of the fixpoint calculation in Chapter 3 is used. More specifically, DeadlockStates(p′)(S′) denotes the deadlock states in S′ with respect to program p′. We repeatedly remove these deadlock states. Hence, once the fixpoint is reached, there are no deadlock states.

Lemma 4.10: In the presence of faults, no computation prefix of the failsafe tolerant program that starts from a state in S′ reaches a state in ms. Formally,
∀j : (∀c : prefix(p′ ∪ f, j) | c0 ∈ S′ : ∀k | k ≤ j : ck ∉ ms)
Proof. We proceed by induction. The base case, c0 ∉ ms, can be discharged using Observation 4.3. In the induction step, we need to prove ck+1 ∉ ms given ck ∉ ms. From Lemma 4.7, we know that if the target of a fault transition (s0, s1) is in ms, then the source, s0, is in ms as well. This means that if s0 is not in ms then s1 is not in ms either. We know that ck ∉ ms and, hence, by applying Lemma 4.7, we can infer ck+1 ∉ ms. □

Theorem 4.11: Any prefix of any computation of the failsafe tolerant program in the presence of faults that starts in S′ does not violate safety. Formally,
∀j : (∀c : prefix(p′ ∪ f, j) | c0 ∈ S′ : ∀k | k < j : (ck, ck+1) ∉ spec)

Lemma 4.13: For every i and every state s1 in ms(i), where ms(i) = Inc(i, msInit)(RevReachStates), there exists a computation of faults alone that starts from s1 and violates safety. Formally,
∀i : (∀s1 | s1 ∈ ms(i) : (∃c(f) | c0 = s1 : (∃k : (ck, ck+1) ∈ spec)))
Proof. We proceed by induction. The induction step intuitively means that if there exists a computation that starts from a state in ms and violates safety in j steps, there also exists a computation that starts from ms and violates safety in j + 1 steps. To prove the induction step, we do a case analysis on the step at which s1 has been added to ms:
Case I. s1 ∈ ms(j): This case is trivially true, because we already know that there exists a computation that starts in ms(j) and violates safety.
Case II.
s1 ∈ ms(j+1) − ms(j): To prove this case, after expanding the definitions of ms and Inc, we need to prove:
(s1 ∈ RevReachStates(Inc(j, msInit)(RevReachStates)) ∧
(∀s | s ∈ ms(j) : (∃c(f) | c0 = s : (∃k : (ck, ck+1) ∈ spec))) ∧
(s1 ∉ Inc(j, msInit)(RevReachStates)))
⇒ (∃c(f) | c0 = s1 : (∃k : (ck, ck+1) ∈ spec))
Now, we instantiate s with s2 to eliminate the universal quantifier in the antecedent. With the universal quantifier removed, the existential quantifier on c(f) can be removed by skolemization. Now, we need to prove:
((s2 ∈ Inc(j, msInit)(RevReachStates)) ∧ ((s1, s2) ∈ f) ∧ c′0 = s2 ∧ ((c′k, c′k+1) ∈ spec) ∧ (s1 ∉ Inc(j, msInit)(RevReachStates)))
⇒ (∃c(f) | c0 = s1 : (∃k : (ck, ck+1) ∈ spec))
This implication intuitively means that if computation c′ starts from s2 and violates safety in j steps, and (s1, s2) is a fault transition, then there should exist a computation that starts from s1 and violates safety in j + 1 steps. To prove this implication, we instantiate c with add(s1, c′), where add forms a computation c such that c0 = s1 and suffix1(c) = c′. Obviously, c is the computation that we are looking for. The mechanical proof after instantiation of c with add(s1, c′) is only a chain of automatic rewriting and propositional simplifications. □

Lemma 4.14: For every state in ms, there exists a computation of faults alone that violates safety. Formally,
∀s | s ∈ ms : (∃c(f) | c0 = s : (∃k : (ck, ck+1) ∈ spec))
Proof. From Lemma 4.13 we know that, for all i, from any state in ms(i) there exists a computation that eventually violates safety. Thus, to formally verify this lemma, we only need to perform a chain of automatic skolemizations, definition expansions, and instantiations. At the end, the GRIND strategy discharges the lemma. □

Lemma 4.15: The invariant of any program that satisfies the transformation problem is a subset of S − ms. Formally,
∀S″, p″ : (IsFailsafe(S″, p″) ⇒ (S″ ⊆ S − ms))
Proof.
After skolemization, expanding the definition of IsFailsafe, and applying Lemma 4.14, we need to prove:
(S″ ⊆ S ∧ S″ ≠ ∅ ∧
 ∀x | x ∈ ms : (∃ c(f) | c0 = x : ∃ k : (ck, ck+1) ∈ spec) ∧
 ∀c(p″ ∪ f) | c0 ∈ S″ : ∀i : (ci, ci+1) ∉ spec)
⇒ (S″ ⊆ S − ms)
From the second and third conjuncts in the antecedent, we can infer that any state in ms cannot be in S″, as p″ is failsafe. In other words, S″ ∩ ms = ∅. Besides, S″ is nonempty. Hence, S″ ⊆ S − ms. To mechanically prove the lemma based on the mentioned argument, it suffices to remove the quantifiers by skolemization and perform automatic instantiation, rewriting, and propositional simplification. □

Theorem 4.16: If there exists a program that satisfies the transformation problem, algorithm Add_failsafe does not declare failure due to the condition that all states in S are in ms. Formally,
(∃ S″, p″ : IsFailsafe(S″, p″)) ⇒ (S − ms ≠ ∅)
Proof. From Lemma 4.15 we know that S″ is a subset of S − ms. In addition, p″ is failsafe and S″ ≠ ∅. Hence, S − ms cannot be empty either. In order to mechanically verify this theorem, first we remove the existential quantifier by skolemization. Then, automatic instantiation, rewriting, and propositional simplification discharges the theorem. □

Lemma 4.17: For all transitions in mt, there exists a computation of only faults that starts with that transition and violates safety. Formally,
∀(t1, t2) | (t1, t2) ∈ mt : (∃ c(f) | c0 = t1 : (∃ k : (ck, ck+1) ∈ spec))
Proof. The transitions in mt are the ones that are either in spec or whose terminus is in ms. To mechanically verify this lemma, first, we perform a case analysis on the type of transition. If it is in spec, any computation that starts with that transition is the answer. As mentioned before, the other reason that a transition is in mt is that its terminus is in ms. Thus, by applying Lemma 4.14 we can discharge the lemma, because for any computation that contains a state in ms, there exists a suffix that violates safety.
□

Lemma 4.18: Any failsafe program that satisfies the transformation problem, restricted to its invariant, is a subset of p − mt. Formally,
∀S″, p″ : (IsFailsafe(S″, p″) ⇒ (p″ | S″ ⊆ p − mt))
Proof. The proof idea is similar to the proof of Lemma 4.15; we use the fact that p″ is failsafe and, based on Lemma 4.17, we show that p″ | S″ must be a subset of p − mt. After skolemizing, expanding the definitions, and applying Lemma 4.17, we need to prove:
(∀(t1, t2) | (t1, t2) ∈ mt : (∃ c(f) | c0 = t1 : (∃ k : (ck, ck+1) ∈ spec)) ∧
 (p″ | S″ ⊆ p | S″) ∧
 ∀c(p″ ∪ f) | c0 ∈ S″ : (∀i : (ci, ci+1) ∉ spec))
⇒ (p″ | S″ ⊆ p − mt)
From the first and third conjuncts in the antecedent, we can infer that any transition in mt cannot be in p″, as p″ is failsafe. In other words, (p″ | S″) ∩ mt = ∅. Hence, p″ | S″ ⊆ p − mt. Based on this argument, to mechanically prove the lemma, it suffices to remove the quantifiers by skolemization and perform automatic instantiation, rewriting, and propositional simplification. □

Theorem 4.19: If there exists a program that satisfies the transformation problem, algorithm Add_failsafe does not declare failure due to the condition that all the states in S − ms are deadlock states. Formally,
∃ S″, p″ | IsFailsafe(S″, p″) : (S′ ≠ ∅)
Proof. From Theorem 4.16 we know that if there exists a failsafe program p″, then S − ms is not equal to the empty set. Hence, it suffices to prove that if such a failsafe program exists, then not all the states in S − ms are deadlock states. Towards this end, first, we apply Lemma 4.18. Then, after simplifications we need to prove:
((S″ ⊆ S − ms) ∧ (p″ | S″ ⊆ p − mt) ∧ (S″ ≠ ∅) ∧ closed(S″, p″) ∧ (p″ | S″ ⊆ p | S″) ∧ (DeadlockStates(p″)(S″) = ∅)) ⇒ (S′ ≠ ∅).
After expanding the definitions of S′, ConstructInvariant, and LgFix, we need to prove:
((S″ ⊆ S − ms) ∧ (p″ | S″ ⊆ p − mt) ∧ (S″ ≠ ∅) ∧ closed(S″, p″) ∧ (p″ | S″ ⊆ p | S″) ∧ (DeadlockStates(p″)(S″) = ∅))
⇒ ∀k : (Dec(k, S − ms)(DeadlockStates(p − mt)) ≠ ∅).
We proceed by placing induction on k.
We discharge the base case, k = 0, using the GRIND strategy. In the induction step, we need to prove:
(Dec(j, S − ms)(DeadlockStates(p − mt)) ≠ ∅ ∧
 ((S″ ⊆ S − ms) ∧ (p″ | S″ ⊆ p − mt) ∧ (S″ ≠ ∅) ∧ closed(S″, p″) ∧ (p″ | S″ ⊆ p | S″) ∧ (DeadlockStates(p″)(S″) = ∅)))
⇒ Dec(j + 1, S − ms)(DeadlockStates(p − mt)) ≠ ∅
To prove the induction step, first, we expand the definition of Dec(j + 1, ...) by one step. Then, the GROUND strategy discharges the theorem. □

4.2 Automatic Addition of Nonmasking Fault-Tolerance

In this section, we present the formal specification and verification of the algorithm for adding nonmasking fault-tolerance proposed in [8]. Along with the formal specification, we also describe the intuition behind each step of the mentioned algorithm. As for verification, we prove both the soundness and the completeness of the algorithm.

4.2.1 Specification of the Synthesis of Nonmasking Fault-Tolerance

To synthesize a nonmasking fault-tolerant program p′, we ensure that from any state in the state space, p′ eventually recovers to a state in the invariant. However, during this recovery, the safety specification may be violated temporarily (cf. Section 2.4). Thus, it suffices to add one step of recovery from every state in the state space to the invariant. We reiterate the algorithm Add_nonmasking (from [8, 9]) in Figure 4.2.

Add_nonmasking(p, f : transitions, S : state predicate, spec : specification)
{
  S′ := S;
  p′ := (p | S) ∪ {(s0, s1) : s0 ∉ S ∧ s1 ∈ S}
}

Figure 4.2: The synthesis algorithm for adding nonmasking fault-tolerance

Modeling Add_nonmasking in PVS is straightforward. As safety may be temporarily violated during the recovery, the algorithm does not change the invariant of the fault-intolerant program in synthesizing the fault-tolerant program. Hence, we define the invariant of the nonmasking program as follows:
S′ : StatePred = S.
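The construction of Figure 4.2 is easy to illustrate on explicit finite sets. The following Python sketch is our own illustration (the function name and the set-based representation are ours, not the PVS model): it restricts p to S and adds a one-step recovery transition from every state outside S to every state in S.

```python
def add_nonmasking(p, f, S, states):
    """One-step-recovery construction of Figure 4.2 on explicit finite sets.

    p: program transitions as a set of (s0, s1) pairs; S: invariant;
    states: the whole state space.  f is unused, just as in the
    algorithm: recovery is added regardless of the fault transitions.
    """
    S_prime = set(S)                                      # S' := S
    p_in_S = {(s0, s1) for (s0, s1) in p
              if s0 in S and s1 in S}                     # p | S
    recovery = {(s0, s1) for s0 in states - set(S)
                for s1 in S}                              # one-step recovery
    return p_in_S | recovery, S_prime
```

For instance, with states = {0, 1, 2}, S = {0}, and p = {(0, 0)}, the result adds the recovery transitions (1, 0) and (2, 0) while leaving the invariant unchanged.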
As mentioned earlier, the set of transitions of the fault-tolerant program is the original transitions of the program plus the transitions that add recovery to the invariant in one step. In PVS, we define the set of transitions of the fault-tolerant program as follows:
p′ : Action = (p | S) ∪ {(s0, s1) | s0 ∉ S ∧ s1 ∈ S}

4.2.2 Verification of the Synthesis of Nonmasking Fault-Tolerance

In order to verify the correctness of algorithm Add_nonmasking, we prove that this algorithm is sound and complete. First, we present the proof of soundness and then we describe how to verify the completeness.

Soundness

In order to verify the soundness of Add_nonmasking, we need to prove two properties to show that p′ is nonmasking fault-tolerant. First, we should show that the fault-tolerant program satisfies spec in the absence of faults. Second, we should show that in the presence of faults, if faults perturb the state of the program, it eventually recovers to its invariant.

Theorem 4.20: The first condition of the transformation problem (cf. Section 2.5) holds. Formally, S′ ⊆ S

Theorem 4.21: The second condition of the transformation problem (cf. Section 2.5) holds. Formally, p′ | S′ ⊆ p | S′

Theorem 4.22: S′ is closed in p′. Formally, closed(S′, p′)

Proof. The GRIND strategy discharges theorems 4.20, 4.21, and 4.22. □

Lemma 4.23: In the absence of faults, any computation of the nonmasking program that starts from any state in the state space eventually recovers to the invariant. Formally,
∀c(p′) : (∃ j | j > 0 : cj ∈ S′)
Proof. After eliminating the universal quantifier by skolemizing, we need to prove:
(∀n : (cn, cn+1) ∈ p′) ⇒ (∃ j | j > 0 : cj ∈ S′).
Now, if we manually instantiate n with 0 and j with 1, we should prove:
((c0, c1) ∈ p′) ⇒ (c1 ∈ S′)
This implication is true, as any transition in p′ ends in S′. Thus, the GRIND strategy simply discharges the lemma.
□

Theorem 4.24: In the presence of faults, any computation of the nonmasking program that starts from any state in the state space eventually recovers to the invariant. Formally,
∀c(p′ ∪ f) : (∃ j | j > 0 : cj ∈ S′)
Proof. From Lemma 2.3 in Section 2.4 we know that the number of occurrences of faults is finite and, hence, there exists a suffix of the computation that is fault-free. Using this suffix, and by applying Lemma 4.23, we can show that the computation will eventually recover to the invariant. Hence, after applying the mentioned lemmas, we need to prove:
(∀n : (cn, cn+1) ∈ p′ ∪ f ∧ ∀m : (suffixj(c)m, suffixj(c)m+1) ∈ p′) ⇒ ∃ i | i > 0 : ci ∈ S′
where suffixj(c) is a fault-free suffix of computation c that starts from the jth state, and suffixj(c)m is the mth state in that suffix. Now, we manually instantiate i with j + 1 and m with 0. Obviously, cj+1 ∈ S′, as the transition that ends at cj+1 is in p′. □

Theorem 4.25: For any synthesized nonmasking program, there exists a fault-span. Formally,
∃ T : FaultSpan(T, S′, p′ ∪ f)
Proof. The proof of this theorem is the same as the proof of Theorem 4.12. □

Completeness

Add_nonmasking always finds a solution to the transformation problem. In other words, the answer to the decision problem is always yes. Therefore, the algorithm is complete and there is no need to formally verify the completeness.

4.3 Automatic Addition of Masking Fault-Tolerance

In this section, we present the formal specification and verification of the algorithm for adding masking fault-tolerance proposed in [8]. Along with the formal specification, we also describe the intuition behind each step of the algorithm. As for verification, we prove that the algorithm is sound.

4.3.1 Specification of the Synthesis of Masking Fault-Tolerance

In this section, we describe how we model the addition of masking fault-tolerance to a given program p. First, we reiterate the algorithm Add_masking (from [8, 9]) in Figure 4.3.
As mentioned in Chapter 2, in the addition of masking fault-tolerance, the requirement for preserving the liveness properties of a program is that the fault-tolerant program does not deadlock in the presence of faults and that it recovers to the invariant after a finite number of steps while preserving safety. As mentioned in Section 2.4, we assume that the number of occurrences of faults in a computation is finite. Based on this assumption, we proved that for any computation of a program in the presence of faults, there exists a suffix of the computation that is free from faults (cf. Lemma 2.3). We use this lemma to show that a masking fault-tolerant program always recovers to its invariant.

Add_masking(p, f : transitions, S : state predicate, spec : specification)
{
  ms := smallest fixpoint(X = X ∪ {s0 | (∃ s1 : (s0, s1) ∈ f) ∧ (s1 ∈ X ∨ (s0, s1) violates spec)});
  mt := {(s0, s1) : (s1 ∈ ms) ∨ (s0, s1) violates spec};
  S1 := ConstructInvariant(S − ms, p − mt);
  T1 := true − ms;
  repeat
    T2, S2 := T1, S1;
    p1 := p | S2 ∪ {(s0, s1) | s0 ∉ S2 ∧ s0 ∈ T2 ∧ s1 ∈ T2} − mt;
    T1 := ConstructFaultSpan(T2 − {s : S1 is not reachable from s in p1}, f);
    S1 := ConstructInvariant(S2 ∧ T1, p1);
    if (S1 = {} ∨ T1 = {})
      declare no masking f-tolerant program p′ exists; exit
  until (T1 = T2 ∧ S1 = S2);
  For each state s : s ∈ T1 : Rank(s) = length of the shortest computation prefix of p1 that starts from s and ends in a state in S1;
  p′ := {(s0, s1) : ((s0, s1) ∈ p1) ∧ (s0 ∈ S1 ∨ Rank(s0) > Rank(s1))};
  S′ := S1;  T′ := T1
}

ConstructFaultSpan(T : state predicate, f : transitions)
// Returns the largest subset of T that is closed in f.
{
  return largest fixpoint(X = (X ∩ T) − {s0 : (∃ s1 : (s0, s1) ∈ f ∧ s1 ∉ X)})
}

Figure 4.3: The synthesis algorithm for adding masking fault-tolerance

As can be seen in Figure 4.3, the algorithm contains a loop that identifies the invariant, program transitions, and fault-span of the masking fault-tolerant program.
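The pre-loop computations of Figure 4.3, namely the smallest fixpoint ms, the transition set mt, and the largest fixpoint ConstructInvariant, can be sketched on explicit finite sets. The following Python fragment is our own illustration, not the PVS model; here the safety specification is represented by the set `bad` of safety-violating transitions:

```python
def compute_ms(f, bad, states):
    """Smallest fixpoint: states from which fault transitions alone can
    reach a safety violation, either directly ((s0, s1) in bad) or via
    a state already in ms."""
    ms = set()
    changed = True
    while changed:
        changed = False
        for (s0, s1) in f:
            if (s1 in ms or (s0, s1) in bad) and s0 not in ms:
                ms.add(s0)
                changed = True
    return ms

def compute_mt(bad, ms, states):
    """Transitions a failsafe program must avoid: those that violate
    safety or end in ms."""
    return {(s0, s1) for s0 in states for s1 in states
            if s1 in ms or (s0, s1) in bad}

def construct_invariant(S, p):
    """Largest fixpoint: repeatedly remove deadlock states, i.e. states
    with no outgoing transition of p that stays inside the set."""
    inv = set(S)
    changed = True
    while changed:
        changed = False
        for s in set(inv):
            if not any(s0 == s and s1 in inv for (s0, s1) in p):
                inv.discard(s)
                changed = True
    return inv
```

Each loop terminates because the sets are finite and strictly grow (respectively, shrink) on every pass, which is exactly why the fixpoint theory of Chapter 3 is developed over finite sets.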
Modeling the loop is actually the challenging part of the formal specification and verification of the algorithm. We model Add_masking in three phases: (1) initialization, (2) identifying the loop invariant, and (3) termination conditions. The initialization is straightforward and is very similar to how we modeled the previous algorithms. The loop invariant includes two properties: (1) the intermediate invariant, S2, at the start of the loop is a subset of S, the invariant of the fault-intolerant program, and (2) the intersection of ms and the intermediate fault-span, T2, at the start of the loop is the empty set. Hence, first, we show these properties for the initial guess of the invariant and fault-span. Then, we show that if these properties hold at the start of an iteration, they hold at the start of the subsequent iteration as well. Modeling the termination condition is straightforward. Now, we explain the three phases of formalization in detail:

Initialization: Reasonably, our first estimate of the invariant of the fault-tolerant program, S′, is the invariant of its failsafe fault-tolerant version. This makes sense because every masking fault-tolerant program is failsafe as well. Also, the first estimate of the fault-span is the state space excluding the states from where the safety may be violated, ms. To model the part of Add_masking before the loop, we define Sinit and Tinit as follows:
Sinit : StatePred = ConstructInvariant(S − ms, p − mt)
Tinit : StatePred = true − ms

The loop invariant: Now, we model the repeat-until loop. Our initial estimate of the fault-span has two flaws: (1) the fault-span is not necessarily closed under (p − mt) ∪ f, and (2) there does not necessarily exist a path from every state in the fault-span to the invariant. To resolve the mentioned flaws, in each iteration of the loop, the algorithm recalculates the fault-span, invariant, and program transitions until the fixpoint is reached.
In other words, the algorithm ensures that for all states in the fault-span, there exists a path that contains a state in the invariant, and the fault-span is closed in the program in the presence of faults. The value of the intermediate invariant (respectively, fault-span) at the beginning of the loop is S2 (respectively, T2). After the recalculation, let the new values of the invariant and fault-span be S1 and T1 respectively. We define S1 and T1 in terms of (arbitrary predicates) S2 and T2 as follows:

1. We define an intermediate program p1 as follows. We require that for a transition (s0, s1) in p1, the following conditions are satisfied: (1) if s0 ∈ S2 then s1 ∈ S2, and (2) if s0 ∈ T2 then s1 ∈ T2. Moreover, p1 does not contain any transition in mt. As mentioned earlier, S2 and T2 are two arbitrary state predicates that represent the intermediate invariant and fault-span respectively. Formally,
S2 : StatePred
T2 : StatePred
p1 : Action = ((p | S2) ∪ TS) − mt, where
TS : Action = {(s0, s1) | s0 ∉ S2 ∧ s0 ∈ T2 ∧ s1 ∈ T2}

2. To formally specify the construction of T1, first, we define the finite set of states from where the closure of T2 may be violated. Formally,
TNClose(X : StatePred) : StatePred = {s0 | ∃ s1 : s0 ∈ X ∧ (s0, s1) ∈ f ∧ s1 ∉ X}.
Then, we define the finite set of states from where the intermediate invariant, S2, is reachable. Formally,
TReach : StatePred = {s | s ∈ T2 ∧ reachable(S2, T2, p1, s)}, where
reachable(S2, T2, p1, s) = (∃ c(p1) : ((c0 = s) ∧ (s ∈ T2) ∧ ∃ j : cj ∈ S2)).
Now, we define ConstructFaultSpan as the largest subset of TReach that is closed in f. Formally,
T1 = ConstructFaultSpan(TReach), where
ConstructFaultSpan(X : StatePred) = LgFix(X)(TNClose)
We remark that the definition of ConstructFaultSpan is similar to the definition of ConstructInvariant (cf. Section 4.1.1).
The major difference is that in ConstructInvariant, we remove deadlock states recursively until we reach a fixpoint, whereas in ConstructFaultSpan, we remove the states from where the closure of the fault-span may be violated.

3. Since S1 should be a subset of T1, we model the recalculation of the invariant as follows:
S1 : StatePred = ConstructInvariant(S2 ∩ T1)(p1)

Termination of the loop: We continue the loop until we achieve the largest fixpoint for S1 and T1. Obviously, when the fixpoint is achieved, the intermediate and ultimate invariants (respectively, fault-spans) are equal. We formalize the termination condition of the loop in the verification phase. More specifically, we prove that provided (S1 = S2) ∧ (T1 = T2) is true, p1 is failsafe and provides potential recovery from every state in the fault-span.

Finally, the algorithm removes the possible cycles from the fault-span. This removal is essential; otherwise, a computation may remain in a cycle and never recover to the invariant. We define the finite set of transitions of the masking fault-tolerant program, p′, as follows:
p′ = {(s0, s1) | ((s0, s1) ∈ p1) ∧ ((s0 ∈ S1) ∨ rank(s0) > rank(s1))}, where
rank(s) = min(length(s, p1, S1)), where
length(s, p, S) = {j | ∃ c(p) : (s = c0 ∧ cj ∈ S)}

4.3.2 Verification of the Synthesis of Masking Fault-Tolerance

To verify the correctness of algorithm Add_masking, we prove that it is sound. Our proof idea is based on the three phases of our formalization in Section 4.3.1. More specifically, we prove that when the state of the masking fault-tolerant program is perturbed by faults, the program never violates safety and it eventually recovers to the invariant. We also prove that after the loop terminates, the program has all the properties of a masking fault-tolerant program.
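Before turning to the proofs, the rank-based cycle elimination of Section 4.3.1 can be sketched with a backward breadth-first search from the invariant. This Python fragment is our own illustration (all names are ours): rank(s) is the length of the shortest p1-prefix from s to S1, and only transitions inside the invariant or strictly rank-decreasing recovery transitions survive.

```python
from collections import deque

def compute_rank(p1, S1):
    """Backward BFS over p1 from the invariant S1:
    rank[s] = length of the shortest computation prefix of p1 from s
    to a state in S1.  States absent from the result cannot reach S1."""
    rank = {s: 0 for s in S1}
    frontier = deque(S1)
    while frontier:
        t = frontier.popleft()
        for (s0, s1) in p1:
            if s1 == t and s0 not in rank:
                rank[s0] = rank[t] + 1
                frontier.append(s0)
    return rank

def remove_cycles(p1, S1, rank):
    """Keep transitions starting inside the invariant, plus strictly
    rank-decreasing recovery transitions; cycles outside S1 disappear."""
    return {(s0, s1) for (s0, s1) in p1
            if s0 in S1 or (s0 in rank and s1 in rank
                            and rank[s0] > rank[s1])}
```

For example, with S1 = {0} and p1 = {(0, 0), (1, 0), (2, 1), (1, 2)}, the cycle-forming transition (1, 2) is dropped because it increases rank, so no computation can linger outside the invariant forever.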
Note that the proof idea and, in some cases, the proof tree of a significant number of lemmas and theorems in this section are similar or identical to lemmas and theorems we proved for Add_failsafe and Add_nonmasking. We discuss the issue of proof reusability further in Chapters 7 and 8. Now, we describe the three phases of the verification in more detail.

Properties of initial values for the invariant and fault-span: As mentioned earlier, in the first phase of verification, we show that the initial values of the invariant and fault-span satisfy the loop invariant. More specifically, we prove this in Observation 4.26 and theorems 4.27 and 4.28. Their proofs are similar to the proofs of Observation 4.3 and Theorem 4.4 for Add_failsafe. Note that many of the proofs developed for the verification of Add_failsafe are directly applicable to prove the lemmas and theorems in this section.

Observation 4.26: There exists no state in the initial fault-span from where the safety may be violated. Formally, Tinit ∩ ms = ∅

Theorem 4.27: Initially, the invariant is a subset of the fault-span. Formally, Sinit ⊆ Tinit

Theorem 4.28: The initial invariant is stronger than the invariant of the fault-intolerant program (the first condition of the transformation problem). Formally, Sinit ⊆ S

We remark that we do not state a theorem for the second condition of the transformation problem in the initialization part, because the algorithm does not modify the set of program transitions before the loop.

Properties of the loop invariant: We prove that the synthesized masking fault-tolerant program satisfies the transformation problem by stating and proving a series of theorems and intermediate lemmas. Note that, in the first iteration, S2 (respectively, T2) is equal to Sinit (respectively, Tinit). Hence, in the first iteration, the loop invariant is satisfied. Moreover, at the beginning of each iteration, S1 (respectively, T1) is assigned to S2 (respectively, T2).
Hence, we can assume that S2 and T2 satisfy the loop invariant at the beginning of each iteration.

So as to show the loop invariant, first, we show that if S2 and T2 (the intermediate invariant and fault-span) satisfy the loop invariant, then so do S1 and T1 (cf. Theorem 4.29). Then, we state and prove additional theorems about S1 and T1. The proofs of Theorems 4.29–4.32 are similar to the proofs of the corresponding theorems in the verification of failsafe. Hence, we omit these proofs. Note that we effectively reuse the proofs developed for the verification of Add_failsafe in the following theorems.

Theorem 4.29: ((T2 ∩ ms = ∅) ⇒ (T1 ∩ ms = ∅)) ∧ ((S2 ⊆ S) ⇒ (S1 ⊆ S))

Theorem 4.30: The invariant is always a subset of the fault-span. Formally, S1 ⊆ T1

Theorem 4.31: If the intermediate program and invariant satisfy the second condition of the transformation problem, so do the program and S1. Formally,
(p1 | S2 ⊆ p | S2) ⇒ (p1 | S1 ⊆ p | S1)

Theorem 4.32: The recalculated invariant contains no deadlock states. Formally, DeadlockStates(p1)(S1) = ∅

Theorem 4.33: The recalculated fault-span is closed in f. Formally, closed(T1, f)
Proof: The proof is similar to the proof of Lemma 4.8. We know: T1 = ConstructFaultSpan(...) = LgFix(...). Using Theorem 3.7, in the definition of LgFix, we instantiate X with TReach and g with TNClose to complete the proof. □

Remark. Note that Theorem 4.33 is another instance where the formalization of fixpoint calculation in Chapter 3 is used. More specifically, closed(T1, f) denotes whether the state predicate T1 is closed in f. We repeatedly remove the states from where the closure of T1 may be violated. Hence, once the fixpoint is reached, there are no such states.

Properties at the termination of the loop: As mentioned in Section 4.3.1, we formalize the termination condition in the verification phase; i.e., we prove that provided (S1 = S2) ∧ (T1 = T2) is true, p1 is failsafe and provides potential recovery from every state in the fault-span.
Theorem 4.34: Upon loop termination, the invariant is closed in the program. Formally,
(S1 = S2) ⇒ closed(S1, p1)
Proof: Based on the fact that S2 is closed in p1 by construction, when S1 = S2, S1 is closed in p1 as well. Thus, to formally prove the theorem, we replace S1 by S2 to complete the proof. □

Theorem 4.35: Any prefix of any computation of the masking fault-tolerant program in the presence of faults does not violate safety. Formally,
((S1 = S2) ∧ (T1 = T2)) ⇒ ∀j : (∀c : prefix(p1 ∪ f, j) | c0 ∈ T1 : (∀k | k < j : (ck, ck+1) ∉ spec))
Proof: The proof is similar to the proof of Theorem 4.11. □

Theorem 4.36: (T1 = T2) ⇒ closed(T1, p1 ∪ f)
Proof: Based on the fact that T2 is closed in p1 by construction, when T1 = T2, T1 is closed in p1 as well. From Theorem 4.33, we also know that T1 is closed in f. Thus, using Theorem 4.33 and by replacing T1 by T2, we complete the proof. □

Theorem 4.37: After termination of the loop, for any state in the fault-span, T1, there exists a computation of p1 that starts from that state and reaches the invariant, S1. Formally,
((S1 = S2) ∧ (T1 = T2)) ⇒ (∀s | s ∈ T1 : reachable(S1, T1, p1, s))
Proof: First, we apply Lemma 2.3 to show that there exists a suffix for every computation of p1 ∪ f that contains no transition in f. After replacing T1 and S1 by T2 and S2 in the consequent, we need to prove:
∀s | s ∈ T1 : reachable(S2, T2, p1, s).
After expanding the definitions of T1, ConstructFaultSpan, and LgFix respectively, we need to prove:
∀k : (s ∈ Dec(k, TReach)(TNClose)) ⇒ reachable(S2, T2, p1, s)
After instantiating k with 0, the GRIND strategy discharges the theorem. □

Theorem 4.38: For any synthesized masking program, there exists a fault-span. Formally,
(T1 = T2) ⇒ ∃ T : FaultSpan(T, S1, p1 ∪ f)
Proof: From Theorem 4.30, we know that S1 is a subset of T1. Also, from Theorem 4.36, we know that if T1 = T2 is true, then the fault-span is closed in the program in the presence of faults.
Thus, for mechanical verification, it suffices to, first, instantiate T with T1 and, then, apply theorems 4.30 and 4.36. □

4.4 Adding Fault-Tolerance in the Low Atomicity Model

The three algorithms described earlier in this chapter use the notion of program state at an abstract level. In practice, the state of a program is obtained by assigning each variable a value from its domain. In the context of synthesis of real-world programs, the variables of a program and the restrictions on reading and writing them matter. For example, if a program has two integer variables x and y, [x = 3, y = 10] identifies a state of this program. In this manner, a predicate over the states of a program forms a set of states, a so-called state predicate. For example, x ≥ 5 ∧ y < 20 is a state predicate.

To synthesize distributed programs, where the processes of a program run on a distributed system, one should consider the read-write restrictions on variables, as it is usually impossible to read or write all the program variables in an atomic step. There has been a lot of work on the synthesis of distributed programs in the literature, and we briefly discuss the issue of program synthesis in Chapter 6.

In [8], the authors prove that the problem of synthesizing a masking fault-tolerant program is NP-complete in the size of the state space and, hence, there is no polynomial-time algorithm to synthesize a masking tolerant program unless P = NP. Furthermore, the authors present a nondeterministic algorithm that guesses a solution, namely the fault-tolerant program p′, invariant S′, and fault-span T′. Then, the algorithm verifies whether p′ is fault-tolerant from S′ and whether T′ is a boundary up to which p′ can be perturbed in the presence of faults. In other words, the nondeterministic algorithm verifies whether the guessed solution satisfies the transformation problem.
In [19], Kulkarni and Ebnenasir show that the problem of adding failsafe fault-tolerance in the low atomicity model is also NP-complete in the size of the state space. However, identifying the complexity of adding nonmasking fault-tolerance in the low atomicity model is still an open problem. Formal verification of a nondeterministic algorithm is not a particularly challenging problem: once the solution is guessed, the algorithm itself verifies the solution. Therefore, we do not include the formal specification and verification of the synthesis algorithm in the low atomicity model in this thesis.

Chapter 5

Specification and Verification of Automatic Synthesis of Multitolerance

In Chapter 4, we presented the specification and verification of algorithms associated with the synthesis of fault-tolerant programs that are subject to a single class of faults. Real-world systems are often subject to multiple classes of faults and, hence, they need to provide an appropriate level of fault-tolerance to each class of faults. In other words, in synthesizing multitolerant programs, we are looking for the answers to the following questions: How can a program handle the occurrence of faults from different classes at the same time? And how would a synthesis algorithm provide a level of fault-tolerance for each class?

In this chapter, first, we introduce the concerns and definitions regarding multitolerance in Section 5.1. In Section 5.2, we formally state the problem of mechanical verification of automatic synthesis of multitolerant programs. In Section 5.3, first, we present how to modify the definitions in chapters 2 and 4, so that they become appropriate for modeling the synthesis of multitolerance. Then, we present the formal specification and verification of nonmasking-masking multitolerance. Finally, in Section 5.4, we present the formal specification and verification of failsafe-masking multitolerance.¹
The time complexity of the addition of failsafe-masking and nonmasking-masking multitolerance is polynomial. However, the problem of the addition of failsafe-nonmasking multitolerance is NP-complete and, hence, there exists no polynomial-time algorithm to solve this problem unless P = NP. In [10], the authors propose a nondeterministic algorithm, but we are not interested in the verification of that algorithm, as the algorithm verifies the guessed solution by itself.

Remark. We remark that in this chapter, we effectively reuse the specification and formal proofs of the framework and algorithms developed in chapters 2 and 4. This is one of the instances where our manual proof reusability shows hope for developing automated proofs (proof strategies) for future verifications of extensions of the algorithms in [8, 10].

¹Appendix F contains the formal specification of the algorithms for synthesizing multitolerant programs.

5.1 Faults, Fault-Tolerance, and Multitolerance

The notion of multitolerance was first introduced by Arora and Kulkarni [20]. The definitions of program, specification, faults, and fault-tolerance remain the same (cf. Chapter 2). Now, we define what it means for a program to be multitolerant. In Section 2.4, we gave the definition of different levels of fault-tolerance for a single class of faults. Now, we consider the case where faults from multiple classes, say f1 and f2, may occur in a given program computation. There exist several possible choices in deciding the level of fault-tolerance that should be provided in the presence of multiple fault-classes. In [10], Kulkarni and Ebnenasir propose to require that the fault-tolerance provided for the case where f1 and f2 occur simultaneously should be equal to the minimum level of fault-tolerance provided when either f1 or f2 occurs.
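This minimum-level rule can be encoded as a simple symmetric lookup. The Python encoding below is our own illustration (names are ours, not from [10]); the pairwise entries transcribe the combination table from [10]:

```python
# Minimum level of fault-tolerance guaranteed when faults from two classes
# occur simultaneously.  Using frozensets as keys makes the lookup
# symmetric in its two arguments.
MIN_LEVEL = {
    frozenset({"failsafe"}): "failsafe",
    frozenset({"nonmasking"}): "nonmasking",
    frozenset({"masking"}): "masking",
    frozenset({"failsafe", "nonmasking"}): "no-tolerance",
    frozenset({"failsafe", "masking"}): "failsafe",
    frozenset({"nonmasking", "masking"}): "nonmasking",
}

def combined_tolerance(level1, level2):
    """Level of tolerance provided when the two classes hit together."""
    return MIN_LEVEL[frozenset({level1, level2})]
```

For instance, combining masking with failsafe yields failsafe, whereas combining failsafe with nonmasking yields no tolerance at all.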
For example, if masking fault-tolerance is to be provided to f1 and failsafe fault-tolerance is to be provided to f2, then failsafe fault-tolerance should be provided for the case where f1 and f2 occur simultaneously. We reiterate the following table from [10], which illustrates the minimum level of fault-tolerance provided for different combinations of levels of fault-tolerance provided to individual classes of faults:

  Level of
  Fault-Tolerance   Failsafe        Nonmasking      Masking
  Failsafe          Failsafe        No-Tolerance    Failsafe
  Nonmasking        No-Tolerance    Nonmasking      Nonmasking
  Masking           Failsafe        Nonmasking      Masking

In order to simplify the modeling of different classes of faults, we consider the union of all the classes of faults for which failsafe (respectively, nonmasking and masking) fault-tolerance is to be provided, denoted f_failsafe (respectively, f_nonmasking and f_masking). Now, given a fault-intolerant program p, its invariant S, its specification spec, and sets of distinct classes of faults f_failsafe, f_nonmasking, and f_masking, we define what it means for a synthesized program p′, with invariant S′, to be multitolerant by considering how p′ behaves when (i) no faults occur; (ii) only one class of faults occurs; and (iii) multiple classes of faults occur. Kulkarni and Ebnenasir, in [10], define a multitolerant program as follows:

Definition. Program p′ is multitolerant to f_failsafe, f_nonmasking, and f_masking from S′ for spec iff the following conditions hold:
1. p′ satisfies spec from S′ in the absence of faults.
2. p′ is masking f_masking-tolerant from S′ for spec.
3. p′ is failsafe (f_failsafe ∪ f_masking)-tolerant from S′ for spec.
4. p′ is nonmasking (f_nonmasking ∪ f_masking)-tolerant from S′ for spec.

5.2 The Problem of Mechanical Verification of Automatic Synthesis of Multitolerance

In this section, we present the problem of mechanical verification of automatic synthesis of multitolerance.
Based on the definition of a multitolerant program in Section 5.1, Kulkarni and Ebnenasir identify the requirements of the problem of synthesizing a multitolerant program p′. The general idea of the algorithms for synthesizing multitolerance proposed in [10] is based on the synthesis of fault-tolerance for a single class of faults [8]. Similar to the algorithms in [8], the requirements are (1) the synthesis algorithms should not introduce any new behavior to the fault-intolerant program, and (2) the synthesis algorithms should simply add multitolerance. The formal definition of the transformation problem and soundness are as follows:

The Transformation Problem
Given p, S, spec, f_failsafe, f_nonmasking, and f_masking such that p satisfies spec from S. Identify p′ and S′ such that:
S′ ⊆ S
(p′ | S′) ⊆ (p | S′)
p′ is multitolerant to f_failsafe, f_nonmasking, and f_masking from S′ for spec

Soundness. An algorithm for the transformation problem is sound if and only if for any given input, its output, namely p′ and S′, solves the transformation problem.

In this chapter, our goal is to mechanically verify, using the PVS theorem prover, that the algorithms proposed in [10] for adding failsafe-masking and nonmasking-masking multitolerance are indeed sound. In other words, based on the definitions in this chapter, we show that the algorithms in [10] satisfy the transformation problem.

5.3 Specification and Verification of Synthesis of Nonmasking-Masking Multitolerance

In this section, we present the description, formal specification, and verification of the synthesis algorithm for the addition of nonmasking-masking multitolerance to fault-intolerant programs that are subject to two classes of faults, f_nonmasking and f_masking. First, we reiterate the algorithm Add_Nonmasking_Masking from [10] in Figure 5.1.
Add_Nonmasking_Masking(p: transitions, f_nonmasking, f_masking: fault,
                       S: state predicate, spec: safety specification)
{
  p1, S', T_masking := Add_Masking(p, f_masking, S, spec);
  if (S' = {}) declare no multitolerant program p' exists;
               return ∅, ∅;
  p', T' := Add_Nonmasking(p1, f_nonmasking ∪ f_masking, T_masking, spec);
  return p', S';
}

Figure 5.1: The synthesis algorithm for adding nonmasking-masking multitolerance

In order to formalize the algorithm Add_Nonmasking_Masking (and Add_Failsafe_Masking in Section 5.4), first, we define f_nonmasking_masking and f_failsafe_masking as follows:

  f_nonmasking_masking : Action = f_masking ∪ f_nonmasking
  f_failsafe_masking : Action = f_masking ∪ f_failsafe

Our formalization of the fault-tolerance framework developed in Chapter 2 is appropriate for the case where we deal with only one class of faults. However, in modeling the synthesis of multitolerance, as we have different classes of faults, we need to redefine some of the definitions so that they accept the type of fault as a parameter. For example, in Section 4.1.1, we modeled ReuReachStates as follows:

  ReuReachStates(rs : StatePred) : StatePred =
    {s0 | ∃ s1 : s1 ∈ rs ∧ (s0, s1) ∈ f ∧ s0 ∉ rs}

The problem with this definition is that it only represents the set of states that can reach rs by taking a transition in f, which is the single class of faults. To formalize the algorithms for synthesizing multitolerance, we need to redefine ReuReachStates so that it accepts different classes of faults. With this introduction, we redefine msInit, ReuReachStates, and ms as follows:

  msInit(anyFault : Action) : StatePred =
    {s0 | ∃ s1 : (s0, s1) ∈ anyFault ∧ (s0, s1) violates spec}

  ReuReachStates(anyFault : Action)(rs : StatePred) : StatePred =
    {s0 | ∃ s1 : s1 ∈ rs ∧ (s0, s1) ∈ anyFault ∧ s0 ∉ rs}

  ms(anyFault : Action) : StatePred =
    mFix(msInit(anyFault))(ReuReachStates(anyFault))

All the other definitions given in chapters 2 and 4 should also be converted in the same way, so that they are not restricted to only one class of faults.
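To make the fixpoint construction of ms concrete, the following Python sketch computes it on explicit finite sets, mirroring msInit, ReuReachStates, and the mFix iteration. This is an illustration only, not the PVS formalization: the function names and the encoding of the safety specification as a set `bad` of spec-violating transitions are our assumptions.

```python
def ms_init(any_fault, bad):
    """States from which a single transition in any_fault violates safety.
    bad is the set of transitions that violate spec (illustrative encoding)."""
    return {s0 for (s0, s1) in any_fault if (s0, s1) in bad}

def reu_reach_states(any_fault, rs):
    """States outside rs from which one transition in any_fault reaches rs."""
    return {s0 for (s0, s1) in any_fault if s1 in rs and s0 not in rs}

def ms(any_fault, bad):
    """Least fixpoint: all states from which a sequence of fault transitions
    may violate safety (a finite-set analogue of mFix in the PVS theory)."""
    result = ms_init(any_fault, bad)
    while True:
        new = reu_reach_states(any_fault, result)
        if not new:
            return result
        result |= new
```

For instance, with fault transitions {(1, 2), (2, 3)} where (2, 3) violates spec, the computation yields ms = {1, 2}: state 2 violates safety directly, and state 1 can reach state 2 by a fault step.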
We refer the reader to the appendices for the formal specification of the new definitions. Now, having the elements of the multitolerance framework, we can formalize the algorithm Add_Nonmasking_Masking. Note that most of the definitions we developed in chapters 2 and 4 now appear with a parameter for the type of faults. We proceed as follows: Using the algorithm Add_masking, we synthesize a masking f_masking-tolerant program p1, with invariant S' and fault-span T_masking:

  S' : StatePred = add_masking.S1(f_masking)
  T_masking : StatePred = add_masking.T1(f_masking)
  p1 : Action = add_masking.p1(f_masking)

We note that add_masking is the name of the theory developed in Chapter 4 for the algorithm Add_masking in PVS. Since program p1 is masking f_masking-tolerant, it provides recovery to its invariant S' from any state in (T_masking − S'), while preserving safety. Thus, in the presence of f_nonmasking_masking, if p1 is perturbed to (T_masking − S'), then p1 satisfies the requirements of nonmasking fault-tolerance. However, if f_nonmasking_masking perturbs p1 to a state that is not in T_masking, then recovery should be added from those states. Towards this end, the algorithm ensures that one-step recovery is possible. Thus, using the algorithm Add_nonmasking, we add one-step recovery from states that are not in T_masking to the states in T_masking:

  p' : Action = add_nonmasking.p1(T_masking, p1(f_masking))
  T' = T_masking

As mentioned in Section 4.2.1, the addition of one-step recovery is independent of the type of faults. Hence, to simplify the verification of Add_Nonmasking_Masking, we formalize the multitolerant program by p1(f_masking) and not p1(f_nonmasking_masking). In order to verify the soundness of Add_Nonmasking_Masking, we prove a chain of lemmas and theorems.
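The one-step recovery step described above can be sketched on finite sets as follows. This is a simplified illustration under our own assumptions, not the formalized Add_nonmasking: states are explicit values, and every state of the fault-span is offered as a recovery target, whereas an implementation may pick any one of them.

```python
def add_one_step_recovery(p1, state_space, t_masking):
    """From every state outside the fault-span t_masking, add transitions
    directly into t_masking (one-step recovery), leaving the program
    unchanged elsewhere."""
    recovery = {(s, t) for s in state_space - t_masking for t in t_masking}
    return p1 | recovery
```

With this sketch, every state outside t_masking gains at least one outgoing transition into t_masking, which is exactly the one-step recovery requirement stated above.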
Note that, as the fault-spans of the nonmasking-masking multitolerant program and the masking fault-tolerant program are equal, we do not restate and reprove the lemmas and theorems that verify the safety properties of the fault-span in the presence of only f_masking. Note also that, in most of the following theorems, we assume that the termination condition of the repeat-until loop in Add_masking is satisfied.

Theorem 5.1: The first condition of the transformation problem (cf. Section 5.2) holds. Formally,
  (S2 ⊆ S) ⇒ (S' ⊆ S)
Proof. The proof is the same as the proof of Theorem 4.30. □

Theorem 5.2: The second condition of the transformation problem (cf. Section 5.2) holds. Formally,
  (p' | S2 ⊆ p | S2) ⇒ (p' | S' ⊆ p | S')
Proof. The proof is the same as the proof of Theorem 4.31. □

Theorem 5.3: In the presence of f_nonmasking_masking faults, any computation of the nonmasking-masking multitolerant program that starts from any state in the state space eventually recovers to the invariant. Formally,
  ∀c : c is a computation of (p' ∪ f_nonmasking_masking) : (∃ j : j > 0 : c_j ∈ S')
Proof. The proof is the same as the proof of Theorem 4.24. □

Theorem 5.4: The invariant of the nonmasking-masking multitolerant program is a subset of the fault-span. Formally,
  S' ⊆ T'

Theorem 5.5: (add_masking.T1(f_masking) = T2) ⇒ closed(T', p' ∪ f_masking)

Theorem 5.6: Upon the loop termination, the invariant is closed in the program. Formally,
  (add_masking.S1(f_masking) = S2) ⇒ closed(S', p')

Theorem 5.7: For any synthesized nonmasking-masking program, there exists a fault-span. Formally,
  ∃ T : FaultSpan(T, S', p' ∪ f_nonmasking_masking)

Theorem 5.8: After termination of the loop, for any state in the fault-span, T', there exists a computation of p' that starts from a state in T' − S' and reaches the invariant, S'. Formally,
  ((S1(f_masking) = S2) ∧ (T1(f_masking) = T2)) ⇒ ∀s : s ∈ T' : reachable(S', T', p', s)

Proof. The proofs of theorems 5.4 to 5.8 are the same as the proofs of the corresponding theorems for the algorithm add_masking in Section 4.3.2.
□

5.4 Specification and Verification of Synthesis of Failsafe-Masking Multitolerance

In this section, we present the description, formal specification, and verification of the synthesis algorithm for the addition of failsafe-masking multitolerance to fault-intolerant programs that are subject to two classes of faults, f_failsafe and f_masking. First, we reiterate the algorithm Add_Failsafe_Masking from [10] in Figure 5.2.

Add_Failsafe_Masking(p: transitions, f_failsafe, f_masking: fault,
                     S: state predicate, spec: safety specification)
{
  ms := {s0 : ∃ s1, ..., sn : (∀ i : 0 ≤ i < n : (si, s(i+1)) ∈ (f_failsafe ∪ f_masking)) ∧
                              (s(n−1), sn) violates spec};
  mt := {(s0, s1) : (s1 ∈ ms) ∨ (s0, s1) violates spec};
  p1, S', T_masking := Add_Masking(p − mt, f_masking, S − ms, mt);
  if (S' = {}) declare no multitolerant program p' exists;
               return ∅, ∅;
  p', T' := Add_Failsafe(p1, f_failsafe ∪ f_masking, T_masking − ms, mt);
  return p', S';
}

Figure 5.2: The synthesis algorithm for adding failsafe-masking multitolerance.

The essence of the algorithm is as follows: first, it identifies the fault-span such that no computation of the multitolerant program, p', violates safety in the presence of f_failsafe_masking. Towards this end, the algorithm identifies the states from which safety may be violated when faults in f_failsafe_masking occur:

  ms : StatePred = ms(f_failsafe_masking)
  mt : Action = mt(f_failsafe_masking)

Then, the algorithm ensures that the multitolerant program can recover to its invariant, S', when the state of the program is perturbed by f_masking:

  T_masking : StatePred = T1(f_masking)
  p1 : Action = add_masking.p1(f_masking)
  S' : StatePred = add_masking.S1(f_masking)

Finally, if faults in f_failsafe_masking perturb the state of the program to a state s, where s ∈ T_masking, then the algorithm should ensure that safety is maintained.
Towards this end, we add failsafe (f_failsafe_masking)-tolerance to p1 from (T_masking − ms):

  T' : StatePred = ConstructInvariant(T_masking − ms, p1 − mt)
  p' : Action = p1 − mt − {(s0, s1) | ((s0, s1) ∈ (p1 − mt)) ∧ (s0 ∈ T') ∧ (s1 ∉ T')}

In order to verify the soundness of Add_Failsafe_Masking, we prove the following lemmas and theorems:

Theorem 5.10: The first condition of the transformation problem (cf. Section 5.2) holds. Formally,
  (S2 ⊆ S) ⇒ (S' ⊆ S)

Theorem 5.11: The second condition of the transformation problem (cf. Section 5.2) holds. Formally,
  (p' | S2 ⊆ p | S2) ⇒ (p' | S' ⊆ p | S')

Theorem 5.12: Upon the loop termination, the invariant is closed in the program. Formally,
  (S' = S2) ⇒ closed(S', p')

Lemma 5.13: In the presence of faults, no computation prefix of the failsafe-masking multitolerant program that starts from a state in S' reaches a state in ms. Formally,
  (T2 ∩ ms = ∅) ⇒
    ∀j : (∀c : c is a prefix of a computation of (p' ∪ f_failsafe_masking) of length j :
      (c_0 ∈ S') ⇒ (∀k : k ≤ j : c_k ∉ ms))

Theorem 5.14: In the presence of faults, no computation prefix of the failsafe-masking multitolerant program that starts from a state in S' violates safety. Formally,
  ∀j : (∀c : c is a prefix of a computation of (p' ∪ f_failsafe_masking) of length j :
    (c_0 ∈ S') ⇒ (∀k : k < j : (c_k, c_(k+1)) does not violate spec))

Lemma 5.15: All computations of p1 − mt that start from a state in T' are infinite. Formally,
  DeadlockStates(p1 − mt)(T') = ∅

Theorem 5.16: All computations of p' that start from a state in T' are infinite. Formally,
  DeadlockStates(p')(T') = ∅

Theorem 5.17: After termination of the loop, for any state in the fault-span, T', there exists a computation of p1 that starts from a state in T' − S' and reaches the invariant, S'. Formally,
  ((S1 = S2) ∧ (T1(f_masking) = T2)) ⇒ ∀s : s ∈ T' : reachable(S', T', p', s)

Theorem 5.18: For any synthesized failsafe-masking program, there exists a fault-span. Formally,
  ∃ T : FaultSpan(T, S', p' ∪ f_failsafe_masking)

Proof. The formal proofs of lemmas and theorems 5.10 to 5.18 are similar to the corresponding lemmas and theorems in sections 4.1.2 and 4.3.2.
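The construction of T' and p' above, and the deadlock-freedom properties of Lemma 5.15 and Theorem 5.16, can be illustrated on explicit finite sets with the following Python sketch. Reading ConstructInvariant as iterated deadlock elimination is our interpretation of the algorithm in [8]; the function names are illustrative, not the PVS identifiers.

```python
def deadlock_states(p, t):
    """States of t with no outgoing transition of p that stays in t."""
    return {s for s in t if not any(s0 == s and s1 in t for (s0, s1) in p)}

def construct_invariant(t, p):
    """Largest subset of t in which every state has an outgoing p-transition
    within the subset, obtained by repeatedly removing deadlock states."""
    t = set(t)
    while True:
        dead = deadlock_states(p, t)
        if not dead:
            return t
        t -= dead

def remove_escaping_transitions(p, t_prime):
    """The final pruning step in the definition of p': drop transitions that
    start in t_prime but leave it."""
    return {(s0, s1) for (s0, s1) in p
            if not (s0 in t_prime and s1 not in t_prime)}
```

By construction, construct_invariant returns a set with no deadlock states, which is the finite-set reading of Lemma 5.15; remove_escaping_transitions cannot create new deadlocks inside that set, matching Theorem 5.16.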
There are minor differences, such as the symbols and the occasions on which we expand definitions. For instance, in Section 4.1.2, we proved that no computation that starts from a state in the invariant of the failsafe fault-tolerant program, S', violates safety. In Lemma 5.13 and Theorem 5.14, we prove the same thing, but for a different state predicate, the fault-span T', and a different set of faults, f_failsafe_masking.

Chapter 6

Related Work

In this chapter, we focus on previous work on verification of fault-tolerance properties of programs, circuits, and architectures. In Section 6.1, we briefly present the challenges in the problem of program synthesis and transformation. In Section 6.2, we introduce previous work on formal verification of fault-tolerant architectures, processors, and circuits. Then, in Section 6.3, we present related work on mechanical verification of fault-tolerance properties of programs, algorithms, and software-related systems.

Throughout this thesis, our approach for verifying the correctness of the algorithms in [8, 10] is theorem proving. Model checking has also been shown to be an effective tool in verifying the behavior of fault-tolerant finite-state systems [21-26]. Model checking is widely used in verifying properties of domain-specific fault-tolerant controllers and embedded systems. This approach is not closely related to our approach to verification; hence, we do not go into the details of this context. In Chapter 7, we argue why we chose theorem proving techniques to address the problem of verification of the synthesis algorithms.

6.1 Challenges in Automatic Synthesis of Programs

In this section, we present a review of related work on the problem of automatic synthesis of reactive systems prior to [8]. As mentioned in Chapter 1, automatic synthesis of programs is an algorithmic approach for generating a new program from its specification or from another program.
This problem has been well studied through different approaches. In [27, 28], the authors present a survey on the problem of automatic synthesis of reactive systems. More specifically, in [27], three main approaches are presented. We now briefly introduce these approaches:

Model-Theoretic Approach

This approach is based on the extraction of a finite model from a given temporal logic1 specification. A temporal logic specification consists of safety and liveness specifications in terms of temporal operators. The model-theoretic approach mostly addresses synthesis of closed reactive systems, where there is no interaction between the system and its environment. Several different algorithms have been presented in the literature for synthesis of reactive systems from their temporal logic specifications. The initial work was presented by Emerson and Clarke [30]. They present an algorithm for synthesis of reactive programs from CTL (Computational Tree Logic) specifications. In their formalization, the safety specification is represented by invariance formulas; i.e., for all paths of execution, nothing bad happens. They represent the liveness properties of a program by eventually formulas; i.e., for all paths of execution, the eventually formulas hold. The synthesis method is based on a decision procedure that verifies the satisfiability of a CTL temporal logic specification.

1We refer the reader to [29] for a survey on temporal logics.

Manna and Wolper [31, 32] present a similar algorithm for synthesizing distributed processes that communicate by message passing. In their approach, the program specification is given as a PLTL (Propositional Linear Temporal Logic) formula. In [33, 34], Attie and Emerson present algorithmic methods for synthesis of distributed and concurrent programs from CTL* specifications. In a more recent work, Arora, Attie, and Emerson [35] present an algorithm for synthesizing fault-tolerant programs from CTL specifications.
Their algorithm consists of two phases. The first phase essentially performs the synthesis algorithm by Emerson and Clarke. In the second phase, they generate a fault-tolerant version of the synthesized program.

Automata-Theoretic Approach

As mentioned before, model-theoretic approaches address synthesis of closed reactive systems. By contrast, automata-theoretic approaches address synthesis of open reactive systems, where a program interacts with its environment. In the automata-theoretic approach, the input of the synthesis algorithm is a tree automaton [36] that represents the specification of the system. The synthesis of an open reactive system involves converting its temporal logic specification to a tree automaton and, then, checking non-emptiness of the tree automaton; i.e., checking whether the language accepted by the tree automaton is empty or not [37-40]. In [37, 38], Pnueli and Rosner show that the synthesis algorithm that generates a deterministic finite automaton satisfying an LTL specification has a doubly exponential complexity in the size of the specification. In [39], Kupferman and Vardi show that the problem of synthesizing an open reactive distributed system is undecidable if we make no assumptions about the system architecture.

Calculational Approach

In many cases, it is desirable to add a property to a program rather than synthesize a new program from a new specification. This desire motivated researchers to design algorithms that can add a particular property to a program while preserving its old properties. In other words, in this approach we would like to reuse the program and its invariant to calculate a new program and its new invariant. In [41], Arora presents a new foundation for fault-tolerance. Using his formal framework, one can represent any kind of program, fault, specification, and fault-tolerance. In Chapter 2, we developed the formal specification of our framework based on Arora's definitions in [41].
We described the synthesis algorithms developed by Kulkarni and Arora [8] in detail in Chapter 4. We also described the algorithms for automatic synthesis of multitolerance [10] in detail in Chapter 5. The authors of both [8] and [10] use a calculational approach to generate the new program. As mentioned before, in their method the synthesized program is correct by construction, and there is no need to verify individual programs. Furthermore, since their synthesis algorithms begin with an existing fault-intolerant program, the derived fault-tolerant program reuses the fault-intolerant version. Another advantage of their approach is that, given the set of bad transitions, the program, and its invariant, the algorithm can synthesize the fault-tolerant program. In other words, synthesis is possible without having the entire specification.

The issue of complexity is a serious barrier to implementing the algorithms in [30, 32-35, 37-40]. In contrast, the time complexity of the algorithms in [8, 10] shows promise for practical implementation. More specifically, the time complexity of the addition of fault-tolerance for a single class of faults in the high atomicity model is polynomial, and the addition of failsafe and masking fault-tolerance in the low atomicity model is NP-complete in the size of the state space [8, 19]. The problem of the addition of multitolerance, in general, is NP-complete. However, the time complexities of the additions of failsafe-masking and nonmasking-masking multitolerance are still polynomial. Moreover, in [19], Kulkarni and Ebnenasir identify the polynomial-time boundary of the addition of failsafe fault-tolerance in the low atomicity model. In [11], Kulkarni and Ebnenasir focus on the problem of enhancing the fault-tolerance of a nonmasking fault-tolerant program to masking.

To implement the synthesis algorithms, Kulkarni, Arora, and Chippada [42] develop a set of heuristics to synthesize a canonical version of the Byzantine agreement problem.
Based on these heuristics, Ebnenasir and Kulkarni [12] have developed a tool for automatic synthesis of fault-tolerant programs. We refer the reader to [27] for further investigation of the described approaches and a detailed comparison.

6.2 Formal Verification of Fault-Tolerant Architectures

Hardware vendors are, perhaps, the largest community that uses formal methods to model and verify their products. Researchers in this community focus on processes that show equivalence relations between a design and a specification. Simulation helps to gain more confidence, but logical correctness is a more reliable way to verify hardware. In this research area, proving the properties of computer instruction sets is perhaps the most established application of formal methods. Fault-tolerance is usually a crucial part of safety-critical embedded systems. Thus, verification of fault-tolerance properties of hardware in embedded systems is well studied, and an enormous number of papers and theses have been published in this area of research. In this section, we present previous work on formal verification of fault-tolerant architectures, circuits, and processors.

Formal verification of fault-tolerant circuits. Kljaich, Smith, and Wojcik [43] focus on verification of the logical correctness of fault-tolerance properties of digital systems. More specifically, they use theorem proving together with an extended Petri net representation to validate fault-tolerance. A normal Petri net is an abstract machine that rests between Turing machines and finite state machines in terms of decision power and modeling. Execution of a Petri net models nondeterministic dynamic behavior and is represented by firing transitions. The extended Petri net theory in [43] avoids the nondeterministic behavior and gives functional capability to the firing of transitions. The authors first give an example of a typical 2-bit ALU to show the modeling capability of their extension to the Petri net model.
Then, based on their design for the 2-bit ALU, they present a fault-tolerant processor (FTP) that tolerates one or more failures. Fault-tolerance is achieved using redundant modules. Using automated reasoning implemented in PROLOG, the authors show, for possible arrangements of instruction sets, how many redundant modules are required to meet the fault-tolerance specification of the CPU.

Formal verification of synchronized fault-tolerant circuits. Verification of fault-tolerant clock synchronization circuits has been a challenging problem in the literature of hardware verification. Miner and Johnson [44] present an optimization of their previous work on the formal development of a fault-tolerant synchronization circuit. The authors argue that, in previous work, researchers in the formal methods community had focused on how to verify hardware using particular verification systems, and not on the type of reasoning support that is needed. They represent circuits as systems of stream equations, where each stream corresponds to a signal within the circuit. The signals are annotated with invariants, which can be established using proof by co-induction. Miner and Johnson exploit the invariants to verify that their formally designed circuit satisfies the localized design refinements. More specifically, first, they introduce a design hierarchy with different levels of abstraction. Then, they use PVS as their verification tool to prove the properties of their design.

NASA SPIDER project. NASA's most recent project on fault-tolerant architectures is the formally-verified fault-tolerant architecture, Scalable Processor-Independent Design for Electromagnetic Resilience (SPIDER) [45]. SPIDER is a family of fault-tolerant, reconfigurable architectures that provide mechanisms for integrating inter-dependent applications of differing criticalities. SPIDER comes with three protocols: the Consistency Protocol, the Diagnosis Protocol, and the Synchronization Protocol.
The first two protocols provide a fault-tolerant broadcast communication architecture. More specifically, the Consistency Protocol takes care of reliable message broadcast in the presence of malicious faults, using the Byzantine Agreement problem. The Diagnosis Protocol distributes local information about the health status of nodes across the network. In [46], Miner and Geser present mechanical verification of the first two protocols. They present formal proofs in PVS that, given the dynamic maximum fault assumption and a sane health status classification, the Consistency Protocol achieves agreement and the Diagnosis Protocol provides a sane classification of faulty nodes.

Abstractions for hardware verification. Melham [47] proposes four abstractions for formalizing and verifying hardware. He argues that such abstractions decrease the complexity and size of specification and verification. His argument is based on the fact that specifying hardware in detail and proving that the specification satisfies the requirements through formula equivalence undermines the design hierarchy. Melham suggests that a satisfaction relation based on the idea of abstraction, rather than equivalence, is the key to making large designs tractable. He introduces four abstractions:

1. Structural abstraction: the suppression of information about a device's internal structure. The idea of structural abstraction is that the specification of a device should not reflect its internal construction, but its externally observable behavior.
2. Behavioral abstraction concerns specifications that only partially define a device's behavior.
3. Data abstraction is well known from programming language theory.
4. Temporal abstraction views a device by its sequential, time-dependent behavior.

Based on [47], fault-tolerance properties of hardware can be specified at the structural and behavioral levels of abstraction. The examples in [47] present very top-level properties of hardware designs that can be applied to the specification of many systems.

6.3 Mechanical Verification of Fault-Tolerance Properties of Programs

In this section, we present the related work on formal verification of programs and algorithms that provide fault-tolerance for software systems. The related work in this section has a closer theme to our work in this thesis. In Section 6.3.1, we present related work on formal verification of fault-tolerant embedded and real-time digital systems. In Section 6.3.2, we introduce previous work on mechanical verification of algorithms that provide fault-tolerance for mutual exclusion and message passing problems. Finally, in Section 6.3.3, we present recent research work on defining abstractions for fault-tolerant computing regardless of algorithm, protocol, and hardware platform.

6.3.1 Formal Verification of Fault-Tolerant Embedded Digital Systems

Embedded systems mostly serve in digital controllers. In many cases, such controllers are in charge of monitoring safety-critical systems that need the greatest assurance. For example, catastrophic failure of digital flight-control systems for passenger aircraft must be "extremely improbable" [3]. Many researchers have focused on verification of fault-tolerant digital flight control systems in the past two decades, and there have been great advances in specification and verification techniques for such systems in the literature [3, 48-52]. Clock synchronization and message passing are two major concerns in such systems. Components across distributed digital embedded systems miss clock signals due to transient and Byzantine faults. These faults can easily lead the entire system to an unsynchronized and out-of-order state. Hence, verification of fault-tolerant clock synchronization has received great attention from researchers and verification engineers. Message passing is another serious issue in embedded systems.
Hence, fault-tolerant message passing and message broadcast protocols are also well studied [3, 46]. Verification tools vary from early theorem provers, such as the Boyer-Moore theorem prover and EHDM, to modern ones, such as PVS, HOL, and ACL2. However, most attempts address formal proof techniques for particular fault-tolerant protocols or algorithms.

Although verification of fault-tolerant embedded systems relies on specific characteristics of different systems, there have been attempts to abstract the concepts that are common among such systems. For instance, in [49], Rushby introduces an abstraction for mechanical verification of fault-tolerant time-triggered algorithms. His approach provides a methodology that can ease the formal specification and assurance of critical fault-tolerant systems. Mantel and Gartner [4] argue that modular reasoning about fault-tolerant systems can simplify the verification process in practice. They exercise their idea on the problem of reliable message broadcast in point-to-point networks in the presence of failures of processes.

6.3.2 Formal Verification of Fault-Tolerant Algorithms

Dijkstra's self-stabilizing token ring algorithm. A self-stabilizing algorithm is one that guarantees that the system behavior eventually recovers to a safe subset of states regardless of the initial state. Obviously, self-stabilization is a type of fault-tolerance. In [6], Dijkstra introduced his well-known self-stabilizing mutual exclusion algorithm (also known as the token ring algorithm) in 1974. At the time, he assumed the algorithm was trivially correct and left the proof as an exercise for the reader. However, 12 years later, Dijkstra published a belated proof of correctness of the algorithm in [53], which was not quite trivial! In the algorithm, there are N processes, N > 1, numbered 0 to N − 1, arranged in a unidirectional ring, where each process can observe its own state and that of its predecessor.
Each process i maintains a counter v(i) with values from 0 to M − 1, where N ≤ M. Process 0 is a distinguished process that is enabled when v(0) = v(N − 1); when enabled, it can make a transition by incrementing its counter modulo M. A nonzero process i is enabled when v(i) ≠ v(i − 1); when enabled, it can update its counter value so that v(i) = v(i − 1). It can be shown that, without a centralized controller, the system self-stabilizes from an arbitrary initial state to a stable behavior in which exactly one process is enabled at any state. This algorithm is useful as a solution to the distributed mutual exclusion problem; i.e., the enabled process (in other words, the one that has the token) can enter its critical section. Qadeer and Shankar [5] exercise mechanical verification of the self-stabilization property of Dijkstra's algorithm using PVS. First, they formalize the algorithm using the PVS specification language. Then, based on Dijkstra's informal proof sketch, they present how to mechanically verify the self-stabilization property of the algorithm. However, they do not present any sort of abstraction or general design. Hence, their formal proof cannot be easily applied elsewhere.

6.3.3 Formal Verification of Abstractions for Fault-Tolerance

In the previous sections of this chapter, we presented related work on mechanical verification of specific fault-tolerant circuits, protocols, algorithms, digital embedded systems, and microprocessors. In this section, we introduce the related work that
More specifically, they introduce (i) Message Abstractions to ad- dress the correctness of individual messages sent and received; (ii) Fault Abstractions to address the kinds of faults possible as well as their effect in the system; (iii) Fault- Masking Abstractions to address the kinds of local computations processes make to mask faults, and (iv) Communication Abstractions to address the kinds of data com- municated and the properties required for communication to succeed in the presence of faults. The authors formalize the abstractions in PVS. Abstractions for modeling faults, programs, and specifications. As men- tioned in Section 6.2, in [47], Melham introduces four abstraction for hardware veri- fication. Likewise, we need to introduce different levels of abstractions to make the specification and verification of fault-tolerant systems less complex. Moreover, using such abstractions, it is more likely that we can reuse formal proofs and develop proof strategies. In [7, 55, 56], the authors introduce abstractions for modeling faults, pro- grams, specifications, and component-based design of fault-tolerant programs. In [7], Kulkarni, Rushby, and Shankar present a case-study in component-based mechani- cal verification of Dijkstra’s self-stabilizing mutual exclusion algorithm. The authors demonstrate that decomposition of a fault-tolerant program to its components is use- ful to make the verification less complex and develop reusable formal proofs. More specifically, they decompose a fault-tolerant program to a fault-intolerant program and fault-tolerance components that are detectors and correctors. This decomposi- tion facilitates the verification of a given property by focusing on the component that is responsible for satisfying it. In this sense, verification of the entire fault-tolerant 84 program reduces to verification of the component that is in charge of satisfying a certain property. 
As an exercise, the authors first implement the theory of detectors and correctors [2] in PVS. Then, they verify the correctness of Dijkstra's token ring algorithm using the mentioned theory. Kulkarni et al. verify the same algorithm whose correctness Qadeer and Shankar prove in [5], but their approach (decomposition of the program into its components) is more general and abstract, in the sense that their formal proof techniques are reusable for verifying other fault-tolerant programs. Briefly, in [7], the authors verify three major concerns. First, they formally prove that, in the absence of faults, the fault-tolerant program has the same behavior as the fault-intolerant program. Then, they verify the corrector; i.e., if faults perturb the state of the program, then, at the resulting state, recovery to an ideal state of the fault-intolerant program is possible. Finally, they verify that there is no interference between the fault-intolerant program and the corrector.

Chapter 7

Discussion

In this chapter, we discuss different aspects of this thesis and try to answer the questions that have been raised about the verification of the synthesis algorithms proposed in [8, 10].

Theorem proving vs. model checking. The first issue that we would like to address is why we used theorem proving techniques to verify the synthesis algorithms. Model checking is also widely used to verify the fault-tolerance properties of many safety-critical embedded systems. However, model checking can only be used for systems that have a finite and known size of state space. In our case, although the state spaces of the programs that the synthesis algorithms deal with are finite, the size of the state space may vary. Therefore, model checking does not seem to be the right technique for verification of the synthesis algorithms.

The synthesis method. In Chapter 6, we presented related work on challenges in the problem of program synthesis and mechanical verification of fault-tolerant systems.
In [30, 32, 33, 35], the authors propose algorithms that synthesize a program from its temporal logic specification. In work prior to [8], the input to synthesis algorithms is either an automaton or a temporal logic specification, and any modification in the specification requires synthesizing the new program from scratch. By contrast, the algorithms in [8, 10] reuse the fault-intolerant program to synthesize the fault-tolerant version. This reusability helps to improve the time complexity to some extent. Thus, the algorithms proposed in [8, 10] seem to be suitable candidates for practical implementation. As an extension of the algorithms in [8], in [42], the authors introduce a set of heuristics for synthesizing distributed fault-tolerant programs in polynomial time. Based on these heuristics, Ebnenasir and Kulkarni have developed a tool for synthesizing fault-tolerant programs [12]. Therefore, by formally verifying the algorithms in [8, 10], we gain more confidence in their practical implementations as well. Advantages of mechanical verification of algorithms for the synthesis of fault-tolerant programs. Fault-tolerant systems are often in need of strong assurance. Mechanical verification is a reliable way to ensure that the fault-tolerance requirements of a system are met. We find that verification of algorithms for the synthesis of fault-tolerance is a systematic and abstract way to formally verify fault-tolerance. High level of abstraction. The algorithms presented in [8, 10] make no assumptions about the system, except that it has a finite state space. This high level of abstraction enables the algorithms to synthesize both finite-state hardware and software systems. Our focus on formal verification of such abstract algorithms makes it possible to extend our work to verify the algorithms in [11, 42], which are extensions of the algorithms in [8], for any system regardless of platform and architecture.
In addition, having developed the formal specification and verification in this thesis, we can easily verify extensions of the algorithms by reusing that specification. Correctness of synthesized programs. Another advantage of verifying a synthesis algorithm rather than individual fault-tolerant programs is the guarantee that any program synthesized by the algorithm is correct by construction. Thus, we are not required to verify individual synthesized programs. Reusability of formal proofs. Although most of the related work on formal verification of fault-tolerance provides confidence in the correctness of its concerns, reusing the formal proof of one in the verification of others is not quite convenient. Manual reuse of formal proofs is the first step toward developing proof strategies. As an illustration, in Section 4.3.2, we showed how we manually reused the formal proofs of Add_failsafe to verify the soundness of Add_masking. The issue of completeness. A synthesis algorithm is complete iff for any given program p with invariant S, if there exists a solution p' with invariant S' that satisfies the transformation problem, then the algorithm always finds program p' and state predicate S'. In Section 4.1.2, we have shown that the algorithm for adding failsafe fault-tolerance is complete. However, in this thesis, we did not devote a great deal of attention to verifying the completeness of the algorithms, for two reasons. First, in order to show the correctness of an algorithm, we are mostly interested in verifying its soundness. In our context, we need to prove that the program synthesized by the algorithm is indeed fault-tolerant; i.e., the synthesized program satisfies the transformation problem.
Second, in the low atomicity model, proving the completeness of a deterministic polynomial-time synthesis algorithm is irrelevant, in the sense that the problem of adding fault-tolerance to distributed programs is known to be NP-complete [8, 19]. In other words, we have to apply heuristics [11, 42] to synthesize distributed fault-tolerant programs in polynomial time. As a result of applying such heuristics, we lose the completeness of the synthesis algorithms; i.e., if the heuristics are not applicable, then the synthesis algorithm fails to synthesize a fault-tolerant distributed program although there may exist a fault-tolerant program that satisfies the requirements of the transformation problem. Lessons learned. The formalization and verification of the synthesis algorithms and the fixpoint theory presented in Chapters 2 to 5 were not our first attempts. In fact, we tried several other ways to model the problem, but they were either not abstract enough or too complicated to handle. We summarize the lessons we learned as follows:

• Sometimes it is very tempting to attempt mechanical proofs of easy-looking theorems before actually verifying them manually. This temptation may lead the developer to spend a lot of time trying to prove an incorrect theorem. Basically, if you cannot do a proof by hand, you cannot do it automatically.

• In many cases a failed proof teaches more than a successful proof, because the developer gains a better understanding of the problem and the formalization through a failed proof.

• Improper formal specification usually leads us to state wrong or difficult theorems. For instance, our first attempt to formalize the largest fixpoint calculation was based on an inductive definition on the finite set itself rather than on the steps of the fixpoint calculation.
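To make the step-indexed formulation concrete, the two styles can be contrasted in a small executable sketch in Python (an illustration only; the function names and the toy transition relation are ours, not part of the PVS development). The iteration removes deadlock states and is indexed by the step count, the quantity on which induction is easy:

```python
# Sketch: the largest fixpoint that removes deadlock states from a state
# predicate, written as iteration on the step count i rather than as
# recursion on the (shrinking) finite set itself.

def deadlock_states(transitions, pred):
    """States in pred with no outgoing transition that stays in pred."""
    return {s for s in pred
            if not any((s, t) in transitions and t in pred for t in pred)}

def mu_dec(i, pred, transitions):
    """i-th step of the decreasing iteration: remove deadlocks i times."""
    for _ in range(i):
        pred = pred - deadlock_states(transitions, pred)
    return pred

def mu_fix(pred, transitions):
    """Largest fixpoint: since pred is finite and each productive step
    removes at least one state, |pred| steps always suffice."""
    return mu_dec(len(pred), pred, transitions)

# Toy example: state 2 has no successor in pred, and removing 2 makes
# state 3 deadlocked in turn; states 0 and 1 survive.
pred = {0, 1, 2, 3}
transitions = {(0, 1), (1, 0), (3, 2)}
assert mu_fix(pred, transitions) == {0, 1}
```

Induction over the integer index `i` mirrors the proof structure that worked in PVS, whereas recursion on the set itself forces the finite-set induction schemes that proved inconvenient.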
Here is our first attempt to model the largest fixpoint calculation for removing deadlock states from a given state predicate X:

  Dec(X: StatePred): RECURSIVE StatePred =
    LET ds = DeadlockStates(X) IN
      IF ds = {} THEN X ELSE Dec(X - ds) ENDIF
    MEASURE |X|

This formalization is probably the simplest way to model a recursive function on a finite set, but using the induction schemes for finite sets in PVS is not quite convenient, and proofs of very simple theorems turn into huge proof trees. On the other hand, by defining the recursive function based on the steps of the fixpoint calculation rather than on the finite set itself, and using induction on the number of steps, which is an integer, theorems become much easier to prove. Hence, a simple formalization does not necessarily lead the developer to easier proofs.

• The development of the fixpoint theory was inspired by several implementations of specific fixpoint calculations. In fact, we did not develop the fixpoint theory before developing the synthesis algorithms! In many cases, several implementations of special cases of a problem lead the developer to develop a general theory once and instantiate it to verify the special cases.

Chapter 8

Conclusion and Future Work

In this thesis, we focused on the problem of verifying, using PVS, transformation algorithms that generate fault-tolerant programs that are correct by construction. We considered programs that are subject to a single class or multiple classes of faults. For the case that a program is subject to a single class of faults, we considered three levels of fault-tolerance: failsafe, nonmasking, and masking. We verified the soundness and completeness of the synthesis algorithms for adding failsafe and nonmasking fault-tolerance. For the addition of masking fault-tolerance, we only considered verification of the soundness.
For the synthesis algorithms associated with the addition of multitolerance, we verified the soundness of the addition of failsafe-masking and nonmasking-masking multitolerance. The theory for fixpoint calculations on finite sets. The essence of adding failsafe and masking fault-tolerance, as well as failsafe-masking and nonmasking-masking multitolerance, is calculating the new invariant of the program, which in turn involves calculating the fixpoint of a formula. We developed a theory for fixpoint calculation on finite sets that is customized for the specification and verification of fault-tolerance. More specifically, we introduced the formalization of the smallest fixpoint calculation to compute the reachability of a set of states. Moreover, we presented the formalization of the largest fixpoint calculation to compute the set of deadlock states of a given state predicate. The theory developed in Chapter 3 is expected to be reusable for other formalizations that involve fixpoint calculations. Mechanized verification of fault-tolerance and multitolerance. For the case that programs are subject to a single class of faults, we considered verification of the addition of three levels of fault-tolerance. More specifically, first, we developed a formal framework for modeling fault-tolerance in PVS. This framework introduces formal definitions for programs, specifications, faults, and fault-tolerance. Then, using the framework, we formally proved that any program synthesized by the Add_failsafe algorithm never violates safety in the presence of faults and maintains its specification in the absence of faults. For the Add_nonmasking algorithm, we showed that the nonmasking fault-tolerant program provides recovery to normal behavior when the state of the program is perturbed by faults, and in the absence of faults it maintains its specification.
Finally, we verified that a masking fault-tolerant program provides safe recovery when the state of the program is perturbed by faults, and in the absence of faults it maintains its specification. In other words, we mechanically verified that the algorithms for the addition of fault-tolerance in [8] are sound. We also addressed formal verification of the completeness of the addition of failsafe fault-tolerance. More specifically, we proved that if there exists a failsafe program that satisfies the transformation problem (cf. Section 2.5), then the algorithm Add_failsafe never declares failure. For the case that programs are subject to multiple classes of faults, we considered verification of the addition of two types of multitolerance. More specifically, first, we extended our formal framework for fault-tolerance so that our formal definitions can deal with multiple classes of faults. Then, using the extended framework, we formally proved that any program synthesized by the Add_Failsafe_Masking algorithm (i) never violates safety when the state of the program is perturbed by the class of faults for which failsafe fault-tolerance is to be provided; (ii) provides safe recovery when the state of the program is perturbed by the class of faults for which masking fault-tolerance is to be provided; (iii) provides failsafe fault-tolerance if faults of both types occur; and (iv) maintains its specification in the absence of faults.
For the Add_Nonmasking_Masking algorithm, we showed that the nonmasking-masking multitolerant program (i) provides safe recovery to normal behavior when the state of the program is perturbed by the class of faults for which masking fault-tolerance is to be provided; (ii) provides recovery to its invariant when the state of the program is perturbed by the class of faults for which nonmasking fault-tolerance is to be provided; (iii) provides nonmasking fault-tolerance if faults of both types occur; and (iv) maintains its specification in the absence of faults. In other words, we mechanically verified that the algorithms for the addition of multitolerance in [10] are all sound. Verification of multitolerance is one of the instances in which we simply reused both the formal specification and the proofs of the framework developed in Chapter 2 and of the algorithms developed in Chapter 4. We expect this reusability to carry over to the verification of other extensions of the algorithms [11, 42]. The algorithms verified in this thesis synthesize programs in the high atomicity model, where a process can read and write all variables in an atomic step. In [8], the authors have presented a non-deterministic algorithm for synthesizing fault-tolerant distributed programs. Moreover, in [10], the authors have introduced a non-deterministic algorithm for synthesizing failsafe-nonmasking multitolerant programs. The non-deterministic algorithms in both cases first guess a solution and then verify it. Formal verification of such an algorithm, which verifies its own output, is not interesting, and hence we did not include it in this thesis. Soundness vs. completeness. In this thesis, we concentrated more on verification of the soundness of the algorithms than on their completeness, for two reasons. First, in order to show the correctness of an algorithm, we are mostly interested in verifying its soundness.
In our context, we need to prove that programs synthesized by an algorithm are indeed fault-tolerant. Second, proving the completeness of a non-deterministic polynomial-time synthesis algorithm is irrelevant for the problem of adding fault-tolerance, in the sense that such problems are known to be NP-complete. In order to deal with NP-complete problems in the addition of fault-tolerance, several heuristics have been proposed [11, 42]. As a result of applying such heuristics, we lose the completeness of the synthesis algorithms. Therefore, we have to concentrate only on verifying the soundness of such heuristics unless P = NP. Advantages of verification of synthesis algorithms. Since we focus on verification of the transformation algorithms, our results ensure that the programs synthesized using these algorithms indeed satisfy their required fault-tolerance properties. Thus, our approach is more general than verifying a particular fault-tolerant program. In a broader context, the verification of the algorithms considered in this thesis will assist us in verifying several other transformations. For example, the algorithms in [8, 9] have also been used to synthesize fault-tolerant distributed programs. As an illustration, we note that the algorithms in [11, 12, 42], which are extensions of the algorithms in [8, 9], have been used to synthesize solutions for several fault-tolerant programs including Byzantine agreement, consensus, token ring, and the alternating bit protocol. Thus, the theories developed in this thesis are directly applicable to verifying the transformation algorithms in [11, 12, 42] as well. Future work. As future work, one may consider mechanical verification of the heuristics developed in [42] to synthesize fault-tolerant distributed programs. Formal verification of the algorithms proposed in [11] for enhancing the synthesis of masking fault-tolerant distributed programs through their nonmasking version is also an interesting problem.
Our experience shows that a significant number of proofs were reused. For instance, we manually reused the proofs of failsafe fault-tolerance to verify the soundness of the algorithm for synthesizing masking fault-tolerant programs. We also reused the formal proofs of soundness of the addition of failsafe, nonmasking, and masking fault-tolerance to verify the algorithms for the addition of multitolerance. We expect to reuse many of the theorems and proofs in future verifications as well. Therefore, another direction for future work is developing automated proofs, called proof strategies, based on our experience with the reusability of formal proofs.

APPENDICES

Appendix A

Formal Specification of the Fault-Tolerance Framework

FT[state: TYPE]: THEORY
BEGIN
  ASSUMING
    ST_is_finite: ASSUMPTION is_finite_type[state]
    TR_is_finite: ASSUMPTION is_finite_type[[state, state]]
    IMPORTING sets[state]
    IMPORTING sets_lemmas[state]
    IMPORTING finite_sets[state]
    IMPORTING sequences[state]
    IMPORTING finite_fp[state]
    IMPORTING min_nat
  ENDASSUMING

  Transition: TYPE = [state, state]
  Action: TYPE = finite_set[Transition]
  Computation(Z: Action): TYPE =
    {A: sequence[state] | ∀ (n: nat): ((A(n), A(n+1)) ∈ Z)}
  prefix(Z: Action, j: nat): TYPE =
    {c: sequence[state] | ∀ (i: nat | i < j): ((c(i), c(i+1)) ∈ Z)}

  full_state: JUDGEMENT fullset[state] HAS_TYPE finite_set
  StateSpace: finite_set = fullset[state]

  S: StatePred
  T: StatePred
  p: Action
  f: Action
  sf: Action

  s, s0, s1: VAR state
  X, Y: VAR StatePred
  Z: VAR Action
  i, j, k: VAR nat
  g: VAR DecFunc
  r: VAR IncFunc

  closed?(S: StatePred, p: Action): bool =
    ∀ (s0, s1): (((s0, s1) ∈ p) ∧ (s0 ∈ S)) ⊃ (s1 ∈ S)

  proj(p: Action, S: StatePred): Action =
    {s0, s1 | ((s0, s1) ∈ p) ∧ (s0 ∈ S) ∧ (s1 ∈ S)}

  faultspan?(T, S: StatePred, p: Action): bool =
    (∀ s0: (s0 ∈ S) ⊃ (s0 ∈ T)) ∧ closed?(T, (p ∪ f))

  ax1: AXIOM closed?(S, p)
  ax3: AXIOM ∀ (c: Computation((Z ∪ f))):
    (∃ (j: nat): (∀ (n: nat | n ≥ j): ((c(n), c(n+1)) ∈ Z)))
  ax5: AXIOM nonempty?(S)

  ms_init(anyFault: Action): StatePred =
    {s0 | ∃ (s1): ((s0, s1) ∈ anyFault) ∧ ((s0, s1) ∈ sf)}

  reverse_reachable_states(anyFault: Action)(rs: StatePred): StatePred =
    {s0 | ∃ (s1): ((s1 ∈ rs) ∧ ((s0, s1) ∈ anyFault) ∧ ¬ (s0 ∈ rs))}

  rs_if: JUDGEMENT reverse_reachable_states(anyFault: Action) HAS_TYPE IncFunc

  deadlockstates(p: Action)(ds: StatePred): StatePred =
    {s0 | (s0 ∈ ds) ∧ (∀ (s1): (s1 ∈ ds) ⊃ ¬ ((s0, s1) ∈ p))}

  ax6: AXIOM empty?(deadlockstates(p)(S))

  dl_df: JUDGEMENT deadlockstates(p: Action) HAS_TYPE DecFunc

  ms(anyFault: Action): StatePred =
    nu_fix(ms_init(anyFault))(reverse_reachable_states(anyFault))
  ms(i)(anyFault: Action): StatePred =
    nu_inc(i, ms_init(anyFault))(reverse_reachable_states(anyFault))

  mt(anyFault: Action): Action =
    {s0, s1 | ((s1 ∈ ms(anyFault)) ∨ ((s0, s1) ∈ sf))}

  ConstructInvariant(X: StatePred, Z: Action): StatePred =
    mu_fix(X)(deadlockstates(Z))

  finite_fault: THEOREM ∀ (c: Computation((Z ∪ f))):
    ∃ (j: nat): (∀ (n: nat): ((suffix(c, j)(n), suffix(c, j)(n+1)) ∈ Z))
END FT

Appendix B

Formal Specification of the Fixpoint Theory

finite_fp[T: TYPE]: THEORY
BEGIN
  ASSUMING
    IMPORTING sets[T]
    IMPORTING sets_lemmas[T]
    IMPORTING finite_sets[T]
    IMPORTING mucalculus[T]
    T_is_finite: ASSUMPTION is_finite_type[T]
  ENDASSUMING

  StatePred: TYPE = finite_set[T]
  IncFunc: TYPE = [A: StatePred -> {B: StatePred | disjoint?(A, B)}]
  DecFunc: TYPE = [A: StatePred -> {B: StatePred | B ⊆ A}]
  StateSpace: finite_set = fullset[T]

  s: VAR T
  i, j, k: VAR nat
  X: VAR StatePred
  g: VAR DecFunc
  r: VAR IncFunc

  fullset_finite: JUDGEMENT fullset[T] HAS_TYPE finite_set

  nu_inc(i: nat, X: StatePred)(r: IncFunc): RECURSIVE StatePred =
    IF i = 0 THEN X ELSE (nu_inc(i-1, X)(r) ∪ r(nu_inc(i-1, X)(r))) ENDIF
    MEASURE (λ (x: nat, y: StatePred): x)

  nu_fix(X: StatePred)(r: IncFunc): StatePred =
    {s | ∃ (k: nat): (s ∈ nu_inc(k, X)(r))}

  mu_dec(i: nat, X: StatePred)(g: DecFunc): RECURSIVE StatePred =
    IF i = 0 THEN X ELSE (mu_dec(i-1, X)(g) \ g(mu_dec(i-1, X)(g))) ENDIF
    MEASURE (λ (x: nat, y: StatePred): x)

  mu_fix(X: StatePred)(g: DecFunc): StatePred =
    {s | ∀ (k: nat): (s ∈ mu_dec(k, X)(g))}

  sfpcal(fpSet: StatePred)(r: IncFunc): RECURSIVE StatePred =
    IF nonempty?(r(fpSet)) THEN sfpcal((r(fpSet) ∪ fpSet))(r) ELSE fpSet ENDIF
    MEASURE card(StateSpace) - card(fpSet)

  lfpcal(fpSet: StatePred)(g: DecFunc): RECURSIVE StatePred =
    IF empty?(g(fpSet)) THEN fpSet ELSE lfpcal((fpSet \ g(fpSet)))(g) ENDIF
    MEASURE card(fpSet)

  mu1: LEMMA (j < k) ⊃ (mu_dec(k, X)(g) ⊆ mu_dec(j, X)(g))
  mu3: LEMMA ∀ (j: nat): nonempty?(g(mu_dec(j, X)(g))) ⊃
    card(mu_dec(j+1, X)(g)) ≤ card(X) - j - 1
  mu5: LEMMA ∀ (j: nat): empty?(g(mu_dec(j, X)(g))) ⊃
    (∀ (k: nat | k ≥ j): empty?(g(mu_dec(k, X)(g))))
  mu6: LEMMA ∃ (j: nat): ∀ (k: nat | k ≥ j): empty?(g(mu_dec(k, X)(g)))
  mu7: LEMMA ∃ (j: nat): ∀ (k: nat | k ≥ j):
    mu_dec(k, X)(g) = mu_dec(j, X)(g) ∧ empty?(g(mu_dec(k, X)(g)))
  mu8: LEMMA ∃ (j: nat):
    ((mu_dec(j, X)(g) = mu_fix(X)(g)) ∧ empty?(g(mu_dec(j, X)(g))))
  mu9: THEOREM empty?(g(mu_fix(X)(g)))
  mu10: THEOREM mu_fix(X)(g) = mu_fix(mu_fix(X)(g))(g)

  nu1: LEMMA (k < j) ⊃ (nu_inc(k, X)(r) ⊆ nu_inc(j, X)(r))
  nu2_3: LEMMA (s ∈ nu_inc(j+1, X)(r)) ⊃
    ((s ∈ r(nu_inc(j, X)(r))) ∨ (∃ (k: nat | k ≤ j): (s ∈ nu_inc(k, X)(r))))
  nu2_4: LEMMA (s ∈ nu_fix(X)(r)) ⊃
    (∃ (j: nat): (s ∈ r(nu_inc(j, X)(r))) ∨
      (∃ (k: nat | k ≤ j): (s ∈ nu_inc(k, X)(r))))
  nu3: LEMMA ∀ (j: nat): nonempty?(r(nu_inc(j, X)(r))) ⊃
    card(nu_inc(j+1, X)(r)) ≤ card(fullset[T]) - card(X) - j - 1
  nu5: LEMMA ∀ (j: nat): empty?(r(nu_inc(j, X)(r))) ⊃
    (∀ (k: nat | k ≥ j): empty?(r(nu_inc(k, X)(r))))
  nu6: LEMMA ∃ (j: nat): ∀ (k: nat | k ≥ j): empty?(r(nu_inc(k, X)(r)))
  nu7: LEMMA ∃ (j: nat): ∀ (k: nat | k ≥ j):
    nu_inc(k, X)(r) = nu_inc(j, X)(r) ∧ empty?(r(nu_inc(k, X)(r)))
  nu8: LEMMA ∃ (j: nat):
    ((nu_inc(j, X)(r) = nu_fix(X)(r)) ∧ empty?(r(nu_inc(j, X)(r))))
  nu9: THEOREM empty?(r(nu_fix(X)(r)))
  nu10: THEOREM nu_fix(nu_fix(X)(r))(r) = nu_fix(X)(r)
END finite_fp

Appendix C

Formal Specification of the Synthesis of Failsafe Fault-Tolerance

add_failsafe[state: TYPE]: THEORY
BEGIN
  ASSUMING
    ST_is_finite: ASSUMPTION is_finite_type[state]
    TR_is_finite: ASSUMPTION is_finite_type[[state, state]]
    IMPORTING FT[state]
  ENDASSUMING

  i, j: VAR nat
  s0, s1: VAR state

  Sp(anyFault: Action): StatePred =
    ConstructInvariant((S \ ms(anyFault)), (p \ mt(anyFault)))

  pp(anyFault: Action): Action =
    (p \ mt(anyFault)) \
      {s0, s1 | (s0 ∈ Sp(anyFault)) ∧ ¬ member(s1, Sp(anyFault))}

  sc: {c: Computation(f) |
    ∀ (s: state | s ∈ ms_init(f)): (c(0) = s) ∧ ((c(0), c(1)) ∈ sf)}
  sc1: {c: Computation(f) |
    ∀ (t: Transition | t ∈ (p ∩ sf)): c(0) = t`2}

  is_failsafe?(Spp: StatePred, ppp: Action): bool =
    nonempty?(Spp) ∧ closed?(Spp, ppp) ∧ (Spp ⊆ S) ∧
    (proj(ppp, Spp) ⊆ proj(p, Spp)) ∧
    empty?(deadlockstates(ppp)(Spp)) ∧
    (∀ (c: Computation((ppp ∪ f)) | c(0) ∈ Spp):
       ∀ (j: nat): ¬ ((c(j), c(j+1)) ∈ sf))

  add_failsafe_fails?: bool = empty?(Sp(f))

  rs_disj: LEMMA ∀ (X: StatePred): disjoint?(X, reverse_reachable_states(f)(X))
  rs_empty: LEMMA reverse_reachable_states(f)(emptyset) = emptyset
  disj_Sp_ms: THEOREM disjoint?(Sp(f), ms(f))
  disj_Sp_sf: THEOREM disjoint?(Sp(f), ms_init(f))
  ms_f_sf: LEMMA ∀ (s0, s1): (((s0, s1) ∈ f) ∧ ((s0, s1) ∈ sf)) ⊃ (s0 ∈ ms(f))
  prop: LEMMA (ms_init(f) ⊆ ms(f))
  disj_pp_sf: LEMMA disjoint?(pp(f), sf)
  prop2: LEMMA ∀ (anyFault: Action):
    (((s0, s1) ∈ anyFault) ∧ (s1 ∈ ms(anyFault))) ⊃ (s0 ∈ ms(anyFault))
  prop3: LEMMA ∀ (s0, s1): (((s0, s1) ∈ sf) ∧ ((s0, s1) ∈ f)) ⊃ (s0 ∈ ms(f))
  prop4: LEMMA ∀ (j: nat):
    ((s0 ∈ ms(j+1)(f)) ∧ ¬ (s0 ∈ ms(j)(f))) ⊃
      (∃ (s1): ((s0, s1) ∈ f) ∧ (s1 ∈ ms(j)(f)))

  fix3: THEOREM ConstructInvariant(Sp(f), (p \ mt(f))) = Sp(f)
  scond1: THEOREM (Sp(f) ⊆ S)
  scond2: THEOREM (proj(pp(f), Sp(f)) ⊆ proj(p, Sp(f)))
  scond2_1: COROLLARY (pp(f) ⊆ p)
  scond3_1: LEMMA closed?(Sp(f), pp(f))
  scond3_1_1: THEOREM ∀ (j: nat): ∀ (c: prefix((pp(f) ∪ f), j)):
    (c(0) ∈ Sp(f)) ⊃ (∀ (k: nat | k < j): ¬ (c(k) ∈ ms(f)))
  scond3_1_2: THEOREM ∀ (j: nat): ∀ (c: prefix((pp(f) ∪ f), j)):
    (c(0) ∈ Sp(f)) ⊃ (∀ (k: nat | k < j): ¬ ((c(k), c(k+1)) ∈ sf))
  scond3_2: THEOREM ∀ (c: Computation(pp(f))):
    (c(0) ∈ Sp(f)) ⊃ (∀ (j: nat): (c(j) ∈ Sp(f)))
  scond3_3: THEOREM empty?(deadlockstates((p \ mt(f)))(Sp(f)))
  scond3_3_1: THEOREM empty?(deadlockstates(pp(f))(Sp(f)))
  scond3_4: THEOREM ∀ (c: Computation((pp(f) ∪ f))):
    (c(0) ∈ Sp(f)) ⊃ (∀ (j: nat): ¬ (c(j) ∈ ms(f)))
  scond3_5: THEOREM ∀ (c: Computation((pp(f) ∪ f))):
    (c(0) ∈ Sp(f)) ⊃ (∀ (j: nat): ¬ ((c(j), c(j+1)) ∈ sf))
  scond3_6: THEOREM ∃ (T: StatePred): faultspan?(T, Sp(f), (pp(f) ∪ f))
  ccond1: LEMMA ∀ (s: state | s ∈ ms(i)(f)):
    (∃ (c: Computation(f) | c(0) = s): ∃ (k: nat): ((c(k), c(k+1)) ∈ sf))
  ccond2: LEMMA ∀ (s: state | s ∈ ms(f)):
    (∃ (c: Computation(f) | c(0) = s): ∃ (k: nat): ((c(k), c(k+1)) ∈ sf))
  ccond3: LEMMA ∀ (t: Transition | t ∈ mt(f)):
    ∃ (c: Computation(f) | c(0) = t`1): ∃ (k: nat): ((c(k), c(k+1)) ∈ sf)
  ccond4: THEOREM ∀ (Spp: StatePred, ppp: Action):
    is_failsafe?(Spp, ppp) ⊃ (Spp ⊆ (S \ ms(f)))
  ccond5: THEOREM ∀ (Spp: StatePred, ppp: Action):
    is_failsafe?(Spp, ppp) ⊃ (proj(ppp, Spp) ⊆ (p \ mt(f)))
  ccond6: LEMMA (∃ (Spp: StatePred, ppp: Action): is_failsafe?(Spp, ppp)) ⊃
    nonempty?((S \ ms(f)))
  completeness: THEOREM
    (∃ (Spp: StatePred, ppp: Action): is_failsafe?(Spp, ppp)) ⊃
      ¬ add_failsafe_fails?
END add_failsafe

Appendix D

Formal Specification of the Synthesis of Nonmasking Fault-Tolerance

add_nonmasking[state: TYPE]: THEORY
BEGIN
  ASSUMING
    ST_is_finite: ASSUMPTION is_finite_type[state]
    TR_is_finite: ASSUMPTION is_finite_type[[state, state]]
    IMPORTING FT[state]
  ENDASSUMING

  s0, s1: VAR state

  Sp(anyS: StatePred): StatePred = anyS
  pp(anyS: StatePred, anyp: Action): Action =
    (proj(anyp, anyS) ∪ {s0, s1 | ¬ (s0 ∈ anyS) ∧ (s1 ∈ anyS)})

  scond1_1: LEMMA (Sp(S) ⊆ S)
  scond1_2: LEMMA closed?(Sp(S), pp(S, p))
  scond2: LEMMA (proj(pp(S, p), Sp(S)) ⊆ proj(p, Sp(S)))
  scond3: LEMMA ∀ (c: Computation(pp(S, p))):
    ∃ (j: nat | j > 0): (c(j) ∈ Sp(S))
  scond4: LEMMA ∀ (c: Computation((pp(S, p) ∪ f))):
    ∃ (j: nat | j > 0): (c(j) ∈ Sp(S))
  soundness: THEOREM ∃ (T: StatePred): faultspan?(T, Sp(S), (pp(S, p) ∪ f))
END add_nonmasking

Appendix E

Formal Specification of the Synthesis of Masking Fault-Tolerance

add_masking[state: TYPE]: THEORY
BEGIN
  ASSUMING
    ST_is_finite: ASSUMPTION is_finite_type[state]
    TR_is_finite: ASSUMPTION is_finite_type[[state, state]]
    IMPORTING add_failsafe[state]
  ENDASSUMING

  s, s0, s1: VAR state
  S2, T2: StatePred

  mask_ax: AXIOM (S2 ⊆ T2)

  reachable?(S, T: StatePred, p: Action, s: state): bool =
    ∃ (c: Computation(p)): (s ∈ T) ∧ s = c(0) ∧ (∃ (j: nat): (c(j) ∈ S))

  S_init(anyFault: Action): StatePred =
    ConstructInvariant((S \ ms(anyFault)), (p \ mt(anyFault)))
  T_init(anyFault: Action): StatePred = {s | ¬ (s ∈ ms(anyFault))}
  TmS: Action = {s0, s1 | (¬ (s0 ∈ S2)) ∧ ((s0 ∈ T2) ∧ (s1 ∈ T2))}
  p1(anyFault: Action): Action = ((proj(p, S2) ∪ TmS) \ mt(anyFault))
  TmR(anyFault: Action): StatePred =
    {s | (s ∈ T2) ∧ reachable?(S2, T2, p1(anyFault), s)}
  TcL(anyFault: Action)(X: StatePred): StatePred =
    {s0 | ∃ (s1): (s0 ∈ X) ∧ ((s0, s1) ∈ anyFault) ∧ ¬ (s1 ∈ X)}
  ConstructFaultSpan(X: StatePred, anyFault: Action): StatePred =
    mu_fix(X)(TcL(anyFault))
  T1(anyFault: Action): StatePred =
    ConstructFaultSpan(TmR(anyFault), anyFault)
  S1(anyFault: Action): StatePred =
    ConstructInvariant((S2 ∩ T1(anyFault)), p1(anyFault))

  T1_T2: LEMMA (T1(f) ⊆ T2)
  S1_S2: LEMMA (S1(f) ⊆ S2)
  S1_T1: THEOREM ∀ (f: Action): (S1(f) ⊆ T1(f))
  prop1: LEMMA disjoint?(T2, ms(f)) ⊃ disjoint?(T1(f), ms(f))
  prop2: LEMMA ∀ (f: Action): empty?(TcL(f)(T1(f)))
  sf_p1: LEMMA disjoint?(p1(f), sf)
  prop: LEMMA ∀ (f: Action): ConstructFaultSpan(T1(f), f) = T1(f)
  scond1: LEMMA ∀ (f: Action): closed?(S2, p1(f))
  scond2: LEMMA ∀ (f: Action): closed?(T2, p1(f))
  scond3_1: THEOREM (S2 ⊆ S) ⊃ (S1(f) ⊆ S)
  scond3_2: THEOREM (proj(p1(f), S2) ⊆ proj(p, S2)) ⊃
    (proj(p1(f), S1(f)) ⊆ proj(p, S1(f)))
  scond4: LEMMA ∀ (f: Action): closed?(T1(f), f)
  scond4_1: THEOREM ∀ (f: Action): (S1(f) = S2) ⊃ closed?(S1(f), p1(f))
  scond4_2: THEOREM ∀ (f: Action): (T1(f) = T2) ⊃ closed?(T1(f), (p1(f) ∪ f))
  scond7: LEMMA (T1(f) = T2) ⊃ closed?(T2, f)
  scond8: THEOREM ∀ (f: Action): (((S1(f) = S2) ∧ (T1(f) = T2)) ⊃
    (∀ (s: state | s ∈ T1(f)): reachable?(S1(f), T1(f), p1(f), s)))
  scond8_2: THEOREM ∀ (f: Action): (((S1(f) = S2) ∧ (T1(f) = T2)) ⊃
    (∀ (s: state | s ∈ T1(f)): reachable?(S1(f), T1(f), pp(f), s)))
  scond3_3: THEOREM ∀ (c: Computation((p1(f) ∪ f))):
    (c(0) ∈ T1(f)) ⊃ (∀ (j: nat): ¬ (c(j) ∈ ms(f)))
  scond3_4: THEOREM (T1(f) = T2) ⊃ (∀ (c: Computation((p1(f) ∪ f))):
    (c(0) ∈ T1(f)) ⊃ (∀ (j: nat): ¬ ((c(j), c(j+1)) ∈ sf)))
  scond3_5: THEOREM (T1(f) = T2) ⊃
    (∃ (T: StatePred): faultspan?(T, S1(f), p1(f)))
  scond3_6: THEOREM empty?(deadlockstates(p1(f))(S1(f)))
  scond3_8: LEMMA (S1(f) = S2 ∧ T1(f) = T2) ⊃
    (∀ (c: Computation((pp(f) ∪ f)) | c(0) ∈ T1(f)):
       ∃ (j: nat | j > 0): (c(j) ∈ S1(f)))
  scond3_9: THEOREM (S1(f) = S2 ∧ T1(f) = T2) ⊃
    (∀ (j: nat): ∀ (c: prefix((p1(f) ∪ f), j)):
       (c(0) ∈ T1(f)) ⊃ (∀ (k: nat | k < j): ¬ (c(k) ∈ ms(f))))
  scond3_10: THEOREM (S1(f) = S2 ∧ T1(f) = T2) ⊃
    (∀ (j: nat): ∀ (c: prefix((pp(f) ∪ f), j)):
       (c(0) ∈ T1(f)) ⊃ (∀ (k: nat | k < j): ¬ ((c(k), c(k+1)) ∈ sf)))
END add_masking

Appendix F

Formal Specification of the Synthesis of Multitolerance

add_multift[state: TYPE]: THEORY
BEGIN
  ASSUMING
    ST_is_finite: ASSUMPTION is_finite_type[state]
    TR_is_finite: ASSUMPTION is_finite_type[[state, state]]
    IMPORTING add_failsafe[state]
    IMPORTING add_nonmasking[state]
    IMPORTING add_masking[state]
  ENDASSUMING

  i, j: VAR nat
  s0, s1: VAR state
  Z: VAR Action

  f_failsafe: Action
  f_nonmasking: Action
  f_masking: Action
  f_nonmasking_masking: Action = (f_masking ∪ f_nonmasking)
  f_failsafe_masking: Action = (f_masking ∪ f_failsafe)

  ax4: AXIOM ∀ (c: Computation((Z ∪ f_nonmasking))):
    (∃ (j: nat): (∀ (n: nat | n ≥ j): ((c(n), c(n+1)) ∈ Z)))

  Sp1: StatePred = add_masking.S1(f_masking)
  Tp1: StatePred = add_nonmasking.Sp(T1(f_masking))
  pp1: Action = add_nonmasking.pp(Tp1, p1(f_masking))
  ms: StatePred = FT.ms(f_failsafe_masking)
  mt: Action = FT.mt(f_failsafe_masking)
  T_masking: StatePred = T1(f_masking)
  p1: Action = add_masking.p1(f_masking)
  Sp2: StatePred = add_masking.S1(f_masking)
  Tp2: StatePred = ConstructInvariant((T_masking \ ms), (p1 \ mt))
  pp2: Action = (p1 \ mt) \
    {s0, s1 | ((s0, s1) ∈ (p1 \ mt)) ∧ ((s0 ∈ Tp2) ∧ ¬ (s1 ∈ Tp2))}

  finite_fault_nonmasking: THEOREM
    ∀ (c: Computation((Z ∪ f_nonmasking_masking))):
      ∃ (j: nat): (∀ (n: nat): ((suffix(c, j)(n), suffix(c, j)(n+1)) ∈ Z))
  scond3: LEMMA ∀ (c: Computation(pp1)): ∃ (j: nat | j > 0): (c(j) ∈ Tp1)
  scond4: LEMMA ∀ (c: Computation((pp1 ∪ f_nonmasking_masking))):
    ∃ (j: nat | j > 0): (c(j) ∈ Tp1)
  scond1_1: THEOREM (S2 ⊆ S) ⊃ (Sp1 ⊆ S)
scond1-3-2: THEOREM (proj(ppl, S2)§pr0j(p, 52)) D (proj(ppl, Sp1)§pr0i(p, Sp1)) 122 scond1-3_5: THEOREM (Spl Q Tpl) scond1-3_3: LEMMA (adenasking.T1(f_masking) = T2) 3 closed?(Tpl, (pplUfJnasking)) scond1_3_4: LEMMA (adenasking.S1(f_masking) = 52) D closed?(Spl, ppl) scond1_8: THEOREM ((81 (f_masking) = 52) A (T1(f_masking) == T2)) 3 (V ((3: state | (seTp1))): reachable?(Sp1, Tpl, ppl, 3)) scond1_3_6: THEOREM 3 (T: StatePred): faultspan?(T, Sp1, (pplUfmonmaskingnnaskingfl scond2_1: THEOREM (32 Q S) :3 (Sp2 Q S) scond2_3_2 : THEOREM (proj(pp2, $2) Sprojm 82)) D (proj(pp2, SP2) 9 pr0j(p, Sp2)) scond2_3_3: THEOREM (Sp2 = S2) 3 closed?(Sp2, pp2) scond2_3_5: THEOREM (Sp2 Q Tp2) scond2_3-1 -1 : THEOREM disjoint?(T2, ms) 3 (V (j: nat): 123 V (c: prefix((pp2Uf_failsafe.masking), j)): (C(0)€Sp2) I) (V ((k: nat I k < j)): 3 (c(k)€ms))) scond2_3_1-2: THEOREM disjoint?(T2, ms) 3 (V (j: nat): V (c: prefix((pp2Uf_failsafe_masking), 3)): (C(0) E Sp2) D (V ((k: nat I k < j)): 3 ((c(k), c(k+1))€sf))) scond2_3_3_1 : THEOREM empty? (deadlockstates((p1 \mt)) (Tp2)) scond2-3-3_2: THEOREM empty? (deadlockstates (pp2) (Tp2)) scond2_8: THEOREM ((Sp2 2 S2) A (Tp2 2 T2)) 2) (V ((3: state I (sETp2))): reachable?(Sp2, Tp2, pp2, 8)) scond2_3_6: THEOREM 3 (T: StatePred): faultspan?(T, Sp2, (pp2Uf_failsafeJnasking)) END add_multift 124 [1] I2] [3] [4] I5] [6] [7] l8] I9] I10] I11] [12] Bibliography K. Bhargavan, D. Obradovic, and C. Gunter. Formal verification of standards for distance vector routing protocols. Journal of the ACM, 49(4):538 — 576, 2002. S. S. Kulkarni. Component-based design of fault-tolerance. PhD thesis, Ohio State University, 1999. Sam Owre, John Rushby, Natarajan Shankar, and Friedrich von Henke. Formal verification for fault-tolerant architectures: Prolegomena to the design of PVS. IEEE Transactions on Software Engineering, 21(2):107—125, February 1995. Heiko Mantel and Felix C.Gartner. A case study in the mechanical verification of fault-tolerance. 
Technical Report TUD-BS—1999-08, Department of Computer Science, Darmstadt University Of Technology, 1999. S. Qadeer and N. Shankar. Verifying a self-stabilizing mutual exclusion algo- rithm. In David Gries and Willem-Paul de Roever, editors, IFIP International Conference on Programming Concepts and Methods (PROCOMET ’98), pages 424—443, Shelter Island, NY, June 1998. Chapman & Hall. E. W. Dijkstra. Self-stabilizing systems in spite of distributed control. Commu- nications of the ACM, 17(11), 1974. S. S. Kulkarni, J. Rushby, and N. Shankar. A case-study in component-based mechanical verification of fault-tolerant programs. Proceedings of the 19th IEEE International Conference on Distributed Computing Systems Workshop on Self- Stabilization ( WSS ’99) Austin, Texas, USA, pages 33-40, June 1999. S. S. Kulkarni and A. Arora. Automating the addition of fault-tolerance. Formal Techniques in Real- Time and Fault— Tolerant Systems, 2000. S. S. Kulkarni and A. Arora. Automating the addition of fault-tolerance. Techni- cal Report MSU-CSE—OO—13, Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, 2001. S. S. Kulkarni and A. Ebnenasir. Automated synthesis of multitolerance. IEEE Conference on Dependable and Network Systems (DSN’04), 2004. S. S. Kulkarni and A. Ebnenasir. Enhancing the fault-tolerance of nonmasking programs. International Conference on Distributed Computing Systems, 2003. Ali Ebnenasir and Sandeep S. Kulkarni. A framework for automatic synthe- sis of fault-tolerance. http://www.cse.msu.edu/”sandeep/software/Code/ synthesis-frameworkl. [13] B. Alpern and F. B. Schneider. Defining Iiveness. Information Processing Letters, 21:181—185, 1985. 125 [14] A. Arora and M. G. Gouda. Closure and convergence: A foundation of fault- tolerant computing. IEEE Transactions on Software Engineering, 19(11):1015- 1027, 1993. [15] N. Shankar, S. Owre, J. M. Rushby, and D. W. J. Stringer-Calvert. PVS Prover Guide: Version 2.4. 
Computer Science Laboratory, SRI International, Menlo Park, CA, November 2001. URL: http://pvs.csl.sri.com/manuals.html.

[16] S. Owre, N. Shankar, J. M. Rushby, and D. W. J. Stringer-Calvert. PVS System Guide, Version 2.4. Computer Science Laboratory, SRI International, Menlo Park, CA, December 2001. URL: http://pvs.csl.sri.com/manuals.html.

[17] S. Owre, N. Shankar, J. M. Rushby, and D. W. J. Stringer-Calvert. PVS Language Reference, Version 2.4. Computer Science Laboratory, SRI International, Menlo Park, CA, December 2001. URL: http://pvs.csl.sri.com/manuals.html.

[18] H. P. Barendregt. Handbook of Theoretical Computer Science: Volume B, Chapter 7, Functional Programming and Lambda Calculus. Elsevier Science Publishers B. V., 1990.

[19] S. S. Kulkarni and A. Ebnenasir. The complexity of adding failsafe fault-tolerance. International Conference on Distributed Computing Systems, 2002.

[20] A. Arora and S. S. Kulkarni. Component based design of multitolerant systems. IEEE Transactions on Software Engineering, 24(1):63–78, January 1998.

[21] Tomoyuki Yokogawa, Tatsuhiro Tsuchiya, and Tohru Kikuno. Automatic verification of fault tolerance using model checking. International Symposium on Dependable Computing, 2001.

[22] Wilfried Steiner and John Rushby. Model checking a fault-tolerant startup algorithm: From design exploration to exhaustive fault simulation. IEEE Conference on Dependable Systems and Networks (DSN'04), 2004.

[23] John Rushby. SAL tutorial: Analyzing the fault-tolerant algorithm OM(1). Technical report, CSL, SRI International, 2004.

[24] F. Schneider, S. M. Easterbrook, J. R. Callahan, and G. J. Holzmann. Validating requirements for fault tolerant systems using model checking. Third IEEE Conference on Requirements Engineering, 1998.

[25] P. Ramanathan and K. G. Shin. Use of common time base for checkpointing and rollback recovery in distributed systems. IEEE Transactions on Software Engineering, 19:571–583, 1993.
[26] Jean-Charles Fabre, Thomas Losert, Eric Marsden, Nick Moffat, Michael Paulitsch, David Powell, and William Simmonds. Validation of fault tolerance and timing properties. Project Deliverable DSC2 for DSoS, Research Report 20/2003, Technische Universität Wien, Institut für Technische Informatik, Treitlstr. 1-3/182-1, 1040 Vienna, Austria, 2003.

[27] Ali Ebnenasir. Automatic synthesis of distributed programs: A survey. 2002. http://www.cse.msu.edu/~ebnenasi/survey.pdf.

[28] N. S. Bjørner. A survey of reactive synthesis. 1996. http://theory.stanford.edu/people/nikolaj/dimacs96.ps.

[29] E. A. Emerson. Handbook of Theoretical Computer Science: Chapter 16, Temporal and Modal Logics. Elsevier Science Publishers B. V., 1990.

[30] E. A. Emerson and E. M. Clarke. Using branching time temporal logic to synthesize synchronization skeletons. Science of Computer Programming, 2(3):241–266, 1982.

[31] Z. Manna and R. Waldinger. A deductive approach to program synthesis. ACM Transactions on Programming Languages and Systems, 2(1):90–121, 1980.

[32] Z. Manna and P. Wolper. Synthesis of communicating processes from temporal logic specifications. ACM Transactions on Programming Languages and Systems, 6(1):68–93, 1984.

[33] P. Attie and E. Emerson. Synthesis of concurrent systems with many similar processes. ACM Transactions on Programming Languages and Systems, 20(1):51–115, 1998.

[34] Paul C. Attie and E. Allen Emerson. Synthesis of concurrent programs for an atomic read/write model of computation. ACM Transactions on Programming Languages and Systems (TOPLAS), 23(2):187–242, 2001.

[35] A. Arora, P. C. Attie, and E. A. Emerson. Synthesis of fault-tolerant concurrent programs. Proceedings of the 17th ACM Symposium on Principles of Distributed Computing (PODC), 1998.

[36] W. Thomas. Handbook of Theoretical Computer Science: Chapter 4, Automata on Infinite Objects. Elsevier Science Publishers B. V., 1990.

[37] A. Pnueli and R. Rosner.
On the synthesis of a reactive module. In Proceedings of the 16th ACM Symposium on Principles of Programming Languages, pages 179–190, 1989.

[38] A. Pnueli and R. Rosner. On the synthesis of an asynchronous reactive module. In Proceedings of the 16th International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science 372, Springer-Verlag, pages 652–671, 1989.

[39] O. Kupferman and M. Y. Vardi. Synthesizing distributed systems. In Proc. 16th IEEE Symp. on Logic in Computer Science, July 2001.

[40] M. Y. Vardi. An automata-theoretic approach to fair realizability and synthesis. Computer Aided Verification, volume 939 of LNCS, pages 267–278, 1995.

[41] A. Arora. A foundation of fault-tolerant computing. PhD thesis, The University of Texas at Austin, 1992.

[42] S. S. Kulkarni, A. Arora, and A. Chippada. Polynomial time synthesis of Byzantine agreement. Symposium on Reliable Distributed Systems, 2001.

[43] J. Kljaich Jr., B. T. Smith, and A. S. Wojcik. Formal verification of fault tolerance using theorem-proving techniques. IEEE Transactions on Computers, 38(3):366–376, 1989.

[44] P. S. Miner and S. D. Johnson. Verification of an optimized fault-tolerant clock synchronization circuit. Third Workshop on Designing Correct Circuits (DCC96), 1996.

[45] NASA Langley Research Center Formal Methods Group. SPIDER homepage. http://shemesh.larc.nasa.gov/fm/spider/.

[46] Alfons Geser and Paul S. Miner. A formal correctness proof of the SPIDER diagnosis protocol. In Theorem Proving in Higher-Order Logics (TPHOLs) – Track B Proceedings, number NASA/CP-2002-211736, pages 71–86, NASA Langley Research Center, Hampton, VA, August 2002.

[47] T. F. Melham. Abstraction mechanisms for hardware verification. In G. Birtwistle and P. A. Subrahmanyam, editors, VLSI Specification, Verification, and Synthesis, pages 129–157, Boston, 1988. Kluwer Academic Publishers.

[48] Ben L. Di Vito and Ricky W. Butler. Formal techniques for synchronized fault-tolerant systems.
3rd IFIP Working Conference on Dependable Computing for Critical Applications, 1992.

[49] John Rushby. Systematic formal verification for fault-tolerant time-triggered algorithms. IEEE Transactions on Software Engineering, 25(5):651–661, 1999.

[50] Natarajan Shankar. Mechanical verification of a generalized protocol for Byzantine fault tolerant clock synchronization. In J. Vytopil, editor, Formal Techniques in Real-Time and Fault-Tolerant Systems, volume 571 of Lecture Notes in Computer Science, pages 217–236. Springer-Verlag, 1992.

[51] Patrick Lincoln and John Rushby. The formal verification of an algorithm for interactive consistency under a hybrid fault model. In Costas Courcoubetis, editor, Computer-Aided Verification, CAV '93, volume 697 of Lecture Notes in Computer Science, pages 292–304, Elounda, Greece, June/July 1993. Springer-Verlag.

[52] John Rushby. Formal verification of an oral messages algorithm for interactive consistency. Technical Report CSL-91-01, SRI International, 1992.

[53] E. W. Dijkstra. A belated proof of self-stabilization. Distributed Computing, (1):5–6, 1986.

[54] Lee Pike, Jeffery Maddalon, Paul Miner, and Alfons Geser. Abstraction for fault-tolerant distributed system verification. Theorem Proving in Higher-Order Logics (TPHOLs), 2004.

[55] S. S. Kulkarni, Borzoo Bonakdarpour, and Ali Ebnenasir. Mechanical verification of automatic synthesis of fault-tolerant programs. International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR), 2004.

[56] S. S. Kulkarni, Borzoo Bonakdarpour, and Ali Ebnenasir. Mechanical verification of automatic synthesis of failsafe fault-tolerance. Theorem Proving in Higher-Order Logics (TPHOLs) Emerging Trends, 2004.