This is to certify that the thesis entitled

MECHANICAL VERIFICATION OF AUTOMATIC SYNTHESIS OF FAULT-TOLERANT PROGRAMS

presented by Borzoo Bonakdarpour has been accepted towards fulfillment of the requirements for the M.S. degree in Computer Science.

MECHANICAL VERIFICATION OF AUTOMATIC SYNTHESIS OF FAULT-TOLERANT PROGRAMS

By

Borzoo Bonakdarpour

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Computer Science

2004

ABSTRACT

MECHANICAL VERIFICATION OF AUTOMATIC SYNTHESIS OF FAULT-TOLERANT PROGRAMS

By Borzoo Bonakdarpour

Fault-tolerance is a crucial property in many computer systems. Thus, mechanical verification of algorithms associated with the synthesis of fault-tolerant programs is desirable to ensure their correctness. In this thesis, we present the mechanized verification of algorithms that automate the addition of fault-tolerance to a given fault-intolerant program using the PVS theorem prover.
By this verification, not only do we prove the correctness of the synthesis algorithms, but we also guarantee that any program synthesized by these algorithms is correct by construction. Towards this end, we formally define a framework for formal specification and verification of fault-tolerance that consists of definitions of programs, specifications, faults, and levels of fault-tolerance in an abstract way, so that they are independent of platform and architecture. We also address the problem of formal verification of automatic synthesis of multitolerance, where programs are subject to multiple classes of faults. The essence of the synthesis algorithms involves fixpoint calculations. Hence, we also develop a reusable theory for fixpoint calculations on finite sets in PVS.

Dedicated to my parents in Iran and my brother for their faith, support, and encouragement. This thesis would not exist without their love.

ACKNOWLEDGEMENTS

First of all, I thank my advisor, Dr. Sandeep Kulkarni, for offering me technical, financial, and moral support during the last two years. He introduced me to the area of mathematical approaches to automating program synthesis and transformation. He also introduced me to computer-assisted verification and, specifically, theorem proving techniques. Much of the work reported in this thesis is inspired by my discussions with him about his ideas on developing a general framework for verification of fault-tolerance properties of systems. He helped me understand what research is and how to face and solve a new problem.

My guidance committee comprised Dr. Sandeep Kulkarni, Dr. Betty Cheng, and Dr.
Charles Ofria, and I am grateful that I had access to valuable guidance from all three of them. I express my thanks to Dr. Cheng for her critical reading and interesting questions during my defense. I express my thanks to Dr. Ofria for his valuable questions and comments on the practical applications of this thesis. Also, I would like to truly thank our department's graduate director, Dr. Abdol-Hossein Esfahanian, for offering financial support through teaching assistantships for several semesters.

It has been a great pleasure to work closely with Ali Ebnenasir. He was the most accessible source for understanding the concepts of program synthesis and automatic addition of fault-tolerance. He also co-authored two papers with me. It is impossible to mention his innumerable contributions towards my work. My special thanks go to my colleagues Laura Campbell, Mahesh Arumugam, and Sascha Konrad for their comments and proofreading of my papers. I would like to use this opportunity to thank all my friends on the Michigan State University campus; because of them, my stay here has been very enjoyable and memorable.

Table of Contents

LIST OF FIGURES vii

1 Introduction 1
1.1 Automated Reasoning About Fault-Tolerance 2
1.2 Thesis Contributions 3
1.3 Organization of the Thesis 4

2 Modeling a Framework for Fault-Tolerance and Problem Statement 6
2.1 An Introduction to PVS 7
2.1.1 The PVS Specification Language 7
2.1.2 The PVS Typechecker 8
2.1.3 The PVS Theorem Prover 9
2.2 Program
10
2.3 Specification 12
2.4 Faults and Fault-Tolerance 13
2.5 Problem Statement 16

3 A Theory for Fixpoint Calculations on Finite Sets 19
3.1 Calculating the Largest Fixpoint 21
3.1.1 Specification of the Largest Fixpoint Calculation 21
3.1.2 Verification of the Properties of the Largest Fixpoint 23
3.2 Calculating the Smallest Fixpoint 27
3.2.1 Specification of the Smallest Fixpoint Calculation 28
3.2.2 Verification of the Properties of the Smallest Fixpoint 30

4 Specification and Verification of Algorithms for Synthesis of Fault-Tolerance 32
4.1 Automatic Addition of Failsafe Fault-Tolerance 34
4.1.1 Specification of the Synthesis of Failsafe Fault-Tolerance 34
4.1.2 Verification of the Synthesis of Failsafe Fault-Tolerance 38
4.2 Automatic Addition of Nonmasking Fault-Tolerance 47
4.2.1 Specification of the Synthesis of Nonmasking Fault-Tolerance 47
4.2.2 Verification of the Synthesis of Nonmasking Fault-Tolerance 49
4.3 Automatic Addition of Masking Fault-Tolerance 51
4.3.1 Specification of the Synthesis of Masking Fault-Tolerance 51
4.3.2 Verification of the Synthesis of Masking Fault-Tolerance 56
4.4 Adding Fault-Tolerance in the Low Atomicity Model 60

5 Specification and Verification of Automatic Synthesis of Multitolerance 62
5.1 Faults, Fault-Tolerance, and Multitolerance 63
5.2 The Problem of Mechanical Verification of Automatic Synthesis of Multitolerance 65
5.3 Specification and Verification of Synthesis of Nonmasking-Masking Multitolerance
66
5.4 Specification and Verification of Synthesis of Failsafe-Masking Multitolerance 70

6 Related Work 74
6.1 Challenges in Automatic Synthesis of Programs 75
6.2 Formal Verification of Fault-Tolerant Architectures 78
6.3 Mechanical Verification of Fault-Tolerance Properties of Programs 81
6.3.1 Formal Verification of Fault-Tolerant Embedded Digital Systems 81
6.3.2 Formal Verification of Fault-Tolerant Algorithms 82
6.3.3 Formal Verification of Abstractions for Fault-Tolerance 83

7 Discussion 86
8 Conclusion and Future Work 91

APPENDICES 96
A Formal Specification of the Fault-Tolerance Framework 97
B Formal Specification of the Fixpoint Theory 101
C Formal Specification of the Synthesis of Failsafe Fault-Tolerance 106
D Formal Specification of the Synthesis of Nonmasking Fault-Tolerance 112
E Formal Specification of the Synthesis of Masking Fault-Tolerance 114
F Formal Specification of the Synthesis of Multitolerance 120

LIST OF REFERENCES 125

List of Figures

3.1 The relationship between g, x, and x − g(x) 22
3.2 Largest fixpoint calculation 22
3.3 The relationship between r, x, and x ∪ r(x) 28
3.4 Smallest fixpoint calculation 29
4.1 The synthesis algorithm for adding failsafe fault-tolerance 35
4.2 The synthesis algorithm for adding nonmasking fault-tolerance 48
4.3 The synthesis algorithm for adding masking fault-tolerance 52
5.1 The synthesis algorithm for adding nonmasking-masking multitolerance 66
5.2 The synthesis algorithm for adding failsafe-masking multitolerance 70
Chapter 1

Introduction

Automated logical reasoning about computer systems, widely known as formal methods, has been successful in a number of domains. Proving properties of computer instruction sets is perhaps the most established application of formal methods in recent years [1]. Another application of formal methods is in the verification of critical systems, where strong assurance is required to gain confidence in the correctness of computer systems such as avionic and air traffic control systems.

For decades, mathematicians have worked on techniques for verifying that algorithms have the properties they are expected to have. While traditional mathematical techniques are valuable in this context, they are prone to human error. Hence, many researchers have focused on computer-aided verification techniques to build frameworks for automated logical reasoning about computer systems. Such computer-aided verification techniques allow us to obtain more confidence than manual proofs.

Automated proof systems fall into two broad categories [1]. The first can be seen as an extension of testing, where automated support is exploited to create tests that include all possible cases, thus providing a proof of correctness. Model checkers fall in this category. The second can be seen as a formalization of mathematics, where logic is used to characterize mathematical reasoning and automated formal support is used to aid the creation and checking of proofs. Theorem provers fall in this category. In this thesis, we focus on using theorem proving techniques to reason about properties of automatic synthesis of fault-tolerant programs.
1.1 Automated Reasoning About Fault-Tolerance

Fault-tolerance is the ability of programs to provide a desirable level of functionality in the presence of faults [2]. It is a necessity in most computer systems and, hence, one needs strong assurance of the fault-tolerance properties of a given system. Mechanical verification of such systems is one way to obtain a strong form of assurance. The related work in the literature has focused on verification of concrete fault-tolerant programs. For example, Owre, Rushby, Shankar, and Henke [3] present a mechanical verification of a fault-tolerant digital flight-control system. Mantel and Gärtner verify the correctness of a fault-tolerant broadcast protocol [4]. Qadeer and Shankar [5] mechanically verify the self-stabilization property of Dijkstra's mutual exclusion token ring algorithm [6]. Kulkarni, Rushby, and Shankar [7] verify the same algorithm by exploiting the theory of detectors and correctors [2].

While the verifications performed in [3-5, 7] enable us to gain confidence in the programs being verified, it is difficult to extend these verifications to other programs. A more general approach, therefore, is to verify the algorithms that generate fault-tolerant programs.

With this motivation, in this thesis, we focus on the problem of verifying algorithms that synthesize fault-tolerant programs. With such verification, we are guaranteed that all the programs generated by the synthesis algorithms indeed satisfy their fault-tolerance requirements. Towards this end, we verify the transformation algorithms presented by Kulkarni and Arora in [8, 9], and Kulkarni and Ebnenasir
in [10], using the PVS theorem prover. The algorithms in [8, 9] focus on the problem of transforming a given fault-intolerant program into a fault-tolerant program. The algorithms in [10] focus on the problem of transforming a given fault-intolerant program into a multitolerant program, where the program is subject to multiple classes of faults. To verify these algorithms, we first model a framework for fault-tolerance in PVS. This framework consists of definitions for programs, specifications, faults, and levels of fault-tolerance. Then, we verify that the programs synthesized by the algorithms are indeed fault-tolerant. By this verification, we ensure that any program synthesized by these algorithms is correct by construction and, hence, there is no need to verify the individual synthesized programs.

We note that the algorithms in [8, 9] are the basis for their extensions that deal with simultaneous occurrence of faults of multiple types [10] and with distributed programs [11, 12]. Thus, the verification of the transformation algorithms in [8, 9] is helpful in the verification of the algorithms in [10-12]. Since fixpoint calculation is at the heart of the synthesis algorithms, we also develop a theory for fixpoint calculations on finite sets in PVS. This theory is reusable for other formalizations that involve fixpoint calculations over finite sets as well.

1.2 Thesis Contributions

In this thesis, we concentrate on mechanical verification of automatic synthesis of fault-tolerant and multitolerant programs. The contributions of this thesis are as follows:

• We provide a foundation and a formal framework for modeling fault-tolerance, where programs are subject to a single class of faults, and multitolerance, where programs are subject to multiple classes of faults. This framework is designed abstractly in such a way that it is independent of specific programs and platforms.

• We develop a theory in PVS for fixpoint calculations over finite sets.
This theory is expected to be reusable elsewhere.

• We verify the correctness of the synthesis algorithms in [8, 9] for automatic addition of fault-tolerance, so that not only do we ensure their correctness, but we also guarantee that any program synthesized by the algorithms is correct by construction.

• We verify the correctness of the synthesis algorithms in [10] for automatic addition of multitolerance, so that not only do we ensure their correctness, but we also guarantee that any program synthesized by the algorithms is correct by construction.

1.3 Organization of the Thesis

The organization of the thesis is as follows. In Chapter 2, we first present an introduction to PVS. Then, we provide the formal definitions of programs, specifications, faults, and fault-tolerance, and we formally state the problem of mechanical verification of synthesis of fault-tolerant programs. In Chapter 3, we develop and verify a theory for fixpoint calculations over finite sets. In Chapter 4, based on the formal framework developed in Chapter 2 and the fixpoint calculation theory, we formalize and verify the synthesis algorithms proposed in [8] in PVS. In Chapter 5, we first state the transformation problem for synthesizing multitolerant programs. Then, we present how we extend our formal framework so that it can deal with simultaneous occurrence of faults from multiple classes. Based on the extended framework, we then mechanically verify the algorithms for synthesizing multitolerant programs. In Chapter 6, we present the related work on the problem of program synthesis and mechanical verification of fault-tolerance. In Chapter 7, we discuss different aspects of the problem of verification of synthesis of fault-tolerant programs and answer questions that have been raised about this work. Finally, we make concluding remarks and discuss future work in Chapter 8.
Chapter 2

Modeling a Framework for Fault-Tolerance and Problem Statement

In this chapter, we describe how we develop a general framework to model fault-tolerance.¹ More specifically, we give formal definitions of the elements of a fault-tolerance framework, namely programs, specifications, faults, and fault-tolerance. Based on the definitions of these elements, we formalize the synthesis algorithms of [8] in Chapter 4. Note that the definitions must be extensible, so that we can reuse them as a basis for formalizing the definitions regarding multiple classes of faults and multitolerance in Chapter 5. In addition, we also discuss how we model the definitions in PVS at an abstract level, so that they are independent of any particular program and platform.

In our framework, programs are specified in terms of their state space and their transitions. The definition of specifications is adapted from Alpern and Schneider [13]. The definitions of faults and fault-tolerance are adapted from Arora and Gouda [14] and Kulkarni [2]. We give the definition of multitolerance and its related concepts in Chapter 5 separately.

Before getting into the details of the formalization of fault-tolerance, we give an introduction to the PVS theorem prover. More specifically, we introduce the terminology that we use throughout the thesis. The reader may skip Section 2.1 if (s)he is familiar with PVS.

2.1 An Introduction to PVS

In this section, we give a brief introduction to PVS and its syntax and semantics, drawn from the PVS user manuals [15-17]. PVS (Prototype Verification System) provides an integrated environment for the development and analysis of formal specifications, and it supports a range of activities involved in creating, analyzing, modifying, managing, and documenting theories and proofs [16]. PVS runs on SUN 4 SPARC workstations using Solaris 2 or higher and on PC systems running Red Hat Linux.

¹ Appendix A contains the formal specification of this framework.
PVS is implemented in Common Lisp, but it is not necessary to know Lisp to use the system effectively. The Emacs editor provides the interface to PVS.

2.1.1 The PVS Specification Language

The specification language of PVS is built on higher-order logic [17]. Thus, functions can take functions as arguments and return them as values, and quantification can be applied to function variables. There is a set of built-in types and type constructors, as well as the notion of subtypes. We make extensive use of higher-order definitions in formalizing our framework for fault-tolerance.

Specifications are logically organized into parameterized, importable theories and datatypes. In this thesis, every synthesis algorithm is modeled in a separate theory. We also have two more theories formalizing our general framework for fault-tolerance and fixpoint calculations over finite sets.

Now, we briefly review the elements and concepts of the PVS language that we use in developing our specifications.

Types. Semantically, there are four kinds of types in PVS [17]:

• Uninterpreted types support abstraction by providing a means of introducing a type with a minimum of assumptions on the type, imposing almost no constraints on an implementation of the specification.

• Interpreted types are primarily a means for naming type expressions. Integers, Booleans, etc., are interpreted types.

• Dependent types may be defined in terms of an earlier defined type.

• Subtypes introduce a set of elements that is a subset of the set of elements of the supertype.

Our PVS specifications are founded on propositional logic, set theory, and the theory of infinite sequences. These are built-in theories, accompanied by a rich set of lemmas and theorems in the PVS prelude, that assist us in proving our theorems.

2.1.2 The PVS Typechecker

The PVS typechecker analyzes theories for semantic consistency and adds semantic information to the internal representation built by the parser [16].
The type system of PVS is not algorithmically decidable; i.e., theorem proving may be required to establish the type-consistency of a PVS specification. Hence, in order to decide type correctness, PVS needs assistance from the developer. The proof obligations that need to be proved are called Type-Correctness Conditions (TCCs).

In many cases, when a PVS specification contains similar definitions, the typechecker may generate the same proof obligations. Judgements provide a means for controlling this by allowing the developer to prove a proof obligation once and reuse the formal proof to discharge the same TCCs. This kind of judgement is called a constant judgement. There is another kind of judgement, the subtype judgement, which is useful for discharging TCCs generated for subtypes. In this thesis, since we have no subtypes in our specifications, we only state and prove constant judgements.

2.1.3 The PVS Theorem Prover

PVS provides an interactive proof checker (prover) with the ability to store and replay proofs. PVS can be instructed to perform a single proof, to rerun all the proofs in a given theory, all the proofs of all the lemmas used in a proof, or all the proofs in an entire specification. The prover maintains a proof tree, and it is the goal of the user to construct a proof tree that is complete, in the sense that all of the leaves are recognized as true. Each node of the proof tree is a proof goal that follows from its offspring nodes by means of a proof step.

The PVS proof checker employs a sequent calculus. Each proof goal of the proof tree is a sequent. Each sequent consists of a sequence of formulas called antecedents and a sequence of formulas called consequents. The intuitive interpretation of a sequent is that the conjunction of the antecedents implies the disjunction of the consequents:

(A1 ∧ A2 ∧ A3 ∧ …) ⇒ (B1 ∨ B2 ∨ B3 ∨ …)
Now, we briefly describe some of the terminology that we use throughout this thesis for the verification of our theorems:

• Since our specifications contain many recursive definitions, induction is our basic tool for reasoning about the properties of those definitions.

• Skolemization is the way to remove universal quantifiers from consequents and existential quantifiers from antecedents.

• Instantiation is the way to remove existential quantifiers from consequents and universal quantifiers from antecedents. For instance, if we are to prove

(∀x : f(x) = x + 1) ⇒ (f(p) = p + 1)

for some constant p, all we need to do is instantiate x with p.

• Case analysis of the current sequent splits the conjunctions in the consequent and the disjunctions in the antecedent into separate subgoals. For instance, (A ⇒ B ∧ C) ≡ (A ⇒ B) ∧ (A ⇒ C).

• The rule of extensionality converts a set equivalence to a propositional equivalence. For instance, A ⊆ B ≡ ∀x : (A(x) ⇒ B(x)).

• We use propositional and arithmetic simplifications to simplify sequents containing solvable propositional and arithmetic formulas.

• The GRIND strategy performs skolemization and instantiation, propositional simplification, rewriting using lemmas as rewrite rules, definition expansion, and explicit case analysis according to the case structure in the goal, and it performs many of these steps repeatedly until no further simplification is possible [15].

• The GROUND command invokes propositional simplification followed by arithmetic simplification; it is useful for obtaining simplified forms of the cases arising from propositional simplification [15].

2.2 Program

A program p is a finite set of transitions in its state space. In a program-specific approach, a state of a program p is obtained by assigning each variable of p a value from its domain.
However, as we do not deal with a specific program, in our formal specification we model the notion of state by an UNINTERPRETED TYPE in PVS (cf. Section 2.1.1). Likewise, a transition is modeled as a pair of states, which is also defined as an uninterpreted type. The programs of our interest are finite-state programs. Hence, we assume that the numbers of states and transitions are finite. We model this by two ASSUMPTIONS in PVS expressing that the types state and transition are finite.

The state space of p, denoted Sp, is the set of all possible states of p. In PVS, we model the state space by the fullset of the type state. We define the following JUDGEMENT to avoid getting repetitive type-checking proof obligations from the PVS typechecker (cf. Section 2.1.2):

Judgement 2.1: Sp is a finite set.

Proof: We discharge the judgement by considering the assumption that the type state is finite and applying the built-in finite_full lemma from the PVS prelude. The finite_full lemma proves that the fullset of a finite type is a finite set.

The type Action denotes finite sets of transitions. We model a program p by a subset of Sp × Sp. This means that p can be any constant subset of Sp × Sp and, hence, a program can be any set of transitions in its state space. A state predicate of p is a subset of Sp. In PVS, we model a state predicate, StatePred, by a finite set over states. This abstraction in the definitions of program and state predicate is necessary to ensure that the verifications are correct for any program synthesized by the algorithms verified in chapters 4 and 5.

A state predicate S is closed in the program p if and only if for all transitions (s0, s1) in p, if s0 ∈ S then s1 ∈ S. Hence, we define closure as follows:

closed(S, p) = (∀ s0, s1 | (s0, s1) ∈ p : (s0 ∈ S ⇒ s1 ∈ S)).

A sequence of states, (s0, s1, ...), is a computation of p if and only if any pair
We formalize this by a DEPENDENT TYPE as follows: Computation(Z) : TYPE = {c : sequence[state] I (Vi I i 2 0 : (oi, c,-+1) E Z)} where sequence[state] : N —> state and Z is any finite set from type of Action. A computation prefix is a sequence of states, where the first j steps are transitions in the given program: prefirr(Z,j) : TYPE 2 {c2 sequence[state] I (Vi I i Vm = (Cm+jacm+j+1) E 19- After removing the universal quantifier in the consequent, we manually instantiate n with j + m. Then, a propositional simplification discharges the theorem. Discussion. Throughout this chapter, we modeled all the elements of our formal framework abstractly. For instance, states and transitions are modeled as uninter- preted types, and program, faults, and the safety specification are modeled by ar- bitrary sets of transitions. Also, state predicate and, thus, program invariant are modeled by arbitrary subsets of the state space. This abstract modeling gives us the freedom to claim that any programs synthesized by the algorithms in chapters 4 and 5 are correct by construction and, hence, there is no need to verify the synthesized programs individually. 2.5 Problem Statement In this section, we recall (from [8]) the problem of automatic synthesis of fault- tolerance and we formally state the problem of mechanical verification of synthesis of fault-tolerant programs. As described earlier in this chapter, the fault-intolerant pro- gram is specified in terms of its state space Sp, its transitions, p, and its invariant, S. The specification provides a set of bad transitions (that should not occur in program computation). The faults, f, are specified in terms of a finite set of transitions. Like- wise, the fault-tolerant program is specified in terms of its state space Sp, its set of transitions, p’, its invariant, 3’, its specification, spec, and the type of fault-tolerance it provides. 16 Now. ' p, Note th alone to I" the same I that p sati: invariants . 
Now, we explain what it means for a fault-tolerant program p' to be derived from p. Note that this derivation is based on the premise that p' is obtained by adding fault-tolerance alone to p. Hence, we should be able to prove that in the absence of faults p' has the same behavior as p. Specifically, p' should satisfy spec from S' by simply using the fact that p satisfies spec from S. To ensure this, we consider the relation between (1) the invariants S and S', and (2) the transitions of p and p'.

• If S' contains states that are not in S then, in the absence of faults, p' will include computations that start outside S and, hence, p' contains new behaviors in the absence of faults. Therefore, we require that S' ⊆ S.

• Regarding the transitions of p and p', we focus only on the transitions of p'|S' and p|S'. If p'|S' contains a transition that is not in p|S', then p' can use this transition in a new computation in the absence of faults and, hence, we require that p'|S' ⊆ p|S'.

Based on the above two requirements, we state the transformation problem as follows (this definition will be instantiated in the obvious way depending upon the level of tolerance):

The Transformation Problem
Given p, S, spec, and f such that p satisfies spec from S.
Identify p' and S' such that:
• S' ⊆ S
• (p'|S') ⊆ (p|S')
• p' is f-tolerant to spec from S'

In order to define soundness and completeness in the context of the transformation problem, we define the corresponding decision problem (likewise, this definition will be instantiated in the obvious way depending upon the level of tolerance):

The Decision Problem
Given p, S, spec, and f such that p satisfies spec from S.
Does there exist p' and S' such that:
• S' ⊆ S
• (p'|S') ⊆ (p|S')
• p' is f-tolerant to spec from S'?

Soundness.
An algorithm for the transformation problem is sound if and only if for any given input, its output, namely p' and S', satisfies the transformation problem.

Completeness. An algorithm for the transformation problem is complete if and only if for any given input, if the answer to the decision problem is "yes", then the algorithm always finds a program p' and a state predicate S'.

In Chapter 4, our goal is to mechanically verify that the proposed algorithms in [8] are indeed sound and complete, using the PVS theorem prover. More specifically, based on the definitions in this chapter, we show that the algorithms in [8] satisfy the transformation problem. Also, we verify that the algorithms for adding failsafe and nonmasking fault-tolerance are complete.

In Chapter 5, we redefine the transformation problem for the synthesis of multitolerant programs separately. Based on the new definition, we present a new problem statement for mechanical verification of automatic synthesis of multitolerance.

Chapter 3

A Theory for Fixpoint Calculations on Finite Sets

In this chapter, we present a theory for calculating fixpoints of formulas over finite sets.¹ This theory is essential in the verification of the synthesis algorithms in chapters 4 and 5. For instance, the essence of adding failsafe and masking fault-tolerance to a given fault-intolerant program is recalculation of the invariant of the fault-intolerant program, which in turn involves calculating the fixpoint of a formula. More specifically, we calculate the fixpoint of a given formula to (i) calculate the set of states from where safety may be violated by faults alone; (ii) remove the set of states from where closure of the fault-span is violated by fault transitions; and (iii) remove deadlock states that occur in a given state predicate.

The μ-calculus theory of the PVS prelude contains general definitions of the standard fixpoint calculation; however, it is not convenient to use that theory in the context of our problem.
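Use (i) above — computing the states from where faults alone may violate safety — is a typical smallest-fixpoint calculation: start from the offending states and keep adding fault-predecessors until nothing changes. The following Python sketch only illustrates that iteration; the fault set f and the initial set bad are hypothetical stand-ins, not taken from the thesis.

```python
# Illustrative sketch (Python, not PVS) of use (i): the set of states
# from where the fault transitions alone can reach a bad state. The
# sets 'f' and 'bad' below are invented for this example.

def states_reaching_bad(bad, f):
    """Smallest superset of 'bad' closed under fault-predecessors."""
    ms = set(bad)
    while True:
        grow = {s0 for (s0, s1) in f if s1 in ms} - ms
        if not grow:       # no new states: a fixpoint has been reached
            return ms
        ms |= grow

f = {(0, 1), (1, 2), (4, 3)}       # toy fault transitions
assert states_reaching_bad({2}, f) == {0, 1, 2}
assert states_reaching_bad({3}, f) == {3, 4}
```

Because the state space is finite and the set only grows, the loop is guaranteed to terminate.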
This is due to the fact that the prelude library focuses on infinite sets and is not specialized to account for the properties of the functions used in synthesis of fault-tolerant programs. By contrast, we find that by customizing the

¹Appendix B contains the formal specification of this theory.

theory to the properties of functions used in synthesis of fault-tolerant programs, we can simplify the verification of the synthesis algorithms. Hence, in this chapter, we develop a theory for fixpoint calculations on finite sets. We also state and verify its properties. This library is expected to be reusable for other formalizations that involve fixpoint calculations on finite sets as well. Based on the definitions in this chapter, we model the synthesis algorithms for addition of fault-tolerance in chapters 4 and 5. Then, using the properties of smallest and largest fixpoints, verified in this chapter, we verify the correctness of the algorithms.

A fixpoint of a function f : X → X is any value x0 ∈ X such that f(x0) = x0. In other words, further application of f does not change its value. For instance, x0 = 1 is a fixpoint of f(x) = x² − 2x + 2. A function may have more than one fixpoint. For example, every x0 in the domain of f(x) = x is a fixpoint. The greatest lower bound of the fixpoints is called the smallest fixpoint, and the least upper bound of the fixpoints is called the largest fixpoint. In our context, the functions whose fixpoints are calculated exhibit certain characteristics. Hence, as described above, we focus on customizing the fixpoint theory based on these characteristics. In the context of finite sets, the domain and range of f, namely X, are both finite sets of finite sets.

Throughout this chapter, the variables i, j, k range over natural numbers. The variable x is any finite set of any uninterpreted type, and the variable b is any member of x.
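The definition can be illustrated directly; a minimal sketch (ours, in Python rather than PVS; the example polynomial is chosen for illustration):

```python
def is_fixpoint(f, x0):
    """x0 is a fixpoint of f exactly when applying f leaves it unchanged."""
    return f(x0) == x0

# f(x) = x**2 - 2*x + 2 has the two fixpoints 1 and 2,
# while the identity function has infinitely many.
```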
The organization of this chapter is as follows: in Section 3.1, we present how to formalize and verify the properties of the largest fixpoint calculation. We describe the same issues for the smallest fixpoint calculation in Section 3.2.

3.1 Calculating the Largest Fixpoint

As mentioned before, a function may have several fixpoints. The least upper bound of the fixpoints is called the largest fixpoint. We recall that we study the concept of fixpoint calculations in the context of finite sets. In Section 3.1.1, we describe how we formalize the calculation of the largest fixpoint in the PVS specification language. Then, in Section 3.1.2, we verify the properties of the largest fixpoint.

3.1.1 Specification of the Largest Fixpoint Calculation

One type of function used in synthesis of fault-tolerance is a decreasing function, for which the largest fixpoint is calculated. Given are an initial finite set x and a function g that calculates a subset of x. The fixpoint calculation procedure removes the states in g(x) from x and repeats this removal recursively until g(x) becomes the empty set. The function g may represent a certain property. For example, it may calculate the set of deadlock states of a given state predicate. We formalize the procedure of calculating the largest fixpoint of a formula by first defining the type DecFunc for decreasing functions such as g, where:

g : {A : finite set} → {B : finite set | B ⊆ A}.

In other words, for all finite sets x, g(x) ⊆ x (cf. Figure 3.1). With such a decreasing function, we define Dec(i, x)(g) to formalize the recursive behavior of the largest fixpoint calculation. Dec(i, x)(g) keeps removing the elements of the initial set x that the function g of type DecFunc returns at every step:

Dec(i, x)(g) = Dec(i−1, x)(g) − g(Dec(i−1, x)(g))   if i ≠ 0;
Dec(i, x)(g) = x                                     if i = 0.

Figure 3.1: The relationship between g, x, and x − g(x).

Figure 3.2: Largest fixpoint calculation.
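The remove-until-empty procedure can be mirrored executably. The following Python sketch is our illustration (the set encoding and the example transition relation are assumptions, not part of the PVS theory); the example g computes deadlock states, as suggested in the text:

```python
def dec(i, x, g):
    """Dec(i, x)(g): apply the removal step i times, starting from x.
    g must be decreasing: g(A) is a subset of A for every finite set A."""
    if i == 0:
        return x
    prev = dec(i - 1, x, g)
    return prev - g(prev)

def largest_fixpoint(x, g):
    """Keep removing g(x) until g returns the empty set; on a finite
    set this terminates after at most |x| removal steps."""
    while g(x):
        x = x - g(x)
    return x

# Example g in the spirit of the text: the deadlock states of a state
# predicate, i.e. states with no outgoing transition that stays inside it.
transitions = {(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)}
def g(a):
    return {s for s in a
            if not any(u == s and t in a for (u, t) in transitions)}
```

Starting from {1, 2, 3, 4, 5}, state 5 is removed first, which makes 4 a deadlock state in turn; the iteration stabilizes on the cycle {1, 2, 3}.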
Finally, we define the largest fixpoint as follows (cf. Figure 3.2):

LgFix(x)(g) = {b | ∀k : b ∈ Dec(k, x)(g)}

Our goal is to prove the following properties of the largest fixpoint based on the above definitions:

• g(LgFix(x)(g)) = ∅
• LgFix(x)(g) = LgFix(LgFix(x)(g))(g)

Intuitively, the first property means that further application of the Dec function does not change the value of the largest fixpoint. The second property means that LgFix is indeed the largest fixpoint. We prove the above properties in theorems 3.7 and 3.8 in Section 3.1.2.

Remark. The above definition of fixpoint, as compared to the definition of the least fixpoint in λ-calculus [18] and μ-calculus, is somewhat non-traditional. We find that this definition assists in the verification of the synthesis algorithms. For example, we apply this fixpoint calculation for removing deadlock states, where g(x) denotes the deadlock states in set x. After calculating the largest fixpoint, we need to show that no deadlock states remain in the set x. Thus, we should show g(LgFix(x)) = ∅. Moreover, if g(LgFix(x)) = ∅ then ∀i : Dec(i, LgFix(x)) = LgFix(x).

3.1.2 Verification of the Properties of the Largest Fixpoint

In this section, we first develop and prove a chain of lemmas. We use these lemmas to prove the properties of the largest fixpoint of a formula in theorems 3.7 and 3.8. We use these theorems extensively to verify the soundness and completeness of the synthesis algorithms in chapters 4 and 5.

Lemma 3.1: Any application of the Dec function decreases the initial set until it becomes the largest fixpoint. Formally,
(j < k) ⇒ (Dec(k, x)(g) ⊆ Dec(j, x)(g))
Proof. The INDUCT-AND-SIMPLIFY strategy discharges the lemma. □

Lemma 3.2: Until the fixpoint is achieved, the cardinality of Dec(j+1, x) is less than or equal to |x| − j − 1. Formally,
∀j : ((g(Dec(j, x)(g)) ≠ ∅) ⇒ (|Dec(j+1, x)(g)| ≤ |x| − j − 1))
Proof. We prove this lemma by induction on j.
In the base case, j = 0, after eliminating the quantifiers and expanding the definitions, we need to show that if g(x) is nonempty then |x − g(x)| ≤ |x| − 1. We prove this by using two predefined lemmas in the PVS prelude:

card_diff_subset: ∀y, z : finite set : ((y ⊆ z) ⇒ (|z − y| = |z| − |y|)), and
nonempty_card: ∀y : finite set : (y ≠ ∅ ⇔ |y| > 0).

After instantiations, using the facts g(x) ⊆ x and g(x) ≠ ∅, the GRIND strategy discharges the base case. For the induction step, after eliminating quantifiers and expanding definitions, we need to prove:
(g(Dec(j+1, x)(g)) ≠ ∅ ∧ |Dec(j+1, x)(g)| ≤ |x| − j − 1) ⇒ (|Dec(j+1+1, x)(g)| ≤ |x| − (j+1) − 1).
We discharge the induction step in the same way that we proved the base case. □

Lemma 3.3: If the fixpoint is reached by step j then in any subsequent step the fixpoint is maintained. Formally,
∀j : ((g(Dec(j, x)(g)) = ∅) ⇒ (∀k | k ≥ j : g(Dec(k, x)(g)) = ∅))
Proof. After skolemization to remove the universal quantifier, we proceed by induction on k. The base case, k = j = 0, is trivially true. In the induction step, we need to prove:
(g(Dec(k, x)(g)) = ∅) ⇒ (g(Dec(k+1, x)(g)) = ∅).
By expanding the definition of Dec in the consequent, Dec(k+1, x)(g) = Dec(k, x)(g) − g(Dec(k, x)(g)), and considering the antecedent we infer that g(Dec(k, x)(g)) = ∅; therefore g(Dec(k+1, x)(g)) = g(Dec(k, x)(g)), which is equal to the empty set. □

Lemma 3.4: There exists a step i such that all subsequent applications of g return the empty set. Formally,
∃i : (∀n | n ≥ i : g(Dec(n, x)(g)) = ∅)
Proof. First, we instantiate i with |x|. Then, after skolemization, we need to prove g(Dec(n, x)(g)) = ∅. Using Lemma 3.2 and instantiating j with |x|, we need to show two subgoals:
Subgoal 1: |Dec(|x|+1, x)(g)| > |x| − |x| − 1, which is trivially true.
Subgoal 2: (g(Dec(|x|, x)(g)) = ∅) ⇒ (g(Dec(n, x)(g)) = ∅). From Lemma 3.3, we know ∀j : ((g(Dec(j, x)(g)) = ∅) ⇒ (∀k | k ≥ j : g(Dec(k, x)(g)) = ∅)).
After automatic instantiations, we need to prove:
(∀k | k ≥ |x| : g(Dec(k, x)(g)) = ∅) ⇒ (g(Dec(n, x)(g)) = ∅).
Manual instantiation of k with n discharges the lemma. □

Lemma 3.5: There exists a step j where the fixpoint is achieved. Formally,
∃j : (∀k | k ≥ j : ((Dec(k, x)(g) = Dec(j, x)(g)) ∧ (g(Dec(k, x)(g)) = ∅)))
Proof. The proof of the second conjunct is exactly the same as the proof of Lemma 3.4, so we proceed with the proof of the first conjunct. From Lemma 3.4, we know the existence of a j such that ∀k | k ≥ j : g(Dec(k, x)(g)) = ∅. Using Lemma 3.4 and after skolemization, we proceed by induction on k. In the base case, k = j = 0, we need to show Dec(0, x)(g) = Dec(j, x)(g), which is trivially true. In the induction step, we need to prove:
∀i | i ≥ j : ((Dec(i, x)(g) = Dec(j, x)(g) ∧ g(Dec(i, x)(g)) = ∅) ⇒ (Dec(i+1, x)(g) = Dec(j, x)(g)))
We prove this by applying the rule of extensionality and expanding Dec(i+1, x)(g), which is equal to Dec(i, x)(g) − g(Dec(i, x)(g)). As g(Dec(i, x)(g)) = ∅, Dec(i+1, x)(g) = Dec(i, x)(g) = Dec(j, x)(g) and the proof is complete. □

Lemma 3.6: For some value j, Dec(j, x) reaches a fixpoint, and at this step the value of Dec(j, x) is equal to the largest fixpoint. Formally,
∃j : ((g(Dec(j, x)(g)) = ∅) ∧ (Dec(j, x)(g) = LgFix(x)(g)))
Proof. As in the proof of Lemma 3.5, the proof of the first conjunct is the same as the proof of Lemma 3.4. To prove the second conjunct, first, we apply the rule of extensionality to convert the set equalities to boolean equalities. Then, a propositional split generates two subgoals:
Subgoal 1: ∀b ∈ LgFix(x)(g) : b ∈ Dec(j, x)(g). First, we expand the definition of LgFix = {b | ∀k : b ∈ Dec(k, x)(g)} in the antecedent. Then, instantiating k with j proves the subgoal.
Subgoal 2: ∀b ∈ Dec(j, x)(g) : b ∈ LgFix(x)(g).
To verify this subgoal, after expanding the definition of LgFix and eliminating the universal quantifier by skolemization, we need to show: ∀b ∈ Dec(j, x)(g) : b ∈ Dec(k, x)(g). Using Lemma 3.5, we know:
∀i | i ≥ j : (Dec(i, x)(g) = Dec(j, x)(g) ∧ g(Dec(i, x)(g)) = ∅).
We instantiate i with k and, by propositional simplification through the GROUND command, we prove this subgoal. □

Theorem 3.7: Application of the function g to the largest fixpoint of a finite set returns the empty set. Formally,
g(LgFix(x)(g)) = ∅
Proof. Using Lemma 3.6, the GRIND strategy completes the proof. □

Theorem 3.8: The largest fixpoint of the largest fixpoint of a function is equal to the largest fixpoint. Formally,
LgFix(x)(g) = LgFix(LgFix(x)(g))(g)
Proof. By applying the rule of extensionality, we convert the set equality to a boolean equality. Now we need to prove two subgoals:
Subgoal 1: ∀b : (b ∈ LgFix(LgFix(x)(g))(g) ⇒ b ∈ LgFix(x)(g)). After eliminating the universal quantifier, we expand the definition of the first LgFix in the antecedent, which becomes ∀k : b ∈ Dec(k, LgFix(x)(g))(g). Then, after manual instantiation of k with 0, the GRIND strategy discharges the subgoal.
Subgoal 2: We need to show ∀b : (b ∈ LgFix(x)(g) ⇒ b ∈ LgFix(LgFix(x)(g))(g)). We expand the definition of the first LgFix in the consequent. Now we need to prove:
∀b : (b ∈ LgFix(x)(g) ⇒ ∀k : b ∈ Dec(k, LgFix(x)(g))(g)).
We proceed by induction on k. In the base case, k = 0, we need to show: ∀b : (b ∈ LgFix(x)(g) ⇒ b ∈ Dec(0, LgFix(x)(g))(g)). We prove this by expanding the definition of Dec(0, LgFix(x)(g))(g), which is equal to LgFix(x)(g). In the induction step, we need to show:
∀b : ((b ∈ LgFix(x)(g) ∧ b ∈ Dec(j, LgFix(x)(g))(g)) ⇒ b ∈ Dec(j+1, LgFix(x)(g))(g))
After eliminating the quantifiers, we expand the definition of Dec(j+1, ...) = Dec(j, ...) − g(Dec(j, ...)) in the consequent.
From Lemma 3.3, we know:
∀i : ((g(Dec(i, x)(g)) = ∅) ⇒ (∀k | k ≥ i : g(Dec(k, x)(g)) = ∅)).
We proceed using this lemma: we instantiate the set x with LgFix(x)(g) and i with 0. This generates another two subgoals:
Subgoal 2.1: ∀k : ((g(Dec(k, LgFix(x)(g))(g)) = ∅) ⇒ (g(Dec(j, LgFix(x)(g))(g)) = ∅)). Instantiating k with j and rewriting the definitions discharges the subgoal.
Subgoal 2.2: g(Dec(0, LgFix(x)(g))(g)) = ∅. Towards this end, we expand the definition of Dec(0, LgFix(x)(g))(g), which is equal to LgFix(x)(g). From Theorem 3.7, we know that g(LgFix(x)(g)) = ∅. Proving this subgoal completes the proof. □

3.2 Calculating the Smallest Fixpoint

As mentioned earlier in this chapter, the greatest lower bound of the fixpoints is called the smallest fixpoint. In Section 3.2.1, we describe how we formalize the calculation of the smallest fixpoint in the PVS specification language. Then, in Section 3.2.2, we verify the properties of the smallest fixpoint.

Figure 3.3: The relationship between r, x, and x ∪ r(x).

3.2.1 Specification of the Smallest Fixpoint Calculation

The second type of fixpoint used in fault-tolerance synthesis is that of an increasing function, for which the smallest fixpoint is calculated. Given are an initial finite set x and a function r that calculates a finite set disjoint from x. Calculation of the smallest fixpoint involves adding the states in r(x) to x recursively until r(x) becomes the empty set. The function r may represent a certain property. For instance, it may calculate the set of states that are reachable from x. We formalize the procedure of calculating the smallest fixpoint of a formula by first defining the type IncFunc for increasing functions such as r, where:

r : {A : finite set} → {B : finite set | A ∩ B = ∅}.

In other words, for all finite sets x, x ∩ r(x) = ∅ (cf. Figure 3.3).
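The dual, add-until-empty procedure admits the same kind of sketch (again in Python, as our illustration rather than the PVS text; the one-step reachability function is an assumed example):

```python
def inc(i, x, r):
    """Inc(i, x)(r): apply the addition step i times, starting from x.
    r must be increasing: r(A) is disjoint from A for every finite set A."""
    if i == 0:
        return x
    prev = inc(i - 1, x, r)
    return prev | r(prev)

def smallest_fixpoint(x, r):
    """Keep adding r(x) until r returns the empty set; on a finite
    state space this terminates."""
    while r(x):
        x = x | r(x)
    return x

# Example r in the spirit of the text: states reachable from the set in
# one step that are not already members of it.
transitions = {(1, 2), (2, 3), (3, 1), (4, 5)}
def r(a):
    return {t for (u, t) in transitions if u in a} - a
```

Starting from {1}, the iteration adds 2 and then 3, and stabilizes on the set of states reachable from 1.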
With such an increasing function, we define Inc(i, x)(r) to formalize the recursive behavior of the smallest fixpoint calculation. Inc(i, x)(r) keeps adding to the initial set x the elements that the function r of type IncFunc returns at every step:

Inc(i, x)(r) = Inc(i−1, x)(r) ∪ r(Inc(i−1, x)(r))   if i ≠ 0;
Inc(i, x)(r) = x                                     if i = 0.

Figure 3.4: Smallest fixpoint calculation.

Finally, we define the smallest fixpoint as follows (cf. Figure 3.4):

SmFix(x)(r) = {b | ∃k : b ∈ Inc(k, x)(r)}

Our goal is to prove the following properties of the smallest fixpoint based on the above definitions:

• r(SmFix(x)(r)) = ∅
• SmFix(x)(r) = SmFix(SmFix(x)(r))(r)

Intuitively, the first property means that SmFix is indeed the smallest fixpoint. The second property means that further application of the Inc function does not change the value of the smallest fixpoint. These properties are stated in theorems 3.15 and 3.16.

Remark. The above definition of fixpoint, as compared to the definition of the least fixpoint in λ-calculus and μ-calculus, is somewhat non-traditional. We find that this definition assists in the verification of the synthesis algorithms. For instance, we apply the smallest fixpoint for calculating the reachable states of an initial set from where the safety specification may be directly violated, where r(x) denotes the reverse reachable states of set x. After calculating the smallest fixpoint, we need to show that the smallest fixpoint contains all the states from where the safety specification can be violated in one or more steps of computation.

3.2.2 Verification of the Properties of the Smallest Fixpoint

In this section, similar to the largest fixpoint calculation, we develop and prove a chain of lemmas to verify the properties of the smallest fixpoint in theorems 3.15 and 3.16. We use these theorems extensively to verify the soundness and completeness of the synthesis algorithms in chapters 4 and 5.
The proofs of the mentioned lemmas and of theorems 3.15 and 3.16 are very similar to those of the corresponding lemmas and theorems in Section 3.1.2.

Lemma 3.9: Any application of the Inc function grows the initial set until it becomes the smallest fixpoint. Formally,
(k < j) ⇒ (Inc(k, x)(r) ⊆ Inc(j, x)(r))
Proof. The INDUCT-AND-SIMPLIFY strategy discharges the lemma. □

Lemma 3.10: Until the fixpoint is achieved, the cardinality of the complement of Inc(j+1, x) is less than or equal to |fullset[T]| − |x| − j − 1. Formally,
∀j : ((r(Inc(j, x)(r)) ≠ ∅) ⇒ (|fullset[T] − Inc(j+1, x)(r)| ≤ |fullset[T]| − |x| − j − 1))
Proof. The proof is similar to the proof of Lemma 3.2 in Section 3.1.2. □

Lemma 3.11: If the fixpoint is reached by step j then in any subsequent step the fixpoint is maintained. Formally,
∀j : ((r(Inc(j, x)(r)) = ∅) ⇒ (∀k | k ≥ j : r(Inc(k, x)(r)) = ∅))
Proof. The proof is similar to the proof of Lemma 3.3 in Section 3.1.2. □

Lemma 3.12: There exists a step i such that all subsequent applications of r return the empty set. Formally,
∃i : (∀k | k ≥ i : r(Inc(k, x)(r)) = ∅)
Proof. The proof is similar to the proof of Lemma 3.4 in Section 3.1.2. □

Lemma 3.13: There exists a step j where the fixpoint is achieved. Formally,
∃j : (∀k | k ≥ j : ((Inc(k, x)(r) = Inc(j, x)(r)) ∧ (r(Inc(k, x)(r)) = ∅)))
Proof. The proof is similar to the proof of Lemma 3.5 in Section 3.1.2. □

Lemma 3.14: For some value j, Inc(j, x) reaches a fixpoint, and at this step the value of Inc(j, x) is equal to the smallest fixpoint. Formally,
∃j : ((Inc(j, x)(r) = SmFix(x)(r)) ∧ (r(Inc(j, x)(r)) = ∅))
Proof. The proof is similar to the proof of Lemma 3.6 in Section 3.1.2. □

Theorem 3.15: Application of the function r to the smallest fixpoint of a finite set returns the empty set. Formally,
r(SmFix(x)(r)) = ∅
Proof. The proof is similar to the proof of Theorem 3.7 in Section 3.1.2. □

Theorem 3.16: The smallest fixpoint of the smallest fixpoint of a function is equal to the smallest fixpoint.
Formally,
SmFix(x)(r) = SmFix(SmFix(x)(r))(r)
Proof. The proof is similar to the proof of Theorem 3.8 in Section 3.1.2. □

Chapter 4

Specification and Verification of Algorithms for Synthesis of Fault-Tolerance

In this chapter, we describe the synthesis algorithms in [8] and explain how we formalize and verify them in PVS¹. As mentioned in Chapter 2, we are interested in three levels of fault-tolerance: failsafe, nonmasking, and masking. In this chapter, our focus is on the algorithms that synthesize, in the high atomicity model, fault-tolerant programs that are subject to one and only one class of faults. In this model, a program transition can read all variables as well as write all variables in one atomic step. For this model, in [8], the authors propose deterministic polynomial-time algorithms to synthesize fault-tolerant programs for the three desired levels of fault-tolerance described in Chapter 2. We will also discuss the verification of the nondeterministic synthesis algorithm, in [8], for programs in the low atomicity model, where a program cannot read or write all the variables in one atomic step.

¹Appendices C to E contain the formal specification of the algorithms.

In Section 4.1, we first describe how to formalize and verify the algorithm for synthesis of failsafe fault-tolerant programs. Then, in Section 4.2, we present the formal specification and verification of the addition of nonmasking fault-tolerance. In Section 4.3, we describe how we mechanically verify the addition of masking fault-tolerance. Finally, in Section 4.4, we argue why the verification of the nondeterministic algorithm for synthesizing programs in the low atomicity model is not quite a challenging problem.

The essence of adding failsafe and masking fault-tolerance to a given fault-intolerant program is recalculation of the invariant of the fault-intolerant program, which in turn involves calculating the fixpoint of a formula.
More specifically, we calculate the fixpoint of a given formula to (i) calculate the set of states from where safety may be violated by faults alone; (ii) remove the set of states from where closure of the fault-span is violated by fault transitions; and (iii) remove deadlock states that occur in a given state predicate. The mentioned fixpoint calculations are essential parts of recalculating the invariant and the fault-span to obtain fault-tolerant programs. Once the invariant is calculated, the set of program transitions can be calculated in a straightforward manner. We make extensive use of the theory of fixpoint calculations developed in Chapter 3 to formalize and verify the synthesis algorithms.

Throughout this chapter, the variables x, s, s0, s1 range over states. The variables i, j, k, m range over natural numbers. The variable X ranges over StatePred and the variable Z ranges over Action. As defined in Section 2.5, p and p′ are the fault-intolerant and fault-tolerant programs respectively, S and S′ are the invariants of the fault-intolerant and fault-tolerant programs respectively, T is the fault-span, f is the finite set of faults, and spec is the finite set of bad transitions that represents the safety specification.

4.1 Automatic Addition of Failsafe Fault-Tolerance

In this section, we present the formal specification and verification of the algorithm for adding failsafe fault-tolerance proposed in [8]. Along with the formal specification, we also describe the intuition behind each step of the mentioned algorithm. As for verification, we prove both the soundness and the completeness of the algorithm.

4.1.1 Specification of the Synthesis of Failsafe Fault-Tolerance

Intuitively, the main feature of a failsafe program is that it never violates the safety specification. However, it may not recover to its normal behavior.
The essence of adding failsafe fault-tolerance to a given fault-intolerant program is to remove the states from where safety may be violated by taking one or more fault transitions. We restate the algorithm Add_failsafe (from [8, 9]) in Figure 4.1. As we describe the algorithm, we explain how to formalize it in PVS.

Add_failsafe(p, f : transitions, S : state predicate, spec : specification)
{
  ms := smallest fixpoint(X = X ∪ {s0 | (∃s1 : (s0, s1) ∈ f) ∧ (s1 ∈ X ∨ (s0, s1) violates spec)});
  mt := {(s0, s1) : (s1 ∈ ms) ∨ ((s0, s1) violates spec)};
  S′ := ConstructInvariant(S − ms, p − mt);
  if (S′ = {}) declare no failsafe f-tolerant program p′ exists;
  else p′ := ConstructTransitions(p − mt, S′)
}
ConstructInvariant(S : state predicate, p : transitions)
  // Returns the largest subset of S such that computations of p within that subset are infinite
  return largest fixpoint(X = (X ∩ S) − {s0 | (∀s1 : s1 ∈ X : (s0, s1) ∉ p)})
ConstructTransitions(p : transitions, S : set of states)
{ return p − {(s0, s1) : s0 ∈ S ∧ s1 ∉ S} }

Figure 4.1: The synthesis algorithm for adding failsafe fault-tolerance.

In order to construct ms, the set of states from where safety can be violated by one or more fault transitions, first, we define msInit as the finite set of states from where safety can be violated by taking a single fault transition. Note that (s0, s1) ∈ spec means that (s0, s1) directly violates the safety specification. Formally,

msInit : StatePred = {s0 | ∃s1 : (s0, s1) ∈ f ∧ (s0, s1) ∈ spec}

Now, we define a function, RevReachStates, that calculates the state predicate from where the states of another finite set, rs, are reachable through a fault transition. Formally,

RevReachStates(rs : StatePred) : StatePred = {s0 | ∃s1 : s1 ∈ rs ∧ (s0, s1) ∈ f ∧ s0 ∉ rs}

As mentioned earlier, we use the fixpoint theory developed in Chapter 3 to formalize the synthesis algorithms. Obviously, RevReachStates identifies a finite set that is disjoint from rs. Hence, RevReachStates has the type IncFunc.
The following judgement helps the PVS type checker discharge later proof obligations:

Judgement 4.1: RevReachStates has the type IncFunc.
Proof. The GRIND strategy simply discharges the judgement. □

We use the definition of the smallest fixpoint, developed in Section 3.2.1, to define the state predicate ms. Towards this end, we take msInit as the initial set, x, and RevReachStates as the increasing function, r:

ms : StatePred = SmFix(msInit)(RevReachStates)

Then, we define the finite set of transitions, mt, that must be removed from p. These transitions are either transitions that may lead a computation to reach a state in ms, or transitions that directly violate safety:

mt : Action = {(s0, s1) | s1 ∈ ms ∨ (s0, s1) ∈ spec}

The algorithm Add_failsafe removes the set ms from the invariant S of the fault-intolerant program. It also removes mt from p. This removal may create deadlock states (cf. Section 2.4). The set of deadlock states in ds with respect to program Z is defined as follows:

DeadlockStates(Z)(ds : StatePred) : StatePred = {s0 | s0 ∈ ds ∧ (∀s1 | s1 ∈ ds : (s0, s1) ∉ Z)}

Similar to RevReachStates, we state the following judgement to help the PVS type checker discharge proof obligations in the later theorems:

Judgement 4.2: DeadlockStates(Z) has the type DecFunc.
Proof. The GRIND strategy simply discharges the judgement. □

We construct the invariant of the fault-tolerant program by removing the deadlock states, to ensure that computations of the fault-tolerant program are infinite (cf. Section 2.4).
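Continuing the sketch, DeadlockStates and the invariant construction of Figure 4.1 can be mirrored with the largest-fixpoint iteration (Python, our illustration; the example program stands in for p − mt):

```python
def deadlock_states(Z):
    """DeadlockStates(Z): states of ds with no outgoing Z-transition
    that stays inside ds.  For a fixed Z this is a decreasing function,
    mirroring Judgement 4.2."""
    def g(ds):
        return {s0 for s0 in ds
                if not any(s1 in ds for (u, s1) in Z if u == s0)}
    return g

def largest_fixpoint(x, g):
    while g(x):
        x = x - g(x)
    return x

def construct_invariant(X, Z):
    """ConstructInvariant(X, Z): the largest subset of X from which every
    computation of Z can always take another step (no deadlock states)."""
    return largest_fixpoint(X, deadlock_states(Z))
```

For example, with transitions {(1, 2), (2, 1), (2, 3), (3, 4)} restricted to {1, 2, 3}, state 3 only leads outside the predicate and is pruned, leaving the cycle {1, 2}.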
We define ConstructInvariant using the largest fixpoint, which removes the deadlock states of a given state predicate X:

ConstructInvariant(X, Z) : StatePred = LgFix(X)(DeadlockStates(Z))

We formally define the invariant of the fault-tolerant program as follows:

S′ : StatePred = ConstructInvariant(S − ms, p − mt)

We construct the finite set of transitions of the fault-tolerant program by removing the transitions that violate the closure of S′:

p′ : Action = p − mt − {(s0, s1) | (s0, s1) ∈ (p − mt) ∧ (s0 ∈ S′ ∧ s1 ∉ S′)}

Finally, we present the definitions that we use in the verification of the completeness of the algorithm Add_failsafe. First, we define what it means for a program p″ with invariant S″ to be failsafe:

IsFailsafe(S″ : StatePred, p″ : Action) : bool =
(S″ ≠ ∅) ∧ closed(S″, p″) ∧ (S″ ⊆ S) ∧ (p″|S″ ⊆ p|S″) ∧ (DeadlockStates(p″)(S″) = ∅) ∧
(∀c(p″ ∪ f) | c0 ∈ S″ : (∀i : (ci, ci+1) ∉ spec))

Observation 4.3: No state of S′ is in ms. Formally, ∀s : (s ∈ S′ ⇒ s ∉ ms).
Proof. By instantiating k with 0, propositional simplification discharges the observation. □

Theorem 4.4: The first condition of the transformation problem (cf. Section 2.5) holds. Formally,
S′ ⊆ S
Proof. Our strategy to prove this theorem is based on the fact that S′ is obtained from S by removing some states. After expanding the definitions of S′, ConstructInvariant, and LgFix, we need to prove:
∀k : (∀x : (x ∈ Dec(k, S − ms)(DeadlockStates(p − mt)) ⇒ x ∈ S)).
Towards this end, first, we instantiate k with zero. Then, after expanding the definitions, we need to prove ∀x : (x ∈ S − ms ⇒ x ∈ S), which is trivially true. □

Theorem 4.5: The second condition of the transformation problem (cf. Section 2.5) holds. Formally,
p′|S′ ⊆ p|S′
Proof. The GRIND strategy discharges the theorem. □

Theorem 4.6: S′ is closed in p′. Formally,
closed(S′, p′)
Proof. The GRIND strategy discharges the theorem. □

Lemma 4.7: ∀(s0, s1) : (((s0, s1) ∈ f ∧ s1 ∈ ms) ⇒ s0 ∈ ms)
Proof.
After expanding the definition of ms, we need to prove:
∀(s0, s1) : (((s0, s1) ∈ f ∧ s1 ∈ SmFix(msInit)(RevReachStates)) ⇒ s0 ∈ SmFix(msInit)(RevReachStates)).
We know that ms is the smallest fixpoint obtained from msInit by adding new reverse reachable states via fault transitions. Hence, recalculation of ms does not change its contents (cf. Theorem 3.16). Thus, if s1 is in ms, s0 should already be in ms as well. We proceed by applying Theorem 3.16. After instantiating x with msInit and r with RevReachStates in Theorem 3.16, we expand the definition of the first SmFix in
SmFix(SmFix(msInit)(RevReachStates))(RevReachStates) = SmFix(msInit)(RevReachStates).
Then, we prove the lemma by one step of manual calculation of Inc, showing that if s1 is in Inc(SmFix(...)) then s0 is in it as well. □

Remark. Note that Lemma 4.7 is one of the instances where the formalization of the fixpoint calculation in Chapter 3 is used. More specifically, using Theorem 3.16, we could show that ms contains all the states that may lead a computation to a state that violates safety, and that further calculation of ms does not change its contents. In other words, ms is a smallest fixpoint.

Lemma 4.8: In the absence of faults, all computations of p − mt that start from a state in S′ are infinite. Formally,
DeadlockStates(p − mt)(S′) = ∅
Proof. First, we expand the definitions of S′ and ConstructInvariant. Then, we need to prove:
DeadlockStates(p − mt)(LgFix(S − ms)(DeadlockStates(p − mt))) = ∅.
Using Theorem 3.7, we instantiate x with S − ms and g with DeadlockStates(p − mt) to complete the proof. □

Theorem 4.9: In the absence of faults, all computations of p′ that start from a state in S′ are infinite. Formally,
DeadlockStates(p′)(S′) = ∅
Proof. In Lemma 4.8, we showed that all computations of p − mt that start from a state in S′ are infinite. Now, we need to show that all computations of p − mt, after removing the transitions that violate the closure of S′, are still infinite.
Obviously, the removal of such transitions does not affect the deadlock states, because the source of a transition that violates the closure must already have been removed during the removal of deadlock states. Hence, the mechanical verification only involves a sequence of expansions and propositional simplifications. □

Remark. Note that Theorem 4.9 is another instance where the formalization of the fixpoint calculation in Chapter 3 is used. More specifically, DeadlockStates(p′)(S′) denotes the deadlock states in S′ with respect to program p′. We repeatedly remove these deadlock states. Hence, once the fixpoint is reached, there are no deadlock states.

Lemma 4.10: In the presence of faults, no computation prefix of the failsafe tolerant program that starts from a state in S′ reaches a state in ms. Formally,
∀j : (∀c : prefix(p′ ∪ f, j) | c0 ∈ S′ : ∀k | k ≤ j : ck ∉ ms)
Proof. We proceed by induction. The base case, c0 ∉ ms, can be discharged using Observation 4.3. In the induction step, we need to prove ck+1 ∉ ms given ck ∉ ms. From Lemma 4.7, we know that if the target of a fault transition (s0, s1) is in ms, then the source, s0, is in ms as well. This means that if s0 is not in ms then s1 is not in ms either. We know that ck ∉ ms and, hence, by applying Lemma 4.7, we can infer ck+1 ∉ ms. □

Theorem 4.11: Any prefix of any computation of the failsafe tolerant program in the presence of faults that starts in S′ does not violate safety. Formally,
∀j : (∀c : prefix(p′ ∪ f, j) | c0 ∈ S′ : ∀k | k < j : (ck, ck+1) ∉ spec)

Lemma 4.13: For every i and every state s1 in ms(i), where ms(i) = Inc(i, msInit)(RevReachStates), there exists a computation of faults alone that starts from s1 and violates safety. Formally,
∀i : (∀s1 | s1 ∈ ms(i) : (∃c(f) | c0 = s1 : (∃k : (ck, ck+1) ∈ spec)))
Proof. We proceed by induction. The induction step intuitively means that if there exists a computation that starts from a state in ms and violates safety in j steps, there also exists a computation that starts from ms and violates safety in j + 1 steps. To prove the induction step, we do a case analysis on the step at which s1 has been added to ms:
Case I. s1 ∈ ms(j): This case is trivially true, because we already know that there exists a computation that starts in ms(j) and violates safety.
Case II.
s1 ∈ ms(j+1) − ms(j): To prove this case, after expanding the definitions of ms and Inc, we need to prove:
(s1 ∈ RevReachStates(Inc(j, msInit)(RevReachStates)) ∧
(∀s | s ∈ ms(j) : (∃c(f) | c0 = s : (∃k : (ck, ck+1) ∈ spec))) ∧
(s1 ∉ Inc(j, msInit)(RevReachStates)))
⇒ (∃c(f) | c0 = s1 : (∃k : (ck, ck+1) ∈ spec))
Now, we instantiate s with s2 to eliminate the universal quantifier in the antecedent. With the universal quantifier removed, the existential quantifier on c(f) can be removed by skolemization. Now, we need to prove:
((s2 ∈ Inc(j, msInit)(RevReachStates)) ∧ ((s1, s2) ∈ f) ∧ c′0 = s2 ∧ ((c′k, c′k+1) ∈ spec) ∧ (s1 ∉ Inc(j, msInit)(RevReachStates)))
⇒ (∃c(f) | c0 = s1 : (∃k : (ck, ck+1) ∈ spec))
This implication intuitively means that if computation c′ starts from s2 and violates safety in j steps, and (s1, s2) is a fault transition, then there should exist a computation that starts from s1 and violates safety in j + 1 steps. To prove this implication, we instantiate c with add(s1, c′), where add forms a computation c such that c0 = s1 and suffix1(c) = c′. Obviously, c is the computation that we are looking for. The mechanical proof after instantiation of c with add(s1, c′) is only a chain of automatic rewriting and propositional simplifications. □

Lemma 4.14: For every state in ms, there exists a computation of faults alone that violates safety. Formally,
∀s | s ∈ ms : (∃c(f) | c0 = s : (∃k : (ck, ck+1) ∈ spec))
Proof. From Lemma 4.13 we know that, for all i, from any state in ms(i) there exists a computation that eventually violates safety. Thus, to formally verify this lemma, we only need to perform a chain of automatic skolemizations, definition expansions, and instantiations. At the end, the GRIND strategy discharges the lemma. □

Lemma 4.15: The invariant of any program that satisfies the transformation problem is a subset of S − ms. Formally,
∀S″, p″ : (IsFailsafe(S″, p″) ⇒ (S″ ⊆ S − ms))
Proof.
After skolemization, expanding the definition of IsFailsafe, and applying Lemma 4.14, we need to prove:
(S″ ⊆ S ∧ S″ ≠ ∅ ∧
 ∀x | x ∈ ms : (∃ c(f) | c0 = x : ∃ k : (ck, ck+1) ∈ spec) ∧
 ∀c(p″ ∪ f) | c0 ∈ S″ : ∀i : (ci, ci+1) ∉ spec)
⇒ (S″ ⊆ S − ms)
From the second and third conjuncts in the antecedent, we can infer that any state in ms cannot be in S″, as p″ is failsafe. In other words, S″ ∩ ms = ∅. Besides, S″ is nonempty. Hence, S″ ⊆ S − ms. To mechanically prove the lemma based on the mentioned argument, it suffices to remove the quantifiers by skolemization and perform automatic instantiation, rewriting, and propositional simplification. □

Theorem 4.16: If there exists a program that satisfies the transformation problem, algorithm Add_failsafe does not declare failure due to the condition that all states in S are in ms. Formally,
(∃ S″, p″ : IsFailsafe(S″, p″)) ⇒ (S − ms ≠ ∅)
Proof. From Lemma 4.15 we know that S″ is a subset of S − ms. In addition, p″ is failsafe and S″ ≠ ∅. Hence, S − ms cannot be empty either. In order to mechanically verify this theorem, first we remove the existential quantifier by skolemization. Then, automatic instantiation, rewriting, and propositional simplification discharges the theorem. □

Lemma 4.17: For all transitions in mt, there exists a computation of only faults that starts with that transition and violates safety. Formally,
∀(t1, t2) | (t1, t2) ∈ mt : (∃ c(f) | c0 = t1 : (∃ k : (ck, ck+1) ∈ spec))
Proof. The transitions in mt are the ones that are either in spec or whose terminus is in ms. To mechanically verify this lemma, first, we perform a case analysis on the type of transition. If it is in spec, any computation that starts with that transition is the answer. As mentioned before, the other reason that a transition is in mt is that its terminus is in ms. Thus, by applying Lemma 4.14 we can discharge the lemma, because for any computation that contains a state in ms, there exists a suffix that violates safety.
□

Lemma 4.18: Any failsafe program that satisfies the transformation problem, restricted to its invariant, is a subset of p − mt. Formally,
∀S″, p″ : (IsFailsafe(S″, p″) ⇒ (p″ | S″ ⊆ p − mt))
Proof. The proof idea is similar to the proof of Lemma 4.15; we use the fact that p″ is failsafe and, based on Lemma 4.17, we show that p″ | S″ must be a subset of p − mt. After skolemizing, expanding the definitions, and applying Lemma 4.17, we need to prove:
(∀(t1, t2) | (t1, t2) ∈ mt : (∃ c(f) | c0 = t1 : (∃ k : (ck, ck+1) ∈ spec)) ∧
 (p″ | S″ ⊆ p | S″) ∧
 ∀c(p″ ∪ f) | c0 ∈ S″ : (∀i : (ci, ci+1) ∉ spec))
⇒ (p″ | S″ ⊆ p − mt)
From the first and third conjuncts in the antecedent, we can infer that any transition in mt cannot be in p″, as p″ is failsafe. In other words, (p″ | S″) ∩ mt = ∅. Hence, p″ | S″ ⊆ p − mt. Based on this argument, to mechanically prove the lemma, it suffices to remove the quantifiers by skolemization and perform automatic instantiation, rewriting, and propositional simplification. □

Theorem 4.19: If there exists a program that satisfies the transformation problem, algorithm Add_failsafe does not declare failure due to the condition that all the states in S − ms are deadlock states. Formally,
∃ S″, p″ | IsFailsafe(S″, p″) : (S′ ≠ ∅)
Proof. From Theorem 4.16 we know that if there exists a failsafe program p″, then S − ms is not equal to the empty set. Hence, it suffices to prove that if such a failsafe program exists, then not all the states in S − ms are deadlock states. Towards this end, first, we apply Lemma 4.18. Then, after simplifications we need to prove:
((S″ ⊆ S − ms) ∧ (p″ | S″ ⊆ p − mt) ∧ (S″ ≠ ∅) ∧ closed(S″, p″) ∧ (p″ | S″ ⊆ p | S″) ∧ (DeadlockStates(p″)(S″) = ∅)) ⇒ (S′ ≠ ∅).
After expanding the definitions of S′, ConstructInvariant, and LgFix, we need to prove:
((S″ ⊆ S − ms) ∧ (p″ | S″ ⊆ p − mt) ∧ (S″ ≠ ∅) ∧ closed(S″, p″) ∧ (p″ | S″ ⊆ p | S″) ∧ (DeadlockStates(p″)(S″) = ∅))
⇒ ∀k : (Dec(k, S − ms)(DeadlockStates(p − mt)) ≠ ∅).
We proceed by placing induction on k.
We discharge the base case, k = 0, using the GRIND strategy. In the induction step, we need to prove:
(Dec(j, S − ms)(DeadlockStates(p − mt)) ≠ ∅ ∧
 ((S″ ⊆ S − ms) ∧ (p″ | S″ ⊆ p − mt) ∧ (S″ ≠ ∅) ∧ closed(S″, p″) ∧ (p″ | S″ ⊆ p | S″) ∧ (DeadlockStates(p″)(S″) = ∅)))
⇒ Dec(j + 1, S − ms)(DeadlockStates(p − mt)) ≠ ∅
To prove the induction step, first, we expand the definition of Dec(j + 1, ...) by one step. Then, the GROUND strategy discharges the theorem. □

4.2 Automatic Addition of Nonmasking Fault-Tolerance

In this section, we present the formal specification and verification of the algorithm for adding nonmasking fault-tolerance proposed in [8]. Along with the formal specification, we also describe the intuition behind each step of the mentioned algorithm. As for verification, we prove both the soundness and the completeness of the algorithm.

4.2.1 Specification of the Synthesis of Nonmasking Fault-Tolerance

To synthesize a nonmasking fault-tolerant program p′, we ensure that from any state in the state space, p′ eventually recovers to a state in the invariant. However, during this recovery, the safety specification may be violated temporarily (cf. Section 2.4). Thus, it suffices to add one step of recovery from every state in the state space to the invariant. We reiterate the algorithm Add_nonmasking (from [8, 9]) in Figure 4.2.

Add_nonmasking(p, f : transitions, S : state predicate, spec : specification)
{
  S′ := S;
  p′ := (p | S) ∪ {(s0, s1) : s0 ∉ S ∧ s1 ∈ S}
}

Figure 4.2: The synthesis algorithm for adding nonmasking fault-tolerance

Modeling Add_nonmasking in PVS is straightforward. As safety may be temporarily violated during the recovery, the algorithm does not change the invariant of the fault-intolerant program in synthesizing the fault-tolerant program. Hence, we define the invariant of the nonmasking program as follows:
S′ : StatePred = S.
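The construction of Figure 4.2 is easy to illustrate on explicit finite sets. The following Python sketch is our own illustration (the function name and the set-based representation are ours, not the PVS model): it restricts p to S and adds a one-step recovery transition from every state outside S to every state in S.

```python
def add_nonmasking(p, f, S, states):
    """One-step-recovery construction of Figure 4.2 on explicit finite sets.

    p: program transitions as a set of (s0, s1) pairs; S: invariant;
    states: the whole state space.  f is unused, just as in the
    algorithm: recovery is added regardless of the fault transitions.
    """
    S_prime = set(S)                                      # S' := S
    p_in_S = {(s0, s1) for (s0, s1) in p
              if s0 in S and s1 in S}                     # p | S
    recovery = {(s0, s1) for s0 in states - set(S)
                for s1 in S}                              # one-step recovery
    return p_in_S | recovery, S_prime
```

For instance, with states = {0, 1, 2}, S = {0}, and p = {(0, 0)}, the result adds the recovery transitions (1, 0) and (2, 0) while leaving the invariant unchanged.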
As mentioned earlier, the set of transitions of the fault-tolerant program is the original transitions of the program plus the transitions that add recovery to the invariant in one step. In PVS, we define the set of transitions of the fault-tolerant program as follows:
p′ : Action = (p | S) ∪ {(s0, s1) | s0 ∉ S ∧ s1 ∈ S}

4.2.2 Verification of the Synthesis of Nonmasking Fault-Tolerance

In order to verify the correctness of algorithm Add_nonmasking, we prove that this algorithm is sound and complete. First, we present the proof of soundness and then we describe how to verify the completeness.

Soundness

In order to verify the soundness of Add_nonmasking, we need to prove two properties to show that p′ is nonmasking fault-tolerant. First, we should show that the fault-tolerant program satisfies spec in the absence of faults. Second, we should show that in the presence of faults, if faults perturb the state of the program, it eventually recovers to its invariant.

Theorem 4.20: The first condition of the transformation problem (cf. Section 2.5) holds. Formally, S′ ⊆ S

Theorem 4.21: The second condition of the transformation problem (cf. Section 2.5) holds. Formally, p′ | S′ ⊆ p | S′

Theorem 4.22: S′ is closed in p′. Formally, closed(S′, p′)

Proof. The GRIND strategy discharges theorems 4.20, 4.21, and 4.22. □

Lemma 4.23: In the absence of faults, any computation of the nonmasking program that starts from any state in the state space eventually recovers to the invariant. Formally,
∀c(p′) : (∃ j | j > 0 : cj ∈ S′)
Proof. After eliminating the universal quantifier by skolemizing, we need to prove:
(∀n : (cn, cn+1) ∈ p′) ⇒ (∃ j | j > 0 : cj ∈ S′).
Now, if we manually instantiate n with 0 and j with 1, we should prove:
((c0, c1) ∈ p′) ⇒ (c1 ∈ S′)
This implication is true, as any transition in p′ ends in S′. Thus, the GRIND strategy simply discharges the lemma.
□

Theorem 4.24: In the presence of faults, any computation of the nonmasking program that starts from any state in the state space eventually recovers to the invariant. Formally,
∀c(p′ ∪ f) : (∃ j | j > 0 : cj ∈ S′)
Proof. From Lemma 2.3 in Section 2.4 we know that the number of occurrences of faults is finite and, hence, there exists a suffix of the computation that is fault-free. Using this suffix, and by applying Lemma 4.23, we can show that the computation will eventually recover to the invariant. Hence, after applying the mentioned lemmas, we need to prove:
(∀n : (cn, cn+1) ∈ p′ ∪ f ∧ ∀m : (suffixj(c)m, suffixj(c)m+1) ∈ p′) ⇒ ∃ i | i > 0 : ci ∈ S′
where suffixj(c) is a fault-free suffix of computation c that starts from the jth state, and suffixj(c)m is the mth state in that suffix. Now, we manually instantiate i with j + 1 and m with 0. Obviously, cj+1 ∈ S′, as the transition that ends at cj+1 is in p′. □

Theorem 4.25: For any synthesized nonmasking program, there exists a fault-span. Formally,
∃ T : FaultSpan(T, S′, p′ ∪ f)
Proof. The proof of this theorem is the same as the proof of Theorem 4.12. □

Completeness

Add_nonmasking always finds a solution to the transformation problem. In other words, the answer to the decision problem is always yes. Therefore, the algorithm is complete and there is no need to formally verify the completeness.

4.3 Automatic Addition of Masking Fault-Tolerance

In this section, we present the formal specification and verification of the algorithm for adding masking fault-tolerance proposed in [8]. Along with the formal specification, we also describe the intuition behind each step of the algorithm. As for verification, we prove that the algorithm is sound.

4.3.1 Specification of the Synthesis of Masking Fault-Tolerance

In this section, we describe how we model the addition of masking fault-tolerance to a given program p. First, we reiterate the algorithm Add_masking (from [8, 9]) in Figure 4.3.
As mentioned in Chapter 2, in the addition of masking fault-tolerance, the requirement for preserving the liveness properties of a program is that the fault-tolerant program does not deadlock in the presence of faults and that it recovers to the invariant after a finite number of steps while preserving safety. As mentioned in Section 2.4, we assume that the number of occurrences of faults in a computation is finite. Based on this assumption, we proved that for any computation of a program in the presence of faults, there exists a suffix of the computation that is free from faults (cf. Lemma 2.3). We use this lemma to show that a masking fault-tolerant program always recovers to its invariant.

Add_masking(p, f : transitions, S : state predicate, spec : specification)
{
  ms := smallest fixpoint(X = X ∪ {s0 | (∃ s1 : (s0, s1) ∈ f) ∧ (s1 ∈ X ∨ (s0, s1) violates spec)});
  mt := {(s0, s1) : (s1 ∈ ms) ∨ (s0, s1) violates spec};
  S1 := ConstructInvariant(S − ms, p − mt);
  T1 := true − ms;
  repeat
    T2, S2 := T1, S1;
    p1 := p | S2 ∪ {(s0, s1) | s0 ∉ S2 ∧ s0 ∈ T2 ∧ s1 ∈ T2} − mt;
    T1 := ConstructFaultSpan(T2 − {s : S1 is not reachable from s in p1}, f);
    S1 := ConstructInvariant(S2 ∧ T1, p1);
    if (S1 = {} ∨ T1 = {})
      declare no masking f-tolerant program p′ exists; exit
  until (T1 = T2 ∧ S1 = S2);
  For each state s : s ∈ T1 : Rank(s) = length of the shortest computation prefix of p1 that starts from s and ends in a state in S1;
  p′ := {(s0, s1) : ((s0, s1) ∈ p1) ∧ (s0 ∈ S1 ∨ Rank(s0) > Rank(s1))};
  S′ := S1;  T′ := T1
}

ConstructFaultSpan(T : state predicate, f : transitions)
// Returns the largest subset of T that is closed in f.
{
  return largest fixpoint(X = (X ∩ T) − {s0 : (∃ s1 : (s0, s1) ∈ f ∧ s1 ∉ X)})
}

Figure 4.3: The synthesis algorithm for adding masking fault-tolerance

As can be seen in Figure 4.3, the algorithm contains a loop that identifies the invariant, program transitions, and fault-span of the masking fault-tolerant program.
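The pre-loop computations of Figure 4.3, namely the smallest fixpoint ms, the transition set mt, and the largest fixpoint ConstructInvariant, can be sketched on explicit finite sets. The following Python fragment is our own illustration, not the PVS model; here the safety specification is represented by the set `bad` of safety-violating transitions:

```python
def compute_ms(f, bad, states):
    """Smallest fixpoint: states from which fault transitions alone can
    reach a safety violation, either directly ((s0, s1) in bad) or via
    a state already in ms."""
    ms = set()
    changed = True
    while changed:
        changed = False
        for (s0, s1) in f:
            if (s1 in ms or (s0, s1) in bad) and s0 not in ms:
                ms.add(s0)
                changed = True
    return ms

def compute_mt(bad, ms, states):
    """Transitions a failsafe program must avoid: those that violate
    safety or end in ms."""
    return {(s0, s1) for s0 in states for s1 in states
            if s1 in ms or (s0, s1) in bad}

def construct_invariant(S, p):
    """Largest fixpoint: repeatedly remove deadlock states, i.e. states
    with no outgoing transition of p that stays inside the set."""
    inv = set(S)
    changed = True
    while changed:
        changed = False
        for s in set(inv):
            if not any(s0 == s and s1 in inv for (s0, s1) in p):
                inv.discard(s)
                changed = True
    return inv
```

Each loop terminates because the sets are finite and strictly grow (respectively, shrink) on every pass, which is exactly why the fixpoint theory of Chapter 3 is developed over finite sets.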
Modeling the loop is actually the challenging part of the formal specification and verification of the algorithm. We model Add_masking in three phases: (1) initialization, (2) identifying the loop invariant, and (3) termination conditions. The initialization is straightforward and is very similar to how we modeled the previous algorithms. The loop invariant includes two properties: (1) the intermediate invariant, S2, at the start of the loop is a subset of S, the invariant of the fault-intolerant program, and (2) the intersection of ms and the intermediate fault-span, T2, at the start of the loop is the empty set. Hence, first, we show these properties for the initial guess of the invariant and fault-span. Then, we show that if these properties hold at the start of an iteration, they hold at the start of the subsequent iteration as well. Modeling the termination condition is straightforward. Now, we explain the three phases of formalization in detail:

Initialization: Reasonably, our first estimate of the invariant of the fault-tolerant program, S′, is the invariant of its failsafe fault-tolerant version. This makes sense because every masking fault-tolerant program is failsafe as well. Also, the first estimate of the fault-span is the state space excluding the states from where the safety may be violated, ms. To model the part of Add_masking before the loop, we define Sinit and Tinit as follows:
Sinit : StatePred = ConstructInvariant(S − ms, p − mt)
Tinit : StatePred = true − ms

The loop invariant: Now, we model the repeat-until loop. Our initial estimate of the fault-span has two flaws: (1) the fault-span is not necessarily closed under (p − mt) ∪ f, and (2) there does not necessarily exist a path from every state in the fault-span to the invariant. To resolve the mentioned flaws, in each iteration of the loop, the algorithm recalculates the fault-span, invariant, and program transitions until the fixpoint is reached.
In other words, the algorithm ensures that for all states in the fault-span, there exists a path that contains a state in the invariant, and the fault-span is closed in the program in the presence of faults. The value of the intermediate invariant (respectively, fault-span) at the beginning of the loop is S2 (respectively, T2). After the recalculation, let the new values of the invariant and fault-span be S1 and T1 respectively. We define S1 and T1 in terms of (arbitrary predicates) S2 and T2 as follows:

1. We define an intermediate program p1 as follows. We require that for a transition (s0, s1) in p1, the following conditions are satisfied: (1) if s0 ∈ S2 then s1 ∈ S2, and (2) if s0 ∈ T2 then s1 ∈ T2. Moreover, p1 does not contain any transition in mt. As mentioned earlier, S2 and T2 are two arbitrary state predicates that represent the intermediate invariant and fault-span respectively. Formally,
S2 : StatePred
T2 : StatePred
p1 : Action = ((p | S2) ∪ TS) − mt, where
TS : Action = {(s0, s1) | s0 ∉ S2 ∧ s0 ∈ T2 ∧ s1 ∈ T2}

2. To formally specify the construction of T1, first, we define the finite set of states from where the closure of T2 may be violated. Formally,
TNClose(X : StatePred) : StatePred = {s0 | ∃ s1 : s0 ∈ X ∧ (s0, s1) ∈ f ∧ s1 ∉ X}.
Then, we define the finite set of states from where the intermediate invariant, S2, is reachable. Formally,
TReach : StatePred = {s | s ∈ T2 ∧ reachable(S2, T2, p1, s)}, where
reachable(S2, T2, p1, s) = (∃ c(p1) : ((c0 = s) ∧ (s ∈ T2) ∧ ∃ j : cj ∈ S2)).
Now, we define ConstructFaultSpan as the largest subset of TReach that is closed in f. Formally,
T1 = ConstructFaultSpan(TReach), where
ConstructFaultSpan(X : StatePred) = LgFix(X)(TNClose)
We remark that the definition of ConstructFaultSpan is similar to the definition of ConstructInvariant (cf. Section 4.1.1).
The major difference is that in ConstructInvariant, we remove deadlock states recursively until we reach a fixpoint, whereas in ConstructFaultSpan, we remove the states from where the closure of the fault-span may be violated.

3. Since S1 should be a subset of T1, we model the recalculation of the invariant as follows:
S1 : StatePred = ConstructInvariant(S2 ∩ T1)(p1)

Termination of the loop: We continue the loop until we achieve the largest fixpoint for S1 and T1. Obviously, when the fixpoint is achieved, the intermediate and ultimate invariants (respectively, fault-spans) are equal. We formalize the termination condition of the loop in the verification phase. More specifically, we prove that provided (S1 = S2) ∧ (T1 = T2) is true, p1 is failsafe and provides potential recovery from every state in the fault-span.

Finally, the algorithm removes the possible cycles from the fault-span. This removal is essential; otherwise, a computation may remain in a cycle and never recover to the invariant. We define the finite set of transitions of the masking fault-tolerant program, p′, as follows:
p′ = {(s0, s1) | ((s0, s1) ∈ p1) ∧ ((s0 ∈ S1) ∨ rank(s0) > rank(s1))}, where
rank(s) = min(length(s, p1, S1)), where
length(s, p, S) = {j | ∃ c(p) : (s = c0 ∧ cj ∈ S)}

4.3.2 Verification of the Synthesis of Masking Fault-Tolerance

To verify the correctness of algorithm Add_masking, we prove that it is sound. Our proof idea is based on the three phases of our formalization in Section 4.3.1. More specifically, we prove that when the state of the masking fault-tolerant program is perturbed by faults, the program never violates safety and it eventually recovers to the invariant. We also prove that after the loop terminates, the program has all the properties of a masking fault-tolerant program.
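Before turning to the proofs, the rank-based cycle elimination of Section 4.3.1 can be sketched with a backward breadth-first search from the invariant. This Python fragment is our own illustration (all names are ours): rank(s) is the length of the shortest p1-prefix from s to S1, and only transitions inside the invariant or strictly rank-decreasing recovery transitions survive.

```python
from collections import deque

def compute_rank(p1, S1):
    """Backward BFS over p1 from the invariant S1:
    rank[s] = length of the shortest computation prefix of p1 from s
    to a state in S1.  States absent from the result cannot reach S1."""
    rank = {s: 0 for s in S1}
    frontier = deque(S1)
    while frontier:
        t = frontier.popleft()
        for (s0, s1) in p1:
            if s1 == t and s0 not in rank:
                rank[s0] = rank[t] + 1
                frontier.append(s0)
    return rank

def remove_cycles(p1, S1, rank):
    """Keep transitions starting inside the invariant, plus strictly
    rank-decreasing recovery transitions; cycles outside S1 disappear."""
    return {(s0, s1) for (s0, s1) in p1
            if s0 in S1 or (s0 in rank and s1 in rank
                            and rank[s0] > rank[s1])}
```

For example, with S1 = {0} and p1 = {(0, 0), (1, 0), (2, 1), (1, 2)}, the cycle-forming transition (1, 2) is dropped because it increases rank, so no computation can linger outside the invariant forever.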
Note that the proof idea and, in some cases, the proof tree of a significant number of lemmas and theorems in this section are similar or identical to lemmas and theorems we proved for Add_failsafe and Add_nonmasking. We discuss the issue of proof reusability further in Chapters 7 and 8. Now, we describe the three phases of the verification in more detail.

Properties of initial values for the invariant and fault-span: As mentioned earlier, in the first phase of verification, we show that the initial values of the invariant and fault-span satisfy the loop invariant. More specifically, we prove this in Observation 4.26 and theorems 4.27 and 4.28. Their proofs are similar to the proofs of Observation 4.3 and Theorem 4.4 for Add_failsafe. Note that many of the proofs developed for the verification of Add_failsafe are directly applicable to prove the lemmas and theorems in this section.

Observation 4.26: There exists no state in the initial fault-span from where the safety may be violated. Formally, Tinit ∩ ms = ∅

Theorem 4.27: Initially, the invariant is a subset of the fault-span. Formally, Sinit ⊆ Tinit

Theorem 4.28: The initial invariant is stronger than the invariant of the fault-intolerant program (the first condition of the transformation problem). Formally, Sinit ⊆ S

We remark that we do not state a theorem for the second condition of the transformation problem in the initialization part, because the algorithm does not modify the set of program transitions before the loop.

Properties of the loop invariant: We prove that the synthesized masking fault-tolerant program satisfies the transformation problem by stating and proving a series of theorems and intermediate lemmas. Note that, in the first iteration, S2 (respectively, T2) is equal to Sinit (respectively, Tinit). Hence, in the first iteration, the loop invariant is satisfied. Moreover, at the beginning of each iteration, S1 (respectively, T1) is assigned to S2 (respectively, T2).
Hence, we can assume that S2 and T2 satisfy the loop invariant at the beginning of each iteration.

So as to show the loop invariant, first, we show that if S2 and T2 (the intermediate invariant and fault-span) satisfy the loop invariant, then so do S1 and T1 (cf. Theorem 4.29). Then, we state and prove additional theorems about S1 and T1. The proofs of Theorems 4.29–4.32 are similar to the proofs of the corresponding theorems in the verification of failsafe. Hence, we omit these proofs. Note that we effectively reuse the proofs developed for the verification of Add_failsafe in the following theorems.

Theorem 4.29: ((T2 ∩ ms = ∅) ⇒ (T1 ∩ ms = ∅)) ∧ ((S2 ⊆ S) ⇒ (S1 ⊆ S))

Theorem 4.30: The invariant is always a subset of the fault-span. Formally, S1 ⊆ T1

Theorem 4.31: If the intermediate program and invariant satisfy the second condition of the transformation problem, so do the program and S1. Formally,
(p1 | S2 ⊆ p | S2) ⇒ (p1 | S1 ⊆ p | S1)

Theorem 4.32: The recalculated invariant contains no deadlock states. Formally, DeadlockStates(p1)(S1) = ∅

Theorem 4.33: The recalculated fault-span is closed in f. Formally, closed(T1, f)
Proof: The proof is similar to the proof of Lemma 4.8. We know: T1 = ConstructFaultSpan(...) = LgFix(...). Using Theorem 3.7, in the definition of LgFix, we instantiate X with TReach and g with TNClose to complete the proof. □

Remark. Note that Theorem 4.33 is another instance where the formalization of fixpoint calculation in Chapter 3 is used. More specifically, closed(T1, f) denotes whether the state predicate T1 is closed in f. We repeatedly remove the states from where the closure of T1 may be violated. Hence, once the fixpoint is reached, there are no such states.

Properties at the termination of the loop: As mentioned in Section 4.3.1, we formalize the termination condition in the verification phase; i.e., we prove that provided (S1 = S2) ∧ (T1 = T2) is true, p1 is failsafe and provides potential recovery from every state in the fault-span.
Theorem 4.34: Upon loop termination, the invariant is closed in the program. Formally,
(S1 = S2) ⇒ closed(S1, p1)
Proof: Based on the fact that S2 is closed in p1 by construction, when S1 = S2, S1 is closed in p1 as well. Thus, to formally prove the theorem, we replace S1 by S2 to complete the proof. □

Theorem 4.35: Any prefix of any computation of the masking fault-tolerant program in the presence of faults does not violate safety. Formally,
((S1 = S2) ∧ (T1 = T2)) ⇒ ∀j : (∀c : prefix(p1 ∪ f, j) | c0 ∈ T1 : (∀k | k < j : (ck, ck+1) ∉ spec))
Proof: The proof is similar to the proof of Theorem 4.11. □

Theorem 4.36: (T1 = T2) ⇒ closed(T1, p1 ∪ f)
Proof: Based on the fact that T2 is closed in p1 by construction, when T1 = T2, T1 is closed in p1 as well. From Theorem 4.33, we also know that T1 is closed in f. Thus, using Theorem 4.33 and by replacing T1 by T2, we complete the proof. □

Theorem 4.37: After termination of the loop, for any state in the fault-span, T1, there exists a computation of p1 that starts from that state and reaches the invariant, S1. Formally,
((S1 = S2) ∧ (T1 = T2)) ⇒ (∀s | s ∈ T1 : reachable(S1, T1, p1, s))
Proof: First, we apply Lemma 2.3 to show that there exists a suffix for every computation of p1 ∪ f that contains no transition in f. After replacing T1 and S1 by T2 and S2 in the consequent, we need to prove:
∀s | s ∈ T1 : reachable(S2, T2, p1, s).
After expanding the definitions of T1, ConstructFaultSpan, and LgFix respectively, we need to prove:
∀k : (s ∈ Dec(k, TReach)(TNClose)) ⇒ reachable(S2, T2, p1, s)
After instantiating k with 0, the GRIND strategy discharges the theorem. □

Theorem 4.38: For any synthesized masking program, there exists a fault-span. Formally,
(T1 = T2) ⇒ ∃ T : FaultSpan(T, S1, p1 ∪ f)
Proof: From Theorem 4.30, we know that S1 is a subset of T1. Also, from Theorem 4.36, we know that if T1 = T2 is true, then the fault-span is closed in the program in the presence of faults.
Thus, for mechanical verification, it suffices to, first, instantiate T with T1 and, then, apply theorems 4.30 and 4.36. □

4.4 Adding Fault-Tolerance in the Low Atomicity Model

The three algorithms described earlier in this chapter use the notion of program state at an abstract level. In practice, the state of a program is obtained by assigning each variable a value from its domain. In the context of synthesis of real-world programs, the variables of a program and the restrictions on reading and writing them matter. For example, if a program has two integer variables x and y, [x = 3, y = 10] identifies a state of this program. In this manner, a predicate over the states of a program forms a set of states, a so-called state predicate. For example, x ≥ 5 ∧ y < 20 is a state predicate.

To synthesize distributed programs, where the processes of a program run on a distributed system, one should consider the read-write restrictions on variables, as it is usually impossible to read or write all the program variables in an atomic step. There has been a lot of work on the synthesis of distributed programs in the literature, and we briefly discuss the issue of program synthesis in Chapter 6.

In [8], the authors prove that the problem of synthesizing a masking fault-tolerant program is NP-complete in the size of the state space and, hence, there is no polynomial-time algorithm to synthesize a masking tolerant program unless P = NP. Furthermore, the authors present a nondeterministic algorithm that guesses a solution, namely the fault-tolerant program p′, invariant S′, and fault-span T′. Then, the algorithm verifies whether p′ is fault-tolerant from S′ and whether T′ is a boundary up to which p′ can be perturbed in the presence of faults. In other words, the nondeterministic algorithm verifies whether the guessed solution satisfies the transformation problem.
In [19], Kulkarni and Ebnenasir show that the problem of adding failsafe fault-tolerance in the low atomicity model is also NP-complete in the size of the state space. However, identifying the complexity of adding nonmasking fault-tolerance in the low atomicity model is still an open problem. Formal verification of a nondeterministic algorithm is not a particularly challenging problem: once the solution is guessed, the algorithm itself verifies the solution. Therefore, we do not include the formal specification and verification of the synthesis algorithm in the low atomicity model in this thesis.

Chapter 5

Specification and Verification of Automatic Synthesis of Multitolerance

In Chapter 4, we presented the specification and verification of algorithms associated with the synthesis of fault-tolerant programs that are subject to a single class of faults. Real-world systems are often subject to multiple classes of faults and, hence, they need to provide an appropriate level of fault-tolerance to each class of faults. In other words, in synthesizing multitolerant programs, we are looking for the answers to the following questions: How can a program handle the occurrence of faults from different classes at the same time? And how would a synthesis algorithm provide a level of fault-tolerance for each class?

In this chapter, first, we introduce the concerns and definitions regarding multitolerance in Section 5.1. In Section 5.2, we formally state the problem of mechanical verification of automatic synthesis of multitolerant programs. In Section 5.3, first, we present how to modify the definitions in chapters 2 and 4, so that they become appropriate for modeling the synthesis of multitolerance. Then, we present the formal specification and verification of nonmasking-masking multitolerance. Finally, in Section 5.4, we present the formal specification and verification of failsafe-masking multitolerance.¹
The time complexity of the addition of failsafe-masking and nonmasking-masking multitolerance is polynomial. However, the problem of the addition of failsafe-nonmasking multitolerance is NP-complete and, hence, there exists no polynomial-time algorithm to solve this problem unless P = NP. In [10], the authors propose a nondeterministic algorithm, but we are not interested in the verification of that algorithm, as the algorithm verifies the guessed solution by itself.

Remark. We remark that in this chapter, we effectively reuse the specification and formal proofs of the framework and algorithms developed in chapters 2 and 4. This is one of the instances where our manual proof reusability shows hope for developing automated proofs (proof strategies) for future verifications of extensions of the algorithms in [8, 10].

¹Appendix F contains the formal specification of the algorithms for synthesizing multitolerant programs.

5.1 Faults, Fault-Tolerance, and Multitolerance

The notion of multitolerance was first introduced by Arora and Kulkarni [20]. The definitions of program, specification, faults, and fault-tolerance remain the same (cf. Chapter 2). Now, we define what it means for a program to be multitolerant. In Section 2.4, we gave the definition of different levels of fault-tolerance for a single class of faults. Now, we consider the case where faults from multiple classes, say f1 and f2, may occur in a given program computation. There exist several possible choices in deciding the level of fault-tolerance that should be provided in the presence of multiple fault-classes. In [10], Kulkarni and Ebnenasir propose to require that the fault-tolerance provided for the case where f1 and f2 occur simultaneously should be equal to the minimum level of fault-tolerance provided when either f1 or f2 occurs.
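This minimum-level rule can be encoded as a simple symmetric lookup. The Python encoding below is our own illustration (names are ours, not from [10]); the pairwise entries transcribe the combination table from [10]:

```python
# Minimum level of fault-tolerance guaranteed when faults from two classes
# occur simultaneously.  Using frozensets as keys makes the lookup
# symmetric in its two arguments.
MIN_LEVEL = {
    frozenset({"failsafe"}): "failsafe",
    frozenset({"nonmasking"}): "nonmasking",
    frozenset({"masking"}): "masking",
    frozenset({"failsafe", "nonmasking"}): "no-tolerance",
    frozenset({"failsafe", "masking"}): "failsafe",
    frozenset({"nonmasking", "masking"}): "nonmasking",
}

def combined_tolerance(level1, level2):
    """Level of tolerance provided when the two classes hit together."""
    return MIN_LEVEL[frozenset({level1, level2})]
```

For instance, combining masking with failsafe yields failsafe, whereas combining failsafe with nonmasking yields no tolerance at all.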
For example, if masking fault-tolerance is to be provided to f1 and failsafe fault-tolerance is to be provided to f2, then failsafe fault-tolerance should be provided for the case where f1 and f2 occur simultaneously. We reiterate the following table from [10], which illustrates the minimum level of fault-tolerance provided for different combinations of levels of fault-tolerance provided to individual classes of faults:

  Level of
  Fault-Tolerance   Failsafe        Nonmasking      Masking
  Failsafe          Failsafe        No-Tolerance    Failsafe
  Nonmasking        No-Tolerance    Nonmasking      Nonmasking
  Masking           Failsafe        Nonmasking      Masking

In order to simplify the modeling of different classes of faults, we consider the union of all the classes of faults for which failsafe (respectively, nonmasking and masking) fault-tolerance is to be provided, denoted f_failsafe (respectively, f_nonmasking and f_masking). Now, given a fault-intolerant program p, its invariant S, its specification spec, and sets of distinct classes of faults f_failsafe, f_nonmasking, and f_masking, we define what it means for a synthesized program p′, with invariant S′, to be multitolerant by considering how p′ behaves when (i) no faults occur; (ii) only one class of faults occurs; and (iii) multiple classes of faults occur. Kulkarni and Ebnenasir, in [10], define a multitolerant program as follows:

Definition. Program p′ is multitolerant to f_failsafe, f_nonmasking, and f_masking from S′ for spec iff the following conditions hold:
1. p′ satisfies spec from S′ in the absence of faults.
2. p′ is masking f_masking-tolerant from S′ for spec.
3. p′ is failsafe (f_failsafe ∪ f_masking)-tolerant from S′ for spec.
4. p′ is nonmasking (f_nonmasking ∪ f_masking)-tolerant from S′ for spec.

5.2 The Problem of Mechanical Verification of Automatic Synthesis of Multitolerance

In this section, we present the problem of mechanical verification of automatic synthesis of multitolerance.
Based on the definition of a multitolerant program in Section 5.1, Kulkarni and Ebnenasir identify the requirements of the problem of synthesizing a multitolerant program p′. The general idea of the algorithms for synthesizing multitolerance proposed in [10] is based on the synthesis of fault-tolerance for a single class of faults [8]. Similar to the algorithms in [8], the requirements are (1) the synthesis algorithms should not introduce any new behavior to the fault-intolerant program, and (2) the synthesis algorithms should simply add multitolerance. The formal definition of the transformation problem and soundness are as follows:

The Transformation Problem
Given p, S, spec, f_failsafe, f_nonmasking, and f_masking such that p satisfies spec from S. Identify p′ and S′ such that:
S′ ⊆ S
(p′ | S′) ⊆ (p | S′)
p′ is multitolerant to f_failsafe, f_nonmasking, and f_masking from S′ for spec

Soundness. An algorithm for the transformation problem is sound if and only if for any given input, its output, namely p′ and S′, solves the transformation problem.

In this chapter, our goal is to mechanically verify, using the PVS theorem prover, that the algorithms proposed in [10] for adding failsafe-masking and nonmasking-masking multitolerance are indeed sound. In other words, based on the definitions in this chapter, we show that the algorithms in [10] satisfy the transformation problem.

5.3 Specification and Verification of Synthesis of Nonmasking-Masking Multitolerance

In this section, we present the description, formal specification, and verification of the synthesis algorithm for the addition of nonmasking-masking multitolerance to fault-intolerant programs that are subject to two classes of faults, f_nonmasking and f_masking. First, we reiterate the algorithm Add_Nonmasking_Masking from [10] in Figure 5.1.
Add_Nonmasking_Masking(p: transitions, f_nonmasking, f_masking: fault,
                       S: state predicate, spec: safety specification)
{
  p1, S', T_masking := Add_Masking(p, f_masking, S, spec);
  if (S' = {}) declare no multitolerant program p' exists;
               return ∅, ∅;
  p', T' := Add_Nonmasking(p1, f_nonmasking ∪ f_masking, T_masking, spec);
  return p', S';
}

Figure 5.1: The synthesis algorithm for adding nonmasking-masking multitolerance

In order to formalize the algorithm Add_Nonmasking_Masking (and Add_Failsafe_Masking in Section 5.4), first, we define f_nonmasking_masking and f_failsafe_masking as follows:

  f_nonmasking_masking : Action = f_masking ∪ f_nonmasking
  f_failsafe_masking : Action = f_masking ∪ f_failsafe

Our formalization of the fault-tolerance framework developed in Chapter 2 is appropriate for the case where we deal with only one class of faults. However, in modeling the synthesis of multitolerance, as we have different classes of faults, we need to redefine some of the definitions so that they accept the type of fault as a parameter. For example, in Section 4.1.1, we modeled ReuReachStates as follows:

  ReuReachStates(rs : StatePred) : StatePred =
    {s0 | ∃ s1 : s1 ∈ rs ∧ (s0, s1) ∈ f ∧ s0 ∉ rs}

The problem with this definition is that it only represents the set of states that can reach rs by taking a transition in f, which is the single class of faults. To formalize the algorithms for synthesizing multitolerance, we need to redefine ReuReachStates so that it accepts different classes of faults. With this introduction, we redefine msInit, ReuReachStates, and ms as follows:

  msInit(anyFault : Action) : StatePred =
    {s0 | ∃ s1 : (s0, s1) ∈ anyFault ∧ (s0, s1) violates spec}

  ReuReachStates(anyFault : Action)(rs : StatePred) : StatePred =
    {s0 | ∃ s1 : s1 ∈ rs ∧ (s0, s1) ∈ anyFault ∧ s0 ∉ rs}

  ms(anyFault : Action) : StatePred =
    mFix(msInit(anyFault))(ReuReachStates(anyFault))

All the other definitions given in chapters 2 and 4 should also be converted in the same way, so that they are not restricted to only one class of faults.
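To make the fixpoint construction of ms concrete, the following Python sketch computes it on explicit finite sets, mirroring msInit, ReuReachStates, and the mFix iteration. This is an illustration only, not the PVS formalization: the function names and the encoding of the safety specification as a set `bad` of spec-violating transitions are our assumptions.

```python
def ms_init(any_fault, bad):
    """States from which a single transition in any_fault violates safety.
    bad is the set of transitions that violate spec (illustrative encoding)."""
    return {s0 for (s0, s1) in any_fault if (s0, s1) in bad}

def reu_reach_states(any_fault, rs):
    """States outside rs from which one transition in any_fault reaches rs."""
    return {s0 for (s0, s1) in any_fault if s1 in rs and s0 not in rs}

def ms(any_fault, bad):
    """Least fixpoint: all states from which a sequence of fault transitions
    may violate safety (a finite-set analogue of mFix in the PVS theory)."""
    result = ms_init(any_fault, bad)
    while True:
        new = reu_reach_states(any_fault, result)
        if not new:
            return result
        result |= new
```

For instance, with fault transitions {(1, 2), (2, 3)} where (2, 3) violates spec, the computation yields ms = {1, 2}: state 2 violates safety directly, and state 1 can reach state 2 by a fault step.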
We refer the reader to the appendices for the formal specification of the new definitions. Now, having the elements of the multitolerance framework, we can formalize the algorithm Add_Nonmasking_Masking. Note that most of the definitions we developed in chapters 2 and 4 now appear with a parameter for the type of faults. We proceed as follows: Using the algorithm Add_masking, we synthesize a masking f_masking-tolerant program p1, with invariant S' and fault-span T_masking:

  S' : StatePred = add_masking.S1(f_masking)
  T_masking : StatePred = add_masking.T1(f_masking)
  p1 : Action = add_masking.p1(f_masking)

We note that add_masking is the name of the theory developed in Chapter 4 for the algorithm Add_masking in PVS. Since program p1 is masking f_masking-tolerant, it provides recovery to its invariant S' from any state in (T_masking − S'), while preserving safety. Thus, in the presence of f_nonmasking_masking, if p1 is perturbed to (T_masking − S'), then p1 satisfies the requirements of nonmasking fault-tolerance. However, if f_nonmasking_masking perturbs p1 to a state that is not in T_masking, then recovery should be added from those states. Towards this end, the algorithm ensures that one-step recovery is possible. Thus, using the algorithm Add_nonmasking, we add one-step recovery from states that are not in T_masking to the states in T_masking:

  p' : Action = add_nonmasking.p1(T_masking, p1(f_masking))
  T' = T_masking

As mentioned in Section 4.2.1, the addition of one-step recovery is independent of the type of faults. Hence, to simplify the verification of Add_Nonmasking_Masking, we formalize the multitolerant program by p1(f_masking) and not p1(f_nonmasking_masking). In order to verify the soundness of Add_Nonmasking_Masking, we prove a chain of lemmas and theorems.
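The one-step recovery step described above can be sketched on finite sets as follows. This is a simplified illustration under our own assumptions, not the formalized Add_nonmasking: states are explicit values, and every state of the fault-span is offered as a recovery target, whereas an implementation may pick any one of them.

```python
def add_one_step_recovery(p1, state_space, t_masking):
    """From every state outside the fault-span t_masking, add transitions
    directly into t_masking (one-step recovery), leaving the program
    unchanged elsewhere."""
    recovery = {(s, t) for s in state_space - t_masking for t in t_masking}
    return p1 | recovery
```

With this sketch, every state outside t_masking gains at least one outgoing transition into t_masking, which is exactly the one-step recovery requirement stated above.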
Note that, as the fault-spans of the nonmasking-masking multitolerant program and the masking fault-tolerant program are equal, we do not restate and reprove the lemmas and theorems that verify the safety properties of the fault-span in the presence of only f_masking. Note also that, in most of the following theorems, we assume that the termination condition of the repeat-until loop in Add_masking is satisfied.

Theorem 5.1: The first condition of the transformation problem (cf. Section 5.2) holds. Formally,
  (S2 ⊆ S) ⇒ (S' ⊆ S)
Proof. The proof is the same as the proof of Theorem 4.30. □

Theorem 5.2: The second condition of the transformation problem (cf. Section 5.2) holds. Formally,
  (p' | S2 ⊆ p | S2) ⇒ (p' | S' ⊆ p | S')
Proof. The proof is the same as the proof of Theorem 4.31. □

Theorem 5.3: In the presence of f_nonmasking_masking faults, any computation of the nonmasking-masking multitolerant program that starts from any state in the state space eventually recovers to the invariant. Formally,
  ∀c : c is a computation of (p' ∪ f_nonmasking_masking) : (∃ j : j > 0 : c_j ∈ S')
Proof. The proof is the same as the proof of Theorem 4.24. □

Theorem 5.4: The invariant of the nonmasking-masking multitolerant program is a subset of the fault-span. Formally,
  S' ⊆ T'

Theorem 5.5: (add_masking.T1(f_masking) = T2) ⇒ closed(T', p' ∪ f_masking)

Theorem 5.6: Upon the loop termination, the invariant is closed in the program. Formally,
  (add_masking.S1(f_masking) = S2) ⇒ closed(S', p')

Theorem 5.7: For any synthesized nonmasking-masking program, there exists a fault-span. Formally,
  ∃ T : FaultSpan(T, S', p' ∪ f_nonmasking_masking)

Theorem 5.8: After termination of the loop, for any state in the fault-span, T', there exists a computation of p' that starts from a state in T' − S' and reaches the invariant, S'. Formally,
  ((S1(f_masking) = S2) ∧ (T1(f_masking) = T2)) ⇒ ∀s : s ∈ T' : reachable(S', T', p', s)

Proof. The proofs of theorems 5.4 to 5.8 are the same as the proofs of the corresponding theorems for the algorithm add_masking in Section 4.3.2.
□

5.4 Specification and Verification of Synthesis of Failsafe-Masking Multitolerance

In this section, we present the description, formal specification, and verification of the synthesis algorithm for the addition of failsafe-masking multitolerance to fault-intolerant programs that are subject to two classes of faults, f_failsafe and f_masking. First, we reiterate the algorithm Add_Failsafe_Masking from [10] in Figure 5.2.

Add_Failsafe_Masking(p: transitions, f_failsafe, f_masking: fault,
                     S: state predicate, spec: safety specification)
{
  ms := {s0 : ∃ s1, ..., sn : (∀ i : 0 ≤ i < n : (si, s(i+1)) ∈ (f_failsafe ∪ f_masking)) ∧
                              (s(n−1), sn) violates spec};
  mt := {(s0, s1) : (s1 ∈ ms) ∨ (s0, s1) violates spec};
  p1, S', T_masking := Add_Masking(p − mt, f_masking, S − ms, mt);
  if (S' = {}) declare no multitolerant program p' exists;
               return ∅, ∅;
  p', T' := Add_Failsafe(p1, f_failsafe ∪ f_masking, T_masking − ms, mt);
  return p', S';
}

Figure 5.2: The synthesis algorithm for adding failsafe-masking multitolerance.

The essence of the algorithm is as follows: first, it identifies the fault-span such that no computation of the multitolerant program, p', violates safety in the presence of f_failsafe_masking. Towards this end, the algorithm identifies the states from which safety may be violated when faults in f_failsafe_masking occur:

  ms : StatePred = ms(f_failsafe_masking)
  mt : Action = mt(f_failsafe_masking)

Then, the algorithm ensures that the multitolerant program can recover to its invariant, S', when the state of the program is perturbed by f_masking:

  T_masking : StatePred = T1(f_masking)
  p1 : Action = add_masking.p1(f_masking)
  S' : StatePred = add_masking.S1(f_masking)

Finally, if faults in f_failsafe_masking perturb the state of the program to a state s, where s ∈ T_masking, then the algorithm should ensure that safety is maintained.
Towards this end, we add failsafe (f_failsafe_masking)-tolerance to p1 from (T_masking − ms):

  T' : StatePred = ConstructInvariant(T_masking − ms, p1 − mt)
  p' : Action = p1 − mt − {(s0, s1) | ((s0, s1) ∈ (p1 − mt)) ∧ (s0 ∈ T') ∧ (s1 ∉ T')}

In order to verify the soundness of Add_Failsafe_Masking, we prove the following lemmas and theorems:

Theorem 5.10: The first condition of the transformation problem (cf. Section 5.2) holds. Formally,
  (S2 ⊆ S) ⇒ (S' ⊆ S)

Theorem 5.11: The second condition of the transformation problem (cf. Section 5.2) holds. Formally,
  (p' | S2 ⊆ p | S2) ⇒ (p' | S' ⊆ p | S')

Theorem 5.12: Upon the loop termination, the invariant is closed in the program. Formally,
  (S' = S2) ⇒ closed(S', p')

Lemma 5.13: In the presence of faults, no computation prefix of the failsafe-masking multitolerant program that starts from a state in S' reaches a state in ms. Formally,
  (T2 ∩ ms = ∅) ⇒
    ∀j : (∀c : c is a prefix of a computation of (p' ∪ f_failsafe_masking) of length j :
      (c_0 ∈ S') ⇒ (∀k : k ≤ j : c_k ∉ ms))

Theorem 5.14: In the presence of faults, no computation prefix of the failsafe-masking multitolerant program that starts from a state in S' violates safety. Formally,
  ∀j : (∀c : c is a prefix of a computation of (p' ∪ f_failsafe_masking) of length j :
    (c_0 ∈ S') ⇒ (∀k : k < j : (c_k, c_(k+1)) does not violate spec))

Lemma 5.15: All computations of p1 − mt that start from a state in T' are infinite. Formally,
  DeadlockStates(p1 − mt)(T') = ∅

Theorem 5.16: All computations of p' that start from a state in T' are infinite. Formally,
  DeadlockStates(p')(T') = ∅

Theorem 5.17: After termination of the loop, for any state in the fault-span, T', there exists a computation of p1 that starts from a state in T' − S' and reaches the invariant, S'. Formally,
  ((S1 = S2) ∧ (T1(f_masking) = T2)) ⇒ ∀s : s ∈ T' : reachable(S', T', p', s)

Theorem 5.18: For any synthesized failsafe-masking program, there exists a fault-span. Formally,
  ∃ T : FaultSpan(T, S', p' ∪ f_failsafe_masking)

Proof. The formal proofs of lemmas and theorems 5.10 to 5.18 are similar to the corresponding lemmas and theorems in sections 4.1.2 and 4.3.2.
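The construction of T' and p' above, and the deadlock-freedom properties of Lemma 5.15 and Theorem 5.16, can be illustrated on explicit finite sets with the following Python sketch. Reading ConstructInvariant as iterated deadlock elimination is our interpretation of the algorithm in [8]; the function names are illustrative, not the PVS identifiers.

```python
def deadlock_states(p, t):
    """States of t with no outgoing transition of p that stays in t."""
    return {s for s in t if not any(s0 == s and s1 in t for (s0, s1) in p)}

def construct_invariant(t, p):
    """Largest subset of t in which every state has an outgoing p-transition
    within the subset, obtained by repeatedly removing deadlock states."""
    t = set(t)
    while True:
        dead = deadlock_states(p, t)
        if not dead:
            return t
        t -= dead

def remove_escaping_transitions(p, t_prime):
    """The final pruning step in the definition of p': drop transitions that
    start in t_prime but leave it."""
    return {(s0, s1) for (s0, s1) in p
            if not (s0 in t_prime and s1 not in t_prime)}
```

By construction, construct_invariant returns a set with no deadlock states, which is the finite-set reading of Lemma 5.15; remove_escaping_transitions cannot create new deadlocks inside that set, matching Theorem 5.16.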
There are minor differences, such as the symbols and the occasions on which we expand definitions. For instance, in Section 4.1.2, we proved that no computation that starts from a state in the invariant of the failsafe fault-tolerant program, S', violates safety. In Lemma 5.13 and Theorem 5.14, we prove the same thing, but for a different state predicate, the fault-span T', and a different set of faults, f_failsafe_masking.

Chapter 6

Related Work

In this chapter, we focus on previous work on verification of fault-tolerance properties of programs, circuits, and architectures. In Section 6.1, we briefly present the challenges in the problem of program synthesis and transformation. In Section 6.2, we introduce previous work on formal verification of fault-tolerant architectures, processors, and circuits. Then, in Section 6.3, we present related work on mechanical verification of fault-tolerance properties of programs, algorithms, and software-related systems.

Throughout this thesis, our approach for verifying the correctness of the algorithms in [8, 10] is theorem proving. Model checking has also been shown to be an effective tool in verifying the behavior of fault-tolerant finite-state systems [21-26]. Model checking is widely used in verifying properties of domain-specific fault-tolerant controllers and embedded systems. This approach is not closely related to our approach to verification; hence, we do not go into the details of this context. In Chapter 7, we argue why we chose theorem proving techniques to address the problem of verification of the synthesis algorithms.

6.1 Challenges in Automatic Synthesis of Programs

In this section, we present a review of related work on the problem of automatic synthesis of reactive systems prior to [8]. As mentioned in Chapter 1, automatic synthesis of programs is an algorithmic approach for generating a new program from its specification or from another program.
This problem has been well studied through different approaches. In [27, 28], the authors present a survey on the problem of automatic synthesis of reactive systems. More specifically, in [27], three main approaches are presented. We now briefly introduce these approaches:

Model-Theoretic Approach

This approach is based on the extraction of a finite model from a given temporal logic1 specification. A temporal logic specification consists of safety and liveness specifications in terms of temporal operators. The model-theoretic approach mostly addresses synthesis of closed reactive systems, where there is no interaction between the system and its environment. Several different algorithms have been presented in the literature for synthesis of reactive systems from their temporal logic specifications. The initial work was presented by Emerson and Clarke [30]. They present an algorithm for synthesis of reactive programs from CTL (Computational Tree Logic) specifications. In their formalization, the safety specification is represented by invariance formulas; i.e., for all paths of execution, nothing bad happens. They represent the liveness properties of a program by eventually formulas; i.e., for all paths of execution, the eventually formulas hold. The synthesis method is based on a decision procedure that verifies the satisfiability of a CTL temporal logic specification.

1We refer the reader to [29] for a survey on temporal logics.

Manna and Wolper [31, 32] present a similar algorithm for synthesizing distributed processes that communicate by message passing. In their approach, the program specification is given as a PLTL (Propositional Linear Temporal Logic) formula. In [33, 34], Attie and Emerson present algorithmic methods for synthesis of distributed and concurrent programs from CTL* specifications. In a more recent work, Arora, Attie, and Emerson [35] present an algorithm for synthesizing fault-tolerant programs from CTL specifications.
Their algorithm consists of two phases. The first phase essentially performs the synthesis algorithm by Emerson and Clarke. In the second phase, they generate a fault-tolerant version of the synthesized program.

Automata-Theoretic Approach

As mentioned before, model-theoretic approaches address synthesis of closed reactive systems. By contrast, automata-theoretic approaches address synthesis of open reactive systems, where a program interacts with its environment. In the automata-theoretic approach, the input of the synthesis algorithm is a tree automaton [36] that represents the specification of the system. The synthesis of an open reactive system involves converting its temporal logic specification to a tree automaton and, then, checking non-emptiness of the tree automaton; i.e., checking whether the language accepted by the tree automaton is empty or not [37-40]. In [37, 38], Pnueli and Rosner show that the synthesis algorithm that generates a deterministic finite automaton satisfying an LTL specification has a doubly exponential complexity in the size of the specification. In [39], Kupferman and Vardi show that the problem of synthesizing an open reactive distributed system is undecidable if we make no assumptions about the system architecture.

Calculational Approach

In many cases, it is desirable to add a property to a program rather than synthesize a new program from a new specification. This desire motivated researchers to design algorithms that can add a particular property to a program while preserving its old properties. In other words, in this approach we would like to reuse the program and its invariant to calculate a new program and its new invariant. In [41], Arora presents a new foundation for fault-tolerance. Using his formal framework, one can represent any kind of program, fault, specification, and fault-tolerance. In Chapter 2, we developed the formal specification of our framework based on Arora's definitions in [41].
We described the synthesis algorithms developed by Kulkarni and Arora [8] in detail in Chapter 4. We also described the algorithms for automatic synthesis of multitolerance [10] in detail in Chapter 5. The authors of both [8] and [10] use a calculational approach to generate the new program. As mentioned before, in their method the synthesized program is correct by construction, and there is no need to verify individual programs. Furthermore, since their synthesis algorithms begin with an existing fault-intolerant program, the derived fault-tolerant program reuses the fault-intolerant version. Another advantage of their approach is that, given the set of bad transitions, the program, and its invariant, the algorithm can synthesize the fault-tolerant program. In other words, synthesis is possible without having the entire specification.

The issue of complexity is a serious barrier to implementing the algorithms in [30, 32-35, 37-40]. In contrast, the time complexity of the algorithms in [8, 10] shows promise for practical implementation. More specifically, the time complexity of the addition of fault-tolerance for a single class of faults in the high atomicity model is polynomial, and the addition of failsafe and masking fault-tolerance in the low atomicity model is NP-complete in the size of the state space [8, 19]. The problem of the addition of multitolerance, in general, is NP-complete. However, the time complexities of the additions of failsafe-masking and nonmasking-masking multitolerance are still polynomial. Moreover, in [19], Kulkarni and Ebnenasir identify the polynomial-time boundary of the addition of failsafe fault-tolerance in the low atomicity model. In [11], Kulkarni and Ebnenasir focus on the problem of enhancing the fault-tolerance of a nonmasking fault-tolerant program to masking.

To implement the synthesis algorithms, Kulkarni, Arora, and Chippada [42] develop a set of heuristics to synthesize a canonical version of the Byzantine agreement problem.
Based on these heuristics, Ebnenasir and Kulkarni [12] have developed a tool for automatic synthesis of fault-tolerant programs. We refer the reader to [27] for further investigation of the described approaches and a detailed comparison.

6.2 Formal Verification of Fault-Tolerant Architectures

Hardware vendors are, perhaps, the largest community that uses formal methods to model and verify their products. Researchers in this community focus on processes that show equivalence relations between a design and a specification. Simulation helps to gain more confidence, but logical correctness is a more reliable way to verify hardware. In this research area, proving the properties of computer instruction sets is perhaps the most established application of formal methods. Fault-tolerance is usually a crucial part of safety-critical embedded systems. Thus, verification of fault-tolerance properties of hardware in embedded systems is well studied, and an enormous number of papers and theses have been published in this area of research. In this section, we present previous work on formal verification of fault-tolerant architectures, circuits, and processors.

Formal verification of fault-tolerant circuits. Kljaich, Smith, and Wojcik [43] focus on verification of the logical correctness of fault-tolerance properties of digital systems. More specifically, they use theorem proving together with an extended Petri net representation to validate fault-tolerance. A normal Petri net is an abstract machine that rests between Turing machines and finite state machines in terms of decision power and modeling. Execution of a Petri net models nondeterministic dynamic behavior and is represented by firing transitions. The extended Petri net theory in [43] avoids the nondeterministic behavior and gives functional capability to the firing of transitions. The authors first give an example of a typical 2-bit ALU to show the modeling capability of their extension to the Petri net model.
Then, based on their design for the 2-bit ALU, they present a fault-tolerant processor (FTP) that tolerates one or more failures. Fault-tolerance is achieved using redundant modules. Using automated reasoning implemented in PROLOG, the authors show, for possible arrangements of instruction sets, how many redundant modules are required to meet the fault-tolerance specification of the CPU.

Formal verification of synchronized fault-tolerant circuits. Verification of fault-tolerant clock synchronization circuits has been a challenging problem in the literature of hardware verification. Miner and Johnson [44] present an optimization of their previous work on the formal development of a fault-tolerant synchronization circuit. The authors argue that, in previous work, researchers in the formal methods community had focused on how to verify hardware using particular verification systems, and not on the type of reasoning support that is needed. They represent circuits as systems of stream equations, where each stream corresponds to a signal within the circuit. The signals are annotated with invariants, which can be established using proof by co-induction. Miner and Johnson exploit the invariants to verify that their formally designed circuit satisfies the localized design refinements. More specifically, first, they introduce a design hierarchy with different levels of abstraction. Then, they use PVS as their verification tool to prove the properties of their design.

NASA SPIDER project. NASA's most recent project on fault-tolerant architectures is the formally-verified fault-tolerant architecture, Scalable Processor-Independent Design for Electromagnetic Resilience (SPIDER) [45]. SPIDER is a family of fault-tolerant, reconfigurable architectures that provide mechanisms for integrating inter-dependent applications of differing criticalities. SPIDER comes with three protocols: the Consistency Protocol, the Diagnosis Protocol, and the Synchronization Protocol.
The first two protocols provide a fault-tolerant broadcast communication architecture. More specifically, the Consistency Protocol takes care of reliable message broadcast in the presence of malicious faults, using the Byzantine Agreement problem. The Diagnosis Protocol distributes local information about the health status of nodes across the network. In [46], Miner and Geser present mechanical verification of the first two protocols. They present formal proofs in PVS that, given the dynamic maximum fault assumption and a sane health status classification, the Consistency Protocol achieves agreement and the Diagnosis Protocol provides a sane classification of faulty nodes.

Abstractions for hardware verification. Melham [47] proposes four abstractions for formalizing and verifying hardware. He argues that such abstractions decrease the complexity and size of specification and verification. His argument is based on the fact that specifying hardware in detail and proving that the specification satisfies the requirements through formula equivalence undermines the design hierarchy. Melham suggests that a satisfaction relation based on the idea of abstraction, rather than equivalence, is the key to making large designs tractable. He introduces four abstractions:

1. Structural abstraction: the suppression of information about a device's internal structure. The idea of structural abstraction is that the specification of a device should not reflect its internal construction, but its externally observable behavior.
2. Behavioral abstraction concerns specifications that only partially define a device's behavior.
3. Data abstraction is well known from programming language theory.
4. Temporal abstraction views a device by its sequential, time-dependent behavior.

Based on [47], fault-tolerance properties of hardware can be specified at the structural and behavioral levels of abstraction. The examples in [47] present very top-level properties of hardware designs that can be applied to the specification of many systems.

6.3 Mechanical Verification of Fault-Tolerance Properties of Programs

In this section, we present the related work on formal verification of programs and algorithms that provide fault-tolerance for software systems. The related work in this section has a closer theme to our work in this thesis. In Section 6.3.1, we present related work on formal verification of fault-tolerant embedded and real-time digital systems. In Section 6.3.2, we introduce previous work on mechanical verification of algorithms that provide fault-tolerance for mutual exclusion and message passing problems. Finally, in Section 6.3.3, we present recent research work on defining abstractions for fault-tolerant computing regardless of algorithm, protocol, and hardware platform.

6.3.1 Formal Verification of Fault-Tolerant Embedded Digital Systems

Embedded systems mostly serve in digital controllers. In many cases, such controllers are in charge of monitoring safety-critical systems that need the greatest assurance. For example, catastrophic failure of digital flight-control systems for passenger aircraft must be "extremely improbable" [3]. Many researchers have focused on verification of fault-tolerant digital flight control systems in the past two decades, and there have been great advances in specification and verification techniques for such systems in the literature [3, 48-52]. Clock synchronization and message passing are two major concerns in such systems. Components across distributed digital embedded systems miss clock signals due to transient and Byzantine faults. These faults can easily lead the entire system to an unsynchronized and out-of-order state. Hence, verification of fault-tolerant clock synchronization has received great attention from researchers and verification engineers. Message passing is another serious issue in embedded systems.
Hence, fault-tolerant message passing and message broadcast protocols are also well studied [3, 46]. Verification tools vary from early theorem provers, such as the Boyer-Moore theorem prover and EHDM, to modern ones, such as PVS, HOL, and ACL2. However, most attempts address formal proof techniques for particular fault-tolerant protocols or algorithms.

Although verification of fault-tolerant embedded systems relies on specific characteristics of different systems, there have been attempts to abstract the concepts that are common among such systems. For instance, in [49], Rushby introduces an abstraction for mechanical verification of fault-tolerant time-triggered algorithms. His approach provides a methodology that can ease the formal specification and assurance of critical fault-tolerant systems. Mantel and Gartner [4] argue that modular reasoning about fault-tolerant systems can simplify the verification process in practice. They exercise their idea on the problem of reliable message broadcast in point-to-point networks in the presence of failures of processes.

6.3.2 Formal Verification of Fault-Tolerant Algorithms

Dijkstra's self-stabilizing token ring algorithm. A self-stabilizing algorithm is one that guarantees that the system behavior eventually recovers to a safe subset of states regardless of the initial state. Obviously, self-stabilization is a type of fault-tolerance. In [6], Dijkstra introduced his well-known self-stabilizing mutual exclusion algorithm (also known as the token ring algorithm) in 1974. At the time, he assumed the algorithm was trivially correct and left the proof as an exercise for the reader. However, 12 years later, Dijkstra published a belated proof of correctness of the algorithm in [53], which was not quite trivial! In the algorithm, there are N processes, N > 1, numbered 0 to N − 1, arranged in a unidirectional ring, where each process can observe its own state and that of its predecessor.
Each process i maintains a counter v(i) with values from 0 to M − 1, where N ≤ M. Process 0 is a distinguished process that is enabled when v(0) = v(N − 1); when enabled, it can make a transition by incrementing its counter modulo M. A nonzero process i is enabled when v(i) ≠ v(i − 1); when enabled, it can update its counter value so that v(i) = v(i − 1). It can be shown that, without a centralized controller, the system self-stabilizes from an arbitrary initial state to a stable behavior in which exactly one process is enabled at any state. This algorithm is useful as a solution to the distributed mutual exclusion problem; i.e., the enabled process (in other words, the one that has the token) can enter its critical section. Qadeer and Shankar [5] exercise mechanical verification of the self-stabilization property of Dijkstra's algorithm using PVS. First, they formalize the algorithm using the PVS specification language. Then, based on Dijkstra's informal proof sketch, they present how to mechanically verify the self-stabilization property of the algorithm. However, they do not present any sort of abstraction or general design. Hence, their formal proof cannot be easily applied elsewhere.

6.3.3 Formal Verification of Abstractions for Fault-Tolerance

In the previous sections of this chapter, we presented related work on mechanical verification of specific fault-tolerant circuits, protocols, algorithms, digital embedded systems, and microprocessors. In this section, we introduce the related work that
More specifically, they introduce (i) Message Abstractions to ad- dress the correctness of individual messages sent and received; (ii) Fault Abstractions to address the kinds of faults possible as well as their effect in the system; (iii) Fault- Masking Abstractions to address the kinds of local computations processes make to mask faults, and (iv) Communication Abstractions to address the kinds of data com- municated and the properties required for communication to succeed in the presence of faults. The authors formalize the abstractions in PVS. Abstractions for modeling faults, programs, and specifications. As men- tioned in Section 6.2, in [47], Melham introduces four abstraction for hardware veri- fication. Likewise, we need to introduce different levels of abstractions to make the specification and verification of fault-tolerant systems less complex. Moreover, using such abstractions, it is more likely that we can reuse formal proofs and develop proof strategies. In [7, 55, 56], the authors introduce abstractions for modeling faults, pro- grams, specifications, and component-based design of fault-tolerant programs. In [7], Kulkarni, Rushby, and Shankar present a case-study in component-based mechani- cal verification of Dijkstra’s self-stabilizing mutual exclusion algorithm. The authors demonstrate that decomposition of a fault-tolerant program to its components is use- ful to make the verification less complex and develop reusable formal proofs. More specifically, they decompose a fault-tolerant program to a fault-intolerant program and fault-tolerance components that are detectors and correctors. This decomposi- tion facilitates the verification of a given property by focusing on the component that is responsible for satisfying it. In this sense, verification of the entire fault-tolerant 84 program reduces to verification of the component that is in charge of satisfying a certain property. 
As an exercise, the authors first implement the theory of detectors and correctors [2] in PVS. Then, they verify the correctness of Dijkstra's token ring algorithm using the mentioned theory. Kulkarni et al. verify the same algorithm whose correctness Qadeer and Shankar prove in [5], but their approach (decomposition of the program into its components) is more general and abstract, in the sense that their formal proof techniques are reusable for verifying other fault-tolerant programs. Briefly, in [7], the authors verify three major concerns. First, they formally prove that, in the absence of faults, the fault-tolerant program has the same behavior as the fault-intolerant program. Then, they verify the corrector; i.e., if faults perturb the state of the program, then, at the resulting state, recovery to an ideal state of the fault-intolerant program is possible. Finally, they verify that there is no interference between the fault-intolerant program and the corrector.

Chapter 7

Discussion

In this chapter, we discuss different aspects of this thesis and try to answer the questions that have been raised about the verification of the synthesis algorithms proposed in [8, 10].

Theorem proving vs. model checking. The first issue that we would like to address is why we used theorem proving techniques to verify the synthesis algorithms. Model checking is also widely used to verify the fault-tolerance properties of many safety-critical embedded systems. However, model checking can only be used for systems that have a finite and known size of state space. In our case, although the state spaces of the programs that the synthesis algorithms deal with are finite, the size of the state space may vary. Therefore, model checking does not seem to be the right technique for verification of the synthesis algorithms.

The synthesis method. In Chapter 6, we presented related work on challenges in the problem of program synthesis and mechanical verification of fault-tolerant systems.
In [30, 32, 33, 35], the authors propose algorithms that synthesize a program from its temporal logic specification. In work prior to [8], the input to synthesis algorithms is either an automaton or a temporal logic specification, and any modification in the specification requires synthesizing the new program from scratch. By contrast, the algorithms in [8, 10] reuse the fault-intolerant program to synthesize the fault-tolerant version. This reusability helps to improve the time complexity to some extent. Thus, the algorithms proposed in [8, 10] seem to be suitable candidates for practical implementation. As an extension of the algorithms in [8], in [42], the authors introduce a set of heuristics for synthesizing distributed fault-tolerant programs in polynomial time. Based on these heuristics, Ebnenasir and Kulkarni have developed a tool for synthesizing fault-tolerant programs [12]. Therefore, by formally verifying the algorithms in [8, 10], we gain more confidence in their practical implementations as well. Advantages of mechanical verification of algorithms for the synthesis of fault-tolerant programs. Fault-tolerant systems are often in need of strong assurance. Mechanical verification is a reliable way to ensure that the fault-tolerance requirements of a system are met. We find that verification of algorithms for the synthesis of fault-tolerance is a systematic and abstract way to formally verify fault-tolerance. High level of abstraction. The algorithms presented in [8, 10] make no assumptions about the system, except that it has a finite state space. This high level of abstraction enables the algorithms to synthesize both finite-state hardware and software systems. Our focus on formal verification of such abstract algorithms makes it possible to extend our work to verify the algorithms in [11, 42], which are extensions of the algorithms in [8], for any system regardless of platform and architecture.
In addition, having developed the formal specification and verification in this thesis, we can easily verify extensions of the algorithms by reusing that specification. Correctness of synthesized programs. Another advantage of verifying a synthesis algorithm rather than individual fault-tolerant programs is the guarantee that any program synthesized by the algorithm is correct by construction. Thus, we are not required to verify individual synthesized programs. Reusability of formal proofs. Although most of the related work on formal verification of fault-tolerance provides confidence in the correctness of its concerns, reusing the formal proof of one in the verification of others is not quite convenient. Manual reuse of formal proofs is the first step toward developing proof strategies. As an illustration, in Section 4.3.2, we showed how we manually reused the formal proofs of Add_failsafe to verify the soundness of Add_masking. The issue of completeness. A synthesis algorithm is complete iff for any given program p with invariant S, if there exists a solution p' with invariant S' that satisfies the transformation problem, then the algorithm always finds program p' and state predicate S'. In Section 4.1.2, we have shown that the algorithm for adding failsafe fault-tolerance is complete. However, in this thesis, we did not devote a great deal of attention to verifying the completeness of the algorithms, for two reasons. First, in order to show the correctness of an algorithm, we are mostly interested in verifying its soundness. In our context, we need to prove that the program synthesized by the algorithm is indeed fault-tolerant; i.e., the synthesized program satisfies the transformation problem.
Second, in the low atomicity model, proving the completeness of a deterministic polynomial-time synthesis algorithm is irrelevant, in the sense that the problem of adding fault-tolerance to distributed programs is known to be NP-complete [8, 19]. In other words, we have to apply heuristics [11, 42] to synthesize distributed fault-tolerant programs in polynomial time. As a result of applying such heuristics, we lose the completeness of the synthesis algorithms; i.e., if the heuristics are not applicable, then the synthesis algorithm fails to synthesize a fault-tolerant distributed program although there may exist a fault-tolerant program that satisfies the requirements of the transformation problem. Lessons learned. The formalization and verification of the synthesis algorithms and the fixpoint theory presented in Chapters 2 to 5 were not our first attempts. In fact, we tried several other ways to model the problem, but they were either not abstract enough or too complicated to handle. We summarize the lessons we learned as follows:

• Sometimes it is very tempting to attempt mechanical proofs of easy-looking theorems before actually verifying them manually. This temptation may lead the developer to spend a lot of time trying to prove an incorrect theorem. Basically, if you cannot do a proof by hand, you cannot do it automatically.

• In many cases a failed proof teaches more than a successful proof, because the developer gains a better understanding of the problem and the formalization through a failed proof.

• Improper formal specification usually leads us to state wrong or difficult theorems. For instance, our first attempt to formalize the largest fixpoint calculation was based on an inductive definition on the finite set itself rather than on the steps of the fixpoint calculation.
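To make the step-indexed formulation concrete, the two styles can be contrasted in a small executable sketch in Python (an illustration only; the function names and the toy transition relation are ours, not part of the PVS development). The iteration removes deadlock states and is indexed by the step count, the quantity on which induction is easy:

```python
# Sketch: the largest fixpoint that removes deadlock states from a state
# predicate, written as iteration on the step count i rather than as
# recursion on the (shrinking) finite set itself.

def deadlock_states(transitions, pred):
    """States in pred with no outgoing transition that stays in pred."""
    return {s for s in pred
            if not any((s, t) in transitions and t in pred for t in pred)}

def mu_dec(i, pred, transitions):
    """i-th step of the decreasing iteration: remove deadlocks i times."""
    for _ in range(i):
        pred = pred - deadlock_states(transitions, pred)
    return pred

def mu_fix(pred, transitions):
    """Largest fixpoint: since pred is finite and each productive step
    removes at least one state, |pred| steps always suffice."""
    return mu_dec(len(pred), pred, transitions)

# Toy example: state 2 has no successor in pred, and removing 2 makes
# state 3 deadlocked in turn; states 0 and 1 survive.
pred = {0, 1, 2, 3}
transitions = {(0, 1), (1, 0), (3, 2)}
assert mu_fix(pred, transitions) == {0, 1}
```

Induction over the integer index `i` mirrors the proof structure that worked in PVS, whereas recursion on the set itself forces the finite-set induction schemes that proved inconvenient.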
Here is our first attempt to model the largest fixpoint calculation for removing deadlock states from a given state predicate X:

  Dec(X: StatePred): RECURSIVE StatePred =
    LET ds = DeadlockStates(X) IN
      IF ds = {} THEN X ELSE Dec(X - ds) ENDIF
    MEASURE |X|

This formalization is probably the simplest way to model a recursive function on a finite set, but using the induction schemes for finite sets in PVS is not quite convenient, and proofs of very simple theorems turn into huge proof trees. On the other hand, by defining the recursive function based on the steps of the fixpoint calculation rather than on the finite set itself, and using induction on the number of steps, which is an integer, theorems become much easier to prove. Hence, a simple formalization does not necessarily lead the developer to easier proofs.

• The development of the fixpoint theory was inspired by several implementations of specific fixpoint calculations. In fact, we did not develop the fixpoint theory before developing the synthesis algorithms! In many cases, several implementations of special cases of a problem lead the developer to develop a general theory once and instantiate it to verify the special cases.

Chapter 8

Conclusion and Future Work

In this thesis, we focused on the problem of verifying, using PVS, transformation algorithms that generate fault-tolerant programs that are correct by construction. We considered programs that are subject to a single class or multiple classes of faults. For the case that a program is subject to a single class of faults, we considered three levels of fault-tolerance: failsafe, nonmasking, and masking. We verified the soundness and completeness of the synthesis algorithms for adding failsafe and nonmasking fault-tolerance. For the addition of masking fault-tolerance, we only considered verification of the soundness.
For the synthesis algorithms associated with the addition of multitolerance, we verified the soundness of the addition of failsafe-masking and nonmasking-masking multitolerance. The theory for fixpoint calculations on finite sets. The essence of adding failsafe and masking fault-tolerance, as well as failsafe-masking and nonmasking-masking multitolerance, is calculating the new invariant of the program, which in turn involves calculating the fixpoint of a formula. We developed a theory for fixpoint calculation on finite sets that is customized for the specification and verification of fault-tolerance. More specifically, we introduced the formalization of the smallest fixpoint calculation to compute the reachability of a set of states. Moreover, we presented the formalization of the largest fixpoint calculation to compute the set of deadlock states of a given state predicate. The theory developed in Chapter 3 is expected to be reusable for other formalizations that involve fixpoint calculations. Mechanized verification of fault-tolerance and multitolerance. For the case that programs are subject to a single class of faults, we considered verification of the addition of three levels of fault-tolerance. More specifically, first, we developed a formal framework for modeling fault-tolerance in PVS. This framework introduces formal definitions for programs, specifications, faults, and fault-tolerance. Then, using the framework, we formally proved that any program synthesized by the Add_failsafe algorithm never violates safety in the presence of faults and maintains its specification in the absence of faults. For the Add_nonmasking algorithm, we showed that the nonmasking fault-tolerant program provides recovery to normal behavior when the state of the program is perturbed by faults, and in the absence of faults it maintains its specification.
Finally, we verified that a masking fault-tolerant program provides safe recovery when the state of the program is perturbed by faults, and in the absence of faults it maintains its specification. In other words, we mechanically verified that the algorithms for the addition of fault-tolerance in [8] are sound. We also addressed formal verification of the completeness of the addition of failsafe fault-tolerance. More specifically, we proved that if there exists a failsafe program that satisfies the transformation problem (cf. Section 2.5), then the algorithm Add_failsafe never declares failure. For the case that programs are subject to multiple classes of faults, we considered verification of the addition of two types of multitolerance. More specifically, first, we extended our formal framework for fault-tolerance so that our formal definitions can deal with multiple classes of faults. Then, using the extended framework, we formally proved that any program synthesized by the Add_Failsafe_Masking algorithm (i) never violates safety when the state of the program is perturbed by the class of faults for which failsafe fault-tolerance is to be provided; (ii) provides safe recovery when the state of the program is perturbed by the class of faults for which masking fault-tolerance is to be provided; (iii) provides failsafe fault-tolerance if faults of both types occur; and (iv) maintains its specification in the absence of faults.
For the Add_Nonmasking_Masking algorithm, we showed that the nonmasking-masking multitolerant program (i) provides safe recovery to normal behavior when the state of the program is perturbed by the class of faults for which masking fault-tolerance is to be provided; (ii) provides recovery to its invariant when the state of the program is perturbed by the class of faults for which nonmasking fault-tolerance is to be provided; (iii) provides nonmasking fault-tolerance if faults of both types occur; and (iv) maintains its specification in the absence of faults. In other words, we mechanically verified that the algorithms for the addition of multitolerance in [10] are all sound. Verification of multitolerance is one of the instances in which we simply reused both the formal specification and the proofs of the framework developed in Chapter 2 and of the algorithms developed in Chapter 4. We expect this reusability to carry over to the verification of other extensions of the algorithms [11, 42]. The algorithms verified in this thesis synthesize programs in the high atomicity model, where a process can read and write all variables in an atomic step. In [8], the authors have presented a non-deterministic algorithm for synthesizing fault-tolerant distributed programs. Moreover, in [10], the authors have introduced a non-deterministic algorithm for synthesizing failsafe-nonmasking multitolerant programs. The non-deterministic algorithms in both cases first guess a solution and then verify it. Formal verification of such an algorithm, which verifies its own output, is not interesting, and hence we did not include it in this thesis. Soundness vs. completeness. In this thesis, we concentrated more on verification of the soundness of the algorithms than on their completeness, for two reasons. First, in order to show the correctness of an algorithm, we are mostly interested in verifying its soundness.
In our context, we need to prove that programs synthesized by an algorithm are indeed fault-tolerant. Second, proving the completeness of a non-deterministic polynomial-time synthesis algorithm is irrelevant for the problem of adding fault-tolerance, in the sense that such problems are known to be NP-complete. In order to deal with NP-complete problems in the addition of fault-tolerance, several heuristics have been proposed [11, 42]. As a result of applying such heuristics, we lose the completeness of the synthesis algorithms. Therefore, we have to concentrate only on verifying the soundness of such heuristics unless P = NP. Advantages of verification of synthesis algorithms. Since we focus on verification of the transformation algorithms, our results ensure that the programs synthesized using these algorithms indeed satisfy their required fault-tolerance properties. Thus, our approach is more general than verifying a particular fault-tolerant program. In a broader context, the verification of the algorithms considered in this thesis will assist us in verifying several other transformations. For example, the algorithms in [8, 9] have also been used to synthesize fault-tolerant distributed programs. As an illustration, we note that the algorithms in [11, 12, 42], which are extensions of the algorithms in [8, 9], have been used to synthesize solutions for several fault-tolerant programs including Byzantine agreement, consensus, token ring, and the alternating bit protocol. Thus, the theories developed in this thesis are directly applicable to verifying the transformation algorithms in [11, 12, 42] as well. Future work. As future work, one may consider mechanical verification of the heuristics developed in [42] to synthesize fault-tolerant distributed programs. Formal verification of the algorithms proposed in [11] for enhancing the synthesis of masking fault-tolerant distributed programs through their nonmasking version is also an interesting problem.
Our experience shows that a significant number of proofs were reused. For instance, we manually reused the proofs of failsafe fault-tolerance to verify the soundness of the algorithm for synthesizing masking fault-tolerant programs. We also reused the formal proofs of soundness of the addition of failsafe, nonmasking, and masking fault-tolerance to verify the algorithms for the addition of multitolerance. We expect to reuse many of the theorems and proofs in future verifications as well. Therefore, another direction for future work is developing automated proofs, called proof strategies, based on our experience with the reusability of formal proofs.

APPENDICES

Appendix A

Formal Specification of the Fault-Tolerance Framework

FT[state: TYPE]: THEORY
BEGIN
  ASSUMING
    ST_is_finite: ASSUMPTION is_finite_type[state]
    TR_is_finite: ASSUMPTION is_finite_type[[state, state]]
    IMPORTING sets[state]
    IMPORTING sets_lemmas[state]
    IMPORTING finite_sets[state]
    IMPORTING sequences[state]
    IMPORTING finite_fp[state]
    IMPORTING min_nat
  ENDASSUMING

  Transition: TYPE = [state, state]
  Action: TYPE = finite_set[Transition]
  Computation(Z: Action): TYPE =
    {A: sequence[state] | ∀ (n: nat): ((A(n), A(n+1)) ∈ Z)}
  prefix(Z: Action, j: nat): TYPE =
    {c: sequence[state] | ∀ (i: nat | i < j): ((c(i), c(i+1)) ∈ Z)}

  full_state: JUDGEMENT fullset[state] HAS_TYPE finite_set
  StateSpace: finite_set = fullset[state]

  S: StatePred
  T: StatePred
  p: Action
  f: Action
  sf: Action

  s, s0, s1: VAR state
  X, Y: VAR StatePred
  Z: VAR Action
  i, j, k: VAR nat
  g: VAR DecFunc
  r: VAR IncFunc

  closed?(S: StatePred, p: Action): bool =
    ∀ (s0, s1): (((s0, s1) ∈ p) ∧ (s0 ∈ S)) ⊃ (s1 ∈ S)

  proj(p: Action, S: StatePred): Action =
    {s0, s1 | ((s0, s1) ∈ p) ∧ (s0 ∈ S) ∧ (s1 ∈ S)}

  faultspan?(T, S: StatePred, p: Action): bool =
    (∀ s0: (s0 ∈ S) ⊃ (s0 ∈ T)) ∧ closed?(T, (p ∪ f))

  ax1: AXIOM closed?(S, p)
  ax3: AXIOM ∀ (c: Computation((Z ∪ f))):
    (∃ (j: nat): (∀ (n: nat | n ≥ j): ((c(n), c(n+1)) ∈ Z)))
  ax5: AXIOM nonempty?(S)

  ms_init(anyFault: Action): StatePred =
    {s0 | ∃ (s1): ((s0, s1) ∈ anyFault) ∧ ((s0, s1) ∈ sf)}

  reverse_reachable_states(anyFault: Action)(rs: StatePred): StatePred =
    {s0 | ∃ (s1): ((s1 ∈ rs) ∧ ((s0, s1) ∈ anyFault) ∧ ¬ (s0 ∈ rs))}

  rs_if: JUDGEMENT reverse_reachable_states(anyFault: Action) HAS_TYPE IncFunc

  deadlockstates(p: Action)(ds: StatePred): StatePred =
    {s0 | (s0 ∈ ds) ∧ (∀ (s1): (s1 ∈ ds) ⊃ ¬ ((s0, s1) ∈ p))}

  ax6: AXIOM empty?(deadlockstates(p)(S))

  dl_df: JUDGEMENT deadlockstates(p: Action) HAS_TYPE DecFunc

  ms(anyFault: Action): StatePred =
    nu_fix(ms_init(anyFault))(reverse_reachable_states(anyFault))
  ms(i)(anyFault: Action): StatePred =
    nu_inc(i, ms_init(anyFault))(reverse_reachable_states(anyFault))

  mt(anyFault: Action): Action =
    {s0, s1 | ((s1 ∈ ms(anyFault)) ∨ ((s0, s1) ∈ sf))}

  ConstructInvariant(X: StatePred, Z: Action): StatePred =
    mu_fix(X)(deadlockstates(Z))

  finite_fault: THEOREM ∀ (c: Computation((Z ∪ f))):
    ∃ (j: nat): (∀ (n: nat): ((suffix(c, j)(n), suffix(c, j)(n+1)) ∈ Z))
END FT

Appendix B

Formal Specification of the Fixpoint Theory

finite_fp[T: TYPE]: THEORY
BEGIN
  ASSUMING
    IMPORTING sets[T]
    IMPORTING sets_lemmas[T]
    IMPORTING finite_sets[T]
    IMPORTING mucalculus[T]
    T_is_finite: ASSUMPTION is_finite_type[T]
  ENDASSUMING

  StatePred: TYPE = finite_set[T]
  IncFunc: TYPE = [A: StatePred -> {B: StatePred | disjoint?(A, B)}]
  DecFunc: TYPE = [A: StatePred -> {B: StatePred | B ⊆ A}]
  StateSpace: finite_set = fullset[T]

  s: VAR T
  i, j, k: VAR nat
  X: VAR StatePred
  g: VAR DecFunc
  r: VAR IncFunc

  fullset_finite: JUDGEMENT fullset[T] HAS_TYPE finite_set

  nu_inc(i: nat, X: StatePred)(r: IncFunc): RECURSIVE StatePred =
    IF i = 0 THEN X ELSE (nu_inc(i-1, X)(r) ∪ r(nu_inc(i-1, X)(r))) ENDIF
    MEASURE (λ (x: nat, y: StatePred): x)

  nu_fix(X: StatePred)(r: IncFunc): StatePred =
    {s | ∃ (k: nat): (s ∈ nu_inc(k, X)(r))}

  mu_dec(i: nat, X: StatePred)(g: DecFunc): RECURSIVE StatePred =
    IF i = 0 THEN X ELSE (mu_dec(i-1, X)(g) \ g(mu_dec(i-1, X)(g))) ENDIF
    MEASURE (λ (x: nat, y: StatePred): x)

  mu_fix(X: StatePred)(g: DecFunc): StatePred =
    {s | ∀ (k: nat): (s ∈ mu_dec(k, X)(g))}

  sfpcal(fpSet: StatePred)(r: IncFunc): RECURSIVE StatePred =
    IF nonempty?(r(fpSet)) THEN sfpcal((r(fpSet) ∪ fpSet))(r) ELSE fpSet ENDIF
    MEASURE card(StateSpace) - card(fpSet)

  lfpcal(fpSet: StatePred)(g: DecFunc): RECURSIVE StatePred =
    IF empty?(g(fpSet)) THEN fpSet ELSE lfpcal((fpSet \ g(fpSet)))(g) ENDIF
    MEASURE card(fpSet)

  mu1: LEMMA (j < k) ⊃ (mu_dec(k, X)(g) ⊆ mu_dec(j, X)(g))
  mu3: LEMMA ∀ (j: nat): nonempty?(g(mu_dec(j, X)(g))) ⊃
    card(mu_dec(j+1, X)(g)) ≤ card(X) - j - 1
  mu5: LEMMA ∀ (j: nat): empty?(g(mu_dec(j, X)(g))) ⊃
    (∀ (k: nat | k ≥ j): empty?(g(mu_dec(k, X)(g))))
  mu6: LEMMA ∃ (j: nat): ∀ (k: nat | k ≥ j): empty?(g(mu_dec(k, X)(g)))
  mu7: LEMMA ∃ (j: nat): ∀ (k: nat | k ≥ j):
    mu_dec(k, X)(g) = mu_dec(j, X)(g) ∧ empty?(g(mu_dec(k, X)(g)))
  mu8: LEMMA ∃ (j: nat):
    ((mu_dec(j, X)(g) = mu_fix(X)(g)) ∧ empty?(g(mu_dec(j, X)(g))))
  mu9: THEOREM empty?(g(mu_fix(X)(g)))
  mu10: THEOREM mu_fix(X)(g) = mu_fix(mu_fix(X)(g))(g)

  nu1: LEMMA (k < j) ⊃ (nu_inc(k, X)(r) ⊆ nu_inc(j, X)(r))
  nu2_3: LEMMA (s ∈ nu_inc(j+1, X)(r)) ⊃
    ((s ∈ r(nu_inc(j, X)(r))) ∨ (∃ (k: nat | k ≤ j): (s ∈ nu_inc(k, X)(r))))
  nu2_4: LEMMA (s ∈ nu_fix(X)(r)) ⊃
    (∃ (j: nat): (s ∈ r(nu_inc(j, X)(r))) ∨
      (∃ (k: nat | k ≤ j): (s ∈ nu_inc(k, X)(r))))
  nu3: LEMMA ∀ (j: nat): nonempty?(r(nu_inc(j, X)(r))) ⊃
    card(nu_inc(j+1, X)(r)) ≤ card(fullset[T]) - card(X) - j - 1
  nu5: LEMMA ∀ (j: nat): empty?(r(nu_inc(j, X)(r))) ⊃
    (∀ (k: nat | k ≥ j): empty?(r(nu_inc(k, X)(r))))
  nu6: LEMMA ∃ (j: nat): ∀ (k: nat | k ≥ j): empty?(r(nu_inc(k, X)(r)))
  nu7: LEMMA ∃ (j: nat): ∀ (k: nat | k ≥ j):
    nu_inc(k, X)(r) = nu_inc(j, X)(r) ∧ empty?(r(nu_inc(k, X)(r)))
  nu8: LEMMA ∃ (j: nat):
    ((nu_inc(j, X)(r) = nu_fix(X)(r)) ∧ empty?(r(nu_inc(j, X)(r))))
  nu9: THEOREM empty?(r(nu_fix(X)(r)))
  nu10: THEOREM nu_fix(nu_fix(X)(r))(r) = nu_fix(X)(r)
END finite_fp

Appendix C

Formal Specification of the Synthesis of Failsafe Fault-Tolerance

add_failsafe[state: TYPE]: THEORY
BEGIN
  ASSUMING
    ST_is_finite: ASSUMPTION is_finite_type[state]
    TR_is_finite: ASSUMPTION is_finite_type[[state, state]]
    IMPORTING FT[state]
  ENDASSUMING

  i, j: VAR nat
  s0, s1: VAR state

  Sp(anyFault: Action): StatePred =
    ConstructInvariant((S \ ms(anyFault)), (p \ mt(anyFault)))

  pp(anyFault: Action): Action =
    (p \ mt(anyFault)) \
      {s0, s1 | (s0 ∈ Sp(anyFault)) ∧ ¬ member(s1, Sp(anyFault))}

  sc: {c: Computation(f) |
    ∀ (s: state | s ∈ ms_init(f)): (c(0) = s) ∧ ((c(0), c(1)) ∈ sf)}
  sc1: {c: Computation(f) |
    ∀ (t: Transition | t ∈ (p ∩ sf)): c(0) = t`2}

  is_failsafe?(Spp: StatePred, ppp: Action): bool =
    nonempty?(Spp) ∧ closed?(Spp, ppp) ∧ (Spp ⊆ S) ∧
    (proj(ppp, Spp) ⊆ proj(p, Spp)) ∧
    empty?(deadlockstates(ppp)(Spp)) ∧
    (∀ (c: Computation((ppp ∪ f)) | c(0) ∈ Spp):
       ∀ (j: nat): ¬ ((c(j), c(j+1)) ∈ sf))

  add_failsafe_fails?: bool = empty?(Sp(f))

  rs_disj: LEMMA ∀ (X: StatePred): disjoint?(X, reverse_reachable_states(f)(X))
  rs_empty: LEMMA reverse_reachable_states(f)(emptyset) = emptyset
  disj_Sp_ms: THEOREM disjoint?(Sp(f), ms(f))
  disj_Sp_sf: THEOREM disjoint?(Sp(f), ms_init(f))
  ms_f_sf: LEMMA ∀ (s0, s1): (((s0, s1) ∈ f) ∧ ((s0, s1) ∈ sf)) ⊃ (s0 ∈ ms(f))
  prop: LEMMA (ms_init(f) ⊆ ms(f))
  disj_pp_sf: LEMMA disjoint?(pp(f), sf)
  prop2: LEMMA ∀ (anyFault: Action):
    (((s0, s1) ∈ anyFault) ∧ (s1 ∈ ms(anyFault))) ⊃ (s0 ∈ ms(anyFault))
  prop3: LEMMA ∀ (s0, s1): (((s0, s1) ∈ sf) ∧ ((s0, s1) ∈ f)) ⊃ (s0 ∈ ms(f))
  prop4: LEMMA ∀ (j: nat):
    ((s0 ∈ ms(j+1)(f)) ∧ ¬ (s0 ∈ ms(j)(f))) ⊃
      (∃ (s1): ((s0, s1) ∈ f) ∧ (s1 ∈ ms(j)(f)))

  fix3: THEOREM ConstructInvariant(Sp(f), (p \ mt(f))) = Sp(f)
  scond1: THEOREM (Sp(f) ⊆ S)
  scond2: THEOREM (proj(pp(f), Sp(f)) ⊆ proj(p, Sp(f)))
  scond2_1: COROLLARY (pp(f) ⊆ p)
  scond3_1: LEMMA closed?(Sp(f), pp(f))
  scond3_1_1: THEOREM ∀ (j: nat): ∀ (c: prefix((pp(f) ∪ f), j)):
    (c(0) ∈ Sp(f)) ⊃ (∀ (k: nat | k < j): ¬ (c(k) ∈ ms(f)))
  scond3_1_2: THEOREM ∀ (j: nat): ∀ (c: prefix((pp(f) ∪ f), j)):
    (c(0) ∈ Sp(f)) ⊃ (∀ (k: nat | k < j): ¬ ((c(k), c(k+1)) ∈ sf))
  scond3_2: THEOREM ∀ (c: Computation(pp(f))):
    (c(0) ∈ Sp(f)) ⊃ (∀ (j: nat): (c(j) ∈ Sp(f)))
  scond3_3: THEOREM empty?(deadlockstates((p \ mt(f)))(Sp(f)))
  scond3_3_1: THEOREM empty?(deadlockstates(pp(f))(Sp(f)))
  scond3_4: THEOREM ∀ (c: Computation((pp(f) ∪ f))):
    (c(0) ∈ Sp(f)) ⊃ (∀ (j: nat): ¬ (c(j) ∈ ms(f)))
  scond3_5: THEOREM ∀ (c: Computation((pp(f) ∪ f))):
    (c(0) ∈ Sp(f)) ⊃ (∀ (j: nat): ¬ ((c(j), c(j+1)) ∈ sf))
  scond3_6: THEOREM ∃ (T: StatePred): faultspan?(T, Sp(f), (pp(f) ∪ f))
  ccond1: LEMMA ∀ (s: state | s ∈ ms(i)(f)):
    (∃ (c: Computation(f) | c(0) = s): ∃ (k: nat): ((c(k), c(k+1)) ∈ sf))
  ccond2: LEMMA ∀ (s: state | s ∈ ms(f)):
    (∃ (c: Computation(f) | c(0) = s): ∃ (k: nat): ((c(k), c(k+1)) ∈ sf))
  ccond3: LEMMA ∀ (t: Transition | t ∈ mt(f)):
    ∃ (c: Computation(f) | c(0) = t`1): ∃ (k: nat): ((c(k), c(k+1)) ∈ sf)
  ccond4: THEOREM ∀ (Spp: StatePred, ppp: Action):
    is_failsafe?(Spp, ppp) ⊃ (Spp ⊆ (S \ ms(f)))
  ccond5: THEOREM ∀ (Spp: StatePred, ppp: Action):
    is_failsafe?(Spp, ppp) ⊃ (proj(ppp, Spp) ⊆ (p \ mt(f)))
  ccond6: LEMMA (∃ (Spp: StatePred, ppp: Action): is_failsafe?(Spp, ppp)) ⊃
    nonempty?((S \ ms(f)))
  completeness: THEOREM
    (∃ (Spp: StatePred, ppp: Action): is_failsafe?(Spp, ppp)) ⊃
      ¬ add_failsafe_fails?
END add_failsafe

Appendix D

Formal Specification of the Synthesis of Nonmasking Fault-Tolerance

add_nonmasking[state: TYPE]: THEORY
BEGIN
  ASSUMING
    ST_is_finite: ASSUMPTION is_finite_type[state]
    TR_is_finite: ASSUMPTION is_finite_type[[state, state]]
    IMPORTING FT[state]
  ENDASSUMING

  s0, s1: VAR state

  Sp(anyS: StatePred): StatePred = anyS
  pp(anyS: StatePred, anyp: Action): Action =
    (proj(anyp, anyS) ∪ {s0, s1 | ¬ (s0 ∈ anyS) ∧ (s1 ∈ anyS)})

  scond1_1: LEMMA (Sp(S) ⊆ S)
  scond1_2: LEMMA closed?(Sp(S), pp(S, p))
  scond2: LEMMA (proj(pp(S, p), Sp(S)) ⊆ proj(p, Sp(S)))
  scond3: LEMMA ∀ (c: Computation(pp(S, p))):
    ∃ (j: nat | j > 0): (c(j) ∈ Sp(S))
  scond4: LEMMA ∀ (c: Computation((pp(S, p) ∪ f))):
    ∃ (j: nat | j > 0): (c(j) ∈ Sp(S))
  soundness: THEOREM ∃ (T: StatePred): faultspan?(T, Sp(S), (pp(S, p) ∪ f))
END add_nonmasking

Appendix E

Formal Specification of the Synthesis of Masking Fault-Tolerance

add_masking[state: TYPE]: THEORY
BEGIN
  ASSUMING
    ST_is_finite: ASSUMPTION is_finite_type[state]
    TR_is_finite: ASSUMPTION is_finite_type[[state, state]]
    IMPORTING add_failsafe[state]
  ENDASSUMING

  s, s0, s1: VAR state
  S2, T2: StatePred

  mask_ax: AXIOM (S2 ⊆ T2)

  reachable?(S, T: StatePred, p: Action, s: state): bool =
    ∃ (c: Computation(p)): (s ∈ T) ∧ s = c(0) ∧ (∃ (j: nat): (c(j) ∈ S))

  S_init(anyFault: Action): StatePred =
    ConstructInvariant((S \ ms(anyFault)), (p \ mt(anyFault)))
  T_init(anyFault: Action): StatePred = {s | ¬ (s ∈ ms(anyFault))}
  TmS: Action = {s0, s1 | (¬ (s0 ∈ S2)) ∧ ((s0 ∈ T2) ∧ (s1 ∈ T2))}
  p1(anyFault: Action): Action = ((proj(p, S2) ∪ TmS) \ mt(anyFault))
  TmR(anyFault: Action): StatePred =
    {s | (s ∈ T2) ∧ reachable?(S2, T2, p1(anyFault), s)}
  TcL(anyFault: Action)(X: StatePred): StatePred =
    {s0 | ∃ (s1): (s0 ∈ X) ∧ ((s0, s1) ∈ anyFault) ∧ ¬ (s1 ∈ X)}
  ConstructFaultSpan(X: StatePred, anyFault: Action): StatePred =
    mu_fix(X)(TcL(anyFault))
  T1(anyFault: Action): StatePred =
    ConstructFaultSpan(TmR(anyFault), anyFault)
  S1(anyFault: Action): StatePred =
    ConstructInvariant((S2 ∩ T1(anyFault)), p1(anyFault))

  T1_T2: LEMMA (T1(f) ⊆ T2)
  S1_S2: LEMMA (S1(f) ⊆ S2)
  S1_T1: THEOREM ∀ (f: Action): (S1(f) ⊆ T1(f))
  prop1: LEMMA disjoint?(T2, ms(f)) ⊃ disjoint?(T1(f), ms(f))
  prop2: LEMMA ∀ (f: Action): empty?(TcL(f)(T1(f)))
  sf_p1: LEMMA disjoint?(p1(f), sf)
  prop: LEMMA ∀ (f: Action): ConstructFaultSpan(T1(f), f) = T1(f)
  scond1: LEMMA ∀ (f: Action): closed?(S2, p1(f))
  scond2: LEMMA ∀ (f: Action): closed?(T2, p1(f))
  scond3_1: THEOREM (S2 ⊆ S) ⊃ (S1(f) ⊆ S)
  scond3_2: THEOREM (proj(p1(f), S2) ⊆ proj(p, S2)) ⊃
    (proj(p1(f), S1(f)) ⊆ proj(p, S1(f)))
  scond4: LEMMA ∀ (f: Action): closed?(T1(f), f)
  scond4_1: THEOREM ∀ (f: Action): (S1(f) = S2) ⊃ closed?(S1(f), p1(f))
  scond4_2: THEOREM ∀ (f: Action): (T1(f) = T2) ⊃ closed?(T1(f), (p1(f) ∪ f))
  scond7: LEMMA (T1(f) = T2) ⊃ closed?(T2, f)
  scond8: THEOREM ∀ (f: Action): (((S1(f) = S2) ∧ (T1(f) = T2)) ⊃
    (∀ (s: state | s ∈ T1(f)): reachable?(S1(f), T1(f), p1(f), s)))
  scond8_2: THEOREM ∀ (f: Action): (((S1(f) = S2) ∧ (T1(f) = T2)) ⊃
    (∀ (s: state | s ∈ T1(f)): reachable?(S1(f), T1(f), pp(f), s)))
  scond3_3: THEOREM ∀ (c: Computation((p1(f) ∪ f))):
    (c(0) ∈ T1(f)) ⊃ (∀ (j: nat): ¬ (c(j) ∈ ms(f)))
  scond3_4: THEOREM (T1(f) = T2) ⊃ (∀ (c: Computation((p1(f) ∪ f))):
    (c(0) ∈ T1(f)) ⊃ (∀ (j: nat): ¬ ((c(j), c(j+1)) ∈ sf)))
  scond3_5: THEOREM (T1(f) = T2) ⊃
    (∃ (T: StatePred): faultspan?(T, S1(f), p1(f)))
  scond3_6: THEOREM empty?(deadlockstates(p1(f))(S1(f)))
  scond3_8: LEMMA (S1(f) = S2 ∧ T1(f) = T2) ⊃
    (∀ (c: Computation((pp(f) ∪ f)) | c(0) ∈ T1(f)):
       ∃ (j: nat | j > 0): (c(j) ∈ S1(f)))
  scond3_9: THEOREM (S1(f) = S2 ∧ T1(f) = T2) ⊃
    (∀ (j: nat): ∀ (c: prefix((p1(f) ∪ f), j)):
       (c(0) ∈ T1(f)) ⊃ (∀ (k: nat | k < j): ¬ (c(k) ∈ ms(f))))
  scond3_10: THEOREM (S1(f) = S2 ∧ T1(f) = T2) ⊃
    (∀ (j: nat): ∀ (c: prefix((pp(f) ∪ f), j)):
       (c(0) ∈ T1(f)) ⊃ (∀ (k: nat | k < j): ¬ ((c(k), c(k+1)) ∈ sf)))
END add_masking

Appendix F

Formal Specification of the Synthesis of Multitolerance

add_multift[state: TYPE]: THEORY
BEGIN
  ASSUMING
    ST_is_finite: ASSUMPTION is_finite_type[state]
    TR_is_finite: ASSUMPTION is_finite_type[[state, state]]
    IMPORTING add_failsafe[state]
    IMPORTING add_nonmasking[state]
    IMPORTING add_masking[state]
  ENDASSUMING

  i, j: VAR nat
  s0, s1: VAR state
  Z: VAR Action

  f_failsafe: Action
  f_nonmasking: Action
  f_masking: Action
  f_nonmasking_masking: Action = (f_masking ∪ f_nonmasking)
  f_failsafe_masking: Action = (f_masking ∪ f_failsafe)

  ax4: AXIOM ∀ (c: Computation((Z ∪ f_nonmasking))):
    (∃ (j: nat): (∀ (n: nat | n ≥ j): ((c(n), c(n+1)) ∈ Z)))

  Sp1: StatePred = add_masking.S1(f_masking)
  Tp1: StatePred = add_nonmasking.Sp(T1(f_masking))
  pp1: Action = add_nonmasking.pp(Tp1, p1(f_masking))
  ms: StatePred = FT.ms(f_failsafe_masking)
  mt: Action = FT.mt(f_failsafe_masking)
  T_masking: StatePred = T1(f_masking)
  p1: Action = add_masking.p1(f_masking)
  Sp2: StatePred = add_masking.S1(f_masking)
  Tp2: StatePred = ConstructInvariant((T_masking \ ms), (p1 \ mt))
  pp2: Action = (p1 \ mt) \
    {s0, s1 | ((s0, s1) ∈ (p1 \ mt)) ∧ ((s0 ∈ Tp2) ∧ ¬ (s1 ∈ Tp2))}

  finite_fault_nonmasking: THEOREM
    ∀ (c: Computation((Z ∪ f_nonmasking_masking))):
      ∃ (j: nat): (∀ (n: nat): ((suffix(c, j)(n), suffix(c, j)(n+1)) ∈ Z))
  scond3: LEMMA ∀ (c: Computation(pp1)): ∃ (j: nat | j > 0): (c(j) ∈ Tp1)
  scond4: LEMMA ∀ (c: Computation((pp1 ∪ f_nonmasking_masking))):
    ∃ (j: nat | j > 0): (c(j) ∈ Tp1)
  scond1_1: THEOREM (S2 ⊆ S) ⊃ (Sp1 ⊆ S)
scond1-3-2: THEOREM (proj(ppl, S2)§pr0j(p, 52)) D (proj(ppl, Sp1)§pr0i(p, Sp1)) 122 scond1-3_5: THEOREM (Spl Q Tpl) scond1-3_3: LEMMA (adenasking.T1(f_masking) = T2) 3 closed?(Tpl, (pplUfJnasking)) scond1_3_4: LEMMA (adenasking.S1(f_masking) = 52) D closed?(Spl, ppl) scond1_8: THEOREM ((81 (f_masking) = 52) A (T1(f_masking) == T2)) 3 (V ((3: state | (seTp1))): reachable?(Sp1, Tpl, ppl, 3)) scond1_3_6: THEOREM 3 (T: StatePred): faultspan?(T, Sp1, (pplUfmonmaskingnnaskingfl scond2_1: THEOREM (32 Q S) :3 (Sp2 Q S) scond2_3_2 : THEOREM (proj(pp2, $2) Sprojm 82)) D (proj(pp2, SP2) 9 pr0j(p, Sp2)) scond2_3_3: THEOREM (Sp2 = S2) 3 closed?(Sp2, pp2) scond2_3_5: THEOREM (Sp2 Q Tp2) scond2_3-1 -1 : THEOREM disjoint?(T2, ms) 3 (V (j: nat): 123 V (c: prefix((pp2Uf_failsafe.masking), j)): (C(0)€Sp2) I) (V ((k: nat I k < j)): 3 (c(k)€ms))) scond2_3_1-2: THEOREM disjoint?(T2, ms) 3 (V (j: nat): V (c: prefix((pp2Uf_failsafe_masking), 3)): (C(0) E Sp2) D (V ((k: nat I k < j)): 3 ((c(k), c(k+1))€sf))) scond2_3_3_1 : THEOREM empty? (deadlockstates((p1 \mt)) (Tp2)) scond2-3-3_2: THEOREM empty? (deadlockstates (pp2) (Tp2)) scond2_8: THEOREM ((Sp2 2 S2) A (Tp2 2 T2)) 2) (V ((3: state I (sETp2))): reachable?(Sp2, Tp2, pp2, 8)) scond2_3_6: THEOREM 3 (T: StatePred): faultspan?(T, Sp2, (pp2Uf_failsafeJnasking)) END add_multift 124 [1] I2] [3] [4] I5] [6] [7] l8] I9] I10] I11] [12] Bibliography K. Bhargavan, D. Obradovic, and C. Gunter. Formal verification of standards for distance vector routing protocols. Journal of the ACM, 49(4):538 — 576, 2002. S. S. Kulkarni. Component-based design of fault-tolerance. PhD thesis, Ohio State University, 1999. Sam Owre, John Rushby, Natarajan Shankar, and Friedrich von Henke. Formal verification for fault-tolerant architectures: Prolegomena to the design of PVS. IEEE Transactions on Software Engineering, 21(2):107—125, February 1995. Heiko Mantel and Felix C.Gartner. A case study in the mechanical verification of fault-tolerance. 
Technical Report TUD-BS—1999-08, Department of Computer Science, Darmstadt University Of Technology, 1999. S. Qadeer and N. Shankar. Verifying a self-stabilizing mutual exclusion algo- rithm. In David Gries and Willem-Paul de Roever, editors, IFIP International Conference on Programming Concepts and Methods (PROCOMET ’98), pages 424—443, Shelter Island, NY, June 1998. Chapman & Hall. E. W. Dijkstra. Self-stabilizing systems in spite of distributed control. Commu- nications of the ACM, 17(11), 1974. S. S. Kulkarni, J. Rushby, and N. Shankar. A case-study in component-based mechanical verification of fault-tolerant programs. Proceedings of the 19th IEEE International Conference on Distributed Computing Systems Workshop on Self- Stabilization ( WSS ’99) Austin, Texas, USA, pages 33-40, June 1999. S. S. Kulkarni and A. Arora. Automating the addition of fault-tolerance. Formal Techniques in Real- Time and Fault— Tolerant Systems, 2000. S. S. Kulkarni and A. Arora. Automating the addition of fault-tolerance. Techni- cal Report MSU-CSE—OO—13, Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, 2001. S. S. Kulkarni and A. Ebnenasir. Automated synthesis of multitolerance. IEEE Conference on Dependable and Network Systems (DSN’04), 2004. S. S. Kulkarni and A. Ebnenasir. Enhancing the fault-tolerance of nonmasking programs. International Conference on Distributed Computing Systems, 2003. Ali Ebnenasir and Sandeep S. Kulkarni. A framework for automatic synthe- sis of fault-tolerance. http://www.cse.msu.edu/”sandeep/software/Code/ synthesis-frameworkl. [13] B. Alpern and F. B. Schneider. Defining Iiveness. Information Processing Letters, 21:181—185, 1985. 125 [14] A. Arora and M. G. Gouda. Closure and convergence: A foundation of fault- tolerant computing. IEEE Transactions on Software Engineering, 19(11):1015- 1027, 1993. [15] N. Shankar, S. Owre, J. M. Rushby, and D. W. J. Stringer-Calvert. PVS Prover Guide: Version 2.4. 
Computer Science Laboratory, SRI International, Menlo Park, CA, November 2001. URL: http://pvs.csl.sri.com/manuals.html.

[16] S. Owre, N. Shankar, J. M. Rushby, and D. W. J. Stringer-Calvert. PVS System Guide, Version 2.4. Computer Science Laboratory, SRI International, Menlo Park, CA, December 2001. URL: http://pvs.csl.sri.com/manuals.html.

[17] S. Owre, N. Shankar, J. M. Rushby, and D. W. J. Stringer-Calvert. PVS Language Reference, Version 2.4. Computer Science Laboratory, SRI International, Menlo Park, CA, December 2001. URL: http://pvs.csl.sri.com/manuals.html.

[18] H. P. Barendregt. Handbook of Theoretical Computer Science: Volume B, Chapter 7, Functional Programming and Lambda Calculus. Elsevier Science Publishers B. V., 1990.

[19] S. S. Kulkarni and A. Ebnenasir. The complexity of adding failsafe fault-tolerance. International Conference on Distributed Computing Systems, 2002.

[20] A. Arora and S. S. Kulkarni. Component based design of multitolerant systems. IEEE Transactions on Software Engineering, 24(1):63–78, January 1998.

[21] Tomoyuki Yokogawa, Tatsuhiro Tsuchiya, and Tohru Kikuno. Automatic verification of fault tolerance using model checking. International Symposium on Dependable Computing, 2001.

[22] Wilfried Steiner and John Rushby. Model checking a fault-tolerant startup algorithm: From design exploration to exhaustive fault simulation. IEEE Conference on Dependable Systems and Networks (DSN'04), 2004.

[23] John Rushby. SAL tutorial: Analyzing the fault-tolerant algorithm OM(1). Technical report, CSL, SRI International, 2004.

[24] F. Schneider, S. M. Easterbrook, J. R. Callahan, and G. J. Holzmann. Validating requirements for fault tolerant systems using model checking. Third IEEE Conference on Requirements Engineering, 1998.

[25] P. Ramanathan and K. G. Shin. Use of common time base for checkpointing and rollback recovery in distributed systems. IEEE Transactions on Software Engineering, 19:571–583, 1993.
[26] Jean-Charles Fabre, Thomas Losert, Eric Marsden, Nick Moffat, Michael Paulitsch, David Powell, and William Simmonds. Validation of fault tolerance and timing properties. Project Deliverable DSC2 for DSoS, Research Report 20/2003, Technische Universität Wien, Institut für Technische Informatik, Treitlstr. 1-3/182-1, 1040 Vienna, Austria, 2003.

[27] Ali Ebnenasir. Automatic synthesis of distributed programs: A survey. 2002. http://www.cse.msu.edu/~ebnenasi/survey.pdf.

[28] N. S. Bjørner. A survey of reactive synthesis. 1996. http://theory.stanford.edu/people/nikolaj/dimacs96.ps.

[29] E. A. Emerson. Handbook of Theoretical Computer Science: Chapter 16, Temporal and Modal Logics. Elsevier Science Publishers B. V., 1990.

[30] E. A. Emerson and E. M. Clarke. Using branching time temporal logic to synthesize synchronization skeletons. Science of Computer Programming, 2(3):241–266, 1982.

[31] Z. Manna and R. Waldinger. A deductive approach to program synthesis. ACM Transactions on Programming Languages and Systems, 2(1):90–121, 1980.

[32] Z. Manna and P. Wolper. Synthesis of communicating processes from temporal logic specifications. ACM Transactions on Programming Languages and Systems, 6(1):68–93, 1984.

[33] P. Attie and E. Emerson. Synthesis of concurrent systems with many similar processes. ACM Transactions on Programming Languages and Systems, 20(1):51–115, 1998.

[34] Paul C. Attie and E. Allen Emerson. Synthesis of concurrent programs for an atomic read/write model of computation. ACM Transactions on Programming Languages and Systems (TOPLAS), 23(2):187–242, 2001.

[35] A. Arora, P. C. Attie, and E. A. Emerson. Synthesis of fault-tolerant concurrent programs. Proceedings of the 17th ACM Symposium on Principles of Distributed Computing (PODC), 1998.

[36] W. Thomas. Handbook of Theoretical Computer Science: Chapter 4, Automata on Infinite Objects. Elsevier Science Publishers B. V., 1990.

[37] A. Pnueli and R. Rosner.
On the synthesis of a reactive module. In Proceedings of the 16th ACM Symposium on Principles of Programming Languages, pages 179–190, 1989.

[38] A. Pnueli and R. Rosner. On the synthesis of an asynchronous reactive module. In Proceedings of the 16th International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science 372, Springer-Verlag, pages 652–671, 1989.

[39] O. Kupferman and M. Y. Vardi. Synthesizing distributed systems. In Proc. 16th IEEE Symp. on Logic in Computer Science, July 2001.

[40] M. Y. Vardi. An automata-theoretic approach to fair realizability and synthesis. Computer Aided Verification, volume 939 of LNCS, pages 267–278, 1995.

[41] A. Arora. A foundation of fault-tolerant computing. PhD thesis, The University of Texas at Austin, 1992.

[42] S. S. Kulkarni, A. Arora, and A. Chippada. Polynomial time synthesis of Byzantine agreement. Symposium on Reliable Distributed Systems, 2001.

[43] J. Kljaich Jr., B. T. Smith, and A. S. Wojcik. Formal verification of fault tolerance using theorem-proving techniques. IEEE Transactions on Computers, 38(3):366–376, 1989.

[44] P. S. Miner and S. D. Johnson. Verification of an optimized fault-tolerant clock synchronization circuit. Third Workshop on Designing Correct Circuits (DCC96), 1996.

[45] NASA Langley Research Center Formal Methods Group. SPIDER homepage. http://shemesh.larc.nasa.gov/fm/spider/.

[46] Alfons Geser and Paul S. Miner. A formal correctness proof of the SPIDER diagnosis protocol. In Theorem Proving in Higher-Order Logics (TPHOLs) – Track B Proceedings, number NASA/CP-2002-211736, pages 71–86, NASA Langley Research Center, Hampton, VA, August 2002.

[47] T. F. Melham. Abstraction mechanisms for hardware verification. In G. Birtwistle and P. A. Subrahmanyam, editors, VLSI Specification, Verification, and Synthesis, pages 129–157, Boston, 1988. Kluwer Academic Publishers.

[48] Ben L. Di Vito and Ricky W. Butler. Formal techniques for synchronized fault-tolerant systems.
3rd IFIP Working Conference on Dependable Computing for Critical Applications, 1992.

[49] John Rushby. Systematic formal verification for fault-tolerant time-triggered algorithms. IEEE Transactions on Software Engineering, 25(5):651–661, 1999.

[50] Natarajan Shankar. Mechanical verification of a generalized protocol for Byzantine fault tolerant clock synchronization. In J. Vytopil, editor, Formal Techniques in Real-Time and Fault-Tolerant Systems, volume 571 of Lecture Notes in Computer Science, pages 217–236. Springer-Verlag, 1992.

[51] Patrick Lincoln and John Rushby. The formal verification of an algorithm for interactive consistency under a hybrid fault model. In Costas Courcoubetis, editor, Computer-Aided Verification, CAV '93, volume 697 of Lecture Notes in Computer Science, pages 292–304, Elounda, Greece, June/July 1993. Springer-Verlag.

[52] John Rushby. Formal verification of an oral messages algorithm for interactive consistency. Technical Report CSL-91-01, SRI International, 1992.

[53] E. W. Dijkstra. A belated proof of self-stabilization. Distributed Computing, (1):5–6, 1986.

[54] Lee Pike, Jeffery Maddalon, Paul Miner, and Alfons Geser. Abstraction for fault-tolerant distributed system verification. Theorem Proving in Higher-Order Logics (TPHOLs), 2004.

[55] S. S. Kulkarni, Borzoo Bonakdarpour, and Ali Ebnenasir. Mechanical verification of automatic synthesis of fault-tolerant programs. International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR), 2004.

[56] S. S. Kulkarni, Borzoo Bonakdarpour, and Ali Ebnenasir. Mechanical verification of automatic synthesis of failsafe fault-tolerance. Theorem Proving in Higher-Order Logics (TPHOLs) Emerging Trends, 2004.