Automated addition of fault-tolerance via lazy repair and graceful degradation

In this dissertation, we concentrate on the problem of automated addition of fault-tolerance that transforms a fault-intolerant program to be a fault-tolerant program. We solve this problem via model repair. Model repair is a correct-by-construct technique to revise an existing model so that the revised model satisfies the given correctness criteria, such as safety, liveness, or fault-tolerance. We consider two problems of using model repair to add fault-tolerance. First, if the repaired model violates the assumptions (e.g., partial observability, inability to detect crashed processes, etc) made in the underlying system, then it cannot be implemented. We denote these requirements as realizability constraints. Second, the addition of fault-tolerance may fail if the program cannot fully recover after certain faults occur. In this dissertation, we propose a lazy repair approach to address realizability issues in adding fault-tolerance. Additionally, we propose a technique to automatically add graceful degradation to a program, so that the program can recover with partial functionality (that is identified by the designer to be the critical functionality) if full recovery is impossible.A model repair technique transforms a model to another model that satisfies a new set of properties. Such a transformation should also maintain the mapping between the model and the underlying program. For example, in a distributed program, every process is restricted to read (or write) some variables in other processes. A model that represents this program should also disallow the process to read (or write) those inaccessable variables. If these constraints are violated, then the corresponding model will be unrealizable. An unrealizable model (in this context, a model that violates the read/write restrictions) may make it impossible to obtain the corresponding implementation.%In this dissertation, we call the read (or write) restriction as a realizability constraint in distributed systems. An unrealizable model (a model that violates the realizability constraints) may complicate the implementation by introducing extra amount of modification to the program. Such modification may in turn break the program's correctness.Resolving realizability constraints increases the complexity of model repair. Existing model repair techniques introduce heuristics to reduce the complexity. However, this heuristic-based approach is designed and optimized specifically for distributed programs. We need a more generic model repair approach for other types of programs, e.g., synchronous programs, cyber-physical programs, etc. Hence, in this dissertation, we propose a model repair technique, i.e., lazy repair, to add fault-tolerance to programs with different types of realizability constraints. It involves two steps. First, we only focus on repairing to obtain a model that satisfies correctness criteria while ignoring realizability constraints. In the second step, we repair this model further by removing behaviors while ensuring that the desired specification is preserved. The lazy repair approach simplifies the process of developing heuristics, and provides a tradeoff in terms of the time saved in the first step and the extra work required in the second step. We demonstrate that lazy repair is applicable in the context of distributed systems, synchronous systems and cyber-physical systems.In addition, safety critical systems such as airplanes, automobiles and elevators should operate with high dependability in the presence of faults. If the occurrence of faults breaks down some components, the system may not be able to fully recover. In this scenario, the system can still operate with remaining resources and deliver partial but core functionality, i.e., to display graceful degradation. Existing model repair approaches, such as addition of fault-tolerance, cannot transform a program to provide graceful degradation. In this dissertation, we propose a technique to add fault-tolerance to a program with graceful degradation. In the absence of faults, such a program exhibits ideal behaviors. In the presence of faults, the program is allowed to recover with reduced functionality. This technique involves two steps. First, it automatically generates a program with graceful degradation based on the input fault-intolerant program. Second, it adds fault-tolerance to the output program from first step. We demonstrate that this technique is applicable in the context of high atomicity programs as well as low atomicity programs (i.e., distributed programs). We also present a case study on adding multi-graceful degradation to a dangerous gas detection and ventilation system. Through this case study, we show that our approach can assist the designer to obtain a program that behaves like the deployed system.

Read

In Collections: Electronic Theses & Dissertations

Copyright Status: In Copyright

Material Type: Theses

Authors: Lin, Yiyan

Thesis Advisors: Kulkarni, Sandeep

Committee Members: Torng, Eric
Dillon, Laura
Kulkarni, Rajesh

Date Published: 2015

Subjects: Fault-tolerant computing--Mathematical models

Program of Study: Computer Science - Doctor of Philosophy

Degree Level: Doctoral

Language: English

Pages: x, 145 pages

ISBN: 9781339002712
133900271X

Automated addition of fault-tolerance via lazy repair and graceful degradation

Full text