This is to certify that the dissertation entitled

ASSURANCE OF ADAPTATION IN DISTRIBUTED SYSTEMS

presented by

Karunkumar N. Biyani

has been accepted towards fulfillment of the requirements for the Doctoral degree in Computer Science and Engineering.


ASSURANCE OF ADAPTATION IN DISTRIBUTED SYSTEMS

By

Karunkumar N. Biyani

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science and Engineering

2007


ABSTRACT

ASSURANCE OF ADAPTATION IN DISTRIBUTED SYSTEMS

By

Karunkumar N. Biyani

Software systems need to adapt due to changing requirements or changing environment conditions. For long-running and safety-critical applications, it is highly desirable to adapt the system without stopping it completely. In the case of distributed systems, adaptation often requires changes to multiple processes. Typically, such adaptation is performed by dynamically adding or removing components from multiple processes. As a result, during adaptation, the system may consist of both changed and unchanged processes, causing the old and the new components to overlap. This overlapping of components during adaptation may induce cross-component communication, which may lead to unpredictable and/or undesirable behavior during adaptation.

In order to gain confidence in adaptation in distributed systems, in this dissertation we address the assurance requirements at various stages of adaptation development: (i) modeling and verification of adaptation, (ii) testing of adaptation, (iii) design of components involved in adaptation, and (iv) design of a framework that supports adaptation.

In this dissertation, we describe an approach based on adaptation lattices to model and verify adaptation. Specifically, we present transitional-invariant lattices and transitional-faultspan lattices to verify the correctness of adaptation in the absence and presence of faults, respectively.

Furthermore, we identify the issues that arise in an important class of adaptation, namely, mixed-mode adaptation. Mixed-mode adaptation allows the changed and the unchanged processes to interact during adaptation, thereby minimizing service interruption time and communication overhead. In this dissertation, we identify and address the challenges involved in mixed-mode adaptation. Specifically, we show how the adaptation lattice approach can be used in the case of mixed-mode adaptation.

We also discuss an approach for testing adaptation in distributed systems. We show how the existing approaches based on predicate detection for testing distributed systems can be extended for testing adaptation.
We also describe a component family design to build a library of adaptive components. One aspect of the design is to build an adaptation-verified library of components in which not only the components but also the adaptations between the components are verified. The design applies the principle of separation of concerns to separate the adapt-active parts of the components from their core functionality. Furthermore, the component family design integrates the framework that performs adaptation while ensuring that the adaptation logic is separate from the core functionality of the components as well as the application.


© Copyright by KARUNKUMAR N. BIYANI 2007


To my parents and wife for their love and sacrifices


ACKNOWLEDGMENTS

First of all, I want to thank my loving Lord, Krishna, and my teacher for the inspiration, motivation and support to pursue things the right way. I am very thankful to my advisor and guidance committee chairperson, Dr. Sandeep S. Kulkarni, without whose guidance this dissertation would never have been completed. I specifically thank him for being patient with me and understanding my limitations. He is a wonderful person and a great mentor.

I want to thank the other members of my guidance committee, Dr. Betty H. C. Cheng, Dr. Philip K. McKinley, Dr. Laura Dillon and Dr. Jonathan I. Hall, for their helpful comments and advice over the years. I also want to thank the faculty of the Computer Science and Engineering Department at Michigan State University. In particular, I want to thank Dr. Abdol Esfahanian, whose teachings had a great influence on me. He is truly a wonderful teacher, who makes learning easy and fun. I also want to thank Mrs. Linda Moore, Mrs. Debbie Kruch, Mrs. Norma Teague and the rest of the staff of the Computer Science and Engineering Department.

I want to thank my colleagues in the Software Engineering and Network Systems Laboratory. Specifically, I want to thank: Bruhadeshwar (Bru) Bezawada, Umamaheswaran (Mahesh) Arumugam, Ali Ebnenasir, Borzoo Bonakdarpour, Farshad Samimi, Masoud Sadjadi, Ji Zhang and Limin Wang.

I want to thank my friends and family for their encouragement and support. No words of gratitude can ever sum up the contribution of my family: my father Mr. Nandkishore Biyani, my mother Mrs. Santosh Biyani, and my sisters Kavita Rathi and Rinku Malpani. And last but not least, my special thanks to my loving wife Kavita Biyani for all her love and sacrifices.


TABLE OF CONTENTS

9.2.1 Abstract Component Structure
9.3 Concrete Component Family
9.3.1 Component Family Interfaces
9.3.2 Component Family Instantiation
9.3.3 Component Family Implementation
9.4 Case Study: Leader Election Component Family
9.4.1 Interfaces of the Family
9.4.2 Classes of the Family
9.4.3 Performance Results
9.5 Case Study: Reliable Communication Component Family
9.5.1 Components of the Family
9.5.2 Adaptations of the Family
9.6 Discussion
9.7 Related Work
9.8 Summary
10 Conclusion and Future Work
10.1 Contributions
10.2 Future Research
APPENDICES
A Model-Checking of Adaptive Leader Election Program
A.1 Adaptive Leader Election Program
A.2 Model-Checking Results
A.2.1 End States
A.2.2 Safety Property of Leader Election
A.2.3 Liveness Property of Leader Election
A.2.4 Safety Property During Adaptation
BIBLIOGRAPHY


LIST OF FIGURES

3.1 An example of an adaptation lattice
4.1 Message communication program (fault-intolerant version)
4.2 Proactive component
4.3 Message communication program (with proactive component)
4.4 Message communication program (fault-intolerant version, adapt-ready)
4.5 Intermediate program P_ip1
4.6 Intermediate program P_ip2
4.7 Intermediate program P_ip3
4.8 Intermediate program P_ip4
4.9 Adaptation lattice for addition of proactive component
4.10 Adaptation lattice for parallel adaptation
5.1 Message communication program (with proactive component, adapt-ready)
5.2 Fault class F1
5.3 Fault class F2
5.4 Acknowledgment component: sender fraction
5.5 Acknowledgment component: receiver fraction
5.6 Message communication program (with reactive component)
5.7 Intermediate program P_aa-ip1
5.8 Intermediate program P_aa-ip2
5.9 Intermediate program P_aa-ip3
5.10 Intermediate program P_aa-ip4
5.11 Intermediate program P_aa-ip5
5.12 Adaptation lattice for replacement of proactive component
6.1 Leader election algorithm based on node Id
6.2 Leader election algorithm based on node value
6.3 Program using ldrId
6.4 Program using ldrVal
6.5 Intermediate program during adaptation from P_ldrId to P_ldrVal
6.6 Adaptive program to adapt from P_ldrId to P_ldrVal
6.7 Quiescence adaptation
6.8 Mixed-mode adaptation
6.9 Quiescence vs. mixed-mode adaptation
6.10 Time for leader election protocols
7.1 Executing three atomic adaptations
7.2 Space-time diagram of adaptation
7.3 Adaptation with concurrency
7.4 Adaptation with (reduced) concurrency
7.5 Adaptation with no concurrency
7.6 Adaptation in publish-subscribe application
8.1 Space-time diagram of a distributed computation
8.2 Identifying intermediate program states
8.3 Algorithm for adaptation-stable predicate (non-checker process p_i)
8.4 Algorithm for adaptation-stable predicate (checker process)
8.5 Algorithm for adaptation-transient predicates (non-checker process p_i)
8.6 Algorithm for adaptation-transient predicate (checker process)
9.1 Reference update during adaptation
9.2 An example of a component family
9.3 Structure of a component in a family
9.4 Architecture of a component family
9.5 Interfaces of a component family
9.6 Adaptation handlers
9.7 An example of delegation pattern
9.8 Interfaces of the leader election component family
9.9 Classes of the leader election component family
9.10 Component family performance
9.11 Reliable communication component family
9.12 Forward error correction component
9.13 Acknowledgment component
9.14 Forward error correction with acknowledgment: sender fraction
9.15 Forward error correction with acknowledgment: receiver fraction
9.16 Adaptations 1 and 2 in reliable communication component family
9.17 Adaptations 3 and 4 in reliable communication component family
9.18 Adaptation lattices


LIST OF TABLES

4.1 State mappings for the adaptation
5.1 State mappings for the adaptation
6.1 Message types used in the protocol
6.2 Variables maintained by each process in the protocol
6.3 Overlap communication between protocols
6.4 State mapping (Φ_aa_i) for each atomic adaptation aa_i


Chapter 1

Introduction

Evolution is vital in all software systems and has been studied for decades in the context of software maintenance and software upgrade. It is a well-known fact in software engineering that programs undergo several changes during their lifetimes. These changes are usually performed to fix bugs or to add new functionality that the users of the systems did not anticipate at the time of the original specifications. Furthermore, software systems have become more and more pervasive with easy access to personal computers, smartphones, and cellular and wireless networks. The challenges in building today's software systems include providing seamless service under heterogeneous operating conditions such as device failures, transitions across wired and wireless environments, high packet loss in wireless networks, Byzantine behaviors, and security attacks.
In order to meet these challenges, software systems must be able to adapt to environment conditions that may not be foreseen at the time of software development.

In general, software systems need to adapt (change) in response to one or more of the following reasons: (i) discovery of errors/bugs, (ii) change in requirements, and (iii) change in environment. In a traditional approach, the change is usually performed by stopping the currently running program and then installing the new version of the program. However, stopping the system to perform the change is undesirable for a variety of reasons: (i) it may be simply inconvenient for users; (ii) it may lead to monetary loss, for example, in the case of banking and e-commerce systems; or (iii) it may be unsafe, for example, in the case of safety-critical systems such as air traffic control systems. Moreover, in systems that need to adapt in response to frequent or transient changes in the environment, stopping the system for an upgrade may not even be an option. Clearly, in all these cases, it is highly desirable to perform the change while the system continues to operate. This type of change is commonly referred to as dynamic adaptation. A software system that supports dynamic adaptation is known as adaptive software.

There has been a growing interest in building adaptive software. An increasing number of systems are now being developed with some built-in mechanisms for adaptation that allow change to be performed without completely stopping the system. Adaptive software techniques (e.g., [1-10]) allow the system to modify its own functional or non-functional behavior (e.g., its fault-tolerance, quality of service or security requirements). These modifications include reconfiguration of some parameters, or addition, removal, or replacement of application code. A survey in [11] presents various tools and techniques for building adaptive software. Additionally, Buckley et al. [12] have given a taxonomy in the context of the when, how, where, and what of software change.

Numerous works in adaptation have focused either on single-process systems or on distributed systems where changes to processes are independent of each other. However, comparatively fewer works have addressed adaptation in distributed systems where changes to multiple processes need synchronization. Moreover, behavioral verification during adaptation in distributed systems that require changes to multiple processes has not been adequately addressed. In the next section, we discuss some of the issues in adaptation in distributed systems.

1.1 Adaptation in Distributed Systems

In distributed systems, multiple processes need to be changed during adaptation. In such adaptations, changes to multiple processes need to be synchronized, and interactions between changed and unchanged processes need to be controlled. We refer to the system before adaptation as the old program and to the system after adaptation as the new program. A process before it is modified is considered a part of the old program, and after modification it is considered a part of the new program. We now give the classification of adaptation in distributed systems.

1.1.1 Classification of Adaptation

We classify adaptation in distributed systems as: (i) overlap adaptation, when the old program and the new program overlap during adaptation, and (ii) non-overlap adaptation, when the old program and the new program are not present in the system simultaneously during adaptation.
Furthermore, we classify overlap adaptation into three main categories: (i) mixed-mode adaptation, (ii) quiescence adaptation, and (iii) parallel adaptation. In the case of quiescence adaptation, which is the most common approach for adaptation in distributed systems, no communication is allowed between the old program and the new program. Consequently, during adaptation, changed and unchanged processes exist in the system simultaneously, but the processes are modified in such a way that the changed process and the unchanged process do not communicate with each other. In contrast, in the case of mixed-mode adaptation, changed processes and unchanged processes are allowed to communicate with each other. In the case of parallel adaptation, each node has both the changed process and the unchanged process, but communication between changed processes and unchanged processes is not allowed; the changed (respectively, unchanged) process at one node can communicate with the changed (respectively, unchanged) process at another node.

To gain assurance in adaptation, formal specification and verification of adaptation is crucial. In the context of adaptive distributed systems, there are three aspects of verification: (i) verifying the system before adaptation, (ii) verifying the system during adaptation, and (iii) verifying the system after adaptation. While existing verification techniques can be used to verify the system before and after adaptation, such techniques cannot be applied directly to verify the system during adaptation. This is because during adaptation the system is changing, whereas existing work assumes that the system remains unchanged. Especially in the case of distributed systems, during adaptation the system exhibits overlapping behavior that is not well specified.

Numerous techniques have been proposed to address various issues in formalizing adaptation. A survey in [13] discusses various approaches based on graphs, process algebras, logic and other formalisms used to specify adaptive systems. Most of the approaches [1, 2, 4, 5, 7, 14-20] focus on design and implementation of adaptive systems. Other approaches [21-24] address the issue of verifying adaptation. The approaches in [21-23] focus on offline adaptation, whereas the approach in [24] focuses on online adaptation of a single-process system (which can also be extended to distributed systems that communicate via RPC). However, none of these approaches explicitly focuses on the behavior of the system during adaptation in distributed systems.

Furthermore, mixed-mode adaptation has not received much attention, as it is normally considered difficult. Most existing approaches avoid dealing with mixed-mode behavior during adaptation by employing non-overlap, quiescence or parallel adaptation. Quiescence adaptation behaves as if the adaptation is performed by changing all processes at the same logical time. There is overhead in performing quiescence adaptation, as a large number of messages are required to enforce synchrony among processes. In contrast, mixed-mode adaptation gives better performance in terms of service interruption time and communication overhead. Nevertheless, developing mixed-mode adaptation correctly involves many challenges.

In the rest of this chapter, we first discuss the advantages of mixed-mode adaptation in Section 1.1.2. Then, in Section 1.1.3, we identify challenges in adaptation that arise or are exaggerated due to the overlapping behavior of mixed-mode adaptation.
Finally, in Section 1.2, we discuss the contributions of this research.

1.1.2 Advantages of Mixed-Mode Adaptation

We expect mixed-mode adaptation to offer the following two main advantages compared to other types of adaptation:

• Reduced service interruption. Since individual processes need not block while waiting for other processes during adaptation, the service interruption time is reduced. This is especially important when adaptation occurs frequently (in response to changes in the environment).

• Low communication overhead. Since mixed-mode adaptation allows the old program and the new program to interact during adaptation, the synchronization required among processes during adaptation is reduced, thereby reducing the communication overhead. This is especially important in systems that operate with limited power or resources.

We validate these advantages in this dissertation using a case study.

1.1.3 Challenges in Mixed-Mode Adaptation

The challenges in mixed-mode adaptation arise because the behavior of the old program and the new program overlap during adaptation. These challenges also occur in other forms of adaptation; here we discuss them in the context of mixed-mode adaptation.

• Consistency. All updates to individual processes must ensure consistency of the whole system. In mixed-mode adaptation, some interactions between the old program and the new program during adaptation may not be acceptable. Only mixed-mode interactions that are acceptable should be allowed to occur. For example, if process X requires a service from process Y, then updating Y before X should be allowed only if the new process Y' can handle the service requests from the old process X (we use the notation Y' to denote process Y after it has been changed).

• State-transfer. To ensure proper mixed-mode operation, each process needs to preserve its state during adaptation. Specifically, if component C1 is replaced by component C2, then the state of component C1 at each process needs to be transferred to component C2. The efficiency of state-transfer mechanisms is particularly important in the case of mixed-mode adaptation compared to other forms of adaptation.

• Assurance. Providing assurance guarantees for mixed-mode adaptation requires a formal way to specify and verify the adaptation. In the case of mixed-mode adaptation, the changes cannot be specified in terms of system structure, because the behavior of the old program and the new program overlap during adaptation. The first challenge is in specifying the overlapping behavior. Next, there are challenges in verification and testing of mixed-mode adaptation due to the overlapping behavior.

• Reuse. There is a large amount of existing work on adaptation. An approach for mixed-mode adaptation should reuse existing adaptation techniques whenever possible.

1.2 Thesis

Based on the above motivation, in this dissertation we propose the following thesis:

Our goal is to develop an approach for assurance of adaptation that applies to both quiescence and mixed-mode adaptations. Lattice-based modeling helps in verifying and testing adaptive behavior in distributed systems.

To defend this, we make the following contributions in this dissertation:

1. Modeling and specification of adaptive behavior. We present the concept of an adaptation lattice to use for modeling and specifying adaptive behavior. The adaptation lattice approach identifies the atomic adaptations and the behaviors of the intermediate programs that occur during adaptation.
2. Verification of adaptive behavior. We present the concept of a transitional-invariant lattice to use for verifying the correctness of the system during adaptation. The transitional-invariant lattice approach ensures that safety is satisfied during adaptation and that adaptation eventually terminates.

3. Fault-tolerance during adaptation. We present the concept of a transitional-faultspan lattice to use for verifying the fault-tolerance properties of the system during adaptation. The approach can be used to verify different types of fault-tolerance during adaptation. Also, the faults considered during adaptation can be different from the faults that the system is subjected to before or after adaptation.

4. Tradeoffs in adaptation. In order to assist the adaptation developer, we identify tradeoffs in adaptation that are useful while designing adaptation in various contexts. We show how concurrency during adaptation leads to increased verification complexity. We also show that increased concurrency during adaptation can increase the communication overhead, which may not be desirable in certain systems.

5. Testing of adaptation. We use predicate detection techniques to test adaptation in distributed systems. We identify two classes of predicates for testing adaptation: (i) adaptation-transient, and (ii) adaptation-stable. We introduce the notion of an adaptation vector to identify states of the intermediate programs during adaptation.

6. Component family design. We describe the design of a component family to support adaptation. The component family design integrates aspects related to decision-making, adaptation logic, and component functionality, while maintaining a strict separation of different concerns. Specifically, the design separates the adapt-active parts of the component from the core functionality of the component. Moreover, the design separates the adaptation logic from the component functionality, thereby simplifying the task of verifying adaptation.

Organization of the dissertation. The remainder of this dissertation is organized as follows. We first discuss the background and related work in adaptation in Chapter 2. In Chapter 3, we describe the adaptation lattice approach to model adaptation and adaptive systems. In Chapter 4, we present the transitional-invariant lattice to verify adaptation in the absence of faults and discuss a case study based on a message communication application to demonstrate its use. In Chapter 5, we present the transitional-faultspan lattice to verify adaptation in the presence of faults and discuss a case study to demonstrate its use. In Chapter 6, we present a case study of mixed-mode adaptation where we show the performance advantage of mixed-mode adaptation. We also discuss how the adaptation lattice approach can be used to verify mixed-mode adaptation. In Chapter 7, we discuss tradeoffs in adaptation, including the effect of concurrency during adaptation. In Chapter 8, we describe the testing of adaptation using predicate detection techniques. In Chapter 9, we describe the component family design to build an adaptation-verified library of components. Finally, we discuss conclusions and future work in Chapter 10.


Chapter 2

Background and Related Work

Adaptation has been studied in various contexts, such as software change, reconfiguration, upgrade or update, change on the fly, program modification, software evolution, etc. In this chapter, we first give an overview of some taxonomies of software adaptation and discuss the areas in adaptation that are the focus of this dissertation. Then, we review some of the related work.
2.1 Adaptation Taxonomy

Several works [11-13, 25-27] have presented taxonomies of adaptation to categorize different adaptation mechanisms. In this section, we discuss some of them.

One of the early works in classifying adaptation was done by Lientz and Swanson [25] in the context of software maintenance. They proposed a typology that distinguishes between perfective, adaptive and corrective maintenance activities. This taxonomy was further refined by Chapin et al. [26], who classify software evolution and software maintenance into 12 different types: evaluative, consultive, training, updative, reformative, adaptive, performance, preventive, groomative, enhancive, corrective and reductive. In essence, the works in [25] and [26] categorize adaptation activities on the basis of their purpose (i.e., the why of software change). In our work, the reason why a software system needs to adapt is irrelevant.

In [12], Buckley et al. gave a taxonomy of adaptation that focuses on the how, when, where, and what of software change. This taxonomy is based on the characteristics of adaptation mechanisms and the factors that influence these mechanisms. The adaptation mechanisms refer to the software tools and algorithms used to perform the adaptation. However, this taxonomy does not consider the formalisms used in adaptation. Regardless, the taxonomy is very useful in identifying and classifying various adaptation scenarios, and in categorizing and comparing different adaptation mechanisms. In our work, we focus on dynamic adaptation, i.e., adaptation performed at run-time (the when of adaptation).

In [11], McKinley et al. classified adaptation into two categories, namely, parameter adaptation and compositional adaptation. Parameter adaptation modifies program variables that determine behavior. In contrast, compositional adaptation exchanges algorithmic or structural system components with others. Compositional adaptation provides more flexibility in supporting a variety of changes compared to simple tuning of program variables in the case of parameter adaptation. However, compositional adaptation involves considerably more challenges in design, implementation and verification. Although our approach in this dissertation focuses on compositional adaptation, it can also be applied in the case of parameter adaptation. Specifically, we consider addition, removal, or replacement of components [28] in this dissertation. A component implements part of the desired behavior of the program. Our notion of component is different from the popular usage of the term in the development community, where a component and a class (or object, an artifact of object-oriented programming) are considered one and the same. According to our definition, a component can consist of more than one class, and may even be deployed at multiple processes.

Dynamically changing software is challenging in terms of correctness, robustness, and efficiency. Formal specification and verification are important in order to gain assurance in adaptive software. A variety of formal specification languages have been developed to gain a better understanding of the foundations of software change. Bradbury et al. [13] give a survey of 14 formal specification approaches based on graphs, process algebras, logic, and other formalisms. Graph-based approaches [14-16] use graph rewriting rules to specify dynamism.
Approaches in [17, 18, 29, 30] use a variety of process algebras, such as the Calculus of Communicating Systems (CCS), Communicating Sequential Processes (CSP), and the π-calculus. Architectural Description Language (ADL) based approaches [19, 20, 31] model programs as components and connectors, and adaptation as reconfiguration of connections. Generally, these approaches have focused on specifying the design of adaptive software, and changes are specified in terms of the system structure. However, the approaches are inadequate for specifying adaptive behavior.

Approaches in [17, 18, 29, 30] are a few exceptions that have used process algebras to specify the behavior of adaptive programs. However, these approaches suffer from the following limitations: (i) the adaptation-specific behavior of the program is not distinctly separated from the non-adaptive behavior of the program, (ii) the approaches are appropriate for specifying changes in client-server applications, but it is not clear how they would apply to protocol changes in a group communication application, (iii) state-transfer is not specified explicitly, so it is not clear whether the new program behavior has to start in the initial state or can start in some arbitrary state, (iv) the approaches are inadequate for specifying mixed-mode behavior during adaptation, and (v) the approaches use specific types of formalisms that may potentially limit widespread use and any extensions to include different types of adaptations. In our work, we address formal specification and verification of the behavior of the system during adaptation.

2.2 Related Work

In this section, we briefly review some of the previous work in the context of verifying adaptation.

2.2.1 DYMOS

Lee presented one of the early systems to support dynamic updating, called DYMOS (Dynamic Modification System) [32]. It supports a single programmer modifying a module-based program dynamically (that is, without stopping its execution). In DYMOS, the programmer modifies and recompiles the source code of procedures and modules that need to be replaced. The programmer then requests the system to change the current core image to incorporate the new code and data. New object code is inserted by a dynamic modification process that is executed in parallel with other user processes.

DYMOS supports programs written in the StarMod language [33]. In the DYMOS environment, each updateable program is associated with a command interpreter, a source code management system, a StarMod compiler and a run-time environment. The system allows individual procedures of a program to be changed.

In the context of ensuring correctness, Lee gave a procedure for partitioning a change into a sequence of smaller changes. The decomposition is done in such a way that the program behaves "acceptably" after each change. The decomposition approach helps in specifying behavior between changes, where the program is operating with some reduced functionality. DYMOS does not address changes in distributed programs; however, it is important as one of the early systems that studied dynamic adaptation with particular concern for correctness. It gave some basic theoretical contributions in the context of correctness of adaptation.

2.2.2 CONIC

Conic [9, 34, 35] is a distributed programming system that supports dynamic reconfiguration of programs. A program in Conic consists of a number of processes that communicate with each other using well-defined entry and exit ports.
Conic modules (processes) do not communicate by naming each other but by naming ports. Thus, reconfiguration can be done by creating instances of modules and linking entry ports of these modules with exit ports of other instances.

The reconfiguration is done by providing a configuration change specification. The specification specifies the creation and deletion of modules and links. The configuration manager translates the change specification into commands to the operating system to execute the reconfiguration operation.

In [9], Kramer and Magee gave a formal basis for dynamic reconfiguration. They specify change as structural change, in terms of component creation/deletion and connection/disconnection. In their model, interactions between processes are considered as transactions. A transaction is an exchange of information between two nodes, initiated by one of the nodes. A node is said to be in a passive state if it is not currently engaged in a transaction that it initiated and it will not initiate new transactions. A node is quiescent if it is in a passive state, is not currently engaged in servicing a transaction, and no transactions have been or will be initiated by other nodes that require service from this node. It is claimed that a dynamic reconfiguration will leave the system in a "consistent" state if all the involved processes are quiescent at the time of reconfiguration.

The main limitation of using node quiescence for adaptation is that it leads to excessive blocking during reconfiguration. Moreover, a large number of messages are required to synchronize all nodes to reach the quiescent state. Furthermore, the local states of nodes are not considered in determining node quiescence. As a result, if the local states of nodes are not consistent at the time of adaptation (reconfiguration), then the new version of the program will start in an inconsistent state.

2.2.3 ARGUS

Argus [36] is a programming language for building reliable distributed applications. It is based on the CLU programming language and provides support for atomic transactions and crash recovery. An application in Argus consists of a set of servers called guardians, which communicate with each other using remote procedure calls.

Dynamic reconfiguration in Argus is described in [37]. Similar to Conic, the connections between guardians can be rerouted dynamically. Guardians are made quiescent before replacement. To reach a quiescent state, a guardian can either abort running transactions or wait for them to complete. Argus suffers from the same limitations as Conic. Additionally, the replacement system requires Argus crash recovery facilities in order to work properly, which may make it difficult to use in other systems.

2.2.4 Online Software Version Change

Gupta and Jalote [24, 38] presented a framework for modeling changes to running programs and used it to study the validity of an on-line change. In their notion of validity, a change is valid if, some time after the change, the process reaches a reachable state of the new program version. Thus, there is a "transition period" following a change, after which the system behaves like the new program. However, it is not clear what behavior is acceptable during the transition period.

In their work, they consider different programming language styles, including imperative languages without procedures, imperative languages with procedures, and object-oriented languages. Their work focuses on change in sequential programs.
They also discuss how their approach can be extended to distributed programs where only one process is changed. For the case where multiple processes are affected by a change, they consider a remote procedure call based model. They stop all processes before the change, which leads to disruption of service until the change is completed. Moreover, their work does not address the validity of an on-line change in a distributed system that uses an unrestricted message passing model.

2.2.5 Model-Based Development of Adaptive Software

Zhang et al. [39-41] proposed a model-based development approach for dynamically adaptive software. Their approach separates the model specifying the adaptive behavior from the model specifying the non-adaptive behavior. They use global invariants to specify properties that should be satisfied by adaptive programs regardless of adaptations. They enumerate different execution domains in which the program is required to execute, and build a state-based model in each domain. They enumerate possible adaptations of the program from one domain to another. Furthermore, they introduce A-LTL, an extension to linear temporal logic, to specify an adaptation from one program to another. They present three semantics of adaptation: one-point, guided and overlap. Similar to our goals, their work also addresses behavioral verification during adaptation. However, they do not consider general safety and liveness properties [42, 43] during adaptation. Their approach seems more suitable for quiescence adaptation, and its application to mixed-mode adaptation is not straightforward.

2.2.6 Others

Several other works that addressed runtime adaptation include Podus [44-46], Durra [47], Polylith [48] and Dynamic ML [49, 50]. Similar to the limitations of the works discussed earlier, these approaches also suffer from one or more of the following limitations: (i) they apply only to a single-process change, (ii) they apply only to distributed systems that communicate via RPC, but not to systems using an asynchronous message passing model, and (iii) they verify only structural changes and do not consider behavioral verification of the system during adaptation. Other surveys of adaptation approaches can be found in [11, 38, 51, 52].


Chapter 3

Modeling Adaptation

In this chapter, we introduce a formal model for adaptation in asynchronous programs. We first present an informal overview of how adaptation occurs in a distributed system in Section 3.1. Then, we use the ideas discussed in Section 3.1 to formalize the model of adaptation in Section 3.2. In Section 3.3, we give definitions used in formal reasoning about the correctness of adaptation. Finally, in Section 3.4, we describe the concrete representation of programs and adaptations using guarded commands.

3.1 Adaptation Overview

We consider compositional adaptation as one that adds, removes, or replaces a component during adaptation. A component implements a part of the desired behavior of the system. A component (formally defined later in this chapter) consists of one or more fractions, where each fraction is associated with one process in the system. For the discussion in this chapter, we assume that adaptation replaces a component; addition and removal can be considered as special cases of replacement. We refer to the component that gets replaced as the old component and the component that replaces the old component as the new component.
To replace an old component with a new component requires replacing each fraction of the old component with the corresponding fraction of the new component at all processes. For example, consider a protocol that provides encrypted communication between a sender and a receiver. Such a protocol consists of two types of fractions, namely, an encryption fraction at the sender that encrypts packets before sending, and a decryption fraction at the receiver that decrypts the encrypted packets received from the sender. To replace such a protocol, each fraction of the protocol needs to be replaced. Thus, adaptation in a distributed program involves multiple steps that are executed at various processes. We consider the replacement of a fraction at a single process as an atomic step of adaptation, and call it an atomic adaptation.

The old program, i.e., the program before adaptation, uses the old component, and the new program, i.e., the program after adaptation, uses the new component. An adaptation replaces the old component with the new component; equivalently, we can say that the adaptation replaces the old program being executed by the system with the new program. We assume that the old program and the new program are independently correct, i.e., by themselves they can execute and produce acceptable behavior. The goal of verifying adaptation is to ensure that: (i) the adaptation ends in a state from where the system satisfies the behavior of the new program, and (ii) the (overlapping) behavior during adaptation is acceptable (as defined by the specification during adaptation).

To verify the behavior during adaptation, we need to classify the states of the program during adaptation. The intermediate states that occur during adaptation are due to overlapping of the old program and the new program. The properties satisfied by these intermediate states may be different from those of the old program and the new program. Consequently, the behavior expected during adaptation needs to be specified separately from the old program and the new program. For example, in the case of the adaptation of the encrypted communication protocol discussed above, consider the system in which the adaptation has replaced the encryption fraction at the sender but has yet to replace the decryption fraction at the receiver. During adaptation, the sender may continue to send packets that are buffered at the receiver, or the sender may be blocked from sending more packets until the receiver has replaced the decryption fraction. Clearly, there are different possible behaviors during adaptation, and the expected behavior during adaptation needs to be specified separately from the behavior of the old program and the new program. Towards this end, we define the notion of an intermediate program.

Intermediate program. An intermediate program arises due to overlapping of the behavior of the old program and the new program. The first atomic adaptation modifies the old program into the first intermediate program. Similarly, other atomic adaptations modify one intermediate program into the next intermediate program. The last atomic adaptation results in the new program. The specification during adaptation identifies the requirements for these intermediate programs.

We now present the formal model for adaptation and adaptive systems.
3.2 Abstract Model of Adaptation

We model a process as an automaton A represented as a tuple (S, Σ, δ, S_0), where

• S(A) - a set of states
• Σ(A) - a set of actions
• δ(A) - a state-transition relation, where δ(A) ⊆ S(A) × Σ(A) × S(A)
• S_0(A) - a nonempty subset of S(A) known as the initial states

Each element (s, π, s') of δ(A) is known as a transition, where s, s' ∈ S(A) and π ∈ Σ(A). If A has a transition (s, π, s'), it means that π is enabled in state s and executing action π in state s will lead to state s'. A transition of the form (s, _, s') is an abbreviation for a transition whose source is s and target is s', where the action that caused the transition is not of interest.

A program consists of a set of process automata. We assume that the sets of actions of the automata are disjoint. We consider asynchronous or interleaved execution for a program, where at any time only a single process can execute its action. This approach can be viewed as a reduction of concurrency to non-determinism, where a concurrent execution gives rise to many possible corresponding interleaving orders. We could have used more complex automata such as [53-55] to model concurrency, but we adopt an interleaving model as it results in a simpler theory for specification and verification of adaptation.

3.2.1 Adaptation as a set of automata

We model an adaptation 𝒜 using a 5-tuple as follows:

• I - a set of automata
• P - the automaton of the old program, P ∈ I
• Q - the automaton of the new program, Q ∈ I
• Σ_a - a set of a special type of actions known as adaptive actions, with Σ_a ∩ (∪_{A∈I} Σ(A)) = ∅
• Smap - a state mapping, a partial function (∪_{A∈I} S(A)) × Σ_a → (∪_{A'∈I} S(A')) that satisfies the following two properties:

  i. ∀s, s', π_a, A_1, A_2 : s ∈ S(A_1), s' ∈ S(A_2), π_a ∈ Σ_a, A_1, A_2 ∈ I : ((s, π_a), s') ∈ Smap ⇒ A_1 ≠ A_2, and

  ii. ∀s_1, s_2, s'_1, s'_2, π_a, A, A_1, A_2 : s_1, s_2 ∈ S(A), s'_1 ∈ S(A_1), s'_2 ∈ S(A_2), π_a ∈ Σ_a, A, A_1, A_2 ∈ I : ((s_1, π_a), s'_1) ∈ Smap ∧ ((s_2, π_a), s'_2) ∈ Smap ⇒ A_1 = A_2

The old program, the new program, and all intermediate programs are modeled as automata. We also assume that the states of the automata in I are pairwise disjoint. Given an adaptive action, the state mapping defines an automaton and the states of that automaton in which the adaptive action can execute, and the resulting automaton and the state of the resulting automaton in which the adaptive action terminates. The state mapping function satisfies two properties. The first property ensures that executing an adaptive action results in a change of automaton (whereas executing an ordinary action results in a state of the same automaton). The second property states that if an adaptive action can be executed in different states of an automaton, then it will result in a unique automaton (the resulting states may be different, but they will be states of the same automaton).

Note that the state mapping is a partial function, as it may not be possible to perform the corresponding atomic adaptation in all states. Each element ((s, π_a), s') of Smap can be represented as a triplet (s, π_a, s'). Similar to the state-transition relation of an automaton, a state mapping Smap can be defined as a subset of (∪_{A∈I} S(A)) × Σ_a × (∪_{A'∈I} S(A')) with the restriction that if (s, π_a, s') ∈ Smap and (s, π_a, s'') ∈ Smap then s' = s''. Each element of Smap is known as an adaptive transition. If Smap has an adaptive transition (s, π_a, s'), where s ∈ S(A) and s' ∈ S(A'), it means that π_a is enabled in A and executing π_a in state s of A will lead to state s' of A'.
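To make the tuple definitions concrete, the following sketch shows one way the model and the two properties of Smap could be represented and checked. It is a minimal illustration assuming Python-style data structures; the names (Automaton, check_smap, etc.) are ours and not part of the dissertation's formalism.

```python
# Illustrative only: a minimal encoding of the abstract model, assuming that
# states of different automata are distinct hashable values.
from dataclasses import dataclass

@dataclass(frozen=True)
class Automaton:
    name: str
    states: frozenset        # S(A)
    actions: frozenset       # Sigma(A)
    delta: frozenset         # subset of S(A) x Sigma(A) x S(A)
    initial: frozenset       # S_0(A), a nonempty subset of S(A)

def check_smap(automata, smap):
    """Check properties (i) and (ii) of the state mapping Smap.

    automata: the set I; smap: dict mapping (state, adaptive action) -> state.
    """
    def owner(state):
        # State sets of the automata in I are pairwise disjoint, so each
        # state belongs to exactly one automaton.
        return next(a for a in automata if state in a.states)

    target_of = {}
    for (s, pi_a), s2 in smap.items():
        a1, a2 = owner(s), owner(s2)
        if a1 is a2:
            # (i): an adaptive action must change the automaton
            return False
        if target_of.setdefault((a1, pi_a), a2) is not a2:
            # (ii): same source automaton and adaptive action => unique target
            return False
    return True
```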
Note that the range of Smap is S(A') and not S_0(A'). In other words, we do not require an adaptive action to terminate in an initial state of the resulting automaton.

Now, given the state mapping of adaptation 𝒜, we can define an automata-transformation (partial) function δ_a : I × Σ_a → I. We have ((A, π_a), A') ∈ δ_a iff ∃s, s' : s ∈ S(A), s' ∈ S(A') : (s, π_a, s') ∈ Smap. Each element ((A, π_a), A') (equivalently, (A, π_a, A')) of δ_a is known as an atomic adaptation. Thus, each atomic adaptation is modeled as transforming one automaton to another automaton.

The automata-transformation function represents an adaptation lattice, defined as follows:

Adaptation Lattice. An adaptation lattice (cf. Fig. 3.1) is a finite directed acyclic graph in which each node is labeled with an automaton and each edge is labeled with an atomic adaptation, such that:

1. There is a single start node P having no incoming edges. The start node is associated with the automaton representing the old program. The automata-transformation function (correspondingly, Smap) satisfies the following condition:
∀A, π_a :: (A, π_a, P) ∉ δ_a

2. There is a single end node Q having no outgoing edges. The end node is associated with the automaton representing the new program. The automata-transformation function (correspondingly, Smap) satisfies the following condition:
∀A, π_a :: (Q, π_a, A) ∉ δ_a

3. Each intermediate node R has at least one incoming edge and at least one outgoing edge. It is associated with the automaton representing an intermediate program. The automata-transformation function (correspondingly, Smap) satisfies the following condition:
∀A : A ≠ P : (∃A', π_a :: (A', π_a, A) ∈ δ_a) ∧ ∀A : A ≠ Q : (∃A', π_a :: (A, π_a, A') ∈ δ_a)

A path in the lattice from the start node to the end node is called an adaptation path.

Figure 3.1: An example of an adaptation lattice.

3.2.2 Adaptation as an automaton

In the previous subsection, we defined adaptation as a 5-tuple. An adaptation 𝒜 can also be viewed as an automaton, defined as follows:

• S(𝒜) = ∪_{A∈I} {(A, s) | s ∈ S(A)}
• Σ(𝒜) = ∪_{A∈I} {(A, π) | π ∈ Σ(A)} ∪ Σ_a
• δ(𝒜) = ∪_{A∈I} {((A, s), (A, π), (A, s')) | (s, π, s') ∈ δ(A)} ∪ {((A, s), π_a, (A', s')) | (s, π_a, s') ∈ Smap}
• S_0(𝒜) ⊆ {(P, s) | s ∈ S(P)}

In the definition of S_0(𝒜) we use S(P), and not S_0(P), because adaptation should be able to start at any point in the execution of P.

Modeling adaptation as an automaton allows us to verify some general properties of adaptation not concerning any overlapping behavior. On the other hand, modeling adaptation as a set of automata is important to identify individual intermediate automata during adaptation, so as to verify properties due to the overlapping behavior of the old program and the new program.

3.3 Adaptation Specification

In this section, we give some formal definitions used in specifying and verifying adaptive programs. We adapt these definitions from Arora and Kulkarni [56, 57].

Definition (State predicate). A state predicate X of A is any subset of S(A). We say X is true in state s if s ∈ X.

Definition (Closure). A state predicate X is closed in a set of transitions δ iff the following condition holds:
∀s, s', π :: ((s, π, s') ∈ δ) ⇒ (s ∈ X ⇒ s' ∈ X)
X is closed in A (respectively, 𝒜) iff it is closed in δ(A) (respectively, δ(𝒜)).

Definition (Computation). A computation of program A (respectively, adaptation 𝒜) is a sequence of states σ = ⟨s_0, s_1, ...⟩
satisfying the following conditions:

• For the first state s_0 in σ, s_0 ∈ S_0(A) (respectively, S_0(𝒜)).
• If σ is infinite, then ∀j : j > 0 : (∃π :: (s_{j-1}, π, s_j) ∈ δ(A)) (respectively, δ(𝒜)).
• If σ is finite and terminates in state s_l, then for all π, there does not exist a state s such that (s_l, π, s) ∈ δ(A) (respectively, δ(𝒜)), and ∀j : 0 < j ≤ |σ| : (∃π :: (s_{j-1}, π, s_j) ∈ δ(A)) (respectively, δ(𝒜)).

Definition (Specification). A specification of A is a set of computations. Given a specification, a computation in the specification is known as an acceptable computation. Following Alpern and Schneider [42], a specification can be decomposed into a safety specification and a liveness specification. As shown in [58], for a rich class of specifications, the safety specification can be represented as a set of bad transitions that must not occur in program computations.

Definition (Satisfies). A satisfies a specification iff each computation of A is in the specification. A satisfies a specification from X iff (i) X is closed in A, and (ii) each computation of A is in the specification and starts from a state where X is true (i.e., S_0(A) ⊆ X).

Definition (Invariant). The state predicate X of A is an invariant iff A satisfies the specification from X. Note that, if X is an invariant of A, then X ⊇ S_0(A). Informally speaking, the invariant predicate includes the set of all states reached in the "acceptable" (correct) computations of A. Note that the invariant predicate may include states that are not reachable in all computations of the program. However, computations from those states satisfy the specification and, hence, those states may be valuable in adding recovery transitions to provide fault tolerance [58].

Definition (Safety during adaptation). Similar to the specification of A, the safety specification during adaptation 𝒜 is specified as a set of bad transitions that must not occur in computations of adaptation 𝒜.

Liveness during adaptation. We argue that the specification during adaptation should be a safety specification. This is due to the fact that one often wants the adaptation to be completed as quickly as possible. Hence, it is desirable not to delay adaptation in order to satisfy a liveness specification during adaptation. Rather, it is desirable to guarantee that, after adaptation, the program reaches states from where its (new) safety and liveness specifications are satisfied. Thus, the implicit liveness specification during adaptation is that the adaptation completes. In other words, the liveness specification during adaptation is that each intermediate program eventually executes its adaptive action. For these reasons, we have omitted the representation of the liveness specification of the program.

3.3.1 Fault-tolerance

In this subsection, we give formal definitions for specifying and verifying fault-tolerance properties of adaptive programs. These definitions are also adapted from Arora and Kulkarni [56, 57].

Definition (Fault class). Let Σ_f be a set of fault actions. A fault class F(A) for program A is a subset of the set S(A) × Σ_f × S(A). We use A[]F to denote the transitions obtained by taking the union of the transitions in δ(A) and the transitions in F(A). A fault class F(𝒜) for adaptation 𝒜 is:
∪_{A∈I} {((A, s), π_f, (A, s')) | (s, π_f, s') ∈ F(A)}

Definition (Fault-span). A state predicate T is a fault-span (F-span) of A from invariant S iff (i) S ⊆ T, and (ii) T is closed in A[]F.
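Because the closure and fault-span conditions above are purely set-theoretic, they can be checked directly on an explicit transition relation. The sketch below is a minimal illustration under our own encoding (a transition abbreviated to a (source, target) pair, actions elided); the function names and the tiny example are ours, not the dissertation's.

```python
# Illustrative only: closure and fault-span checks over explicit transitions.
def is_closed(X, transitions):
    """X is closed in a set of transitions iff no transition leaves X."""
    return all(t in X for (s, t) in transitions if s in X)

def is_fault_span(T, S, delta, faults):
    """T is an F-span of the program from invariant S iff
    (i) S is a subset of T, and (ii) T is closed in A[]F (program + faults)."""
    return S <= T and is_closed(T, delta | faults)

# Tiny hypothetical example: the program stays within the invariant {0, 1};
# a fault may perturb the state to 2, from where the program recovers to 0.
delta = {(0, 1), (1, 0), (2, 0)}
faults = {(1, 2)}
S, T = {0, 1}, {0, 1, 2}
assert is_closed(S, delta) and is_fault_span(T, S, delta, faults)
```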
A fault-span of a program includes the set of states that the program can reach in the presence of faults, and it is closed under the execution of program and fault actions.

Definition (Computation in presence of faults). A computation of program A (respectively, adaptation 𝒜) in the presence of faults is a sequence of states σ = ⟨s_0, s_1, ...⟩ satisfying the following conditions:

• For the first state s_0 in σ, s_0 ∈ S_0(A) (respectively, S_0(𝒜)).
• If σ is infinite, then ∀j : j > 0 : (∃π :: (s_{j-1}, π, s_j) ∈ δ(A) ∪ F(A)) (respectively, δ(𝒜) ∪ F(𝒜)).
• If σ is finite and terminates in state s_l, then for all π, there does not exist a state s such that (s_l, π, s) ∈ δ(A) (respectively, δ(𝒜)), and ∀j : 0 < j ≤ |σ| : (∃π :: (s_{j-1}, π, s_j) ∈ δ(A) ∪ F(A)) (respectively, δ(𝒜) ∪ F(𝒜)).
• If σ is infinite, then ∃n : n ≥ 0 : (∀j : j > n : (∃π :: (s_{j-1}, π, s_j) ∈ δ(A))) (respectively, δ(𝒜)).

The first requirement captures that the computation begins in an initial state of the program (respectively, adaptation). The second requirement captures that in each step, either a program (respectively, program or adaptive) transition or a fault transition is executed. The third requirement captures that faults do not have to execute, i.e., if the program reaches a state where only a fault transition can be executed, then the fault transition need not be executed. Finally, the fourth requirement captures that the number of fault occurrences in the computation is finite. This requirement is the same as that made in previous work [59-62] to ensure that eventually recovery can occur.

Definition (Fault-tolerance (F-tolerant)). A is F-tolerant for specification spec from S iff the following two conditions hold: (i) A satisfies spec from S, and (ii) there exists T such that T is an F-span of A from S, and every computation of A[]F starting in a state where T is true satisfies spec.

Remark 1. Henceforth, whenever the invariant S, the program A, and the specification spec are clear from the context, we will omit them; thus, "T is an F-span of A from S for spec" abbreviates to "T is an F-span".

Remark 2. Different types of tolerance specifications that normally occur in practice, namely, masking, fail-safe, and non-masking tolerance, have been considered in [56-58]. In this dissertation, we assume masking fault-tolerance unless specified otherwise. The definitions can easily be extended to consider fail-safe and non-masking tolerance during adaptation.

3.4 Concrete Representation

In this section, we discuss the programming notation we use to describe the system. For brevity, we express programs using guarded commands [63, 64]. This gives a compact representation of the program defined as automata in Section 3.2 (in terms of state space and transitions). Translating the guarded command representation of a program to its automata representation is straightforward, as we discuss in this section.

Furthermore, the guarded command representation is closely related to a concrete implementation. Specifically, techniques for obtaining a guarded command representation from a program written in a general purpose language, such as C, are discussed in [65]. Also, techniques for transforming a program in guarded commands into a program in general purpose languages are discussed in [66-68].

3.4.1 Program

A program P is specified by a finite set of processes and channels. A process p is specified by a set of variables and a finite set of actions.
The processes in a program communicate with one another by sending and receiving messages over unbounded channels that connect the processes. A channel from process p to process q is denoted by a channel variable C_{p,q}, which is an unbounded queue. Only process p can append an item of data to the rear of the queue C_{p,q}, and only process q can delete an item at the head of the queue C_{p,q}.

Each variable has a predefined nonempty domain. A state of a process is obtained by assigning each variable a value from its respective domain. The state of the channel connecting p and q is given by the value of the queue C_{p,q}. The state of the program is given by the states of all the processes and channels. The state space of the program P, S(P), is the set of all possible states of P. We use s(x) to denote the value of variable x in state s, and V(p) to denote the set of variables of process p. A state predicate of P is a boolean expression over process and channel variables. Note that a state predicate may be characterized by the set of all states in which its boolean expression is true and, therefore, is a subset of the state space of the program.

Action. An action of p is uniquely identified by a name, and is of the form

    ⟨name⟩ : ⟨guard⟩ → ⟨statement⟩

The guard of each action is a state predicate of P. The statement of each action is such that its execution updates zero or more process or channel variables. The sending of a message from p to q is modeled by appending the message to the tail of the queue C_{p,q}. The receipt of a message from p by q is modeled by removing a message from the head of the queue C_{p,q}. The set of actions of the program P, Σ(P), is given by the set of names of all the actions of all the processes of P. Each action of p defines the set of transitions of the form (s, π, s′) such that the guard of action π is true in state s and execution of the statement of π in s results in state s′. Thus, the state-transition relation δ(P) is obtained from the set of actions of all the processes of P. We say that an action of p is enabled in a state of p iff its guard evaluates to true in that state.

3.4.2 Component

A component is specified by a finite set of fractions that are involved in providing a common functionality. Intuitively, a component implements a part of the desired behavior of the system, such as an algorithm or a protocol. A component fraction is specified by a set of variables and a finite set of actions that are associated with a single process. A component (respectively, fraction) is syntactically the same as a program (respectively, process), with the only difference that some variables of the component are designated as input, whose values are supplied by the program with which it is composed. The composition of the component and the program is the union of the variables and actions of the component and the program.

3.4.3 Adaptive action

An adaptive action is a special type of action, identified by a unique name, of the form

    ⟨name⟩ : ⟨guard⟩ → TransformTo(p′, Φ)

If the adaptive action is an action of process p, then when the statement of the adaptive action is executed, p is replaced by p′ and the state mapping Φ is used to initialize the variables of p′. Each adaptive action π_a defines a set of adaptive transitions of the form (s, π_a, s′) such that the guard of π_a is true in state s of process p and execution of the statement of π_a results in state s′ = Φ(s) of process p′. The state mapping function Smap(𝒜) is obtained from the set of all adaptive actions.
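Under this model, an action is simply a (guard, statement) pair over the program state, and an adaptive action additionally applies a state mapping Φ. A minimal interpreter sketch, again with assumed names (Python), is:

    import random

    def step(state, actions):
        # One interleaving step: execute a nondeterministically chosen
        # enabled action; return False if no action is enabled (termination).
        enabled = [stmt for (guard, stmt) in actions if guard(state)]
        if not enabled:
            return False
        random.choice(enabled)(state)
        return True

    def transform_to(state, phi, new_actions):
        # Adaptive action: install the new process's actions and initialize
        # its variables via the state mapping phi.
        state.update(phi(state))
        return new_actions

    # Example: a counter process with one action, then an adaptive step
    # whose mapping carries x over into a new variable y.
    state = {'x': 0}
    actions = [(lambda s: s['x'] < 3, lambda s: s.update(x=s['x'] + 1))]
    while step(state, actions):
        pass                                   # runs until no guard holds
    actions = transform_to(state, lambda s: {'y': s['x']}, [])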
From a modeling perspective, we consider that the adaptive action replaces the entire process, even if only a small part of it actually changes. In an actual implementation, the adaptive action can be performed in various ways, such as by blocking the execution of some method, or by loading/unloading some class. For verification, however, we need to consider only the effect of the adaptive action. Additionally, treating each adaptive action as a generic form of process replacement gives the developer the freedom to implement the adaptive action based on the platform and the language used.

3.4.4 State mapping

We define the following classes of state mapping Φ that occur during atomic adaptation:

• Identity mapping. In identity mapping, the names and the values of the variables remain the same. Formally, V(p) ⊆ V(p′) and, for all s, (Φ(s))(y) = s(y).

• Quasi mapping. In quasi mapping, the name of a variable of the new process differs from that of the old process, though its value is the same as the value of some equivalent variable in the old process state. Formally, for a variable y of V(p′), there exists a variable x of V(p) such that, for all s, (Φ(s))(y) = s(x).

• Initial mapping. In initial mapping, a variable y of V(p′) is assigned a fixed initial value from its domain. Formally, for all s, (Φ(s))(y) = y_d, where y_d ∈ D(y) and D(y) denotes the domain of variable y.

• Mixed mapping. Most mappings that occur in practice are mixed mappings, in which the variables of the new process V(p′) are divided into disjoint sets and one of the above mappings is associated with each set.

Notation. We use "." to denote the belongs-to relation. For example, if variable v belongs to process p, it is denoted by p.v, and action a of process p is denoted by p.a. A process p of program P is denoted by P.p, and a fraction i of component C is denoted by C.i. For brevity, we avoid the belongs-to relation when it is obvious from the context.

Chapter 4

Verifying Adaptation in Absence of Faults

In this chapter, we introduce the notions of transitional-invariant and transitional-invariant lattice to verify the correctness of adaptation. We first define transitional-invariant in Section 4.1. Next, in Section 4.2, we define transitional-invariant lattice and give a theorem for proving correctness of adaptation. In Section 4.3, we present a case study of adaptation in a message communication application to demonstrate the use of a transitional-invariant lattice. Finally, we discuss some of the questions raised by this work in Section 4.4.

4.1 Transitional-Invariant

As discussed in Chapter 3, the program during adaptation consists of actions of the old program and actions of the new program. Therefore, we consider intermediate programs obtained after one or more atomic adaptations. Similar to the invariants that are used to
The start node P is associated with an invariant Sp of the program before adaptation. 2. Safety of new program. The end node Q is associated with an invariant SQ of the program after adaptation. 3. Safety of intermediate program. Each intermediate node R is associated with a predicate TS R that is a transitional-invariant for any intermediate program at R (i. e., an intermediate program obtained by performing adaptations from the entry node to R). Furthermore, any intermediate program at R satisfies the (safety) specification during adaptation fiom TS R- 36 for are 5 durii 11111 8&1 n and 51.11 4. Safety of adaptive action. If a node labeled R,- has an outgoing edge labeled a to a node labeled Rj, then for all adaptive transitions (5, a, s’) in Smap where TS Rt is true in state 3, TSR]. is true in state 3’. In other words, Vs, s’ : (s, a, s’) E Smap : s E TS R,- => 3’ E TS Rj‘ Furthermore, all the adaptive transitions (3, a, s’) satisfy the safety specification during adaptation. 5. Progress of adaptation. If a node labeled R has outgoing edges labeled a1, a2, ..., ak to nodes labeled R1, R2, ..., Rk, respectively, then in all computations of adaptation there exists a transition (3, s') such that for some 2' : 1 g 2'. g k : (s,a~,s’) E Sma . Furthermore, ‘v’s : s E TS :(Va,s’ : a E Ea — a ,...,a : 2 P R 1 k (3,11, 8') 6! Smapl- Correctness of adaptation. Intuitively, an adaptation is correct if the following conditions are satisfied: If the adaptation begins in a legitimate state of the old program, then safety during adaptation is met and the resulting state of the new program is legitimate. With this intuition, if adaptation begins in a state where invariant of the old program is true, then we say that adaptation is correct if: 0 Adaptation terminates in a state where invariant of the new program is true 0 During adaptation safety specification during adaptation is satisfied 0 Eventually adaptation terminates The following theorem states that finding a transitional-invariant lattice is necessary and sufficient for proving correctness of adaptation. 37 im' lsfie dun Iallw the : Theorem 4.1. Given Sp as the invariant of the program before adaptation and SQ as the invariant of the program after adaptation, the adaptation from P to Q is correct if and only if there is a transitional-invariant lattice for the adaptation with the start node associated with Sp and the end node associated with SQ. Proof (=>) If the transitional-invariant lattice exists, then adaptation is correct. If the stated conditions are satisfied, then the specification of the old program is sat- isfied when the adaptation starts. Also, the existence of the transitional-invariant lattice during adaptation ensures that for each intermediate program that occurs during adap- tation, the specification during adaptation is satisfied. Moreover, from the definition of the transitional-invariant lattice, each adaptive action satisfies safety specification during adaptation. Also, in each intermediate program eventually some adaptive action will be executed, which ensures the liveness of adaptation. Furthermore, the last adaptive action terminates in the invariant of the new program, from where the system satisfies the behav- ior of the new program. Thus, the existence of the transitional-invariant lattice proves the correctness of adaptation. (<2) If adaptation is correct, then transitional-invariant lattice exists (proof by construction). Let adaptation consist of n adaptive actions a1, ..., an. 
Consider all adaptive actions that can occur in a state of the old program. Since the adaptation is correct, each of these adaptive actions occurs in a state of the old program where S_P holds, and execution of these adaptive actions satisfies safety during adaptation.

Now, consider the intermediate program I1 reached after execution of a1. In all computations of the old program up to the execution of a1, the invariant S_P is satisfied, since the old program is correct. Since the adaptation is correct, the intermediate program I1 satisfies the specification during adaptation (otherwise, a1 would not be permitted in a state of the old program). Once a1 is executed, we consider all the computations of the intermediate program I1 and identify the transitional-invariant TS_{I1} associated with it. Similarly, we consider all computations of the intermediate program reached after the execution of some adaptive action other than a1 in a state of the old program, and find the transitional-invariant corresponding to that intermediate program. In this way, we construct the first level of intermediate programs and corresponding transitional-invariants starting from the old program.

Next, for each intermediate program at the first level, we consider all possible adaptive actions that can occur in some state of its computations. We can then identify the transitional-invariants at the second level in the lattice by considering all the computations of the intermediate programs reached by executing adaptive actions in states of the corresponding first-level intermediate programs. Continuing in this way, we find transitional-invariants at the various levels in the lattice. Since the adaptation is correct, the atomic adaptation in each intermediate program at level n−1 results in a state of the new program where S_Q holds. Thus, the correctness of the adaptation proves the existence of the transitional-invariant lattice. □

4.3 Case Study: Reliable Message Communication

In this section, we present an example that illustrates how the transitional-invariant lattice can be used to verify correctness of adaptation in the context of a simple message communication program. The communication program we consider is an abstraction of the communication aspect of applications such as video conferencing, audio streaming, or any distributed application where messages are transferred over a wired or wireless channel. We first describe the fault-intolerant message communication program in Section 4.3.1. Then, we describe the FEC-based proactive component in Section 4.3.2. Next, in Section 4.3.3, we discuss the adaptation of adding the proactive component to the fault-intolerant message communication program, and we identify the transitional-invariant lattice for the adaptation. In Chapter 5, we continue with the message communication program to discuss the adaptation of replacing the proactive component with the acknowledgment-based reactive component.

4.3.1 Fault-Intolerant Communication Program

Specification of the communication program. An infinite queue of messages at sender process s is to be sent to two receiver processes r1 and r2 via two unicast channels and copied into corresponding infinite queues at the receivers. Faults may cause loss of messages in the channels. The message communication program is shown in Figure 4.1. Only the send and receive actions of the program are shown, since only those actions are considered for adaptation. Processes s, r1, and r2 maintain queues sQ, r1.rQ, and r2.rQ, respectively.
Only send and receive actions of the program are shown, since only those actions are considered for adaptation. Processes 3, r1, and r2 maintain queues sQ, r1.rQ, and r2.rQ respectively. sQ con- 40 program 7pintol process 3 var sQ : queue of integer m : integer begin send : pisEmpty(sQ) —+ m :2 head(sQ); CS,TerS,T2 3: Cs,r1 0 m, 08,7‘2 0 777. end process ri[i = 1,2] var rQ : queue of integer m :integer begin receive : -dsEmpty(Cs,rz-) ——+ m := head(C3,rz-); rQ :2 'rQ o m end Figure 4.]: Message communication program (fault-intolerant version). tains messages that 3 needs to send to 7'1 and r2. The messages received by r,- from s are stored in 7310. Let mQ be the queue of all messages to be sent. (mQ is an auxiliary vari- able that is used only for the proof.) Initially, sQ = 77162. The function head(sQ) returns the message at the front of 3Q, and head(sQ, It) returns k messages from the front of 362. The notation 362 o d denotes the concatenation of SC) and (d). Invariant. The invariant of the communication program is Sp = S 1 /\ S2, where 51 2 Vi : (m,- E r1.rQ V m,- E r2.rQ) => mi E mQ, and 5'2 2 Vi : m,- E mQ => (m,- E sQ 2 ((m2 E C3,?1 X m,- E r1.rQ) /\ (m,- E C5”? Y m,- E r2.rQ))). In the above invariant, S 1 indicates that messages received by the receivers are sent by the sender. S2 indicates that a message m, is not yet sent by the sender, or it is in the channel, or it is already received by the receiver, all exclusive. 41 Notation. The s bol X denotes exact! one o erator, i. e., a: )1 X 2 im lies exact] one of ym y P y P Y x, y and z is true. 4.3.2 Proactive Component The proactive component sends extra messages to the receiver, which the receiver can use to recover from the lost messages. It consists of two types of fractions: encoder and decoder. The encoder fraction is added at the sender process and the decoder fraction is added at the receiver process. The encoder takes (n — k) data packets and encodes them to add I: parity packets. It then sends the group of 71. (data and parity) packets. The decoder needs to receive at least (it — k) packets of a group to decode all the data packets. This component provides tolerance to certain message loss faults (discussed in Chapter 5). Figure 4.2 shows the abstract version of the proactive component. The encoder and the decoder fractions of the component are shown. The encoder fraction consists of two actions: encode and f ec-send. The decoder consists of two actions: decode and fec-receive. These fractions are composed with the process that will use them. The composition of a fi'action and a process is done by union, which is equivalent to com- bining the actions of the fraction and the process. We assume that appropriate renaming is performed so that there are no inconsistencies in the definitions of the variables of the fractions and the processes. The message communication program composed with the proactive component is shown in Figure 4.3. Specification of program using the proactive component. The program using the proac- tive component satisfies the same specification as the communication program. 42 Component fec Fraction encoder inp sQ : queue of integer 7‘1,7‘2 var n, k, u, l, m : integer {initially, it = l = m = 0} ean : array [integer, 0..n — 1] of integer {initially, ean = .L} begin encode : true —> ean[u, 0..n — 1] z: fec_encode(head(sQ, n —— k)); u := u + 1 [] fec_send : ean[l,m] 74 J. 
                  → C_{s,r1} := C_{s,r1} ∘ {l, m, ean[l, m]};
                    C_{s,r2} := C_{s,r2} ∘ {l, m, ean[l, m]};
                    m := (m + 1) mod n;
                    if m = 0 then l := l + 1 fi
  end
  Fraction decoder_i
  inp rQ : queue of integer
      s
  var n, k, x, y, p, m : integer {initially, p = 0}
      rbufQ : array [integer, 0..n−1] of integer {initially, rbufQ = ⊥}
  begin
    fec_receive : ¬isEmpty(C_{s,ri}) → x, y, m := head(C_{s,ri});
                                       rbufQ[x, y] := m
    [] decode : count(rbufQ[p, 0..n−1] ≠ ⊥) ≥ (n − k)
                  → rQ := rQ ∘ fec_decode(rbufQ[p, 0..n−1]); p := p + 1
  end

Figure 4.2: Proactive component.

Invariant. The invariant of the program using the proactive component is S_Q = S1 ∧ S_F, where

S_F = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕
        ((m_i ∈ r1.rQ ⊕ m_i ∈ data(ean ∪ C_{s,r1} ∪ r1.rbufQ)) ∧
         (m_i ∈ r2.rQ ⊕ m_i ∈ data(ean ∪ C_{s,r2} ∪ r2.rbufQ)))).

Program P_fec
  process s
  var : P_intol.s.var ∪ fec.encoder.var
  begin
    fec.encoder.encode
    [] fec.encoder.fec_send
  end
  process r_i [i = 1, 2]
  var : P_intol.r_i.var ∪ fec.decoder_i.var
  begin
    fec.decoder_i.fec_receive
    [] fec.decoder_i.decode
  end

Figure 4.3: Message communication program (with proactive component).

We use the notation m_i ∈ data(ean ∪ C_{s,r1} ∪ r1.rbufQ) to mean that message m_i can be generated from the data in {ean ∪ C_{s,r1} ∪ r1.rbufQ}. In the above invariant, S_F indicates that each message is either at the sender, or already received by the receiver, or can be generated from the data in the channel and the buffers at the sender and the receiver.

4.3.3 Adaptation: Addition of the Proactive Component

The adaptation of adding the proactive component converts the program shown in Figure 4.1 into the one shown in Figure 4.3. We first require an adapt-ready version of the program P_intol, as shown in Figure 4.4. (An adapt-ready program is one that is composed with adaptive actions.) We now give the specification during adaptation for adding the proactive component.

Specification during adaptation. The specification during adaptation is that S1 continues to be true during adaptation.

program P_intol (adapt-ready)
  process s
  var : P_intol.s.var
  begin
    P_intol.s.send
    [] a1 : true → TransformTo(P_a-ip1.s, Φ_a1);
  end
  process r_i [i = 1, 2]
  var rQ : queue of integer
  begin
    P_intol.r_i.receive
    [] a_(i+1) : a1 ∧ isEmpty(C_{s,ri}) → TransformTo(P_fec.r_i, Φ_a(i+1));
  end

Figure 4.4: Message communication program (fault-intolerant version, adapt-ready).

We describe the adaptation by identifying the intermediate programs and the corresponding transitional-invariants after each atomic adaptation.

program P_a-ip1
  process s
  var : P_intol.s.var
  begin
    a4 : a2 ∧ a3 → TransformTo(P_fec.s, Φ_a4);
  end
  process r_i [i = 1, 2] : same as in Fig. 4.4

Figure 4.5: Intermediate program P_a-ip1.

The execution of adaptive action a1 in P_intol results in the intermediate program P_a-ip1 shown in Figure 4.5. P_a-ip1 does not send any packets, but the packets already in the channels can still be received by the receivers r1 and r2. In the execution of P_a-ip1, eventually all the packets in the channels are read and no new packets are added to the channels from the sender to the receivers. Thus, the guards of the adaptive actions a2 and a3 eventually become enabled. The transitional-invariant of P_a-ip1 is TS1 = S1 ∧ S2, where S1 and S2 are as defined in Section 4.3.1.

program P_a-ip2
  process s : same as in Fig. 4.5
  process r1 : same as in Fig. 4.3
  process r2 : same as in Fig. 4.4

Figure 4.6: Intermediate program P_a-ip2.

Since a2 and a3 occur independently, we consider both possible orderings among them.
The execution of adaptive action a2 in P_a-ip1 results in the intermediate program P_a-ip2 shown in Figure 4.6. In P_a-ip2, receiver r1 has replaced its fraction, whereas receiver r2 has not yet replaced its fraction and can receive any remaining packets in the channel from s to r2. Eventually, in the execution of P_a-ip2, the guard of adaptive action a3 becomes enabled and a3 is executed, resulting in the intermediate program P_a-ip4. The transitional-invariant of P_a-ip2 is TS2 = S1 ∧ S3, where

S3 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ ((m_i ∈ r1.rQ) ∧ (m_i ∈ C_{s,r2} ⊕ m_i ∈ r2.rQ)))
     ∧ isEmpty(C_{s,r1}) = true ∧ r1.rbufQ = ⊥ ∧ r1.p = 0.

program P_a-ip3
  process s : same as in Fig. 4.5
  process r1 : same as in Fig. 4.4
  process r2 : same as in Fig. 4.3

Figure 4.7: Intermediate program P_a-ip3.

The execution of adaptive action a3 in P_a-ip1 results in the intermediate program P_a-ip3 shown in Figure 4.7. In P_a-ip3, receiver r2 has replaced its fraction, whereas receiver r1 has not yet replaced its fraction and can receive any remaining packets in the channel from s to r1. Eventually, in the execution of P_a-ip3, the guard of adaptive action a2 becomes enabled and a2 is executed, resulting in the intermediate program P_a-ip4. The transitional-invariant of P_a-ip3 is TS3 = S1 ∧ S4, where

S4 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ ((m_i ∈ r2.rQ) ∧ (m_i ∈ C_{s,r1} ⊕ m_i ∈ r1.rQ)))
     ∧ isEmpty(C_{s,r2}) = true ∧ r2.rbufQ = ⊥ ∧ r2.p = 0.

program P_a-ip4
  process s : same as in Fig. 4.5
  process r_i [i = 1, 2] : same as in Fig. 4.3

Figure 4.8: Intermediate program P_a-ip4.

In the intermediate program P_a-ip4 shown in Figure 4.8, only adaptive action a4 is enabled, and execution of a4 results in the new program P_fec. The transitional-invariant of P_a-ip4 is TS4 = S1 ∧ S5, where

S5 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ (m_i ∈ r1.rQ ∧ m_i ∈ r2.rQ))
     ∧ isEmpty(C_{s,r1}) = true ∧ r1.rbufQ = ⊥ ∧ r1.p = 0
     ∧ isEmpty(C_{s,r2}) = true ∧ r2.rbufQ = ⊥ ∧ r2.p = 0.

We now give the state mappings for the adaptive actions, which are used to initialize the state of the new fraction at each process.

State mapping. The state mapping for each adaptive action is shown in Table 4.1. Each adaptive action initializes the state of the new process based on this mapping when it is executed.

Mapping Function | Process Affected | New State
Φ_a1 | s  | identity mapping
Φ_a2 | r1 | {rQ, s} - identity mapping; {n, k, x, y, p, m, rbufQ} - initial mapping
Φ_a3 | r2 | {rQ, s} - identity mapping; {n, k, x, y, p, m, rbufQ} - initial mapping
Φ_a4 | s  | {sQ, r1, r2} - identity mapping; {n, k, u, l, m, ean} - initial mapping

Table 4.1: State mappings for the adaptation.

Based on the description of the adaptation in this section, we obtain the transitional-invariant lattice shown in Figure 4.9 for the adaptation of adding the proactive component.

Figure 4.9: Adaptation lattice for addition of the proactive component (P_intol : S_P → P_a-ip1 : TS1 → {P_a-ip2 : TS2, P_a-ip3 : TS3} → P_a-ip4 : TS4 → P_fec : S_Q).

Thus, we have the following theorem.

Theorem 4.2. The adaptation lattice of Figure 4.9 is a transitional-invariant lattice for the adaptation of adding the proactive component. Hence, the adaptation is correct. □

4.4 Discussion

In this section, we discuss some of the questions raised by this work.

Why is the specification during adaptation a safety specification?
The specification of the application before adaptation can be arbitrary. During adaptation, however, the specification should be a safety specification: it is not desirable to delay the adaptation in order to satisfy liveness during adaptation.
Rather, we expect the adaptation to complete as quickly as possible and the new program to satisfy its safety and liveness specifications after adaptation. For example, consider a transaction processing system with liveness guarantees to commit or abort. In this case, either the adaptation should not start in the middle of a transaction, or, if the adaptation can start in the middle of a transaction, then the liveness should be met once the adaptation is completed. Thus, the implicit liveness specification during adaptation is that the adaptation completes.

How is a transitional-invariant lattice constructed?
Techniques [64, 69] developed to calculate invariants of a program can also be used to find transitional-invariants. For a given adaptation model, we can perform reachability analysis for each intermediate program obtained after execution of an atomic adaptation. The reachability computation for an intermediate program helps identify the transitional-invariant for that intermediate program, and we can thus construct a transitional-invariant lattice for the given adaptation. Furthermore, techniques for dynamically discovering likely invariants from executions of the system, such as [70-72], can be used to find transitional-invariants for the intermediate programs in the adaptation lattice.

Can the transitional-invariant lattice approach be used to verify existing adaptation techniques?
Yes. We give an outline of how existing adaptation techniques can be verified using a transitional-invariant lattice. First, we identify all atomic adaptations and the corresponding adaptive actions. We then consider all possible orderings and concurrency among the adaptive actions, and identify the intermediate programs after each adaptive action. Next, we find transitional-invariants for each intermediate program and check whether the transitional-invariants imply safety of adaptation.

Can the adaptation lattice approach be used in the case of parallel adaptation?
Yes, the adaptation lattice approach can also be used to verify parallel adaptation. We discuss its use with a simple example. Consider a system consisting of two processes, a sender and a receiver, that communicate using an encryption protocol. The adaptation requires that eventually the current (old) encryption protocol be replaced by another (new) encryption protocol. The parallel adaptation adds the new encryption protocol alongside the current one. In other words, the sender process has two types of fractions (for encryption) and the receiver process has two types of fractions (for decryption). All users using the sender process to send data to users at the receiver process initially use the old encryption protocol. Once the new encryption protocol is added, some users use the old protocol whereas others use the new protocol. Eventually, after all users have stopped using the old protocol, it can be removed from the system.

The adaptation lattice in this case is shown in Figure 4.10.

Figure 4.10: Adaptation lattice for parallel adaptation. (Each node is labeled with the fractions present at the sender P_s and the receiver P_r, ranging from (P_s: old; P_r: old) through intermediate nodes in which old and new fractions coexist to (P_s: new; P_r: new).)

Program P uses the old-protocol fractions at both processes.
Eventually, after the adaptation is completed, program Q uses the new-protocol fractions at both processes. Program P′ uses both types of fractions at both processes. During adaptation, program P′ can stay active for an unbounded time. The correctness requirement for P′ is that the old (respectively, new) fraction at the sender process communicates only with the old (respectively, new) fraction at the receiver process. The correctness requirements for the intermediate programs between P and P′ are that the old fraction at the sender process communicates only with the old fraction at the receiver process, and that communication from the new fraction at the sender process to the receiver process is buffered. Similarly, the correctness requirements for the intermediate programs between P′ and Q are that the new fraction at the sender process communicates only with the new fraction at the receiver process, and that no communication occurs between the old fraction at the sender process and the old fraction at the receiver process.

Chapter 5

Verifying Adaptation in Presence of Faults

In this chapter, we introduce the notions of transitional-faultspan and transitional-faultspan lattice to verify fault-tolerance properties during adaptation. We first define transitional-faultspan in Section 5.1. Then, in Section 5.2, we define transitional-faultspan lattice and give a theorem for proving the correctness of adaptation in the presence of faults. Finally, in Section 5.4, we present a case study of adaptation in the message communication application to demonstrate the use of a transitional-faultspan lattice.

5.1 Transitional-Faultspan

Let F_P be the fault class of the old program and F_Q be the fault class of the new program. Let S_P be an invariant and T_P be an F_P-span of the old program. Similarly, let S_Q be an invariant and T_Q be an F_Q-span of the new program. The old program is F_P-tolerant, and the new program is F_Q-tolerant. Let F be the fault class during adaptation.

In the context of adaptation, we define transitional-faultspans to identify the set of states that the program can reach while performing adaptation in the presence of faults.

Definition (Transitional-faultspan). Let R be an intermediate program in the adaptation 𝒜, and let TS be a transitional-invariant of R. A transitional-faultspan (F-span) of R from TS is a predicate TT that satisfies the following two conditions: (i) TS ⊆ TT, and (ii) TT is closed in R[]F.

Now, we define transitional-faultspan lattice.

5.2 Transitional-Faultspan Lattice

A transitional-faultspan (F-span) lattice is an adaptation lattice in which each node is associated with two predicates, a transitional-invariant and a transitional-faultspan, and the following conditions are satisfied:

0. Correctness in absence of faults. The adaptation lattice obtained by considering the transitional-invariants only forms a transitional-invariant lattice.

1. Fault-tolerance of old program. The entry node P is associated with an F_P-span T_P of the program before adaptation.

2. Fault-tolerance of new program. The exit node Q is associated with an F_Q-span T_Q of the program after adaptation.

3. Fault-tolerance of intermediate program. Each intermediate node R is associated with a predicate TT_R that is a transitional-faultspan (F-span) from TS_R for any intermediate program at R (i.e., an intermediate program obtained by performing adaptations from the entry node to R). Furthermore, any intermediate program at R is F-tolerant from TS_R.

4. Safety of adaptive action.
If a node labeled R_i has an outgoing edge labeled a to a node labeled R_j, then for every adaptive transition (s, a, s′) in Smap where TT_{R_i} is true in state s, TT_{R_j} is true in state s′. In other words, ∀s, s′ : (s, a, s′) ∈ Smap : s ∈ TT_{R_i} ⇒ s′ ∈ TT_{R_j}. Furthermore, all the adaptive transitions (s, a, s′) satisfy the safety specification during adaptation.

5. Progress of adaptation. If a node labeled R has outgoing edges labeled a1, a2, ..., ak to nodes R1, R2, ..., Rk, respectively, then in every computation of the adaptation there exists a transition (s, s′) such that for some i : 1 ≤ i ≤ k : (s, a_i, s′) ∈ Smap. Furthermore, ∀s : s ∈ TT_R : (∀a, s′ : a ∈ Σ_a − {a1, ..., ak} : (s, a, s′) ∉ Smap).

Correctness of adaptation in presence of faults. Intuitively, an adaptation is correct in the presence of faults F if the following holds: if the adaptation begins in a legitimate state of the old program, then during adaptation each intermediate program is F-tolerant and the resulting state of the new program is legitimate. With this intuition, if the adaptation begins in a state where the fault-span of the old program is true, then we say that the adaptation is correct if:

• the adaptation terminates in a state where the fault-span of the new program is true,
• during adaptation, each intermediate program is F-tolerant, and
• eventually the adaptation terminates.

The following theorem states that finding a transitional-faultspan lattice is necessary and sufficient for proving correctness of adaptation.

Theorem 5.1. Given S_P as the invariant of the program before adaptation, T_P as the fault-span used to show that the program before adaptation is tolerant to F_P, S_Q as the invariant of the program after adaptation, and T_Q as the fault-span used to show that the program after adaptation is tolerant to F_Q, the adaptation from P to Q is correct in the presence of faults F if and only if there is a transitional-faultspan (F-span) lattice for the adaptation with the start node associated with S_P and T_P, and the end node associated with S_Q and T_Q.

Proof. The proof of this theorem is similar to that of Theorem 4.1 in Chapter 4. □

5.3 Adaptation of Self-Stabilizing Programs

In this section, we consider the adaptation by Gouda et al. [73], where the authors focus on adapting one self-stabilizing [59] program into another self-stabilizing program. We show that this is an instance of our approach in which all the transitional-faultspan predicates are true. A program is self-stabilizing if, starting from an arbitrary state, it eventually recovers to a legitimate state. Thus, in transforming one stabilizing program into another, we can let all the fault-spans (i.e., the fault-span of the old program, the fault-span of the new program, and the transitional-faultspans of the intermediate programs) be true. With this approach, if the old program starts in any state, eventually the new program's execution begins, although the state of the new program may be arbitrary. Since the new program is self-stabilizing, it eventually recovers to legitimate states.

Note that in [73] the corresponding transitional-invariants may not exist. Specifically, even if the old program begins in legitimate states, the new program may initially be in illegitimate states before recovery takes place. Moreover, the approach in [73] allows arbitrary behavior during adaptation and, hence, the specification during adaptation may not be met.

5.4 Case Study: Reliable Message Communication (Continued)

In this section, we continue with the example of Chapter 4. We consider the adaptation that replaces the proactive component with the reactive component. We first discuss the adapt-ready version of the proactive component and the faults tolerated by the proactive component in Section 5.4.1. Next, in Section 5.4.2, we describe the acknowledgment-based reactive component. We then discuss the adaptation of replacing the proactive component with the reactive component in Section 5.4.3, where we also identify the transitional-faultspan lattice to verify the correctness of this adaptation in the presence of faults.
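Before turning to the case study, note that condition 4 of both lattices reduces to a per-edge check over Smap, which can be mechanized when the predicates and Smap are explicit. A small Python sketch under that assumption (the names are ours), usable with either the TS or the TT predicates:

    def check_adaptive_edge(TS_i, TS_j, a, smap, bad):
        # Condition 4: every adaptive transition (s, a, s') in Smap that
        # starts in TS_i must end in TS_j and must not be a bad transition
        # of the safety specification during adaptation.
        return all(t in TS_j and (s, t) not in bad
                   for (s, act, t) in smap if act == a and s in TS_i)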
5.4 Case Study: Reliable Message Communication (Con- tinned) In this section, we continue with the example of Chapter 4. We consider the adaptation that replaces the proactive component with the reactive component. We first discuss the adapt-ready version of the proactive component and the faults tolerated by the proactive component in Section 5.4.1. Next, in Section 5.4.2, we describe the acknowledgment- based reactive component. We then discuss the adaptation of replacing the proactive com- ponent by the reactive component in Section 5.4.3. Finally, in Section 5.4.3, we identify the transitional-faultspan lattice to verify the correctness of this adaptation in the presence of faults. 57 C (I 5.4.1 Proactive Component We discussed the proactive component in Chapter 4. The adapt-ready version of the proac- tive component is shown in Figure 5.1. Program Poo-fee process 3 var: Pfecsvar begin fec.encoder.encode fl fec.encoder.fec-send [] aal :true -—> transformTo(Paa_z-p1.s,(Dual) end process 'rz-[z' = 1, 2] var: Pfesz-yar begin fec.decoder.fec_receive I] fec.decoder.decode [] ooh-+2) : aa2 AisEmpty(Cs,7~z-) —> transformTo(’Pack.rz-,(Paa(i+2)) end Figure 5.1: Message communication program (with proactive component, adapt-ready). The specification of the program using proactive component is discussed in Chapter 4. Additionally, it tolerates message loss faults of class F 1 (cf. Figure 5.2). Faults of class F 1 causes a loss of up to k messages in a group. In writing the fault transitions, we use the following auxiliary variables: m9 to denote a message m from group 9, and lostC’ount? to denote the number of messages lost in group g in the channel from s to ri. Initially, Vg :: lostCounti.’ = 0. We now give the fault-span of the program using the proactive component. Fault-span. The F 1-span of the program using the proactive component is TQ = SQ. The fault-span is same as the invariant since the proactive component provides masking 58 a: In Ilt‘. / 3 LI. ' ’1 555:. Ctr- ‘\ mngossi : m9 6 CS,” /\ lostCountzg _<_ k —> Cs,” := Cs,” — m9; lostCounté2 + + Figure 5.2: Fault class F1. fault-tolerance. 5.4.2 Reactive Component The reactive component deals with the message loss by retransmitting the lost packets. It uses acknowledgments to confirm the receipt of messages sent by the sender, and negative acknowledgments to confirm the loss of messages sent by the sender. It consists of aSnd fraction at the sender and aRcv fraction at the receiver. The aSnd fraction adds a group and a packet number in each packet. It maintains a window of size to and sends all packets in that window to the receiver. It waits for the acknowledgment of receipt of a group before moving the window one group forward. If it receives a negative acknowledgment for any packet, it sends that packet again to the receiver. When the aRcv fraction at the receiver receives a packet out of order, it waits for a few more packets before sending a negative acknowledgment to the sender. When all packets in a group are received, it sends an acknowledgment for that group to the sender. mngossz- : m 6 CS,”- -—> C3,”. :2 Cs,” — {m} Figure 5.3: Fault class F2. The reactive component provides tolerance to message loss faults F2 shown in Figure 5.3. Faults of class F2 causes loss of messages from the channel. For simplicity, we assume that acknowledgment messages are not lost; however, the component can be easily extended to deal with faults that lose acknowledgments by using timeout guards. 
S9 Component ack Fraction aSnd inp sQ : queue of integer r1, r2 var n, w, 92-, p,- : integer {initially, p,- = g,- = 0} 9a, gnamna 3 integer {initially, Pi = 92' = 0} chopyz- : queue of integer {initially, Empty} snti : array [0..w — 1, 0..n — 1] of integer {initially _L} param i : i = 1, 2 begin com I isEmpty(chopyz-) —+ SQCOPyz' == 8Q U sendz- : fiisEmpty(chopyz-) A Mitt-[923197;] = _L —+ sntilgimz'] == {9231923 head(8Q60pt/z')}; C3,”. :2 Cs,” 0 snti[gi,pi]; p,- := (p,- + 1) mod n; if p,- = 0 then 9,- := (91- + 1) mod 10 fi [] resend,- : type(Cri,s) = nack -—+ 9nmpna :2 head(Crz.,s); if snti [912a, pm] 524 i then 03,73- 3: 03,13; 0 sntz’lgnamna] fi [] ack_rcvz- : type(Cri,s) = ack —> 90,, sntz-[ga,0..n — 1] := head(C7~2-_,3), i end Figure 5.4: Acknowledgment component: sender fraction. Figures 5.4 and 5.5 shows the abstract version of the reactive component. The aSnd fraction consists of four types of actions: copy, send, resend, ack_rcv. The aRcv fraction consists of three types of actions: receive, del ive r, and sendmack. These fractions are composed with processes that will use them. The message communication program composed with the reactive component is shown in Figure 5.6. We now give the specification of the program using the reactive component. Specification of program using the reactive component. component satisfies the same specification as the communication program (cf. Chapter 4). 6O Program using the reactive Component ack Fraction aRcvz- inp rQ .3 var n, to, g, p : integer {initially k 2 ng :2 0} k, ng, m : integer {initially k = 77.9 = 0} rbqu : array [0..w — 1, 0..n — 1] of integer {initially 1} ug : array [0..w — 1] of boolean {initially false} paramj :Ogigw—l begin receive : -uisEmpty(C3,ri) —-> g,p,m := head(Cs,7~i); rb’uf Q19, 29], ugly] == m, true [] deliverj : ug[j] = true —> if count(rbqu[j, O..n — 1] 7£ _L) = n then rQ := rQ o rbquLj, 0..n — 1]; rbquLj, 0..n - 1], CTN :2 1,073”; o ack(j); ug[j],ng :=false, (j + 1) mod to fi [] send_nack : count(ug[0..w — 1] = true) > 2 ——> for k = 0 to n — 1 if rbqu[ng, k] = .1. then 013,3 := 013,3 0 nack(ng, 1:) fi end Figure 5.5: Acknowledgment component: receiver fiaction. Additionally, it tolerates message loss faults F2. Invariant. The invariant of the program using the reactive component is S R = S 1 /\ S A, where SA=VizmiEmQ=> ((mi 6 chopy1 X m,- E r1.rQ \_/ (m2 9! (chopyl U r1.rQ) => (m,- E sntl A (m,- E 03¢1 2 mi 6 r1.rbqu)))) /\ (m,- E chopr X m,- E r2.rQ X (mi ¢ (choprUrng) => (m,- E snt2 /\ (m,- 6 ngr2 Xmi E r2.rbqu))))). In the above invariant, S A indicates that for a message m, exactly one of the following 61 Program 'Pack process 5 var :Paa_fec.s.var U ack.aSnd.var param i :z'= 1,2 begin ack.aSndcopyz- [] ack.aSndsendz— [] ack.aSnd.send_again,~ [] ack.aSnd.ack.rcvz- end process ri[2‘ = 1,2] begin var : Paa_fec.r,-.var U ack.aRcvz-Nar param k :OSksw—l begin ack.aRcv.receive [] ack.aRcvdeliverk [] ack.aRcv.send_nack end Figure 5.6: Message communication program (with reactive component). is true: - m is at the sender, and is not yet sent - m is received by the receiver - m is buffered by the sender, and m is either in the channel or is buffered at the receiver. Fault-span. The F2-span of the program using the reactive component is TR = S1 /\ TA, where TA=Vi:mz-EmQ=> ((mi 6 chopyl X m,- E r1.rQ Y (m,- ¢ (chopy1 U r1.rQ) : m,- E snt1)) 62 /\ (mi 6 chopy2 \_/ m,- E r2.rQ Y (m,- ¢ (chopy2 U r2.rQ) => m7; 6 snt2))). 
In the above fault-span, T A indicates that for a message m, exactly one of the following is true: - m is at the sender, and is not yet sent - m is received by the receiver - if m is sent by the sender and not yet received by the receiver, then m is buffered by the sender. 5.4.3 Adaptation: Replacement of Proactive Component with Reac- tive Component The adaptation of replacing the proactive component with the reactive component converts the program shown in Figure 5.1 to the one shown in Figure 5.6. We now give the specifi- cation during adaptation for the replacement of the proactive component with the reactive component. Specification during adaptation. The specification during adaptation is that S 1 continues to be true during adaptation in presence of faults F1. We now describe the adaptation by discovering the intermediate programs and the cor- responding transitional-invariants and transitional-faultspans during adaptation. We iden- tify the intermediate programs after each atomic adaptation. The execution of adaptive action aal in “Pawfec results in intermediate program ”Pawz-pl shown in Figure 5.7. ”Pawipl does not encode any new packets, but will send 63 Program ’Pawim process .9 var: ’Paa_fec.s.var begin fec.encoder.fec_send [] aag : aal AI = u —> transformTo(’Paa_Z-p2.s,(Page); end process 7,-[2‘ = 1, 2] : same as in Fig. 5.1 Figure 5.7: Intermediate program poo-i191- any remaining encoded packets. In the execution of ’Pawipl, eventually all the encoded packets are sent to the receivers. Thus, the guard of adaptive action aag becomes true. The transitional-invariant of ’Pawim is: TS 5 = SQ A S6, where SQ is defined earlier in Chapter 4, and $6 = (Vj :j _>_ u : eanLj,O..n — 1] = _L) A (l S it). In the above transitional-invariant, 36 indicates that no new packets will be encoded by the sender. The transitional-faultspan TT5 of ’Pawipl is same as TS 5. Program ’Pawim process 3 var : ”Paa_fec.s.var begin aa5 : aa3 A am; —> transformTo(73ack.s,aa5); end process ri[z’ = 1, 2] : same as in Fig. 5.1 Figure 5.8: Intermediate program ”Pam-1,2. The execution of aag results in intermediate program Pam-p2 shown in Figure 5.8. Paw,” does not send any packets, but the packets that are there in the channel can still be received by the receivers 7‘1 and r2. In the execution of ’Pawim eventually all the 64 packets in the channel are read and no new packets are added in the channel from sender to receiver. Thus, the guards of the adaptive actions aa3 and M4 eventually becomes true. The transitional-invariant of ’Pa is: TS 6 = 31 A S7 A SB, where 31 is defined earlier mm in Chapter 4, and S7 = (Vj :j 2 u : ean[j,0..n — 1] = _L) A (l = u), and SS 2 Vi : m,- E mQ => (m,- E 3Q 2 ((mi 6 r1.rQ X m,- E data(CS,7~1 Ur1.rbqu)) A (m,- E r2.rQ M mi 6 data(Cs,r2 U r2.rbqu)))). In the above transitional-invariant, 57 indicates that there are no encoded packets left at the sender for sending, and 38 indicates that all packets that are sent are either received by the receivers or are in the corresponding channels. The transitional-faultspan TT5 of ’Pawim is same as T56. Program ’Pawim process 3 : same as in Fig. 5.8 process r1 : same as in Fig. 5.6 process r2 : same as in Fig. 5.1 Figure 5.9: Intermediate program Pawim. Since M3 and M4 occur independently, we consider both possible orderings between them. The execution of adaptive action (103 in ”Paw,” results in intermediate program ’Pawim shown in Figure 5.9. 
In P_aa-ip3, receiver r1 has replaced its fraction, whereas receiver r2 has not yet replaced its fraction and can receive any remaining packets in the channel from s to r2. Eventually, in the execution of P_aa-ip3, the guard of adaptive action aa4 becomes enabled and aa4 is executed, resulting in the intermediate program P_aa-ip5. The transitional-invariant of P_aa-ip3 is TS7 = S1 ∧ S7 ∧ S9 ∧ S10, where

S9 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ (m_i ∈ r1.rQ ∧ (m_i ∈ r2.rQ ⊕ m_i ∈ data(C_{s,r2} ∪ r2.rbufQ)))), and
S10 = isEmpty(C_{s,r1}) = true ∧ r1.rbufQ = ⊥.

The transitional-faultspan TT7 of P_aa-ip3 is the same as TS7.

Program P_aa-ip4
  process s : same as in Fig. 5.8
  process r1 : same as in Fig. 5.1
  process r2 : same as in Fig. 5.6

Figure 5.10: Intermediate program P_aa-ip4.

The execution of adaptive action aa4 in P_aa-ip2 results in the intermediate program P_aa-ip4 shown in Figure 5.10. In P_aa-ip4, receiver r2 has replaced its fraction, whereas receiver r1 has not yet replaced its fraction and can receive any remaining packets in the channel from s to r1. Eventually, in the execution of P_aa-ip4, the guard of adaptive action aa3 becomes enabled and aa3 is executed, resulting in the intermediate program P_aa-ip5. The transitional-invariant of P_aa-ip4 is TS8 = S1 ∧ S7 ∧ S11 ∧ S12, where

S11 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ ((m_i ∈ r1.rQ ⊕ m_i ∈ data(C_{s,r1} ∪ r1.rbufQ)) ∧ m_i ∈ r2.rQ)), and
S12 = isEmpty(C_{s,r2}) = true ∧ r2.rbufQ = ⊥.

The transitional-faultspan TT8 of P_aa-ip4 is the same as TS8.

Program P_aa-ip5
  process s : same as in Fig. 5.8
  process r_i [i = 1, 2] : same as in Fig. 5.6

Figure 5.11: Intermediate program P_aa-ip5.

In the intermediate program P_aa-ip5 shown in Figure 5.11, only adaptive action aa5 is enabled, and execution of aa5 results in the new program P_ack. The transitional-invariant of P_aa-ip5 is TS9 = S1 ∧ S10 ∧ S12 ∧ S13, where

S13 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ (m_i ∈ r1.rQ ∧ m_i ∈ r2.rQ)).

The transitional-faultspan TT9 of P_aa-ip5 is the same as TS9.

We now give the state mappings for the adaptive actions, which are used to initialize the state of the new fraction at each process.

State mapping. The state mapping for each adaptive action is shown in Table 5.1. Each adaptive action initializes the state of the new process based on this mapping when it is executed.

Mapping Function | Process Affected | New State
Φ_aa1 | s  | identity mapping
Φ_aa2 | s  | identity mapping
Φ_aa3 | r1 | {rQ, s} - identity mapping; V(r1) − {rQ, s} - initial mapping
Φ_aa4 | r2 | {rQ, s} - identity mapping; V(r2) − {rQ, s} - initial mapping
Φ_aa5 | s  | {sQ, r1, r2} - identity mapping; V(s) − {sQ, r1, r2} - initial mapping

Table 5.1: State mappings for the adaptation.

Based on the description of the adaptation in this section, we obtain the transitional-faultspan (F-span) lattice shown in Figure 5.12 for the adaptation of replacing the proactive component with the reactive component.

Figure 5.12: Adaptation lattice for replacement of the proactive component with the reactive component (P_aa-fec : S_Q, T_Q → P_aa-ip1 : TS5, TT5 → P_aa-ip2 : TS6, TT6 → {P_aa-ip3 : TS7, TT7; P_aa-ip4 : TS8, TT8} → P_aa-ip5 : TS9, TT9 → P_ack : S_R, T_R).

Thus, we have the following theorem.

Theorem 5.2. The adaptation lattice of Figure 5.12 is a transitional-faultspan (F1-span) lattice for the adaptation of replacing the proactive component with the reactive component. Hence, the adaptation is correct in the presence of faults. □

Chapter 6

Case Study: Mixed-Mode Adaptation

In this chapter, we discuss a case study of mixed-mode adaptation. We study mixed-mode adaptation in the context of protocol change, as discussed in dynamically adaptable middleware [1, 74, 75].
Specifically, we consider two leader election protocols and an adaptation that changes one leader election protocol to the other at run-time. We replace a leader election protocol that elects a leader based on process identifier with a leader election protocol that elects a leader based on process value, where the value of a process can be defined using the process's battery life, its average distance to other processes, and so on.

In the rest of this chapter, we first discuss the two leader election protocols in Section 6.1, covering the system model, protocol specifications, and protocol descriptions. Next, in Section 6.2, we discuss the adaptation of the leader election protocols, covering the overlap communication scenarios, state mappings, and verification of the adaptation. In Section 6.3, we present the performance results of mixed-mode adaptation. Finally, in Section 6.4, we discuss the limitations of mixed-mode adaptation.

6.1 Leader Election Protocols

Leader election is a fundamental problem in distributed computing. The basic description of the leader election problem is: eventually elect a unique leader. For example, in the case of group communication protocols, leader election is employed to elect a new coordinator whenever a group coordinator fails. Numerous leader election protocols have been proposed in the literature for a variety of applications. We discuss two versions of a leader election protocol that is based on the termination detection protocol by Dijkstra and Scholten [76]. The protocols discussed in this section are an abstraction of the protocol discussed in [77]. We assume that there is another module at all (or selected) processes that monitors the status of the leader process. A monitor process starts an instance of the leader election protocol whenever it detects a failure of the leader.

6.1.1 System Model and Assumptions

The system consists of n processes. All processes have unique identifiers, which we denote by id. Each process maintains a variable ldr, which denotes the leader that the protocol elected, and a variable e, which denotes whether the process is in an election. We make the following assumptions about the system:

• Bidirectional channels. The channels between processes are bidirectional, i.e., if the system has a channel C_{p,q} from p to q, then it also has a channel C_{q,p}.

• Static processes. We assume that processes are static and the network is connected. If the processes are mobile and the network suffers from partitioning, then the protocol can be extended as discussed in [77].

• Process failure and reliable communication. A process or a link can fail; however, for simplicity, we assume that no process or link fails while an election is in progress. We assume reliable communication, i.e., if process p sends a message m to process q, then eventually q receives the message m.

6.1.2 Specification of Leader Election Protocols

We consider the following problem specification for the leader election protocols used in this case study:

Safety: □(i.ldr ≠ j.ldr ⇒ i.e ∨ j.e)
Liveness: □◊(∀i, j : i.ldr = j.ldr)

The safety property asserts that no two stable processes (i.e., processes not in an election) can have different leaders. The liveness property asserts that eventually a unique leader is elected. We now discuss two leader election protocols: one that elects the process with the maximum id as the leader, and one that elects the process with the maximum value as the leader.

6.1.3 Leader Election based on Process ID

The leader election protocol ldrId, which elects the process with the maximum id as the leader, is shown in Figure 6.1. In addition to the safety and liveness properties discussed in Section 6.1.2, the protocol also satisfies the following liveness property:

Liveness of ldrId: □◊(i.ldr = max{k | d_{i,k} < ∞}),
6.1.3 Leader Election based on Process ID The leader election protocol, 1dr] d , that elects the process with maximum id as a leader is shown in Figure 6.1. In addition to the safety and liveness properties discussed earlier in Section 6.1.2, the protocol also satisfies the following liveness property: Liveness of ldrId: DOinr = max{k I dug < oo}, 71 Component ldrId Fractioni inp N : list var startElectz'on, e,ack : bool {initially false} p, num, max, ldr : integer {initially p = num = max = 0, ldr = _L} src(Num, Id) : (integer, Ln) {initially (O, 2)} W, chd : list {initially W, chd = o} begin startElection : fit A startElectz’on ——> src, num :2 (num, 2'), num + 1; ELECT.src := src; forj=1tonAjEN Ci,j.add(ELECT); W, ack,6,p,max, chd 2: N, true, true, 2', i, (f); startElectz'on := false fl joinElection : ELECT(ldrId) e Cjn‘ —-> Cj,i.remove(ELECT); if fie V (e A ELECT.src > are) then src := ELECT.src; num := src.Num + 1; fork=1tonA(k7éjAkEN) Ci’k.add(ELECT); W, ack, 6,19, max,chd := N — {7'}, true, true,j, i, (t); else if e A src = ELECT.src then ACK.{src, chd} := src, false; Ci,j.add(ACK) fi [] ackToParent : e A W 2 g!) A src.Id 3A i A ack -—> ack := false; ACK.{src, chd, id} :2 src, true, max; Ci,p.add(ACK) Figure 6.1: Leader election algorithm based on node Id. which asserts that the elected leader is the process having maximum id among all connected processes. The protocol uses three types of messages as shown in Table 6.1. It also shows the fields associated with each message type. The variables used by the leader election protocol are 72 [] ackReceive : e A ACK(ldrId) e 033,- —> Cj,z-.remove(ACK); if ack A src = ACK.src then W == W — {j}; if ACK.chd then chd = chd + {j}; max := MAX(ACK.id, max); fi fi [] electLeader : e A W = 05 A src.Id = i A ack ——> ack, 6, ldr := false, false, max; LD.{src, id} := src, max; forj= 1tonAj Echd Ci,j.add(LD) [] setLeader : e /\ LD(ldrId) 6 CM A pack —+ Cj,i.remove(LD) if src = LD.src then ldr, 6 := LD.ld, false; fork=1tonAk E chd Ci’k.add(LD) fi end Figure 6.]: Leader election algorithm based on node Id (Continued). shown in Table 6.2. Message Meaning Message Fields ELECT for building a spanning tree type : protocol type src : computation index of the election ACK to acknowledge the receipt type : protocol type of ELECT message src : computation index of the election chd : denotes if the sender is a child id : maximum id as seen by the sender LD to announce a leader type : protocol type src : computation index of the election id : id of leader elected in the election Table 6.1: Message types used in the protocol. 73 Variable Type Meaning e bool indicates whether process is currently in election or not ldr int(1..n) id of the leader process num int index of the last computation that this process started src(Num, Id) (int, int(1..n)) computation index of the last computation in which this process participated p int(1..n) parent process in last computation ack bool indicates if ACK message is sent to parent or not W list list of neighbors from which ACK is being awaited C list list of my children in current computation max int(1..n) maximum id among my children in current computation Table 6.2: Variables maintained by each process in the protocol. 6.1.3.1 Description of the protocol When the process i detects the failure of the leader, it sets the value of the variable startElection to true to start the instance of the ldrId protocol to elect a new leader. 
The fraction at i begins the election by starting a diffusing computation by sending an ELECT message to all of its neighbors. The ELECT message has a field that contains the computation-index of the diffusing computation. When a process receives the ELECT mes- sage, it sets the neighbor from which it first received the message as its parent. It then propagates the ELECT message to all of its neighbors. In this way, during the first phase a spanning tree is build. When a fraction 2' receives an ELECT message from a process that is not its parent, it immediately responds by sending an ACK message. The ACK message has a field that identifies if the message is from a child or not, and a field that denotes the maximum id as seen by the process. The fraction i sends an ACK message to its parent only when it 74 has received ACK messages from all of its neighbors. When a fraction 2' has received ACK messages from all of its neighbors, it first finds the process having maximum id among all its children. It then sends the ACK message to its parent. When the source process that started the computation (election) receives ACK messages from all of its neighbors, it can compute the leader by finding the process having maximum id among all its children. It then starts a diffusing computation to forward the leader infor- mation to all the processes. As one or more processes can concurrently detect failure of a leader, it is possible that more than one process can start elections independently, thereby, leading to concurrent dif— fusing computations. To ensure correctness of the protocol, it is required that each process participate in only one diffusing computation. This is done by associating a computation- index to each computation. The computation-index is a pair (num, id), where id represents the identifier of the process, and num is an integer. When a process participating in a diffus- ing computation, receives another diffusing computation with higher computation-index, it stops participating in the current computation in favor of the diffusing computation with higher computation-index. Two computation-index are compared as follows: (num.1,id1) > (numg, idg) 4:) ((numl > numg) V ((numl 2 77.11.7722) A (idl > id2))). 6.1.4 Leader Election based on Process Value The leader election protocol, ldr Val that computes the leader based on the value of a pro- cess is shown in Figure 6.2. This protocol satisfies the following liveness property in addi- tion to the safety and liveness properties discussed in Section 6.1.2: 75 Component ldr Val Fractioni inp value : int N : list var startElection, e, ack : bool {initially false} p, num, ldr : int {initially p = num = 0, ldr = _L} src(Num,Id),ma:r(Val,1d) : (int, 1..n) {initially (0,i)} N, W, chd : list {initially W, chd = o} begin startElection : -~e A startElection —> src, num :2 (num, i), num + 1; ELECT.ST‘C I: i; fork: 1tonAk E N Ci,k.add(ELECT); W, ack, e,p, max, chd :2 N, true, true, i, (value, i), a); startElection := false [] joinElection : ELECT(ldrVal) 6 CM -—> Cj_i.remove(ELECT); if we V (e A ELECT.src > src) then src :2 ELECT.src; num := src.Num + 1; fork=1tonA(k#jAk€N) Ci’k.add(ELECT); W, ack, 6,1), chd := N — {j}, true, true,j, cf); max := (value,i); else if e A src = ELECT.src then ACK.{src, chd} := src, false; Ci,j.add(ACK) fi [] ackToParent : e A W = d) A src.Id 7e i A ack ——> ack := false; ACK.{src, chd, val, id} :2 src, true, max.{Val, Id}; Ci,p.add(ACK) Figure 6.2: Leader election algorithm based on node value. 
6.1.4 Leader Election based on Process Value

The leader election protocol ldrVal, which computes the leader based on the value of a process, is shown in Figure 6.2. This protocol satisfies the following liveness property in addition to the safety and liveness properties discussed in Section 6.1.2:

Liveness of ldrVal: □◇(i.ldr = j ⇒ j.value = max{k.value | d_{i,k} < ∞}),

which asserts that the elected process is the process having the maximum value among all connected processes.

Component ldrVal
Fraction i
inp  value : int
     N : list
var  startElection, e, ack : bool {initially false}
     p, num, ldr : int {initially p = num = 0, ldr = ⊥}
     src(Num, Id), max(Val, Id) : (int, 1..n) {initially (0, i)}
     W, chd : list {initially W, chd = ∅}
begin
   startElection : ¬e ∧ startElection →
      src, num := (num, i), num + 1;
      ELECT.src := src;
      for k = 1 to n ∧ k ∈ N
         C_{i,k}.add(ELECT);
      W, ack, e, p, max, chd := N, true, true, i, (value, i), ∅;
      startElection := false
[] joinElection : ELECT(ldrVal) ∈ C_{j,i} →
      C_{j,i}.remove(ELECT);
      if ¬e ∨ (e ∧ ELECT.src > src) then
         src := ELECT.src; num := src.Num + 1;
         for k = 1 to n ∧ (k ≠ j ∧ k ∈ N)
            C_{i,k}.add(ELECT);
         W, ack, e, p, chd := N − {j}, true, true, j, ∅;
         max := (value, i);
      else if e ∧ src = ELECT.src then
         ACK.{src, chd} := src, false;
         C_{i,j}.add(ACK)
      fi
[] ackToParent : e ∧ W = ∅ ∧ src.Id ≠ i ∧ ack →
      ack := false;
      ACK.{src, chd, val, id} := src, true, max.{Val, Id};
      C_{i,p}.add(ACK)

Figure 6.2: Leader election algorithm based on node value.

[] ackReceive : e ∧ ACK(ldrVal) ∈ C_{j,i} ∧ ack →
      C_{j,i}.remove(ACK);
      if src = ACK.src then
         W := W − {j};
         if ACK.chd then
            chd := chd + {j};
            max := MAX((ACK.val, ACK.id), max)
         fi
      fi
[] electLeader : e ∧ W = ∅ ∧ src.Id = i ∧ ack →
      ack, e, ldr := false, false, max.Id;
      LD.{src, id} := src, max.Id;
      for k = 1 to n ∧ k ∈ chd
         C_{i,k}.add(LD)
[] setLeader : e ∧ LD(ldrVal) ∈ C_{j,i} ∧ ¬ack →
      C_{j,i}.remove(LD);
      if src = LD.src then
         ldr, e := LD.id, false;
         for k = 1 to n ∧ k ∈ chd
            C_{i,k}.add(LD)
      fi
end

Figure 6.2: Leader election algorithm based on node value (Continued).

The value of a process is calculated from the resources available at the process, such as battery power, CPU load, distance to other processes, and degree of the process. The leader election protocol is independent of how the value of a process is calculated. We assume that a separate component monitors the resources at a process and computes the value of the process, and that the leader election protocol fraction at the process has access to this value.

6.1.4.1 Description of the protocol

The ldrVal protocol is similar to the ldrId protocol described earlier, where the leader is the process whose id is maximum. The ldrVal protocol differs from the ldrId protocol in the following ways:

• The ldrVal protocol has an additional input variable, value, that indicates the value of the process.

• The max variable in ldrVal is represented as a pair (Val, Id) of type (int, int(1..n)); max.Id denotes the id of the process and max.Val represents the value of the process.

• The ACK message of ldrVal has a field, ACK.val, that carries information about the process having the maximum value among all of its children. This field is of type (int, int(1..n)). The equivalent field in the ACK message of ldrId is ACK.id, which is of type int.

In the case where two processes have the same value, the tie is broken in favor of the process having the higher id. We have the following property:

max1 > max2 ⇔ ((max1.Val > max2.Val) ∨ ((max1.Val = max2.Val) ∧ (max1.Id > max2.Id))).
6.2 Adaptation

We now consider an adaptation that dynamically replaces the leader election protocol ldrId with the leader election protocol ldrVal. When the adaptation is initiated, an instance of ldrId may be underway, or no instance may be underway. If the processes are not engaged in an election, then replacing one leader election protocol with another can be done easily, by replacing the corresponding fractions and initializing the new fractions using an initial, identity, or quasi state mapping. However, it is impossible for a process to locally determine whether other processes are in an election. One way to deal with this is to use centralized control during adaptation and/or to enforce synchrony by quiescing the processes (blocking them from starting new elections). This type of approach causes service interruption: the leader election service is not available until the adaptation is completed. There is also communication overhead, because extra messages are required for synchronization.

We instead consider mixed-mode adaptation, which lets the old and the new protocols overlap during adaptation. As a result, a process where the new fraction is installed can start an election right away, without waiting for other processes to finish the adaptation. Thus, the time for which the application is blocked, i.e., the service interruption time, is reduced. Moreover, since all processes can replace their fractions independently (in parallel), there is little or no need for extra messages to achieve synchrony, so the communication overhead during adaptation is also reduced.

The program using the ldrId protocol is shown in Figure 6.3, and the program using the ldrVal protocol is shown in Figure 6.4. We first give the specification during adaptation for the adaptation that replaces ldrId with ldrVal.

Specification during adaptation. The specification during adaptation is that the safety property of the leader election protocols, as discussed in Section 6.1.2, continues to hold during adaptation.

Program P_{ldrId}
process i [i = 1, .., n]
begin
   ldrId.i.startElection
[] ldrId.i.joinElection
[] ldrId.i.ackToParent
[] ldrId.i.ackReceive
[] ldrId.i.electLeader
[] ldrId.i.setLeader
end

Figure 6.3: Program using ldrId.

Program P_{ldrVal}
process i [i = 1, .., n]
begin
   ldrVal.i.startElection
[] ldrVal.i.joinElection
[] ldrVal.i.ackToParent
[] ldrVal.i.ackReceive
[] ldrVal.i.electLeader
[] ldrVal.i.setLeader
end

Figure 6.4: Program using ldrVal.

Adapting the ldrId protocol to the ldrVal protocol requires replacing each fraction of ldrId with the corresponding fraction of ldrVal. The replacement of the fraction at each process is an atomic adaptation and is modeled by an adaptive action. The intermediate programs that occur during adaptation are as shown in Figure 6.5.

Program P_{IP(ldrId-ldrVal)}
process i [i = 1, .., n; i ≠ j]
   P_{ldrId}.i
[] aa_i : true → TransformTo(P_{ldrVal}.i, Φ_{aa_i})
process j [j = 1, .., n; i ≠ j]
   P_{ldrVal}.j

Figure 6.5: Intermediate program during adaptation from P_{ldrId} to P_{ldrVal}.

In this example, the fractions of the two protocols at all processes are similar. Also, as discussed later in this section, the overlap communication scenarios for each intermediate program are similar. We take advantage of this symmetry by modeling the adaptation as an adaptive program (cf. Section 3.2.2 (adaptation as an automaton) of Chapter 3).

To model each adaptive action, we introduce two new boolean variables, idActive and valActive. We apply restriction composition [58] of the ldrId program with idActive, and of ldrVal with valActive. A restriction of program P by Z is a program whose actions are of the form Z ∧ g → st, for each action g → st of P. Thus, actions of ldrId are enabled only if idActive is true, and actions of ldrVal are enabled only if valActive is true. Initially, when the old program is running, idActive is true and valActive is false at each process. The adaptive action at each process atomically sets idActive to false and valActive to true. The adaptation modeled as an adaptive program is shown in Figure 6.6.
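Restriction composition is simply guard strengthening, as the following minimal Python sketch illustrates. The representation of actions as (guard, statement) pairs and all names here are assumptions made for illustration, not part of the formal model.

    # Restriction of a program by predicate Z: each action g -> st becomes (Z and g) -> st.
    def restrict(actions, z):
        """actions: list of (guard, statement) pairs over a state; z: predicate over a state."""
        return [(lambda s, g=g: z(s) and g(s), st) for (g, st) in actions]

    # During adaptation, the two restricted action sets simply coexist:
    #   mixed = restrict(ldrid_actions,  lambda s: s.idActive) + \
    #           restrict(ldrval_actions, lambda s: s.valActive)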
We allow these adaptive actions to execute independently of each other. In other words, at each process the fraction of the ldrId protocol can be replaced by the fraction of the ldrVal protocol independently of other processes. During this replacement, the state of the new fraction needs to be initialized appropriately to ensure correctness of adaptation. Furthermore, independent replacement of fractions at various processes leads to a situation where, during adaptation, some processes have fractions of ldrId active and some have fractions of ldrVal active. This causes the communication of the two protocols to overlap, and this communication must be handled appropriately to ensure correctness of adaptation.

The correctness of adaptation requires that the safety property, which asserts that two processes not in an election cannot have different leaders, also be satisfied during adaptation. Furthermore, when all processes have finished replacing their fractions, the program should be in a reachable (invariant) state of ldrVal. This is required to ensure that once the adaptation is complete, the new program satisfies its specification.

Program P_{ldrId-ldrVal}
process i [i = 1, .., n]
var  idActive : bool {initially true}
     valActive : bool {initially false}
     ldrId.i.{var} ∪ ldrVal.i.{var}
begin
   idActive ∧ ldrId.i.startElection
[] idActive ∧ ldrId.i.joinElection
[] idActive ∧ ldrId.i.ackToParent
[] idActive ∧ ldrId.i.ackReceive
[] idActive ∧ ldrId.i.electLeader
[] idActive ∧ ldrId.i.setLeader
[] valActive ∧ ldrVal.i.startElection
[] valActive ∧ ldrVal.i.joinElection
[] valActive ∧ ldrVal.i.ackToParent
[] valActive ∧ ldrVal.i.ackReceive
[] valActive ∧ ldrVal.i.electLeader
[] valActive ∧ ldrVal.i.setLeader
[] aa_i : idActive → idActive, valActive := false, true; s′ := Φ_{aa_i}(s)
[] discardElect : valActive ∧ ELECT(ldrId) ∈ C_{j,i} → C_{j,i}.remove(ELECT)
[] discardAck : valActive ∧ ACK(ldrId) ∈ C_{j,i} → C_{j,i}.remove(ACK)
[] acceptLd : valActive ∧ LD(ldrId) ∈ C_{j,i} → ldrVal.i.setLeader
end

Figure 6.6: Adaptive program to adapt from P_{ldrId} to P_{ldrVal}.

We now describe the state mapping that we use in the adaptation, and also discuss the various overlapping communication scenarios that arise during the adaptation and how we deal with them.

6.2.1 State Mapping and Overlap Communication

During adaptation, a process replaces the fraction of ldrId with the fraction of ldrVal. Since the protocol fractions are replaced independently at each process, we need to deal explicitly with the communication between the two protocols: a process with the old fraction may receive a message from a process with the new fraction, and vice versa. Table 6.3 shows the different overlap communication scenarios that can occur during adaptation and how they are dealt with. Any message of type ELECT from the new fraction to the old fraction is buffered. Any message of type ELECT or ACK from the old fraction to the new fraction is discarded. The new fraction accepts a message of type LD from the old fraction. The scenarios where the new fraction sends a message of type ACK or LD to the old fraction do not arise, because the new fraction always discards any message of type ELECT from the old fraction. The actions discardElect, discardAck, and acceptLd shown in Figure 6.6 deal with these overlap communication scenarios during adaptation.

  Overlap scenario               ELECT     ACK       LD
  Old (ldrId) to New (ldrVal)    discard   discard   accept
  New (ldrVal) to Old (ldrId)    buffer    NA        NA

Table 6.3: Overlap communication between protocols.

To ensure correctness while dealing with overlap communication, the new fractions are initialized using the state mapping defined in Table 6.4. When the replacement is done, the old fraction is in one of the four states described in Table 6.4. In the first case, the old fraction is not involved in an election; an identity mapping is done, i.e., each variable of the new fraction is initialized to the corresponding variable of the old fraction. In the second case, the old fraction is involved in an election and is waiting to receive ACKs from its children. In this case, the old fraction can locally determine that the source process has not yet elected the leader. The new fraction is therefore initialized to its initial state, i.e., it no longer takes part in the election; only the num and src variables of the new fraction are assigned the values of the old fraction. In the third case, the fraction is waiting to receive the leader value (i.e., an LD message) from its parent; here an identity mapping is done. In the last case, the process is the source of the election, i.e., it started the election. In this case, we replace the fraction and have the new fraction start the election again: we initialize the state of the new fraction to its initial state, assign the num variable of the new fraction the value of the old fraction, and set ldrVal.startElection to true. For brevity, in Figure 6.6 we show the state mapping action as s′ := Φ_{aa_i}(s). This is equivalent to a set of assignment statements that assign values to each variable of the new fraction using the state mapping Φ_{aa_i} of Table 6.4.

  (old) state s of process i    Description of (old) state s    (new) state s′ of process i
  ¬ldrId.e                      not in election                 identity mapping
  ldrId.e ∧ ldrId.W ≠ ∅ ∧       in election, waiting for ACK    {num, src} : identity mapping;
  ldrId.src ≠ i                 from children, not the source   rest : initial mapping
  ldrId.e ∧ ldrId.W = ∅         in election, waiting for LD     identity mapping
  ldrId.e ∧ ldrId.W ≠ ∅ ∧       in election, waiting for ACK    {num} : identity mapping;
  ldrId.src = i                 from children, source of        ldrVal.startElection := true :
                                the election                    functional mapping; rest : initial mapping

Table 6.4: State mapping (Φ_{aa_i}) for each atomic adaptation aa_i.
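The following Python sketch makes the four cases of Table 6.4 concrete. It is a minimal illustration under assumed state representations (plain attribute records, with src as a (Num, Id) tuple); the function and helper names are hypothetical, not part of the protocol.

    from types import SimpleNamespace

    def initial_ldrval_state(i):
        # Fresh ldrVal fraction state for process i (the "initial mapping" baseline).
        return SimpleNamespace(i=i, e=False, ack=False, startElection=False,
                               p=0, num=0, ldr=None, src=(0, i), W=[], chd=[])

    def phi(old):
        """State mapping of Table 6.4: old ldrId fraction state -> new ldrVal state."""
        new = initial_ldrval_state(old.i)
        if not old.e:                          # case 1: not in an election
            new.__dict__.update(vars(old))     # identity mapping
        elif old.W and old.src[1] != old.i:    # case 2: awaiting ACKs, not the source
            new.num, new.src = old.num, old.src
        elif not old.W:                        # case 3: awaiting LD from the parent
            new.__dict__.update(vars(old))     # identity mapping
        else:                                  # case 4: source of the election
            new.num = old.num
            new.startElection = True           # restart the election under ldrVal
        return new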
6.2.2 Verifying Adaptation

To verify adaptation, we need to verify all intermediate programs that occur during adaptation. The intermediate programs that occur during the adaptation from ldrId to ldrVal are as shown in Figure 6.5. Since the protocol fractions and the overlap communication scenarios are symmetrical, we modeled the adaptation as the adaptive program shown in Figure 6.6. Thus, instead of verifying all intermediate programs, we verify the adaptive program of Figure 6.6.

Safety of the adaptive program. The safety property for the adaptive program P_{ldrId-ldrVal} is similar to the safety property of the leader election protocols, and is given as follows:

∀p1, p2 : p1, p2 ∈ {ldrId, ldrVal} : p1.i.ldr ≠ p2.j.ldr ⇒ p1.i.e ∨ p2.j.e.

Invariant. We establish the following invariant for the adaptive program, which implies that safety is not violated during adaptation:

I ≡ P_E ∧ P_A ∧ P_L ∧ P_L′, where

P_E ≡ i.idActive ∧ j.valActive ∧ j.e ∧ j.ack ∧ j.W ≠ ∅ ⇒ ELECT(ldrVal) ∈ C_{j,i}

P_A ≡ j.valActive ∧ ¬j.e ∧ (∃i : i ∈ j.N : i.idActive ∧ j.src = i.src ∧ i.e ∧ i.ack ∧ C_{i,j} = ∅)
      ⇒ (∃k : k.p = k ∧ k.e ∧ k.ack ∧ k.W ≠ ∅)

P_L ≡ i.idActive ∧ j.valActive ∧ j.e ∧ j.p = i ∧ ¬j.ack ∧ j.W = ∅ ⇒
      ((i.e ∧ i.src = j.src ∧ ¬i.ack ∧ i.W = ∅) ∨
       (¬i.e ∧ LD(ldrId) ∈ C_{i,j}) ∨
       (i.e ∧ i.ack ∧ i.W ≠ ∅ ∧ i.src = j.src) ∨
       ((i.e ∧ i.ack ∧ i.W ≠ ∅ ∧ i.src ≠ j.src) ⇒ j ∈ i.W))

P_L′ ≡ i.idActive ∧ j.valActive ∧ i.e ∧ ¬j.e ∧ i.p = j ∧ ¬i.ack ∧ i.W = ∅ ∧ i.src = j.src
       ⇒ LD(ldrId) ∈ C_{j,i} ∨ (∃k : k.p = k ∧ k.e ∧ k.ack ∧ k.W ≠ ∅)

In the above invariant, P_E asserts that if process j starts an election after adapting to the ldrVal fraction, then any ELECT message that j sends to a process i that is still using the ldrId fraction remains in the channel C_{j,i} until i is adapted.
P_A asserts that if j was in an election and waiting for ACK(s) from its neighbor(s) when it replaced its fraction, then the source of that election is still in the election. P_L asserts that if process j was in an election and waiting for LD when it replaced its fraction, and the parent of j is still using the old fraction, then one of the following holds: (i) the parent of j is also waiting for LD, (ii) the parent of j is not in an election and the channel C_{i,j} contains an LD(ldrId) message, (iii) the parent of j is still waiting for ACK(s) from its neighbor(s), or (iv) if the parent of j has started a new election, then it will not receive an ACK from j until it replaces its fraction. P_L′ asserts that if a process i using the old fraction is waiting for an LD message from its parent j, and j has completed its election and is now using the new fraction, then one of the following holds: (i) the channel C_{j,i} contains an LD(ldrId) message that was sent by process j before it was adapted, or (ii) the source of that election is still in the election.

6.3 Performance of Mixed-Mode Adaptation

In this section, we compare the performance of mixed-mode and quiescence adaptation. We consider two different configurations for the following discussion: (i) a connected network (straight line) of 5 processes and 4 edges, and (ii) a connected network of 7 processes and 11 edges. We have considered other network topologies and obtained similar results. In this discussion, we consider the adaptation from ldrId to ldrVal. We have also implemented the adaptation from ldrVal to ldrId and obtained similar performance results.

[Figure 6.7: Quiescence adaptation. Bar charts of the per-node adaptation time (ms) over five runs, for (a) Configuration 1 and (b) Configuration 2.]

Figure 6.7 shows the time required for quiescence adaptation to adapt from the Id-based leader election protocol to the value-based leader election protocol. It shows the two configurations and 5 runs for each configuration; each run shows the time taken by each process for adaptation. In run number 5 of configuration 1 and run number 4 of configuration 2, the time taken for adaptation is almost twice the average time taken by the processes over the other runs. This is because an instance of the leader election protocol was already running when the adaptation started; the adaptation waits for the election to complete before replacing the protocol.

Figure 6.8 shows the time required for mixed-mode adaptation to adapt from the Id-based leader election protocol to the value-based leader election protocol. It shows the two configurations and 5 runs for each configuration, and in each run the time taken by each process to finish the adaptation. The time taken in each run is almost the same for a given configuration, regardless of whether an instance of the leader election protocol is underway when the adaptation occurs.

Figure 6.9 shows the comparison between the average time taken by quiescence and mixed-mode adaptation to adapt from the Id-based leader election protocol to the value-based leader election protocol.
The average time taken by quiescence adaptation is almost 8 times that of mixed-mode adaptation in the case of configuration 1, and 6 times that of mixed-mode adaptation in the case of configuration 2. This result is as expected, because quiescence adaptation sends more messages for synchronization during adaptation. Furthermore, in the case of quiescence adaptation there is more processing time required at each process during adaptation, as channels and other resources used by the existing protocol at that process need to be released before the new protocol can be installed. By contrast, in the case of mixed-mode adaptation the new protocol is able to deal with overlap communication, and hence explicit release of channels and other resources is not necessary.

[Figure 6.8: Mixed-mode adaptation. Bar charts of the per-node adaptation time (ms) over five runs, for (a) Configuration 1 and (b) Configuration 2.]

[Figure 6.9: Quiescence vs. mixed-mode adaptation. Bar charts of the average per-node adaptation time (ms), for (a) Configuration 1 and (b) Configuration 2.]

Figure 6.10 shows the time required for electing a leader by the Id-based and value-based leader election protocols. Both protocols take almost the same amount of time to carry out an election.

[Figure 6.10: Time for leader election protocols. Bar charts of the average election time (ms) for ldrId and ldrVal, for (a) Configuration 1 and (b) Configuration 2.]

Furthermore, from Figures 6.9 and 6.10 it can be observed that the time taken for quiescence adaptation is more than twice the time it takes for leader election, while the time for mixed-mode adaptation is less than one-fourth of the time it takes for leader election. Clearly, if a user at some process requests an election at the same instant when the system decides to adapt using quiescence adaptation, then the user will notice a long delay, which could be up to twice the time normally experienced for an election. However, if mixed-mode adaptation is chosen, the adaptation is almost transparent to the user.

6.4 Limitations of Mixed-Mode Adaptation

One limitation of mixed-mode adaptation is that it requires the adaptation developer to have deeper knowledge of the components involved in the adaptation. Additionally, such adaptation may require support from the components themselves. Nonetheless, in our experience, components involved in mixed-mode adaptation exhibit various levels of mixed-mode behavior. Consequently, based on the details available about the components involved in the adaptation, an adaptation developer can provide an appropriate mixed-mode behavior during adaptation.
However, in cases where components are not conducive to the development of mixed-mode adaptation, adaptation based on system structure, such as quiescence adaptation, may be more appropriate.

Chapter 7

Tradeoffs in Adaptation

In this chapter we identify various tradeoffs that arise in developing adaptation. We identify tradeoffs in verification complexity, completion time, and communication overhead during adaptation. Concurrent executions are generally considered faster than sequential executions. Specifically, with respect to adaptation, we expect that concurrent execution of atomic adaptations (where possible after considering any dependencies) is faster than sequential execution. However, verification complexity increases exponentially with increasing concurrency, and message communication overhead also increases with increasing concurrency. In the rest of this chapter, we first discuss the tradeoff between concurrency and verification complexity in Section 7.1. In Section 7.2, we discuss the tradeoff between concurrency and communication overhead. Finally, we discuss a simple case study that demonstrates tradeoffs in adaptation in a publish-subscribe application.

7.1 Concurrency v/s Verification Complexity

As discussed in Chapter 3, to verify a given adaptation, we consider all possible orderings of concurrent atomic adaptations. As a result, the lattice has multiple paths from the start node to the end node to encompass all possible orderings among concurrent atomic adaptations. This increases the number of intermediate programs that need to be verified. Putting concurrent atomic adaptations in all possible orderings is a potential cause of explosion in the size of the lattice. For example, if an adaptation consists of n atomic adaptations that can be executed concurrently, then there are n! different orderings and 2^n − 2 different intermediate programs. Thus, 2^n − 2 transitional-invariants need to be identified, one corresponding to each intermediate program. The lattice in this case is as shown in Figure 7.1(a) for n = 3. To identify all these transitional-invariants and verify the corresponding intermediate programs is a difficult process.

[Figure 7.1: Executing three atomic adaptations. (a) the full lattice with all orderings; (b) a single path; (c) a partially serialized lattice.]

Clearly, the specification during adaptation is satisfied if the adaptation follows any path in the lattice. So instead of choosing to verify all adaptation paths, if we verify only one path (e.g., a1, a3, a2), then the lattice is as shown in Figure 7.1(b). For the specification during adaptation to be satisfied, the adaptation must follow this path, i.e., a1 should occur before a3, and a3 should occur before a2. In this case there is no concurrency among adaptive actions during adaptation, and we are able to reduce the cost of verification from O(2^n) to O(n). Specifically, for n concurrent atomic adaptations, the number of transitional-invariants that need to be identified is reduced to n − 1. Alternatively, we could have chosen the lattice shown in Figure 7.1(c). In this case the cost of verification is higher than for the lattice in Figure 7.1(b), but lower than for the lattice in Figure 7.1(a). Likewise, the concurrency in the lattice in Figure 7.1(c) is higher than in the lattice in Figure 7.1(b), but lower than in the lattice in Figure 7.1(a).
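To make the verification counts of this section concrete, the following small Python sketch computes both quantities for n concurrent atomic adaptations (the function name is hypothetical).

    from math import factorial

    # n! orderings of n concurrent atomic adaptations; every nonempty proper
    # subset of executed adaptations is a distinct intermediate program: 2^n - 2.
    def lattice_cost(n):
        return factorial(n), 2 ** n - 2

    print(lattice_cost(3))   # (6, 6): the lattice of Figure 7.1(a)
    print(lattice_cost(10))  # (3628800, 1022): full concurrency quickly becomes intractable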
Thus, based on the tradeoff between the concurrency of an adaptation and the complexity of verifying that adaptation, we can choose a subgraph (sublattice) of a given lattice that has all the properties of the lattices defined in Chapters 4 and 5. If we verify only a sublattice, we must also constrain the adaptation so that it follows only paths of that sublattice.

7.2 Concurrency v/s Message Complexity

Many systems are limited by communication overhead and message delays. Specifically, for wireless and mobile systems, the energy-communication tradeoff may require the system to reduce communication overhead whenever possible. When designing adaptation in such systems, communication overhead should also be taken into account. In this section, we show how concurrency during adaptation affects the communication overhead.

[Figure 7.2: Space-time diagram of adaptation, showing the initialization phase (IP) and synchronization phase (SP) across processes p1 to p3: (a) concurrent adaptive actions, (b) serialized adaptive actions.]

Consider the case where all atomic adaptations are executed concurrently. This is described by the lattice in Figure 7.1(a), and the space-time diagram for this adaptation is shown in Figure 7.2(a). We show only the minimum number of adaptation-specific messages in the space-time diagram; there may be other application-specific messages, which we do not consider as they are not related to adaptation. We divide adaptation into two phases: (i) the initialization phase, and (ii) the synchronization phase. In the initialization phase (denoted IP in the figures), the initiator process that decides on the adaptation informs the other processes of this decision. In the synchronization phase (denoted SP in the figures), processes exchange messages to coordinate the execution of adaptive actions.

In Figure 7.2(a), process p2 is the initiator that informs the other processes to start performing any steps required for adaptation. In this case, at least two messages are required to initiate the adaptation. Now, if the adaptation were to occur according to the lattice of Figure 7.1(b), then we can make process p1 the initiator, and the space-time diagram is as shown in Figure 7.2(b). In this case, we eliminated the initialization messages that were required for the adaptation described by Figure 7.2(a). In both cases, the minimum number of adaptation-specific messages required during adaptation is two. Thus, we did not increase the communication overhead by reducing concurrency. We now consider another scenario, where the number of messages can actually be reduced if concurrency is reduced during adaptation.

[Figure 7.3: Adaptation with concurrency: (a) the lattice, (b) the corresponding space-time diagram for processes p1 to p4.]

Consider the lattice of Figure 7.3(a), which describes an adaptation consisting of four adaptive actions a1, a2, a3, and a4 occurring at processes p1, p2, p3, and p4, respectively. Adaptive actions a1 and a2 are independent of each other and can occur concurrently. Similarly, a3 and a4 can occur concurrently. The corresponding space-time diagram is shown in Figure 7.3(b). The minimum number of adaptation-specific messages required during adaptation is five. Now, if we reduce the concurrency in this case, we can use the adaptation described by the lattice of Figure 7.4(a), with the corresponding space-time diagram shown in Figure 7.4(b). In this case, the minimum number of adaptation-specific messages required is reduced to four.
[Figure 7.4: Adaptation with (reduced) concurrency: (a) the lattice, (b) the corresponding space-time diagram.]

Further, if we have no concurrency during adaptation, as described by the lattice of Figure 7.5(a), then the space-time diagram is as shown in Figure 7.5(b). In this case, a minimum of only three adaptation-specific messages is required during adaptation.

[Figure 7.5: Adaptation with no concurrency: (a) the lattice, (b) the corresponding space-time diagram.]

Thus, by reducing concurrency during adaptation, it is possible to reduce the number of messages required during adaptation. However, from the space-time diagrams of Figures 7.2-7.5, it is clear that the time required to complete adaptation is likely to be lower when there is more concurrency during adaptation. Thus, while designing adaptation, one should weigh factors such as concurrency during adaptation, message delays and communication overhead, and verification complexity.

7.3 Case Study: Publish-Subscribe Application

In this section, we illustrate the tradeoffs due to concurrent adaptive actions during adaptation using a simple publish-subscribe application. We consider a publish-subscribe application with two publishers (senders) and two subscribers (receivers). Both receivers subscribe to receive data from both senders. For reliable communication between publishers and subscribers we consider two protocols, namely the proactive protocol based on forward error correction and the reactive protocol based on acknowledgments. These protocols were discussed earlier in Chapters 4 and 5.

[Figure 7.6: Adaptation in the publish-subscribe application: (a) the adaptation lattice, (b) the corresponding space-time diagram, (c) the space-time diagram with all adaptive actions serialized.]

The adaptation in the publish-subscribe application replaces the proactive protocol with the reactive protocol. The adaptation is done by first blocking the two senders. The blocking of the two senders can be done concurrently, i.e., independently of each other. We note that the local guards of the adaptive actions that block the senders need to be true before those actions can be executed. As a result, though the two adaptive actions are independent of each other, they do not necessarily execute at the same instant during adaptation. Once the protocol fractions at both senders are blocked, the protocol fractions at both receivers can be replaced concurrently. Finally, once the receivers have replaced their fractions with the new protocol fractions, the protocol fractions at the senders can be replaced; these replacements can also occur concurrently. The adaptation lattice in this case is shown in Figure 7.6(a), and the corresponding space-time diagram in Figure 7.6(b). The verification complexity and communication overhead can be reduced for this adaptation as discussed in Section 7.1. Specifically, if all the adaptive actions are serialized, then the space-time diagram for the adaptation is as shown in Figure 7.6(c). The communication overhead is reduced from a minimum of 9 messages to a minimum of 5 messages. Also, the number of intermediate programs that need to be considered for verification of the adaptation is reduced from 8 to 5.

Chapter 8

Testing Adaptation

In order to specify and verify the behavior of the system during dynamic adaptation, we presented an approach based on adaptation lattices in Chapters 3, 4, and 5. Due to the complexity of adaptive systems, the verification is often done on an abstract model of the system.
In this chapter, we present an approach for testing adaptation to gain assurance about the implementation of adaptation.

Predicate detection is a common approach used in testing and debugging of distributed systems, as many problems in distributed systems can be formulated as instances of Global Predicate Evaluation (GPE) [78]. Typical properties of distributed systems such as deadlock detection, mutual exclusion, termination, and many more can be tested using predicate detection techniques. Numerous approaches [78-83] have recognized a variety of predicate classes and presented algorithms for predicate detection.

In this chapter, we discuss a predicate detection approach for testing adaptive systems. Due to the overlapping behavior of the old program and the new program during adaptation, the existing algorithms for predicate detection cannot be applied directly. Specifically, these algorithms do not deal with a system whose code is being changed. In many cases, the algorithms can be modified so that if any error is detected during adaptation, it can be mapped to the particular step of adaptation that caused the error.

With this motivation, we extend the existing algorithms to test the system during adaptation. In particular, we classify the predicates to be detected during adaptation into two types: (i) adaptation-stable predicates, and (ii) adaptation-transient predicates. We call a predicate adaptation-stable if the predicate holds throughout adaptation, and adaptation-transient if it holds only in some interval during adaptation. We show how existing algorithms can be modified to detect both types of predicates during adaptation. Furthermore, we show how the cost of testing can be reduced by testing only atomic adaptations.

The rest of the chapter is organized as follows. In Section 8.1, we review the preliminary concepts of distributed computation, causal precedence, consistent global state, and consistent cut. In Section 8.2, we classify the properties to be tested during adaptation. Then, in Section 8.3, we introduce the adaptation vector and give a brief overview of the vector clocks used to track causality among events. Subsequently, in Section 8.4, we discuss algorithms to test adaptation. To reduce the cost of testing, in Section 8.5, we identify a subset of states during adaptation on which to do limited testing. Finally, we give a summary of this chapter in Section 8.6.

8.1 Preliminaries

As discussed in Chapter 3, a program P consists of a set of n processes {p1, p2, ..., pn} communicating via asynchronous messages on interprocess channels. We do not assume the channels to be FIFO (unless the application requires it). Furthermore, in this chapter, we do not specify the channel states explicitly; a channel state can be constructed from the local states of the processes. The execution of a process consists of a sequence of events. An event is the execution of a process action, and is of one of three types: a local (or internal) event, a send event, or a receive event. An action corresponding to a send event has a statement of the form send (m) to p_j. A receive event has a corresponding action of the form rcv (m) from p_j → stmts.

We now present formal definitions of distributed computation, causal precedence, consistent global state, and consistent cut. In this section, we consider partial order semantics for a program.

Definition (Distributed computation).
A distributed computation r of a program P describes a single execution of P by a collection of traces r[i], one for each process p_i. Each r[i] is a finite alternating sequence of states and events. For example, the trace of process p_i is s_i^0 e_i^1 s_i^1 e_i^2 s_i^2 ..., where s_i^k denotes the local state of p_i immediately after event e_i^k, and s_i^0 denotes the initial state before any actions are executed. A distributed computation is commonly depicted using a space-time diagram, as shown in Figure 8.1.

[Figure 8.1: Space-time diagram of a distributed computation over processes p1, p2, and p3, with events e_i^k on each process timeline and messages between them.]

Formally, a distributed computation is a partially ordered set defined by the pair (E, →), where E is the set containing all events and → is the happened-before relation [84] that defines the causal precedence relation on events (or states). The happened-before relation on states is defined as follows:

Definition (Causal precedence). The state s causally precedes the state t (denoted s → t) if and only if one of the following holds:

• s and t are states on the same process and s occurred before t;

• the action following s is the send of a message and the action before t is the corresponding receive;

• there exists a state x such that s → x and x → t.

We use the notation → to denote the causal-precedence relation for both states and events. If for two events e1 and e2 neither e1 → e2 nor e2 → e1, then e1 and e2 are said to be concurrent, denoted e1 ∥ e2.

Not all events are relevant for testing of dynamic adaptation; hence, to simplify the testing, we only consider a subset of the events of a distributed computation. Let R ⊆ E be the set of relevant events. The poset (R, →) describes an abstraction of the distributed computation.

Definition (Global state and cut). We define a global state as a state of all the processes and channels in the system. We, however, ignore the channel states and represent a global state as an n-tuple of local states (s_1^{c_1}, ..., s_n^{c_n}). A channel state can be constructed from the set of all messages that have been sent but not yet received.

A cut C associated with a global state g is a set of events, one event per process, such that event e ∈ C if and only if the process state immediately after event e is a part of g. A global history H associated with the cut C (or the corresponding global state g) of P is defined as the subset h_1^{c_1} ∪ ... ∪ h_n^{c_n}, where h_i^{c_i} is the local history of process p_i containing its first c_i events.

Definition (Consistent cut and consistent global state). A cut C is consistent if for all events e in the corresponding global history H, we have (e ∈ C) ∧ (e′ → e) ⇒ e′ ∈ H. A global state g is consistent if the cut corresponding to it is consistent. Intuitively, a consistent cut corresponds to a view of the run that could be obtained given the existence of a global clock.

Given a space-time diagram, it may not always be possible to say how the global execution actually occurred; for instance, the relative order in which two concurrent events of Figure 8.1 occurred cannot be determined. In other words, there are many total orders of a distributed computation r, and several possible global executions correspond to a given distributed computation r. We call each total order of r an observation. In other words, a sequence of consistent global states g^0 g^1 g^2 g^3 ... is an observation, where g^0 denotes the initial global state (s_1^0, ..., s_n^0), and each global state g^i is obtained from the previous state g^{i−1} by some process executing a single event.
For two such global states g^{i−1} and g^i, we say that g^{i−1} leads-to g^i. The set of all consistent global states of a computation, along with the leads-to relation, defines a lattice of global states, or computation lattice. A path in the computation lattice corresponds to an observation, and each observation has a corresponding path in the computation lattice. Thus, the computation lattice represents the set of all possible observations of the computation.

Remark. In the following sections, unless mentioned otherwise, we will use global state to mean consistent global state.

8.2 Testing

Various global properties of the system need to be tested during adaptation. Examples of properties that can be tested during adaptation include deadlock detection, token loss detection, and, in general, monitoring. We classify the properties to be checked during adaptation into two categories: (i) adaptation-stable properties, and (ii) adaptation-transient properties. If a property is to be satisfied in all states during adaptation, it is known as an adaptation-stable property; if a property is to be satisfied in some interval during adaptation, it is known as an adaptation-transient property. A predicate used to specify an adaptation-stable property is known as an adaptation-stable predicate, and a predicate used to specify an adaptation-transient property is known as an adaptation-transient predicate. From the definition, it is easy to observe that adaptation-stable predicates are specified in terms of variables that are not affected (added or removed) by atomic adaptations. By contrast, adaptation-transient predicates are specified in terms of variables that are affected (added or removed) at some point during adaptation.

Given a test predicate, a predicate evaluation strategy would, in general, construct and test every global state of the system during adaptation. General predicate testing is normally considered impractical for reasonably big systems, as the number of global states can be exponential in the number of processes.

While testing for arbitrary predicates may be very expensive, efficient testing is feasible if the predicates have certain characteristics. Specifically, in [78-83], the authors have identified efficient algorithms for a variety of predicate classes such as conjunctive predicates, disjunctive predicates, observer-independent predicates, and stable or unstable predicates. However, in the context of testing adaptation, these algorithms either cannot be employed directly or are inefficient if used directly, as the code of the system is changing during adaptation. In particular, these algorithms need to be extended to classify the global states constructed during adaptation into specific intermediate programs. Towards this end, we introduce the adaptation vector to distinguish intermediate program states.

8.3 Adaptation Vector

The adaptation vector is used to identify intermediate program states. An adaptation vector A is a vector of n elements, where n is the number of atomic adaptations. Each element of the adaptation vector, denoted A[i], is boolean valued (1 represents true, 0 represents false); A[i] = 1 denotes that atomic adaptation a_i has executed in the past, and A[i] = 0 denotes that atomic adaptation a_i has not yet executed. For simplicity of discussion, we assume that the adaptation consists of n atomic adaptations a1, a2, ..., an and that atomic adaptation a_i occurs at process p_i. Our approach can easily be extended if multiple atomic adaptations occur at a process.
Each node of the adaptation lattice is assigned an adaptation vector. The value [0, ..., 0] is assigned to the start node, which denotes the old program, where no atomic adaptations have occurred. The value [1, ..., 1] is assigned to the end node, which denotes the new program, reached after all atomic adaptations have occurred. If [a1, a2, ..., an] is an adaptation vector, then the value [1, 1, 0, ..., 0] denotes an intermediate program reached after the execution of atomic adaptations a1 and a2. Given an intermediate program and its corresponding adaptation vector, we can determine which atomic adaptations occurred in the past to reach that intermediate program. However, we cannot determine the order (or causal precedence relation) among those atomic adaptations.

8.3.1 Implementation of Adaptation Vectors

Each process keeps its own local adaptation vector AV, which is updated as the process learns about new atomic adaptations. The adaptation vector is maintained as follows:

• When a process p_i executes its own atomic adaptation a_i, it updates the adaptation vector AV_i by AV_i[i] := 1.

• When a process p_i sends a message m to process p_j, it attaches the current value of AV_i to m. This value is denoted m.AV.

• When a process p_i receives a message m from process p_j, it updates its adaptation vector as AV_i := AV_i ∨ m.AV, where the OR operation over vectors is defined on a component-by-component basis.

8.3.2 Implementation of Vector Clocks

A vector clock system [84, 85] is a mechanism that assigns a vector timestamp to each event such that comparing the timestamps of two events indicates the causal relation between them. Each vector timestamp is of size n, where n is the number of processes. Each process p_i has a vector V_i[1..n] of integers, which is maintained as follows:

• V_i[1..n] is initialized to [0, 0, ..., 0].

• If e_i is a relevant event, then p_i increments its vector clock entry as V_i[i] := V_i[i] + 1. It also associates the vector timestamp V_i with the event e_i, denoted e_i.V.

• When a process p_i sends a message m, it attaches the current value of V_i to m. This value is denoted m.V.

• When p_i receives a message m, it updates its vector clock as V_i := max(V_i, m.V), where the maximum operator over vectors is defined on a component-by-component basis.

If e.V and f.V are the timestamps associated with distinct events e and f, respectively, then the fundamental property of vector clocks is the following:

∀(e, f) ∈ R × R : ((e → f) ⇔ e.V < f.V), where e.V < f.V ≡ (∀k : e.V[k] ≤ f.V[k]) ∧ (∃k : e.V[k] < f.V[k]).
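The maintenance rules of Sections 8.3.1 and 8.3.2 translate directly into code. The following Python sketch is a minimal illustration; the class and method names are hypothetical, and only the update rules themselves come from the text above.

    class AdaptationVector:
        """Per-process adaptation vector AV (Section 8.3.1)."""
        def __init__(self, n, i):
            self.av, self.i = [0] * n, i
        def own_adaptation(self):            # process p_i executes a_i
            self.av[self.i] = 1
        def on_receive(self, m_av):          # component-wise OR with piggybacked vector
            self.av = [a | b for a, b in zip(self.av, m_av)]

    class VectorClock:
        """Per-process vector clock V (Section 8.3.2), over relevant events only."""
        def __init__(self, n, i):
            self.v, self.i = [0] * n, i
        def relevant_event(self):            # increment own entry; return the timestamp
            self.v[self.i] += 1
            return list(self.v)
        def on_receive(self, m_v):           # component-wise maximum
            self.v = [max(a, b) for a, b in zip(self.v, m_v)]

    def happened_before(e_v, f_v):           # the test e.V < f.V from the property above
        return all(a <= b for a, b in zip(e_v, f_v)) and e_v != f_v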
8.4 Detecting Global Predicates During Adaptation

Given a global state of the system and the value of the adaptation vector in that state, we can identify the predicates from the adaptation lattice that should be true in that state. Adaptation-stable predicates need to be checked in all global states constructed during adaptation. The intermediate program in which an adaptation-stable predicate was detected can easily be identified from the value of the adaptation vector associated with the state in which the predicate was detected. On the other hand, adaptation-transient predicates need to be checked only in the states of the corresponding intermediate program.

[Figure 8.2: Identifying intermediate program states. Each process timeline is partitioned into sections labeled with the value of its adaptation vector (e.g., 000, 010, 110, 111).]

Figure 8.2 shows how each process is divided into different sections based on the value of its adaptation vector AV. Whenever a local state s_i^k of process p_i is to be collected, the state s_i^k is assigned an adaptation timestamp (denoted s.AV) equal to the current value of AV_i. We construct a global state as an n-tuple of local states (s_1, s_2, ..., s_n). This global state is a state of the intermediate program whose adaptation vector equals s_1.AV ∨ s_2.AV ∨ ... ∨ s_n.AV, where s_i.AV ∨ s_j.AV = [s_i.AV[1] ∨ s_j.AV[1], ..., s_i.AV[n] ∨ s_j.AV[n]].

We now discuss the algorithm for detecting weak conjunctive [79] adaptation predicates. A predicate is called weak if it is true for some observation of the distributed computation; this is similar to the possibly predicates of [78]. Conjunctive predicates are of the form C1 ∧ ... ∧ Cn, where each C_i is a local predicate. This class of predicates allows each process to evaluate its local predicate independently. A weak conjunctive predicate is true if and only if there exists an observation in which all conjuncts are true in some global state. This type of predicate typically describes some bad or undesirable property, in other words, a predicate that should never become true in the system. The algorithms discussed in this section are extensions of the algorithms in [79].

8.4.1 Detecting Adaptation-Stable Predicates

In [79] it has been shown that to detect a weak conjunctive predicate it is necessary and sufficient to find a set of concurrent states in which the local predicates are true. Let wcp = C1 ∧ ... ∧ Cn denote the weak conjunctive predicate to be detected. We discuss the centralized algorithm in which one process serves as a checker process and all other processes involved in wcp are referred to as non-checker processes. The algorithms for the non-checker and checker processes are shown in Figures 8.3 and 8.4, respectively.

Each non-checker process maintains its own vector clock and adaptation vector, whose values are updated as discussed earlier in this section. Whenever the local predicate of a process becomes true for the first time since the most recently sent message (or since the beginning of the trace), it generates a debug message containing its vector clock and adaptation vector and sends it to the checker process.

The checker process maintains a separate queue for each process involved in the wcp. Whenever a debug message is received from a process, it is enqueued in the queue corresponding to that process. It is assumed (for the sake of efficiency) that the checker process receives its messages from any process in FIFO order. If the underlying computation is not FIFO, the checker process can ensure the FIFO property by using sequence numbers in messages. The algorithm compares the vector clock values at the heads of the queues to determine whether a consistent global state can be constructed.
process p_i (non-checker)
var  V : array {initially, ∀j : i ≠ j : V[j] = 0; V[i] = 1}  /* vector clock */
     AV : array {initially, ∀j :: AV[j] = 0}  /* adaptation vector */
     firstflag : boolean {initially true}  /* first time the local predicate is true after any send event */
     local_pred : boolean expression  /* the local predicate to be tested */
[] replace adaptive action {a_i} with
     a_i; AV[i] := 1
[] replace send statement {send (m) to p_j} with
     send (m, V, AV) to p_j;
     V[i], firstflag := V[i] + 1, true
[] replace receive action {rcv (m) from p_j → stmts} with
     rcv (m, m.V, m.AV) from p_j → stmts;
     ∀j : V[j] := max(V[j], m.V[j]);
     ∀j : AV[j] := AV[j] ∨ m.AV[j]
[] add local predicate detection action
     if (local_pred = true) ∧ firstflag then
        firstflag := false;
        send (dbg, V, AV) to p (checker process)
     fi

Figure 8.3: Algorithm for adaptation-stable predicate (non-checker process p_i).

The algorithm is initiated whenever a debug message is received from a non-checker process. If the corresponding queue is non-empty, the message is inserted in the queue; otherwise, it is checked whether the message leads to the case where the conjunctive predicate becomes true. The algorithm maintains the changed and newchanged variables to ensure that only those heads of the queues that have not been compared earlier are compared. When the while loop terminates, if all the queues are non-empty, then the intermediate program in which wcp was first detected can be ascertained.

process p (checker)
var  q_1, ..., q_n : queue of (V, AV)
     changed, newchanged : set of {1, 2, ..., n}
[] rcv (elem) from p_k →
     /* elem.V and elem.AV denote the values of the vector clock and adaptation vector */
     insert(q_k, elem);
     if (head(q_k) = elem) then
        changed := {k};
        while (changed ≠ ∅) do
           newchanged := {};
           for i in changed ∧ j in {1, 2, ..., n}
              if (¬empty(q_i) ∧ ¬empty(q_j)) then
                 if head(q_i).V < head(q_j).V then
                    newchanged := newchanged ∪ {i};
                 fi
                 if head(q_j).V < head(q_i).V then
                    newchanged := newchanged ∪ {j};
                 fi
              fi
           changed := newchanged;
           for i in changed
              deletehead(q_i);
        od  /* end while */
        if ∀i : ¬empty(q_i) then
           found := true;
           int_program_AV := [∨_{i=1..n} head(q_i).AV[1], ..., ∨_{i=1..n} head(q_i).AV[n]]
           /* intermediate program in which the predicate was detected */
        fi
     fi

Figure 8.4: Algorithm for adaptation-stable predicate (checker process).
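The heart of Figure 8.4 is the repeated comparison and pruning of queue heads. The following Python sketch is a simplified, hedged rendering of that loop (all names hypothetical); it also computes the adaptation vector of the detected intermediate program by OR-ing the local adaptation timestamps, as described at the start of Section 8.4.

    from collections import deque

    def hb(a, b):                        # happened-before test on vector timestamps: a < b
        return all(x <= y for x, y in zip(a, b)) and a != b

    def check_wcp(queues):
        """queues[i]: deque of (V, AV) pairs reported by the i-th non-checker.
        Prune any head that happened-before another head; if all queues remain
        non-empty, the heads are pairwise concurrent and wcp holds there."""
        changed = set(range(len(queues)))
        while changed:
            newchanged = set()
            for i in changed:
                for j in range(len(queues)):
                    if queues[i] and queues[j]:
                        if hb(queues[i][0][0], queues[j][0][0]):
                            newchanged.add(i)
                        if hb(queues[j][0][0], queues[i][0][0]):
                            newchanged.add(j)
            for i in newchanged:
                queues[i].popleft()
            changed = newchanged
        if all(queues):
            avs = [q[0][1] for q in queues]
            return [int(any(col)) for col in zip(*avs)]   # int_program_AV
        return None                                       # no consistent cut yet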
8.4.2 Detecting Adaptation-Transient Predicates

Consider a weak conjunctive predicate wcp that is to be checked for some intermediate program I during adaptation. Let I be represented by the adaptation vector I_AV. The goal is to detect wcp during the interval of the adaptation that corresponds to the intermediate program I.

When detecting adaptation-transient predicates, the non-checker processes have to check their local predicates only during the interval of the adaptation defined by the intermediate program in which the adaptation-transient predicate is to be checked. Specifically, each non-checker process identifies a set of local states that are potential states of the intermediate program I. Let the current value of the adaptation vector in state s of process p_i be AV_i, denoted s.AV. We first present the following lemma, used to identify potential states of an intermediate program locally at each process.

Lemma 7.1. For a state s of process p_i, s.AV[i] = 0 ⇒ (∀j : j ≠ i : AV_j[i] = 0).

Proof. From the definition of the adaptation vector in Section 8.3, adaptive action a_i occurs at process p_i. The lemma states that if process p_i has not performed its atomic adaptation, then the adaptation vectors at all other processes indicate the same. Note that the converse does not hold: if the adaptation vectors at all other processes indicate that p_i has not performed its atomic adaptation, it does not follow that p_i has not performed it. The proof is apparent from the implementation of adaptation vectors. □

We now consider the following two cases at each process p_i to identify the potential states of intermediate program I:

Case 1: I_AV[i] = 0. This is the case where intermediate program I occurs before the atomic adaptation at process p_i is executed. A state s is a potential state of the intermediate program I if s.AV ∨ I_AV = I_AV. For example, if I_AV = [1, 1, 0, ..., 0] and s.AV = [1, 0, 0, ..., 0], then s is a potential state of intermediate program I, whereas a state with s.AV = [1, 0, 1, ..., 0] is not.

Case 2: I_AV[i] = 1. This is the case where intermediate program I is reached after the atomic adaptation at process p_i is executed. From Lemma 7.1, it is clear that in this case checking the condition s.AV ∨ I_AV = I_AV is not sufficient. For example, if s.AV = [0, 0, ..., 0], then the condition holds, but the atomic adaptation at process p_i has not executed (because s.AV[i] = 0). Therefore, we also need to check that s.AV[i] = 1, i.e., that the atomic adaptation at process i has executed. In other words, we also need to check the condition s.AV[i] = I_AV[i].
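Combining the two cases gives a simple local test, sketched below in Python under hypothetical names: s_av and i_av are the adaptation vectors of the local state and of intermediate program I, and i is the index of this process's own atomic adaptation.

    # Local test: is state s (with adaptation vector s_av) a potential state of
    # intermediate program I (with adaptation vector i_av) at process p_i?
    def potential_state(s_av, i_av, i):
        covered = all((a | b) == b for a, b in zip(s_av, i_av))   # s.AV OR I_AV = I_AV
        if i_av[i] == 0:                     # Case 1: I precedes a_i
            return covered
        return covered and s_av[i] == 1     # Case 2: additionally s.AV[i] = I_AV[i]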
Based on the above discussion, we modify the algorithm of Figure 8.3 so that the non-checker process can detect adaptation-transient predicates. The modified non-checker process algorithm is shown in Figure 8.5. In the algorithm, plist maintains a list of entries consisting of a predicate, the corresponding intermediate program in which it needs to be checked, and the checker process that is checking that predicate.

We maintain one checker process algorithm for each adaptation-transient predicate that needs to be checked. The checker process algorithm is modified so that whenever it finds a consistent state (i.e., all queues are non-empty), it checks the adaptation vectors to determine whether that consistent state belongs to intermediate program I. The modified checker process algorithm to detect an adaptation-transient predicate is shown in Figure 8.6.

process p_i (non-checker)
var  V : array {initially, ∀j : i ≠ j : V[j] = 0; V[i] = 1}  /* vector clock */
     AV : array {initially, ∀j :: AV[j] = 0}  /* adaptation vector */
     firstflag : boolean {initially true}  /* first time the local predicate is true after any send event */
     plist (IP_AV, local_pred, cp) : array (array, boolean expression, process id)  /* local predicates to be tested */
[] replace adaptive action {a_i} with
     a_i; AV[i] := 1
[] replace send statement {send (m) to p_j} with
     send (m, V, AV) to p_j;
     V[i], firstflag := V[i] + 1, true
[] replace receive action {rcv (m) from p_j → stmts} with
     rcv (m, m.V, m.AV) from p_j → stmts;
     ∀j : V[j] := max(V[j], m.V[j]);
     ∀j : AV[j] := AV[j] ∨ m.AV[j]
[] add local predicate detection action
     for p in plist
        if p.IP_AV[i] = AV[i] ∧ (p.IP_AV ∨ AV) = p.IP_AV ∧ p.local_pred = true ∧ firstflag then
           firstflag := false;
           send (dbg, V, AV) to p.cp (checker process)
        fi

Figure 8.5: Algorithm for adaptation-transient predicates (non-checker process p_i).

process p (checker)
var  q_1, ..., q_n : queue of (V, AV)
     changed, newchanged : set of {1, 2, ..., n}
     IP_AV : array  /* adaptation vector of the intermediate program being checked */
[] rcv (elem) from p_k →
     /* elem.V and elem.AV denote the values of the vector clock and adaptation vector */
     insert(q_k, elem);
     if (head(q_k) = elem) then
        changed := {k};
        while (changed ≠ ∅) do
           newchanged := {};
           for i in changed ∧ j in {1, 2, ..., n}
              if (¬empty(q_i) ∧ ¬empty(q_j)) then
                 if head(q_i).V < head(q_j).V then
                    newchanged := newchanged ∪ {i};
                 fi
                 if head(q_j).V < head(q_i).V then
                    newchanged := newchanged ∪ {j};
                 fi
              fi
           changed := newchanged;
           for i in changed
              deletehead(q_i);
        od;  /* end while */
        if ∀i : ¬empty(q_i) then
           int_program_AV := [∨_{i=1..n} head(q_i).AV[1], ..., ∨_{i=1..n} head(q_i).AV[n]];
           if IP_AV = int_program_AV then found := true fi
        fi
     fi

Figure 8.6: Algorithm for adaptation-transient predicate (checker process).

8.5 Testing Only Atomic Adaptations

In Section 8.4, we showed how we can test predicates that have certain characteristics during adaptation. However, if the test run is large, the cost of testing increases. It is, therefore, desirable to reduce the cost of testing by performing partial testing on a small subset of states rather than testing the entire run of the adaptation. In this case, the states chosen should be such that testing predicates in those states still gives reasonable assurance about the correctness of the adaptation. With this motivation, we choose the states before and after each atomic adaptation for testing.

For the adaptation to be correct, each atomic adaptation should occur in some "safe state" of the system. Informally, each atomic adaptation should occur when the system is ready to deal with this change. Thus, instead of checking all the states of the intermediate programs, we can check all the possible states in which an atomic adaptation could have occurred (or could have led to). Moreover, since the set of possible states before and after atomic adaptations is expected to be much smaller than the set of all states, it may also be possible to check arbitrary predicates in those states.

For example, consider the adaptation lattice of Figure 4.2. In this case, when atomic adaptation a3 is executed, the predicate associated with node R1 should be true, and when a3 has completed execution, the predicate associated with node Q should be true.

We now discuss how we calculate, for each atomic adaptation, the set of states in which the atomic adaptation could have occurred. The approach for finding the set of states after each atomic adaptation is similar. We first define some notions that we borrow from Venkatesan and Dathan [82]. A spectrum spec_i in process p_i is a sequence of consecutive states of p_i. The first (respectively, last) event of spec_i is denoted first_i(spec_i) (respectively, last_i(spec_i)). A state s_i^f in spec_i is the first state in spec_i if ∀s : s ∈ spec_i : s_i^f → s.
Similarly, a state s_i^l in spec_i is the last state in spec_i if ∀s : s ∈ spec_i : s → s_i^l. The first consistent cut state of s_i^f at process p_k, denoted by first_k(s_i^f), is the earliest state s_k in p_k such that s_k and s_i^f can be in a consistent state. Similarly, the last consistent cut state of s_i^l at process p_k, denoted by last_k(s_i^l), is the latest state s_k in p_k such that s_k and s_i^l can be in a consistent state. For example, in Figure 8.1 of Section 8.1, if s_2 denotes the state of process p_2 immediately after some event e_2, then first_1(s_2) and last_1(s_2) are, respectively, the earliest and latest states of process p_1 that can belong to a consistent state containing s_2.

Now, let e_i = a_i be the atomic adaptation at process p_i, and let s_i^a be the state of process p_i just before the execution of atomic adaptation a_i. To identify the set of global states in which a_i could have occurred, we need to identify the local states at all processes that are consistent with state s_i^a. We first identify the spectrum of states in each process that are consistent with s_i^a. Let spec_j(a_i) denote the spectrum of states in process p_j, possibly empty, that are consistent with state s_i^a. We have

    spec_j(a_i) = [first_j(s_i^a), last_j(s_i^a)]

Note that spec_i(a_i) = s_i^a. We use the property of the vector clock to find first_j(s_i^a) = s_j^l and last_j(s_i^a) = s_j^m. If s_i^a.V[j] = 0, i.e., no state of p_j precedes s_i^a, then l = 0; otherwise, l is chosen such that

    s_j^l.V[j] = s_i^a.V[j]  ∧  s_j^(l−1).V[j] < s_i^a.V[j],

and m is chosen such that

    (s_j^m.V[i] ≤ s_i^a.V[i])  ∧  ((s_j^m = final_state_j) ∨ (s_j^(m+1).V[i] > s_i^a.V[i])).

We use final_state_j to denote the state in which process p_j terminates.

In this manner, we can find the spectra spec_1(a_i), ..., spec_n(a_i) at processes p_1, ..., p_n, respectively. Since the trace is typically small, it may be possible to construct all possible consistent states from these spectra. This allows us to check even arbitrary predicates in all possible states in which the atomic adaptation could have occurred. Furthermore, given all these spectra, we can use the algorithms from Venkatesan and Dathan [82] to test for different types of predicates using the intersection of these spectra. Moreover, if the predicates are of certain types, we can also use the algorithms discussed in Section 8.4 on the collected trace for each atomic adaptation.
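The following minimal Java sketch (a simplification under assumed representations, not the dissertation's code) computes, for one process p_j, the index bounds of spec_j(a_i) from recorded vector clocks; vj[x] is assumed to hold the vector clock of p_j's x-th state, and sa the vector clock of the state just before a_i at process p_i.

// Minimal sketch (hypothetical representation): bounds of spec_j(a_i).
public final class Spectrum {

    /** Index of first_j(s_i^a): earliest state of p_j consistent with s_i^a. */
    public static int firstIndex(int[][] vj, int[] sa, int j) {
        if (sa[j] == 0) return 0;          // no state of p_j precedes s_i^a
        int l = 0;
        while (vj[l][j] < sa[j]) l++;      // first state with V[j] = s_i^a.V[j]
        return l;
    }

    /** Index of last_j(s_i^a): latest state of p_j consistent with s_i^a. */
    public static int lastIndex(int[][] vj, int[] sa, int i) {
        int m = 0;
        // advance while the next state of p_j knows no event of p_i
        // beyond s_i^a (i.e., its V[i] does not exceed s_i^a.V[i])
        while (m + 1 < vj.length && vj[m + 1][i] <= sa[i]) m++;
        return m;                          // final state if the loop runs out
    }
}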
8.6 Chapter Summary

In this chapter, we discussed testing of adaptation in distributed systems using predicate detection. We identified two classes of predicates, namely adaptation-stable and adaptation-transient predicates, that occur during adaptation. We introduced the adaptation vector to identify intermediate program states when testing dynamic adaptation.

Furthermore, to reduce the cost of testing, we identified a subset of states on which to do limited testing of adaptation. Specifically, instead of checking all intermediate program states, we presented an approach to test the system before and after each atomic adaptation. We discussed an approach to calculate the spectra at each process that are consistent with the state before (or after) each atomic adaptation. We can then use the existing algorithms for predicate detection on this smaller trace.

Chapter 9

Component Family: Design of Adaptive Components

In this chapter, we discuss the component family design to support runtime adaptation. We first discuss, in Section 9.1, the issues that arise in developing compositional adaptation. Then, in Section 9.2, we present the abstract design of a component family. In Section 9.3, we discuss the concrete design of a component family. Next, in Sections 9.4 and 9.5, we illustrate the use of component families through two case studies. We discuss some questions related to component family design in Section 9.6. In Section 9.7, we discuss the related work and, finally, in Section 9.8, we summarize the advantages of component families.

9.1 Introduction

In compositional adaptation, which is the primary focus of this dissertation, some algorithmic or structural parts of the system are added, removed or replaced at runtime. Compositional adaptation enables an application to adopt new algorithms and strategies for addressing concerns that were not known at the time the original application was developed.

Typically, in the case of compositional adaptation, the separation of concerns principle [64, 86] is employed to separate the adaptation concern. Separation of concerns is a ubiquitous software engineering principle which states that, for a given problem, different kinds of concerns should be identified and separated in order to cope with complexity and achieve robustness, adaptability, maintainability and reusability. In other words, it states that software should be decomposed in such a way that different aspects of the problem are solved in well-separated modules or parts of the software. Consequently, adaptive applications are designed to separate the functionality that gets adapted from the rest of the application.

In object-oriented programming, separated concerns are modeled as objects and classes, while in structural programming these concerns are modeled as functions or procedures. We argue that these two programming paradigms are not adequate to model concerns that spread across multiple processes. For example, in the case of adaptive distributed systems, the functionality that gets adapted may involve changes to multiple processes across the system. To deal with this, we consider the component-based paradigm, where the basic units of concern are components [28, 87].

A component is defined as follows: "a component is an executable unit of code that provides physical black-box encapsulation of related services. Its services can only be accessed through a consistent, published interface that includes an interaction standard. A component must be capable of being connected to other components (through a communications interface) to form a larger group" [87]. Although objects or procedures form the underlying fabric of software solutions, it is components that provide the effective granularity to model adaptive systems. In other words, object-oriented programming or structural programming may be used to realize component-based systems. Thus, compositional adaptation can be considered as the addition, removal or replacement of components.

Existing approaches to building adaptive software have some notion of a component, which is similar to the one defined above. In many of the approaches, the term component refers to a replaceable class or object [88]. However, in our approach, a component can consist of several objects, and these objects may be spread across different processes of a system. Regardless of how a component is defined, adaptive software needs some mechanisms for examining and understanding the state of its components (introspection) and also some mechanisms to modify the components (intercession). There are three main issues that occur while developing introspection and intercession mechanisms for compositional adaptation:

• Reference update.
As shown in Figure 9.1, when a component is replaced (respectively, added or removed), the references pointing to the component need to be updated. Specifically, when an old component is exchanged for a new component, it is necessary to update all the references to the old component so that they refer to the new component. This is important to ensure that the system is not left in an inconsistent state after component replacement. A common solution to this problem is achieved through indirection, which allows decoupling of an application. The standard practice of programming to an interface [89] is useful in achieving this indirection. This allows an application to access a component while ensuring that the access methods are not coupled to the implementation of the component.

[Figure 9.1: Reference update during adaptation. (a) Incorrect/incomplete update. (b) Correct/complete update.]

• State transfer. When a component is replaced, the state of the old component needs to be transferred to the new component. This is necessary for performance as well as correctness reasons. It is not optimal to let the new component start from its initial state, as the new component may have to redo computations that the old component already performed. Moreover, in some cases it may be incorrect to have the new component start from an arbitrary initial state. If two components are to be interchanged at runtime, then mechanisms to extract the state from the old component and inject the equivalent state into the new component need to be considered.

• Synchronization. When a component is added, removed or replaced, it is important to control access to the component during the change. For example, once the state of the old component is extracted for transfer to the new component, the application should not modify the old component, as doing so would lead to unexpected results. It is necessary to have proper intra-process and inter-process synchronization during adaptation, as a lack of synchronization can lead to undesirable and incorrect behavior.

We now consider the three main limitations concerning most of the existing approaches for building adaptive software:

1. Tight coupling. Most adaptive software systems have an entity that is normally categorized as a framework or a composer, whose goal is to perform the adaptation-related steps. In other words, it primarily deals with the above three issues, namely, reference update, state transfer and synchronization. Most approaches to adaptive software are tightly coupled in terms of the components that are adapted and the framework that performs adaptation. Specifically, the frameworks are either not reusable with other components and applications, or the role of the framework is not clearly defined. Many times the introspection and intercession related steps are tightly coupled with the component functionality. These kinds of coupling lead to the following four problems: (i) verification of adaptation becomes difficult, (ii) independent development of components is restricted, (iii) reuse of components and of the framework is restricted, and (iv) modifying the introspection and intercession mechanisms is difficult.

2. Large number of adaptations. There are several components available for a given functional requirement. For example, to provide reliable communication one can use a proactive component based on forward error correction or a reactive component based on acknowledgments.
The main motivation for adaptation is to provide an appropriate component for a given functionality. Now, consider the scenario where there are n different components that provide similar interfaces and functionality, and the choice of the component depends on application requirements and environment conditions. In this case, there are n(n − 1) possible adaptations among these components. Considering the fact that each adaptation needs to address the introspection and intercession issues and also leads to the four problems related to tight coupling, identifying all these adaptations is a difficult and infeasible task. Moreover, if the developer of a component wants to expose the details of that component to only a subset of the developers of other components, then adaptation between that component and the remaining components may not be possible.

3. Unanticipated components. While developing adaptations among existing components, it is difficult to anticipate new components that would be developed later. Furthermore, while developing a new component, one does not know about all the existing components that can potentially be replaced by the new component. Moreover, in the case where one does not know the details of some components, it is impossible to provide assurance for adaptations to and from those components. As there is a growing need to build systems that are autonomic in nature, it is desirable to build applications that can support the use of unanticipated components with little or no human intervention while still being able to guarantee assurance of adaptation.

To overcome the above limitations, we introduce the notion of a component family: a systematic and extensible repository of components that provide similar functionality. There are two main aspects to a component family: (i) the overall design of the library of components, and (ii) the design of the individual components in the library. The overall design of a component family is based on the ideas of program families as proposed by Parnas [90] for the study of reusable software. We first list the design goals for a component family and then give a formal definition of a component family:

1. Adaptation ready. The components should be designed with adaptation in mind. In other words, components should provide the additional methods required during adaptation. For example, the components discussed in [3, 91] provide a checkState function to check the state of the component for safety, a block function to temporarily block the component operations, etc.

2. Separation of adaptation logic. The adaptation logic should be separate from the core functionality of the components, thereby simplifying the specification and verification of adaptation. The application developer is primarily concerned with using component functionality and should not have to deal with the details of adaptation among components.

3. Replaceable adaptation logic. Adaptation from one component to another can be done in multiple ways. The design should support easy replacement of the adaptation logic. For example, as discussed in Chapter 6, replacing one component by another can be done either using quiescence adaptation or mixed-mode adaptation. Clearly, the adaptation logic should be modeled separately and should be easy to replace.

4. Language and platform neutral. The design should be independent of any particular language or platform so that a component family can be implemented in different languages on a variety of platforms.

5. Application independent.
The design should be applicable to components from different domains.

6. Extensible. The design should be extensible so that new components that are developed later can be easily integrated into the family.

Considering the above goals, we first discuss the abstract design of a component family and then describe the concrete representation of the family.

9.2 Abstract Component Family

Definition. A component family is a strongly connected directed graph, say (V, E), where (i) each vertex in V denotes a component, (ii) all components have similar interfaces and, syntactically, one component in the family can be replaced by another, and (iii) each arc (v1, v2) ∈ E denotes that there exists an adaptation from v1 to v2.

We illustrate a component family in Figure 9.2. Consider the subgraph consisting of vertices A, B, C, and D, which represent the four components in the family. Direct adaptations are defined for each arc in this graph. The arc from A to D denotes that there exists an adaptation from A to D. In other words, arc (A, D) implies that there is a corresponding adaptation lattice (as discussed in Chapter 3) that defines the adaptation from A to D. We say that arc (A, D) is a verified arc if a verified adaptation from A to D exists; otherwise stated, there exists a transitional-invariant or transitional-faultspan lattice corresponding to the adaptation from A to D. Furthermore, as shown in Figure 9.2, there exists a path from component A to component B, which implies that there exists a sequence of adaptations through which A can be replaced by B. Such a path exists from A to D, from D to C, and from C to B.

[Figure 9.2: An example of a component family.]
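To illustrate the definition, the following is a minimal Java sketch (a hypothetical API, not part of the dissertation's implementation) of a component family as a directed graph, with a breadth-first search that recovers a sequence of adaptations such as A → D → C → B.

import java.util.*;

// Minimal sketch: a component family as a directed graph whose vertices
// are component names and whose arcs are (verified) adaptations.
public final class ComponentFamilyGraph {
    private final Map<String, List<String>> adaptations = new HashMap<>();

    // Record that an adaptation from 'from' to 'to' exists.
    public void addAdaptation(String from, String to) {
        adaptations.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Breadth-first search: a path is a sequence of adaptations through
    // which 'from' can be replaced by 'to'.
    public List<String> adaptationSequence(String from, String to) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>(List.of(from));
        parent.put(from, null);
        while (!queue.isEmpty()) {
            String v = queue.poll();
            if (v.equals(to)) {
                LinkedList<String> path = new LinkedList<>();
                for (String cur = to; cur != null; cur = parent.get(cur))
                    path.addFirst(cur);
                return path;
            }
            for (String w : adaptations.getOrDefault(v, List.of()))
                if (!parent.containsKey(w)) { parent.put(w, v); queue.add(w); }
        }
        return List.of(); // not reachable: the family is not strongly connected
    }
}

For the family of Figure 9.2, adding the arcs (A, D), (D, C) and (C, B) and querying adaptationSequence("A", "B") would return the path [A, D, C, B].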
9.2.1 Abstract Component Structure

As discussed earlier in this section, in order to keep the graph of a component family strongly connected, we need at least two adaptations associated with each component, such that one adaptation is to the component and one adaptation is from the component. A component has different adaptation-related actions corresponding to the different adaptations that it is involved in. However, the part of the component that performs the actual functionality remains the same irrespective of the adaptations.

Given these adaptation requirements, each component in a component family is designed to consist of two parts: (i) a functional part, and (ii) an adapt-active part. The adapt-active part is involved in the state-transfer and synchronization related actions that are needed only during adaptation. In other words, the functions of the adapt-active part are invoked only during adaptation.

Each component in a family consists of exactly one functional part. A component may have zero or more adapt-active parts. An adapt-active part of a component corresponds to a particular adaptation the component is involved in. A component may not have an adapt-active part (in other words, it may have an empty adapt-active part) if the component does not perform any introspection or intercession related actions during adaptation. Also, a component may have an adapt-active part that is shared among multiple adaptations that it is associated with. From an implementation perspective, depending on the adaptation that the component is involved in, the appropriate adapt-active part corresponding to that adaptation should be loaded before adaptation. This can be triggered internally (by some monitoring module) or externally (by a user).

[Figure 9.3: Structure of a component in a family. Each component consists of a functional part and adapt-active parts (e.g., Aad and Ada for A; Bbd and Bcb for B; Cdc and Ccb for C; Dad, Dda, Dbd and Ddc for D).]

For example, consider the components A, B, C, and D in a component family as shown in Figure 9.3. The component B has two adapt-active parts, Bbd and Bcb. The part Bbd is used during adaptation from B to D, and the part Bcb is used during adaptation from C to B.

The abstract design of a component family, as discussed in this section, is useful when developing high-level abstract models of the components and adaptation. However, there are several concerns that need to be addressed from an implementation perspective while developing component families. In the next section, we discuss the concrete representation of a component family.

9.3 Concrete Component Family

The various architectural parts of a component family and the high-level relationships among them are shown in Figure 9.4.

[Figure 9.4: Architecture of a component family. DecisionMaker, AdaptationController, AdaptationHandlers, ComponentManager, StateMappingHandlers and the components; AdaptationController instantiates AdaptationHandlers, which inspect and initialize the components, while ComponentManager delegates to the components.]

• Components. At the core of the component family is the set of components that provide the actual functionality required by the application/user. In the absence of a component family, the application developer uses the components directly.

• Component Manager. To provide adaptation transparency to the application using the components, we decouple the application from the components by introducing an entity between the application and the components. We call this entity ComponentManager. ComponentManager accepts requests from the application and delegates each request to the currently active component. It also controls the exchange of information between DecisionMaker and AdaptationController.

• Decision Maker. DecisionMaker decides what is the appropriate component to be used based on the environment conditions. An entity external to the component family may also decide on when and what to adapt. For example, a user requirement may define the adaptation. In this case, the DecisionMaker that is part of the family may be either disabled or made to coordinate with the external decision maker to reach a decision. The details of DecisionMaker are beyond the scope of this work. The reader is referred to [92-95] for more information on decision makers.

• Adaptation Controller. Once ComponentManager has learnt of the decision from DecisionMaker and decided to adapt, it passes that information to AdaptationController. AdaptationController selects an appropriate AdaptationHandler to execute the adaptation.

• Adaptation Handlers. Each AdaptationHandler implements the logic for an adaptation. It executes tasks related to reference updates, synchronization, and state transfer. Specifically, it deals with any dependency and synchronization related issues while adding, removing, or replacing the components. It also instantiates an appropriate StateMappingHandler to perform any required state transfer during adaptation.

• StateMapping Handlers. StateMappingHandlers are responsible for mapping the state of the old component to the new component during component replacement. There is a separate StateMappingHandler for each adaptation.
Various mappings are discussed in Chapter 3, and StateMappingHandlers implement those mappings.

[Figure 9.5: Interfaces of a component family. ComponentManager implements IComponent, ICMAdaptReady, IDecisionMaker and IAdaptationController; the components implement IComponent and IAdaptReady; the application uses the IComponent interface of ComponentManager.]

9.3.1 Component Family Interfaces

Figure 9.5 shows how the different parts of a component family are related to ComponentManager. It shows that ComponentManager implements four interfaces, namely, IComponent, ICMAdaptReady, IDecisionMaker and IAdaptationController. All the functional components in the family provide two interfaces: IComponent and IAdaptReady. The ICMAdaptReady interface extends the IAdaptReady interface. The IComponent interface specifies the core functionality provided by the component. The IAdaptReady interface specifies the adaptation-related functionality provided by the component; it maps to the adapt-active parts discussed earlier in the abstract design of a component family. The application is external to the component family, and it uses the IComponent interface provided by ComponentManager.

ComponentManager instantiates the component that will provide the functionality requested by the application. At any time, ComponentManager maintains exactly one active instance of some component. It uses the delegation pattern to delegate any service request from the application to the active component.

ComponentManager also provides the IDecisionMaker interface for use by the decision maker. The choice of how the decision maker is implemented affects the design of IDecisionMaker. The following are the two main issues that need to be considered while designing IDecisionMaker: (i) whether the decision maker will be a part of the family or external to the family, and (ii) in the case of a distributed system, which process will host the decision maker. In the case where the decision maker is part of the family, ComponentManager creates and initializes an instance of DecisionMaker. DecisionMaker uses the IDecisionMaker interface to collect information from ComponentManager that is required to make a decision about adaptation and to inform ComponentManager of the decision.

ComponentManager also provides the IAdaptationController interface, which is used by AdaptationController. ComponentManager creates and initializes an instance of AdaptationController and informs AdaptationController of the decision made by DecisionMaker. AdaptationController uses IAdaptationController to get details about the current active component and any other information that it requires.

Once AdaptationController has learnt of the decision, it selects an appropriate AdaptationHandler that will perform the adaptation. As shown in Figure 9.6, each AdaptationHandler extends AbstractAdaptationHandler, which provides the IAdaptationHandler interface that is used by AdaptationController. Each AdaptationHandler also instantiates an appropriate StateMappingHandler, which performs any required state transfer to map the state of the old component to the new component.

AdaptationHandlers and StateMappingHandlers use the ICMAdaptReady interface of ComponentManager to communicate with the components in order to coordinate the adaptation and state transfer.
ComponentManager delegates the requests received from AdaptationHandlers and StateMappingHandlers to the active component.

[Figure 9.6: Adaptation handlers. AbstractAdaptationHandler provides adapt() and is extended by AdaptationHandler_1 through AdaptationHandler_N, each implementing the IAdaptationHandler interface.]

We now give a brief overview of the delegation pattern used by ComponentManager.

9.3.1.1 Delegation Pattern

The delegation pattern is a technique where a component outwardly expresses some behavior but delegates the actual responsibility of implementing that behavior to another component. Delegation is a way of extending and reusing a class by writing another class with additional functionality that uses instances of the original class to provide the original functionality. The chain of responsibility and observer patterns [89] use the delegation pattern.

Sample code in the Java programming language that illustrates the delegation pattern is shown in Figure 9.7. In this example, the ComponentManager class has instances of two components, ComponentX and ComponentY. ComponentManager, ComponentX, and ComponentY all implement the IComponent interface. ComponentManager maintains an active instance of one component (which can be changed) and can delegate the call to the foo function to either ComponentX or ComponentY.

// Common interface
public interface IComponent {
    public void foo();
}

// Component X
public class ComponentX implements IComponent {
    public void foo() {
        // Component X implementing foo
    }
}

// Component Y
public class ComponentY implements IComponent {
    public void foo() {
        // Component Y implementing foo
    }
}

// Component Manager
public class ComponentManager implements IComponent {
    IComponent instance = new ComponentX();

    // For delegation to X
    public void toX() { instance = new ComponentX(); }

    // For delegation to Y
    public void toY() { instance = new ComponentY(); }

    // foo is delegated to the active instance
    public void foo() { instance.foo(); }
}

Figure 9.7: An example of the delegation pattern.
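As a usage sketch (hypothetical client code, using the classes of Figure 9.7), switching the active instance changes where subsequent calls land without the caller being aware of it:

// Hypothetical client of the delegation example in Figure 9.7.
public class Demo {
    public static void main(String[] args) {
        ComponentManager manager = new ComponentManager();
        manager.foo();   // delegated to ComponentX (the initial instance)
        manager.toY();   // switch the active instance
        manager.foo();   // now delegated to ComponentY
    }
}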
9.3.2 Component Family Instantiation

In the previous subsection, we discussed the different interfaces of a component family. In this section, we discuss the instantiation of a component family. We consider two cases depending on the type of components represented by the component family: (i) local components, and (ii) distributed components.

We use the term local components to refer to components that are installed at a single process, and distributed components to denote components installed across multiple processes. Examples of adaptations involving local components include: (i) replacing a local monitoring component that monitors some environmental conditions and takes appropriate actions, and (ii) replacing a logging component that stores a log of events at a process. In this case, the application of a component family is relatively easy. The issues related to inter-process synchronization do not occur; as a result, the AdaptationHandlers are also less complex. We focus our discussion on distributed components, as they offer more challenges, and the results for distributed components can be trivially applied to local components.

We first identify the challenges during adaptation involving distributed components that any implementation of a component family needs to address, and then we discuss our implementation.

• Atomicity. When a distributed component is replaced in an application, we need to ensure atomicity. In other words, all local fractions of the distributed component should be replaced across all (or selected) processes of the application, or none should be replaced.

• Minimal blocking. Another challenge is that during adaptation involving a distributed component, the application should not be blocked. While it is desirable that the adaptation of a distributed component be entirely non-blocking, it is not always possible to do so, due to dependencies among component fractions. We, therefore, require that the blocking introduced during the adaptation be minimal.

• Synchronization. To avoid interference between concurrent versions of distributed components, some global synchronization is often required. For example, if a process that has added a component fraction interacts with another that has not, then the results can be unpredictable.

• Transparency and noninterference. The execution of an application that is using a distributed component should not be affected while the adaptation is underway.

The code that addresses these challenges is implemented as AdaptationHandlers in the component family. The design of the component family separates out the implementation of AdaptationHandlers, thereby allowing different ways to deal with these challenges. Several implementations of adaptation logic, written by different people, can be part of the same family, and an appropriate implementation can be chosen for execution based on environment conditions and application requirements. In the next section, we discuss one particular implementation.

Remark. For the sake of brevity, we will use the term component to mean distributed component in the rest of this chapter.

9.3.3 Component Family Implementation

In this subsection, we discuss an implementation of AdaptationHandler, which specifies the adaptation logic to replace one component by another. The ComponentManager at one of the processes acts as the initiator. The DecisionMaker at the initiator process may be the only active DecisionMaker in the system, or, if there are DecisionMakers at other processes, then the DecisionMaker at the initiator process may coordinate with the DecisionMakers at the other processes to decide on adaptation. Regardless, once the ComponentManager at the initiator process learns of the decision and decides to adapt, it informs AdaptationController. The AdaptationController instantiates an appropriate AdaptationHandler that will execute the adaptation. The AdaptationHandlers at the different processes coordinate to execute the adaptation. We use a variation of the distributed reset protocol [96, 97] to achieve inter-process synchronization among component fractions. We first discuss the distributed reset protocol in brief and then give details of how it is used to achieve synchronization during adaptation.

9.3.3.1 Distributed Reset Protocol

The reset subsystem in [96, 97] can be embedded in an arbitrary distributed system to allow processes to reset the system to a given global state. In the model described in [96, 97], each process consists of an application module and a reset module. The application module at any process may begin the reset operation. The function of a reset module is to (1) reset the state of the application to a state that is reachable from the given global state, and (2) inform the application module when the reset operation is complete.

Each reset operation satisfies the following two properties: (1) every reset operation is non-premature, i.e., if the reset operation completes, then all processes have been reset and the program state is reachable from the given global state; and (2) every reset operation eventually completes, i.e., if an application module at a process initiates a reset operation, eventually the reset module at that process informs the application module that the reset operation is complete.
The reset solutions in [96, 97] allow the program computation to proceed concurrently with the reset, to any extent that does not interfere with the correctness of the reset. To simplify the reset operation, the algorithms in [96, 97] maintain a rooted spanning tree of all non-failed processes. This spanning tree is used to perform a diffusing computation [76] in which each process resets its state. The diffusing computation begins at the root of the spanning tree. The root of the tree resets the state of its local application module and initiates a reset wave that propagates along the tree towards the leaves; whenever the reset wave reaches a process, the process resets the state of its local application module and propagates the reset wave to its children. After the reset wave reaches a leaf, it is reflected as a completion wave towards the root. A process propagates the completion wave to its parent when it receives the completion wave from all its children. The reset is complete when the root receives the completion wave from all its children.

We now discuss how the distributed reset protocol is used to replace a component across a distributed application.

9.3.3.2 Using the Distributed Reset Protocol for Adaptation

The AdaptationHandlers at all the processes communicate using a variation of the distributed reset protocol. The reset protocol consists of two waves: a reset-initialization wave and a replacement wave. The replacement wave consists of two sub-waves, namely, a reset-transition wave and a reset-completion wave. We first present an outline describing the reset process and then explain each of the reset waves in detail.

The AdaptationHandler at the initiator process initiates the reset by sending the reset-initialization wave. In the reset-initialization wave, all processes change to the transit state and initialize their fraction of the new component. Thus, in the transit state, a process has initialized the new component fraction, although it is still using the old component fraction. After all processes have set themselves into the transit state, the reset-initialization wave completes successfully. In case any process does not set itself into the transit state during the reset-initialization wave, the reset-initialization wave completes unsuccessfully. The option of unsuccessful completion is provided to deal with the case where the processes need to obtain their component fraction remotely and fail to do so, or where sufficient resources are not available at a process to start the new component. If the reset-initialization wave completes unsuccessfully, the component replacement is abandoned.

Upon successful completion of the reset-initialization wave, the AdaptationHandler at the initiator starts the replacement wave. The replacement wave begins with the reset-transition wave from the initiator (root) towards the leaves. The AdaptationHandler at each process receiving the reset-transition wave removes the old component fraction and adds the new component fraction.
It uses services provided by the IAdaptReady interface of the component fraction to ensure the safety (correctness) of such a replacement. The IAdaptReady interface of each component fraction provides a checkState function to determine the state of the component fraction. After a leaf process has completed the replacement of its component fraction, it sends the reset-completion wave to its parent. Further, if a non-leaf process has completed the replacement of its component fraction and it has received the reset-completion wave from all of its children, it propagates the reset-completion wave to its parent. The reset-completion wave eventually reaches the initiator. We allow another reset to start once the first reset has completed (successfully or unsuccessfully). We now explain the reset waves in detail.

Reset-initialization wave. The AdaptationHandler at the initiator initializes the reset by sending the reset-initialization wave to all of its neighbors. The AdaptationHandler uses the adaptation initialization protocol to communicate with the other reset modules. The adaptation initialization protocol communicates information such as the name of the component, the location of the server where components are available, etc. The AdaptationHandler at each process that receives the reset-initialization wave performs the following tasks:

1. It sets its parent to the first process from which it received the reset-initialization wave.

2. It propagates the reset-initialization wave to all of its neighbors except its parent.

3. If a process receives the reset-initialization wave again, it informs the sender of the identity of its parent. This information is used to form a tree.

4. If the process that receives the reset-initialization wave is a leaf (i.e., no neighbor process has set its parent to this process), it initializes the new component and sets itself into the transit state, where it is still using the old component while waiting to use the new component. If the process fails to initialize the new component, it sets itself into the error state. The process then communicates its state (transit or error) to its parent.

5. When a process has received the transit state message from all of its children, it sets itself into the transit state by initializing the new component. If it receives the error state information from any of its children, or if it fails to initialize the new component fraction, it sends the error state message to its parent. Eventually, the root process receives the transit state or the error state information from its children. If it receives the error state information from any of its children, it can restart the reset-initialization wave or abandon the component replacement, based on the threshold value set for the number of reset-initialization waves that can be initiated. If the component replacement is abandoned, the other processes are informed about it so that they can return to the normal state. If the root process receives the transit state information from all of its children, it initializes the new component and sets itself into the transit state.

Reset-transition wave. When all processes are in the transit state, i.e., at the successful completion of the reset-initialization wave, the AdaptationHandler at the initiator starts the reset-transition wave. The AdaptationHandler at each process that receives the reset-transition wave performs the following tasks:

1. It propagates this wave to all of its children.
2. It invokes the checkState function of the component, which returns one of three values: safetoremove, safetoblock or unsafetoremove.

(a) If the function returns safetoremove, the AdaptationHandler requests the ComponentManager to remove the old component fraction and activate the new component. It then sets itself into the normal state.

(b) If the function returns safetoblock, the AdaptationHandler requests the ComponentManager to block the component fraction. It periodically calls checkState until it returns safetoremove. After a component fraction receives information about other component fractions being removed or other application processes being blocked, the checkState function at the blocking process eventually returns safetoremove. When that occurs, we follow case 2a.

(c) If the function returns unsafetoremove, it is periodically invoked until it returns safetoblock or safetoremove, in which case we follow case 2b or 2a, respectively.

During the transition phase, the AdaptationHandler at each process will normally invoke a sequence (zero or more) of operations provided by the ICMAdaptReady interface. These operations are invoked before or after each invocation of checkState. The details of which operations are invoked depend on the adaptation logic and are specific to the application and the component at hand. Typically, these operations are related to state transfer and to ensuring that the checkState function eventually returns safetoremove.

Reset-completion wave. The transition wave is reflected towards the initiator (root) as the reset-completion wave. A leaf process sends the reset-completion wave to its parent after it removes the old component fraction and starts using the new component fraction. Any non-leaf process that completes the component fraction replacement and receives the reset-completion wave from all of its children sends the reset-completion wave to its parent. Once the initiator has replaced its component fraction and received the reset-completion wave from all of its children, the component replacement is complete.
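The following minimal Java sketch (hypothetical names and interfaces; the concrete handler classes are described in Section 9.4) illustrates the reset-transition step at one process, driven by the three checkState outcomes discussed above.

// Minimal sketch: one process's handling of the reset-transition wave,
// polling checkState as in cases 2a-2c above. All names are hypothetical.
enum FractionState { SAFETOREMOVE, SAFETOBLOCK, UNSAFETOREMOVE }

interface AdaptReady {
    FractionState checkState();
    void block();   // temporarily block the old fraction's operations
}

interface ComponentSwitch {
    void removeOldAndActivateNew();  // replace the fraction, update references
}

final class TransitionWave {
    // Poll checkState until the old fraction can be removed safely.
    static void onTransitionWave(AdaptReady fraction, ComponentSwitch mgr)
            throws InterruptedException {
        boolean blocked = false;
        while (true) {
            switch (fraction.checkState()) {
                case SAFETOREMOVE:
                    mgr.removeOldAndActivateNew();   // case 2a
                    return;                          // then report completion to parent
                case SAFETOBLOCK:
                    if (!blocked) { fraction.block(); blocked = true; }  // case 2b
                    break;
                case UNSAFETOREMOVE:
                    break;                           // case 2c: keep polling
            }
            Thread.sleep(50);  // arbitrary polling interval for the sketch
        }
    }
}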
9.3.3.3 Customizing the Reset Protocol

In the above discussion, we laid out a general overview of how the reset protocol is used in adaptation. However, depending on the application context, each implementation of AdaptationHandler customizes the reset protocol. For example, depending on the components and the application, the adaptation logic to replace a component may not require both reset waves. In the case of the mixed-mode adaptation discussed in Chapter 6, we need only one reset wave. Moreover, the operations invoked at each process during the reset waves will normally vary depending on the components at hand. Specifically, as discussed above, during the transition wave each process will invoke a sequence of operations of the ICMAdaptReady interface based on the adaptation logic.

9.3.3.4 Fault-Tolerant Reset Protocol

By treating the reset protocol discussed earlier as an intolerant application and using the fault-tolerance components from [96, 97], we can make the reset protocol fault-tolerant. If we were to add the fault-tolerance component from [96], the resulting protocol would ensure that stabilizing fault-tolerance [59] is provided for faults including process/channel failures/repairs and transients. Thus, even if these faults occur, eventually the application will recover to a state from where subsequent component replacements will be correct. If we were to add the fault-tolerance component from [97], then, in addition to the stabilizing fault-tolerance to these faults, the resulting protocol would provide masking fault-tolerance to process/channel failures/repairs. Thus, if only process/channel failures/repairs occur, then the component replacement will always be correct. Moreover, if more general faults such as transients occur, then the protocol will recover to a state from where subsequent component replacements will be correct.

9.4 Case Study: Leader Election Component Family

In this section, we discuss the case study of the leader election component family, which we denote by LEFamily. LEFamily consists of components that implement a leader election protocol. In Chapter 6, we discussed the abstract models of two leader election protocols, namely ldrId and ldrVal, and the mixed-mode adaptation from ldrId to ldrVal. In this section, we discuss the details of the different interfaces and the implementation of the different parts of LEFamily.

[Figure 9.8: Interfaces of the leader election component family.]

9.4.1 Interfaces of the Family

The interfaces of LEFamily are shown in Figure 9.8. We now discuss the functions provided by each interface (a minimal Java sketch of the two component-side interfaces follows the list):

• IComponent. This interface provides the functions supported by the leader election components.

  - initialize is used by the application to initialize the leader election component before using it. The initialization parameters that a component needs from an application include the host address and the neighbor list.

  - getLeader returns the address of the leader. In our implementation, getLeader blocks if leader election is underway. getLeader can also have a non-blocking implementation that returns the address of the last known leader process.

  - startElection is used by the application to start the election.

  - isPerformingElection is used by the application to check if the protocol is performing an election.

• IAdaptReady. This interface provides the functions that correspond to the adapt-active parts of the component. They include the functions that are used by AdaptationHandlers to perform adaptation.

  - adaptInitialize is used to initialize the component during adaptation. It uses StateMappingObject to read the state of the previous component.

  - checkState returns one of three values: safetoremove, safetoblock or unsafetoremove.

  - block blocks the component from starting a new election.

  - unblock removes the block so that the component can start a new election.

  - remove releases any resources and transfers the current state of the component to StateMappingObject.

• IAdaptReadyComponent. This interface extends the IComponent and IAdaptReady interfaces. Both leader election components in LEFamily implement this interface.

• ICMAdaptReady. This interface extends IAdaptReady and provides additional functions that are used by AdaptationHandlers to perform adaptation. It includes the activateComponent function, which activates the new component (so that calls to ComponentManager are now delegated to the new component).

• IAdaptationController. This interface provides getCurrentComponent, which is used by AdaptationController to get the current active component.

• IDecisionMaker. This interface provides getCurrentComponent, which is used by DecisionMaker to get the current active component, and notifyDecision, which informs ComponentManager of the decision. ComponentManager provides this interface only at the initiator node (a designated node that will initiate adaptation), as DecisionMaker is only available at the initiator node.

• IComponentManager. This interface extends the IComponent, ICMAdaptReady, IAdaptationController and IDecisionMaker interfaces. The IDecisionMaker interface is extended by IComponentManager only at the initiator node.
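As referenced above, here is a minimal Java sketch of the two component-side interfaces, transcribed from the function lists; the parameter and return types are assumptions made for illustration, and the types are shown together for brevity.

// Sketch of the component-side interfaces of LEFamily (assumed signatures).
final class StateMappingObject { /* opaque holder for transferred state */ }

interface IComponent {
    void initialize(String hostAddress, java.util.List<String> neighbors);
    String getLeader();              // blocks if an election is underway
    void startElection();
    boolean isPerformingElection();
}

interface IAdaptReady {
    enum State { SAFETOREMOVE, SAFETOBLOCK, UNSAFETOREMOVE }

    void adaptInitialize(StateMappingObject state); // read the old component's state
    State checkState();
    void block();                    // prevent the component from starting a new election
    void unblock();
    void remove(StateMappingObject state);          // release resources, save state
}

// Both leader election components implement this combined interface.
interface IAdaptReadyComponent extends IComponent, IAdaptReady {}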
[Figure 9.9: Classes of the leader election component family. LEComponentManager implements IComponentManager and maintains DecisionMaker, AdaptationController and the active IAdaptReadyComponent; QuiescenceAdaptationHandler and MixedModeAdaptationHandler extend AbstractAdaptationHandler and use QuiescenceSMHandler and MixedModeSMHandler, respectively; LeaderElectionId and LeaderElectionVal implement IAdaptReadyComponent and share StateMappingObject.]

9.4.2 Classes of the Family

The class diagram of LEFamily is shown in Figure 9.9. It consists of the following classes:

• LEComponentManager. This class implements the IComponentManager interface. At the initiator node, it creates and maintains an instance of DecisionMaker and informs DecisionMaker once it starts using a new component. It also maintains an instance of AdaptationController. At the initiator node, it informs AdaptationController of the new component that needs to be installed. At the non-initiator nodes, it creates an instance of AdaptationController, which listens on the adaptation port. The initialize function of the LEComponentManager class is used to create an instance of LeaderElectionId or LeaderElectionVal. It delegates the calls received on IComponent and IAdaptReady to the active instance of the component.

• LeaderElectionId. This class implements the leader election protocol based on process id. The abstract version of the protocol is discussed in Chapter 6.

• LeaderElectionVal. This class implements the leader election protocol based on process value. The abstract version of the protocol is discussed in Chapter 6.

• DecisionMaker. This class is installed only at the initiator node, and it notifies LEComponentManager of the decision.
In our current implementation, DecisionMaker gets the decision (on when and what to adapt) directly from the user and passes that decision to LEComponentManager.

• AdaptationController. This class uses the strategy pattern [89] to instantiate the class that implements the adaptation logic. It has two implementations:

  - Initiator. At the initiator node, it gets its input from ComponentManager. Upon receiving the input, it initializes the appropriate AdaptationHandler (QuiescenceAdaptationHandler or MixedModeAdaptationHandler). In our current implementation, we let the user choose the type of adaptation. Another possibility is to let AdaptationController itself decide the type of adaptation based on some rules.

  - Non-Initiator. At the non-initiator nodes, AdaptationController listens on the adaptation port for requests from the other nodes. Based on the initialization parameters received during the reset-initialization wave, it instantiates an appropriate AdaptationHandler.

  The AdaptationController can be extended to check for the availability of the new component, to retrieve the new component from the server, and to verify that the new component satisfies the contract of the family. In our current implementation, we assume that the new component is available and is verified.

• AbstractAdaptationHandler. In our current implementation, this class provides the abstract adapt function, whose behavior is defined in the derived classes. This class can be extended to provide behaviors common to all AdaptationHandlers.

• QuiescenceAdaptationHandler. This class is derived from AbstractAdaptationHandler, and it implements the quiescence adaptation between LeaderElectionId and LeaderElectionVal. At the initiator node, it acts as the root of the reset protocol, and at the non-initiator nodes it acts as a participant in the reset protocol.

• MixedModeAdaptationHandler. This class is similar to QuiescenceAdaptationHandler, except that it implements mixed-mode adaptation.

• QuiescenceSMHandler. This class is used by QuiescenceAdaptationHandler to do the state mapping from LeaderElectionId to LeaderElectionVal (and vice versa) during adaptation. It gets a reference to LEComponentManager from QuiescenceAdaptationHandler, and it also maintains StateMappingObject. It stores the state of the old component in StateMappingObject and uses it to initialize the state of the new component.

• MixedModeSMHandler. This class is similar to QuiescenceSMHandler; however, it implements the mapping required by mixed-mode adaptation.

• StateMappingObject. This class stores the state of the components during adaptation. A reference to this object is passed to the old component, which stores its current state in the object. The StateMappingHandlers apply an appropriate state mapping on this object to map the state of the old component to the new component, and they then pass the reference to this object to the new component during its initialization.

9.4.3 Performance Results

When comparing the performance of adaptive systems with non-adaptive systems, it is reasonable to assume that adaptive systems will have some performance overhead. When the system is not adapting, the adaptive system is functionally equivalent to the corresponding non-adaptive system. Therefore, there should be little or no overhead from the mechanisms that make non-adaptive systems adaptive.

[Figure 9.10: Component family performance. Average leader election time at each node, with and without the component family.]
We now present experimental results to show that the component family design does not impose any significant overhead on the use of components. The chart in Figure 9.10 shows the average time of leader election (using the ldrId component) at each node for two cases: (i) the application using the leader election protocol directly, and (ii) the application using the component family. The time of leader election is calculated from the instant when the application makes a request for leader election. In the case of the component family, the application's request for leader election is delegated to the active component (in this case, ldrId). We observe that the time for leader election in both cases is almost the same. At three nodes (node 1, node 2 and node 4), the time for leader election is less when the component is used directly, whereas at the other two nodes (node 3 and node 5) it is slightly more. We conclude that the overhead induced by the component family is negligible. In particular, any overhead due to delegation is within the variance of the time for leader election.

9.5 Case Study: Reliable Communication Component Family

In this section, we discuss the component family consisting of components that provide reliable communication. We describe the abstract version of the component family in this section; the details of the concrete version (interfaces and classes) are similar to those of the case study of Section 9.4. There are various components available for providing reliable communication. We consider three components in this family (cf. Figure 9.11), namely, (i) the forward error correction (fec) component, (ii) the acknowledgment (ack) component, and (iii) the forward error correction with acknowledgment (fecAck) component. Components fec and ack are similar to the proactive component discussed in Chapter 4 and the reactive component discussed in Chapter 5, respectively. For simplicity, we discuss only one sender and one receiver; however, the case study can easily be extended to multiple senders and multiple receivers.

[Figure 9.11: Reliable communication component family. The three components fec, ack and fecAck, with the adaptations among them shown as arcs.]

9.5.1 Components of the Family

We now discuss the abstract models of the components and the adaptations involved in the component family. In Chapters 4 and 5, we gave details of these components. The components discussed in this chapter have unbounded buffers, whereas the reactive component discussed in Chapter 5 had a bounded buffer. Also, while the reactive component in Chapter 5 used both acknowledgments and negative acknowledgments, the ack component discussed in this chapter uses only negative acknowledgments.

Forward Error Correction (fec) Component. The fec component is used to deal with message loss by sending extra packets. The receiver can recover lost packets without requesting retransmission of the lost packets. The fec component consists of two types of fractions: encoder and decoder. The encoder fraction is added at the sender process, and the decoder fraction is added at the receiver process. The encoder takes (n − k) data packets and encodes them to add k parity packets. It then sends the group of n (data and parity) packets. The decoder needs to receive at least (n − k) packets of a group to decode all the data packets. Thus, if fewer than k packets in a group are lost, then the receiver can recover the lost packets by using the extra packets from that group.
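To make the (n − k) data / k parity scheme concrete, here is a minimal Java sketch of the special case k = 1, where the single parity packet is the XOR of the data packets and any one lost packet of a group can be reconstructed. This is an illustration of the idea only, not the encoding used by the fec component below, which is left abstract as fec_encode/fec_decode.

import java.util.Arrays;

// Minimal illustration of FEC with k = 1: one XOR parity packet per group.
// Any single lost packet in a group of n can be recovered from the others.
public final class XorParityFec {

    // Encode (n-1) data packets into a group of n packets (data + parity).
    static int[] encode(int[] data) {
        int[] group = Arrays.copyOf(data, data.length + 1);
        int parity = 0;
        for (int d : data) parity ^= d;
        group[data.length] = parity;
        return group;
    }

    // Recover the packet at index 'lost', given the other n-1 packets.
    static int recover(int[] group, int lost) {
        int x = 0;
        for (int i = 0; i < group.length; i++)
            if (i != lost) x ^= group[i];
        return x;  // XOR of the rest equals the missing packet
    }

    public static void main(String[] args) {
        int[] group = encode(new int[] {7, 13, 42});   // n = 4, k = 1
        System.out.println(recover(group, 1));         // prints 13
    }
}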
The abstract version of the fec component is shown in Figure 9.12, which consists of the encoder fraction s_fec at sender process s and the decoder fraction r_fec at receiver process r. The encoder fraction reads the input from sQ, encodes the input and sends it to r. The decoder fraction receives the input from s, decodes it and delivers it to the output listener, rQ.

Component fec
Fraction s_fec
inp sQ : queue of integer
    r
var n, k, u, l, m : integer {initially, u = l = m = 0}
    ean : array [integer, 0..n−1] of integer {initially, ean = ⊥}
begin
  encode : ¬isEmpty(sQ) →
      ean[u, 0..n−1] := fec_encode(head(sQ, n − k)); u := u + 1
[] fec.send : ean[l, m] ≠ ⊥ →
      Cs,r := Cs,r ∘ {l, m, ean[l, m]};
      m := (m + 1) mod n; if m = 0 then l := l + 1 fi
end

Fraction r_fec
inp rQ : queue of integer
    s
var n, k, x, y, p, m : integer {initially, p = 0}
    rbqu : array [integer, 0..n−1] of integer {initially, rbqu = ⊥}
begin
  fec.receive : ¬isEmpty(Cs,r) →
      x, y, m := head(Cs,r); rbqu[x, y] := m
[] decode : count(rbqu[p, 0..n−1] ≠ ⊥) ≥ (n − k) →
      rQ := rQ ∘ fec_decode(rbqu[p, 0..n−1]); p := p + 1
end

Figure 9.12: Forward error correction component.

Acknowledgment (ack) Component. The ack component deals with message loss by retransmitting lost packets. It uses negative acknowledgments to confirm message loss. It consists of an encoder fraction at the sender and a decoder fraction at the receiver. The encoder fraction adds a group number and a packet number to each packet. If it receives a negative acknowledgment for any packet, it sends that packet again to the decoder fraction. When the decoder fraction detects a packet loss, it sends a negative acknowledgment to the encoder fraction. The decoder fraction detects a packet loss when it starts receiving packets from a new group before all the packets of the current group have been received. When all packets in a group are received, it delivers them to the output listener.

The abstract version of the ack component is shown in Figure 9.13, which consists of the encoder fraction s_ack at sender process s and the decoder fraction r_ack at receiver process r. Similar to the fec component, the encoder fraction has sQ and r as its input parameters, and the decoder fraction has rQ and s as its input parameters.

Component ack
Fraction s_ack
inp sQ : queue of integer
    r
var n, g, p, gna, pna, m : integer {initially, p = g = 0}
    snt : array [integer, 0..n−1] of integer {initially, snt = ⊥}
begin
  ack.send : ¬isEmpty(sQ) →
      snt[g, p] := {g, p, head(sQ)}; Cs,r := Cs,r ∘ snt[g, p];
      p := (p + 1) mod n; if p = 0 then g := g + 1 fi
[] resend : type(Cr,s) = nack →
      gna, pna := head(Cr,s); Cs,r := Cs,r ∘ snt[gna, pna]
end

Fraction r_ack
inp rQ : queue of integer
    s
var n, k, x, y, p, m : integer {initially, p = 0}
    rbqu : array [integer, 0..n−1] of integer {initially, rbqu = ⊥}
begin
  ack.receive : ¬isEmpty(Cs,r) →
      x, y, m := head(Cs,r); rbqu[x, y] := m
[] deliver : count(rbqu[p, 0..n−1] ≠ ⊥) = n →
      rQ := rQ ∘ rbqu[p, 0..n−1]; p := p + 1
[] send_nack : count(rbqu[p+1, 0..n−1] ≠ ⊥) > 0 →
      for k = 0 to n − 1
        if rbqu[p, k] = ⊥ then Cr,s := Cr,s ∘ nack(p, k) fi
end

Figure 9.13: Acknowledgment component.
Forward Error Correction with Acknowledgment (fecAck) Component. The fecAck component uses both forward error correction and acknowledgments. If the rate of message loss is high and more than k packets are lost in a group, then negative acknowledgments can be used for retransmission of the lost packets. If the rate of message loss is low and fewer than k packets are lost in a group, then the receiver can recover the lost packets without requesting any retransmission.

The abstract version of the fecAck component is shown in Figures 9.14 and 9.15, which consists of the s_fecAck fraction at the sender process and the r_fecAck fraction at the receiver process.

Component fecAck
Fraction s_fecAck
inp  sQ : queue of integer
     r
var  n, k, u, l, m : integer {initially, u = l = m = 0}
     ean : array [integer, 0..n-1] of integer {initially, ean = ⊥}
     ackMode : boolean {initially, true}
begin
   encode       : ¬isEmpty(sQ) ->
                    ean[u, 0..n-1] := fec_encode(head(sQ, n-k)); u := u + 1
[] fecAck.send : ean[l, m] ≠ ⊥ ->
                    Cs,r := Cs,r o {l, m, ean[l, m]};
                    if ackMode = true then snt[l, m] := {l, m, ean[l, m]} fi;
                    m := (m + 1) mod n; if m = 0 then l := l + 1 fi
[] resend      : ackMode = true ∧ type(Cr,s) = nack ->
                    gna, pna := head(Cr,s); Cs,r := Cs,r o snt[gna, pna]
end

Figure 9.14: Forward error correction with acknowledgment: sender fraction.

Component fecAck
Fraction r_fecAck
inp  rQ : queue of integer
     s
var  n, k, x, y, p, m : integer {initially, p = 0}
     rbqu : array [integer, 0..n-1] of integer {initially, rbqu = ⊥}
     ackMode : boolean {initially, true}
begin
   fec.receive : ¬isEmpty(Cs,r) ->
                   x, y, m := head(Cs,r); rbqu[x, y] := m
[] decode      : count(rbqu[p, 0..n-1] ≠ ⊥) >= (n - k) ->
                   rQ := rQ o fec_decode(rbqu[p, 0..n-1]); p := p + 1
[] send_nack   : ackMode = true ∧ count(rbqu[p+1, 0..n-1] ≠ ⊥) > 0 ->
                   for k = 0 to n-1 : if rbqu[p, k] = ⊥ then Cr,s := Cr,s o nack(p, k) fi
end

Figure 9.15: Forward error correction with acknowledgment: receiver fraction.

9.5.2 Adaptations of the Family

We now consider the adaptations that exist in this family, as shown in Figure 9.11. There are four adaptations in this family, namely, (1) fecAck to fec, (2) fec to fecAck, (3) fecAck to ack, and (4) ack to fecAck. The arcs in the graph are labeled accordingly. (Note that the maximum number of possible adaptations in a family of three components is six, and the minimum number of adaptations required to keep the graph strongly connected is three.)

The adapt-active parts of the fractions that are involved in the adaptations are shown in Figures 9.16 and 9.17. The adapt-active parts of the fractions contain the atomic adaptations that are associated with them.

Adapt-active parts for adaptation 1 : fecAck to fec
Fraction: s_fecAck
   a11s : true -> ackMode := false
   a12s : a11s ∧ a11r -> sremove s_fecAck[n, k, u, l, m, eQ]
Fraction: r_fecAck
   a11r : true -> ackMode := false
   a12r : a11s ∧ a11r -> sremove r_fecAck[n, k, p, m, rbqu]
Fraction: s_fec
   a12s : a12s(s_fecAck) -> sadd s_fec[n, k, u, l, m, eQ]
Fraction: r_fec
   a12r : a12r(r_fecAck) -> sadd r_fec[n, k, p, m, rbqu]

Adapt-active parts for adaptation 2 : fec to fecAck
Fraction: s_fec
   a21s : true -> sremove s_fec[n, k, u, l, m, eQ]
Fraction: r_fec
   a21r : true -> sremove r_fec[n, k, p, m, rbqu]
Fraction: s_fecAck
   a21s : a21s(s_fec) -> sadd s_fecAck[n, k, u, l, m, eQ]; ackMode := true
Fraction: r_fecAck
   a21r : a21r(r_fec) -> sadd r_fecAck[n, k, p, m, rbqu]; ackMode := false
   a22r : a21s -> ackMode := true

Figure 9.16: Adaptations 1 and 2 in the reliable communication component family.
The name of each atomic adaptation has three subscripts (e.g., a13s). The first subscript denotes the adaptation (cf. the label of the arc in Figure 9.11). The second subscript denotes the order in the sequence of atomic adaptations; if two atomic adaptations have the same second subscript, they can be executed in either order. The third subscript denotes the process that the atomic adaptation is associated with. For example, the atomic adaptation a13s denotes that, in adaptation 1 (fecAck to fec), it is third in the sequence of atomic adaptations and it occurs at process s.

Adapt-active parts for adaptation 3 : fecAck to ack
Fraction: s_fecAck
   a31s : true -> block encode
   a33s : a32r -> remove s_fecAck
Fraction: r_fecAck
   a32r : a31s ∧ ean[l, m] = ⊥ ∧ isEmpty(Cs,r) -> remove r_fecAck
Fraction: s_ack
   a34s : a33s ∧ a33r -> add s_ack
Fraction: r_ack
   a33r : a32r -> add r_ack

Adapt-active parts for adaptation 4 : ack to fecAck
Fraction: s_ack
   a41s : true -> block send
   a43s : a42r -> remove s_ack
Fraction: r_ack
   a42r : a41s ∧ isEmpty(Cs,r) -> remove r_ack
Fraction: s_fecAck
   a44s : a43s ∧ a43r -> add s_fecAck
Fraction: r_fecAck
   a43r : a42r -> add r_fecAck

Figure 9.17: Adaptations 3 and 4 in the reliable communication component family.

Adaptation 1: fecAck to fec, having an enhanced-primitive relationship. Consider the fec and fecAck components shown in Figures 9.12, 9.14 and 9.15. The fecAck component is syntactically compatible with fec and it also provides all the services that fec provides. In this case, fecAck is an enhanced version of fec, and fec is a primitive version. During the adaptation that replaces fecAck by fec, the fractions can be changed arbitrarily, provided that (i) the fecAck component is running in a primitive mode before the adaptation begins, and (ii) the state of the fecAck component is transferred to the fec component.

As shown in Figure 9.16, first ackMode is set to false (a11s and a11r); as a result, the fecAck component is now running in primitive mode, i.e., in a mode compatible with fec. Now the fractions can be changed arbitrarily (a12s and a12r). There are two atomic adaptations named a12s, one associated with fraction s_fecAck and the other associated with fraction s_fec. This implies that sremove and sadd can be considered as one atomic action that affects both fractions s_fecAck and s_fec. Similarly, the removal of r_fecAck and the addition of r_fec are done in an atomic manner. Note that sremove and sadd carry an extra argument, which represents the state information that is transferred from the existing component to the new component. In this case, the state of the fecAck component is transferred to the fec component. The adaptation lattice corresponding to this adaptation is shown in Figure 9.18(a).

Figure 9.18: Adaptation lattices.
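The atomic sremove/sadd step with state transfer can be pictured as in the following Java sketch. All names here (FractionSwapper, FractionState, detach, attach) are hypothetical, and the transferred state is reduced to the scalar fields n, k, u, l and m (the buffered queue is omitted); the point is only that removal of the old fraction, state transfer, and addition of the new fraction happen as one step that excludes concurrent message handling.

/*
 * Sketch of an atomic fraction swap: the old fraction is detached
 * (sremove), its state is handed to the replacement, and the
 * replacement is attached (sadd), all under one lock.
 */
public final class FractionSwapper {

    /* State carried across the swap (queue contents omitted here). */
    public record FractionState(int n, int k, int u, int l, int m) { }

    public interface Fraction {
        void onMessage(int message);       // normal message handling
        FractionState detach();            // sremove: stop and export state
        void attach(FractionState state);  // sadd: start from imported state
    }

    private final Object lock = new Object();
    private Fraction active;

    public FractionSwapper(Fraction initial) {
        this.active = initial;
    }

    /* Deliver a message to whichever fraction is currently active. */
    public void handle(int message) {
        synchronized (lock) {
            // message handling and swapping exclude each other, so no
            // message is processed while the swap is in progress
            active.onMessage(message);
        }
    }

    /* Replace the active fraction atomically, transferring its state. */
    public void swap(Fraction replacement) {
        synchronized (lock) {
            FractionState s = active.detach();
            replacement.attach(s);
            active = replacement;
        }
    }
}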
Adaptation 2: fec to fecAck, having a primitive-enhanced relationship. As discussed for the previous adaptation, in this case the fractions can be changed arbitrarily, as the components share a primitive-enhanced relationship. We need to ensure that (i) the state of the fec component is transferred to the fecAck component, and (ii) fecAck runs in a primitive mode until the adaptation is complete. Initially, ackMode is set to false, so fecAck runs in a mode that is compatible with fec. After the adaptation is complete, ackMode is set to true. The adaptation from fec to fecAck is shown in Figure 9.16, and the adaptation lattice corresponding to this adaptation is shown in Figure 9.18(b).

Adaptation 3: fecAck to ack. The adaptation that replaces fecAck by ack is shown in Figure 9.17. Unlike adaptations 1 and 2, the components fecAck and ack do not share an enhanced-primitive relationship, because the fecAck component is not designed to run in an ack-only mode. Specifically, the fecAck.send action of fecAck is not compatible with the ack.send action of ack. Hence, the fecAck component does not provide all the functionality that the ack component does. Therefore, the fractions cannot be changed arbitrarily. First, the encoder fraction s_fecAck at sender s needs to be blocked from encoding more packets (a31s). Once all the packets that have already been encoded are sent by s and received by receiver r, the decoder fraction r_fecAck can be removed (a32r). After the r_fecAck fraction is removed, the removal of the s_fecAck fraction (a33s) and the addition of the r_ack fraction (a33r) can be done in any order. Finally, after the s_fecAck fraction at s is removed and the r_ack fraction at r is added, the encoder fraction s_ack is added at s (a34s). The adaptation lattice corresponding to this adaptation is shown in Figure 9.18(c). (Note that fecAck and ack could be modified so that they satisfy the enhanced-primitive relationship. We have chosen not to do so in order to illustrate the case where the components are not related by an enhanced-primitive relationship.)

Adaptation 4: ack to fecAck. The adaptation that replaces ack by fecAck is similar to adaptation 3 discussed above. Since the components do not share an enhanced-primitive relationship, the fractions need to be changed in a specific order, as shown in Figure 9.17. The adaptation lattice corresponding to this adaptation is shown in Figure 9.18(d).

9.6 Discussion

In this section, we address some questions related to the component family design.

Can there be multiple adaptation paths between two components? If yes, then which adaptation path should be chosen?

Yes, there can be multiple paths between two components. In this case, additional factors can be taken into account when deciding the appropriate path for adaptation. To this end, we could associate several factors with each arc, e.g., the time or resources required for the adaptation, or the types of faults that can be tolerated during the adaptation. Based on the factors associated with each arc, we can compute the characteristics of the different paths. These characteristics can then be used to determine the suitable path.

How do we develop components that satisfy the enhanced-primitive relationship?

One way to design such components is to use inheritance. Inheritance provides syntactic compatibility between components. To ensure semantic compatibility, we need to extend the inheritance relationship such that a derived component provides all the services that its parent component provides. A sketch of this idea appears below.
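As an illustration, the following Java sketch (with hypothetical interfaces; it does not show the dissertation's concrete component family classes) captures this relationship for the fec/fecAck pair of Section 9.5: the enhanced interface extends the primitive one for syntactic compatibility, and an ackMode flag lets the enhanced component behave exactly like the primitive component, giving semantic compatibility.

import java.util.HashMap;
import java.util.Map;

interface FecSender {                      // primitive component's interface
    void send(int group, int[] packets);
}

interface FecAckSender extends FecSender { // enhanced component's interface
    void onNack(int group, int packet);    // extra service of the enhanced component
    void setAckMode(boolean on);           // false = behave like the primitive fec
}

class FecAckSenderImpl implements FecAckSender {
    private boolean ackMode = true;
    private final Map<Integer, int[]> sent = new HashMap<>();

    @Override public void send(int group, int[] packets) {
        if (ackMode) {
            sent.put(group, packets.clone()); // keep a copy for retransmission
        }
        transmit(group, packets);
    }

    @Override public void onNack(int group, int packet) {
        int[] g = sent.get(group);
        if (ackMode && g != null) {
            transmit(group, new int[] { g[packet] }); // retransmit one packet
        }
    }

    @Override public void setAckMode(boolean on) {
        ackMode = on;                         // primitive mode when false
        if (!on) {
            sent.clear();                     // no retransmission state needed
        }
    }

    private void transmit(int group, int[] packets) {
        /* hand the packets to the channel; elided in this sketch */
    }
}

Setting ackMode to false corresponds to the primitive mode used in adaptations 1 and 2.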
How does the component family design help when adaptation involves components providing different functionalities, say security and reliability?

In this case, there will be two separate component families, namely, a family of components that provide reliability and a family of components that provide security. The application will have to perform separate adaptations for the reliability and security components. There may exist a scenario, although undesirable as it violates the principle of separation of concerns, where some component, say C, provides reliability as well as security. In this scenario, C will be present in both families. Now, if the application that is using C to provide security and reliability decides to use only a security component, then the application can perform an adaptation to replace C with a security component. Also, the existence of such components could be used in adaptations where one needs to trade off between two desirable properties such as reliability and security. Specifically, if the application were to replace a reliability component by a security component (perhaps because of environment changes that require it to have security, but where reliability cannot be provided due to other constraints such as energy management), then the application could first replace the reliability component with C and then replace C by the security component.

How can we perform an adaptation where some component is removed but not replaced by another component?

If such a scenario is desired for a particular component family, then that family should have a default component, which is equivalent to having no component at all. For example, in the context of our case study in Section 9.5, the default component would be one that provides no recovery for lost messages. Thus, the removal of a component is equivalent to replacing that component by the default component. This approach is similar to that in [3].

How does the component family assist in independent development of components?

Consider a scenario where components, say A and B, are developed for a component family F. These components can be developed independently, and we can still perform adaptation from A to B and vice versa, even if direct adaptations do not exist between A and B. One way to achieve this would be if adaptations from A and B to some component C, and vice versa, were identified. In this case, adaptations between A and B could be done via C. This approach is especially easy to comprehend when a component family has a primitive component and different enhanced components are developed corresponding to that primitive component: the adaptation between any two enhanced components can be done via the corresponding primitive component. Examples of such adaptation can be found in [91].

How can the component family design be used to develop parallel adaptation?

To implement parallel adaptation, the ComponentManager can be modified so that it maintains active instances of more than one component. It can then delegate each request received from the application to the appropriate component; a sketch of such a manager appears below. Also, the switching algorithms discussed in [98] for parallel adaptation of distributed agreement protocols can be implemented as AdaptationHandlers.
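A minimal sketch of such a modified manager is shown below, with hypothetical signatures; the ComponentManager of our design keeps a single active component, so this variant is only meant to indicate how multiple active instances and a pluggable switching policy could be accommodated.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/*
 * Sketch of a manager for parallel adaptation: several components stay
 * active at once, and each application request is delegated to the
 * instance chosen by the currently installed switching policy.
 */
public class ParallelComponentManager<R> {

    /* A component that can serve application requests. */
    public interface Component<T> {
        void handle(T request);
    }

    /* Decides which active component serves a given request. */
    public interface SwitchingPolicy<T> {
        String select(T request);
    }

    private final Map<String, Component<R>> active = new ConcurrentHashMap<>();
    private volatile SwitchingPolicy<R> policy;

    public ParallelComponentManager(SwitchingPolicy<R> initialPolicy) {
        this.policy = initialPolicy;
    }

    public void activate(String name, Component<R> component) {
        active.put(name, component);
    }

    public void deactivate(String name) {
        active.remove(name);
    }

    public void setPolicy(SwitchingPolicy<R> newPolicy) {
        policy = newPolicy;   // e.g., installed by a switching algorithm
    }

    /* Delegate the application's request to the selected component. */
    public void request(R request) {
        Component<R> target = active.get(policy.select(request));
        if (target != null) {
            target.handle(request);
        }
    }
}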
9.7 Related Work

In this chapter, we introduced the notion of a component family to build a systematic and extensible library of adaptive components. The idea is based on the original notion of program families proposed by Parnas [86, 90]. A program family is a set of programs (not all of which necessarily have been or will ever be constructed) that it is profitable or useful to consider as a group. However, the idea of program families does not address issues related to adaptation. In this work, we study different components with adaptation in mind. Moreover, we focus on components with similar functionalities and interfaces. In other words, our focus is narrower than the general goals of program families, where components with (slightly) different functionalities and interfaces are also studied together (for different reasons).

In feature-oriented programming [99, 100] and software product lines [101], the goals of product lines (creating families of related programs) are addressed. Product families provide an architecture that is based on commonality and similarity. In product families, various product variants can be derived from the basic product family, thereby allowing reuse of products in the family. The main focus of product lines (product families) is on static development and reuse. In contrast, the component family design addresses dynamic reuse of components. The current design of the component family focuses on components having similar functionalities and interfaces. We believe that the product family and feature-oriented programming work can be leveraged when we consider extensions to the component family to build a library of adaptive components that have different interfaces.

9.8 Summary

In this chapter, we presented a systematic and extensible component family design to build a library of adaptive components. In summary, we discuss the following advantages:

1. Simplifying adaptation between components and enabling independent development of new components. Consider the case where an application is using a component from a component family F consisting of n components. In this case, to provide adaptation between any two components of the family F, we need a minimum of only n adaptations (in this case, the graph consists of a directed cycle). When a new component is developed that will be a part of F, it suffices to have only two more adaptations while still keeping the graph strongly connected. For example, if a new component E is added to the component family shown in Figure 9.2, only two adaptations, say from C to E and from E to B, are enough to keep the graph strongly connected. Also, since every component implements the IAdaptReady interface, it becomes easy to develop adaptations between these components.

2. Simplifying verification of adaptation. To verify that the adaptation between components is correct (e.g., by using the approach in Chapters 4 and 5), it is required that after adaptation the component continues to correctly perform its functionality, and that the specification during adaptation is satisfied. The separation of adaptation logic from component functionality simplifies the task of specifying and verifying adaptation. Moreover, the adaptation lattice specification has a direct mapping to the implementation (AdaptationHandlers). Furthermore, if the number of such adaptations is low, then less verification needs to be performed.

3. Reusability of components and adaptations. The design of the component family not only enhances the reuse of components but also promotes the reuse of adaptations between components. For example, consider two components X and Y such that the adaptation from X to Y exists. Now, consider a component Z that needs to be adapted to Y. In this case, by providing the adaptation from Z to X, the adaptation from Z to Y can be done in two steps while reusing the adaptation that already exists from X to Y. We note that even if the direct adaptation from Z to Y were to exist, it would not necessarily be fast or simple. In fact, there are cases, as discussed in the next point, where a two (or more) step adaptation is simpler than a direct adaptation between components.
4. Simplifying adaptation in the case of an enhanced-primitive relationship among components. State transfer and synchronization during adaptation are in general difficult between arbitrary components. However, if one component is an enhancement of another component, then the state transfer and synchronization can be simplified in the adaptation between those two components (cf. Section 9.5 for an example). To take advantage of this, we define the enhanced-primitive relationship between components. We say that a component A is an enhanced component of component B (respectively, component B is a primitive component of A) iff A is syntactically and semantically compatible with B, i.e., it extends the interface of B and it provides all the services that B provides.

Now, consider a scenario where A is an enhanced component of B and B is being replaced by A. In this scenario, the fractions of B can be replaced in an arbitrary order by the fractions of component A, as each fraction of A can provide the required service to the remaining fractions of B. Moreover, the fractions of A can communicate with the remaining fractions of B using a protocol that the latter understands. Thus, in this case, the synchronization requirement among component fractions is relaxed, and transferring state to/from the primitive component is easy. In a case where two enhanced components, say C and D, are derived from the same primitive component, say Z, the adaptation from C to D can be done in two steps: by first replacing C by Z and then replacing Z by D. Since the adaptation for an enhanced-primitive relationship is relatively easy, the direct adaptation from C to D may not necessarily be faster or easier. We note that the idea of an enhanced-primitive relationship can be extended to a multi-level hierarchy of components, where components at a higher level are enhanced versions of components at a lower level.

5. Easy replacement of adaptation logic. An adaptation from one component to another can be performed in several ways. For example, we discussed quiescence and mixed-mode adaptations in this dissertation. Moreover, we used a reset-based approach to achieve synchronization among the processes of a distributed system during adaptation. However, other approaches or variations of the reset protocol can be used for synchronization. The environment conditions or resource constraints may determine which adaptation logic will provide optimum performance. The component family design has AdaptationHandlers that implement the adaptation logic, and the AdaptationController can choose among the different AdaptationHandlers that will execute the adaptation. This clear separation of adaptation logic makes it easy to switch between different adaptation logics; a sketch of this separation appears below.
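The following Java sketch illustrates this separation with hypothetical signatures. The selection criterion used here, an estimate of the service interruption, is only one example of the environment- or resource-based criteria mentioned above.

import java.util.List;

/*
 * Sketch of handler selection: each AdaptationHandler encapsulates one
 * adaptation logic (e.g., quiescence-based or mixed-mode), and the
 * controller picks whichever applicable handler the current conditions
 * favor.
 */
public class AdaptationController {

    public interface AdaptationHandler {
        boolean applicable(String fromComponent, String toComponent);
        int estimatedInterruptionMs();   // one possible selection criterion
        void execute(String fromComponent, String toComponent);
    }

    private final List<AdaptationHandler> handlers;

    public AdaptationController(List<AdaptationHandler> handlers) {
        this.handlers = handlers;
    }

    /* Run the applicable handler with the least service interruption. */
    public void adapt(String from, String to) {
        AdaptationHandler best = null;
        for (AdaptationHandler h : handlers) {
            if (h.applicable(from, to)
                    && (best == null
                        || h.estimatedInterruptionMs() < best.estimatedInterruptionMs())) {
                best = h;
            }
        }
        if (best == null) {
            throw new IllegalStateException("no handler for " + from + " -> " + to);
        }
        best.execute(from, to);
    }
}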
Chapter 10

Conclusion and Future Work

Software systems undergo adaptation for reasons that include changes in environment conditions, fixing of bugs, or changes in requirements. In the case of distributed systems, adaptation causes changes to multiple processes. These changes to multiple processes must be properly synchronized in order for the adaptation to provide acceptable behavior. Furthermore, in the case of adaptation in distributed systems, the behavior of the old program and the behavior of the new program may overlap, causing mixed-mode behavior during adaptation. To gain assurance in adaptation, it is important to formally specify and verify the behavior of the system during adaptation.

10.1 Contributions

In this dissertation, we made the following contributions:

• We presented an approach based on adaptation lattices to model adaptation in distributed systems [102]. The adaptation lattice approach identifies the various overlap scenarios that can occur during adaptation. It classifies the system during adaptation into intermediate programs and specifies the specification for the intermediate programs. Specifically, the various properties that need to be checked during adaptation can be specified using an adaptation lattice.

• We presented an approach based on transitional-invariant lattices to verify the correctness of adaptation in the absence of faults, and transitional-faultspan lattices to verify the correctness of adaptation in the presence of faults [102]. In the transitional-invariant lattice approach, we identify the invariants associated with the intermediate programs to show safety during adaptation. In the transitional-faultspan lattice approach, we identify the invariants and the faultspans associated with the intermediate programs to verify the fault-tolerance properties during adaptation. The approach applies to different types of fault-tolerance, namely, failsafe, masking and nonmasking. The transitional-invariant and transitional-faultspan lattices also satisfy liveness during adaptation, which states that eventually the adaptation is completed.

• We discussed a case study of mixed-mode adaptation using leader election protocols [103]. We showed the performance benefits that can be obtained by mixed-mode adaptation compared to quiescence adaptation. Specifically, mixed-mode adaptation reduces the service interruption time and the communication overhead during adaptation. We also showed how the lattice-based approach can be used to specify and verify mixed-mode behavior during adaptation.
• We identified the tradeoffs that occur in adaptation [104]. The tradeoffs show that increasing concurrency among adaptive actions leads to an increase in verification complexity. Moreover, reducing concurrency leads to a decrease in the communication overhead due to adaptation. These tradeoffs should assist the adaptation developer in choosing the concurrency during adaptation based on the complexity of the adaptation, the adaptation completion time, and the resource availability during adaptation.

• Because of this complexity, specification and verification of adaptation in distributed systems is often done on an abstract model. To gain assurance in the implementation, it is important that verification be supplemented by testing. In this context, we presented an approach for testing adaptation using predicate detection techniques [105]. We introduced the adaptation vector to classify the intermediate program states that occur during adaptation. We also introduced two classes of predicates that occur during adaptation, namely, adaptation-stable and adaptation-transient predicates. We showed how existing techniques for predicate detection can be extended for testing adaptation.

• We described the component family design to support adaptation and to build an adaptation-verified library of components. The components in the component family provide adaptation-related services in addition to their core functionality. The design of the component family also separates the adaptation logic from the component functionality, thereby simplifying the specification, verification and implementation of the adaptation logic. The design also simplifies the independent development of new components and of adaptations among components. In this context, we have presented some of the results of the component family approach in [106]. The component family design integrates the framework for adaptation while ensuring separation of the adaptation logic. The framework supports both mixed-mode and quiescence adaptation. A preliminary design of this framework has been published in [3], and we have used this framework in a recent case study in sensor networks [107].

10.2 Future Research

This dissertation has created the possibility of several new research directions. Some of these are outlined below:

• Automatic generation of lattices. In this dissertation, we used transitional-invariant and transitional-faultspan lattices to verify adaptation. However, identifying the lattice is a complex task. We would like to explore methods to generate these lattices in a semi-automatic fashion. Specifically, considering all possible atomic adaptations, we would like to explore approaches that can generate models of the different intermediate programs. Also, we would like to explore approaches to identify transitional-invariants and transitional-faultspans automatically.

• Component families having components with varied interfaces. In the component family design discussed in Chapter 9, we considered components having the same interfaces. We would like to explore ways to extend the design to deal with components having the same functionality but (slightly) different interfaces.

• Applying component families in new domains. In this dissertation, we presented the component family design and illustrated it in the context of algorithms in group communication applications, such as leader election protocols and reliable communication protocols. We would like to apply this design in other domains, such as web services, middleware components, and other application-level components.

• Runtime verification. The component family design allows new components to be included and used in the family at runtime. Currently, we expect that verification of adaptation involving a new component is done offline. We would like to explore approaches for runtime checking of new components and of adaptations involving those components. In this context, we would like to leverage the design of the component family, as it separates the adaptation logic from the component functionality.

APPENDICES

Appendix A

Model-Checking of Adaptive Leader Election Program

In this appendix, we describe the model-checking of the adaptive leader election program using SPIN. The leader election programs are discussed in Chapter 6. The code of the adaptive leader election program in the Promela language is shown in Section A.1. In Section A.2, we show the results of model checking safety and liveness properties of the adaptive leader election program.

A.1 Adaptive Leader Election Program

The adaptive leader election program in Promela is shown below. We consider five processes. All processes initially use the ldrId protocol. The program should cover the following scenarios for model-checking: (i) adaptation occurs after an election, (ii) adaptation occurs before an election, (iii) adaptation occurs while an election is underway, and (iv) an election occurs while adaptation is underway.
To ensure that the above scenarios are covered, we choose two processes (namely, process 3 and process 4) to start the election. As discussed in Chapter 6, each process performs its adaptive action independent of the others. All processes perform the adaptation that replaces the ldrId protocol with ldrVal.

#define N 5   /* number of nodes */
#define L 10  /* buffer size of channel */

/* types of messages */
mtype = { LDRID, LDRVAL, ELECT, ACK, LD };

/* structure to store neighbors for a node */
/* nL[i] (0 <= i < N) = true if i is a neighbor */
typedef ArrayNeighbors { bool nL[N] = false };

/* neighbor list for each node */
hidden ArrayNeighbors Nbhrs[N];

/* computation index (election index) */
typedef compIndex { byte num; byte id };

/* defining channel structure */
typedef ArrayChannels {
  chan ch[N] = [L] of { mtype, mtype, compIndex, byte, byte, bool }
};

ArrayChannels A[N];

hidden compIndex rcvtmp;
hidden byte junkByteID;

bool startNodes[N] = false;  /* nodes that will start election */
byte value[N] = 0;           /* value at each node */
bool ADAPT[N] = false;       /* boolean variable indicating if node is adapting */

/* for joining in an election - LDRID */
inline joinElectionId(j) {
*/ l 80 /* need to remove Ll) from the queue */ if E && waitCount :2 O && Alp——l].ch[myld—l]?[LDRVAL(LD, rcvtmp ,tmpVal ,tmpbool )] —> A[p» l ].ch[myld - I ]‘lLDRVAULD, rcvtmp ,tmpVal ,tmpbool ); else —:> skip fi; if (!E H (E && (rcvsrc.num > src.num || (rcvsrc.num 2: src.num && rcvsrc.id > src.id)))) —-> W src.num = rcvsrc.num; src.id = rcvsrc.id; i ' mynum = src.num + l; "; waitCount = O; I loopvar = I; 2 do Lg loopvar <= N —> i. if =--' (loopvar I: j && thrs[myId-—l].nL[loopvar—-l]) —> A[myld — 1 ].ch [ loopvar — l]!LDRVAI.(ELECT, src ,tmpVal ,tmpbool); wait[loopvar—l] = true; waitCount++; else —> wait[loopvar—l] = false; fi; loopvar++; loopvar > N -—> break od; if E —> printf(”MSC:”Stoppingcoldcelection...Joiningc newcelection\n”); else —> printf(”MSC:..Joininganewcelection\n”); fi; ack = true; E = true; p = j; maxVal.id = myld; maxVal.num = value[myld—l]; E && (rcvsrc.num == src.num && rcvsrc.id == src.id) -> A[myld—l].ch[j —l]!LDRVAL(ACK, src ,tmpVal , false); else —> skip fi; I81 inline ackReceiveVal (j) { protocolTyp 2: LDRVAL && wait[j—Il && A[j —I].ch[myld—l]?[ LDRVAL(ACK,rcvsrc , tmpVal, isChild)l —> atomic { A[j—l].ch[myld—l]‘.’LDRVAL(ACK,rcvsrc, tmpVal, isChild) -—> if .. E _> if '° ack && src.num == rcvsrc.num && src.id == rcvsrc. id —> wait[j—~I] = false; waitCount——: if isChild —> chdlj—l] = true; if ((maxVal.num < tmpVal.num) |I (maxVal.num == tmpVal.num && maxVal.id < tmpVal.id)) —> maxVa|.id = tmpVal.id; maxVal.num = tmpVal. num; else —> skip fi; else —> skip fi; else —> skip fi; else —-> skip fi; } inline OverlappedMessage (j) { protocolType == LDRVAL && A[j - l ].ch[myld —l]?[LDRID(rcvdMngype , rcvsrc , tmpbyte ,junkByteID , tmpbool )] —> atomic { A[j - [].ch[myld-l]?LDRID(rcvdMngype , rcvsrc , tmpbyte , junkByteID , tmpbool ); printf(”MSC: ”Message ..from ..LDRID\n”); if " rcvdMngype == ELECT —> skip; rcvdMngype == ACK —> skip; rcvdMngype == LD —> if src .num == rcvsrc .num && src . id == rcvsrc . id —> E = false; 182 IF £1171; Idr = tmpbyte; loopvar = I; do loopvar <= N —> if " chd[loopvar—J] —> A[nwdd»—l[ch[100pvar—l]! LDRVALUJDAHC,ldr,junkByteH),tmpbool); chd[loopvar—4]=false: :: else —¢> skip fi; loopvar++ loopvar > NI~> break 0d; else —> skip fi; fi , } /* defining the process */ proctype node (bool st; byte mynumber) { bool startElection = st, E = false . ack = false, isChild = false; byte myld = mynumbcr, maxchildld; byte p = O, mynum = 0, max = 0, ldr =0; complndex src, rcvsrc , maxVal, tmpVal; bool wait[N] = false , chd[N] = false , tmpbool; byte waitCount = O, loopvar, tmpbyte; mtype protocolType = LDRID; /* initially LDRID is running */ mtype rcvdMngype; end: do /* LDRID .' start election */ protocolType :2 LDRID && (!E && startElection) —> atomic { printf(”MSC:...Starting..new..election\n”); src .num = mynum; src.id = myld; mynum = mynum + l; waitCount = 0; loopvar = I; do loopvar <2 N —> if thrslmyId—I].nL[loopvar—l] —> I83 A[ myld -— I ].ch[ loopvar -— I l!l.l)RlD(ELECT. src . 
                   wait[loopvar-1] = true;
                   waitCount++;
              :: else -> wait[loopvar-1] = false;
              fi;
              loopvar++;
         :: loopvar > N -> break
         od;
         ack = true;
         E = true;
         p = myId;
         max = myId;
         startElection = false;
       }
  joinElectionId(1);
  joinElectionId(2);
  joinElectionId(3);
  joinElectionId(4);
  joinElectionId(5);
  ackReceiveId(1);
  ackReceiveId(2);
  ackReceiveId(3);
  ackReceiveId(4);
  ackReceiveId(5);
  /* LDRID : ack to parent */
  :: protocolType == LDRID && E && waitCount == 0 && myId != src.id && ack ->
       atomic {
         ack = false;
         A[myId-1].ch[p-1]!LDRID(ACK, src, max, junkByteID, true);
       }
  /* LDRID : elect leader */
  :: protocolType == LDRID && E && waitCount == 0 && myId == src.id && ack ->
       atomic {
         ack = false;
         E = false;
         ldr = max;
         loopvar = 1;
         do
         :: loopvar <= N ->
              if
              :: chd[loopvar-1] ->
                   A[myId-1].ch[loopvar-1]!LDRID(LD, src, ldr, junkByteID, tmpbool);
                   chd[loopvar-1] = false;
              :: else -> skip
              fi;
              loopvar++;
         :: loopvar > N -> break
         od;
       }
  /* LDRID : receive and forward leader */
  :: protocolType == LDRID && E && !ack && myId != src.id &&
     A[p-1].ch[myId-1]?[LDRID(LD, rcvsrc, ldr, junkByteID, tmpbool)] ->
       A[p-1].ch[myId-1]?LDRID(LD, rcvsrc, ldr, junkByteID, tmpbool);
       atomic {
         if
         :: src.num == rcvsrc.num && src.id == rcvsrc.id ->
              E = false;
              loopvar = 1;
              do
              :: loopvar <= N ->
                   if
                   :: chd[loopvar-1] ->
                        A[myId-1].ch[loopvar-1]!LDRID(LD, src, ldr, junkByteID, tmpbool);
                        chd[loopvar-1] = false;
                   :: else -> skip
                   fi;
                   loopvar++;
              :: loopvar > N -> break
              od;
         :: else -> skip
         fi;
       }
  /* adaptive action */
  :: ADAPT[myId-1] && protocolType == LDRID ->
       atomic {
         printf("MSC: Adapting protocol to LDRVAL at node: %d\n", myId);
         protocolType = LDRVAL;
         /* perform state transfer actions */
         if
         :: E && waitCount != 0 && src.id != myId ->
              E = false; ack = false; waitCount = 0; p = 0; ldr = 0;
              loopvar = 1;
              do
              :: loopvar <= N ->
                   wait[loopvar-1] = false;
                   chd[loopvar-1] = false;
                   loopvar++;
              :: loopvar > N -> break
              od;
         :: E && waitCount != 0 && src.id == myId ->
              E = false; ack = false; waitCount = 0; p = 0; ldr = 0;
              loopvar = 1;
              do
              :: loopvar <= N ->
                   wait[loopvar-1] = false;
                   chd[loopvar-1] = false;
                   loopvar++;
              :: loopvar > N -> break
              od;
              startElection = true;
         :: else -> skip
         fi;
       }
  OverlappedMessage(1);
  OverlappedMessage(2);
  OverlappedMessage(3);
  OverlappedMessage(4);
  OverlappedMessage(5);
  /* LDRVAL : start election */
  :: protocolType == LDRVAL && (!E && startElection) ->
       atomic {
         printf("MSC: Starting new election\n");
         src.num = mynum;
         src.id = myId;
         mynum = mynum + 1;
         waitCount = 0;
         loopvar = 1;
         do
         :: loopvar <= N ->
              if
              :: Nbhrs[myId-1].nL[loopvar-1] ->
                   A[myId-1].ch[loopvar-1]!LDRVAL(ELECT, src, tmpVal, tmpbool);
                   wait[loopvar-1] = true;
                   waitCount++;
              :: else -> wait[loopvar-1] = false;
              fi;
              loopvar++;
         :: loopvar > N -> break
         od;
         ack = true;
         E = true;
         p = myId;
         maxVal.id = myId;
         maxVal.num = value[myId-1];
         startElection = false;
       }
  joinElectionVal(1);
  joinElectionVal(2);
  joinElectionVal(3);
  joinElectionVal(4);
  joinElectionVal(5);
  ackReceiveVal(1);
  ackReceiveVal(2);
  ackReceiveVal(3);
  ackReceiveVal(4);
  ackReceiveVal(5);
  /* LDRVAL : ack to parent */
  :: protocolType == LDRVAL && E && waitCount == 0 && myId != src.id && ack ->
       atomic {
         ack = false;
         A[myId-1].ch[p-1]!LDRVAL(ACK, src, maxVal, true);
       }
  /* LDRVAL : elect leader */
  :: protocolType == LDRVAL && E && waitCount == 0 && myId == src.id && ack ->
       atomic {
         ack = false;
         E = false;
         ldr = maxVal.id;
         loopvar = 1;
         do
         :: loopvar <= N ->
              if
              :: chd[loopvar-1] ->
                   A[myId-1].ch[loopvar-1]!LDRVAL(LD, src, maxVal, tmpbool);
                   chd[loopvar-1] = false;
              :: else -> skip
              fi;
              loopvar++;
         :: loopvar > N -> break
         od;
       }
  /* LDRVAL : receive and forward leader */
  :: protocolType == LDRVAL && E && !ack && myId != src.id &&
     A[p-1].ch[myId-1]?[LDRVAL(LD, rcvsrc, tmpVal, tmpbool)] ->
       A[p-1].ch[myId-1]?LDRVAL(LD, rcvsrc, tmpVal, tmpbool);
       atomic {
         if
         :: src.num == rcvsrc.num && src.id == rcvsrc.id ->
              E = false;
              ldr = tmpVal.id;
              loopvar = 1;
              do
              :: loopvar <= N ->
                   if
                   :: chd[loopvar-1] ->
                        A[myId-1].ch[loopvar-1]!LDRVAL(LD, src, tmpVal, tmpbool);
                        chd[loopvar-1] = false;
                   :: else -> skip
                   fi;
                   loopvar++;
              :: loopvar > N -> break
              od;
         :: else -> skip
         fi;
       }
  od
}

/* initial process */
init {
  /* defining the neighbors */
  Nbhrs[0].nL[1] = true; Nbhrs[1].nL[0] = true;
  Nbhrs[0].nL[2] = true; Nbhrs[2].nL[0] = true;
  Nbhrs[0].nL[4] = true; Nbhrs[4].nL[0] = true;
  Nbhrs[1].nL[2] = true; Nbhrs[2].nL[1] = true;
  Nbhrs[1].nL[4] = true; Nbhrs[4].nL[1] = true;
  Nbhrs[2].nL[3] = true; Nbhrs[3].nL[2] = true;

  /* processes that will start the election */
  startNodes[3] = true;
  startNodes[4] = true;
  value[3] = 1;

  byte proc;
  atomic {
    proc = 1;
    do
    :: proc <= N ->
         if
         :: startNodes[proc-1] -> run node(true, proc);
         :: else -> run node(false, proc)
         fi;
         proc++;
    :: proc > N -> break
    od;
  }
  do
  :: !ADAPT[0] -> ADAPT[0] = true;
  :: !ADAPT[1] -> ADAPT[1] = true;
  :: !ADAPT[2] -> ADAPT[2] = true;
  :: !ADAPT[3] -> ADAPT[3] = true;
  :: !ADAPT[4] -> ADAPT[4] = true;
  :: ADAPT[0] && ADAPT[1] && ADAPT[2] && ADAPT[3] && ADAPT[4] -> break;
  od
}

A.2 Model-Checking Results

In this section, we discuss the results of the safety and liveness properties that we verify for the adaptive leader election program of Section A.1 using SPIN.

A.2.1 End States

In this subsection, we show the results of verifying the program for valid end states. The valid end states for the adaptive leader election program discussed in Section A.1 are those in which every process that was instantiated has reached the end of its code. Specifically, those are the ones in which all processes have completed the election (started by process 1) and the adaptation (from ldrId to ldrVal). As shown in the result, there are no invalid end states.

Bit statespace search for:
    never claim          - (not selected)
    assertion violations - (disabled by -A flag)
    cycle checks         - (disabled by -DSAFETY)
    invalid end states   +

State-vector 2332 byte, depth reached 1251, errors: 0
 3.36632e+07 states, stored
 1.14232e+08 states, matched
 1.47896e+08 transitions (= stored+matched)
 5.90994e+08 atomic steps
hash factor: 1.99354 (best if > 100.)
bits set per state: 3 (-k3)

Stats on memory usage (in Megabytes):
78637.270  equivalent memory usage for states (stored*(State-vector + overhead))
   16.777  memory used for hash array (-w26)
    0.360  memory used for DFS stack (-m10000)
    0.623  other (proc and chan stacks)
    0.017  memory lost to fragmentation
   17.777  total actual memory usage

A.2.2 Safety Property of Leader Election

In this subsection, we consider the safety property which states that if any two processes are not in an election, then they have the same leader. For the sake of reducing redundancy, we show the result for the case in which we verify the property for process 1 and process 2.
Similar results are obtained when we consider any two processes. The never claim describing the property is shown first, followed by the verification result.

Never claim

#define p (node[1]:E || node[2]:E || node[1]:ldr == node[2]:ldr)
/*
 * Formula As Typed: []p
 * The Never Claim Below Corresponds
 * To The Negated Formula !([]p)
 * (formalizing violations of the original)
 */
never { /* !([]p) */
T0_init:
    if
    :: (! ((p))) -> goto accept_all
    :: (1) -> goto T0_init
    fi;
accept_all:
    skip
}

Verification result

Bit statespace search for:
    never claim          +
    assertion violations + (if within scope of claim)
    acceptance cycles    + (fairness disabled)
    invalid end states   - (disabled by never claim)

State-vector 2336 byte, depth reached 1383, errors: 0
 3.36991e+07 states, stored
 1.09351e+08 states, matched
 1.4305e+08 transitions (= stored+matched)
 5.64732e+08 atomic steps
hash factor: 1.99142 (best if > 100.)
bits set per state: 3 (-k3)

Stats on memory usage (in Megabytes):
79260.243  equivalent memory usage for states (stored*(State-vector + overhead))
    8.389  memory used for hash array (-w26)
    0.040  memory used for bit stack
    0.320  memory used for DFS stack (-m10000)
    0.584  other (proc and chan stacks)
    0.056  memory lost to fragmentation
    9.389  total actual memory usage

A.2.3 Liveness Property of Leader Election

In this subsection, we consider the liveness property which states that eventually all processes have the same leader. The never claim describing the property is shown first, followed by the verification result.

Never claim

#define p (node[1]:ldr == node[2]:ldr && node[2]:ldr == node[3]:ldr && node[3]:ldr == node[4]:ldr && node[4]:ldr == node[5]:ldr && node[5]:ldr == node[1]:ldr)
/*
 * Formula As Typed: []<>p
 * The Never Claim Below Corresponds
 * To The Negated Formula !([]<>p)
 * (formalizing violations of the original)
 */
never { /* !([]<>p) */
T0_init:
    if
    :: (! ((p))) -> goto accept_S4
    :: (1) -> goto T0_init
    fi;
accept_S4:
    if
    :: (! ((p))) -> goto accept_S4
    fi;
}

Verification result

Bit statespace search for:
    never claim          +
    assertion violations + (if within scope of claim)
    acceptance cycles    + (fairness disabled)
    invalid end states   - (disabled by never claim)

State-vector 2336 byte, depth reached 1391, errors: 0
 3.3996e+07 states, stored
 1.10277e+08 states, matched
 1.44273e+08 transitions (= stored+matched)
 5.7067e+08 atomic steps
hash factor: 1.97402 (best if > 100.)
bits set per state: 3 (-k3)

Stats on memory usage (in Megabytes):
79958.646  equivalent memory usage for states (stored*(State-vector + overhead))
    8.389  memory used for hash array (-w26)
    0.040  memory used for bit stack
    0.320  memory used for DFS stack (-m10000)
    0.585  other (proc and chan stacks)
    0.055  memory lost to fragmentation
    9.389  total actual memory usage
Never claim #define p (!(node[l]: protocolType == LDRID && node [2]: protocolType == IDRVAL && node[2]:E && node[2]:ack && node[2]: waitCount !=0) || (A[l].ch[0]??[LDRVAL(ELECT)])) /* * Formula As Typed: []p * The Never Claim Below Corresponds To The Negated Formula .’([]p) (formalizing violations of the original) * * */ never { /* !([]p) */ 193 T0-init: if (I ((p))) -> goto accept_all (l) —> goto TTLJnit fi; accept-all: skip } Verification result Bit statespace search for: never claim + assertion violations + (if within scope of claim) acceptance cycles + (fairness disabled) invalid end states — (disabled by never claim) State—vector 2336 byte, depth reached 1383, errors: 0 3.3699le+07 states , stored 1.09351e+08 states , matched l.4305e+08 transitions (= stored+matched) 5.64732e+08 atomic steps hash factor: 1.99142 (best if > 100.) bits set per state: 3 (—k3) Stats on memory usage (in Megabytes): 79260.243 equivalent memory usage for states (stored*(State— vector + overhead)) 8.389 memory used for hash array (—w26) 0.040 memory used for bit stack 0.320 memory used for DFS stack (—m10000) 0.684 other (proc and chan stacks) 0.176 memory lost to fragmentation 9.609 total actual memory usage 194 BIBLIOGRAPHY 195 Bibliography [1] W. K. Chen, M. Hiltunen, and R. Schlichting, “Constructing adaptive software in distributed systems,” in Proceedings of the 21st International Conference on Dis- tributed Computing Systems, pp. 635—643, April 2001. [2] P. McKinley and U. Padmanabhan, “Design of composable proxy filters for mo- bile computing,” in Proceedings of the Workshop on Wireless Networks and Mobile Computing, 2001. [3] S. S. Kulkarni, K. N. Biyani, and U. Arumugam, “Composing distributed fault- tolerance components,” in Proceedings of International Workshop on Principles of Dependable Systems - PoDSy, at DSN, pp. W127—l36, June 2003. [4] J. Hallstrom, W. Lea], and A. Arora, “Scalable evolution of highly available sys- tems,” Transactions of the Institute for Electronics, Information and Communication Engineers, vol. E86-D, no. 10, pp. 2154-2164, 2003. [5] B. Redmond and V. Cahill, “Supporting unanticipated dynamic adaptation of appli- cation behavior,” in Proceedings of the European Conference on Object-Oriented Programming (ECOOP), pp. 205—230, 2002. [6] S. M. Sadjadi, Transparent Shaping of Existing Software to Support Pervasive and Autonomic Computing. PhD thesis, Michigan State University, 2004. [7] J .-C. F abre and T. Pcrennou, “FRIENDS: A flexible architecture for implementing fault tolerant and secure distributed applications,” in Proceedings of the European Dependable Computing Conference, pp. 3—20, 1996. [8] R. Keller and U. Hdlzle, “Binary component adaptation,” Lecture Notes in Computer Science, vol. 1445, 1998. [9] J. Kramer and J. Magee, “The evolving philosophers problem: Dynamic change management,” IEEE Transactions of Sofiware Engineering, vol. 16, no. 11, pp. 1293—1306, 1990. [10] N. Amano and T. Watanabe, “A software model for flexible and safe adaptation of mobile code programs,” in Proceedings of the International Workshop on Principles of Software Evolution, pp. 57—61, ACM Press, 2002. 196 [11] P. McKinley, S. Sadjadi, E. Kasten, and B. Cheng, “Composing adaptive software,” IEEE Computer, vol. 37, no. 7, pp. 56—64, 2004. [12] J. Buckley, T. Mens, M. Zenger, A. Rashid, and G. Kniesel, “Towards a taxonomy of software change,” Journal of Software Maintenance and Evolution: Research and Practice, 2003. [13] J. S. Bradbury, J. R. Cordy, J. Dingel, and M. 
[14] D. Le Métayer, "Describing software architecture styles using graph grammars," IEEE Transactions on Software Engineering, vol. 24, no. 7, pp. 521-533, 1998.

[15] G. Taentzer, M. Goedicke, and T. Meyer, "Dynamic change management by distributed graph transformation: Towards configurable distributed systems," in Proceedings of the 6th International Workshop on Theory and Application of Graph Transformation, vol. 1764 of LNCS, Springer, 1998.

[16] M. Wermelinger, A. Lopes, and J. L. Fiadeiro, "A graph based architectural (re)configuration language," in Proceedings of the 8th European Software Engineering Conference and 9th ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2001), vol. 26 of Software Engineering Notes, pp. 21-32, 2001.

[17] R. Allen, R. Douence, and D. Garlan, "Specifying and analyzing dynamic software architectures," in Proceedings of the Conference on Fundamental Approaches to Software Engineering, vol. 1382 of Lecture Notes in Computer Science (LNCS), pp. 21-35, 1998.

[18] C. E. Cuesta, P. de la Fuente, and M. Barrio-Solarzano, "Dynamic coordination architecture through the use of reflection," in Proceedings of the ACM Symposium on Applied Computing, pp. 134-140, ACM Press, 2001.

[19] P. Oreizy, N. Medvidovic, and R. N. Taylor, "Architecture-based runtime software evolution," in Proceedings of the 20th International Conference on Software Engineering, pp. 177-186, 1998.

[20] J. Kramer, J. Magee, and M. Sloman, "Configuring distributed systems," in Proceedings of the 5th Workshop on ACM SIGOPS European Workshop, pp. 1-5, ACM Press, 1992.

[21] S. McCamant and M. D. Ernst, "Predicting problems caused by component upgrades," in ESEC/FSE: Proceedings of the 10th European Software Engineering Conference and the 11th ACM SIGSOFT Symposium on the Foundations of Software Engineering, (Helsinki, Finland), pp. 287-296, September 2003.

[22] L. Mariani and M. Pezzè, "A technique for verifying component-based software," in International Workshop on Test and Analysis of Component Based Systems, (Barcelona, Spain), pp. 17-30, March 27-28, 2004.

[23] S. Chaki, N. Sharygina, and N. Sinha, "Verification of evolving software," in Proceedings of the 3rd International Workshop on Specification and Verification of Component-based Systems, pp. 55-61, 2004.

[24] D. Gupta and P. Jalote, "On-line software version change using state transfer between processes," Software - Practice and Experience, vol. 23, no. 9, pp. 949-964, 1993.

[25] B. P. Lientz and E. B. Swanson, Software Maintenance Management: A Study of the Maintenance of Computer-Application Software in 487 Data Processing Organizations. Addison-Wesley, August 1980.

[26] N. Chapin, J. Hale, K. Khan, J. Ramil, and W. G. Tan, "Types of software evolution and software maintenance," Journal of Software Maintenance and Evolution, no. 13, pp. 3-30, 2001.

[27] P. K. McKinley, S. M. Sadjadi, E. P. Kasten, and B. H. C. Cheng, "A taxonomy of compositional adaptation," Tech. Rep. MSU-CSE-04-17, Michigan State University, May 2004.

[28] C. Szyperski, Component Software: Beyond Object-Oriented Programming. Addison-Wesley Professional, 2nd ed., 2002.

[29] J. Kramer and J. Magee, "Analysing dynamic change in software architectures: A case study," in Proceedings of the IEEE International Conference on Configurable Distributed Systems, 1998.
[30] C. Canal, E. Pimentel, and J. M. Troya, "Specification and refinement of dynamic software architectures," in Proceedings of the Working IFIP Conference on Software Architecture, pp. 107-126, 1999.

[31] R. N. Taylor, N. Medvidovic, K. M. Anderson, E. J. Whitehead, and J. E. Robbins, "A component- and message-based architectural style for GUI software," in Proceedings of the 17th International Conference on Software Engineering, pp. 295-304, ACM Press, 1995.

[32] I. Lee, DYMOS: A Dynamic MOdification System. PhD thesis, University of Wisconsin, 1983.

[33] R. P. Cook, "*Mod - a language for distributed programming," IEEE Transactions on Software Engineering, vol. 6, no. 6, pp. 563-571, 1980.

[34] J. Magee and J. Kramer, "Dynamic configuration for distributed systems," IEEE Transactions on Software Engineering, vol. 11, no. 4, pp. 424-436, 1985.

[35] J. Magee, J. Kramer, and M. Sloman, "Constructing distributed systems in Conic," IEEE Transactions on Software Engineering, vol. 15, no. 6, 1989.

[36] B. Liskov, "Distributed programming in Argus," Communications of the ACM, pp. 300-312, March 1988.

[37] T. Bloom, Dynamic Module Replacement in a Distributed Programming System. PhD thesis, Massachusetts Institute of Technology, 1983.

[38] D. Gupta, On-line Software Version Change. PhD thesis, Department of Computer Science and Engineering, Indian Institute of Technology, 1994.

[39] J. Zhang and B. Cheng, "Using temporal logic to specify adaptive program semantics," Journal of Systems and Software (JSS), vol. 79, no. 10, pp. 1361-1369, 2006.

[40] J. Zhang and B. Cheng, "Model-based development of dynamically adaptive software," in Proceedings of the International Conference on Software Engineering, May 2006.

[41] J. Zhang, Z. Yang, B. Cheng, and P. McKinley, "Adding safeness to dynamic adaptation techniques," in Proceedings of the ICSE 2004 Workshop on Architecting Dependable Systems, (Edinburgh, Scotland, UK), May 2004.

[42] B. Alpern and F. B. Schneider, "Defining liveness," Information Processing Letters, vol. 21, pp. 181-185, October 1985.

[43] B. Alpern and F. B. Schneider, "Proving boolean combinations of deterministic properties," in Proceedings of the Second Symposium on Logic in Computer Science, pp. 131-137, 1987.

[44] M. E. Segal and O. Frieder, "Dynamic program updating: a software maintenance technique for minimizing software downtime," Software Maintenance: Research and Practice, vol. 1, pp. 59-79, September 1989.

[45] M. E. Segal and O. Frieder, "Dynamically updating distributed software: Supporting change in uncertain and mistrustful environments," in Proceedings of the International Conference on Software Maintenance, October 1989.

[46] M. E. Segal, "Extending dynamic program updating systems to support distributed systems that communicate via remote evaluation," in Proceedings of the International Workshop on Configurable Distributed Systems, pp. 188-199, 1991.

[47] M. R. Barbacci, D. L. Doubleday, and C. B. Weinstock, "Application level programming," in Proceedings of the International Conference on Distributed Computing Systems, pp. 458-465, 1990.

[48] C. Hofmeister, Dynamic Reconfiguration. PhD thesis, Computer Science Department, University of Maryland, 1993.

[49] S. Gilmore, D. Kirli, and C. Walton, "Dynamic ML without dynamic types," Tech. Rep.
ECS-LFCS-97-378, Laboratory for the Foundations of Computer Science, University of Edinburgh, December 1997.

[50] C. Walton, D. Kirli, and S. Gilmore, "An abstract machine for module replacement," tech. rep., Laboratory for the Foundations of Computer Science, University of Edinburgh, June 1998.

[51] M. Hicks, Dynamic Software Updating. PhD thesis, University of Pennsylvania, 2001.

[52] M. Segal and O. Frieder, "On-the-fly program modification: Systems for dynamic updating," IEEE Software, pp. 53-65, March 1993.

[53] D. Peled and W. Penczek, "Using asynchronous Büchi automata for efficient automatic verification of concurrent systems," in PSTV, pp. 315-330, 1995.

[54] J. Shutt and R. Rubinstein, "Self-modifying finite automata: An introduction," in Information Processing Letters, vol. 56, pp. 185-190, 1995.

[55] N. Lynch, Distributed Algorithms. Morgan Kaufmann, 1996.

[56] A. Arora and M. G. Gouda, "Closure and convergence: A foundation of fault-tolerant computing," IEEE Transactions on Software Engineering, 1993.

[57] A. Arora and S. S. Kulkarni, "Component based design of multitolerant systems," IEEE Transactions on Software Engineering, vol. 24, pp. 63-78, January 1998.

[58] S. Kulkarni, Component-based Design of Fault Tolerance. PhD thesis, Ohio State University, 1999.

[59] E. W. Dijkstra, "Self-stabilizing systems in spite of distributed control," Communications of the ACM, vol. 17, no. 11, 1974.

[60] A. Arora and S. S. Kulkarni, "Designing masking fault-tolerance via nonmasking fault-tolerance," IEEE Transactions on Software Engineering, vol. 24, no. 6, pp. 435-450, 1998.

[61] A. Arora and M. G. Gouda, "Closure and convergence: A foundation of fault-tolerant computing," IEEE Transactions on Software Engineering, vol. 19, no. 11, pp. 1015-1027, 1993.

[62] G. Varghese, Self-stabilization by local checking and correction. PhD thesis, MIT/LCS/TR-583, 1993.

[63] M. Gouda, Elements of Network Protocol Design. John Wiley & Sons, 1998.

[64] E. W. Dijkstra, A Discipline of Programming. Prentice Hall, 1976.

[65] G. Holzmann, "Logic verification of ANSI-C code with SPIN," Proceedings of the Sixth SPIN Workshop, 2000.

[66] M. G. Gouda and T. McGuire, "Correctness preserving transformations for network protocol compilers," prepared for the Workshop on New Visions for Software Design and Productivity: Research and Applications, 2001.

[67] M. Nesterenko and A. Arora, "Stabilization-preserving atomicity refinement," Journal of Parallel and Distributed Computing, vol. 62(5), pp. 766-791, 2002.

[68] M. Demirbas and A. Arora, "Convergence refinement," Proceedings of the International Conference on Distributed Computing Systems, 2002.

[69] D. Gries, The Science of Programming. Springer-Verlag, 1981.

[70] M. D. Ernst, Dynamically Discovering Likely Program Invariants. PhD thesis, University of Washington Department of Computer Science and Engineering, (Seattle, Washington), 2000.

[71] M. Ernst, J. Cockrell, W. Griswold, and D. Notkin, "Dynamically discovering likely program invariants to support program evolution," in Proceedings of the International Conference on Software Engineering, pp. 213-224, 1999.

[72] M. Ernst, A. Czeisler, W. Griswold, and D. Notkin, "Quickly detecting relevant program invariants," in Proceedings of the International Conference on Software Engineering, pp. 449-458, 2000.

[73] M. G. Gouda and T. Herman, "Adaptive programming," IEEE Transactions on Software Engineering, vol. 17, pp. 911-921, 1991.

[74] R. V. Renesse, K. Birman, M. Hayden, A. Vaysburd, and D. Karr, "Building adaptive systems using Ensemble," Software - Practice & Experience, vol. 28, pp. 963-979, July 1998.
Vaysburd, and D. Karr, “Building adaptive systems using Ensemble,” Software - Practice & Experience, vol. 28, pp. 963-979, July 1998.

[75] N. Sridhar, S. M. Pike, and B. W. Weide, “Dynamic module replacement in distributed protocols,” in Proceedings of the 23rd International Conference on Distributed Computing Systems, May 2003.

[76] E. W. Dijkstra and C. S. Scholten, “Termination detection for diffusing computations,” Information Processing Letters, vol. 11, pp. 1-4, August 1980.

[77] S. Vasudevan, J. Kurose, and D. Towsley, “Design and analysis of a leader election algorithm for mobile ad hoc networks,” in Proceedings of the 12th IEEE International Conference on Network Protocols, pp. 350-360, IEEE Computer Society, October 2004.

[78] O. Babaoglu and K. Marzullo, “Consistent global states of distributed systems: Fundamental concepts and mechanisms,” in Distributed Systems (S. Mullender, ed.), pp. 55-96, Addison-Wesley, 1993.

[79] V. Garg and B. Waldecker, “Detection of weak unstable predicates in distributed programs,” IEEE Transactions on Parallel and Distributed Systems, vol. 5, pp. 299-307, March 1994.

[80] K. M. Chandy and L. Lamport, “Distributed snapshots: Determining global states of distributed systems,” ACM Transactions on Computer Systems, vol. 3, pp. 63-75, February 1985.

[81] R. Cooper and K. Marzullo, “Consistent detection of global predicates,” in Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, published in ACM SIGPLAN Notices, vol. 26, no. 12, pp. 167-174, 1991.

[82] S. Venkatesan and B. Dathan, “Testing and debugging distributed programs using global predicates,” IEEE Transactions on Software Engineering, vol. 21, no. 2, pp. 163-177, 1995.

[83] V. Garg and B. Waldecker, “Detection of strong unstable predicates in distributed programs,” IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 12, pp. 1323-1333, 1996.

[84] L. Lamport, “Time, clocks, and the ordering of events in a distributed system,” Communications of the ACM, vol. 21, pp. 558-565, July 1978.

[85] F. Mattern, “Virtual time and global states of distributed systems,” in Proceedings of the International Workshop on Parallel and Distributed Algorithms, pp. 215-226, 1989.

[86] D. L. Parnas, “On the criteria to be used in decomposing systems into modules,” Communications of the ACM, vol. 15, no. 12, 1972.

[87] P. Allen and S. Frost, Component-Based Development for Enterprise Systems. Cambridge University Press, 1998.

[88] K. Czarnecki and U. Eisenecker, Generative Programming. Addison-Wesley, May 2000.

[89] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[90] D. M. Hoffman and D. M. Weiss, eds., Software Fundamentals - Collected Papers by David L. Parnas. Addison-Wesley, 2001.

[91] K. Biyani, “Dynamic composition of distributed components,” Master's thesis, Michigan State University, 2003.

[92] E. P. Kasten, An Integrated Approach to Autonomous Computation for Data Streaming Applications. PhD thesis, Michigan State University, 2007.

[93] “IBM Systems Journal, special issue on autonomic computing,” 2003.

[94] T. Jebara and A. Pentland, “Statistical imitative learning from perceptual data,” in Proceedings of the Second International Conference on Development and Learning, pp. 191-196, June 2002.

[95] R. Laddaga, M. L. Swinson, and P. Robertson, “Seeing clearly and moving forward,” IEEE Intelligent Systems, vol. 15, pp. 46-50, 2000.

[96] A. Arora and M. G.
Gouda, “Distributed reset,” IEEE Transactions on Computers, vol. 43, no. 9, pp. 1026-1038, 1994.

[97] S. S. Kulkarni and A. Arora, “Multitolerance in distributed reset,” Chicago Journal of Theoretical Computer Science, 1998.

[98] P. T. Wojciechowski and O. Rütti, Formal Methods for Open Object-Based Distributed Systems, vol. 3535 of Lecture Notes in Computer Science, ch. On Correctness of Dynamic Protocol Update, pp. 275-289. Springer Berlin / Heidelberg, 2005.

[99] D. Batory, “Multi-level models in model-driven development, product lines, and metaprogramming,” IBM Systems Journal, vol. 45, no. 3, 2006.

[100] J. Liu and D. Batory, “Automatic remodularization and optimized synthesis of product families,” in Generative Programming and Component Engineering (GPCE), October 2004.

[101] P. Clements and L. Northrop, Software Product Lines. Addison-Wesley, 2001.

[102] S. Kulkarni and K. Biyani, “Correctness of component-based adaptation,” in Proceedings of the International Symposium on Component-based Software Engineering (CBSE), at ICSE, vol. 3054 of Lecture Notes in Computer Science, pp. 48-58, May 2004.

[103] K. Biyani and S. Kulkarni, “Mixed-mode adaptation in distributed systems: A case study,” in Proceedings of the International Workshop on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), at ICSE, May 2007.

[104] K. Biyani and S. Kulkarni, “Concurrency tradeoffs in dynamic adaptation,” in Proceedings of the International Workshop on Assurance in Distributed Systems and Networks (ADSN), at ICDCS, July 2006.

[105] K. Biyani and S. Kulkarni, “Testing dynamic adaptation in distributed systems,” in Proceedings of the International Workshop on Automation of Software Test (AST), at ICSE, May 2007.

[106] K. Biyani and S. Kulkarni, “Building component families to support adaptation,” in Proceedings of the International Workshop on Design and Evolution of Autonomic Application Software (DEAS), at ICSE, May 2005.

[107] M. Arumugam, S. Kulkarni, and K. Biyani, “Adaptation in sensor-actuator networks: A case study,” in Proceedings of the Third International Conference on Networked Sensing Systems (INSS), June 2006.