This is to certify that the dissertation entitled

ASSURANCE OF ADAPTATION IN DISTRIBUTED SYSTEMS

presented by

Karunkumar N. Biyani

has been accepted towards fulfillment of the requirements for the Doctoral degree in Computer Science and Engineering.


ASSURANCE OF ADAPTATION IN DISTRIBUTED SYSTEMS

By

Karunkumar N. Biyani

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science and Engineering

2007


ABSTRACT

ASSURANCE OF ADAPTATION IN DISTRIBUTED SYSTEMS

By

Karunkumar N. Biyani

Software systems need to adapt due to changing requirements or changing environment conditions. For long-running and safety-critical applications, it is highly desirable to adapt the system without stopping it completely. In the case of distributed systems, adaptation often requires changes to multiple processes. Typically, such adaptation is performed by dynamically adding or removing components from multiple processes. As a result, during adaptation, the system may consist of both changed and unchanged processes, causing the old and the new components to overlap. This overlapping of components during adaptation may induce cross-component communication, which may lead to unpredictable and/or undesirable behavior during adaptation.

In order to gain confidence in adaptation in distributed systems, in this dissertation we address the assurance requirements at various stages of adaptation development: (i) modeling and verification of adaptation, (ii) testing of adaptation, (iii) design of components involved in adaptation, and (iv) design of a framework that supports adaptation.

In this dissertation, we describe an approach based on adaptation lattices to model and verify adaptation. Specifically, we present transitional-invariant lattices and transitional-faultspan lattices to verify the correctness of adaptation in the absence and presence of faults, respectively.

Furthermore, we identify the issues that arise in an important class of adaptation, namely, mixed-mode adaptation. Mixed-mode adaptation allows the changed and the unchanged processes to interact during adaptation, thereby minimizing service interruption time and communication overhead. In this dissertation, we identify and address the challenges involved in mixed-mode adaptation. Specifically, we show how the adaptation lattice approach can be used in the case of mixed-mode adaptation.

We also discuss an approach for testing adaptation in distributed systems. We show how the existing approaches based on predicate detection for testing distributed systems can be extended for testing adaptation.
We also describe a component family design to build a library of adaptive components. One aspect of the design is to build an adaptation-verified library of components in which not only the components but also the adaptations between the components are verified. The design applies the principle of separation of concerns to separate the adapt-active parts of the components from their core functionality. Furthermore, the component family design integrates the framework that performs adaptation while ensuring that the adaptation logic is separate from the core functionality of the components as well as the application.


© Copyright by KARUNKUMAR N. BIYANI 2007


To my parents and wife for their love and sacrifices


ACKNOWLEDGMENTS

First of all, I want to thank my loving Lord, Krishna, and my teacher for the inspiration, motivation and support to pursue things the right way. I am very thankful to my advisor and guidance committee chairperson, Dr. Sandeep S. Kulkarni, without whose guidance this dissertation would never have been completed. I specifically thank him for being patient with me and understanding my limitations. He is a wonderful person and a great mentor.

I want to thank the other members of my guidance committee, Dr. Betty H. C. Cheng, Dr. Philip K. McKinley, Dr. Laura Dillon and Dr. Jonathan I. Hall, for their helpful comments and advice over the years. I also want to thank the faculty of the Computer Science and Engineering Department at Michigan State University. In particular, I want to thank Dr. Abdol Esfahanian, whose teachings had a great influence on me. He is truly a wonderful teacher, who makes learning easy and fun. I also want to thank Mrs. Linda Moore, Mrs. Debbie Kruch, Mrs. Norma Teague and the rest of the staff of the Computer Science and Engineering Department.

I want to thank my colleagues in the Software Engineering and Network Systems Laboratory. Specifically, I want to thank: Bruhadeshwar (Bru) Bezawada, Umamaheswaran (Mahesh) Arumugam, Ali Ebnenasir, Borzoo Bonakdarpour, Farshad Samimi, Masoud Sadjadi, Ji Zhang and Limin Wang.

I want to thank my friends and family for their encouragement and support. No words of gratitude can ever sum up the contribution of my family: my father Mr. Nandkishore Biyani, my mother Mrs. Santosh Biyani, and my sisters Kavita Rathi and Rinku Malpani. And last but not least, my special thanks to my loving wife Kavita Biyani for all her love and sacrifices.


TABLE OF CONTENTS

9.2.1 Abstract Component Structure
9.3 Concrete Component Family
9.3.1 Component Family Interfaces
9.3.2 Component Family Instantiation
9.3.3 Component Family Implementation
9.4 Case Study: Leader Election Component Family
9.4.1 Interfaces of the Family
9.4.2 Classes of the Family
9.4.3 Performance Results
9.5 Case Study: Reliable Communication Component Family
9.5.1 Components of the Family
9.5.2 Adaptations of the Family
9.6 Discussion
9.7 Related Work
9.8 Summary
10 Conclusion and Future Work
10.1 Contributions
10.2 Future Research
APPENDICES
A Model-Checking of Adaptive Leader Election Program
A.1 Adaptive Leader Election Program
A.2 Model-Checking Results
A.2.1 End States
A.2.2 Safety Property of Leader Election
A.2.3 Liveness Property of Leader Election
A.2.4 Safety Property During Adaptation
BIBLIOGRAPHY


LIST OF FIGURES

3.1 An example of an adaptation lattice
4.1 Message communication program (fault-intolerant version)
4.2 Proactive component
4.3 Message communication program (with proactive component)
4.4 Message communication program (fault-intolerant version, adapt-ready)
4.5 Intermediate program P_ip1
4.6 Intermediate program P_ip2
4.7 Intermediate program P_ip3
4.8 Intermediate program P_ip4
4.9 Adaptation lattice for addition of proactive component
4.10 Adaptation lattice for parallel adaptation
5.1 Message communication program (with proactive component, adapt-ready)
5.2 Fault class F1
5.3 Fault class F2
5.4 Acknowledgment component: sender fraction
5.5 Acknowledgment component: receiver fraction
5.6 Message communication program (with reactive component)
5.7 Intermediate program P_aa-ip1
5.8 Intermediate program P_aa-ip2
5.9 Intermediate program P_aa-ip3
5.10 Intermediate program P_aa-ip4
5.11 Intermediate program P_aa-ip5
5.12 Adaptation lattice for replacement of proactive component
6.1 Leader election algorithm based on node Id
6.2 Leader election algorithm based on node value
6.3 Program using ldrId
6.4 Program using ldrVal
6.5 Intermediate program during adaptation from P_ldrId to P_ldrVal
6.6 Adaptive program to adapt from P_ldrId to P_ldrVal
6.7 Quiescence adaptation
6.8 Mixed-mode adaptation
6.9 Quiescence vs. mixed-mode adaptation
6.10 Time for leader election protocols
7.1 Executing three atomic adaptations
7.2 Space-time diagram of adaptation
7.3 Adaptation with concurrency
7.4 Adaptation with (reduced) concurrency
7.5 Adaptation with no concurrency
7.6 Adaptation in publish-subscribe application
8.1 Space-time diagram of a distributed computation
8.2 Identifying intermediate program states
8.3 Algorithm for adaptation-stable predicate (non-checker process p_i)
8.4 Algorithm for adaptation-stable predicate (checker process)
8.5 Algorithm for adaptation-transient predicates (non-checker process p_i)
8.6 Algorithm for adaptation-transient predicate (checker process)
9.1 Reference update during adaptation
9.2 An example of a component family
9.3 Structure of a component in a family
9.4 Architecture of a component family
9.5 Interfaces of a component family
9.6 Adaptation handlers
9.7 An example of delegation pattern
9.8 Interfaces of the leader election component family
9.9 Classes of the leader election component family
9.10 Component family performance
9.11 Reliable communication component family
9.12 Forward error correction component
9.13 Acknowledgment component
9.14 Forward error correction with acknowledgment: sender fraction
9.15 Forward error correction with acknowledgment: receiver fraction
9.16 Adaptations 1 and 2 in reliable communication component family
9.17 Adaptations 3 and 4 in reliable communication component family
9.18 Adaptation lattices


LIST OF TABLES

4.1 State mappings for the adaptation
5.1 State mappings for the adaptation
6.1 Message types used in the protocol
6.2 Variables maintained by each process in the protocol
6.3 Overlap communication between protocols
6.4 State mapping (Φ_aa_i) for each atomic adaptation aa_i


Chapter 1

Introduction

Evolution is vital in all software systems and has been studied for decades in the context of software maintenance and software upgrade. It is a well-known fact in software engineering that programs undergo several changes during their lifetimes. These changes are usually performed to fix bugs or to add new functionality that the users of the systems did not anticipate at the time of the original specifications. Furthermore, software systems have become more and more pervasive with easy access to personal computers, smartphones, and cellular and wireless networks. The challenges in building today's software systems include providing seamless service under heterogeneous operating conditions such as device failures, transitions across wired and wireless environments, high packet loss in wireless networks, Byzantine behaviors, and security attacks.
In order to meet these challenges, software systems must be able to adapt to environment conditions that may not be foreseen at the time of software development.

In general, software systems need to adapt (change) in response to one or more of the following reasons: (i) discovery of errors/bugs, (ii) change in requirements, and (iii) change in environment. In a traditional approach, the change is usually performed by stopping the currently running program and then installing the new version of the program. However, stopping the system to perform the change is undesirable for a variety of reasons: (i) it may be simply inconvenient for users; (ii) it may lead to monetary loss, for example, in the case of banking and e-commerce systems; or (iii) it may be unsafe, for example, in the case of safety-critical systems such as air traffic control systems. Moreover, in systems that need to adapt in response to frequent or transient changes in the environment, stopping the system for an upgrade may not even be an option. Clearly, in all these cases, it is highly desirable to perform the change while the system continues to operate. This type of change is commonly referred to as dynamic adaptation. A software system that supports dynamic adaptation is known as adaptive software.

There has been a growing interest in building adaptive software. An increasing number of systems are now being developed with some built-in mechanisms for adaptation that allow change to be performed without completely stopping the system. Adaptive software techniques (e.g., [1-10]) allow the system to modify its own functional or non-functional behavior (e.g., its fault-tolerance, quality of service or security requirements). These modifications include reconfiguration of some parameters, or addition, removal, or replacement of application code. A survey in [11] presents various tools and techniques for building adaptive software. Additionally, Buckley et al. [12] have given a taxonomy in the context of the when, how, where, and what of software change.

Numerous works in adaptation have focused either on single-process systems or on distributed systems where changes to processes are independent of each other. However, comparatively fewer works have addressed adaptation in distributed systems where changes to multiple processes need synchronization. Moreover, behavioral verification during adaptation in distributed systems that require changes to multiple processes has not been adequately addressed. In the next section, we discuss some of the issues in adaptation in distributed systems.

1.1 Adaptation in Distributed Systems

In distributed systems, multiple processes need to be changed during adaptation. In such adaptations, changes to multiple processes need to be synchronized, and interactions between changed and unchanged processes need to be controlled. We refer to the system before adaptation as the old program and to the system after adaptation as the new program. A process before it is modified is considered a part of the old program, and after modification it is considered a part of the new program. We now give the classification of adaptation in distributed systems.

1.1.1 Classification of Adaptation

We classify adaptation in distributed systems as: (i) overlap adaptation, when the old program and the new program overlap during adaptation, and (ii) non-overlap adaptation, when the old program and the new program are not present in the system simultaneously during adaptation.
Furthermore, we classify overlap adaptation into three main categories: (i) mixed-mode adaptation, (ii) quiescence adaptation, and (iii) parallel adaptation. In the case of quiescence adaptation, which is the most common approach for adaptation in distributed systems, no communication is allowed between the old program and the new program. Consequently, during adaptation, changed and unchanged processes exist in the system simultaneously, but the processes are modified in such a way that the changed process and the unchanged process do not communicate with each other. In contrast, in the case of mixed-mode adaptation, changed processes and unchanged processes are allowed to communicate with each other. In the case of parallel adaptation, each node has both the changed process and the unchanged process, but communication between changed processes and unchanged processes is not allowed; the changed (respectively, unchanged) process at one node can communicate with the changed (respectively, unchanged) process at another node.

To gain assurance in adaptation, formal specification and verification of adaptation is crucial. In the context of adaptive distributed systems, there are three aspects of verification: (i) verifying the system before adaptation, (ii) verifying the system during adaptation, and (iii) verifying the system after adaptation. While existing verification techniques can be used to verify the system before and after adaptation, such techniques cannot be applied directly to verify the system during adaptation. This is because during adaptation the system is changing, whereas existing work assumes that the system remains unchanged. Especially in the case of distributed systems, during adaptation the system exhibits overlapping behavior that is not well specified.

Numerous techniques have been proposed to address various issues in formalizing adaptation. A survey in [13] discusses various approaches based on graphs, process algebras, logic and other formalisms used to specify adaptive systems. Most of the approaches [1, 2, 4, 5, 7, 14-20] focus on design and implementation of adaptive systems. Other approaches [21-24] address the issue of verifying adaptation. The approaches in [21-23] focus on offline adaptation, whereas the approach in [24] focuses on online adaptation of a single-process system (which can also be extended to distributed systems that communicate via RPC). However, none of these approaches explicitly focuses on the behavior of the system during adaptation in distributed systems.

Furthermore, mixed-mode adaptation has not received much attention, as it is normally considered difficult. Most existing approaches avoid dealing with mixed-mode behavior during adaptation by employing non-overlap, quiescence or parallel adaptation. Quiescence adaptation behaves as if the adaptation is performed by changing all processes at the same logical time. There is overhead in performing quiescence adaptation, as a large number of messages are required to enforce synchrony among processes. In contrast, mixed-mode adaptation gives better performance in terms of service interruption time and communication overhead. Nevertheless, developing mixed-mode adaptation correctly involves many challenges.

In the rest of this chapter, we first discuss the advantages of mixed-mode adaptation in Section 1.1.2. Then, in Section 1.1.3, we identify challenges in adaptation that arise or are exaggerated due to the overlapping behavior of mixed-mode adaptation.
Finally, in Section 1.2, we discuss the contributions of this research.

1.1.2 Advantages of Mixed-Mode Adaptation

We expect mixed-mode adaptation to offer the following two main advantages compared to other types of adaptation:

• Reduced service interruption. Since individual processes need not block while waiting for other processes during adaptation, the service interruption time is reduced. This is especially important when adaptation occurs frequently (in response to changes in the environment).

• Low communication overhead. Since mixed-mode adaptation allows the old program and the new program to interact during adaptation, the synchronization required among processes during adaptation is reduced, thereby reducing the communication overhead. This is especially important in systems that operate with limited power or resources.

We validate these advantages in this dissertation using a case study.

1.1.3 Challenges in Mixed-Mode Adaptation

The challenges in mixed-mode adaptation arise because the behavior of the old program and the new program overlap during adaptation. These challenges also occur in other forms of adaptation; here we discuss them in the context of mixed-mode adaptation.

• Consistency. All updates to individual processes must ensure consistency of the whole system. In mixed-mode adaptation, some interactions between the old program and the new program during adaptation may not be acceptable. Only mixed-mode interactions that are acceptable should be allowed to occur. For example, if process X requires a service from process Y, then updating Y before X should be allowed only if the new process Y' can handle the service requests from the old process X (we use the notation Y' to denote process Y after it has been changed).

• State-transfer. To ensure proper mixed-mode operation, each process needs to preserve its state during adaptation. Specifically, if component C1 is replaced by component C2, then the state of component C1 at each process needs to be transferred to component C2. The efficiency of state-transfer mechanisms is particularly important in the case of mixed-mode adaptation compared to other forms of adaptation.

• Assurance. Providing assurance guarantees for mixed-mode adaptation requires a formal way to specify and verify the adaptation. In the case of mixed-mode adaptation, the changes cannot be specified in terms of system structure, because the behavior of the old program and the new program overlap during adaptation. The first challenge is in specifying the overlapping behavior. Next, there are challenges in verification and testing of mixed-mode adaptation due to the overlapping behavior.

• Reuse. There is a large amount of existing work on adaptation. An approach for mixed-mode adaptation should reuse existing adaptation techniques whenever possible.

1.2 Thesis

Based on the above motivation, in this dissertation we propose the following thesis:

Our goal is to develop an approach for assurance of adaptation that applies to both quiescence and mixed-mode adaptations. Lattice-based modeling helps in verifying and testing adaptive behavior in distributed systems.

To defend this, we make the following contributions in this dissertation:

1. Modeling and specification of adaptive behavior. We present the concept of an adaptation lattice to use for modeling and specifying adaptive behavior. The adaptation lattice approach identifies the atomic adaptations and the behaviors of the intermediate programs that occur during adaptation.
2. Verification of adaptive behavior. We present the concept of a transitional-invariant lattice to use for verifying the correctness of the system during adaptation. The transitional-invariant lattice approach ensures that safety is satisfied during adaptation and that adaptation eventually terminates.

3. Fault-tolerance during adaptation. We present the concept of a transitional-faultspan lattice to use for verifying the fault-tolerance properties of the system during adaptation. The approach can be used to verify different types of fault-tolerance during adaptation. Also, the faults considered during adaptation can be different from the faults that the system is subjected to before or after adaptation.

4. Tradeoffs in adaptation. In order to assist the adaptation developer, we identify tradeoffs in adaptation that are useful while designing adaptation in various contexts. We show how concurrency during adaptation leads to increased verification complexity. We also show that increased concurrency during adaptation can increase the communication overhead, which may not be desirable in certain systems.

5. Testing of adaptation. We use predicate detection techniques to test adaptation in distributed systems. We identify two classes of predicates for testing adaptation: (i) adaptation-transient, and (ii) adaptation-stable. We introduce the notion of an adaptation vector to identify states of the intermediate programs during adaptation.

6. Component family design. We describe the design of a component family to support adaptation. The component family design integrates aspects related to decision-making, adaptation logic, and component functionality, while maintaining a strict separation of different concerns. Specifically, the design separates the adapt-active parts of the component from the core functionality of the component. Moreover, the design separates the adaptation logic from the component functionality, thereby simplifying the task of verifying adaptation.

Organization of the dissertation. The remainder of this dissertation is organized as follows. We first discuss the background and related work in adaptation in Chapter 2. In Chapter 3, we describe the adaptation lattice approach to model adaptation and adaptive systems. In Chapter 4, we present the transitional-invariant lattice to verify adaptation in the absence of faults and discuss a case study based on a message communication application to demonstrate its use. In Chapter 5, we present the transitional-faultspan lattice to verify adaptation in the presence of faults and discuss a case study to demonstrate its use. In Chapter 6, we present a case study of mixed-mode adaptation where we show the performance advantage of mixed-mode adaptation. We also discuss how the adaptation lattice approach can be used to verify mixed-mode adaptation. In Chapter 7, we discuss tradeoffs in adaptation, including the effect of concurrency during adaptation. In Chapter 8, we describe the testing of adaptation using predicate detection techniques. In Chapter 9, we describe the component family design to build an adaptation-verified library of components. Finally, we discuss conclusions and future work in Chapter 10.


Chapter 2

Background and Related Work

Adaptation has been studied in various contexts, such as software change, reconfiguration, upgrade or update, change on the fly, program modification, software evolution, etc. In this chapter, we first give an overview of some taxonomies of software adaptation and discuss the areas in adaptation that are the focus of this dissertation. Then, we review some of the related work.
2.1 Adaptation Taxonomy

Several works [11-13, 25-27] have presented taxonomies of adaptation to categorize different adaptation mechanisms. In this section, we discuss some of them.

One of the early works in classifying adaptation was done by Lientz and Swanson [25] in the context of software maintenance. They proposed a typology that distinguishes between perfective, adaptive and corrective maintenance activities. This taxonomy was further refined by Chapin et al. [26], who classify software evolution and software maintenance into 12 different types: evaluative, consultive, training, updative, reformative, adaptive, performance, preventive, groomative, enhancive, corrective and reductive. In essence, the works in [25] and [26] categorize adaptation activities on the basis of their purpose (i.e., the why of software change). In our work, the reason why a software system needs to adapt is irrelevant.

In [12], Buckley et al. gave a taxonomy of adaptation that focuses on the how, when, where, and what of software change. This taxonomy is based on the characteristics of adaptation mechanisms and the factors that influence these mechanisms. The adaptation mechanisms refer to the software tools and algorithms used to perform the adaptation. However, this taxonomy does not consider the formalisms used in adaptation. Regardless, the taxonomy is very useful in identifying and classifying various adaptation scenarios, and in categorizing and comparing different adaptation mechanisms. In our work, we focus on dynamic adaptation, i.e., adaptation performed at run-time (the when of adaptation).

In [11], McKinley et al. classified adaptation into two categories, namely, parameter adaptation and compositional adaptation. Parameter adaptation modifies program variables that determine behavior. In contrast, compositional adaptation exchanges algorithmic or structural system components with others. Compositional adaptation provides more flexibility in supporting a variety of changes compared to simple tuning of program variables in the case of parameter adaptation. However, compositional adaptation involves considerably more challenges in design, implementation and verification. Although our approach in this dissertation focuses on compositional adaptation, it can also be applied in the case of parameter adaptation. Specifically, we consider addition, removal, or replacement of components [28] in this dissertation. A component implements part of the desired behavior of the program. Our notion of component is different from the popular usage of the term in the development community, where a component and a class (or object, an artifact of object-oriented programming) are considered one and the same. According to our definition, a component can consist of more than one class, and may even be deployed at multiple processes.

Dynamically changing software is challenging in terms of correctness, robustness, and efficiency. Formal specification and verification are important in order to gain assurance in adaptive software. A variety of formal specification languages have been developed to gain a better understanding of the foundations of software change. Bradbury et al. [13] give a survey of 14 formal specification approaches based on graphs, process algebras, logic, and other formalisms. Graph-based approaches [14-16] use graph rewriting rules to specify dynamism.
Approaches in [17, 18, 29, 30] use a variety of process algebras, such as the Calculus of Communicating Systems (CCS), Communicating Sequential Processes (CSP), and the π-calculus. Architectural Description Language (ADL) based approaches [19, 20, 31] model programs as components and connectors, and adaptation as reconfiguration of connections. Generally, these approaches have focused on specifying the design of adaptive software, and changes are specified in terms of the system structure. However, the approaches are inadequate for specifying adaptive behavior.

Approaches in [17, 18, 29, 30] are a few exceptions that have used process algebras to specify the behavior of adaptive programs. However, these approaches suffer from the following limitations: (i) the adaptation-specific behavior of the program is not distinctly separated from the non-adaptive behavior of the program, (ii) the approaches are appropriate for specifying changes in client-server applications, but it is not clear how they would apply to protocol changes in a group communication application, (iii) state-transfer is not specified explicitly, so it is not clear whether the new program behavior has to start in the initial state or can start in some arbitrary state, (iv) the approaches are inadequate for specifying mixed-mode behavior during adaptation, and (v) the approaches use specific types of formalisms that may potentially limit widespread use and any extensions to include different types of adaptations. In our work, we address formal specification and verification of the behavior of the system during adaptation.

2.2 Related Work

In this section, we briefly review some of the previous work in the context of verifying adaptation.

2.2.1 DYMOS

Lee presented one of the early systems to support dynamic updating, called DYMOS (Dynamic Modification System) [32]. It supports a single programmer modifying a module-based program dynamically (that is, without stopping its execution). In DYMOS, the programmer modifies and recompiles the source code of procedures and modules that need to be replaced. The programmer then requests the system to change the current core image to incorporate the new code and data. New object code is inserted by a dynamic modification process that is executed in parallel with other user processes.

DYMOS supports programs written in the StarMod language [33]. In the DYMOS environment, each updateable program is associated with a command interpreter, a source code management system, a StarMod compiler and a run-time environment. The system allows individual procedures of a program to be changed.

In the context of ensuring correctness, Lee gave a procedure for partitioning a change into a sequence of smaller changes. The decomposition is done in such a way that the program behaves "acceptably" after each change. The decomposition approach helps in specifying behavior between changes, where the program is operating with some reduced functionality. DYMOS does not address changes in distributed programs; however, it is important as one of the early systems that studied dynamic adaptation with particular concern for correctness. It gave some basic theoretical contributions in the context of correctness of adaptation.

2.2.2 CONIC

Conic [9, 34, 35] is a distributed programming system that supports dynamic reconfiguration of programs. A program in Conic consists of a number of processes that communicate with each other using well-defined entry and exit ports.
Conic modules (processes) do not communicate by naming each other but by naming ports. Thus, reconfiguration can be done by creating instances of modules and linking entry ports of these modules with exit ports of other instances.

The reconfiguration is done by providing a configuration change specification. The specification specifies the creation and deletion of modules and links. The configuration manager translates the change specification into commands to the operating system to execute the reconfiguration operation.

In [9], Kramer and Magee gave a formal basis for dynamic reconfiguration. They specify change as structural change, in terms of component creation/deletion and connection/disconnection. In their model, interactions between processes are considered as transactions. A transaction is an exchange of information between two nodes, initiated by one of the nodes. A node is said to be in a passive state if it is not currently engaged in a transaction that it initiated and it will not initiate new transactions. A node is quiescent if it is in a passive state, is not currently engaged in servicing a transaction, and no transactions have been or will be initiated by other nodes that require service from this node. It is claimed that a dynamic reconfiguration will leave the system in a "consistent" state if all the involved processes are quiescent at the time of reconfiguration.

The main limitation of using node quiescence for adaptation is that it leads to excessive blocking during reconfiguration. Moreover, a large number of messages are required to synchronize all nodes to reach the quiescent state. Furthermore, the local states of nodes are not considered in determining node quiescence. As a result, if the local states of nodes are not consistent at the time of adaptation (reconfiguration), then the new version of the program will start in an inconsistent state.

2.2.3 ARGUS

Argus [36] is a programming language for building reliable distributed applications. It is based on the CLU programming language and provides support for atomic transactions and crash recovery. An application in Argus consists of a set of servers called guardians, which communicate with each other using remote procedure calls.

Dynamic reconfiguration in Argus is described in [37]. Similar to Conic, the connections between guardians can be rerouted dynamically. Guardians are made quiescent before replacement. To reach a quiescent state, a guardian can either abort running transactions or wait for them to complete. Argus suffers from the same limitations as Conic. Additionally, the replacement system requires Argus crash recovery facilities in order to work properly, which may make it difficult to use in other systems.

2.2.4 Online Software Version Change

Gupta and Jalote [24, 38] presented a framework for modeling changes to running programs and used it to study the validity of an on-line change. In their notion of validity, a change is valid if, some time after the change, the process reaches a reachable state of the new program version. Thus, there is a "transition period" following a change, after which the system behaves like the new program. However, it is not clear what behavior is acceptable during the transition period.

In their work, they consider different programming language styles, including imperative languages without procedures, imperative languages with procedures, and object-oriented languages. Their work focuses on change in sequential programs.
They also discuss how their approach can be extended to distributed programs where only one process is changed. For the case where multiple processes are affected by a change, they consider a remote procedure call based model. They stop all processes before the change, which leads to disruption of service until the change is completed. Moreover, their work does not address the validity of an on-line change in a distributed system that uses an unrestricted message passing model.

2.2.5 Model-Based Development of Adaptive Software

Zhang et al. [39-41] proposed a model-based development approach for dynamically adaptive software. Their approach separates the model specifying the adaptive behavior from the model specifying the non-adaptive behavior. They use global invariants to specify properties that should be satisfied by adaptive programs regardless of adaptations. They enumerate different execution domains in which the program is required to execute, and build a state-based model in each domain. They enumerate possible adaptations of the program from one domain to another. Furthermore, they introduce A-LTL, an extension to linear temporal logic, to specify an adaptation from one program to another. They present three semantics of adaptation: one-point, guided and overlap. Similar to our goals, their work also addresses behavioral verification during adaptation. However, they do not consider general safety and liveness properties [42, 43] during adaptation. Their approach seems more suitable for quiescence adaptation, and its application to mixed-mode adaptation is not straightforward.

2.2.6 Others

Several other works that addressed runtime adaptation include Podus [44-46], Durra [47], Polylith [48] and Dynamic ML [49, 50]. Similar to the limitations of the works discussed earlier, these approaches also suffer from one or more of the following limitations: (i) they apply only to a single-process change, (ii) they apply only to distributed systems that communicate via RPC, but not to systems using an asynchronous message passing model, and (iii) they verify only structural changes and do not consider behavioral verification of the system during adaptation. Other surveys of adaptation approaches can be found in [11, 38, 51, 52].


Chapter 3

Modeling Adaptation

In this chapter, we introduce a formal model for adaptation in asynchronous programs. We first present an informal overview of how adaptation occurs in a distributed system in Section 3.1. Then, we use the ideas discussed in Section 3.1 to formalize the model of adaptation in Section 3.2. In Section 3.3, we give definitions used in formal reasoning about the correctness of adaptation. Finally, in Section 3.4, we describe the concrete representation of programs and adaptations using guarded commands.

3.1 Adaptation Overview

We consider compositional adaptation as one that adds, removes, or replaces a component during adaptation. A component implements a part of the desired behavior of the system. A component (formally defined later in this chapter) consists of one or more fractions, where each fraction is associated with one process in the system. For the discussion in this chapter, we assume that adaptation replaces a component; addition and removal can be considered as special cases of replacement. We refer to the component that gets replaced as the old component and the component that replaces the old component as the new component.
To replace an old component with a new component requires replacing each fraction of the old component with the corresponding fraction of the new component at all processes. For example, consider a protocol that provides encrypted communication between a sender and a receiver. Such a protocol consists of two types of fractions, namely, an encryption fraction at the sender that encrypts packets before sending, and a decryption fraction at the receiver that decrypts the encrypted packets received from the sender. To replace such a protocol, each fraction of the protocol needs to be replaced. Thus, adaptation in a distributed program involves multiple steps that are executed at various processes. We consider the replacement of a fraction at a single process as an atomic step of adaptation, and call it an atomic adaptation.

The old program, i.e., the program before adaptation, uses the old component, and the new program, i.e., the program after adaptation, uses the new component. An adaptation replaces the old component with the new component; equivalently, we can say that the adaptation replaces the old program being executed by the system with the new program. We assume that the old program and the new program are independently correct, i.e., by themselves they can execute and produce acceptable behavior. The goal of verifying adaptation is to ensure that: (i) the adaptation ends in a state from where the system satisfies the behavior of the new program, and (ii) the (overlapping) behavior during adaptation is acceptable (as defined by the specification during adaptation).

To verify the behavior during adaptation, we need to classify the states of the program during adaptation. The intermediate states that occur during adaptation are due to overlapping of the old program and the new program. The properties satisfied by these intermediate states may be different from those of the old program and the new program. Consequently, the behavior expected during adaptation needs to be specified separately from the old program and the new program. For example, in the case of the adaptation of the encrypted communication protocol discussed above, consider the system in which the adaptation has replaced the encryption fraction at the sender but has yet to replace the decryption fraction at the receiver. During adaptation, the sender may continue to send packets that are buffered at the receiver, or the sender may be blocked from sending more packets until the receiver has replaced the decryption fraction. Clearly, there are different possible behaviors during adaptation, and the expected behavior during adaptation needs to be specified separately from the behavior of the old program and the new program. Towards this end, we define the notion of an intermediate program.

Intermediate program. An intermediate program arises due to overlapping of the behavior of the old program and the new program. The first atomic adaptation modifies the old program into the first intermediate program. Similarly, other atomic adaptations modify one intermediate program into the next intermediate program. The last atomic adaptation results in the new program. The specification during adaptation identifies the requirements for these intermediate programs.

We now present the formal model for adaptation and adaptive systems.
3.2 Abstract Model of Adaptation

We model a process as an automaton A represented as a tuple (S, Σ, δ, S_0), where

• S(A) - a set of states
• Σ(A) - a set of actions
• δ(A) - a state-transition relation, where δ(A) ⊆ S(A) × Σ(A) × S(A)
• S_0(A) - a nonempty subset of S(A) known as the initial states

Each element (s, π, s') of δ(A) is known as a transition, where s, s' ∈ S(A) and π ∈ Σ(A). If A has a transition (s, π, s'), it means that π is enabled in state s and executing action π in state s will lead to state s'. A transition of the form (s, _, s') is an abbreviation for a transition whose source is s and target is s', where the action that caused the transition is not of interest.

A program consists of a set of process automata. We assume that the sets of actions of the automata are disjoint. We consider asynchronous or interleaved execution for a program, where at any time only a single process can execute its action. This approach can be viewed as a reduction of concurrency to non-determinism, where a concurrent execution gives rise to many possible corresponding interleaving orders. We could have used more complex automata such as [53-55] to model concurrency, but we adopt an interleaving model as it results in a simpler theory for specification and verification of adaptation.

3.2.1 Adaptation as a set of automata

We model an adaptation 𝒜 using a 5-tuple as follows:

• I - a set of automata
• P - the automaton of the old program, P ∈ I
• Q - the automaton of the new program, Q ∈ I
• Σ_a - a set of a special type of actions known as adaptive actions, with Σ_a ∩ (∪_{A∈I} Σ(A)) = ∅
• Smap - a state mapping, a partial function (∪_{A∈I} S(A)) × Σ_a → (∪_{A'∈I} S(A')) that satisfies the following two properties:

  i. ∀s, s', π_a, A_1, A_2 : s ∈ S(A_1), s' ∈ S(A_2), π_a ∈ Σ_a, A_1, A_2 ∈ I : ((s, π_a), s') ∈ Smap ⇒ A_1 ≠ A_2, and

  ii. ∀s_1, s_2, s'_1, s'_2, π_a, A, A_1, A_2 : s_1, s_2 ∈ S(A), s'_1 ∈ S(A_1), s'_2 ∈ S(A_2), π_a ∈ Σ_a, A, A_1, A_2 ∈ I : ((s_1, π_a), s'_1) ∈ Smap ∧ ((s_2, π_a), s'_2) ∈ Smap ⇒ A_1 = A_2

The old program, the new program, and all intermediate programs are modeled as automata. We also assume that the states of the automata in I are pairwise disjoint. Given an adaptive action, the state mapping defines an automaton and the states of that automaton in which the adaptive action can execute, and the resulting automaton and the state of the resulting automaton in which the adaptive action terminates. The state mapping function satisfies two properties. The first property ensures that executing an adaptive action results in a change of automaton (whereas executing an ordinary action results in a state of the same automaton). The second property states that if an adaptive action can be executed in different states of an automaton, then it will result in a unique automaton (the resulting states may be different, but they will be states of the same automaton).

Note that the state mapping is a partial function, as it may not be possible to perform the corresponding atomic adaptation in all states. Each element ((s, π_a), s') of Smap can be represented as a triplet (s, π_a, s'). Similar to the state-transition relation of an automaton, a state mapping Smap can be defined as a subset of (∪_{A∈I} S(A)) × Σ_a × (∪_{A'∈I} S(A')) with the restriction that if (s, π_a, s') ∈ Smap and (s, π_a, s'') ∈ Smap then s' = s''. Each element of Smap is known as an adaptive transition. If Smap has an adaptive transition (s, π_a, s'), where s ∈ S(A) and s' ∈ S(A'), it means that π_a is enabled in A and executing π_a in state s of A will lead to state s' of A'.
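To make the tuple definitions concrete, the following sketch shows one way the model and the two properties of Smap could be represented and checked. It is a minimal illustration assuming Python-style data structures; the names (Automaton, check_smap, etc.) are ours and not part of the dissertation's formalism.

```python
# Illustrative only: a minimal encoding of the abstract model, assuming that
# states of different automata are distinct hashable values.
from dataclasses import dataclass

@dataclass(frozen=True)
class Automaton:
    name: str
    states: frozenset        # S(A)
    actions: frozenset       # Sigma(A)
    delta: frozenset         # subset of S(A) x Sigma(A) x S(A)
    initial: frozenset       # S_0(A), a nonempty subset of S(A)

def check_smap(automata, smap):
    """Check properties (i) and (ii) of the state mapping Smap.

    automata: the set I; smap: dict mapping (state, adaptive action) -> state.
    """
    def owner(state):
        # State sets of the automata in I are pairwise disjoint, so each
        # state belongs to exactly one automaton.
        return next(a for a in automata if state in a.states)

    target_of = {}
    for (s, pi_a), s2 in smap.items():
        a1, a2 = owner(s), owner(s2)
        if a1 is a2:
            # (i): an adaptive action must change the automaton
            return False
        if target_of.setdefault((a1, pi_a), a2) is not a2:
            # (ii): same source automaton and adaptive action => unique target
            return False
    return True
```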
Note that the range of Smap is S(A') and not S_0(A'). In other words, we do not require an adaptive action to terminate in an initial state of the resulting automaton.

Now, given the state mapping of adaptation 𝒜, we can define an automata-transformation (partial) function δ_a : I × Σ_a → I. We have ((A, π_a), A') ∈ δ_a iff ∃s, s' : s ∈ S(A), s' ∈ S(A') : (s, π_a, s') ∈ Smap. Each element ((A, π_a), A') (equivalently, (A, π_a, A')) of δ_a is known as an atomic adaptation. Thus, each atomic adaptation is modeled as transforming one automaton to another automaton.

The automata-transformation function represents an adaptation lattice, defined as follows:

Adaptation Lattice. An adaptation lattice (cf. Fig. 3.1) is a finite directed acyclic graph in which each node is labeled with an automaton and each edge is labeled with an atomic adaptation, such that:

1. There is a single start node P having no incoming edges. The start node is associated with the automaton representing the old program. The automata-transformation function (correspondingly, Smap) satisfies the following condition:
∀A, π_a :: (A, π_a, P) ∉ δ_a

2. There is a single end node Q having no outgoing edges. The end node is associated with the automaton representing the new program. The automata-transformation function (correspondingly, Smap) satisfies the following condition:
∀A, π_a :: (Q, π_a, A) ∉ δ_a

3. Each intermediate node R has at least one incoming edge and at least one outgoing edge. It is associated with the automaton representing an intermediate program. The automata-transformation function (correspondingly, Smap) satisfies the following condition:
∀A : A ≠ P : (∃A', π_a :: (A', π_a, A) ∈ δ_a) ∧ ∀A : A ≠ Q : (∃A', π_a :: (A, π_a, A') ∈ δ_a)

A path in the lattice from the start node to the end node is called an adaptation path.

Figure 3.1: An example of an adaptation lattice.

3.2.2 Adaptation as an automaton

In the previous subsection, we defined adaptation as a 5-tuple. An adaptation 𝒜 can also be viewed as an automaton, defined as follows:

• S(𝒜) = ∪_{A∈I} {(A, s) | s ∈ S(A)}
• Σ(𝒜) = ∪_{A∈I} {(A, π) | π ∈ Σ(A)} ∪ Σ_a
• δ(𝒜) = ∪_{A∈I} {((A, s), (A, π), (A, s')) | (s, π, s') ∈ δ(A)} ∪ {((A, s), π_a, (A', s')) | (s, π_a, s') ∈ Smap}
• S_0(𝒜) ⊆ {(P, s) | s ∈ S(P)}

In the definition of S_0(𝒜) we use S(P), and not S_0(P), because adaptation should be able to start at any point in the execution of P.

Modeling adaptation as an automaton allows us to verify some general properties of adaptation not concerning any overlapping behavior. On the other hand, modeling adaptation as a set of automata is important to identify individual intermediate automata during adaptation, so as to verify properties due to the overlapping behavior of the old program and the new program.

3.3 Adaptation Specification

In this section, we give some formal definitions used in specifying and verifying adaptive programs. We adapt these definitions from Arora and Kulkarni [56, 57].

Definition (State predicate). A state predicate X of A is any subset of S(A). We say X is true in state s if s ∈ X.

Definition (Closure). A state predicate X is closed in a set of transitions δ iff the following condition holds:
∀s, s', π :: ((s, π, s') ∈ δ) ⇒ (s ∈ X ⇒ s' ∈ X)
X is closed in A (respectively, 𝒜) iff it is closed in δ(A) (respectively, δ(𝒜)).

Definition (Computation). A computation of program A (respectively, adaptation 𝒜) is a sequence of states σ = ⟨s_0, s_1, ...⟩
satisfying the following conditions:

• For the first state s_0 in σ, s_0 ∈ S_0(A) (respectively, S_0(𝒜)).
• If σ is infinite, then ∀j : j > 0 : (∃π :: (s_{j-1}, π, s_j) ∈ δ(A)) (respectively, δ(𝒜)).
• If σ is finite and terminates in state s_l, then for all π, there does not exist a state s such that (s_l, π, s) ∈ δ(A) (respectively, δ(𝒜)), and ∀j : 0 < j ≤ |σ| : (∃π :: (s_{j-1}, π, s_j) ∈ δ(A)) (respectively, δ(𝒜)).

Definition (Specification). A specification of A is a set of computations. Given a specification, a computation in the specification is known as an acceptable computation. Following Alpern and Schneider [42], a specification can be decomposed into a safety specification and a liveness specification. As shown in [58], for a rich class of specifications, the safety specification can be represented as a set of bad transitions that must not occur in program computations.

Definition (Satisfies). A satisfies a specification iff each computation of A is in the specification. A satisfies a specification from X iff (i) X is closed in A, and (ii) each computation of A is in the specification and starts from a state where X is true (i.e., S_0(A) ⊆ X).

Definition (Invariant). The state predicate X of A is an invariant iff A satisfies the specification from X. Note that, if X is an invariant of A, then X ⊇ S_0(A). Informally speaking, the invariant predicate includes the set of all states reached in the "acceptable" (correct) computations of A. Note that the invariant predicate may include states that are not reachable in all computations of the program. However, computations from those states satisfy the specification and, hence, those states may be valuable in adding recovery transitions to provide fault tolerance [58].

Definition (Safety during adaptation). Similar to the specification of A, the safety specification during adaptation 𝒜 is specified as a set of bad transitions that must not occur in computations of adaptation 𝒜.

Liveness during adaptation. We argue that the specification during adaptation should be a safety specification. This is due to the fact that one often wants the adaptation to be completed as quickly as possible. Hence, it is desirable not to delay adaptation in order to satisfy a liveness specification during adaptation. Rather, it is desirable to guarantee that, after adaptation, the program reaches states from where its (new) safety and liveness specifications are satisfied. Thus, the implicit liveness specification during adaptation is that the adaptation completes. In other words, the liveness specification during adaptation is that each intermediate program eventually executes its adaptive action. For these reasons, we have omitted the representation of the liveness specification of the program.

3.3.1 Fault-tolerance

In this subsection, we give formal definitions for specifying and verifying fault-tolerance properties of adaptive programs. These definitions are also adapted from Arora and Kulkarni [56, 57].

Definition (Fault class). Let Σ_f be a set of fault actions. A fault class F(A) for program A is a subset of the set S(A) × Σ_f × S(A). We use A[]F to denote the transitions obtained by taking the union of the transitions in δ(A) and the transitions in F(A). A fault class F(𝒜) for adaptation 𝒜 is:
∪_{A∈I} {((A, s), π_f, (A, s')) | (s, π_f, s') ∈ F(A)}

Definition (Fault-span). A state predicate T is a fault-span (F-span) of A from invariant S iff (i) S ⊆ T, and (ii) T is closed in A[]F.
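Because the closure and fault-span conditions above are purely set-theoretic, they can be checked directly on an explicit transition relation. The sketch below is a minimal illustration under our own encoding (a transition abbreviated to a (source, target) pair, actions elided); the function names and the tiny example are ours, not the dissertation's.

```python
# Illustrative only: closure and fault-span checks over explicit transitions.
def is_closed(X, transitions):
    """X is closed in a set of transitions iff no transition leaves X."""
    return all(t in X for (s, t) in transitions if s in X)

def is_fault_span(T, S, delta, faults):
    """T is an F-span of the program from invariant S iff
    (i) S is a subset of T, and (ii) T is closed in A[]F (program + faults)."""
    return S <= T and is_closed(T, delta | faults)

# Tiny hypothetical example: the program stays within the invariant {0, 1};
# a fault may perturb the state to 2, from where the program recovers to 0.
delta = {(0, 1), (1, 0), (2, 0)}
faults = {(1, 2)}
S, T = {0, 1}, {0, 1, 2}
assert is_closed(S, delta) and is_fault_span(T, S, delta, faults)
```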
A fault-span of a program includes the set of states that the program can reach in the presence of faults, and it is closed under the execution of program and fault actions.

Definition (Computation in presence of faults). A computation of program A (respectively, adaptation 𝒜) in the presence of faults is a sequence of states σ = ⟨s_0, s_1, ...⟩ satisfying the following conditions:

• For the first state s_0 in σ, s_0 ∈ S_0(A) (respectively, S_0(𝒜)).
• If σ is infinite, then ∀j : j > 0 : (∃π :: (s_{j-1}, π, s_j) ∈ δ(A) ∪ F(A)) (respectively, δ(𝒜) ∪ F(𝒜)).
• If σ is finite and terminates in state s_l, then for all π, there does not exist a state s such that (s_l, π, s) ∈ δ(A) (respectively, δ(𝒜)), and ∀j : 0 < j ≤ |σ| : (∃π :: (s_{j-1}, π, s_j) ∈ δ(A) ∪ F(A)) (respectively, δ(𝒜) ∪ F(𝒜)).
• If σ is infinite, then ∃n : n ≥ 0 : (∀j : j > n : (∃π :: (s_{j-1}, π, s_j) ∈ δ(A))) (respectively, δ(𝒜)).

The first requirement captures that the computation begins in an initial state of the program (respectively, adaptation). The second requirement captures that in each step, either a program (respectively, program or adaptive) transition or a fault transition is executed. The third requirement captures that faults do not have to execute, i.e., if the program reaches a state where only a fault transition can be executed, then the fault transition need not be executed. Finally, the fourth requirement captures that the number of fault occurrences in the computation is finite. This requirement is the same as that made in previous work [59-62] to ensure that eventually recovery can occur.

Definition (Fault-tolerance (F-tolerant)). A is F-tolerant for specification spec from S iff the following two conditions hold: (i) A satisfies spec from S, and (ii) there exists T such that T is an F-span of A from S, and every computation of A[]F starting in a state where T is true satisfies spec.

Remark 1. Henceforth, whenever the invariant S, the program A, and the specification spec are clear from the context, we will omit them; thus, "T is an F-span of A from S for spec" abbreviates to "T is an F-span".

Remark 2. Different types of tolerance specifications that normally occur in practice, namely, masking, fail-safe, and non-masking tolerance, have been considered in [56-58]. In this dissertation, we assume masking fault-tolerance unless specified otherwise. The definitions can easily be extended to consider fail-safe and non-masking tolerance during adaptation.

3.4 Concrete Representation

In this section, we discuss the programming notation we use to describe the system. For brevity, we express programs using guarded commands [63, 64]. This gives a compact representation of the program defined as automata in Section 3.2 (in terms of state space and transitions). Translating the guarded command representation of a program to its automata representation is straightforward, as we discuss in this section.

Furthermore, the guarded command representation is closely related to a concrete implementation. Specifically, techniques for obtaining a guarded command representation from a program written in a general purpose language, such as C, are discussed in [65]. Also, techniques for transforming a program in guarded commands into a program in general purpose languages are discussed in [66-68].

3.4.1 Program

A program P is specified by a finite set of processes and channels. A process p is specified by a set of variables and a finite set of actions.
The processes in a program communicate with one another by sending and receiving messages over unbounded channels that connect the processes. A channel from process p to process q is denoted by a channel variable C_{p,q}, which is an unbounded queue. Only process p can append an item of data to the rear of the queue C_{p,q}, and only process q can delete an item at the head of the queue C_{p,q}.

Each variable has a predefined nonempty domain. A state of a process is obtained by assigning each variable a value from its respective domain. The state of the channel connecting p and q is given by the value of the queue C_{p,q}. The state of the program is given by the states of all the processes and channels. The state space of the program P, S(P), is the set of all possible states of P. We use s(x) to denote the value of variable x in state s, and V(p) to denote the set of variables of process p. A state predicate of P is a boolean expression over process and channel variables. Note that a state predicate may be characterized by the set of all states in which its boolean expression is true and, therefore, is a subset of the state space of the program.

Action. An action of p is uniquely identified by a name, and is of the form

    ⟨name⟩ : ⟨guard⟩ → ⟨statement⟩

The guard of each action is a state predicate of P. The statement of each action is such that its execution updates zero or more process or channel variables. The sending of a message from p to q is modeled by appending the message to the tail of the queue C_{p,q}. The receipt of a message from p by q is modeled by removing a message from the head of the queue C_{p,q}. The set of actions of the program P, Σ(P), is given by the set of names of all the actions of all the processes of P. Each action of p defines the set of transitions of the form (s, π, s′) such that the guard of action π is true in state s and execution of the statement of π in s results in state s′. Thus, the state-transition relation δ(P) is obtained from the set of actions of all the processes of P. We say that an action of p is enabled in a state of p iff its guard evaluates to true in that state.

3.4.2 Component

A component is specified by a finite set of fractions that are involved in providing a common functionality. Intuitively, a component implements a part of the desired behavior of the system, such as an algorithm or a protocol. A component fraction is specified by a set of variables and a finite set of actions that are associated with a single process. A component (respectively, fraction) is syntactically the same as a program (respectively, process), with the only difference that some variables of the component are designated as input, whose values are supplied by the program with which it is composed. The composition of the component and the program is the union of the variables and actions of the component and the program.

3.4.3 Adaptive action

An adaptive action is a special type of action, identified by a unique name, of the form

    ⟨name⟩ : ⟨guard⟩ → TransformTo(p′, Φ)

If the adaptive action is an action of process p, then when the statement of the adaptive action is executed, p is replaced by p′ and the state mapping Φ is used to initialize the variables of p′. Each adaptive action π_a defines a set of adaptive transitions of the form (s, π_a, s′) such that the guard of π_a is true in state s of process p and execution of the statement of π_a results in state s′ = Φ(s) of process p′. The state mapping function Smap(𝒜) is obtained from the set of all adaptive actions.
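Under this model, an action is simply a (guard, statement) pair over the program state, and an adaptive action additionally applies a state mapping Φ. A minimal interpreter sketch, again with assumed names (Python), is:

    import random

    def step(state, actions):
        # One interleaving step: execute a nondeterministically chosen
        # enabled action; return False if no action is enabled (termination).
        enabled = [stmt for (guard, stmt) in actions if guard(state)]
        if not enabled:
            return False
        random.choice(enabled)(state)
        return True

    def transform_to(state, phi, new_actions):
        # Adaptive action: install the new process's actions and initialize
        # its variables via the state mapping phi.
        state.update(phi(state))
        return new_actions

    # Example: a counter process with one action, then an adaptive step
    # whose mapping carries x over into a new variable y.
    state = {'x': 0}
    actions = [(lambda s: s['x'] < 3, lambda s: s.update(x=s['x'] + 1))]
    while step(state, actions):
        pass                                   # runs until no guard holds
    actions = transform_to(state, lambda s: {'y': s['x']}, [])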
From a modeling perspective, we consider that the adaptive action replaces the entire process, even if only a small part of it actually changes. In an actual implementation, the adaptive action can be performed in various ways, such as by blocking the execution of some method, or by loading/unloading some class. For verification, however, we need to consider only the effect of the adaptive action. Additionally, treating each adaptive action as a generic form of process replacement gives the developer the freedom to implement the adaptive action based on the platform and the language used.

3.4.4 State mapping

We define the following classes of state mapping Φ that occur during atomic adaptation:

• Identity mapping. In identity mapping, the names and the values of the variables remain the same. Formally, V(p) ⊆ V(p′) and, for all s, (Φ(s))(y) = s(y).

• Quasi mapping. In quasi mapping, the name of a variable of the new process differs from that of the old process, though its value is the same as the value of some equivalent variable in the old process state. Formally, for a variable y of V(p′), there exists a variable x of V(p) such that, for all s, (Φ(s))(y) = s(x).

• Initial mapping. In initial mapping, a variable y of V(p′) is assigned a fixed initial value from its domain. Formally, for all s, (Φ(s))(y) = y_d, where y_d ∈ D(y) and D(y) denotes the domain of variable y.

• Mixed mapping. Most mappings that occur in practice are mixed mappings, in which the variables of the new process V(p′) are divided into disjoint sets and one of the above mappings is associated with each set.

Notation. We use "." to denote the belongs-to relation. For example, if variable v belongs to process p, it is denoted by p.v, and action a of process p is denoted by p.a. A process p of program P is denoted by P.p, and a fraction i of component C is denoted by C.i. For brevity, we avoid the belongs-to relation when it is obvious from the context.

Chapter 4

Verifying Adaptation in Absence of Faults

In this chapter, we introduce the notions of transitional-invariant and transitional-invariant lattice to verify the correctness of adaptation. We first define transitional-invariant in Section 4.1. Next, in Section 4.2, we define transitional-invariant lattice and give a theorem for proving correctness of adaptation. In Section 4.3, we present a case study of adaptation in a message communication application to demonstrate the use of a transitional-invariant lattice. Finally, we discuss some of the questions raised by this work in Section 4.4.

4.1 Transitional-Invariant

As discussed in Chapter 3, the program during adaptation consists of actions of the old program and actions of the new program. Therefore, we consider intermediate programs obtained after one or more atomic adaptations. Similar to the invariants that are used to
The start node P is associated with an invariant Sp of the program before adaptation. 2. Safety of new program. The end node Q is associated with an invariant SQ of the program after adaptation. 3. Safety of intermediate program. Each intermediate node R is associated with a predicate TS R that is a transitional-invariant for any intermediate program at R (i. e., an intermediate program obtained by performing adaptations from the entry node to R). Furthermore, any intermediate program at R satisfies the (safety) specification during adaptation fiom TS R- 36 for are 5 durii 11111 8&1 n and 51.11 4. Safety of adaptive action. If a node labeled R,- has an outgoing edge labeled a to a node labeled Rj, then for all adaptive transitions (5, a, s’) in Smap where TS Rt is true in state 3, TSR]. is true in state 3’. In other words, Vs, s’ : (s, a, s’) E Smap : s E TS R,- => 3’ E TS Rj‘ Furthermore, all the adaptive transitions (3, a, s’) satisfy the safety specification during adaptation. 5. Progress of adaptation. If a node labeled R has outgoing edges labeled a1, a2, ..., ak to nodes labeled R1, R2, ..., Rk, respectively, then in all computations of adaptation there exists a transition (3, s') such that for some 2' : 1 g 2'. g k : (s,a~,s’) E Sma . Furthermore, ‘v’s : s E TS :(Va,s’ : a E Ea — a ,...,a : 2 P R 1 k (3,11, 8') 6! Smapl- Correctness of adaptation. Intuitively, an adaptation is correct if the following conditions are satisfied: If the adaptation begins in a legitimate state of the old program, then safety during adaptation is met and the resulting state of the new program is legitimate. With this intuition, if adaptation begins in a state where invariant of the old program is true, then we say that adaptation is correct if: 0 Adaptation terminates in a state where invariant of the new program is true 0 During adaptation safety specification during adaptation is satisfied 0 Eventually adaptation terminates The following theorem states that finding a transitional-invariant lattice is necessary and sufficient for proving correctness of adaptation. 37 im' lsfie dun Iallw the : Theorem 4.1. Given Sp as the invariant of the program before adaptation and SQ as the invariant of the program after adaptation, the adaptation from P to Q is correct if and only if there is a transitional-invariant lattice for the adaptation with the start node associated with Sp and the end node associated with SQ. Proof (=>) If the transitional-invariant lattice exists, then adaptation is correct. If the stated conditions are satisfied, then the specification of the old program is sat- isfied when the adaptation starts. Also, the existence of the transitional-invariant lattice during adaptation ensures that for each intermediate program that occurs during adap- tation, the specification during adaptation is satisfied. Moreover, from the definition of the transitional-invariant lattice, each adaptive action satisfies safety specification during adaptation. Also, in each intermediate program eventually some adaptive action will be executed, which ensures the liveness of adaptation. Furthermore, the last adaptive action terminates in the invariant of the new program, from where the system satisfies the behav- ior of the new program. Thus, the existence of the transitional-invariant lattice proves the correctness of adaptation. (<2) If adaptation is correct, then transitional-invariant lattice exists (proof by construction). Let adaptation consist of n adaptive actions a1, ..., an. 
Consider all adaptive actions that can occur in a state of the old program. Since the adaptation is correct, each of these adaptive actions occurs in a state of the old program where S_P holds, and execution of these adaptive actions satisfies safety during adaptation.

Now, consider the intermediate program I1 reached after execution of a1. In all computations of the old program up to the execution of a1, the invariant S_P is satisfied, since the old program is correct. Since the adaptation is correct, the intermediate program I1 satisfies the specification during adaptation (otherwise, a1 would not be permitted in a state of the old program). Once a1 is executed, we consider all the computations of the intermediate program I1 and identify the transitional-invariant TS_{I1} associated with it. Similarly, we consider all computations of the intermediate program reached after the execution of some adaptive action other than a1 in a state of the old program, and find the transitional-invariant corresponding to that intermediate program. In this way, we construct the first level of intermediate programs and corresponding transitional-invariants starting from the old program.

Next, for each intermediate program at the first level, we consider all possible adaptive actions that can occur in some state of its computations. We can then identify the transitional-invariants at the second level in the lattice by considering all the computations of the intermediate programs reached by executing adaptive actions in states of the corresponding first-level intermediate programs. Continuing in this way, we find transitional-invariants at the various levels in the lattice. Since the adaptation is correct, the atomic adaptation in each intermediate program at level n−1 results in a state of the new program where S_Q holds. Thus, the correctness of the adaptation proves the existence of the transitional-invariant lattice. □

4.3 Case Study: Reliable Message Communication

In this section, we present an example that illustrates how the transitional-invariant lattice can be used to verify correctness of adaptation in the context of a simple message communication program. The communication program we consider is an abstraction of the communication aspect of applications such as video conferencing, audio streaming, or any distributed application where messages are transferred over a wired or wireless channel. We first describe the fault-intolerant message communication program in Section 4.3.1. Then, we describe the FEC-based proactive component in Section 4.3.2. Next, in Section 4.3.3, we discuss the adaptation of adding the proactive component to the fault-intolerant message communication program, and we identify the transitional-invariant lattice for the adaptation. In Chapter 5, we continue with the message communication program to discuss the adaptation of replacing the proactive component with the acknowledgment-based reactive component.

4.3.1 Fault-Intolerant Communication Program

Specification of the communication program. An infinite queue of messages at sender process s is to be sent to two receiver processes r1 and r2 via two unicast channels and copied into corresponding infinite queues at the receivers. Faults may cause loss of messages in the channels. The message communication program is shown in Figure 4.1. Only the send and receive actions of the program are shown, since only those actions are considered for adaptation. Processes s, r1, and r2 maintain queues sQ, r1.rQ, and r2.rQ, respectively.
Only send and receive actions of the program are shown, since only those actions are considered for adaptation. Processes 3, r1, and r2 maintain queues sQ, r1.rQ, and r2.rQ respectively. sQ con- 40 program 7pintol process 3 var sQ : queue of integer m : integer begin send : pisEmpty(sQ) —+ m :2 head(sQ); CS,TerS,T2 3: Cs,r1 0 m, 08,7‘2 0 777. end process ri[i = 1,2] var rQ : queue of integer m :integer begin receive : -dsEmpty(Cs,rz-) ——+ m := head(C3,rz-); rQ :2 'rQ o m end Figure 4.]: Message communication program (fault-intolerant version). tains messages that 3 needs to send to 7'1 and r2. The messages received by r,- from s are stored in 7310. Let mQ be the queue of all messages to be sent. (mQ is an auxiliary vari- able that is used only for the proof.) Initially, sQ = 77162. The function head(sQ) returns the message at the front of 3Q, and head(sQ, It) returns k messages from the front of 362. The notation 362 o d denotes the concatenation of SC) and (d). Invariant. The invariant of the communication program is Sp = S 1 /\ S2, where 51 2 Vi : (m,- E r1.rQ V m,- E r2.rQ) => mi E mQ, and 5'2 2 Vi : m,- E mQ => (m,- E sQ 2 ((m2 E C3,?1 X m,- E r1.rQ) /\ (m,- E C5”? Y m,- E r2.rQ))). In the above invariant, S 1 indicates that messages received by the receivers are sent by the sender. S2 indicates that a message m, is not yet sent by the sender, or it is in the channel, or it is already received by the receiver, all exclusive. 41 Notation. The s bol X denotes exact! one o erator, i. e., a: )1 X 2 im lies exact] one of ym y P y P Y x, y and z is true. 4.3.2 Proactive Component The proactive component sends extra messages to the receiver, which the receiver can use to recover from the lost messages. It consists of two types of fractions: encoder and decoder. The encoder fraction is added at the sender process and the decoder fraction is added at the receiver process. The encoder takes (n — k) data packets and encodes them to add I: parity packets. It then sends the group of 71. (data and parity) packets. The decoder needs to receive at least (it — k) packets of a group to decode all the data packets. This component provides tolerance to certain message loss faults (discussed in Chapter 5). Figure 4.2 shows the abstract version of the proactive component. The encoder and the decoder fractions of the component are shown. The encoder fraction consists of two actions: encode and f ec-send. The decoder consists of two actions: decode and fec-receive. These fractions are composed with the process that will use them. The composition of a fi'action and a process is done by union, which is equivalent to com- bining the actions of the fraction and the process. We assume that appropriate renaming is performed so that there are no inconsistencies in the definitions of the variables of the fractions and the processes. The message communication program composed with the proactive component is shown in Figure 4.3. Specification of program using the proactive component. The program using the proac- tive component satisfies the same specification as the communication program. 42 Component fec Fraction encoder inp sQ : queue of integer 7‘1,7‘2 var n, k, u, l, m : integer {initially, it = l = m = 0} ean : array [integer, 0..n — 1] of integer {initially, ean = .L} begin encode : true —> ean[u, 0..n — 1] z: fec_encode(head(sQ, n —— k)); u := u + 1 [] fec_send : ean[l,m] 74 J. 
                  → C_{s,r1} := C_{s,r1} ∘ {l, m, ean[l, m]};
                    C_{s,r2} := C_{s,r2} ∘ {l, m, ean[l, m]};
                    m := (m + 1) mod n;
                    if m = 0 then l := l + 1 fi
  end
  Fraction decoder_i
  inp rQ : queue of integer
      s
  var n, k, x, y, p, m : integer {initially, p = 0}
      rbufQ : array [integer, 0..n−1] of integer {initially, rbufQ = ⊥}
  begin
    fec_receive : ¬isEmpty(C_{s,ri}) → x, y, m := head(C_{s,ri});
                                       rbufQ[x, y] := m
    [] decode : count(rbufQ[p, 0..n−1] ≠ ⊥) ≥ (n − k)
                  → rQ := rQ ∘ fec_decode(rbufQ[p, 0..n−1]); p := p + 1
  end

Figure 4.2: Proactive component.

Invariant. The invariant of the program using the proactive component is S_Q = S1 ∧ S_F, where

S_F = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕
        ((m_i ∈ r1.rQ ⊕ m_i ∈ data(ean ∪ C_{s,r1} ∪ r1.rbufQ)) ∧
         (m_i ∈ r2.rQ ⊕ m_i ∈ data(ean ∪ C_{s,r2} ∪ r2.rbufQ)))).

Program P_fec
  process s
  var : P_intol.s.var ∪ fec.encoder.var
  begin
    fec.encoder.encode
    [] fec.encoder.fec_send
  end
  process r_i [i = 1, 2]
  var : P_intol.r_i.var ∪ fec.decoder_i.var
  begin
    fec.decoder_i.fec_receive
    [] fec.decoder_i.decode
  end

Figure 4.3: Message communication program (with proactive component).

We use the notation m_i ∈ data(ean ∪ C_{s,r1} ∪ r1.rbufQ) to mean that message m_i can be generated from the data in {ean ∪ C_{s,r1} ∪ r1.rbufQ}. In the above invariant, S_F indicates that each message is either at the sender, or already received by the receiver, or can be generated from the data in the channel and the buffers at the sender and the receiver.

4.3.3 Adaptation: Addition of the Proactive Component

The adaptation of adding the proactive component converts the program shown in Figure 4.1 into the one shown in Figure 4.3. We first require an adapt-ready version of the program P_intol, as shown in Figure 4.4. (An adapt-ready program is one that is composed with adaptive actions.) We now give the specification during adaptation for adding the proactive component.

Specification during adaptation. The specification during adaptation is that S1 continues to be true during adaptation.

program P_intol (adapt-ready)
  process s
  var : P_intol.s.var
  begin
    P_intol.s.send
    [] a1 : true → TransformTo(P_a-ip1.s, Φ_a1);
  end
  process r_i [i = 1, 2]
  var rQ : queue of integer
  begin
    P_intol.r_i.receive
    [] a_(i+1) : a1 ∧ isEmpty(C_{s,ri}) → TransformTo(P_fec.r_i, Φ_a(i+1));
  end

Figure 4.4: Message communication program (fault-intolerant version, adapt-ready).

We describe the adaptation by identifying the intermediate programs and the corresponding transitional-invariants after each atomic adaptation.

program P_a-ip1
  process s
  var : P_intol.s.var
  begin
    a4 : a2 ∧ a3 → TransformTo(P_fec.s, Φ_a4);
  end
  process r_i [i = 1, 2] : same as in Fig. 4.4

Figure 4.5: Intermediate program P_a-ip1.

The execution of adaptive action a1 in P_intol results in the intermediate program P_a-ip1 shown in Figure 4.5. P_a-ip1 does not send any packets, but the packets already in the channels can still be received by the receivers r1 and r2. In the execution of P_a-ip1, eventually all the packets in the channels are read and no new packets are added to the channels from the sender to the receivers. Thus, the guards of the adaptive actions a2 and a3 eventually become enabled. The transitional-invariant of P_a-ip1 is TS1 = S1 ∧ S2, where S1 and S2 are as defined in Section 4.3.1.

program P_a-ip2
  process s : same as in Fig. 4.5
  process r1 : same as in Fig. 4.3
  process r2 : same as in Fig. 4.4

Figure 4.6: Intermediate program P_a-ip2.

Since a2 and a3 occur independently, we consider both possible orderings among them.
The execution of adaptive action a2 in P_a-ip1 results in the intermediate program P_a-ip2 shown in Figure 4.6. In P_a-ip2, receiver r1 has replaced its fraction, whereas receiver r2 has not yet replaced its fraction and can receive any remaining packets in the channel from s to r2. Eventually, in the execution of P_a-ip2, the guard of adaptive action a3 becomes enabled and a3 is executed, resulting in the intermediate program P_a-ip4. The transitional-invariant of P_a-ip2 is TS2 = S1 ∧ S3, where

S3 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ ((m_i ∈ r1.rQ) ∧ (m_i ∈ C_{s,r2} ⊕ m_i ∈ r2.rQ)))
     ∧ isEmpty(C_{s,r1}) = true ∧ r1.rbufQ = ⊥ ∧ r1.p = 0.

program P_a-ip3
  process s : same as in Fig. 4.5
  process r1 : same as in Fig. 4.4
  process r2 : same as in Fig. 4.3

Figure 4.7: Intermediate program P_a-ip3.

The execution of adaptive action a3 in P_a-ip1 results in the intermediate program P_a-ip3 shown in Figure 4.7. In P_a-ip3, receiver r2 has replaced its fraction, whereas receiver r1 has not yet replaced its fraction and can receive any remaining packets in the channel from s to r1. Eventually, in the execution of P_a-ip3, the guard of adaptive action a2 becomes enabled and a2 is executed, resulting in the intermediate program P_a-ip4. The transitional-invariant of P_a-ip3 is TS3 = S1 ∧ S4, where

S4 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ ((m_i ∈ r2.rQ) ∧ (m_i ∈ C_{s,r1} ⊕ m_i ∈ r1.rQ)))
     ∧ isEmpty(C_{s,r2}) = true ∧ r2.rbufQ = ⊥ ∧ r2.p = 0.

program P_a-ip4
  process s : same as in Fig. 4.5
  process r_i [i = 1, 2] : same as in Fig. 4.3

Figure 4.8: Intermediate program P_a-ip4.

In the intermediate program P_a-ip4 shown in Figure 4.8, only adaptive action a4 is enabled, and execution of a4 results in the new program P_fec. The transitional-invariant of P_a-ip4 is TS4 = S1 ∧ S5, where

S5 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ (m_i ∈ r1.rQ ∧ m_i ∈ r2.rQ))
     ∧ isEmpty(C_{s,r1}) = true ∧ r1.rbufQ = ⊥ ∧ r1.p = 0
     ∧ isEmpty(C_{s,r2}) = true ∧ r2.rbufQ = ⊥ ∧ r2.p = 0.

We now give the state mappings for the adaptive actions, which are used to initialize the state of the new fraction at each process.

State mapping. The state mapping for each adaptive action is shown in Table 4.1. Each adaptive action initializes the state of the new process based on this mapping when it is executed.

Mapping Function | Process Affected | New State
Φ_a1 | s  | identity mapping
Φ_a2 | r1 | {rQ, s} - identity mapping; {n, k, x, y, p, m, rbufQ} - initial mapping
Φ_a3 | r2 | {rQ, s} - identity mapping; {n, k, x, y, p, m, rbufQ} - initial mapping
Φ_a4 | s  | {sQ, r1, r2} - identity mapping; {n, k, u, l, m, ean} - initial mapping

Table 4.1: State mappings for the adaptation.

Based on the description of the adaptation in this section, we obtain the transitional-invariant lattice shown in Figure 4.9 for the adaptation of adding the proactive component.

Figure 4.9: Adaptation lattice for addition of the proactive component (P_intol : S_P → P_a-ip1 : TS1 → {P_a-ip2 : TS2, P_a-ip3 : TS3} → P_a-ip4 : TS4 → P_fec : S_Q).

Thus, we have the following theorem.

Theorem 4.2. The adaptation lattice of Figure 4.9 is a transitional-invariant lattice for the adaptation of adding the proactive component. Hence, the adaptation is correct. □

4.4 Discussion

In this section, we discuss some of the questions raised by this work.

Why is the specification during adaptation a safety specification?
The specification of the application before adaptation can be arbitrary. During adaptation, however, the specification should be a safety specification: it is not desirable to delay the adaptation in order to satisfy liveness during adaptation.
Rather, we expect the adaptation to complete as quickly as possible and the new program to satisfy its safety and liveness specifications after adaptation. For example, consider a transaction processing system with liveness guarantees to commit or abort. In this case, either the adaptation should not start in the middle of a transaction, or, if the adaptation can start in the middle of a transaction, then the liveness should be met once the adaptation is completed. Thus, the implicit liveness specification during adaptation is that the adaptation completes.

How is a transitional-invariant lattice constructed?
Techniques [64, 69] developed to calculate invariants of a program can also be used to find transitional-invariants. For a given adaptation model, we can perform reachability analysis for each intermediate program obtained after execution of an atomic adaptation. The reachability computation for an intermediate program helps identify the transitional-invariant for that intermediate program, and we can thus construct a transitional-invariant lattice for the given adaptation. Furthermore, techniques for dynamically discovering likely invariants from executions of the system, such as [70-72], can be used to find transitional-invariants for the intermediate programs in the adaptation lattice.

Can the transitional-invariant lattice approach be used to verify existing adaptation techniques?
Yes. We give an outline of how existing adaptation techniques can be verified using a transitional-invariant lattice. First, we identify all atomic adaptations and the corresponding adaptive actions. We then consider all possible orderings and concurrency among the adaptive actions, and identify the intermediate programs after each adaptive action. Next, we find transitional-invariants for each intermediate program and check whether the transitional-invariants imply safety of adaptation.

Can the adaptation lattice approach be used in the case of parallel adaptation?
Yes, the adaptation lattice approach can also be used to verify parallel adaptation. We discuss its use with a simple example. Consider a system consisting of two processes, a sender and a receiver, that communicate using an encryption protocol. The adaptation requires that eventually the current (old) encryption protocol be replaced by another (new) encryption protocol. The parallel adaptation adds the new encryption protocol alongside the current one. In other words, the sender process has two types of fractions (for encryption) and the receiver process has two types of fractions (for decryption). All users using the sender process to send data to users at the receiver process initially use the old encryption protocol. Once the new encryption protocol is added, some users use the old protocol whereas others use the new protocol. Eventually, after all users have stopped using the old protocol, it can be removed from the system.

The adaptation lattice in this case is shown in Figure 4.10.

Figure 4.10: Adaptation lattice for parallel adaptation. (Each node is labeled with the fractions present at the sender P_s and the receiver P_r, ranging from (P_s: old; P_r: old) through intermediate nodes in which old and new fractions coexist to (P_s: new; P_r: new).)

Program P uses the old-protocol fractions at both processes.
Eventually, after the adaptation is completed, program Q uses the new-protocol fractions at both processes. Program P′ uses both types of fractions at both processes. During adaptation, program P′ can stay active for an unbounded time. The correctness requirement for P′ is that the old (respectively, new) fraction at the sender process communicates only with the old (respectively, new) fraction at the receiver process. The correctness requirements for the intermediate programs between P and P′ are that the old fraction at the sender process communicates only with the old fraction at the receiver process, and that communication from the new fraction at the sender process to the receiver process is buffered. Similarly, the correctness requirements for the intermediate programs between P′ and Q are that the new fraction at the sender process communicates only with the new fraction at the receiver process, and that no communication occurs between the old fraction at the sender process and the old fraction at the receiver process.

Chapter 5

Verifying Adaptation in Presence of Faults

In this chapter, we introduce the notions of transitional-faultspan and transitional-faultspan lattice to verify fault-tolerance properties during adaptation. We first define transitional-faultspan in Section 5.1. Then, in Section 5.2, we define transitional-faultspan lattice and give a theorem for proving the correctness of adaptation in the presence of faults. Finally, in Section 5.4, we present a case study of adaptation in the message communication application to demonstrate the use of a transitional-faultspan lattice.

5.1 Transitional-Faultspan

Let F_P be the fault class of the old program and F_Q be the fault class of the new program. Let S_P be an invariant and T_P be an F_P-span of the old program. Similarly, let S_Q be an invariant and T_Q be an F_Q-span of the new program. The old program is F_P-tolerant, and the new program is F_Q-tolerant. Let F be the fault class during adaptation.

In the context of adaptation, we define transitional-faultspans to identify the set of states that the program can reach while performing adaptation in the presence of faults.

Definition (Transitional-faultspan). Let R be an intermediate program in the adaptation 𝒜, and let TS be a transitional-invariant of R. A transitional-faultspan (F-span) of R from TS is a predicate TT that satisfies the following two conditions: (i) TS ⊆ TT, and (ii) TT is closed in R[]F.

Now, we define transitional-faultspan lattice.

5.2 Transitional-Faultspan Lattice

A transitional-faultspan (F-span) lattice is an adaptation lattice in which each node is associated with two predicates, a transitional-invariant and a transitional-faultspan, and the following conditions are satisfied:

0. Correctness in absence of faults. The adaptation lattice obtained by considering the transitional-invariants only forms a transitional-invariant lattice.

1. Fault-tolerance of old program. The entry node P is associated with an F_P-span T_P of the program before adaptation.

2. Fault-tolerance of new program. The exit node Q is associated with an F_Q-span T_Q of the program after adaptation.

3. Fault-tolerance of intermediate program. Each intermediate node R is associated with a predicate TT_R that is a transitional-faultspan (F-span) from TS_R for any intermediate program at R (i.e., an intermediate program obtained by performing adaptations from the entry node to R). Furthermore, any intermediate program at R is F-tolerant from TS_R.

4. Safety of adaptive action.
If a node labeled R_i has an outgoing edge labeled a to a node labeled R_j, then for every adaptive transition (s, a, s′) in Smap where TT_{R_i} is true in state s, TT_{R_j} is true in state s′. In other words, ∀s, s′ : (s, a, s′) ∈ Smap : s ∈ TT_{R_i} ⇒ s′ ∈ TT_{R_j}. Furthermore, all the adaptive transitions (s, a, s′) satisfy the safety specification during adaptation.

5. Progress of adaptation. If a node labeled R has outgoing edges labeled a1, a2, ..., ak to nodes R1, R2, ..., Rk, respectively, then in every computation of the adaptation there exists a transition (s, s′) such that for some i : 1 ≤ i ≤ k : (s, a_i, s′) ∈ Smap. Furthermore, ∀s : s ∈ TT_R : (∀a, s′ : a ∈ Σ_a − {a1, ..., ak} : (s, a, s′) ∉ Smap).

Correctness of adaptation in presence of faults. Intuitively, an adaptation is correct in the presence of faults F if the following holds: if the adaptation begins in a legitimate state of the old program, then during adaptation each intermediate program is F-tolerant and the resulting state of the new program is legitimate. With this intuition, if the adaptation begins in a state where the fault-span of the old program is true, then we say that the adaptation is correct if:

• the adaptation terminates in a state where the fault-span of the new program is true,
• during adaptation, each intermediate program is F-tolerant, and
• eventually the adaptation terminates.

The following theorem states that finding a transitional-faultspan lattice is necessary and sufficient for proving correctness of adaptation.

Theorem 5.1. Given S_P as the invariant of the program before adaptation, T_P as the fault-span used to show that the program before adaptation is tolerant to F_P, S_Q as the invariant of the program after adaptation, and T_Q as the fault-span used to show that the program after adaptation is tolerant to F_Q, the adaptation from P to Q is correct in the presence of faults F if and only if there is a transitional-faultspan (F-span) lattice for the adaptation with the start node associated with S_P and T_P, and the end node associated with S_Q and T_Q.

Proof. The proof of this theorem is similar to that of Theorem 4.1 in Chapter 4. □

5.3 Adaptation of Self-Stabilizing Programs

In this section, we consider the adaptation by Gouda et al. [73], where the authors focus on adapting one self-stabilizing [59] program into another self-stabilizing program. We show that this is an instance of our approach in which all the transitional-faultspan predicates are true. A program is self-stabilizing if, starting from an arbitrary state, it eventually recovers to a legitimate state. Thus, in transforming one stabilizing program into another, we can let all the fault-spans (i.e., the fault-span of the old program, the fault-span of the new program, and the transitional-faultspans of the intermediate programs) be true. With this approach, if the old program starts in any state, eventually the new program's execution begins, although the state of the new program may be arbitrary. Since the new program is self-stabilizing, it eventually recovers to legitimate states.

Note that in [73] the corresponding transitional-invariants may not exist. Specifically, even if the old program begins in legitimate states, the new program may initially be in illegitimate states before recovery takes place. Moreover, the approach in [73] allows arbitrary behavior during adaptation and, hence, the specification during adaptation may not be met.

5.4 Case Study: Reliable Message Communication (Continued)

In this section, we continue with the example of Chapter 4. We consider the adaptation that replaces the proactive component with the reactive component. We first discuss the adapt-ready version of the proactive component and the faults tolerated by the proactive component in Section 5.4.1. Next, in Section 5.4.2, we describe the acknowledgment-based reactive component. We then discuss the adaptation of replacing the proactive component with the reactive component in Section 5.4.3, where we also identify the transitional-faultspan lattice to verify the correctness of this adaptation in the presence of faults.
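Before turning to the case study, note that condition 4 of both lattices reduces to a per-edge check over Smap, which can be mechanized when the predicates and Smap are explicit. A small Python sketch under that assumption (the names are ours), usable with either the TS or the TT predicates:

    def check_adaptive_edge(TS_i, TS_j, a, smap, bad):
        # Condition 4: every adaptive transition (s, a, s') in Smap that
        # starts in TS_i must end in TS_j and must not be a bad transition
        # of the safety specification during adaptation.
        return all(t in TS_j and (s, t) not in bad
                   for (s, act, t) in smap if act == a and s in TS_i)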
5.4 Case Study: Reliable Message Communication (Con- tinned) In this section, we continue with the example of Chapter 4. We consider the adaptation that replaces the proactive component with the reactive component. We first discuss the adapt-ready version of the proactive component and the faults tolerated by the proactive component in Section 5.4.1. Next, in Section 5.4.2, we describe the acknowledgment- based reactive component. We then discuss the adaptation of replacing the proactive com- ponent by the reactive component in Section 5.4.3. Finally, in Section 5.4.3, we identify the transitional-faultspan lattice to verify the correctness of this adaptation in the presence of faults. 57 C (I 5.4.1 Proactive Component We discussed the proactive component in Chapter 4. The adapt-ready version of the proac- tive component is shown in Figure 5.1. Program Poo-fee process 3 var: Pfecsvar begin fec.encoder.encode fl fec.encoder.fec-send [] aal :true -—> transformTo(Paa_z-p1.s,(Dual) end process 'rz-[z' = 1, 2] var: Pfesz-yar begin fec.decoder.fec_receive I] fec.decoder.decode [] ooh-+2) : aa2 AisEmpty(Cs,7~z-) —> transformTo(’Pack.rz-,(Paa(i+2)) end Figure 5.1: Message communication program (with proactive component, adapt-ready). The specification of the program using proactive component is discussed in Chapter 4. Additionally, it tolerates message loss faults of class F 1 (cf. Figure 5.2). Faults of class F 1 causes a loss of up to k messages in a group. In writing the fault transitions, we use the following auxiliary variables: m9 to denote a message m from group 9, and lostC’ount? to denote the number of messages lost in group g in the channel from s to ri. Initially, Vg :: lostCounti.’ = 0. We now give the fault-span of the program using the proactive component. Fault-span. The F 1-span of the program using the proactive component is TQ = SQ. The fault-span is same as the invariant since the proactive component provides masking 58 a: In Ilt‘. / 3 LI. ' ’1 555:. Ctr- ‘\ mngossi : m9 6 CS,” /\ lostCountzg _<_ k —> Cs,” := Cs,” — m9; lostCounté2 + + Figure 5.2: Fault class F1. fault-tolerance. 5.4.2 Reactive Component The reactive component deals with the message loss by retransmitting the lost packets. It uses acknowledgments to confirm the receipt of messages sent by the sender, and negative acknowledgments to confirm the loss of messages sent by the sender. It consists of aSnd fraction at the sender and aRcv fraction at the receiver. The aSnd fraction adds a group and a packet number in each packet. It maintains a window of size to and sends all packets in that window to the receiver. It waits for the acknowledgment of receipt of a group before moving the window one group forward. If it receives a negative acknowledgment for any packet, it sends that packet again to the receiver. When the aRcv fraction at the receiver receives a packet out of order, it waits for a few more packets before sending a negative acknowledgment to the sender. When all packets in a group are received, it sends an acknowledgment for that group to the sender. mngossz- : m 6 CS,”- -—> C3,”. :2 Cs,” — {m} Figure 5.3: Fault class F2. The reactive component provides tolerance to message loss faults F2 shown in Figure 5.3. Faults of class F2 causes loss of messages from the channel. For simplicity, we assume that acknowledgment messages are not lost; however, the component can be easily extended to deal with faults that lose acknowledgments by using timeout guards. 
S9 Component ack Fraction aSnd inp sQ : queue of integer r1, r2 var n, w, 92-, p,- : integer {initially, p,- = g,- = 0} 9a, gnamna 3 integer {initially, Pi = 92' = 0} chopyz- : queue of integer {initially, Empty} snti : array [0..w — 1, 0..n — 1] of integer {initially _L} param i : i = 1, 2 begin com I isEmpty(chopyz-) —+ SQCOPyz' == 8Q U sendz- : fiisEmpty(chopyz-) A Mitt-[923197;] = _L —+ sntilgimz'] == {9231923 head(8Q60pt/z')}; C3,”. :2 Cs,” 0 snti[gi,pi]; p,- := (p,- + 1) mod n; if p,- = 0 then 9,- := (91- + 1) mod 10 fi [] resend,- : type(Cri,s) = nack -—+ 9nmpna :2 head(Crz.,s); if snti [912a, pm] 524 i then 03,73- 3: 03,13; 0 sntz’lgnamna] fi [] ack_rcvz- : type(Cri,s) = ack —> 90,, sntz-[ga,0..n — 1] := head(C7~2-_,3), i end Figure 5.4: Acknowledgment component: sender fraction. Figures 5.4 and 5.5 shows the abstract version of the reactive component. The aSnd fraction consists of four types of actions: copy, send, resend, ack_rcv. The aRcv fraction consists of three types of actions: receive, del ive r, and sendmack. These fractions are composed with processes that will use them. The message communication program composed with the reactive component is shown in Figure 5.6. We now give the specification of the program using the reactive component. Specification of program using the reactive component. component satisfies the same specification as the communication program (cf. Chapter 4). 6O Program using the reactive Component ack Fraction aRcvz- inp rQ .3 var n, to, g, p : integer {initially k 2 ng :2 0} k, ng, m : integer {initially k = 77.9 = 0} rbqu : array [0..w — 1, 0..n — 1] of integer {initially 1} ug : array [0..w — 1] of boolean {initially false} paramj :Ogigw—l begin receive : -uisEmpty(C3,ri) —-> g,p,m := head(Cs,7~i); rb’uf Q19, 29], ugly] == m, true [] deliverj : ug[j] = true —> if count(rbqu[j, O..n — 1] 7£ _L) = n then rQ := rQ o rbquLj, 0..n — 1]; rbquLj, 0..n - 1], CTN :2 1,073”; o ack(j); ug[j],ng :=false, (j + 1) mod to fi [] send_nack : count(ug[0..w — 1] = true) > 2 ——> for k = 0 to n — 1 if rbqu[ng, k] = .1. then 013,3 := 013,3 0 nack(ng, 1:) fi end Figure 5.5: Acknowledgment component: receiver fiaction. Additionally, it tolerates message loss faults F2. Invariant. The invariant of the program using the reactive component is S R = S 1 /\ S A, where SA=VizmiEmQ=> ((mi 6 chopy1 X m,- E r1.rQ \_/ (m2 9! (chopyl U r1.rQ) => (m,- E sntl A (m,- E 03¢1 2 mi 6 r1.rbqu)))) /\ (m,- E chopr X m,- E r2.rQ X (mi ¢ (choprUrng) => (m,- E snt2 /\ (m,- 6 ngr2 Xmi E r2.rbqu))))). In the above invariant, S A indicates that for a message m, exactly one of the following 61 Program 'Pack process 5 var :Paa_fec.s.var U ack.aSnd.var param i :z'= 1,2 begin ack.aSndcopyz- [] ack.aSndsendz— [] ack.aSnd.send_again,~ [] ack.aSnd.ack.rcvz- end process ri[2‘ = 1,2] begin var : Paa_fec.r,-.var U ack.aRcvz-Nar param k :OSksw—l begin ack.aRcv.receive [] ack.aRcvdeliverk [] ack.aRcv.send_nack end Figure 5.6: Message communication program (with reactive component). is true: - m is at the sender, and is not yet sent - m is received by the receiver - m is buffered by the sender, and m is either in the channel or is buffered at the receiver. Fault-span. The F2-span of the program using the reactive component is TR = S1 /\ TA, where TA=Vi:mz-EmQ=> ((mi 6 chopyl X m,- E r1.rQ Y (m,- ¢ (chopy1 U r1.rQ) : m,- E snt1)) 62 /\ (mi 6 chopy2 \_/ m,- E r2.rQ Y (m,- ¢ (chopy2 U r2.rQ) => m7; 6 snt2))). 
In the above fault-span, T A indicates that for a message m, exactly one of the following is true: - m is at the sender, and is not yet sent - m is received by the receiver - if m is sent by the sender and not yet received by the receiver, then m is buffered by the sender. 5.4.3 Adaptation: Replacement of Proactive Component with Reac- tive Component The adaptation of replacing the proactive component with the reactive component converts the program shown in Figure 5.1 to the one shown in Figure 5.6. We now give the specifi- cation during adaptation for the replacement of the proactive component with the reactive component. Specification during adaptation. The specification during adaptation is that S 1 continues to be true during adaptation in presence of faults F1. We now describe the adaptation by discovering the intermediate programs and the cor- responding transitional-invariants and transitional-faultspans during adaptation. We iden- tify the intermediate programs after each atomic adaptation. The execution of adaptive action aal in “Pawfec results in intermediate program ”Pawz-pl shown in Figure 5.7. ”Pawipl does not encode any new packets, but will send 63 Program ’Pawim process .9 var: ’Paa_fec.s.var begin fec.encoder.fec_send [] aag : aal AI = u —> transformTo(’Paa_Z-p2.s,(Page); end process 7,-[2‘ = 1, 2] : same as in Fig. 5.1 Figure 5.7: Intermediate program poo-i191- any remaining encoded packets. In the execution of ’Pawipl, eventually all the encoded packets are sent to the receivers. Thus, the guard of adaptive action aag becomes true. The transitional-invariant of ’Pawim is: TS 5 = SQ A S6, where SQ is defined earlier in Chapter 4, and $6 = (Vj :j _>_ u : eanLj,O..n — 1] = _L) A (l S it). In the above transitional-invariant, 36 indicates that no new packets will be encoded by the sender. The transitional-faultspan TT5 of ’Pawipl is same as TS 5. Program ’Pawim process 3 var : ”Paa_fec.s.var begin aa5 : aa3 A am; —> transformTo(73ack.s,aa5); end process ri[z’ = 1, 2] : same as in Fig. 5.1 Figure 5.8: Intermediate program ”Pam-1,2. The execution of aag results in intermediate program Pam-p2 shown in Figure 5.8. Paw,” does not send any packets, but the packets that are there in the channel can still be received by the receivers 7‘1 and r2. In the execution of ’Pawim eventually all the 64 packets in the channel are read and no new packets are added in the channel from sender to receiver. Thus, the guards of the adaptive actions aa3 and M4 eventually becomes true. The transitional-invariant of ’Pa is: TS 6 = 31 A S7 A SB, where 31 is defined earlier mm in Chapter 4, and S7 = (Vj :j 2 u : ean[j,0..n — 1] = _L) A (l = u), and SS 2 Vi : m,- E mQ => (m,- E 3Q 2 ((mi 6 r1.rQ X m,- E data(CS,7~1 Ur1.rbqu)) A (m,- E r2.rQ M mi 6 data(Cs,r2 U r2.rbqu)))). In the above transitional-invariant, 57 indicates that there are no encoded packets left at the sender for sending, and 38 indicates that all packets that are sent are either received by the receivers or are in the corresponding channels. The transitional-faultspan TT5 of ’Pawim is same as T56. Program ’Pawim process 3 : same as in Fig. 5.8 process r1 : same as in Fig. 5.6 process r2 : same as in Fig. 5.1 Figure 5.9: Intermediate program Pawim. Since M3 and M4 occur independently, we consider both possible orderings between them. The execution of adaptive action (103 in ”Paw,” results in intermediate program ’Pawim shown in Figure 5.9. 
In P_aa-ip3, receiver r1 has replaced its fraction, whereas receiver r2 has not yet replaced its fraction and can receive any remaining packets in the channel from s to r2. Eventually, in the execution of P_aa-ip3, the guard of adaptive action aa4 becomes enabled and aa4 is executed, resulting in the intermediate program P_aa-ip5. The transitional-invariant of P_aa-ip3 is TS7 = S1 ∧ S7 ∧ S9 ∧ S10, where

S9 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ (m_i ∈ r1.rQ ∧ (m_i ∈ r2.rQ ⊕ m_i ∈ data(C_{s,r2} ∪ r2.rbufQ)))), and
S10 = isEmpty(C_{s,r1}) = true ∧ r1.rbufQ = ⊥.

The transitional-faultspan TT7 of P_aa-ip3 is the same as TS7.

Program P_aa-ip4
  process s : same as in Fig. 5.8
  process r1 : same as in Fig. 5.1
  process r2 : same as in Fig. 5.6

Figure 5.10: Intermediate program P_aa-ip4.

The execution of adaptive action aa4 in P_aa-ip2 results in the intermediate program P_aa-ip4 shown in Figure 5.10. In P_aa-ip4, receiver r2 has replaced its fraction, whereas receiver r1 has not yet replaced its fraction and can receive any remaining packets in the channel from s to r1. Eventually, in the execution of P_aa-ip4, the guard of adaptive action aa3 becomes enabled and aa3 is executed, resulting in the intermediate program P_aa-ip5. The transitional-invariant of P_aa-ip4 is TS8 = S1 ∧ S7 ∧ S11 ∧ S12, where

S11 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ ((m_i ∈ r1.rQ ⊕ m_i ∈ data(C_{s,r1} ∪ r1.rbufQ)) ∧ m_i ∈ r2.rQ)), and
S12 = isEmpty(C_{s,r2}) = true ∧ r2.rbufQ = ⊥.

The transitional-faultspan TT8 of P_aa-ip4 is the same as TS8.

Program P_aa-ip5
  process s : same as in Fig. 5.8
  process r_i [i = 1, 2] : same as in Fig. 5.6

Figure 5.11: Intermediate program P_aa-ip5.

In the intermediate program P_aa-ip5 shown in Figure 5.11, only adaptive action aa5 is enabled, and execution of aa5 results in the new program P_ack. The transitional-invariant of P_aa-ip5 is TS9 = S1 ∧ S10 ∧ S12 ∧ S13, where

S13 = ∀i : m_i ∈ mQ ⇒ (m_i ∈ sQ ⊕ (m_i ∈ r1.rQ ∧ m_i ∈ r2.rQ)).

The transitional-faultspan TT9 of P_aa-ip5 is the same as TS9.

We now give the state mappings for the adaptive actions, which are used to initialize the state of the new fraction at each process.

State mapping. The state mapping for each adaptive action is shown in Table 5.1. Each adaptive action initializes the state of the new process based on this mapping when it is executed.

Mapping Function | Process Affected | New State
Φ_aa1 | s  | identity mapping
Φ_aa2 | s  | identity mapping
Φ_aa3 | r1 | {rQ, s} - identity mapping; V(r1) − {rQ, s} - initial mapping
Φ_aa4 | r2 | {rQ, s} - identity mapping; V(r2) − {rQ, s} - initial mapping
Φ_aa5 | s  | {sQ, r1, r2} - identity mapping; V(s) − {sQ, r1, r2} - initial mapping

Table 5.1: State mappings for the adaptation.

Based on the description of the adaptation in this section, we obtain the transitional-faultspan (F-span) lattice shown in Figure 5.12 for the adaptation of replacing the proactive component with the reactive component.

Figure 5.12: Adaptation lattice for replacement of the proactive component with the reactive component (P_aa-fec : S_Q, T_Q → P_aa-ip1 : TS5, TT5 → P_aa-ip2 : TS6, TT6 → {P_aa-ip3 : TS7, TT7; P_aa-ip4 : TS8, TT8} → P_aa-ip5 : TS9, TT9 → P_ack : S_R, T_R).

Thus, we have the following theorem.

Theorem 5.2. The adaptation lattice of Figure 5.12 is a transitional-faultspan (F1-span) lattice for the adaptation of replacing the proactive component with the reactive component. Hence, the adaptation is correct in the presence of faults. □

Chapter 6

Case Study: Mixed-Mode Adaptation

In this chapter, we discuss a case study of mixed-mode adaptation. We study mixed-mode adaptation in the context of protocol change, as discussed in dynamically adaptable middleware [1, 74, 75].
Specifically, we consider two leader election protocols and an adaptation that changes one leader election protocol to the other at run-time. We replace a leader election protocol that elects a leader based on process identifier with a leader election protocol that elects a leader based on process value, where the value of a process can be defined using the process's battery life, its average distance to other processes, and so on.

In the rest of this chapter, we first discuss the two leader election protocols in Section 6.1, covering the system model, protocol specifications, and protocol descriptions. Next, in Section 6.2, we discuss the adaptation of the leader election protocols, covering the overlap communication scenarios, state mappings, and verification of the adaptation. In Section 6.3, we present the performance results of mixed-mode adaptation. Finally, in Section 6.4, we discuss the limitations of mixed-mode adaptation.

6.1 Leader Election Protocols

Leader election is a fundamental problem in distributed computing. The basic description of the leader election problem is: eventually elect a unique leader. For example, in the case of group communication protocols, leader election is employed to elect a new coordinator whenever a group coordinator fails. Numerous leader election protocols have been proposed in the literature for a variety of applications. We discuss two versions of a leader election protocol that is based on the termination detection protocol by Dijkstra and Scholten [76]. The protocols discussed in this section are an abstraction of the protocol discussed in [77]. We assume that there is another module at all (or selected) processes that monitors the status of the leader process. A monitor process starts an instance of the leader election protocol whenever it detects a failure of the leader.

6.1.1 System Model and Assumptions

The system consists of n processes. All processes have unique identifiers, which we denote by id. Each process maintains a variable ldr, which denotes the leader that the protocol elected, and a variable e, which denotes whether the process is in an election. We make the following assumptions about the system:

• Bidirectional channels. The channels between processes are bidirectional, i.e., if the system has a channel C_{p,q} from p to q, then it also has a channel C_{q,p}.

• Static processes. We assume that processes are static and the network is connected. If the processes are mobile and the network suffers from partitioning, then the protocol can be extended as discussed in [77].

• Process failure and reliable communication. A process or a link can fail; however, for simplicity, we assume that no process or link fails while an election is in progress. We assume reliable communication, i.e., if process p sends a message m to process q, then eventually q receives the message m.

6.1.2 Specification of Leader Election Protocols

We consider the following problem specification for the leader election protocols used in this case study:

Safety: □(i.ldr ≠ j.ldr ⇒ i.e ∨ j.e)
Liveness: □◊(∀i, j : i.ldr = j.ldr)

The safety property asserts that no two stable processes (i.e., processes not in an election) can have different leaders. The liveness property asserts that eventually a unique leader is elected. We now discuss two leader election protocols: one that elects the process with the maximum id as the leader, and one that elects the process with the maximum value as the leader.

6.1.3 Leader Election based on Process ID

The leader election protocol ldrId, which elects the process with the maximum id as the leader, is shown in Figure 6.1. In addition to the safety and liveness properties discussed in Section 6.1.2, the protocol also satisfies the following liveness property:

Liveness of ldrId: □◊(i.ldr = max{k | d_{i,k} < ∞}),
6.1.3 Leader Election based on Process ID The leader election protocol, 1dr] d , that elects the process with maximum id as a leader is shown in Figure 6.1. In addition to the safety and liveness properties discussed earlier in Section 6.1.2, the protocol also satisfies the following liveness property: Liveness of ldrId: DOinr = max{k I dug < oo}, 71 Component ldrId Fractioni inp N : list var startElectz'on, e,ack : bool {initially false} p, num, max, ldr : integer {initially p = num = max = 0, ldr = _L} src(Num, Id) : (integer, Ln) {initially (O, 2)} W, chd : list {initially W, chd = o} begin startElection : fit A startElectz’on ——> src, num :2 (num, 2'), num + 1; ELECT.src := src; forj=1tonAjEN Ci,j.add(ELECT); W, ack,6,p,max, chd 2: N, true, true, 2', i, (f); startElectz'on := false fl joinElection : ELECT(ldrId) e Cjn‘ —-> Cj,i.remove(ELECT); if fie V (e A ELECT.src > are) then src := ELECT.src; num := src.Num + 1; fork=1tonA(k7éjAkEN) Ci’k.add(ELECT); W, ack, 6,19, max,chd := N — {7'}, true, true,j, i, (t); else if e A src = ELECT.src then ACK.{src, chd} := src, false; Ci,j.add(ACK) fi [] ackToParent : e A W 2 g!) A src.Id 3A i A ack -—> ack := false; ACK.{src, chd, id} :2 src, true, max; Ci,p.add(ACK) Figure 6.1: Leader election algorithm based on node Id. which asserts that the elected leader is the process having maximum id among all connected processes. The protocol uses three types of messages as shown in Table 6.1. It also shows the fields associated with each message type. The variables used by the leader election protocol are 72 [] ackReceive : e A ACK(ldrId) e 033,- —> Cj,z-.remove(ACK); if ack A src = ACK.src then W == W — {j}; if ACK.chd then chd = chd + {j}; max := MAX(ACK.id, max); fi fi [] electLeader : e A W = 05 A src.Id = i A ack ——> ack, 6, ldr := false, false, max; LD.{src, id} := src, max; forj= 1tonAj Echd Ci,j.add(LD) [] setLeader : e /\ LD(ldrId) 6 CM A pack —+ Cj,i.remove(LD) if src = LD.src then ldr, 6 := LD.ld, false; fork=1tonAk E chd Ci’k.add(LD) fi end Figure 6.]: Leader election algorithm based on node Id (Continued). shown in Table 6.2. Message Meaning Message Fields ELECT for building a spanning tree type : protocol type src : computation index of the election ACK to acknowledge the receipt type : protocol type of ELECT message src : computation index of the election chd : denotes if the sender is a child id : maximum id as seen by the sender LD to announce a leader type : protocol type src : computation index of the election id : id of leader elected in the election Table 6.1: Message types used in the protocol. 73 Variable Type Meaning e bool indicates whether process is currently in election or not ldr int(1..n) id of the leader process num int index of the last computation that this process started src(Num, Id) (int, int(1..n)) computation index of the last computation in which this process participated p int(1..n) parent process in last computation ack bool indicates if ACK message is sent to parent or not W list list of neighbors from which ACK is being awaited C list list of my children in current computation max int(1..n) maximum id among my children in current computation Table 6.2: Variables maintained by each process in the protocol. 6.1.3.1 Description of the protocol When the process i detects the failure of the leader, it sets the value of the variable startElection to true to start the instance of the ldrId protocol to elect a new leader. 
The fraction at i begins the election by starting a diffusing computation by sending an ELECT message to all of its neighbors. The ELECT message has a field that contains the computation-index of the diffusing computation. When a process receives the ELECT mes- sage, it sets the neighbor from which it first received the message as its parent. It then propagates the ELECT message to all of its neighbors. In this way, during the first phase a spanning tree is build. When a fraction 2' receives an ELECT message from a process that is not its parent, it immediately responds by sending an ACK message. The ACK message has a field that identifies if the message is from a child or not, and a field that denotes the maximum id as seen by the process. The fraction i sends an ACK message to its parent only when it 74 has received ACK messages from all of its neighbors. When a fraction 2' has received ACK messages from all of its neighbors, it first finds the process having maximum id among all its children. It then sends the ACK message to its parent. When the source process that started the computation (election) receives ACK messages from all of its neighbors, it can compute the leader by finding the process having maximum id among all its children. It then starts a diffusing computation to forward the leader infor- mation to all the processes. As one or more processes can concurrently detect failure of a leader, it is possible that more than one process can start elections independently, thereby, leading to concurrent dif— fusing computations. To ensure correctness of the protocol, it is required that each process participate in only one diffusing computation. This is done by associating a computation- index to each computation. The computation-index is a pair (num, id), where id represents the identifier of the process, and num is an integer. When a process participating in a diffus- ing computation, receives another diffusing computation with higher computation-index, it stops participating in the current computation in favor of the diffusing computation with higher computation-index. Two computation-index are compared as follows: (num.1,id1) > (numg, idg) 4:) ((numl > numg) V ((numl 2 77.11.7722) A (idl > id2))). 6.1.4 Leader Election based on Process Value The leader election protocol, ldr Val that computes the leader based on the value of a pro- cess is shown in Figure 6.2. This protocol satisfies the following liveness property in addi- tion to the safety and liveness properties discussed in Section 6.1.2: 75 Component ldr Val Fractioni inp value : int N : list var startElection, e, ack : bool {initially false} p, num, ldr : int {initially p = num = 0, ldr = _L} src(Num,Id),ma:r(Val,1d) : (int, 1..n) {initially (0,i)} N, W, chd : list {initially W, chd = o} begin startElection : -~e A startElection —> src, num :2 (num, i), num + 1; ELECT.ST‘C I: i; fork: 1tonAk E N Ci,k.add(ELECT); W, ack, e,p, max, chd :2 N, true, true, i, (value, i), a); startElection := false [] joinElection : ELECT(ldrVal) 6 CM -—> Cj_i.remove(ELECT); if we V (e A ELECT.src > src) then src :2 ELECT.src; num := src.Num + 1; fork=1tonA(k#jAk€N) Ci’k.add(ELECT); W, ack, 6,1), chd := N — {j}, true, true,j, cf); max := (value,i); else if e A src = ELECT.src then ACK.{src, chd} := src, false; Ci,j.add(ACK) fi [] ackToParent : e A W = d) A src.Id 7e i A ack ——> ack := false; ACK.{src, chd, val, id} :2 src, true, max.{Val, Id}; Ci,p.add(ACK) Figure 6.2: Leader election algorithm based on node value. 
6.1.4 Leader Election based on Process Value

The leader election protocol ldrVal, which computes the leader based on the value of a process, is shown in Figure 6.2. This protocol satisfies the following liveness property in addition to the safety and liveness properties discussed in Section 6.1.2:

Liveness of ldrVal: □◇(i.ldr = j ⇒ j.value = max{k.value | d_{i,k} < ∞}),

which asserts that the elected process is the process having the maximum value among all connected processes.

Component ldrVal
Fraction i
inp  value : int
     N : list
var  startElection, e, ack : bool {initially false}
     p, num, ldr : int {initially p = num = 0, ldr = ⊥}
     src(Num, Id), max(Val, Id) : (int, 1..n) {initially (0, i)}
     W, chd : list {initially W, chd = ∅}
begin
   startElection : ¬e ∧ startElection →
      src, num := (num, i), num + 1;
      ELECT.src := src;
      for k = 1 to n ∧ k ∈ N
         C_{i,k}.add(ELECT);
      W, ack, e, p, max, chd := N, true, true, i, (value, i), ∅;
      startElection := false
[] joinElection : ELECT(ldrVal) ∈ C_{j,i} →
      C_{j,i}.remove(ELECT);
      if ¬e ∨ (e ∧ ELECT.src > src) then
         src := ELECT.src; num := src.Num + 1;
         for k = 1 to n ∧ (k ≠ j ∧ k ∈ N)
            C_{i,k}.add(ELECT);
         W, ack, e, p, chd := N − {j}, true, true, j, ∅;
         max := (value, i);
      else if e ∧ src = ELECT.src then
         ACK.{src, chd} := src, false;
         C_{i,j}.add(ACK)
      fi
[] ackToParent : e ∧ W = ∅ ∧ src.Id ≠ i ∧ ack →
      ack := false;
      ACK.{src, chd, val, id} := src, true, max.{Val, Id};
      C_{i,p}.add(ACK)

Figure 6.2: Leader election algorithm based on node value.

[] ackReceive : e ∧ ACK(ldrVal) ∈ C_{j,i} ∧ ack →
      C_{j,i}.remove(ACK);
      if src = ACK.src then
         W := W − {j};
         if ACK.chd then
            chd := chd + {j};
            max := MAX((ACK.val, ACK.id), max)
         fi
      fi
[] electLeader : e ∧ W = ∅ ∧ src.Id = i ∧ ack →
      ack, e, ldr := false, false, max.Id;
      LD.{src, id} := src, max.Id;
      for k = 1 to n ∧ k ∈ chd
         C_{i,k}.add(LD)
[] setLeader : e ∧ LD(ldrVal) ∈ C_{j,i} ∧ ¬ack →
      C_{j,i}.remove(LD);
      if src = LD.src then
         ldr, e := LD.id, false;
         for k = 1 to n ∧ k ∈ chd
            C_{i,k}.add(LD)
      fi
end

Figure 6.2: Leader election algorithm based on node value (Continued).

The value of a process is calculated from the resources available at the process, such as battery power, CPU load, distance to other processes, and degree of the process. The leader election protocol is independent of how the value of a process is calculated. We assume that a separate component monitors the resources at a process and computes the value of the process, and that the leader election protocol fraction at the process has access to this value.

6.1.4.1 Description of the protocol

The ldrVal protocol is similar to the ldrId protocol described earlier, where the leader is the process whose id is maximum. The ldrVal protocol differs from the ldrId protocol in the following ways:

• The ldrVal protocol has an additional input variable, value, that indicates the value of the process.

• The max variable in ldrVal is represented as a pair (Val, Id) of type (int, int(1..n)); max.Id denotes the id of the process and max.Val represents the value of the process.

• The ACK message of ldrVal has a field, ACK.val, that carries information about the process having the maximum value among all of its children. This field is of type (int, int(1..n)). The equivalent field in the ACK message of ldrId is ACK.id, which is of type int.

In the case where two processes have the same value, the tie is broken in favor of the process having the higher id. We have the following property:

max1 > max2 ⇔ ((max1.Val > max2.Val) ∨ ((max1.Val = max2.Val) ∧ (max1.Id > max2.Id))).
6.2 Adaptation

We now consider an adaptation that dynamically replaces the leader election protocol ldrId with the leader election protocol ldrVal. When the adaptation is initiated, an instance of ldrId may be underway, or no instance may be underway. If the processes are not engaged in an election, then replacing one leader election protocol with another can be done easily, by replacing the corresponding fractions and initializing the new fractions using an initial, identity, or quasi state mapping. However, it is impossible for a process to locally determine whether other processes are in an election. One way to deal with this is to use centralized control during adaptation and/or to enforce synchrony by quiescing the processes (blocking them from starting new elections). This type of approach causes service interruption: the leader election service is not available until the adaptation is completed. There is also communication overhead, because extra messages are required for synchronization.

We instead consider mixed-mode adaptation, which lets the old and the new protocols overlap during adaptation. As a result, a process where the new fraction is installed can start an election right away, without waiting for other processes to finish the adaptation. Thus, the time for which the application is blocked, i.e., the service interruption time, is reduced. Moreover, since all processes can replace their fractions independently (in parallel), there is little or no need for extra messages to achieve synchrony, so the communication overhead during adaptation is also reduced.

The program using the ldrId protocol is shown in Figure 6.3, and the program using the ldrVal protocol is shown in Figure 6.4. We first give the specification during adaptation for the adaptation that replaces ldrId with ldrVal.

Specification during adaptation. The specification during adaptation is that the safety property of the leader election protocols, as discussed in Section 6.1.2, continues to hold during adaptation.

Program P_{ldrId}
process i [i = 1, .., n]
begin
   ldrId.i.startElection
[] ldrId.i.joinElection
[] ldrId.i.ackToParent
[] ldrId.i.ackReceive
[] ldrId.i.electLeader
[] ldrId.i.setLeader
end

Figure 6.3: Program using ldrId.

Program P_{ldrVal}
process i [i = 1, .., n]
begin
   ldrVal.i.startElection
[] ldrVal.i.joinElection
[] ldrVal.i.ackToParent
[] ldrVal.i.ackReceive
[] ldrVal.i.electLeader
[] ldrVal.i.setLeader
end

Figure 6.4: Program using ldrVal.

Adapting the ldrId protocol to the ldrVal protocol requires replacing each fraction of ldrId with the corresponding fraction of ldrVal. The replacement of the fraction at each process is an atomic adaptation and is modeled by an adaptive action. The intermediate programs that occur during adaptation are as shown in Figure 6.5.

Program P_{IP(ldrId-ldrVal)}
process i [i = 1, .., n; i ≠ j]
   P_{ldrId}.i
[] aa_i : true → TransformTo(P_{ldrVal}.i, Φ_{aa_i})
process j [j = 1, .., n; i ≠ j]
   P_{ldrVal}.j

Figure 6.5: Intermediate program during adaptation from P_{ldrId} to P_{ldrVal}.

In this example, the fractions of the two protocols at all processes are similar. Also, as discussed later in this section, the overlap communication scenarios for each intermediate program are similar. We take advantage of this symmetry by modeling the adaptation as an adaptive program (cf. Section 3.2.2 (adaptation as an automaton) of Chapter 3).

To model each adaptive action, we introduce two new boolean variables, idActive and valActive. We apply restriction composition [58] of the ldrId program with idActive, and of ldrVal with valActive. A restriction of program P by Z is a program whose actions are of the form Z ∧ g → st, for each action g → st of P. Thus, actions of ldrId are enabled only if idActive is true, and actions of ldrVal are enabled only if valActive is true. Initially, when the old program is running, idActive is true and valActive is false at each process. The adaptive action at each process atomically sets idActive to false and valActive to true. The adaptation modeled as an adaptive program is shown in Figure 6.6.
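Restriction composition is simply guard strengthening, as the following minimal Python sketch illustrates. The representation of actions as (guard, statement) pairs and all names here are assumptions made for illustration, not part of the formal model.

    # Restriction of a program by predicate Z: each action g -> st becomes (Z and g) -> st.
    def restrict(actions, z):
        """actions: list of (guard, statement) pairs over a state; z: predicate over a state."""
        return [(lambda s, g=g: z(s) and g(s), st) for (g, st) in actions]

    # During adaptation, the two restricted action sets simply coexist:
    #   mixed = restrict(ldrid_actions,  lambda s: s.idActive) + \
    #           restrict(ldrval_actions, lambda s: s.valActive)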
We allow these adaptive actions to execute independently of each other. In other words, at each process the fraction of the ldrId protocol can be replaced by the fraction of the ldrVal protocol independently of other processes. During this replacement, the state of the new fraction needs to be initialized appropriately to ensure correctness of adaptation. Furthermore, independent replacement of fractions at various processes leads to a situation where, during adaptation, some processes have fractions of ldrId active and some have fractions of ldrVal active. This causes the communication of the two protocols to overlap, and this communication must be handled appropriately to ensure correctness of adaptation.

The correctness of adaptation requires that the safety property, which asserts that two processes not in an election cannot have different leaders, also be satisfied during adaptation. Furthermore, when all processes have finished replacing their fractions, the program should be in a reachable (invariant) state of ldrVal. This is required to ensure that once the adaptation is complete, the new program satisfies its specification.

Program P_{ldrId-ldrVal}
process i [i = 1, .., n]
var  idActive : bool {initially true}
     valActive : bool {initially false}
     ldrId.i.{var} ∪ ldrVal.i.{var}
begin
   idActive ∧ ldrId.i.startElection
[] idActive ∧ ldrId.i.joinElection
[] idActive ∧ ldrId.i.ackToParent
[] idActive ∧ ldrId.i.ackReceive
[] idActive ∧ ldrId.i.electLeader
[] idActive ∧ ldrId.i.setLeader
[] valActive ∧ ldrVal.i.startElection
[] valActive ∧ ldrVal.i.joinElection
[] valActive ∧ ldrVal.i.ackToParent
[] valActive ∧ ldrVal.i.ackReceive
[] valActive ∧ ldrVal.i.electLeader
[] valActive ∧ ldrVal.i.setLeader
[] aa_i : idActive → idActive, valActive := false, true; s′ := Φ_{aa_i}(s)
[] discardElect : valActive ∧ ELECT(ldrId) ∈ C_{j,i} → C_{j,i}.remove(ELECT)
[] discardAck : valActive ∧ ACK(ldrId) ∈ C_{j,i} → C_{j,i}.remove(ACK)
[] acceptLd : valActive ∧ LD(ldrId) ∈ C_{j,i} → ldrVal.i.setLeader
end

Figure 6.6: Adaptive program to adapt from P_{ldrId} to P_{ldrVal}.

We now describe the state mapping that we use in the adaptation, and also discuss the various overlapping communication scenarios that arise during the adaptation and how we deal with them.

6.2.1 State Mapping and Overlap Communication

During adaptation, a process replaces the fraction of ldrId with the fraction of ldrVal. Since the protocol fractions are replaced independently at each process, we need to deal explicitly with the communication between the two protocols: a process with the old fraction may receive a message from a process with the new fraction, and vice versa. Table 6.3 shows the different overlap communication scenarios that can occur during adaptation and how they are dealt with. Any message of type ELECT from the new fraction to the old fraction is buffered. Any message of type ELECT or ACK from the old fraction to the new fraction is discarded. The new fraction accepts a message of type LD from the old fraction. The scenarios where the new fraction sends a message of type ACK or LD to the old fraction do not arise, because the new fraction always discards any message of type ELECT from the old fraction. The actions discardElect, discardAck, and acceptLd shown in Figure 6.6 deal with these overlap communication scenarios during adaptation.

  Overlap scenario               ELECT     ACK       LD
  Old (ldrId) to New (ldrVal)    discard   discard   accept
  New (ldrVal) to Old (ldrId)    buffer    NA        NA

Table 6.3: Overlap communication between protocols.

To ensure correctness while dealing with overlap communication, the new fractions are initialized using the state mapping defined in Table 6.4. When the replacement is done, the old fraction is in one of the four states described in Table 6.4. In the first case, the old fraction is not involved in an election; an identity mapping is done, i.e., each variable of the new fraction is initialized to the corresponding variable of the old fraction. In the second case, the old fraction is involved in an election and is waiting to receive ACKs from its children. In this case, the old fraction can locally determine that the source process has not yet elected the leader. The new fraction is therefore initialized to its initial state, i.e., it no longer takes part in the election; only the num and src variables of the new fraction are assigned the values of the old fraction. In the third case, the fraction is waiting to receive the leader value (i.e., an LD message) from its parent; here an identity mapping is done. In the last case, the process is the source of the election, i.e., it started the election. In this case, we replace the fraction and have the new fraction start the election again: we initialize the state of the new fraction to its initial state, assign the num variable of the new fraction the value of the old fraction, and set ldrVal.startElection to true. For brevity, in Figure 6.6 we show the state mapping action as s′ := Φ_{aa_i}(s). This is equivalent to a set of assignment statements that assign values to each variable of the new fraction using the state mapping Φ_{aa_i} of Table 6.4.

  (old) state s of process i    Description of (old) state s    (new) state s′ of process i
  ¬ldrId.e                      not in election                 identity mapping
  ldrId.e ∧ ldrId.W ≠ ∅ ∧       in election, waiting for ACK    {num, src} : identity mapping;
  ldrId.src ≠ i                 from children, not the source   rest : initial mapping
  ldrId.e ∧ ldrId.W = ∅         in election, waiting for LD     identity mapping
  ldrId.e ∧ ldrId.W ≠ ∅ ∧       in election, waiting for ACK    {num} : identity mapping;
  ldrId.src = i                 from children, source of        ldrVal.startElection := true :
                                the election                    functional mapping; rest : initial mapping

Table 6.4: State mapping (Φ_{aa_i}) for each atomic adaptation aa_i.
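The following Python sketch makes the four cases of Table 6.4 concrete. It is a minimal illustration under assumed state representations (plain attribute records, with src as a (Num, Id) tuple); the function and helper names are hypothetical, not part of the protocol.

    from types import SimpleNamespace

    def initial_ldrval_state(i):
        # Fresh ldrVal fraction state for process i (the "initial mapping" baseline).
        return SimpleNamespace(i=i, e=False, ack=False, startElection=False,
                               p=0, num=0, ldr=None, src=(0, i), W=[], chd=[])

    def phi(old):
        """State mapping of Table 6.4: old ldrId fraction state -> new ldrVal state."""
        new = initial_ldrval_state(old.i)
        if not old.e:                          # case 1: not in an election
            new.__dict__.update(vars(old))     # identity mapping
        elif old.W and old.src[1] != old.i:    # case 2: awaiting ACKs, not the source
            new.num, new.src = old.num, old.src
        elif not old.W:                        # case 3: awaiting LD from the parent
            new.__dict__.update(vars(old))     # identity mapping
        else:                                  # case 4: source of the election
            new.num = old.num
            new.startElection = True           # restart the election under ldrVal
        return new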
6.2.2 Verifying Adaptation

To verify adaptation, we need to verify all intermediate programs that occur during adaptation. The intermediate programs that occur during the adaptation from ldrId to ldrVal are as shown in Figure 6.5. Since the protocol fractions and the overlap communication scenarios are symmetrical, we modeled the adaptation as the adaptive program shown in Figure 6.6. Thus, instead of verifying all intermediate programs, we verify the adaptive program of Figure 6.6.

Safety of the adaptive program. The safety property for the adaptive program P_{ldrId-ldrVal} is similar to the safety property of the leader election protocols, and is given as follows:

∀p1, p2 : p1, p2 ∈ {ldrId, ldrVal} : p1.i.ldr ≠ p2.j.ldr ⇒ p1.i.e ∨ p2.j.e.

Invariant. We establish the following invariant for the adaptive program, which implies that safety is not violated during adaptation:

I ≡ P_E ∧ P_A ∧ P_L ∧ P_L′, where

P_E ≡ i.idActive ∧ j.valActive ∧ j.e ∧ j.ack ∧ j.W ≠ ∅ ⇒ ELECT(ldrVal) ∈ C_{j,i}

P_A ≡ j.valActive ∧ ¬j.e ∧ (∃i : i ∈ j.N : i.idActive ∧ j.src = i.src ∧ i.e ∧ i.ack ∧ C_{i,j} = ∅)
      ⇒ (∃k : k.p = k ∧ k.e ∧ k.ack ∧ k.W ≠ ∅)

P_L ≡ i.idActive ∧ j.valActive ∧ j.e ∧ j.p = i ∧ ¬j.ack ∧ j.W = ∅ ⇒
      ((i.e ∧ i.src = j.src ∧ ¬i.ack ∧ i.W = ∅) ∨
       (¬i.e ∧ LD(ldrId) ∈ C_{i,j}) ∨
       (i.e ∧ i.ack ∧ i.W ≠ ∅ ∧ i.src = j.src) ∨
       ((i.e ∧ i.ack ∧ i.W ≠ ∅ ∧ i.src ≠ j.src) ⇒ j ∈ i.W))

P_L′ ≡ i.idActive ∧ j.valActive ∧ i.e ∧ ¬j.e ∧ i.p = j ∧ ¬i.ack ∧ i.W = ∅ ∧ i.src = j.src
       ⇒ LD(ldrId) ∈ C_{j,i} ∨ (∃k : k.p = k ∧ k.e ∧ k.ack ∧ k.W ≠ ∅)

In the above invariant, P_E asserts that if process j starts an election after adapting to the ldrVal fraction, then any ELECT message that j sends to a process i that is still using the ldrId fraction remains in the channel C_{j,i} until i is adapted.
P_A asserts that if j was in an election and waiting for ACK(s) from its neighbor(s) when it replaced its fraction, then the source of that election is still in the election. P_L asserts that if process j was in an election and waiting for LD when it replaced its fraction, and the parent of j is still using the old fraction, then one of the following holds: (i) the parent of j is also waiting for LD, (ii) the parent of j is not in an election and the channel C_{i,j} contains an LD(ldrId) message, (iii) the parent of j is still waiting for ACK(s) from its neighbor(s), or (iv) if the parent of j has started a new election, then it will not receive an ACK from j until it replaces its fraction. P_L′ asserts that if a process i using the old fraction is waiting for an LD message from its parent j, and j has completed its election and is now using the new fraction, then one of the following holds: (i) the channel C_{j,i} contains an LD(ldrId) message that was sent by process j before it was adapted, or (ii) the source of that election is still in the election.

6.3 Performance of Mixed-Mode Adaptation

In this section, we compare the performance of mixed-mode and quiescence adaptation. We consider two different configurations for the following discussion: (i) a connected network (straight line) of 5 processes and 4 edges, and (ii) a connected network of 7 processes and 11 edges. We have considered other network topologies and obtained similar results. In this discussion, we consider the adaptation from ldrId to ldrVal. We have also implemented the adaptation from ldrVal to ldrId and obtained similar performance results.

[Figure 6.7: Quiescence adaptation. Bar charts of the per-node adaptation time (ms) over five runs, for (a) Configuration 1 and (b) Configuration 2.]

Figure 6.7 shows the time required for quiescence adaptation to adapt from the Id-based leader election protocol to the value-based leader election protocol. It shows the two configurations and 5 runs for each configuration; each run shows the time taken by each process for adaptation. In run number 5 of configuration 1 and run number 4 of configuration 2, the time taken for adaptation is almost twice the average time taken by the processes over the other runs. This is because an instance of the leader election protocol was already running when the adaptation started; the adaptation waits for the election to complete before replacing the protocol.

Figure 6.8 shows the time required for mixed-mode adaptation to adapt from the Id-based leader election protocol to the value-based leader election protocol. It shows the two configurations and 5 runs for each configuration, and in each run the time taken by each process to finish the adaptation. The time taken in each run is almost the same for a given configuration, regardless of whether an instance of the leader election protocol is underway when the adaptation occurs.

Figure 6.9 shows the comparison between the average time taken by quiescence and mixed-mode adaptation to adapt from the Id-based leader election protocol to the value-based leader election protocol.
The average time taken by quiescence adaptation is almost 8 times that of mixed-mode adaptation in the case of configuration 1, and 6 times that of mixed-mode adaptation in the case of configuration 2. This result is as expected, because quiescence adaptation sends more messages for synchronization during adaptation. Furthermore, in the case of quiescence adaptation there is more processing time required at each process during adaptation, as channels and other resources used by the existing protocol at that process need to be released before the new protocol can be installed. By contrast, in the case of mixed-mode adaptation the new protocol is able to deal with overlap communication, and hence explicit release of channels and other resources is not necessary.

[Figure 6.8: Mixed-mode adaptation. Bar charts of the per-node adaptation time (ms) over five runs, for (a) Configuration 1 and (b) Configuration 2.]

[Figure 6.9: Quiescence vs. mixed-mode adaptation. Bar charts of the average per-node adaptation time (ms), for (a) Configuration 1 and (b) Configuration 2.]

Figure 6.10 shows the time required for electing a leader by the Id-based and value-based leader election protocols. Both protocols take almost the same amount of time to carry out an election.

[Figure 6.10: Time for leader election protocols. Bar charts of the average election time (ms) for ldrId and ldrVal, for (a) Configuration 1 and (b) Configuration 2.]

Furthermore, from Figures 6.9 and 6.10 it can be observed that the time taken for quiescence adaptation is more than twice the time it takes for leader election, while the time for mixed-mode adaptation is less than one-fourth of the time it takes for leader election. Clearly, if a user at some process requests an election at the same instant when the system decides to adapt using quiescence adaptation, then the user will notice a long delay, which could be up to twice the time normally experienced for an election. However, if mixed-mode adaptation is chosen, the adaptation is almost transparent to the user.

6.4 Limitations of Mixed-Mode Adaptation

One limitation of mixed-mode adaptation is that it requires the adaptation developer to have deeper knowledge of the components involved in the adaptation. Additionally, such adaptation may require support from the components themselves. Nonetheless, in our experience, components involved in mixed-mode adaptation exhibit various levels of mixed-mode behavior. Consequently, based on the details available about the components involved in the adaptation, an adaptation developer can provide an appropriate mixed-mode behavior during adaptation.
However, in cases where components are not conducive to the development of mixed-mode adaptation, adaptation based on system structure, such as quiescence adaptation, may be more appropriate.

Chapter 7

Tradeoffs in Adaptation

In this chapter we identify various tradeoffs that arise in developing adaptation. We identify tradeoffs in verification complexity, completion time, and communication overhead during adaptation. Concurrent executions are generally considered faster than sequential executions. Specifically, with respect to adaptation, we expect that concurrent execution of atomic adaptations (where possible after considering any dependencies) is faster than sequential execution. However, verification complexity increases exponentially with increasing concurrency, and message communication overhead also increases with increasing concurrency. In the rest of this chapter, we first discuss the tradeoff between concurrency and verification complexity in Section 7.1. In Section 7.2, we discuss the tradeoff between concurrency and communication overhead. Finally, we discuss a simple case study that demonstrates tradeoffs in adaptation in a publish-subscribe application.

7.1 Concurrency v/s Verification Complexity

As discussed in Chapter 3, to verify a given adaptation, we consider all possible orderings of concurrent atomic adaptations. As a result, the lattice has multiple paths from the start node to the end node to encompass all possible orderings among concurrent atomic adaptations. This increases the number of intermediate programs that need to be verified. Putting concurrent atomic adaptations in all possible orderings is a potential cause of explosion in the size of the lattice. For example, if an adaptation consists of n atomic adaptations that can be executed concurrently, then there are n! different orderings and 2^n − 2 different intermediate programs. Thus, 2^n − 2 transitional-invariants need to be identified, one corresponding to each intermediate program. The lattice in this case is as shown in Figure 7.1(a) for n = 3. To identify all these transitional-invariants and verify the corresponding intermediate programs is a difficult process.

[Figure 7.1: Executing three atomic adaptations. (a) the full lattice with all orderings; (b) a single path; (c) a partially serialized lattice.]

Clearly, the specification during adaptation is satisfied if the adaptation follows any path in the lattice. So instead of choosing to verify all adaptation paths, if we verify only one path (e.g., a1, a3, a2), then the lattice is as shown in Figure 7.1(b). For the specification during adaptation to be satisfied, the adaptation must follow this path, i.e., a1 should occur before a3, and a3 should occur before a2. In this case there is no concurrency among adaptive actions during adaptation, and we are able to reduce the cost of verification from O(2^n) to O(n). Specifically, for n concurrent atomic adaptations, the number of transitional-invariants that need to be identified is reduced to n − 1. Alternatively, we could have chosen the lattice shown in Figure 7.1(c). In this case the cost of verification is higher than for the lattice in Figure 7.1(b), but lower than for the lattice in Figure 7.1(a). Likewise, the concurrency in the lattice in Figure 7.1(c) is higher than in the lattice in Figure 7.1(b), but lower than in the lattice in Figure 7.1(a).
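To make the verification counts of this section concrete, the following small Python sketch computes both quantities for n concurrent atomic adaptations (the function name is hypothetical).

    from math import factorial

    # n! orderings of n concurrent atomic adaptations; every nonempty proper
    # subset of executed adaptations is a distinct intermediate program: 2^n - 2.
    def lattice_cost(n):
        return factorial(n), 2 ** n - 2

    print(lattice_cost(3))   # (6, 6): the lattice of Figure 7.1(a)
    print(lattice_cost(10))  # (3628800, 1022): full concurrency quickly becomes intractable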
Thus, based on the tradeoff between the concurrency of an adaptation and the complexity of verifying that adaptation, we can choose a subgraph (sublattice) of a given lattice that has all the properties of the lattices defined in Chapters 4 and 5. If we verify only a sublattice, we must also constrain the adaptation so that it follows only paths of that sublattice.

7.2 Concurrency v/s Message Complexity

Many systems are limited by communication overhead and message delays. Specifically, for wireless and mobile systems, the energy-communication tradeoff may require the system to reduce communication overhead whenever possible. When designing adaptation in such systems, communication overhead should also be taken into account. In this section, we show how concurrency during adaptation affects the communication overhead.

[Figure 7.2: Space-time diagram of adaptation, showing the initialization phase (IP) and synchronization phase (SP) across processes p1 to p3: (a) concurrent adaptive actions, (b) serialized adaptive actions.]

Consider the case where all atomic adaptations are executed concurrently. This is described by the lattice in Figure 7.1(a), and the space-time diagram for this adaptation is shown in Figure 7.2(a). We show only the minimum number of adaptation-specific messages in the space-time diagram; there may be other application-specific messages, which we do not consider as they are not related to adaptation. We divide adaptation into two phases: (i) the initialization phase, and (ii) the synchronization phase. In the initialization phase (denoted IP in the figures), the initiator process that decides on the adaptation informs the other processes of this decision. In the synchronization phase (denoted SP in the figures), processes exchange messages to coordinate the execution of adaptive actions.

In Figure 7.2(a), process p2 is the initiator that informs the other processes to start performing any steps required for adaptation. In this case, at least two messages are required to initiate the adaptation. Now, if the adaptation were to occur according to the lattice of Figure 7.1(b), then we can make process p1 the initiator, and the space-time diagram is as shown in Figure 7.2(b). In this case, we eliminated the initialization messages that were required for the adaptation described by Figure 7.2(a). In both cases, the minimum number of adaptation-specific messages required during adaptation is two. Thus, we did not increase the communication overhead by reducing concurrency. We now consider another scenario, where the number of messages can actually be reduced if concurrency is reduced during adaptation.

[Figure 7.3: Adaptation with concurrency: (a) the lattice, (b) the corresponding space-time diagram for processes p1 to p4.]

Consider the lattice of Figure 7.3(a), which describes an adaptation consisting of four adaptive actions a1, a2, a3, and a4 occurring at processes p1, p2, p3, and p4, respectively. Adaptive actions a1 and a2 are independent of each other and can occur concurrently. Similarly, a3 and a4 can occur concurrently. The corresponding space-time diagram is shown in Figure 7.3(b). The minimum number of adaptation-specific messages required during adaptation is five. Now, if we reduce the concurrency in this case, we can use the adaptation described by the lattice of Figure 7.4(a), with the corresponding space-time diagram shown in Figure 7.4(b). In this case, the minimum number of adaptation-specific messages required is reduced to four.
[Figure 7.4: Adaptation with (reduced) concurrency: (a) the lattice, (b) the corresponding space-time diagram.]

Further, if we have no concurrency during adaptation, as described by the lattice of Figure 7.5(a), then the space-time diagram is as shown in Figure 7.5(b). In this case, a minimum of only three adaptation-specific messages is required during adaptation.

[Figure 7.5: Adaptation with no concurrency: (a) the lattice, (b) the corresponding space-time diagram.]

Thus, by reducing concurrency during adaptation, it is possible to reduce the number of messages required during adaptation. However, from the space-time diagrams of Figures 7.2-7.5, it is clear that the time required to complete adaptation is likely to be lower when there is more concurrency during adaptation. Thus, while designing adaptation, one should weigh factors such as concurrency during adaptation, message delays and communication overhead, and verification complexity.

7.3 Case Study: Publish-Subscribe Application

In this section, we illustrate the tradeoffs due to concurrent adaptive actions during adaptation using a simple publish-subscribe application. We consider a publish-subscribe application with two publishers (senders) and two subscribers (receivers). Both receivers subscribe to receive data from both senders. For reliable communication between publishers and subscribers we consider two protocols, namely the proactive protocol based on forward error correction and the reactive protocol based on acknowledgments. These protocols were discussed earlier in Chapters 4 and 5.

[Figure 7.6: Adaptation in the publish-subscribe application: (a) the adaptation lattice, (b) the corresponding space-time diagram, (c) the space-time diagram with all adaptive actions serialized.]

The adaptation in the publish-subscribe application replaces the proactive protocol with the reactive protocol. The adaptation is done by first blocking the two senders. The blocking of the two senders can be done concurrently, i.e., independently of each other. We note that the local guards of the adaptive actions that block the senders need to be true before those actions can be executed. As a result, though the two adaptive actions are independent of each other, they do not necessarily execute at the same instant during adaptation. Once the protocol fractions at both senders are blocked, the protocol fractions at both receivers can be replaced concurrently. Finally, once the receivers have replaced their fractions with the new protocol fractions, the protocol fractions at the senders can be replaced; these replacements can also occur concurrently. The adaptation lattice in this case is shown in Figure 7.6(a), and the corresponding space-time diagram in Figure 7.6(b). The verification complexity and communication overhead can be reduced for this adaptation as discussed in Section 7.1. Specifically, if all the adaptive actions are serialized, then the space-time diagram for the adaptation is as shown in Figure 7.6(c). The communication overhead is reduced from a minimum of 9 messages to a minimum of 5 messages. Also, the number of intermediate programs that need to be considered for verification of the adaptation is reduced from 8 to 5.

Chapter 8

Testing Adaptation

In order to specify and verify the behavior of the system during dynamic adaptation, we presented an approach based on adaptation lattices in Chapters 3, 4, and 5. Due to the complexity of adaptive systems, the verification is often done on an abstract model of the system.
In this chapter, we present an approach for testing adaptation to gain assurance about the implementation of adaptation.

Predicate detection is a common approach used in testing and debugging of distributed systems, as many problems in distributed systems can be formulated as instances of Global Predicate Evaluation (GPE) [78]. Typical properties of distributed systems such as deadlock detection, mutual exclusion, termination, and many more can be tested using predicate detection techniques. Numerous approaches [78-83] have recognized a variety of predicate classes and presented algorithms for predicate detection.

In this chapter, we discuss a predicate detection approach for testing adaptive systems. Due to the overlapping behavior of the old program and the new program during adaptation, the existing algorithms for predicate detection cannot be applied directly. Specifically, these algorithms do not deal with a system whose code is being changed. In many cases, the algorithms can be modified so that if any error is detected during adaptation, it can be mapped to the particular step of adaptation that caused the error.

With this motivation, we extend the existing algorithms to test the system during adaptation. In particular, we classify the predicates to be detected during adaptation into two types: (i) adaptation-stable predicates, and (ii) adaptation-transient predicates. We call a predicate adaptation-stable if the predicate holds throughout adaptation, and adaptation-transient if it holds only in some interval during adaptation. We show how existing algorithms can be modified to detect both types of predicates during adaptation. Furthermore, we show how the cost of testing can be reduced by testing only atomic adaptations.

The rest of the chapter is organized as follows. In Section 8.1, we review the preliminary concepts of distributed computation, causal precedence, consistent global state, and consistent cut. In Section 8.2, we classify the properties to be tested during adaptation. Then, in Section 8.3, we introduce the adaptation vector and give a brief overview of the vector clocks used to track causality among events. Subsequently, in Section 8.4, we discuss algorithms to test adaptation. To reduce the cost of testing, in Section 8.5, we identify a subset of states during adaptation on which to do limited testing. Finally, we give a summary of this chapter in Section 8.6.

8.1 Preliminaries

As discussed in Chapter 3, a program P consists of a set of n processes {p1, p2, ..., pn} communicating via asynchronous messages on interprocess channels. We do not assume the channels to be FIFO (unless the application requires it). Furthermore, in this chapter, we do not specify the channel states explicitly; a channel state can be constructed from the local states of the processes. The execution of a process consists of a sequence of events. An event is the execution of a process action, and is of one of three types: a local (or internal) event, a send event, or a receive event. An action corresponding to a send event has a statement of the form send (m) to p_j. A receive event has a corresponding action of the form rcv (m) from p_j → stmts.

We now present formal definitions of distributed computation, causal precedence, consistent global state, and consistent cut. In this section, we consider partial order semantics for a program.

Definition (Distributed computation).
A distributed computation r of a program P describes a single execution of P by a collection of traces r[i], one for each process p_i. Each r[i] is a finite alternating sequence of states and events. For example, the trace of process p_i is s_i^0 e_i^1 s_i^1 e_i^2 s_i^2 ..., where s_i^k denotes the local state of p_i immediately after event e_i^k, and s_i^0 denotes the initial state before any actions are executed. A distributed computation is commonly depicted using a space-time diagram, as shown in Figure 8.1.

[Figure 8.1: Space-time diagram of a distributed computation over processes p1, p2, and p3, with events e_i^k on each process timeline and messages between them.]

Formally, a distributed computation is a partially ordered set defined by the pair (E, →), where E is the set containing all events and → is the happened-before relation [84] that defines the causal precedence relation on events (or states). The happened-before relation on states is defined as follows:

Definition (Causal precedence). The state s causally precedes the state t (denoted s → t) if and only if one of the following holds:

• s and t are states on the same process and s occurred before t;

• the action following s is the send of a message and the action before t is the corresponding receive;

• there exists a state x such that s → x and x → t.

We use the notation → to denote the causal-precedence relation for both states and events. If for two events e1 and e2 neither e1 → e2 nor e2 → e1, then e1 and e2 are said to be concurrent, denoted e1 ∥ e2.

Not all events are relevant for testing of dynamic adaptation; hence, to simplify the testing, we only consider a subset of the events of a distributed computation. Let R ⊆ E be the set of relevant events. The poset (R, →) describes an abstraction of the distributed computation.

Definition (Global state and cut). We define a global state as a state of all the processes and channels in the system. We, however, ignore the channel states and represent a global state as an n-tuple of local states (s_1^{c_1}, ..., s_n^{c_n}). A channel state can be constructed from the set of all messages that have been sent but not yet received.

A cut C associated with a global state g is a set of events, one event per process, such that event e ∈ C if and only if the process state immediately after event e is a part of g. A global history H associated with the cut C (or the corresponding global state g) of P is defined as the subset h_1^{c_1} ∪ ... ∪ h_n^{c_n}, where h_i^{c_i} is the local history of process p_i containing its first c_i events.

Definition (Consistent cut and consistent global state). A cut C is consistent if for all events e in the corresponding global history H, we have (e ∈ C) ∧ (e′ → e) ⇒ e′ ∈ H. A global state g is consistent if the cut corresponding to it is consistent. Intuitively, a consistent cut corresponds to a view of the run that could be obtained given the existence of a global clock.

Given a space-time diagram, it may not always be possible to say how the global execution actually occurred; for instance, the relative order in which two concurrent events of Figure 8.1 occurred cannot be determined. In other words, there are many total orders of a distributed computation r, and several possible global executions correspond to a given distributed computation r. We call each total order of r an observation. In other words, a sequence of consistent global states g^0 g^1 g^2 g^3 ... is an observation, where g^0 denotes the initial global state (s_1^0, ..., s_n^0), and each global state g^i is obtained from the previous state g^{i−1} by some process executing a single event.
For two such global states g^{i−1} and g^i, we say that g^{i−1} leads-to g^i. The set of all consistent global states of a computation, along with the leads-to relation, defines a lattice of global states, or computation lattice. A path in the computation lattice corresponds to an observation, and each observation has a corresponding path in the computation lattice. Thus, the computation lattice represents the set of all possible observations of the computation.

Remark. In the following sections, unless mentioned otherwise, we will use global state to mean consistent global state.

8.2 Testing

Various global properties of the system need to be tested during adaptation. Examples of properties that can be tested during adaptation include deadlock detection, token loss detection, and, in general, monitoring. We classify the properties to be checked during adaptation into two categories: (i) adaptation-stable properties, and (ii) adaptation-transient properties. If a property is to be satisfied in all states during adaptation, it is known as an adaptation-stable property; if a property is to be satisfied in some interval during adaptation, it is known as an adaptation-transient property. A predicate used to specify an adaptation-stable property is known as an adaptation-stable predicate, and a predicate used to specify an adaptation-transient property is known as an adaptation-transient predicate. From the definition, it is easy to observe that adaptation-stable predicates are specified in terms of variables that are not affected (added or removed) by atomic adaptations. By contrast, adaptation-transient predicates are specified in terms of variables that are affected (added or removed) at some point during adaptation.

Given a test predicate, a predicate evaluation strategy would, in general, construct and test every global state of the system during adaptation. General predicate testing is normally considered impractical for reasonably big systems, as the number of global states can be exponential in the number of processes.

While testing for arbitrary predicates may be very expensive, efficient testing is feasible if the predicates have certain characteristics. Specifically, in [78-83], the authors have identified efficient algorithms for a variety of predicate classes such as conjunctive predicates, disjunctive predicates, observer-independent predicates, and stable or unstable predicates. However, in the context of testing adaptation, these algorithms either cannot be employed directly or are inefficient if used directly, as the code of the system is changing during adaptation. In particular, these algorithms need to be extended to classify the global states constructed during adaptation into specific intermediate programs. Towards this end, we introduce the adaptation vector to distinguish intermediate program states.

8.3 Adaptation Vector

The adaptation vector is used to identify intermediate program states. An adaptation vector A is a vector of n elements, where n is the number of atomic adaptations. Each element of the adaptation vector, denoted A[i], is boolean valued (1 represents true, 0 represents false); A[i] = 1 denotes that atomic adaptation a_i has executed in the past, and A[i] = 0 denotes that atomic adaptation a_i has not yet executed. For simplicity of discussion, we assume that the adaptation consists of n atomic adaptations a1, a2, ..., an and that atomic adaptation a_i occurs at process p_i. Our approach can easily be extended if multiple atomic adaptations occur at a process.
Each node of the adaptation lattice is assigned an adaptation vector. The value [0, ..., 0] is assigned to the start node, which denotes the old program, where no atomic adaptations have occurred. The value [1, ..., 1] is assigned to the end node, which denotes the new program, reached after all atomic adaptations have occurred. If [a1, a2, ..., an] is an adaptation vector, then the value [1, 1, 0, ..., 0] denotes an intermediate program reached after the execution of atomic adaptations a1 and a2. Given an intermediate program and its corresponding adaptation vector, we can determine which atomic adaptations occurred in the past to reach that intermediate program. However, we cannot determine the order (or causal precedence relation) among those atomic adaptations.

8.3.1 Implementation of Adaptation Vectors

Each process keeps its own local adaptation vector AV, which is updated as the process learns about new atomic adaptations. The adaptation vector is maintained as follows:

• When a process p_i executes its own atomic adaptation a_i, it updates the adaptation vector AV_i by AV_i[i] := 1.

• When a process p_i sends a message m to process p_j, it attaches the current value of AV_i to m. This value is denoted m.AV.

• When a process p_i receives a message m from process p_j, it updates its adaptation vector as AV_i := AV_i ∨ m.AV, where the OR operation over vectors is defined on a component-by-component basis.

8.3.2 Implementation of Vector Clocks

A vector clock system [84, 85] is a mechanism that assigns a vector timestamp to each event such that comparing the timestamps of two events indicates the causal relation between them. Each vector timestamp is of size n, where n is the number of processes. Each process p_i has a vector V_i[1..n] of integers, which is maintained as follows:

• V_i[1..n] is initialized to [0, 0, ..., 0].

• If e_i is a relevant event, then p_i increments its vector clock entry as V_i[i] := V_i[i] + 1. It also associates the vector timestamp V_i with the event e_i, denoted e_i.V.

• When a process p_i sends a message m, it attaches the current value of V_i to m. This value is denoted m.V.

• When p_i receives a message m, it updates its vector clock as V_i := max(V_i, m.V), where the maximum operator over vectors is defined on a component-by-component basis.

If e.V and f.V are the timestamps associated with distinct events e and f, respectively, then the fundamental property of vector clocks is the following:

∀(e, f) ∈ R × R : ((e → f) ⇔ e.V < f.V), where e.V < f.V ≡ (∀k : e.V[k] ≤ f.V[k]) ∧ (∃k : e.V[k] < f.V[k]).
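The maintenance rules of Sections 8.3.1 and 8.3.2 translate directly into code. The following Python sketch is a minimal illustration; the class and method names are hypothetical, and only the update rules themselves come from the text above.

    class AdaptationVector:
        """Per-process adaptation vector AV (Section 8.3.1)."""
        def __init__(self, n, i):
            self.av, self.i = [0] * n, i
        def own_adaptation(self):            # process p_i executes a_i
            self.av[self.i] = 1
        def on_receive(self, m_av):          # component-wise OR with piggybacked vector
            self.av = [a | b for a, b in zip(self.av, m_av)]

    class VectorClock:
        """Per-process vector clock V (Section 8.3.2), over relevant events only."""
        def __init__(self, n, i):
            self.v, self.i = [0] * n, i
        def relevant_event(self):            # increment own entry; return the timestamp
            self.v[self.i] += 1
            return list(self.v)
        def on_receive(self, m_v):           # component-wise maximum
            self.v = [max(a, b) for a, b in zip(self.v, m_v)]

    def happened_before(e_v, f_v):           # the test e.V < f.V from the property above
        return all(a <= b for a, b in zip(e_v, f_v)) and e_v != f_v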
8.4 Detecting Global Predicates During Adaptation

Given a global state of the system and the value of the adaptation vector in that state, we can identify the predicates from the adaptation lattice that should be true in that state. Adaptation-stable predicates need to be checked in all global states constructed during adaptation. The intermediate program in which an adaptation-stable predicate was detected can easily be identified from the value of the adaptation vector associated with the state in which the predicate was detected. On the other hand, adaptation-transient predicates need to be checked only in the states of the corresponding intermediate program.

[Figure 8.2: Identifying intermediate program states. Each process timeline is partitioned into sections labeled with the value of its adaptation vector (e.g., 000, 010, 110, 111).]

Figure 8.2 shows how each process is divided into different sections based on the value of its adaptation vector AV. Whenever a local state s_i^k of process p_i is to be collected, the state s_i^k is assigned an adaptation timestamp (denoted s.AV) equal to the current value of AV_i. We construct a global state as an n-tuple of local states (s_1, s_2, ..., s_n). This global state is a state of the intermediate program whose adaptation vector equals s_1.AV ∨ s_2.AV ∨ ... ∨ s_n.AV, where s_i.AV ∨ s_j.AV = [s_i.AV[1] ∨ s_j.AV[1], ..., s_i.AV[n] ∨ s_j.AV[n]].

We now discuss the algorithm for detecting weak conjunctive [79] adaptation predicates. A predicate is called weak if it is true for some observation of the distributed computation; this is similar to the possibly predicates of [78]. Conjunctive predicates are of the form C1 ∧ ... ∧ Cn, where each C_i is a local predicate. This class of predicates allows each process to evaluate its local predicate independently. A weak conjunctive predicate is true if and only if there exists an observation in which all conjuncts are true in some global state. This type of predicate typically describes some bad or undesirable property, in other words, a predicate that should never become true in the system. The algorithms discussed in this section are extensions of the algorithms in [79].

8.4.1 Detecting Adaptation-Stable Predicates

In [79] it has been shown that to detect a weak conjunctive predicate it is necessary and sufficient to find a set of concurrent states in which the local predicates are true. Let wcp = C1 ∧ ... ∧ Cn denote the weak conjunctive predicate to be detected. We discuss the centralized algorithm in which one process serves as a checker process and all other processes involved in wcp are referred to as non-checker processes. The algorithms for the non-checker and checker processes are shown in Figures 8.3 and 8.4, respectively.

Each non-checker process maintains its own vector clock and adaptation vector, whose values are updated as discussed earlier in this section. Whenever the local predicate of a process becomes true for the first time since the most recently sent message (or since the beginning of the trace), it generates a debug message containing its vector clock and adaptation vector and sends it to the checker process.

The checker process maintains a separate queue for each process involved in the wcp. Whenever a debug message is received from a process, it is enqueued in the queue corresponding to that process. It is assumed (for the sake of efficiency) that the checker process receives its messages from any process in FIFO order. If the underlying computation is not FIFO, the checker process can ensure the FIFO property by using sequence numbers in messages. The algorithm compares the vector clock values at the heads of the queues to determine whether a consistent global state can be constructed.
process p_i (non-checker)
var  V : array {initially, ∀j : i ≠ j : V[j] = 0; V[i] = 1}  /* vector clock */
     AV : array {initially, ∀j :: AV[j] = 0}  /* adaptation vector */
     firstflag : boolean {initially true}  /* first time the local predicate is true after any send event */
     local_pred : boolean expression  /* the local predicate to be tested */
[] replace adaptive action {a_i} with
     a_i; AV[i] := 1
[] replace send statement {send (m) to p_j} with
     send (m, V, AV) to p_j;
     V[i], firstflag := V[i] + 1, true
[] replace receive action {rcv (m) from p_j → stmts} with
     rcv (m, m.V, m.AV) from p_j → stmts;
     ∀j : V[j] := max(V[j], m.V[j]);
     ∀j : AV[j] := AV[j] ∨ m.AV[j]
[] add local predicate detection action
     if (local_pred = true) ∧ firstflag then
        firstflag := false;
        send (dbg, V, AV) to p (checker process)
     fi

Figure 8.3: Algorithm for adaptation-stable predicate (non-checker process p_i).

The algorithm is initiated whenever a debug message is received from a non-checker process. If the corresponding queue is non-empty, the message is inserted in the queue; otherwise, it is checked whether the message leads to the case where the conjunctive predicate becomes true. The algorithm maintains the changed and newchanged variables to ensure that only those heads of the queues that have not been compared earlier are compared. When the while loop terminates, if all the queues are non-empty, then the intermediate program in which wcp was first detected can be ascertained.

process p (checker)
var  q_1, ..., q_n : queue of (V, AV)
     changed, newchanged : set of {1, 2, ..., n}
[] rcv (elem) from p_k →
     /* elem.V and elem.AV denote the values of the vector clock and adaptation vector */
     insert(q_k, elem);
     if (head(q_k) = elem) then
        changed := {k};
        while (changed ≠ ∅) do
           newchanged := {};
           for i in changed ∧ j in {1, 2, ..., n}
              if (¬empty(q_i) ∧ ¬empty(q_j)) then
                 if head(q_i).V < head(q_j).V then
                    newchanged := newchanged ∪ {i};
                 fi
                 if head(q_j).V < head(q_i).V then
                    newchanged := newchanged ∪ {j};
                 fi
              fi
           changed := newchanged;
           for i in changed
              deletehead(q_i);
        od  /* end while */
        if ∀i : ¬empty(q_i) then
           found := true;
           int_program_AV := [∨_{i=1..n} head(q_i).AV[1], ..., ∨_{i=1..n} head(q_i).AV[n]]
           /* intermediate program in which the predicate was detected */
        fi
     fi

Figure 8.4: Algorithm for adaptation-stable predicate (checker process).
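The heart of Figure 8.4 is the repeated comparison and pruning of queue heads. The following Python sketch is a simplified, hedged rendering of that loop (all names hypothetical); it also computes the adaptation vector of the detected intermediate program by OR-ing the local adaptation timestamps, as described at the start of Section 8.4.

    from collections import deque

    def hb(a, b):                        # happened-before test on vector timestamps: a < b
        return all(x <= y for x, y in zip(a, b)) and a != b

    def check_wcp(queues):
        """queues[i]: deque of (V, AV) pairs reported by the i-th non-checker.
        Prune any head that happened-before another head; if all queues remain
        non-empty, the heads are pairwise concurrent and wcp holds there."""
        changed = set(range(len(queues)))
        while changed:
            newchanged = set()
            for i in changed:
                for j in range(len(queues)):
                    if queues[i] and queues[j]:
                        if hb(queues[i][0][0], queues[j][0][0]):
                            newchanged.add(i)
                        if hb(queues[j][0][0], queues[i][0][0]):
                            newchanged.add(j)
            for i in newchanged:
                queues[i].popleft()
            changed = newchanged
        if all(queues):
            avs = [q[0][1] for q in queues]
            return [int(any(col)) for col in zip(*avs)]   # int_program_AV
        return None                                       # no consistent cut yet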
8.4.2 Detecting Adaptation-Transient Predicates

Consider a weak conjunctive predicate wcp that is to be checked for some intermediate program I during adaptation. Let I be represented by the adaptation vector I_AV. The goal is to detect wcp during the interval of the adaptation that corresponds to the intermediate program I.

When detecting adaptation-transient predicates, the non-checker processes have to check their local predicates only during the interval of the adaptation defined by the intermediate program in which the adaptation-transient predicate is to be checked. Specifically, each non-checker process identifies a set of local states that are potential states of the intermediate program I. Let the current value of the adaptation vector in state s of process p_i be AV_i, denoted s.AV. We first present the following lemma, used to identify potential states of an intermediate program locally at each process.

Lemma 7.1. For a state s of process p_i, s.AV[i] = 0 ⇒ (∀j : j ≠ i : AV_j[i] = 0).

Proof. From the definition of the adaptation vector in Section 8.3, adaptive action a_i occurs at process p_i. The lemma states that if process p_i has not performed its atomic adaptation, then the adaptation vectors at all other processes indicate the same. Note that the converse does not hold: if the adaptation vectors at all other processes indicate that p_i has not performed its atomic adaptation, it does not follow that p_i has not performed it. The proof is apparent from the implementation of adaptation vectors. □

We now consider the following two cases at each process p_i to identify the potential states of intermediate program I:

Case 1: I_AV[i] = 0. This is the case where intermediate program I occurs before the atomic adaptation at process p_i is executed. A state s is a potential state of the intermediate program I if s.AV ∨ I_AV = I_AV. For example, if I_AV = [1, 1, 0, ..., 0] and s.AV = [1, 0, 0, ..., 0], then s is a potential state of intermediate program I, whereas a state with s.AV = [1, 0, 1, ..., 0] is not.

Case 2: I_AV[i] = 1. This is the case where intermediate program I is reached after the atomic adaptation at process p_i is executed. From Lemma 7.1, it is clear that in this case checking the condition s.AV ∨ I_AV = I_AV is not sufficient. For example, if s.AV = [0, 0, ..., 0], then the condition holds, but the atomic adaptation at process p_i has not executed (because s.AV[i] = 0). Therefore, we also need to check that s.AV[i] = 1, i.e., that the atomic adaptation at process i has executed. In other words, we also need to check the condition s.AV[i] = I_AV[i].
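Combining the two cases gives a simple local test, sketched below in Python under hypothetical names: s_av and i_av are the adaptation vectors of the local state and of intermediate program I, and i is the index of this process's own atomic adaptation.

    # Local test: is state s (with adaptation vector s_av) a potential state of
    # intermediate program I (with adaptation vector i_av) at process p_i?
    def potential_state(s_av, i_av, i):
        covered = all((a | b) == b for a, b in zip(s_av, i_av))   # s.AV OR I_AV = I_AV
        if i_av[i] == 0:                     # Case 1: I precedes a_i
            return covered
        return covered and s_av[i] == 1     # Case 2: additionally s.AV[i] = I_AV[i]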
Based on the above discussion, we modify the algorithm of Figure 8.3 so that the non-checker process can detect adaptation-transient predicates. The modified non-checker process algorithm is shown in Figure 8.5. In the algorithm, plist maintains a list of entries consisting of a predicate, the corresponding intermediate program in which it needs to be checked, and the checker process that is checking that predicate.

We maintain one checker process algorithm for each adaptation-transient predicate that needs to be checked. The checker process algorithm is modified so that whenever it finds a consistent state (i.e., all queues are non-empty), it checks the adaptation vectors to determine whether that consistent state belongs to intermediate program I. The modified checker process algorithm to detect an adaptation-transient predicate is shown in Figure 8.6.

process p_i (non-checker)
var  V : array {initially, ∀j : i ≠ j : V[j] = 0; V[i] = 1}  /* vector clock */
     AV : array {initially, ∀j :: AV[j] = 0}  /* adaptation vector */
     firstflag : boolean {initially true}  /* first time the local predicate is true after any send event */
     plist (IP_AV, local_pred, cp) : array (array, boolean expression, process id)  /* local predicates to be tested */
[] replace adaptive action {a_i} with
     a_i; AV[i] := 1
[] replace send statement {send (m) to p_j} with
     send (m, V, AV) to p_j;
     V[i], firstflag := V[i] + 1, true
[] replace receive action {rcv (m) from p_j → stmts} with
     rcv (m, m.V, m.AV) from p_j → stmts;
     ∀j : V[j] := max(V[j], m.V[j]);
     ∀j : AV[j] := AV[j] ∨ m.AV[j]
[] add local predicate detection action
     for p in plist
        if p.IP_AV[i] = AV[i] ∧ (p.IP_AV ∨ AV) = p.IP_AV ∧ p.local_pred = true ∧ firstflag then
           firstflag := false;
           send (dbg, V, AV) to p.cp (checker process)
        fi

Figure 8.5: Algorithm for adaptation-transient predicates (non-checker process p_i).

process p (checker)
var  q_1, ..., q_n : queue of (V, AV)
     changed, newchanged : set of {1, 2, ..., n}
     IP_AV : array  /* adaptation vector of the intermediate program being checked */
[] rcv (elem) from p_k →
     /* elem.V and elem.AV denote the values of the vector clock and adaptation vector */
     insert(q_k, elem);
     if (head(q_k) = elem) then
        changed := {k};
        while (changed ≠ ∅) do
           newchanged := {};
           for i in changed ∧ j in {1, 2, ..., n}
              if (¬empty(q_i) ∧ ¬empty(q_j)) then
                 if head(q_i).V < head(q_j).V then
                    newchanged := newchanged ∪ {i};
                 fi
                 if head(q_j).V < head(q_i).V then
                    newchanged := newchanged ∪ {j};
                 fi
              fi
           changed := newchanged;
           for i in changed
              deletehead(q_i);
        od;  /* end while */
        if ∀i : ¬empty(q_i) then
           int_program_AV := [∨_{i=1..n} head(q_i).AV[1], ..., ∨_{i=1..n} head(q_i).AV[n]];
           if IP_AV = int_program_AV then found := true fi
        fi
     fi

Figure 8.6: Algorithm for adaptation-transient predicate (checker process).

8.5 Testing Only Atomic Adaptations

In Section 8.4, we showed how we can test predicates that have certain characteristics during adaptation. However, if the test run is large, the cost of testing increases. It is, therefore, desirable to reduce the cost of testing by performing partial testing on a small subset of states rather than testing the entire run of the adaptation. In this case, the states chosen should be such that testing predicates in those states still gives reasonable assurance about the correctness of the adaptation. With this motivation, we choose the states before and after each atomic adaptation for testing.

For the adaptation to be correct, each atomic adaptation should occur in some "safe state" of the system. Informally, each atomic adaptation should occur when the system is ready to deal with this change. Thus, instead of checking all the states of the intermediate programs, we can check all the possible states in which an atomic adaptation could have occurred (or could have led to). Moreover, since the set of possible states before and after atomic adaptations is expected to be much smaller than the set of all states, it may also be possible to check arbitrary predicates in those states.

For example, consider the adaptation lattice of Figure 4.2. In this case, when atomic adaptation a3 is executed, the predicate associated with node R1 should be true, and when a3 has completed execution, the predicate associated with node Q should be true.

We now discuss how we calculate, for each atomic adaptation, the set of states in which the atomic adaptation could have occurred. The approach for finding the set of states after each atomic adaptation is similar. We first define some notions that we borrow from Venkatesan and Dathan [82]. A spectrum spec_i in process p_i is a sequence of consecutive states of p_i. The first (respectively, last) event of spec_i is denoted first_i(spec_i) (respectively, last_i(spec_i)). A state s_i^f in spec_i is the first state in spec_i if ∀s : s ∈ spec_i : s_i^f → s.
Similarly, a state s_i^l in spec_i is the last state in spec_i if ∀s : s ∈ spec_i : s → s_i^l. The first consistent cut state of s_i^f at process p_k, denoted by first_k(s_i^f), is the earliest state s_k in p_k such that s_k and s_i^f can be in a consistent state. Similarly, the last consistent cut state of s_i^l at process p_k, denoted by last_k(s_i^l), is the latest state s_k in p_k such that s_k and s_i^l can be in a consistent state. For example, in Figure 8.1 of Section 8.1, if s_2 denotes the state of process p_2 immediately after some event e_2, then first_1(s_2) and last_1(s_2) are, respectively, the earliest and latest states of process p_1 that can belong to a consistent state containing s_2.

Now, let e_i = a_i be the atomic adaptation at process p_i, and let s_i^a be the state of process p_i just before the execution of atomic adaptation a_i. To identify the set of global states in which a_i could have occurred, we need to identify the local states at all processes that are consistent with state s_i^a. We first identify the spectrum of states in each process that are consistent with s_i^a. Let spec_j(a_i) denote the spectrum of states in process p_j, possibly empty, that are consistent with state s_i^a. We have

    spec_j(a_i) = [first_j(s_i^a), last_j(s_i^a)]

Note that spec_i(a_i) = s_i^a. We use the property of the vector clock to find first_j(s_i^a) = s_j^l and last_j(s_i^a) = s_j^m. If s_i^a.V[j] = 0, i.e., no state of p_j precedes s_i^a, then l = 0; otherwise, l is chosen such that

    s_j^l.V[j] = s_i^a.V[j]  ∧  s_j^(l−1).V[j] < s_i^a.V[j],

and m is chosen such that

    (s_j^m.V[i] ≤ s_i^a.V[i])  ∧  ((s_j^m = final_state_j) ∨ (s_j^(m+1).V[i] > s_i^a.V[i])).

We use final_state_j to denote the state in which process p_j terminates.

In this manner, we can find the spectra spec_1(a_i), ..., spec_n(a_i) at processes p_1, ..., p_n, respectively. Since the trace is typically small, it may be possible to construct all possible consistent states from these spectra. This allows us to check even arbitrary predicates in all possible states in which the atomic adaptation could have occurred. Furthermore, given all these spectra, we can use the algorithms from Venkatesan and Dathan [82] to test for different types of predicates using the intersection of these spectra. Moreover, if the predicates are of certain types, we can also use the algorithms discussed in Section 8.4 on the collected trace for each atomic adaptation.
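The following minimal Java sketch (a simplification under assumed representations, not the dissertation's code) computes, for one process p_j, the index bounds of spec_j(a_i) from recorded vector clocks; vj[x] is assumed to hold the vector clock of p_j's x-th state, and sa the vector clock of the state just before a_i at process p_i.

// Minimal sketch (hypothetical representation): bounds of spec_j(a_i).
public final class Spectrum {

    /** Index of first_j(s_i^a): earliest state of p_j consistent with s_i^a. */
    public static int firstIndex(int[][] vj, int[] sa, int j) {
        if (sa[j] == 0) return 0;          // no state of p_j precedes s_i^a
        int l = 0;
        while (vj[l][j] < sa[j]) l++;      // first state with V[j] = s_i^a.V[j]
        return l;
    }

    /** Index of last_j(s_i^a): latest state of p_j consistent with s_i^a. */
    public static int lastIndex(int[][] vj, int[] sa, int i) {
        int m = 0;
        // advance while the next state of p_j knows no event of p_i
        // beyond s_i^a (i.e., its V[i] does not exceed s_i^a.V[i])
        while (m + 1 < vj.length && vj[m + 1][i] <= sa[i]) m++;
        return m;                          // final state if the loop runs out
    }
}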
8.6 Chapter Summary

In this chapter, we discussed testing of adaptation in distributed systems using predicate detection. We identified two classes of predicates, namely adaptation-stable and adaptation-transient predicates, that occur during adaptation. We introduced the adaptation vector to identify intermediate program states when testing dynamic adaptation.

Furthermore, to reduce the cost of testing, we identified a subset of states on which to do limited testing of adaptation. Specifically, instead of checking all intermediate program states, we presented an approach to test the system before and after each atomic adaptation. We discussed an approach to calculate the spectra at each process that are consistent with the state before (or after) each atomic adaptation. We can then use the existing algorithms for predicate detection on this smaller trace.

Chapter 9

Component Family: Design of Adaptive Components

In this chapter, we discuss the component family design to support runtime adaptation. We first discuss, in Section 9.1, the issues that arise in developing compositional adaptation. Then, in Section 9.2, we present the abstract design of a component family. In Section 9.3, we discuss the concrete design of a component family. Next, in Sections 9.4 and 9.5, we illustrate the use of component families through two case studies. We discuss some questions related to component family design in Section 9.6. In Section 9.7, we discuss the related work and, finally, in Section 9.8, we summarize the advantages of component families.

9.1 Introduction

In compositional adaptation, which is the primary focus of this dissertation, some algorithmic or structural parts of the system are added, removed or replaced at runtime. Compositional adaptation enables an application to adopt new algorithms and strategies for addressing concerns that were not known at the time the original application was developed.

Typically, in the case of compositional adaptation, the separation of concerns principle [64, 86] is employed to separate the adaptation concern. Separation of concerns is a ubiquitous software engineering principle which states that, for a given problem, different kinds of concerns should be identified and separated in order to cope with complexity and achieve robustness, adaptability, maintainability and reusability. In other words, it states that software should be decomposed in such a way that different aspects of the problem are solved in well-separated modules or parts of the software. Consequently, adaptive applications are designed to separate the functionality that gets adapted from the rest of the application.

In object-oriented programming, separated concerns are modeled as objects and classes, while in structural programming these concerns are modeled as functions or procedures. We argue that these two programming paradigms are not adequate to model concerns that spread across multiple processes. For example, in the case of adaptive distributed systems, the functionality that gets adapted may involve changes to multiple processes across the system. To deal with this, we consider the component-based paradigm, where the basic units of concern are components [28, 87].

A component is defined as follows: "a component is an executable unit of code that provides physical black-box encapsulation of related services. Its services can only be accessed through a consistent, published interface that includes an interaction standard. A component must be capable of being connected to other components (through a communications interface) to form a larger group" [87]. Although objects or procedures form the underlying fabric of software solutions, it is components that provide the effective granularity to model adaptive systems. In other words, object-oriented programming or structural programming may be used to realize component-based systems. Thus, compositional adaptation can be considered as the addition, removal or replacement of components.

Existing approaches to building adaptive software have some notion of a component, which is similar to the one defined above. In many of the approaches, the term component refers to a replaceable class or object [88]. However, in our approach, a component can consist of several objects, and these objects may be spread across different processes of a system. Regardless of how a component is defined, adaptive software needs some mechanisms for examining and understanding the state of its components (introspection) and also some mechanisms to modify the components (intercession). There are three main issues that occur while developing introspection and intercession mechanisms for compositional adaptation:

• Reference update.
As shown in Figure 9.1, when a component is replaced (respectively, added or removed), the references pointing to the component need to be updated. Specifically, when an old component is exchanged for a new component, it is necessary to update all the references to the old component so that they refer to the new component. This is important to ensure that the system is not left in an inconsistent state after component replacement. A common solution to this problem is achieved through indirection, which allows decoupling of an application. The standard practice of programming to an interface [89] is useful in achieving this indirection. This allows an application to access a component while ensuring that the access methods are not coupled to the implementation of the component.

[Figure 9.1: Reference update during adaptation. (a) Incorrect/incomplete update. (b) Correct/complete update.]

• State transfer. When a component is replaced, the state of the old component needs to be transferred to the new component. This is necessary for performance as well as correctness reasons. It is not optimal to let the new component start from its initial state, as the new component may have to redo computations that the old component already performed. Moreover, in some cases it may be incorrect to have the new component start from an arbitrary initial state. If two components are to be interchanged at runtime, then mechanisms to extract the state from the old component and inject the equivalent state into the new component need to be considered.

• Synchronization. When a component is added, removed or replaced, it is important to control access to the component during the change. For example, once the state of the old component is extracted for transfer to the new component, the application should not modify the old component, as doing so would lead to unexpected results. It is necessary to have proper intra-process and inter-process synchronization during adaptation, as a lack of synchronization can lead to undesirable and incorrect behavior.

We now consider the three main limitations concerning most of the existing approaches for building adaptive software:

1. Tight coupling. Most adaptive software systems have an entity that is normally categorized as a framework or a composer, whose goal is to perform the adaptation-related steps. In other words, it primarily deals with the above three issues, namely, reference update, state transfer and synchronization. Most approaches to adaptive software are tightly coupled in terms of the components that are adapted and the framework that performs adaptation. Specifically, the frameworks are either not reusable with other components and applications, or the role of the framework is not clearly defined. Many times the introspection and intercession related steps are tightly coupled with the component functionality. These kinds of coupling lead to the following four problems: (i) verification of adaptation becomes difficult, (ii) independent development of components is restricted, (iii) reuse of components and of the framework is restricted, and (iv) modifying the introspection and intercession mechanisms is difficult.

2. Large number of adaptations. There are several components available for a given functional requirement. For example, to provide reliable communication one can use a proactive component based on forward error correction or a reactive component based on acknowledgments.
The main motivation for adaptation is to provide an appropriate component for a given functionality. Now, consider the scenario where there are n different components that provide similar interfaces and functionality, and the choice of the component depends on application requirements and environment conditions. In this case, there are n(n − 1) possible adaptations among these components. Considering the fact that each adaptation needs to address the introspection and intercession issues and also leads to the four problems related to tight coupling, identifying all these adaptations is a difficult and infeasible task. Moreover, if the developer of a component wants to expose the details of that component to only a subset of the developers of other components, then adaptation between that component and the remaining components may not be possible.

3. Unanticipated components. While developing adaptations among existing components, it is difficult to anticipate new components that would be developed later. Furthermore, while developing a new component, one does not know about all the existing components that can potentially be replaced by the new component. Moreover, in the case where one does not know the details of some components, it is impossible to provide assurance for adaptations to and from those components. As there is a growing need to build systems that are autonomic in nature, it is desirable to build applications that can support the use of unanticipated components with little or no human intervention while still being able to guarantee assurance of adaptation.

To overcome the above limitations, we introduce the notion of a component family: a systematic and extensible repository of components that provide similar functionality. There are two main aspects to a component family: (i) the overall design of the library of components, and (ii) the design of the individual components in the library. The overall design of a component family is based on the ideas of program families as proposed by Parnas [90] for the study of reusable software. We first list the design goals for a component family and then give a formal definition of a component family:

1. Adaptation ready. The components should be designed with adaptation in mind. In other words, components should provide the additional methods required during adaptation. For example, the components discussed in [3, 91] provide a checkState function to check the state of the component for safety, a block function to temporarily block the component operations, etc.

2. Separation of adaptation logic. The adaptation logic should be separate from the core functionality of the components, thereby simplifying the specification and verification of adaptation. The application developer is primarily concerned with using component functionality and should not have to deal with the details of adaptation among components.

3. Replaceable adaptation logic. Adaptation from one component to another can be done in multiple ways. The design should support easy replacement of the adaptation logic. For example, as discussed in Chapter 6, replacing one component by another can be done either using quiescence adaptation or mixed-mode adaptation. Clearly, the adaptation logic should be modeled separately and should be easy to replace.

4. Language and platform neutral. The design should be independent of any particular language or platform so that a component family can be implemented in different languages on a variety of platforms.

5. Application independent.
The design should be applicable to components from different domains.

6. Extensible. The design should be extensible so that new components that are developed later can be easily integrated into the family.

Considering the above goals, we first discuss the abstract design of a component family and then describe the concrete representation of the family.

9.2 Abstract Component Family

Definition. A component family is a strongly connected directed graph, say (V, E), where (i) each vertex in V denotes a component, (ii) all components have similar interfaces and, syntactically, one component in the family can be replaced by another, and (iii) each arc (v1, v2) ∈ E denotes that there exists an adaptation from v1 to v2.

We illustrate a component family in Figure 9.2. Consider the subgraph consisting of vertices A, B, C, and D, which represent the four components in the family. Direct adaptations are defined for each arc in this graph. The arc from A to D denotes that there exists an adaptation from A to D. In other words, arc (A, D) implies that there is a corresponding adaptation lattice (as discussed in Chapter 3) that defines the adaptation from A to D. We say that arc (A, D) is a verified arc if a verified adaptation from A to D exists; otherwise stated, there exists a transitional-invariant or transitional-faultspan lattice corresponding to the adaptation from A to D. Furthermore, as shown in Figure 9.2, there exists a path from component A to component B, which implies that there exists a sequence of adaptations through which A can be replaced by B. Such a path exists from A to D, from D to C, and from C to B.

[Figure 9.2: An example of a component family.]
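To illustrate the definition, the following is a minimal Java sketch (a hypothetical API, not part of the dissertation's implementation) of a component family as a directed graph, with a breadth-first search that recovers a sequence of adaptations such as A → D → C → B.

import java.util.*;

// Minimal sketch: a component family as a directed graph whose vertices
// are component names and whose arcs are (verified) adaptations.
public final class ComponentFamilyGraph {
    private final Map<String, List<String>> adaptations = new HashMap<>();

    // Record that an adaptation from 'from' to 'to' exists.
    public void addAdaptation(String from, String to) {
        adaptations.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Breadth-first search: a path is a sequence of adaptations through
    // which 'from' can be replaced by 'to'.
    public List<String> adaptationSequence(String from, String to) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>(List.of(from));
        parent.put(from, null);
        while (!queue.isEmpty()) {
            String v = queue.poll();
            if (v.equals(to)) {
                LinkedList<String> path = new LinkedList<>();
                for (String cur = to; cur != null; cur = parent.get(cur))
                    path.addFirst(cur);
                return path;
            }
            for (String w : adaptations.getOrDefault(v, List.of()))
                if (!parent.containsKey(w)) { parent.put(w, v); queue.add(w); }
        }
        return List.of(); // not reachable: the family is not strongly connected
    }
}

For the family of Figure 9.2, adding the arcs (A, D), (D, C) and (C, B) and querying adaptationSequence("A", "B") would return the path [A, D, C, B].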
9.2.1 Abstract Component Structure

As discussed earlier in this section, in order to keep the graph of a component family strongly connected, we need at least two adaptations associated with each component, such that one adaptation is to the component and one adaptation is from the component. A component has different adaptation-related actions corresponding to the different adaptations that it is involved in. However, the part of the component that performs the actual functionality remains the same irrespective of the adaptations.

Given these adaptation requirements, each component in a component family is designed to consist of two parts: (i) a functional part, and (ii) an adapt-active part. The adapt-active part is involved in the state-transfer and synchronization related actions that are needed only during adaptation. In other words, the functions of the adapt-active part are invoked only during adaptation.

Each component in a family consists of exactly one functional part. A component may have zero or more adapt-active parts. An adapt-active part of a component corresponds to a particular adaptation the component is involved in. A component may not have an adapt-active part (in other words, it may have an empty adapt-active part) if the component does not perform any introspection or intercession related actions during adaptation. Also, a component may have an adapt-active part that is shared among multiple adaptations that it is associated with. From an implementation perspective, depending on the adaptation that the component is involved in, the appropriate adapt-active part corresponding to that adaptation should be loaded before adaptation. This can be triggered internally (by some monitoring module) or externally (by a user).

[Figure 9.3: Structure of a component in a family. Each component consists of a functional part and adapt-active parts (e.g., Aad and Ada for A; Bbd and Bcb for B; Cdc and Ccb for C; Dad, Dda, Dbd and Ddc for D).]

For example, consider the components A, B, C, and D in a component family as shown in Figure 9.3. The component B has two adapt-active parts, Bbd and Bcb. The part Bbd is used during adaptation from B to D, and the part Bcb is used during adaptation from C to B.

The abstract design of a component family, as discussed in this section, is useful when developing high-level abstract models of the components and adaptation. However, there are several concerns that need to be addressed from an implementation perspective while developing component families. In the next section, we discuss the concrete representation of a component family.

9.3 Concrete Component Family

The various architectural parts of a component family and the high-level relationships among them are shown in Figure 9.4.

[Figure 9.4: Architecture of a component family. DecisionMaker, AdaptationController, AdaptationHandlers, ComponentManager, StateMappingHandlers and the components; AdaptationController instantiates AdaptationHandlers, which inspect and initialize the components, while ComponentManager delegates to the components.]

• Components. At the core of the component family is the set of components that provide the actual functionality required by the application/user. In the absence of a component family, the application developer uses the components directly.

• Component Manager. To provide adaptation transparency to the application using the components, we decouple the application from the components by introducing an entity between the application and the components. We call this entity ComponentManager. ComponentManager accepts requests from the application and delegates each request to the currently active component. It also controls the exchange of information between DecisionMaker and AdaptationController.

• Decision Maker. DecisionMaker decides what is the appropriate component to be used based on the environment conditions. An entity external to the component family may also decide on when and what to adapt. For example, a user requirement may define the adaptation. In this case, the DecisionMaker that is part of the family may be either disabled or made to coordinate with the external decision maker to reach a decision. The details of DecisionMaker are beyond the scope of this work. The reader is referred to [92-95] for more information on decision makers.

• Adaptation Controller. Once ComponentManager has learnt of the decision from DecisionMaker and decided to adapt, it passes that information to AdaptationController. AdaptationController selects an appropriate AdaptationHandler to execute the adaptation.

• Adaptation Handlers. Each AdaptationHandler implements the logic for an adaptation. It executes tasks related to reference updates, synchronization, and state transfer. Specifically, it deals with any dependency and synchronization related issues while adding, removing, or replacing the components. It also instantiates an appropriate StateMappingHandler to perform any required state transfer during adaptation.

• StateMapping Handlers. StateMappingHandlers are responsible for mapping the state of the old component to the new component during component replacement. There is a separate StateMappingHandler for each adaptation.
Various mappings are discussed in Chapter 3, and StateMappingHandlers implement those mappings.

[Figure 9.5: Interfaces of a component family. ComponentManager implements IComponent, ICMAdaptReady, IDecisionMaker and IAdaptationController; the components implement IComponent and IAdaptReady; the application uses the IComponent interface of ComponentManager.]

9.3.1 Component Family Interfaces

Figure 9.5 shows how the different parts of a component family are related to ComponentManager. It shows that ComponentManager implements four interfaces, namely, IComponent, ICMAdaptReady, IDecisionMaker and IAdaptationController. All the functional components in the family provide two interfaces: IComponent and IAdaptReady. The ICMAdaptReady interface extends the IAdaptReady interface. The IComponent interface specifies the core functionality provided by the component. The IAdaptReady interface specifies the adaptation-related functionality provided by the component; it maps to the adapt-active parts discussed earlier in the abstract design of a component family. The application is external to the component family, and it uses the IComponent interface provided by ComponentManager.

ComponentManager instantiates the component that will provide the functionality requested by the application. At any time, ComponentManager maintains exactly one active instance of some component. It uses the delegation pattern to delegate any service request from the application to the active component.

ComponentManager also provides the IDecisionMaker interface for use by the decision maker. The choice of how the decision maker is implemented affects the design of IDecisionMaker. The following are the two main issues that need to be considered while designing IDecisionMaker: (i) whether the decision maker will be a part of the family or external to the family, and (ii) in the case of a distributed system, which process will host the decision maker. In the case where the decision maker is part of the family, ComponentManager creates and initializes an instance of DecisionMaker. DecisionMaker uses the IDecisionMaker interface to collect information from ComponentManager that is required to make a decision about adaptation and to inform ComponentManager of the decision.

ComponentManager also provides the IAdaptationController interface, which is used by AdaptationController. ComponentManager creates and initializes an instance of AdaptationController and informs AdaptationController of the decision made by DecisionMaker. AdaptationController uses IAdaptationController to get details about the current active component and any other information that it requires.

Once AdaptationController has learnt of the decision, it selects an appropriate AdaptationHandler that will perform the adaptation. As shown in Figure 9.6, each AdaptationHandler extends AbstractAdaptationHandler, which provides the IAdaptationHandler interface that is used by AdaptationController. Each AdaptationHandler also instantiates an appropriate StateMappingHandler, which performs any required state transfer to map the state of the old component to the new component.

AdaptationHandlers and StateMappingHandlers use the ICMAdaptReady interface of ComponentManager to communicate with the components in order to coordinate the adaptation and state transfer.
ComponentManager delegates the requests received from AdaptationHandlers and StateMappingHandlers to the active component.

[Figure 9.6: Adaptation handlers. AbstractAdaptationHandler provides adapt() and is extended by AdaptationHandler_1 through AdaptationHandler_N, each implementing the IAdaptationHandler interface.]

We now give a brief overview of the delegation pattern used by ComponentManager.

9.3.1.1 Delegation Pattern

The delegation pattern is a technique where a component outwardly expresses some behavior but delegates the actual responsibility of implementing that behavior to another component. Delegation is a way of extending and reusing a class by writing another class with additional functionality that uses instances of the original class to provide the original functionality. The chain of responsibility and observer patterns [89] use the delegation pattern.

Sample code in the Java programming language that illustrates the delegation pattern is shown in Figure 9.7. In this example, the ComponentManager class has instances of two components, ComponentX and ComponentY. ComponentManager, ComponentX, and ComponentY all implement the IComponent interface. ComponentManager maintains an active instance of one component (which can be changed) and can delegate the call to the foo function to either ComponentX or ComponentY.

// Common interface
public interface IComponent {
    public void foo();
}

// Component X
public class ComponentX implements IComponent {
    public void foo() {
        // Component X implementing foo
    }
}

// Component Y
public class ComponentY implements IComponent {
    public void foo() {
        // Component Y implementing foo
    }
}

// Component Manager
public class ComponentManager implements IComponent {
    IComponent instance = new ComponentX();

    // For delegation to X
    public void toX() { instance = new ComponentX(); }

    // For delegation to Y
    public void toY() { instance = new ComponentY(); }

    // foo is delegated to the active instance
    public void foo() { instance.foo(); }
}

Figure 9.7: An example of the delegation pattern.
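As a usage sketch (hypothetical client code, using the classes of Figure 9.7), switching the active instance changes where subsequent calls land without the caller being aware of it:

// Hypothetical client of the delegation example in Figure 9.7.
public class Demo {
    public static void main(String[] args) {
        ComponentManager manager = new ComponentManager();
        manager.foo();   // delegated to ComponentX (the initial instance)
        manager.toY();   // switch the active instance
        manager.foo();   // now delegated to ComponentY
    }
}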
9.3.2 Component Family Instantiation

In the previous subsection, we discussed the different interfaces of a component family. In this section, we discuss the instantiation of a component family. We consider two cases depending on the type of components represented by the component family: (i) local components, and (ii) distributed components.

We use the term local components to refer to components that are installed at a single process, and distributed components to denote components installed across multiple processes. Examples of adaptations involving local components include: (i) replacing a local monitoring component that monitors some environmental conditions and takes appropriate actions, and (ii) replacing a logging component that stores a log of events at a process. In this case, the application of a component family is relatively easy. The issues related to inter-process synchronization do not occur; as a result, the AdaptationHandlers are also less complex. We focus our discussion on distributed components, as they offer more challenges, and the results for distributed components can be trivially applied to local components.

We first identify the challenges during adaptation involving distributed components that any implementation of a component family needs to address, and then we discuss our implementation.

• Atomicity. When a distributed component is replaced in an application, we need to ensure atomicity. In other words, all local fractions of the distributed component should be replaced across all (or selected) processes of the application, or none should be replaced.

• Minimal blocking. Another challenge is that during adaptation involving a distributed component, the application should not be blocked. While it is desirable that the adaptation of a distributed component be entirely non-blocking, it is not always possible to do so, due to dependencies among component fractions. We, therefore, require that the blocking introduced during the adaptation be minimal.

• Synchronization. To avoid interference between concurrent versions of distributed components, some global synchronization is often required. For example, if a process that has added a component fraction interacts with another that has not, then the results can be unpredictable.

• Transparency and noninterference. The execution of an application that is using a distributed component should not be affected while the adaptation is underway.

The code that addresses these challenges is implemented as AdaptationHandlers in the component family. The design of the component family separates out the implementation of AdaptationHandlers, thereby allowing different ways to deal with these challenges. Several implementations of adaptation logic, written by different people, can be part of the same family, and an appropriate implementation can be chosen for execution based on environment conditions and application requirements. In the next section, we discuss one particular implementation.

Remark. For the sake of brevity, we will use the term component to mean distributed component in the rest of this chapter.

9.3.3 Component Family Implementation

In this subsection, we discuss an implementation of AdaptationHandler, which specifies the adaptation logic to replace one component by another. The ComponentManager at one of the processes acts as the initiator. The DecisionMaker at the initiator process may be the only active DecisionMaker in the system, or, if there are DecisionMakers at other processes, then the DecisionMaker at the initiator process may coordinate with the DecisionMakers at the other processes to decide on adaptation. Regardless, once the ComponentManager at the initiator process learns of the decision and decides to adapt, it informs AdaptationController. The AdaptationController instantiates an appropriate AdaptationHandler that will execute the adaptation. The AdaptationHandlers at the different processes coordinate to execute the adaptation. We use a variation of the distributed reset protocol [96, 97] to achieve inter-process synchronization among component fractions. We first discuss the distributed reset protocol in brief and then give details of how it is used to achieve synchronization during adaptation.

9.3.3.1 Distributed Reset Protocol

The reset subsystem in [96, 97] can be embedded in an arbitrary distributed system to allow processes to reset the system to a given global state. In the model described in [96, 97], each process consists of an application module and a reset module. The application module at any process may begin the reset operation. The function of a reset module is to (1) reset the state of the application to a state that is reachable from the given global state, and (2) inform the application module when the reset operation is complete.

Each reset operation satisfies the following two properties: (1) every reset operation is non-premature, i.e., if the reset operation completes, then all processes have been reset and the program state is reachable from the given global state; and (2) every reset operation eventually completes, i.e., if an application module at a process initiates a reset operation, eventually the reset module at that process informs the application module that the reset operation is complete.
The reset solutions in [96, 97] allow the program computation to proceed concurrently with the reset, to any extent that does not interfere with the correctness of the reset. To simplify the reset operation, the algorithms in [96, 97] maintain a rooted spanning tree of all non-failed processes. This spanning tree is used to perform a diffusing computation [76] in which each process resets its state. The diffusing computation begins at the root of the spanning tree. The root of the tree resets the state of its local application module and initiates a reset wave that propagates along the tree towards the leaves; whenever the reset wave reaches a process, the process resets the state of its local application module and propagates the reset wave to its children. After the reset wave reaches a leaf, it is reflected as a completion wave towards the root. A process propagates the completion wave to its parent when it receives the completion wave from all its children. The reset is complete when the root receives the completion wave from all its children.

We now discuss how the distributed reset protocol is used to replace a component across a distributed application.

9.3.3.2 Using the Distributed Reset Protocol for Adaptation

The AdaptationHandlers at all the processes communicate using a variation of the distributed reset protocol. The reset protocol consists of two waves: a reset-initialization wave and a replacement wave. The replacement wave consists of two sub-waves, namely, a reset-transition wave and a reset-completion wave. We first present an outline describing the reset process and then explain each of the reset waves in detail.

The AdaptationHandler at the initiator process initiates the reset by sending the reset-initialization wave. In the reset-initialization wave, all processes change to the transit state and initialize their fraction of the new component. Thus, in the transit state, a process has initialized the new component fraction, although it is still using the old component fraction. After all processes have set themselves into the transit state, the reset-initialization wave completes successfully. In case any process does not set itself into the transit state during the reset-initialization wave, the reset-initialization wave completes unsuccessfully. The option of unsuccessful completion is provided to deal with the case where the processes need to obtain their component fraction remotely and fail to do so, or where sufficient resources are not available at a process to start the new component. If the reset-initialization wave completes unsuccessfully, the component replacement is abandoned.

Upon successful completion of the reset-initialization wave, the AdaptationHandler at the initiator starts the replacement wave. The replacement wave begins with the reset-transition wave from the initiator (root) towards the leaves. The AdaptationHandler at each process receiving the reset-transition wave removes the old component fraction and adds the new component fraction.
It uses services provided by the IAdaptReady interface of the component fraction to ensure the safety (correctness) of such a replacement. The IAdaptReady interface of each component fraction provides a checkState function to determine the state of the component fraction. After a leaf process has completed the replacement of its component fraction, it sends the reset-completion wave to its parent. Further, if a non-leaf process has completed the replacement of its component fraction and it has received the reset-completion wave from all of its children, it propagates the reset-completion wave to its parent. The reset-completion wave eventually reaches the initiator. We allow another reset to start once the first reset has completed (successfully or unsuccessfully). We now explain the reset waves in detail.

Reset-initialization wave. The AdaptationHandler at the initiator initializes the reset by sending the reset-initialization wave to all of its neighbors. The AdaptationHandler uses the adaptation initialization protocol to communicate with the other reset modules. The adaptation initialization protocol communicates information such as the name of the component, the location of the server where components are available, etc. The AdaptationHandler at each process that receives the reset-initialization wave performs the following tasks:

1. It sets its parent to the first process from which it received the reset-initialization wave.

2. It propagates the reset-initialization wave to all of its neighbors except its parent.

3. If a process receives the reset-initialization wave again, it informs the sender of the identity of its parent. This information is used to form a tree.

4. If the process that receives the reset-initialization wave is a leaf (i.e., no neighbor process has set its parent to this process), it initializes the new component and sets itself into the transit state, where it is still using the old component while waiting to use the new component. If the process fails to initialize the new component, it sets itself into the error state. The process then communicates its state (transit or error) to its parent.

5. When a process has received the transit state message from all of its children, it sets itself into the transit state by initializing the new component. If it receives the error state information from any of its children, or if it fails to initialize the new component fraction, it sends the error state message to its parent. Eventually, the root process receives the transit state or the error state information from its children. If it receives the error state information from any of its children, it can restart the reset-initialization wave or abandon the component replacement, based on the threshold value set for the number of reset-initialization waves that can be initiated. If the component replacement is abandoned, the other processes are informed about it so that they can return to the normal state. If the root process receives the transit state information from all of its children, it initializes the new component and sets itself into the transit state.

Reset-transition wave. When all processes are in the transit state, i.e., at the successful completion of the reset-initialization wave, the AdaptationHandler at the initiator starts the reset-transition wave. The AdaptationHandler at each process that receives the reset-transition wave performs the following tasks:

1. It propagates this wave to all of its children.
2. It invokes the checkState function of the component, which returns one of three values: safetoremove, safetoblock or unsafetoremove.

(a) If the function returns safetoremove, the AdaptationHandler requests the ComponentManager to remove the old component fraction and activate the new component. It then sets itself into the normal state.

(b) If the function returns safetoblock, the AdaptationHandler requests the ComponentManager to block the component fraction. It periodically calls checkState until it returns safetoremove. After a component fraction receives information about other component fractions being removed or other application processes being blocked, the checkState function at the blocking process eventually returns safetoremove. When that occurs, we follow case 2a.

(c) If the function returns unsafetoremove, it is periodically invoked until it returns safetoblock or safetoremove, in which case we follow case 2b or 2a, respectively.

During the transition phase, the AdaptationHandler at each process will normally invoke a sequence (zero or more) of operations provided by the ICMAdaptReady interface. These operations are invoked before or after each invocation of checkState. The details of which operations are invoked depend on the adaptation logic and are specific to the application and the component at hand. Typically, these operations are related to state transfer and to ensuring that the checkState function eventually returns safetoremove.

Reset-completion wave. The transition wave is reflected towards the initiator (root) as the reset-completion wave. A leaf process sends the reset-completion wave to its parent after it removes the old component fraction and starts using the new component fraction. Any non-leaf process that completes the component fraction replacement and receives the reset-completion wave from all of its children sends the reset-completion wave to its parent. Once the initiator has replaced its component fraction and received the reset-completion wave from all of its children, the component replacement is complete.
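The following minimal Java sketch (hypothetical names and interfaces; the concrete handler classes are described in Section 9.4) illustrates the reset-transition step at one process, driven by the three checkState outcomes discussed above.

// Minimal sketch: one process's handling of the reset-transition wave,
// polling checkState as in cases 2a-2c above. All names are hypothetical.
enum FractionState { SAFETOREMOVE, SAFETOBLOCK, UNSAFETOREMOVE }

interface AdaptReady {
    FractionState checkState();
    void block();   // temporarily block the old fraction's operations
}

interface ComponentSwitch {
    void removeOldAndActivateNew();  // replace the fraction, update references
}

final class TransitionWave {
    // Poll checkState until the old fraction can be removed safely.
    static void onTransitionWave(AdaptReady fraction, ComponentSwitch mgr)
            throws InterruptedException {
        boolean blocked = false;
        while (true) {
            switch (fraction.checkState()) {
                case SAFETOREMOVE:
                    mgr.removeOldAndActivateNew();   // case 2a
                    return;                          // then report completion to parent
                case SAFETOBLOCK:
                    if (!blocked) { fraction.block(); blocked = true; }  // case 2b
                    break;
                case UNSAFETOREMOVE:
                    break;                           // case 2c: keep polling
            }
            Thread.sleep(50);  // arbitrary polling interval for the sketch
        }
    }
}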
9.3.3.3 Customizing the Reset Protocol

In the above discussion, we laid out a general overview of how the reset protocol is used in adaptation. However, depending on the application context, each implementation of AdaptationHandler customizes the reset protocol. For example, depending on the components and the application, the adaptation logic to replace a component may not require both reset waves. In the case of the mixed-mode adaptation discussed in Chapter 6, we need only one reset wave. Moreover, the operations invoked at each process during the reset waves will normally vary depending on the components at hand. Specifically, as discussed above, during the transition wave each process will invoke a sequence of operations of the ICMAdaptReady interface based on the adaptation logic.

9.3.3.4 Fault-Tolerant Reset Protocol

By treating the reset protocol discussed earlier as an intolerant application and using the fault-tolerance components from [96, 97], we can make the reset protocol fault-tolerant. If we were to add the fault-tolerance component from [96], the resulting protocol would ensure that stabilizing fault-tolerance [59] is provided for faults including process/channel failures/repairs and transients. Thus, even if these faults occur, eventually the application will recover to a state from where subsequent component replacements will be correct. If we were to add the fault-tolerance component from [97], then, in addition to the stabilizing fault-tolerance to these faults, the resulting protocol would provide masking fault-tolerance to process/channel failures/repairs. Thus, if only process/channel failures/repairs occur, then the component replacement will always be correct. Moreover, if more general faults such as transients occur, then the protocol will recover to a state from where subsequent component replacements will be correct.

9.4 Case Study: Leader Election Component Family

In this section, we discuss the case study of the leader election component family, which we denote by LEFamily. LEFamily consists of components that implement a leader election protocol. In Chapter 6, we discussed the abstract models of two leader election protocols, namely ldrId and ldrVal, and the mixed-mode adaptation from ldrId to ldrVal. In this section, we discuss the details of the different interfaces and the implementation of the different parts of LEFamily.

[Figure 9.8: Interfaces of the leader election component family.]

9.4.1 Interfaces of the Family

The interfaces of LEFamily are shown in Figure 9.8. We now discuss the functions provided by each interface (a minimal Java sketch of the two component-side interfaces follows the list):

• IComponent. This interface provides the functions supported by the leader election components.

  - initialize is used by the application to initialize the leader election component before using it. The initialization parameters that a component needs from an application include the host address and the neighbor list.

  - getLeader returns the address of the leader. In our implementation, getLeader blocks if leader election is underway. getLeader can also have a non-blocking implementation that returns the address of the last known leader process.

  - startElection is used by the application to start the election.

  - isPerformingElection is used by the application to check if the protocol is performing an election.

• IAdaptReady. This interface provides the functions that correspond to the adapt-active parts of the component. They include the functions that are used by AdaptationHandlers to perform adaptation.

  - adaptInitialize is used to initialize the component during adaptation. It uses StateMappingObject to read the state of the previous component.

  - checkState returns one of three values: safetoremove, safetoblock or unsafetoremove.

  - block blocks the component from starting a new election.

  - unblock removes the block so that the component can start a new election.

  - remove releases any resources and transfers the current state of the component to StateMappingObject.

• IAdaptReadyComponent. This interface extends the IComponent and IAdaptReady interfaces. Both leader election components in LEFamily implement this interface.

• ICMAdaptReady. This interface extends IAdaptReady and provides additional functions that are used by AdaptationHandlers to perform adaptation. It includes the activateComponent function, which activates the new component (so that calls to ComponentManager are now delegated to the new component).

• IAdaptationController. This interface provides getCurrentComponent, which is used by AdaptationController to get the current active component.

• IDecisionMaker. This interface provides getCurrentComponent, which is used by DecisionMaker to get the current active component, and notifyDecision, which informs ComponentManager of the decision. ComponentManager provides this interface only at the initiator node (a designated node that will initiate adaptation), as DecisionMaker is only available at the initiator node.

• IComponentManager. This interface extends the IComponent, ICMAdaptReady, IAdaptationController and IDecisionMaker interfaces. The IDecisionMaker interface is extended by IComponentManager only at the initiator node.
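As referenced above, here is a minimal Java sketch of the two component-side interfaces, transcribed from the function lists; the parameter and return types are assumptions made for illustration, and the types are shown together for brevity.

// Sketch of the component-side interfaces of LEFamily (assumed signatures).
final class StateMappingObject { /* opaque holder for transferred state */ }

interface IComponent {
    void initialize(String hostAddress, java.util.List<String> neighbors);
    String getLeader();              // blocks if an election is underway
    void startElection();
    boolean isPerformingElection();
}

interface IAdaptReady {
    enum State { SAFETOREMOVE, SAFETOBLOCK, UNSAFETOREMOVE }

    void adaptInitialize(StateMappingObject state); // read the old component's state
    State checkState();
    void block();                    // prevent the component from starting a new election
    void unblock();
    void remove(StateMappingObject state);          // release resources, save state
}

// Both leader election components implement this combined interface.
interface IAdaptReadyComponent extends IComponent, IAdaptReady {}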
[Figure 9.9: Classes of the leader election component family. LEComponentManager implements IComponentManager and maintains DecisionMaker, AdaptationController and the active IAdaptReadyComponent; QuiescenceAdaptationHandler and MixedModeAdaptationHandler extend AbstractAdaptationHandler and use QuiescenceSMHandler and MixedModeSMHandler, respectively; LeaderElectionId and LeaderElectionVal implement IAdaptReadyComponent and share StateMappingObject.]

9.4.2 Classes of the Family

The class diagram of LEFamily is shown in Figure 9.9. It consists of the following classes:

• LEComponentManager. This class implements the IComponentManager interface. At the initiator node, it creates and maintains an instance of DecisionMaker and informs DecisionMaker once it starts using a new component. It also maintains an instance of AdaptationController. At the initiator node, it informs AdaptationController of the new component that needs to be installed. At the non-initiator nodes, it creates an instance of AdaptationController, which listens on the adaptation port. The initialize function of the LEComponentManager class is used to create an instance of LeaderElectionId or LeaderElectionVal. It delegates the calls received on IComponent and IAdaptReady to the active instance of the component.

• LeaderElectionId. This class implements the leader election protocol based on process id. The abstract version of the protocol is discussed in Chapter 6.

• LeaderElectionVal. This class implements the leader election protocol based on process value. The abstract version of the protocol is discussed in Chapter 6.

• DecisionMaker. This class is installed only at the initiator node, and it notifies LEComponentManager of the decision.
In our current implementation, DecisionMaker gets the decision (on when and what to adapt) directly from the user and passes that decision to LEComponentManager.

• AdaptationController. This class uses the strategy pattern [89] to instantiate the class that implements the adaptation logic. It has two implementations:

  - Initiator. At the initiator node, it gets its input from ComponentManager. Upon receiving the input, it initializes the appropriate AdaptationHandler (QuiescenceAdaptationHandler or MixedModeAdaptationHandler). In our current implementation, we let the user choose the type of adaptation. Another possibility is to let AdaptationController itself decide the type of adaptation based on some rules.

  - Non-Initiator. At the non-initiator nodes, AdaptationController listens on the adaptation port for requests from the other nodes. Based on the initialization parameters received during the reset-initialization wave, it instantiates an appropriate AdaptationHandler.

  The AdaptationController can be extended to check for the availability of the new component, to retrieve the new component from the server, and to verify that the new component satisfies the contract of the family. In our current implementation, we assume that the new component is available and is verified.

• AbstractAdaptationHandler. In our current implementation, this class provides the abstract adapt function, whose behavior is defined in the derived classes. This class can be extended to provide behaviors common to all AdaptationHandlers.

• QuiescenceAdaptationHandler. This class is derived from AbstractAdaptationHandler, and it implements the quiescence adaptation between LeaderElectionId and LeaderElectionVal. At the initiator node, it acts as the root of the reset protocol, and at the non-initiator nodes it acts as a participant in the reset protocol.

• MixedModeAdaptationHandler. This class is similar to QuiescenceAdaptationHandler, except that it implements mixed-mode adaptation.

• QuiescenceSMHandler. This class is used by QuiescenceAdaptationHandler to do the state mapping from LeaderElectionId to LeaderElectionVal (and vice versa) during adaptation. It gets a reference to LEComponentManager from QuiescenceAdaptationHandler, and it also maintains StateMappingObject. It stores the state of the old component in StateMappingObject and uses it to initialize the state of the new component.

• MixedModeSMHandler. This class is similar to QuiescenceSMHandler; however, it implements the mapping required by mixed-mode adaptation.

• StateMappingObject. This class stores the state of the components during adaptation. A reference to this object is passed to the old component, which stores its current state in the object. The StateMappingHandlers apply an appropriate state mapping on this object to map the state of the old component to the new component, and they then pass the reference to this object to the new component during its initialization.

9.4.3 Performance Results

When comparing the performance of adaptive systems with non-adaptive systems, it is reasonable to assume that adaptive systems will have some performance overhead. When the system is not adapting, the adaptive system is functionally equivalent to the corresponding non-adaptive system. Therefore, there should be little or no overhead from the mechanisms that make non-adaptive systems adaptive.

[Figure 9.10: Component family performance. Average leader election time at each node, with and without the component family.]
We now present experimental results to show that the component family design does not impose any significant overhead on the use of components. The chart in Figure 9.10 shows the average time of leader election (using the ldrId component) at each node for two cases: (i) the application using the leader election protocol directly, and (ii) the application using the component family. The time of leader election is calculated from the instant when the application makes a request for leader election. In the case of the component family, the application's request for leader election is delegated to the active component (in this case, ldrId). We observe that the time for leader election in both cases is almost the same. At three nodes (node 1, node 2 and node 4), the time for leader election is less when the component is used directly, whereas at the other two nodes (node 3 and node 5) it is slightly more. We conclude that the overhead induced by the component family is negligible. In particular, any overhead due to delegation is within the variance of the time for leader election.

9.5 Case Study: Reliable Communication Component Family

In this section, we discuss the component family consisting of components that provide reliable communication. We describe the abstract version of the component family in this section; the details of the concrete version (interfaces and classes) are similar to those of the case study of Section 9.4. There are various components available for providing reliable communication. We consider three components in this family (cf. Figure 9.11), namely, (i) the forward error correction (fec) component, (ii) the acknowledgment (ack) component, and (iii) the forward error correction with acknowledgment (fecAck) component. Components fec and ack are similar to the proactive component discussed in Chapter 4 and the reactive component discussed in Chapter 5, respectively. For simplicity, we discuss only one sender and one receiver; however, the case study can easily be extended to multiple senders and multiple receivers.

[Figure 9.11: Reliable communication component family. The three components fec, ack and fecAck, with the adaptations among them shown as arcs.]

9.5.1 Components of the Family

We now discuss the abstract models of the components and the adaptations involved in the component family. In Chapters 4 and 5, we gave details of these components. The components discussed in this chapter have unbounded buffers, whereas the reactive component discussed in Chapter 5 had a bounded buffer. Also, while the reactive component in Chapter 5 used both acknowledgments and negative acknowledgments, the ack component discussed in this chapter uses only negative acknowledgments.

Forward Error Correction (fec) Component. The fec component is used to deal with message loss by sending extra packets. The receiver can recover lost packets without requesting retransmission of the lost packets. The fec component consists of two types of fractions: encoder and decoder. The encoder fraction is added at the sender process, and the decoder fraction is added at the receiver process. The encoder takes (n − k) data packets and encodes them to add k parity packets. It then sends the group of n (data and parity) packets. The decoder needs to receive at least (n − k) packets of a group to decode all the data packets. Thus, if fewer than k packets in a group are lost, then the receiver can recover the lost packets by using the extra packets from that group.
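To make the (n − k) data / k parity scheme concrete, here is a minimal Java sketch of the special case k = 1, where the single parity packet is the XOR of the data packets and any one lost packet of a group can be reconstructed. This is an illustration of the idea only, not the encoding used by the fec component below, which is left abstract as fec_encode/fec_decode.

import java.util.Arrays;

// Minimal illustration of FEC with k = 1: one XOR parity packet per group.
// Any single lost packet in a group of n can be recovered from the others.
public final class XorParityFec {

    // Encode (n-1) data packets into a group of n packets (data + parity).
    static int[] encode(int[] data) {
        int[] group = Arrays.copyOf(data, data.length + 1);
        int parity = 0;
        for (int d : data) parity ^= d;
        group[data.length] = parity;
        return group;
    }

    // Recover the packet at index 'lost', given the other n-1 packets.
    static int recover(int[] group, int lost) {
        int x = 0;
        for (int i = 0; i < group.length; i++)
            if (i != lost) x ^= group[i];
        return x;  // XOR of the rest equals the missing packet
    }

    public static void main(String[] args) {
        int[] group = encode(new int[] {7, 13, 42});   // n = 4, k = 1
        System.out.println(recover(group, 1));         // prints 13
    }
}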
The abstract version of the fec component is shown in Figure 9.12, which consists of the encoder fraction s_fec at sender process s and the decoder fraction r_fec at receiver process r. The encoder fraction reads the input from sQ, encodes the input and sends it to r. The decoder fraction receives the input from s, decodes it and delivers it to the output listener, rQ.

Component fec
Fraction s_fec
inp sQ : queue of integer
    r
var n, k, u, l, m : integer {initially, u = l = m = 0}
    ean : array [integer, 0..n−1] of integer {initially, ean = ⊥}
begin
  encode : ¬isEmpty(sQ) →
      ean[u, 0..n−1] := fec_encode(head(sQ, n − k)); u := u + 1
[] fec.send : ean[l, m] ≠ ⊥ →
      Cs,r := Cs,r ∘ {l, m, ean[l, m]};
      m := (m + 1) mod n; if m = 0 then l := l + 1 fi
end

Fraction r_fec
inp rQ : queue of integer
    s
var n, k, x, y, p, m : integer {initially, p = 0}
    rbqu : array [integer, 0..n−1] of integer {initially, rbqu = ⊥}
begin
  fec.receive : ¬isEmpty(Cs,r) →
      x, y, m := head(Cs,r); rbqu[x, y] := m
[] decode : count(rbqu[p, 0..n−1] ≠ ⊥) ≥ (n − k) →
      rQ := rQ ∘ fec_decode(rbqu[p, 0..n−1]); p := p + 1
end

Figure 9.12: Forward error correction component.

Acknowledgment (ack) Component. The ack component deals with message loss by retransmitting lost packets. It uses negative acknowledgments to confirm message loss. It consists of an encoder fraction at the sender and a decoder fraction at the receiver. The encoder fraction adds a group number and a packet number to each packet. If it receives a negative acknowledgment for any packet, it sends that packet again to the decoder fraction. When the decoder fraction detects a packet loss, it sends a negative acknowledgment to the encoder fraction. The decoder fraction detects a packet loss when it starts receiving packets from a new group before all the packets of the current group have been received. When all packets in a group are received, it delivers them to the output listener.

The abstract version of the ack component is shown in Figure 9.13, which consists of the encoder fraction s_ack at sender process s and the decoder fraction r_ack at receiver process r. Similar to the fec component, the encoder fraction has sQ and r as its input parameters, and the decoder fraction has rQ and s as its input parameters.

Component ack
Fraction s_ack
inp sQ : queue of integer
    r
var n, g, p, gna, pna, m : integer {initially, p = g = 0}
    snt : array [integer, 0..n−1] of integer {initially, snt = ⊥}
begin
  ack.send : ¬isEmpty(sQ) →
      snt[g, p] := {g, p, head(sQ)}; Cs,r := Cs,r ∘ snt[g, p];
      p := (p + 1) mod n; if p = 0 then g := g + 1 fi
[] resend : type(Cr,s) = nack →
      gna, pna := head(Cr,s); Cs,r := Cs,r ∘ snt[gna, pna]
end

Fraction r_ack
inp rQ : queue of integer
    s
var n, k, x, y, p, m : integer {initially, p = 0}
    rbqu : array [integer, 0..n−1] of integer {initially, rbqu = ⊥}
begin
  ack.receive : ¬isEmpty(Cs,r) →
      x, y, m := head(Cs,r); rbqu[x, y] := m
[] deliver : count(rbqu[p, 0..n−1] ≠ ⊥) = n →
      rQ := rQ ∘ rbqu[p, 0..n−1]; p := p + 1
[] send_nack : count(rbqu[p+1, 0..n−1] ≠ ⊥) > 0 →
      for k = 0 to n − 1
        if rbqu[p, k] = ⊥ then Cr,s := Cr,s ∘ nack(p, k) fi
end

Figure 9.13: Acknowledgment component.
Forward Error Correction with Acknowledgment (fecAck) Component. The fecAck component uses both forward error correction and acknowledgments. If the rate of message loss is high and more than k packets are lost in a group, then negative acknowledgments can be used for retransmission of the lost packets. If the rate of message loss is low and fewer than k packets are lost in a group, then the receiver can recover the lost packets without requesting any retransmission.

The abstract version of the fecAck component is shown in Figures 9.14 and 9.15, which consists of the s_fecAck fraction at the sender process and the r_fecAck fraction at the receiver process.

Component fecAck
Fraction s_fecAck
inp  sQ : queue of integer
     r
var  n, k, u, l, m : integer {initially, u = l = m = 0}
     ean : array [integer, 0..n-1] of integer {initially, ean = ⊥}
     ackMode : boolean {initially, true}
begin
   encode       : ¬isEmpty(sQ) ->
                    ean[u, 0..n-1] := fec_encode(head(sQ, n-k)); u := u + 1
[] fecAck.send : ean[l, m] ≠ ⊥ ->
                    Cs,r := Cs,r o {l, m, ean[l, m]};
                    if ackMode = true then snt[l, m] := {l, m, ean[l, m]} fi;
                    m := (m + 1) mod n; if m = 0 then l := l + 1 fi
[] resend      : ackMode = true ∧ type(Cr,s) = nack ->
                    gna, pna := head(Cr,s); Cs,r := Cs,r o snt[gna, pna]
end

Figure 9.14: Forward error correction with acknowledgment: sender fraction.

Component fecAck
Fraction r_fecAck
inp  rQ : queue of integer
     s
var  n, k, x, y, p, m : integer {initially, p = 0}
     rbqu : array [integer, 0..n-1] of integer {initially, rbqu = ⊥}
     ackMode : boolean {initially, true}
begin
   fec.receive : ¬isEmpty(Cs,r) ->
                   x, y, m := head(Cs,r); rbqu[x, y] := m
[] decode      : count(rbqu[p, 0..n-1] ≠ ⊥) >= (n - k) ->
                   rQ := rQ o fec_decode(rbqu[p, 0..n-1]); p := p + 1
[] send_nack   : ackMode = true ∧ count(rbqu[p+1, 0..n-1] ≠ ⊥) > 0 ->
                   for k = 0 to n-1 : if rbqu[p, k] = ⊥ then Cr,s := Cr,s o nack(p, k) fi
end

Figure 9.15: Forward error correction with acknowledgment: receiver fraction.

9.5.2 Adaptations of the Family

We now consider the adaptations that exist in this family, as shown in Figure 9.11. There are four adaptations in this family, namely, (1) fecAck to fec, (2) fec to fecAck, (3) fecAck to ack, and (4) ack to fecAck. The arcs in the graph are labeled accordingly. (Note that the maximum number of possible adaptations in a family of three components is six, and the minimum number of adaptations required to keep the graph strongly connected is three.)

The adapt-active parts of the fractions that are involved in the adaptations are shown in Figures 9.16 and 9.17. The adapt-active parts of the fractions contain the atomic adaptations that are associated with them.

Adapt-active parts for adaptation 1 : fecAck to fec
Fraction: s_fecAck
   a11s : true -> ackMode := false
   a12s : a11s ∧ a11r -> sremove s_fecAck[n, k, u, l, m, eQ]
Fraction: r_fecAck
   a11r : true -> ackMode := false
   a12r : a11s ∧ a11r -> sremove r_fecAck[n, k, p, m, rbqu]
Fraction: s_fec
   a12s : a12s(s_fecAck) -> sadd s_fec[n, k, u, l, m, eQ]
Fraction: r_fec
   a12r : a12r(r_fecAck) -> sadd r_fec[n, k, p, m, rbqu]

Adapt-active parts for adaptation 2 : fec to fecAck
Fraction: s_fec
   a21s : true -> sremove s_fec[n, k, u, l, m, eQ]
Fraction: r_fec
   a21r : true -> sremove r_fec[n, k, p, m, rbqu]
Fraction: s_fecAck
   a21s : a21s(s_fec) -> sadd s_fecAck[n, k, u, l, m, eQ]; ackMode := true
Fraction: r_fecAck
   a21r : a21r(r_fec) -> sadd r_fecAck[n, k, p, m, rbqu]; ackMode := false
   a22r : a21s -> ackMode := true

Figure 9.16: Adaptations 1 and 2 in the reliable communication component family.
The name of each atomic adaptation has three subscripts (e.g., a13s). The first subscript denotes the adaptation (cf. the label of the arc in Figure 9.11). The second subscript denotes the order in the sequence of atomic adaptations; if two atomic adaptations have the same second subscript, they can be executed in either order. The third subscript denotes the process that the atomic adaptation is associated with. For example, the atomic adaptation a13s denotes that, in adaptation 1 (fecAck to fec), it is third in the sequence of atomic adaptations and it occurs at process s.

Adapt-active parts for adaptation 3 : fecAck to ack
Fraction: s_fecAck
   a31s : true -> block encode
   a33s : a32r -> remove s_fecAck
Fraction: r_fecAck
   a32r : a31s ∧ ean[l, m] = ⊥ ∧ isEmpty(Cs,r) -> remove r_fecAck
Fraction: s_ack
   a34s : a33s ∧ a33r -> add s_ack
Fraction: r_ack
   a33r : a32r -> add r_ack

Adapt-active parts for adaptation 4 : ack to fecAck
Fraction: s_ack
   a41s : true -> block send
   a43s : a42r -> remove s_ack
Fraction: r_ack
   a42r : a41s ∧ isEmpty(Cs,r) -> remove r_ack
Fraction: s_fecAck
   a44s : a43s ∧ a43r -> add s_fecAck
Fraction: r_fecAck
   a43r : a42r -> add r_fecAck

Figure 9.17: Adaptations 3 and 4 in the reliable communication component family.

Adaptation 1: fecAck to fec, having an enhanced-primitive relationship. Consider the fec and fecAck components shown in Figures 9.12, 9.14 and 9.15. The fecAck component is syntactically compatible with fec and it also provides all the services that fec provides. In this case, fecAck is an enhanced version of fec, and fec is a primitive version. During the adaptation that replaces fecAck by fec, the fractions can be changed arbitrarily, provided that (i) the fecAck component is running in a primitive mode before the adaptation begins, and (ii) the state of the fecAck component is transferred to the fec component.

As shown in Figure 9.16, first ackMode is set to false (a11s and a11r); as a result, the fecAck component is now running in primitive mode, i.e., in a mode compatible with fec. Now the fractions can be changed arbitrarily (a12s and a12r). There are two atomic adaptations named a12s, one associated with fraction s_fecAck and the other associated with fraction s_fec. This implies that sremove and sadd can be considered as one atomic action that affects both fractions s_fecAck and s_fec. Similarly, the removal of r_fecAck and the addition of r_fec are done in an atomic manner. Note that sremove and sadd carry an extra argument, which represents the state information that is transferred from the existing component to the new component. In this case, the state of the fecAck component is transferred to the fec component. The adaptation lattice corresponding to this adaptation is shown in Figure 9.18(a).

Figure 9.18: Adaptation lattices.
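The atomic sremove/sadd step with state transfer can be pictured as in the following Java sketch. All names here (FractionSwapper, FractionState, detach, attach) are hypothetical, and the transferred state is reduced to the scalar fields n, k, u, l and m (the buffered queue is omitted); the point is only that removal of the old fraction, state transfer, and addition of the new fraction happen as one step that excludes concurrent message handling.

/*
 * Sketch of an atomic fraction swap: the old fraction is detached
 * (sremove), its state is handed to the replacement, and the
 * replacement is attached (sadd), all under one lock.
 */
public final class FractionSwapper {

    /* State carried across the swap (queue contents omitted here). */
    public record FractionState(int n, int k, int u, int l, int m) { }

    public interface Fraction {
        void onMessage(int message);       // normal message handling
        FractionState detach();            // sremove: stop and export state
        void attach(FractionState state);  // sadd: start from imported state
    }

    private final Object lock = new Object();
    private Fraction active;

    public FractionSwapper(Fraction initial) {
        this.active = initial;
    }

    /* Deliver a message to whichever fraction is currently active. */
    public void handle(int message) {
        synchronized (lock) {
            // message handling and swapping exclude each other, so no
            // message is processed while the swap is in progress
            active.onMessage(message);
        }
    }

    /* Replace the active fraction atomically, transferring its state. */
    public void swap(Fraction replacement) {
        synchronized (lock) {
            FractionState s = active.detach();
            replacement.attach(s);
            active = replacement;
        }
    }
}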
Adaptation 2: fec to fecAck, having a primitive-enhanced relationship. As discussed for the previous adaptation, in this case the fractions can be changed arbitrarily, as the components share a primitive-enhanced relationship. We need to ensure that (i) the state of the fec component is transferred to the fecAck component, and (ii) fecAck runs in a primitive mode until the adaptation is complete. Initially, ackMode is set to false, so fecAck runs in a mode that is compatible with fec. After the adaptation is complete, ackMode is set to true. The adaptation from fec to fecAck is shown in Figure 9.16, and the adaptation lattice corresponding to this adaptation is shown in Figure 9.18(b).

Adaptation 3: fecAck to ack. The adaptation that replaces fecAck by ack is shown in Figure 9.17. Unlike adaptations 1 and 2, the components fecAck and ack do not share an enhanced-primitive relationship, because the fecAck component is not designed to run in an ack-only mode. Specifically, the fecAck.send action of fecAck is not compatible with the ack.send action of ack. Hence, the fecAck component does not provide all the functionality that the ack component does. Therefore, the fractions cannot be changed arbitrarily. First, the encoder fraction s_fecAck at sender s needs to be blocked from encoding more packets (a31s). Once all the packets that have already been encoded are sent by s and received by receiver r, the decoder fraction r_fecAck can be removed (a32r). After the r_fecAck fraction is removed, the removal of the s_fecAck fraction (a33s) and the addition of the r_ack fraction (a33r) can be done in any order. Finally, after the s_fecAck fraction at s is removed and the r_ack fraction at r is added, the encoder fraction s_ack is added at s (a34s). The adaptation lattice corresponding to this adaptation is shown in Figure 9.18(c). (Note that fecAck and ack could be modified so that they satisfy the enhanced-primitive relationship. We have chosen not to do so in order to illustrate the case where the components are not related by an enhanced-primitive relationship.)

Adaptation 4: ack to fecAck. The adaptation that replaces ack by fecAck is similar to adaptation 3 discussed above. Since the components do not share an enhanced-primitive relationship, the fractions need to be changed in a specific order, as shown in Figure 9.17. The adaptation lattice corresponding to this adaptation is shown in Figure 9.18(d).

9.6 Discussion

In this section, we address some questions related to the component family design.

Can there be multiple adaptation paths between two components? If yes, then which adaptation path should be chosen?

Yes, there can be multiple paths between two components. In this case, additional factors can be taken into account when deciding the appropriate path for adaptation. To this end, we could associate several factors with each arc, e.g., the time or resources required for the adaptation, or the types of faults that can be tolerated during the adaptation. Based on the factors associated with each arc, we can compute the characteristics of the different paths. These characteristics can then be used to determine the suitable path.

How do we develop components that satisfy the enhanced-primitive relationship?

One way to design such components is to use inheritance. Inheritance provides syntactic compatibility between components. To ensure semantic compatibility, we need to extend the inheritance relationship such that a derived component provides all the services that its parent component provides. A sketch of this idea appears below.
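As an illustration, the following Java sketch (with hypothetical interfaces; it does not show the dissertation's concrete component family classes) captures this relationship for the fec/fecAck pair of Section 9.5: the enhanced interface extends the primitive one for syntactic compatibility, and an ackMode flag lets the enhanced component behave exactly like the primitive component, giving semantic compatibility.

import java.util.HashMap;
import java.util.Map;

interface FecSender {                      // primitive component's interface
    void send(int group, int[] packets);
}

interface FecAckSender extends FecSender { // enhanced component's interface
    void onNack(int group, int packet);    // extra service of the enhanced component
    void setAckMode(boolean on);           // false = behave like the primitive fec
}

class FecAckSenderImpl implements FecAckSender {
    private boolean ackMode = true;
    private final Map<Integer, int[]> sent = new HashMap<>();

    @Override public void send(int group, int[] packets) {
        if (ackMode) {
            sent.put(group, packets.clone()); // keep a copy for retransmission
        }
        transmit(group, packets);
    }

    @Override public void onNack(int group, int packet) {
        int[] g = sent.get(group);
        if (ackMode && g != null) {
            transmit(group, new int[] { g[packet] }); // retransmit one packet
        }
    }

    @Override public void setAckMode(boolean on) {
        ackMode = on;                         // primitive mode when false
        if (!on) {
            sent.clear();                     // no retransmission state needed
        }
    }

    private void transmit(int group, int[] packets) {
        /* hand the packets to the channel; elided in this sketch */
    }
}

Setting ackMode to false corresponds to the primitive mode used in adaptations 1 and 2.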
How does the component family design help when adaptation involves components providing different functionalities, say security and reliability?

In this case, there will be two separate component families, namely, a family of components that provide reliability and a family of components that provide security. The application will have to perform separate adaptations for the reliability and security components. There may exist a scenario, although undesirable as it violates the principle of separation of concerns, where some component, say C, provides reliability as well as security. In this scenario, C will be present in both families. Now, if the application that is using C to provide security and reliability decides to use only a security component, then the application can perform an adaptation to replace C with a security component. Also, the existence of such components could be used in adaptations where one needs to trade off between two desirable properties such as reliability and security. Specifically, if the application were to replace a reliability component by a security component (perhaps because of environment changes that require it to have security, but where reliability cannot be provided due to other constraints such as energy management), then the application could first replace the reliability component with C and then replace C by the security component.

How can we perform an adaptation where some component is removed but not replaced by another component?

If such a scenario is desired for a particular component family, then that family should have a default component, which is equivalent to having no component at all. For example, in the context of our case study in Section 9.5, the default component would be one that provides no recovery for lost messages. Thus, the removal of a component is equivalent to replacing that component by the default component. This approach is similar to that in [3].

How does the component family assist in independent development of components?

Consider a scenario where components, say A and B, are developed for a component family F. These components can be developed independently, and we can still perform adaptation from A to B and vice versa, even if direct adaptations do not exist between A and B. One way to achieve this would be if adaptations from A and B to some component C, and vice versa, were identified. In this case, adaptations between A and B could be done via C. This approach is especially easy to comprehend when a component family has a primitive component and different enhanced components are developed corresponding to that primitive component: the adaptation between any two enhanced components can be done via the corresponding primitive component. Examples of such adaptation can be found in [91].

How can the component family design be used to develop parallel adaptation?

To implement parallel adaptation, the ComponentManager can be modified so that it maintains active instances of more than one component. It can then delegate each request received from the application to the appropriate component; a sketch of such a manager appears below. Also, the switching algorithms discussed in [98] for parallel adaptation of distributed agreement protocols can be implemented as AdaptationHandlers.
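A minimal sketch of such a modified manager is shown below, with hypothetical signatures; the ComponentManager of our design keeps a single active component, so this variant is only meant to indicate how multiple active instances and a pluggable switching policy could be accommodated.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/*
 * Sketch of a manager for parallel adaptation: several components stay
 * active at once, and each application request is delegated to the
 * instance chosen by the currently installed switching policy.
 */
public class ParallelComponentManager<R> {

    /* A component that can serve application requests. */
    public interface Component<T> {
        void handle(T request);
    }

    /* Decides which active component serves a given request. */
    public interface SwitchingPolicy<T> {
        String select(T request);
    }

    private final Map<String, Component<R>> active = new ConcurrentHashMap<>();
    private volatile SwitchingPolicy<R> policy;

    public ParallelComponentManager(SwitchingPolicy<R> initialPolicy) {
        this.policy = initialPolicy;
    }

    public void activate(String name, Component<R> component) {
        active.put(name, component);
    }

    public void deactivate(String name) {
        active.remove(name);
    }

    public void setPolicy(SwitchingPolicy<R> newPolicy) {
        policy = newPolicy;   // e.g., installed by a switching algorithm
    }

    /* Delegate the application's request to the selected component. */
    public void request(R request) {
        Component<R> target = active.get(policy.select(request));
        if (target != null) {
            target.handle(request);
        }
    }
}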
9.7 Related Work

In this chapter, we introduced the notion of a component family to build a systematic and extensible library of adaptive components. The idea is based on the original notion of program families proposed by Parnas [86, 90]. A program family is a set of programs (not all of which necessarily have been or will ever be constructed) that it is profitable or useful to consider as a group. However, the idea of program families does not address issues related to adaptation. In this work, we study different components with adaptation in mind. Moreover, we focus on components with similar functionalities and interfaces. In other words, our focus is narrower than the general goals of program families, where components with (slightly) different functionalities and interfaces are also studied together (for different reasons).

In feature-oriented programming [99, 100] and software product lines [101], the goals of product lines (creating families of related programs) are addressed. Product families provide an architecture that is based on commonality and similarity. In product families, various product variants can be derived from the basic product family, thereby allowing reuse of products in the family. The main focus of product lines (product families) is on static development and reuse. In contrast, the component family design addresses dynamic reuse of components. The current design of the component family focuses on components having similar functionalities and interfaces. We believe that the product family and feature-oriented programming work can be leveraged when we consider extensions to the component family to build a library of adaptive components that have different interfaces.

9.8 Summary

In this chapter, we presented a systematic and extensible component family design to build a library of adaptive components. In summary, we discuss the following advantages:

1. Simplifying adaptation between components and enabling independent development of new components. Consider the case where an application is using a component from a component family F consisting of n components. In this case, to provide adaptation between any two components of the family F, we need a minimum of only n adaptations (in this case, the graph consists of a directed cycle). When a new component is developed that will be a part of F, it suffices to have only two more adaptations while still keeping the graph strongly connected. For example, if a new component E is added to the component family shown in Figure 9.2, only two adaptations, say from C to E and from E to B, are enough to keep the graph strongly connected. Also, since every component implements the IAdaptReady interface, it becomes easy to develop adaptations between these components.

2. Simplifying verification of adaptation. To verify that the adaptation between components is correct (e.g., by using the approach in Chapters 4 and 5), it is required that after adaptation the component continues to correctly perform its functionality, and that the specification during adaptation is satisfied. The separation of adaptation logic from component functionality simplifies the task of specifying and verifying adaptation. Moreover, the adaptation lattice specification has a direct mapping to the implementation (AdaptationHandlers). Furthermore, if the number of such adaptations is low, then less verification needs to be performed.

3. Reusability of components and adaptations. The design of the component family not only enhances the reuse of components but also promotes the reuse of adaptations between components. For example, consider two components X and Y such that the adaptation from X to Y exists. Now, consider a component Z that needs to be adapted to Y. In this case, by providing the adaptation from Z to X, the adaptation from Z to Y can be done in two steps while reusing the adaptation that already exists from X to Y. We note that even if the direct adaptation from Z to Y were to exist, it would not necessarily be fast or simple. In fact, there are cases, as discussed in the next point, where a two (or more) step adaptation is simpler than a direct adaptation between components.
4. Simplifying adaptation in the case of an enhanced-primitive relationship among components. State transfer and synchronization during adaptation are in general difficult between arbitrary components. However, if one component is an enhancement of another component, then the state transfer and synchronization can be simplified in the adaptation between those two components (cf. Section 9.5 for an example). To take advantage of this, we define the enhanced-primitive relationship between components. We say that a component A is an enhanced component of component B (respectively, component B is a primitive component of A) iff A is syntactically and semantically compatible with B, i.e., it extends the interface of B and it provides all the services that B provides.

Now, consider a scenario where A is an enhanced component of B and B is being replaced by A. In this scenario, the fractions of B can be replaced in an arbitrary order by the fractions of component A, as each fraction of A can provide the required service to the remaining fractions of B. Moreover, the fractions of A can communicate with the remaining fractions of B using a protocol that the latter understands. Thus, in this case, the synchronization requirement among component fractions is relaxed, and transferring state to/from the primitive component is easy. In a case where two enhanced components, say C and D, are derived from the same primitive component, say Z, the adaptation from C to D can be done in two steps: by first replacing C by Z and then replacing Z by D. Since the adaptation for an enhanced-primitive relationship is relatively easy, the direct adaptation from C to D may not necessarily be faster or easier. We note that the idea of an enhanced-primitive relationship can be extended to a multi-level hierarchy of components, where components at a higher level are enhanced versions of components at a lower level.

5. Easy replacement of adaptation logic. An adaptation from one component to another can be performed in several ways. For example, we discussed quiescence and mixed-mode adaptations in this dissertation. Moreover, we used a reset-based approach to achieve synchronization among the processes of a distributed system during adaptation. However, other approaches or variations of the reset protocol can be used for synchronization. The environment conditions or resource constraints may determine which adaptation logic will provide optimum performance. The component family design has AdaptationHandlers that implement the adaptation logic, and the AdaptationController can choose among the different AdaptationHandlers that will execute the adaptation. This clear separation of adaptation logic makes it easy to switch between different adaptation logics; a sketch of this separation appears below.
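The following Java sketch illustrates this separation with hypothetical signatures. The selection criterion used here, an estimate of the service interruption, is only one example of the environment- or resource-based criteria mentioned above.

import java.util.List;

/*
 * Sketch of handler selection: each AdaptationHandler encapsulates one
 * adaptation logic (e.g., quiescence-based or mixed-mode), and the
 * controller picks whichever applicable handler the current conditions
 * favor.
 */
public class AdaptationController {

    public interface AdaptationHandler {
        boolean applicable(String fromComponent, String toComponent);
        int estimatedInterruptionMs();   // one possible selection criterion
        void execute(String fromComponent, String toComponent);
    }

    private final List<AdaptationHandler> handlers;

    public AdaptationController(List<AdaptationHandler> handlers) {
        this.handlers = handlers;
    }

    /* Run the applicable handler with the least service interruption. */
    public void adapt(String from, String to) {
        AdaptationHandler best = null;
        for (AdaptationHandler h : handlers) {
            if (h.applicable(from, to)
                    && (best == null
                        || h.estimatedInterruptionMs() < best.estimatedInterruptionMs())) {
                best = h;
            }
        }
        if (best == null) {
            throw new IllegalStateException("no handler for " + from + " -> " + to);
        }
        best.execute(from, to);
    }
}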
Chapter 10

Conclusion and Future Work

Software systems undergo adaptation for reasons that include changes in environment conditions, fixing of bugs, or changes in requirements. In the case of distributed systems, adaptation causes changes to multiple processes. These changes to multiple processes must be properly synchronized in order for the adaptation to provide acceptable behavior. Furthermore, in the case of adaptation in distributed systems, the behavior of the old program and the behavior of the new program may overlap, causing mixed-mode behavior during adaptation. To gain assurance in adaptation, it is important to formally specify and verify the behavior of the system during adaptation.

10.1 Contributions

In this dissertation, we made the following contributions:

• We presented an approach based on adaptation lattices to model adaptation in distributed systems [102]. The adaptation lattice approach identifies the various overlap scenarios that can occur during adaptation. It classifies the system during adaptation into intermediate programs and specifies the specification for the intermediate programs. Specifically, the various properties that need to be checked during adaptation can be specified using an adaptation lattice.

• We presented an approach based on transitional-invariant lattices to verify the correctness of adaptation in the absence of faults, and transitional-faultspan lattices to verify the correctness of adaptation in the presence of faults [102]. In the transitional-invariant lattice approach, we identify the invariants associated with the intermediate programs to show safety during adaptation. In the transitional-faultspan lattice approach, we identify the invariants and the faultspans associated with the intermediate programs to verify the fault-tolerance properties during adaptation. The approach applies to different types of fault-tolerance, namely, failsafe, masking and nonmasking. The transitional-invariant and transitional-faultspan lattices also satisfy liveness during adaptation, which states that eventually the adaptation is completed.

• We discussed a case study of mixed-mode adaptation using leader election protocols [103]. We showed the performance benefits that can be obtained by mixed-mode adaptation compared to quiescence adaptation. Specifically, mixed-mode adaptation reduces the service interruption time and the communication overhead during adaptation. We also showed how the lattice-based approach can be used to specify and verify mixed-mode behavior during adaptation.
• We identified the tradeoffs that occur in adaptation [104]. The tradeoffs show that increasing concurrency among adaptive actions leads to an increase in verification complexity. Moreover, reducing concurrency leads to a decrease in the communication overhead due to adaptation. These tradeoffs should assist the adaptation developer in choosing the concurrency during adaptation based on the complexity of the adaptation, the adaptation completion time, and the resource availability during adaptation.

• Because of this complexity, specification and verification of adaptation in distributed systems is often done on an abstract model. To gain assurance in the implementation, it is important that verification be supplemented by testing. In this context, we presented an approach for testing adaptation using predicate detection techniques [105]. We introduced the adaptation vector to classify the intermediate program states that occur during adaptation. We also introduced two classes of predicates that occur during adaptation, namely, adaptation-stable and adaptation-transient predicates. We showed how existing techniques for predicate detection can be extended for testing adaptation.

• We described the component family design to support adaptation and to build an adaptation-verified library of components. The components in the component family provide adaptation-related services in addition to their core functionality. The design of the component family also separates the adaptation logic from the component functionality, thereby simplifying the specification, verification and implementation of the adaptation logic. The design also simplifies the independent development of new components and of adaptations among components. In this context, we have presented some of the results of the component family approach in [106]. The component family design integrates the framework for adaptation while ensuring separation of the adaptation logic. The framework supports both mixed-mode and quiescence adaptation. A preliminary design of this framework has been published in [3], and we have used this framework in a recent case study in sensor networks [107].

10.2 Future Research

This dissertation has created the possibility of several new research directions. Some of these are outlined below:

• Automatic generation of lattices. In this dissertation, we used transitional-invariant and transitional-faultspan lattices to verify adaptation. However, identifying the lattice is a complex task. We would like to explore methods to generate these lattices in a semi-automatic fashion. Specifically, considering all possible atomic adaptations, we would like to explore approaches that can generate models of the different intermediate programs. Also, we would like to explore approaches to identify transitional-invariants and transitional-faultspans automatically.

• Component families having components with varied interfaces. In the component family design discussed in Chapter 9, we considered components having the same interfaces. We would like to explore ways to extend the design to deal with components having the same functionality but (slightly) different interfaces.

• Applying component families in new domains. In this dissertation, we presented the component family design and illustrated it in the context of algorithms in group communication applications, such as leader election protocols and reliable communication protocols. We would like to apply this design in other domains, such as web services, middleware components, and other application-level components.

• Runtime verification. The component family design allows new components to be included and used in the family at runtime. Currently, we expect that verification of adaptation involving a new component is done offline. We would like to explore approaches for runtime checking of new components and of adaptations involving those components. In this context, we would like to leverage the design of the component family, as it separates the adaptation logic from the component functionality.

APPENDICES

Appendix A

Model-Checking of Adaptive Leader Election Program

In this appendix, we describe the model-checking of the adaptive leader election program using SPIN. The leader election programs are discussed in Chapter 6. The code of the adaptive leader election program in the Promela language is shown in Section A.1. In Section A.2, we show the results of model checking safety and liveness properties of the adaptive leader election program.

A.1 Adaptive Leader Election Program

The adaptive leader election program in Promela is shown below. We consider five processes. All processes initially use the ldrId protocol. The program should cover the following scenarios for model-checking: (i) adaptation occurs after an election, (ii) adaptation occurs before an election, (iii) adaptation occurs while an election is underway, and (iv) an election occurs while adaptation is underway.
To ensure that the above scenarios are covered, we choose two processes (namely, process 3 and process 4) to start the election. As discussed in Chapter 6, each process performs its adaptive action independent of the others. All processes perform the adaptation that replaces the ldrId protocol with ldrVal.

#define N 5   /* number of nodes */
#define L 10  /* buffer size of channel */

/* types of messages */
mtype = { LDRID, LDRVAL, ELECT, ACK, LD };

/* structure to store neighbors for a node */
/* nL[i] (0 <= i < N) = true if i is a neighbor */
typedef ArrayNeighbors { bool nL[N] = false };

/* neighbor list for each node */
hidden ArrayNeighbors Nbhrs[N];

/* computation index (election index) */
typedef compIndex { byte num; byte id };

/* defining channel structure */
typedef ArrayChannels {
  chan ch[N] = [L] of { mtype, mtype, compIndex, byte, byte, bool }
};

ArrayChannels A[N];

hidden compIndex rcvtmp;
hidden byte junkByteID;

bool startNodes[N] = false;  /* nodes that will start election */
byte value[N] = 0;           /* value at each node */
bool ADAPT[N] = false;       /* boolean variable indicating if node is adapting */

/* for joining in an election - LDRID */
inline joinElectionId(j) {
*/ l 80 /* need to remove Ll) from the queue */ if E && waitCount :2 O && Alp——l].ch[myld—l]?[LDRVAL(LD, rcvtmp ,tmpVal ,tmpbool )] —> A[p» l ].ch[myld - I ]‘lLDRVAULD, rcvtmp ,tmpVal ,tmpbool ); else —:> skip fi; if (!E H (E && (rcvsrc.num > src.num || (rcvsrc.num 2: src.num && rcvsrc.id > src.id)))) —-> W src.num = rcvsrc.num; src.id = rcvsrc.id; i ' mynum = src.num + l; "; waitCount = O; I loopvar = I; 2 do Lg loopvar <= N —> i. if =--' (loopvar I: j && thrs[myId-—l].nL[loopvar—-l]) —> A[myld — 1 ].ch [ loopvar — l]!LDRVAI.(ELECT, src ,tmpVal ,tmpbool); wait[loopvar—l] = true; waitCount++; else —> wait[loopvar—l] = false; fi; loopvar++; loopvar > N -—> break od; if E —> printf(”MSC:”Stoppingcoldcelection...Joiningc newcelection\n”); else —> printf(”MSC:..Joininganewcelection\n”); fi; ack = true; E = true; p = j; maxVal.id = myld; maxVal.num = value[myld—l]; E && (rcvsrc.num == src.num && rcvsrc.id == src.id) -> A[myld—l].ch[j —l]!LDRVAL(ACK, src ,tmpVal , false); else —> skip fi; I81 inline ackReceiveVal (j) { protocolTyp 2: LDRVAL && wait[j—Il && A[j —I].ch[myld—l]?[ LDRVAL(ACK,rcvsrc , tmpVal, isChild)l —> atomic { A[j—l].ch[myld—l]‘.’LDRVAL(ACK,rcvsrc, tmpVal, isChild) -—> if .. E _> if '° ack && src.num == rcvsrc.num && src.id == rcvsrc. id —> wait[j—~I] = false; waitCount——: if isChild —> chdlj—l] = true; if ((maxVal.num < tmpVal.num) |I (maxVal.num == tmpVal.num && maxVal.id < tmpVal.id)) —> maxVa|.id = tmpVal.id; maxVal.num = tmpVal. num; else —> skip fi; else —> skip fi; else —> skip fi; else —-> skip fi; } inline OverlappedMessage (j) { protocolType == LDRVAL && A[j - l ].ch[myld —l]?[LDRID(rcvdMngype , rcvsrc , tmpbyte ,junkByteID , tmpbool )] —> atomic { A[j - [].ch[myld-l]?LDRID(rcvdMngype , rcvsrc , tmpbyte , junkByteID , tmpbool ); printf(”MSC: ”Message ..from ..LDRID\n”); if " rcvdMngype == ELECT —> skip; rcvdMngype == ACK —> skip; rcvdMngype == LD —> if src .num == rcvsrc .num && src . id == rcvsrc . id —> E = false; 182 IF £1171; Idr = tmpbyte; loopvar = I; do loopvar <= N —> if " chd[loopvar—J] —> A[nwdd»—l[ch[100pvar—l]! LDRVALUJDAHC,ldr,junkByteH),tmpbool); chd[loopvar—4]=false: :: else —¢> skip fi; loopvar++ loopvar > NI~> break 0d; else —> skip fi; fi , } /* defining the process */ proctype node (bool st; byte mynumber) { bool startElection = st, E = false . ack = false, isChild = false; byte myld = mynumbcr, maxchildld; byte p = O, mynum = 0, max = 0, ldr =0; complndex src, rcvsrc , maxVal, tmpVal; bool wait[N] = false , chd[N] = false , tmpbool; byte waitCount = O, loopvar, tmpbyte; mtype protocolType = LDRID; /* initially LDRID is running */ mtype rcvdMngype; end: do /* LDRID .' start election */ protocolType :2 LDRID && (!E && startElection) —> atomic { printf(”MSC:...Starting..new..election\n”); src .num = mynum; src.id = myld; mynum = mynum + l; waitCount = 0; loopvar = I; do loopvar <2 N —> if thrslmyId—I].nL[loopvar—l] —> I83 A[ myld -— I ].ch[ loopvar -— I l!l.l)RlD(ELECT. src . 
                   wait[loopvar-1] = true;
                   waitCount++;
              :: else -> wait[loopvar-1] = false;
              fi;
              loopvar++;
         :: loopvar > N -> break
         od;
         ack = true;
         E = true;
         p = myId;
         max = myId;
         startElection = false;
       }
  joinElectionId(1);
  joinElectionId(2);
  joinElectionId(3);
  joinElectionId(4);
  joinElectionId(5);
  ackReceiveId(1);
  ackReceiveId(2);
  ackReceiveId(3);
  ackReceiveId(4);
  ackReceiveId(5);
  /* LDRID : ack to parent */
  :: protocolType == LDRID && E && waitCount == 0 && myId != src.id && ack ->
       atomic {
         ack = false;
         A[myId-1].ch[p-1]!LDRID(ACK, src, max, junkByteID, true);
       }
  /* LDRID : elect leader */
  :: protocolType == LDRID && E && waitCount == 0 && myId == src.id && ack ->
       atomic {
         ack = false;
         E = false;
         ldr = max;
         loopvar = 1;
         do
         :: loopvar <= N ->
              if
              :: chd[loopvar-1] ->
                   A[myId-1].ch[loopvar-1]!LDRID(LD, src, ldr, junkByteID, tmpbool);
                   chd[loopvar-1] = false;
              :: else -> skip
              fi;
              loopvar++;
         :: loopvar > N -> break
         od;
       }
  /* LDRID : receive and forward leader */
  :: protocolType == LDRID && E && !ack && myId != src.id &&
     A[p-1].ch[myId-1]?[LDRID(LD, rcvsrc, ldr, junkByteID, tmpbool)] ->
       A[p-1].ch[myId-1]?LDRID(LD, rcvsrc, ldr, junkByteID, tmpbool);
       atomic {
         if
         :: src.num == rcvsrc.num && src.id == rcvsrc.id ->
              E = false;
              loopvar = 1;
              do
              :: loopvar <= N ->
                   if
                   :: chd[loopvar-1] ->
                        A[myId-1].ch[loopvar-1]!LDRID(LD, src, ldr, junkByteID, tmpbool);
                        chd[loopvar-1] = false;
                   :: else -> skip
                   fi;
                   loopvar++;
              :: loopvar > N -> break
              od;
         :: else -> skip
         fi;
       }
  /* adaptive action */
  :: ADAPT[myId-1] && protocolType == LDRID ->
       atomic {
         printf("MSC: Adapting protocol to LDRVAL at node: %d\n", myId);
         protocolType = LDRVAL;
         /* perform state transfer actions */
         if
         :: E && waitCount != 0 && src.id != myId ->
              E = false; ack = false; waitCount = 0; p = 0; ldr = 0;
              loopvar = 1;
              do
              :: loopvar <= N ->
                   wait[loopvar-1] = false;
                   chd[loopvar-1] = false;
                   loopvar++;
              :: loopvar > N -> break
              od;
         :: E && waitCount != 0 && src.id == myId ->
              E = false; ack = false; waitCount = 0; p = 0; ldr = 0;
              loopvar = 1;
              do
              :: loopvar <= N ->
                   wait[loopvar-1] = false;
                   chd[loopvar-1] = false;
                   loopvar++;
              :: loopvar > N -> break
              od;
              startElection = true;
         :: else -> skip
         fi;
       }
  OverlappedMessage(1);
  OverlappedMessage(2);
  OverlappedMessage(3);
  OverlappedMessage(4);
  OverlappedMessage(5);
  /* LDRVAL : start election */
  :: protocolType == LDRVAL && (!E && startElection) ->
       atomic {
         printf("MSC: Starting new election\n");
         src.num = mynum;
         src.id = myId;
         mynum = mynum + 1;
         waitCount = 0;
         loopvar = 1;
         do
         :: loopvar <= N ->
              if
              :: Nbhrs[myId-1].nL[loopvar-1] ->
                   A[myId-1].ch[loopvar-1]!LDRVAL(ELECT, src, tmpVal, tmpbool);
                   wait[loopvar-1] = true;
                   waitCount++;
              :: else -> wait[loopvar-1] = false;
              fi;
              loopvar++;
         :: loopvar > N -> break
         od;
         ack = true;
         E = true;
         p = myId;
         maxVal.id = myId;
         maxVal.num = value[myId-1];
         startElection = false;
       }
  joinElectionVal(1);
  joinElectionVal(2);
  joinElectionVal(3);
  joinElectionVal(4);
  joinElectionVal(5);
  ackReceiveVal(1);
  ackReceiveVal(2);
  ackReceiveVal(3);
  ackReceiveVal(4);
  ackReceiveVal(5);
  /* LDRVAL : ack to parent */
  :: protocolType == LDRVAL && E && waitCount == 0 && myId != src.id && ack ->
       atomic {
         ack = false;
         A[myId-1].ch[p-1]!LDRVAL(ACK, src, maxVal, true);
       }
  /* LDRVAL : elect leader */
  :: protocolType == LDRVAL && E && waitCount == 0 && myId == src.id && ack ->
       atomic {
         ack = false;
         E = false;
         ldr = maxVal.id;
         loopvar = 1;
         do
         :: loopvar <= N ->
              if
              :: chd[loopvar-1] ->
                   A[myId-1].ch[loopvar-1]!LDRVAL(LD, src, maxVal, tmpbool);
                   chd[loopvar-1] = false;
              :: else -> skip
              fi;
              loopvar++;
         :: loopvar > N -> break
         od;
       }
  /* LDRVAL : receive and forward leader */
  :: protocolType == LDRVAL && E && !ack && myId != src.id &&
     A[p-1].ch[myId-1]?[LDRVAL(LD, rcvsrc, tmpVal, tmpbool)] ->
       A[p-1].ch[myId-1]?LDRVAL(LD, rcvsrc, tmpVal, tmpbool);
       atomic {
         if
         :: src.num == rcvsrc.num && src.id == rcvsrc.id ->
              E = false;
              ldr = tmpVal.id;
              loopvar = 1;
              do
              :: loopvar <= N ->
                   if
                   :: chd[loopvar-1] ->
                        A[myId-1].ch[loopvar-1]!LDRVAL(LD, src, tmpVal, tmpbool);
                        chd[loopvar-1] = false;
                   :: else -> skip
                   fi;
                   loopvar++;
              :: loopvar > N -> break
              od;
         :: else -> skip
         fi;
       }
  od
}

/* initial process */
init {
  /* defining the neighbors */
  Nbhrs[0].nL[1] = true; Nbhrs[1].nL[0] = true;
  Nbhrs[0].nL[2] = true; Nbhrs[2].nL[0] = true;
  Nbhrs[0].nL[4] = true; Nbhrs[4].nL[0] = true;
  Nbhrs[1].nL[2] = true; Nbhrs[2].nL[1] = true;
  Nbhrs[1].nL[4] = true; Nbhrs[4].nL[1] = true;
  Nbhrs[2].nL[3] = true; Nbhrs[3].nL[2] = true;

  /* processes that will start the election */
  startNodes[3] = true;
  startNodes[4] = true;
  value[3] = 1;

  byte proc;
  atomic {
    proc = 1;
    do
    :: proc <= N ->
         if
         :: startNodes[proc-1] -> run node(true, proc);
         :: else -> run node(false, proc)
         fi;
         proc++;
    :: proc > N -> break
    od;
  }
  do
  :: !ADAPT[0] -> ADAPT[0] = true;
  :: !ADAPT[1] -> ADAPT[1] = true;
  :: !ADAPT[2] -> ADAPT[2] = true;
  :: !ADAPT[3] -> ADAPT[3] = true;
  :: !ADAPT[4] -> ADAPT[4] = true;
  :: ADAPT[0] && ADAPT[1] && ADAPT[2] && ADAPT[3] && ADAPT[4] -> break;
  od
}

A.2 Model-Checking Results

In this section, we discuss the results of the safety and liveness properties that we verify for the adaptive leader election program of Section A.1 using SPIN.

A.2.1 End States

In this subsection, we show the results of verifying the program for valid end states. The valid end states for the adaptive leader election program discussed in Section A.1 are those in which every process that was instantiated has reached the end of its code. Specifically, those are the ones in which all processes have completed the election (started by process 1) and the adaptation (from ldrId to ldrVal). As shown in the result, there are no invalid end states.

Bit statespace search for:
    never claim          - (not selected)
    assertion violations - (disabled by -A flag)
    cycle checks         - (disabled by -DSAFETY)
    invalid end states   +

State-vector 2332 byte, depth reached 1251, errors: 0
 3.36632e+07 states, stored
 1.14232e+08 states, matched
 1.47896e+08 transitions (= stored+matched)
 5.90994e+08 atomic steps
hash factor: 1.99354 (best if > 100.)
bits set per state: 3 (-k3)

Stats on memory usage (in Megabytes):
78637.270  equivalent memory usage for states (stored*(State-vector + overhead))
   16.777  memory used for hash array (-w26)
    0.360  memory used for DFS stack (-m10000)
    0.623  other (proc and chan stacks)
    0.017  memory lost to fragmentation
   17.777  total actual memory usage

A.2.2 Safety Property of Leader Election

In this subsection, we consider the safety property which states that if any two processes are not in an election, then they have the same leader. For the sake of reducing redundancy, we show the result for the case in which we verify the property for process 1 and process 2.
Similar results are obtained when we consider any two processes. The never claim describing the property is shown first, followed by the verification result.

Never claim

#define p (node[1]:E || node[2]:E || node[1]:ldr == node[2]:ldr)
/*
 * Formula As Typed: []p
 * The Never Claim Below Corresponds
 * To The Negated Formula !([]p)
 * (formalizing violations of the original)
 */
never { /* !([]p) */
T0_init:
    if
    :: (! ((p))) -> goto accept_all
    :: (1) -> goto T0_init
    fi;
accept_all:
    skip
}

Verification result

Bit statespace search for:
    never claim          +
    assertion violations + (if within scope of claim)
    acceptance cycles    + (fairness disabled)
    invalid end states   - (disabled by never claim)

State-vector 2336 byte, depth reached 1383, errors: 0
 3.36991e+07 states, stored
 1.09351e+08 states, matched
 1.4305e+08 transitions (= stored+matched)
 5.64732e+08 atomic steps
hash factor: 1.99142 (best if > 100.)
bits set per state: 3 (-k3)

Stats on memory usage (in Megabytes):
79260.243  equivalent memory usage for states (stored*(State-vector + overhead))
    8.389  memory used for hash array (-w26)
    0.040  memory used for bit stack
    0.320  memory used for DFS stack (-m10000)
    0.584  other (proc and chan stacks)
    0.056  memory lost to fragmentation
    9.389  total actual memory usage

A.2.3 Liveness Property of Leader Election

In this subsection, we consider the liveness property which states that eventually all processes have the same leader. The never claim describing the property is shown first, followed by the verification result.

Never claim

#define p (node[1]:ldr == node[2]:ldr && node[2]:ldr == node[3]:ldr && node[3]:ldr == node[4]:ldr && node[4]:ldr == node[5]:ldr && node[5]:ldr == node[1]:ldr)
/*
 * Formula As Typed: []<>p
 * The Never Claim Below Corresponds
 * To The Negated Formula !([]<>p)
 * (formalizing violations of the original)
 */
never { /* !([]<>p) */
T0_init:
    if
    :: (! ((p))) -> goto accept_S4
    :: (1) -> goto T0_init
    fi;
accept_S4:
    if
    :: (! ((p))) -> goto accept_S4
    fi;
}

Verification result

Bit statespace search for:
    never claim          +
    assertion violations + (if within scope of claim)
    acceptance cycles    + (fairness disabled)
    invalid end states   - (disabled by never claim)

State-vector 2336 byte, depth reached 1391, errors: 0
 3.3996e+07 states, stored
 1.10277e+08 states, matched
 1.44273e+08 transitions (= stored+matched)
 5.7067e+08 atomic steps
hash factor: 1.97402 (best if > 100.)
bits set per state: 3 (-k3)

Stats on memory usage (in Megabytes):
79958.646  equivalent memory usage for states (stored*(State-vector + overhead))
    8.389  memory used for hash array (-w26)
    0.040  memory used for bit stack
    0.320  memory used for DFS stack (-m10000)
    0.585  other (proc and chan stacks)
    0.055  memory lost to fragmentation
    9.389  total actual memory usage
Never claim #define p (!(node[l]: protocolType == LDRID && node [2]: protocolType == IDRVAL && node[2]:E && node[2]:ack && node[2]: waitCount !=0) || (A[l].ch[0]??[LDRVAL(ELECT)])) /* * Formula As Typed: []p * The Never Claim Below Corresponds To The Negated Formula .’([]p) (formalizing violations of the original) * * */ never { /* !([]p) */ 193 T0-init: if (I ((p))) -> goto accept_all (l) —> goto TTLJnit fi; accept-all: skip } Verification result Bit statespace search for: never claim + assertion violations + (if within scope of claim) acceptance cycles + (fairness disabled) invalid end states — (disabled by never claim) State—vector 2336 byte, depth reached 1383, errors: 0 3.3699le+07 states , stored 1.09351e+08 states , matched l.4305e+08 transitions (= stored+matched) 5.64732e+08 atomic steps hash factor: 1.99142 (best if > 100.) bits set per state: 3 (—k3) Stats on memory usage (in Megabytes): 79260.243 equivalent memory usage for states (stored*(State— vector + overhead)) 8.389 memory used for hash array (—w26) 0.040 memory used for bit stack 0.320 memory used for DFS stack (—m10000) 0.684 other (proc and chan stacks) 0.176 memory lost to fragmentation 9.609 total actual memory usage 194 BIBLIOGRAPHY 195 Bibliography [1] W. K. Chen, M. Hiltunen, and R. Schlichting, “Constructing adaptive software in distributed systems,” in Proceedings of the 21st International Conference on Dis- tributed Computing Systems, pp. 635—643, April 2001. [2] P. McKinley and U. Padmanabhan, “Design of composable proxy filters for mo- bile computing,” in Proceedings of the Workshop on Wireless Networks and Mobile Computing, 2001. [3] S. S. Kulkarni, K. N. Biyani, and U. Arumugam, “Composing distributed fault- tolerance components,” in Proceedings of International Workshop on Principles of Dependable Systems - PoDSy, at DSN, pp. W127—l36, June 2003. [4] J. Hallstrom, W. Lea], and A. Arora, “Scalable evolution of highly available sys- tems,” Transactions of the Institute for Electronics, Information and Communication Engineers, vol. E86-D, no. 10, pp. 2154-2164, 2003. [5] B. Redmond and V. Cahill, “Supporting unanticipated dynamic adaptation of appli- cation behavior,” in Proceedings of the European Conference on Object-Oriented Programming (ECOOP), pp. 205—230, 2002. [6] S. M. Sadjadi, Transparent Shaping of Existing Software to Support Pervasive and Autonomic Computing. PhD thesis, Michigan State University, 2004. [7] J .-C. F abre and T. Pcrennou, “FRIENDS: A flexible architecture for implementing fault tolerant and secure distributed applications,” in Proceedings of the European Dependable Computing Conference, pp. 3—20, 1996. [8] R. Keller and U. Hdlzle, “Binary component adaptation,” Lecture Notes in Computer Science, vol. 1445, 1998. [9] J. Kramer and J. Magee, “The evolving philosophers problem: Dynamic change management,” IEEE Transactions of Sofiware Engineering, vol. 16, no. 11, pp. 1293—1306, 1990. [10] N. Amano and T. Watanabe, “A software model for flexible and safe adaptation of mobile code programs,” in Proceedings of the International Workshop on Principles of Software Evolution, pp. 57—61, ACM Press, 2002. 196 [11] P. McKinley, S. Sadjadi, E. Kasten, and B. Cheng, “Composing adaptive software,” IEEE Computer, vol. 37, no. 7, pp. 56—64, 2004. [12] J. Buckley, T. Mens, M. Zenger, A. Rashid, and G. Kniesel, “Towards a taxonomy of software change,” Journal of Software Maintenance and Evolution: Research and Practice, 2003. [13] J. S. Bradbury, J. R. Cordy, J. Dingel, and M. 
[14] D. Le Métayer, "Describing software architecture styles using graph grammars," IEEE Transactions on Software Engineering, vol. 24, no. 7, pp. 521-533, 1998.

[15] G. Taentzer, M. Goedicke, and T. Meyer, "Dynamic change management by distributed graph transformation: Towards configurable distributed systems," in Proceedings of the 6th International Workshop on Theory and Application of Graph Transformation, vol. 1764 of LNCS, Springer, 1998.

[16] M. Wermelinger, A. Lopes, and J. L. Fiadeiro, "A graph based architectural (re)configuration language," in Proceedings of the 8th European Software Engineering Conference and 9th ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2001), vol. 26 of Software Engineering Notes, pp. 21-32, 2001.

[17] R. Allen, R. Douence, and D. Garlan, "Specifying and analyzing dynamic software architectures," in Proceedings of the Conference on Fundamental Approaches to Software Engineering, vol. 1382 of Lecture Notes in Computer Science (LNCS), pp. 21-35, 1998.

[18] C. E. Cuesta, P. de la Fuente, and M. Barrio-Solarzano, "Dynamic coordination architecture through the use of reflection," in Proceedings of the ACM Symposium on Applied Computing, pp. 134-140, ACM Press, 2001.

[19] P. Oreizy, N. Medvidovic, and R. N. Taylor, "Architecture-based runtime software evolution," in Proceedings of the 20th International Conference on Software Engineering, pp. 177-186, 1998.

[20] J. Kramer, J. Magee, and M. Sloman, "Configuring distributed systems," in Proceedings of the 5th Workshop on ACM SIGOPS European Workshop, pp. 1-5, ACM Press, 1992.

[21] S. McCamant and M. D. Ernst, "Predicting problems caused by component upgrades," in ESEC/FSE: Proceedings of the 10th European Software Engineering Conference and the 11th ACM SIGSOFT Symposium on the Foundations of Software Engineering, (Helsinki, Finland), pp. 287-296, September 2003.

[22] L. Mariani and M. Pezzè, "A technique for verifying component-based software," in International Workshop on Test and Analysis of Component Based Systems, (Barcelona, Spain), pp. 17-30, March 27-28, 2004.

[23] S. Chaki, N. Sharygina, and N. Sinha, "Verification of evolving software," in Proceedings of the 3rd International Workshop on Specification and Verification of Component-based Systems, pp. 55-61, 2004.

[24] D. Gupta and P. Jalote, "On-line software version change using state transfer between processes," Software - Practice and Experience, vol. 23, no. 9, pp. 949-964, 1993.

[25] B. P. Lientz and E. B. Swanson, Software Maintenance Management: A Study of the Maintenance of Computer-Application Software in 487 Data Processing Organizations. Addison-Wesley, August 1980.

[26] N. Chapin, J. Hale, K. Khan, J. Ramil, and W. G. Tan, "Types of software evolution and software maintenance," Journal of Software Maintenance and Evolution, no. 13, pp. 3-30, 2001.

[27] P. K. McKinley, S. M. Sadjadi, E. P. Kasten, and B. H. C. Cheng, "A taxonomy of compositional adaptation," Tech. Rep. MSU-CSE-04-17, Michigan State University, May 2004.

[28] C. Szyperski, Component Software: Beyond Object-Oriented Programming. Addison-Wesley Professional, 2nd ed., 2002.

[29] J. Kramer and J. Magee, "Analysing dynamic change in software architectures: A case study," in Proceedings of the IEEE International Conference on Configurable Distributed Systems, 1998.
[30] C. Canal, E. Pimentel, and J. M. Troya, "Specification and refinement of dynamic software architectures," in Proceedings of the Working IFIP Conference on Software Architecture, pp. 107-126, 1999.

[31] R. N. Taylor, N. Medvidovic, K. M. Anderson, E. J. Whitehead, and J. E. Robbins, "A component- and message-based architectural style for GUI software," in Proceedings of the 17th International Conference on Software Engineering, pp. 295-304, ACM Press, 1995.

[32] I. Lee, DYMOS: A Dynamic MOdification System. PhD thesis, University of Wisconsin, 1983.

[33] R. P. Cook, "*Mod - a language for distributed programming," IEEE Transactions on Software Engineering, vol. 6, no. 6, pp. 563-571, 1980.

[34] J. Magee and J. Kramer, "Dynamic configuration for distributed systems," IEEE Transactions on Software Engineering, vol. 11, no. 4, pp. 424-436, 1985.

[35] J. Magee, J. Kramer, and M. Sloman, "Constructing distributed systems in Conic," IEEE Transactions on Software Engineering, vol. 15, no. 6, 1989.

[36] B. Liskov, "Distributed programming in Argus," Communications of the ACM, pp. 300-312, March 1988.

[37] T. Bloom, Dynamic Module Replacement in a Distributed Programming System. PhD thesis, Massachusetts Institute of Technology, 1983.

[38] D. Gupta, On-line Software Version Change. PhD thesis, Department of Computer Science and Engineering, Indian Institute of Technology, 1994.

[39] J. Zhang and B. Cheng, "Using temporal logic to specify adaptive program semantics," Journal of Systems and Software (JSS), vol. 79, no. 10, pp. 1361-1369, 2006.

[40] J. Zhang and B. Cheng, "Model-based development of dynamically adaptive software," in Proceedings of the International Conference on Software Engineering, May 2006.

[41] J. Zhang, Z. Yang, B. Cheng, and P. McKinley, "Adding safeness to dynamic adaptation techniques," in Proceedings of the ICSE 2004 Workshop on Architecting Dependable Systems, (Edinburgh, Scotland, UK), May 2004.

[42] B. Alpern and F. B. Schneider, "Defining liveness," Information Processing Letters, vol. 21, pp. 181-185, October 1985.

[43] B. Alpern and F. B. Schneider, "Proving boolean combinations of deterministic properties," in Proceedings of the Second Symposium on Logic in Computer Science, pp. 131-137, 1987.

[44] M. E. Segal and O. Frieder, "Dynamic program updating: a software maintenance technique for minimizing software downtime," Software Maintenance: Research and Practice, vol. 1, pp. 59-79, September 1989.

[45] M. E. Segal and O. Frieder, "Dynamically updating distributed software: Supporting change in uncertain and mistrustful environments," in Proceedings of the International Conference on Software Maintenance, October 1989.

[46] M. E. Segal, "Extending dynamic program updating systems to support distributed systems that communicate via remote evaluation," in Proceedings of the International Workshop on Configurable Distributed Systems, pp. 188-199, 1991.

[47] M. R. Barbacci, D. L. Doubleday, and C. B. Weinstock, "Application level programming," in Proceedings of the International Conference on Distributed Computing Systems, pp. 458-465, 1990.

[48] C. Hofmeister, Dynamic Reconfiguration. PhD thesis, Computer Science Department, University of Maryland, 1993.

[49] S. Gilmore, D. Kirli, and C. Walton, "Dynamic ML without dynamic types," Tech. Rep.
ECS-LFCS-97-378, Laboratory for the Foundations of Computer Science, University of Edinburgh, December 1997.

[50] C. Walton, D. Kirli, and S. Gilmore, "An abstract machine for module replacement," tech. rep., Laboratory for the Foundations of Computer Science, University of Edinburgh, June 1998.

[51] M. Hicks, Dynamic Software Updating. PhD thesis, University of Pennsylvania, 2001.

[52] M. Segal and O. Frieder, "On-the-fly program modification: Systems for dynamic updating," IEEE Software, pp. 53-65, March 1993.

[53] D. Peled and W. Penczek, "Using asynchronous Büchi automata for efficient automatic verification of concurrent systems," in PSTV, pp. 315-330, 1995.

[54] J. Shutt and R. Rubinstein, "Self-modifying finite automata: An introduction," in Information Processing Letters, vol. 56, pp. 185-190, 1995.

[55] N. Lynch, Distributed Algorithms. Morgan Kaufmann, 1996.

[56] A. Arora and M. G. Gouda, "Closure and convergence: A foundation of fault-tolerant computing," IEEE Transactions on Software Engineering, 1993.

[57] A. Arora and S. S. Kulkarni, "Component based design of multitolerant systems," IEEE Transactions on Software Engineering, vol. 24, pp. 63-78, January 1998.

[58] S. Kulkarni, Component-based Design of Fault Tolerance. PhD thesis, Ohio State University, 1999.

[59] E. W. Dijkstra, "Self-stabilizing systems in spite of distributed control," Communications of the ACM, vol. 17, no. 11, 1974.

[60] A. Arora and S. S. Kulkarni, "Designing masking fault-tolerance via nonmasking fault-tolerance," IEEE Transactions on Software Engineering, vol. 24, no. 6, pp. 435-450, 1998.

[61] A. Arora and M. G. Gouda, "Closure and convergence: A foundation of fault-tolerant computing," IEEE Transactions on Software Engineering, vol. 19, no. 11, pp. 1015-1027, 1993.

[62] G. Varghese, Self-stabilization by local checking and correction. PhD thesis, MIT/LCS/TR-583, 1993.

[63] M. Gouda, Elements of Network Protocol Design. John Wiley & Sons, 1998.

[64] E. W. Dijkstra, A Discipline of Programming. Prentice Hall, 1976.

[65] G. Holzmann, "Logic verification of ANSI-C code with SPIN," Proceedings of the Sixth SPIN Workshop, 2000.

[66] M. G. Gouda and T. McGuire, "Correctness preserving transformations for network protocol compilers," prepared for the Workshop on New Visions for Software Design and Productivity: Research and Applications, 2001.

[67] M. Nesterenko and A. Arora, "Stabilization-preserving atomicity refinement," Journal of Parallel and Distributed Computing, vol. 62(5), pp. 766-791, 2002.

[68] M. Demirbas and A. Arora, "Convergence refinement," Proceedings of the International Conference on Distributed Computing Systems, 2002.

[69] D. Gries, The Science of Programming. Springer-Verlag, 1981.

[70] M. D. Ernst, Dynamically Discovering Likely Program Invariants. PhD thesis, University of Washington Department of Computer Science and Engineering, (Seattle, Washington), 2000.

[71] M. Ernst, J. Cockrell, W. Griswold, and D. Notkin, "Dynamically discovering likely program invariants to support program evolution," in Proceedings of the International Conference on Software Engineering, pp. 213-224, 1999.

[72] M. Ernst, A. Czeisler, W. Griswold, and D. Notkin, "Quickly detecting relevant program invariants," in Proceedings of the International Conference on Software Engineering, pp. 449-458, 2000.

[73] M. G. Gouda and T. Herman, "Adaptive programming," IEEE Transactions on Software Engineering, vol. 17, pp. 911-921, 1991.

[74] R. V. Renesse, K. Birman, M. Hayden, A. Vaysburd, and D. Karr, "Building adaptive systems using Ensemble," Software - Practice & Experience, vol. 28, pp. 963-979, July 1998.
Vaysburd, and D. Karr, “Building adaptive systems using Ensemble,” Software - Practice & Experience, vol. 28, pp. 963-979, July 1998.

[75] N. Sridhar, S. M. Pike, and B. W. Weide, “Dynamic module replacement in distributed protocols,” in Proceedings of the 23rd International Conference on Distributed Computing Systems, May 2003.

[76] E. W. Dijkstra and C. S. Scholten, “Termination detection for diffusing computations,” Information Processing Letters, vol. 11, pp. 1-4, August 1980.

[77] S. Vasudevan, J. Kurose, and D. Towsley, “Design and analysis of a leader election algorithm for mobile ad hoc networks,” in Proceedings of the 12th IEEE International Conference on Network Protocols, pp. 350-360, IEEE Computer Society, October 2004.

[78] O. Babaoglu and K. Marzullo, “Consistent global states of distributed systems: Fundamental concepts and mechanisms,” in Distributed Systems (S. Mullender, ed.), pp. 55-96, Addison-Wesley, 1993.

[79] V. Garg and B. Waldecker, “Detection of weak unstable predicates in distributed programs,” IEEE Transactions on Parallel and Distributed Systems, vol. 5, pp. 299-307, March 1994.

[80] K. M. Chandy and L. Lamport, “Distributed snapshots: Determining global states of distributed systems,” ACM Transactions on Computer Systems, vol. 3, pp. 63-75, February 1985.

[81] R. Cooper and K. Marzullo, “Consistent detection of global predicates,” in Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, published in ACM SIGPLAN Notices, vol. 26, no. 12, pp. 167-174, 1991.

[82] S. Venkatesan and B. Dathan, “Testing and debugging distributed programs using global predicates,” IEEE Transactions on Software Engineering, vol. 21, no. 2, pp. 163-177, 1995.

[83] V. Garg and B. Waldecker, “Detection of strong unstable predicates in distributed programs,” IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 12, pp. 1323-1333, 1996.

[84] L. Lamport, “Time, clocks, and the ordering of events in a distributed system,” Communications of the ACM, vol. 21, pp. 558-565, July 1978.

[85] F. Mattern, “Virtual time and global states of distributed systems,” in Proceedings of the International Workshop on Parallel and Distributed Algorithms, pp. 215-226, 1989.

[86] D. L. Parnas, “On the criteria to be used in decomposing systems into modules,” Communications of the ACM, vol. 15, no. 12, 1972.

[87] P. Allen and S. Frost, Component-Based Development for Enterprise Systems. Cambridge University Press, 1998.

[88] K. Czarnecki and U. Eisenecker, Generative Programming. Addison-Wesley, May 2000.

[89] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[90] D. M. Hoffman and D. M. Weiss, eds., Software Fundamentals - Collected Papers by David L. Parnas. Addison-Wesley, 2001.

[91] K. Biyani, “Dynamic composition of distributed components,” Master's thesis, Michigan State University, 2003.

[92] E. P. Kasten, An Integrated Approach to Autonomous Computation for Data Streaming Applications. PhD thesis, Michigan State University, 2007.

[93] “IBM Systems Journal, special issue on autonomic computing,” 2003.

[94] T. Jebara and A. Pentland, “Statistical imitative learning from perceptual data,” in Proceedings of the Second International Conference on Development and Learning, pp. 191-196, June 2002.

[95] R. Laddaga, M. L. Swinson, and P. Robertson, “Seeing clearly and moving forward,” IEEE Intelligent Systems, vol. 15, pp. 46-50, 2000.

[96] A. Arora and M. G.
Gouda, “Distributed reset,” IEEE Transactions on Computers, vol. 43, no. 9, pp. 1026-1038, 1994.

[97] S. S. Kulkarni and A. Arora, “Multitolerance in distributed reset,” Chicago Journal of Theoretical Computer Science, 1998.

[98] P. T. Wojciechowski and O. Rütti, Formal Methods for Open Object-Based Distributed Systems, vol. 3535 of Lecture Notes in Computer Science, ch. On Correctness of Dynamic Protocol Update, pp. 275-289. Springer Berlin / Heidelberg, 2005.

[99] D. Batory, “Multi-level models in model-driven development, product lines, and metaprogramming,” IBM Systems Journal, vol. 45, no. 3, 2006.

[100] J. Liu and D. Batory, “Automatic remodularization and optimized synthesis of product families,” in Generative Programming and Component Engineering (GPCE), October 2004.

[101] P. Clements and L. Northrop, Software Product Lines. Addison-Wesley, 2001.

[102] S. Kulkarni and K. Biyani, “Correctness of component-based adaptation,” in Proceedings of the International Symposium on Component-based Software Engineering (CBSE), at ICSE, vol. 3054 of Lecture Notes in Computer Science, pp. 48-58, May 2004.

[103] K. Biyani and S. Kulkarni, “Mixed-mode adaptation in distributed systems: A case study,” in Proceedings of the International Workshop on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), at ICSE, May 2007.

[104] K. Biyani and S. Kulkarni, “Concurrency tradeoffs in dynamic adaptation,” in Proceedings of the International Workshop on Assurance in Distributed Systems and Networks (ADSN), at ICDCS, July 2006.

[105] K. Biyani and S. Kulkarni, “Testing dynamic adaptation in distributed systems,” in Proceedings of the International Workshop on Automation of Software Test (AST), at ICSE, May 2007.

[106] K. Biyani and S. Kulkarni, “Building component families to support adaptation,” in Proceedings of the International Workshop on Design and Evolution of Autonomic Application Software (DEAS), at ICSE, May 2005.

[107] M. Arumugam, S. Kulkarni, and K. Biyani, “Adaptation in sensor-actuator networks: A case study,” in Proceedings of the Third International Conference on Networked Sensing Systems (INSS), June 2006.