DESIGNING AND IMPLEMENTING A MODEL OF SYNCHRONIZATION CONTRACTS IN OBJECT-ORIENTED LANGUAGES

By

Karl Reimer Behrends

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science and Engineering

2003

ABSTRACT

DESIGNING AND IMPLEMENTING A MODEL OF SYNCHRONIZATION CONTRACTS IN OBJECT-ORIENTED LANGUAGES

By Karl Reimer Behrends

This thesis describes the design and implementation of a synchronization mechanism, called the universe model, that is based on declaratively specified contractual relationships between client and supplier components. These synchronization contracts not only specify basic mutual exclusion and semantic predicates, but also explicit logical composition of those basic contracts by boolean operators, and implicit composition to recombine contracts that have been decomposed in order to modularize them. Such contracts are negotiated implicitly and transparently by a runtime system that observes the abstract state of the program, instead of using explicit procedural synchronization primitives. We present a classification scheme for synchronization contracts, and describe how they can be added to existing object-oriented languages by means of a simple declarative language that expresses these contracts. We also present an implementation of a runtime system that negotiates these synchronization contracts: it draws upon ideas from garbage collection and distributed databases, and adapts them to negotiate synchronization contracts in a fashion that avoids deadlocks and starvation where feasible. While the resulting algorithms are complex, we have implemented them and present empirical validation to show that this implementation is efficient.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction
1.1 Synchronization Contracts
1.2 The Universe Model
1.3 Thesis Overview
2 State of the Art
2.1 Synchronization in Shared Memory Systems
2.2 Message Passing Systems
2.3 High-Level Synchronization Paradigms
2.3.1 Server-side Synchronization for Passive Objects
2.3.2 Client-side Synchronization for Passive Objects
2.3.3 Active Object Approaches
2.4 Synchronization of Multi-step Transactions
2.5 Synchronization and Composability
2.6 A Classification of Synchronization Mechanisms
2.6.1 Existing Approaches for Explicit Composition of Synchronization Contracts
2.6.2 Beyond Composition of Exclusion Contracts
3 The Universe Model
3.1 Motivating Example
3.2 Formal introduction to the universe model
3.3 Contract language: Concurrency constraints
3.3.1 Conditions and Condition Verification
3.3.2 Constraints and Constraint Satisfaction
3.4 Contract Negotiation and the Run-time System
4 Realm Update
4.1 Run-time representation of realms
4.2 Realm-update procedure
4.3 Realm contraction
4.4 Realm completion
4.5 Preventing Starvation and Deadlock
5 Feature: Deadlock Avoidance
5.1 Specification
5.2 Data Refinement
5.2.1 State refinements
5.2.2 Inter-process messaging
5.2.3 Additional primitives
5.3 Algorithms
5.3.1 Universe Acquisition
5.3.2 Universe Surrender
5.3.3 Universe Release
6 Feature: Deadlock Recovery
6.1 Specification
6.1.1 Negotiation Refinement
6.1.2 Restart Condition: Conceptual Definition
6.1.3 Restart Condition: Effective Definition
6.2 Data Refinement
6.3 Algorithms
7 Feature: Conditional Contract Negotiation
8 Eiffel Language Extensions for the Universe Model
8.1 Defining and Creating Universes
8.1.1 Defining Universes
8.1.2 Universe Classes and Inheritance
8.1.3 Alternate Universe Declaration
8.1.4 Creating Universes
8.2 Defining and Creating Processes
8.2.1 The Constituent Parts of a Process
8.2.2 An Example of Process Creation
8.3 Accessing Global Data
8.4 Assignments and Data Access
8.4.1 Motivation
8.4.2 Shared and Normal References
8.4.3 Locality of Objects and Variables
8.4.4 Semantics of Assignment
8.4.5 Semantics of Argument Passing
8.4.6 Assignment of Values
8.4.7 Deep Copying
8.5 Summary of Language Extensions
9 Performance Evaluation
9.1 The Benchmark Infrastructure
9.1.1 The Class Schema
9.1.2 Input Parameters
9.1.3 Output Values and Precision
9.1.4 Configuration Options
9.2 Benchmark Results
9.3 Analysis
9.3.1 Uniprocessor Results
9.3.2 SMP Architectures
10 Conclusion and Future Research
10.1 Comparison of the Universe Model to Existing Approaches
10.1.1 Explicit Composition beyond the Universe Model
10.1.2 Intra-object Contracts
10.1.3 Implicit Composition of Contracts in the Universe Model
10.2 Open Research Questions
10.2.1 Treatment of Disjunctions
10.2.2 Composition of Conditional Contracts
10.3 Outlook
APPENDICES
A Low-Level Algorithmic Details
A.1 The Message Passing Subsystem
A.1.1 Checking for realm completion
A.1.2 Message Transmission
A.1.3 In-state predicates
A.2 Releasing and Requesting Universes
A.2.1 Manipulating the Set of Blocked Universes
A.2.2 Requesting a Universe
A.2.3 Releasing a Process from a Queue

LIST OF TABLES

1 Scenarios 1 and 2 — No contention and with contention
2 Scenario 3 — Process contention

LIST OF FIGURES

1 Two concurrent processes
2 Message Ordering
3 Causality
4 Enrollment Example
5 Dining Philosophers in Linda
6 Dining Philosophers using Monitors
7 A Composability Problem
8 UML model of page classes in the wiki example
9 Elided definition of class REAL_PAGE in (universe-model extended) Eiffel
10 Elided definition of class EVENT_MONITOR. Notice constraint includes a conditional universe reference
11 Functions and procedures for realm contraction
12 Functions and procedures for realm completion
13 Protocol of negotiation by a process for a universe
14 Refinement of protocol in Figure 13 to avoid deadlocks using wound-wait strategy
15 Functions and procedures for acquiring a universe while avoiding deadlock
16 Procedures for surrendering a universe to another process
17 Example: Surrendering a universe
18 Procedures for releasing a universe from a realm
19 Refinement of protocol in Figure 14 with cautious waiting for deadlock recovery
20 Refinement of function getMessage to support deadlock recovery
21 Refinement of protocol in Figure 19 to negotiate conditional contracts. This diagram depicts the most refined negotiation protocol
22 A reentrant mutex
23 Improved reentrant mutex
24 The PROCESS_BASE class
25 The PROCESS class
26 An event monitor
27 An event generator
28 Creating processes
29 Accessing global data
30 Thread-local once functions
31 Shared Data
32 Deep Copy Example
33 Schema used to generate benchmark classes
34 Various algorithms for message passing and negotiation-state observation
35 Support procedures for requesting and releasing universes

Chapter 1

Introduction

Applications in which multiple threads operate on shared data are notoriously difficult to design. Without proper synchronization, concurrent access to shared objects can lead to race conditions [NM92], and incorrect synchronization logic can lead to starvation and deadlock. To address these concerns, we and others have looked to extend and articulate Meyer's design by contract method to raise the level of abstraction with which to design such systems (cf. [Mey97, BJPW99, BS00, SBD03]). Contracts used for this purpose are called synchronization contracts because they specify and constrain the legal sequences of interaction among objects and processes¹ in a concurrent system. For example, in a client-server system with one server and multiple clients, a synchronization contract could assert that a client can perform a sequence of server operations without interference by other clients.

¹We shall henceforth use the term process rather than thread, as the former is the term used more frequently by other researchers. Throughout this thesis we treat these terms as synonyms.

Our universe model [BS00, SBD03] is a powerful model of synchronization contracts that integrates well with existing object-oriented languages. A key concept in this model is a run-time system that dynamically negotiates contracts among multiple groups of objects by scheduling processes so as to guarantee mutual exclusion while attempting to avoid starvation and avoid or recover from deadlock situations. This thesis describes the design of a programming model for such contracts and the algorithms used to implement run-time negotiation for them, documents extensions to the Eiffel programming language that support the model, and provides evidence for its scalability.

1.1 Synchronization Contracts

The term contract refers to a formal agreement between client and supplier modules. Beugnard and others [BJPW99] identify a taxonomy of different kinds of contracts, the most familiar type being behavioral contracts, which include the pre- and post-condition features of Eiffel [Mey92]. Behavioral contracts are specified in the interface of a supplier module, and they spell out the module's rights and responsibilities. In the design-by-contract method, each module is designed to assume the rights and guarantee the responsibilities of its contract, and the designer must verify that a client guarantees the rights assumed by any supplier the client uses. Meyer shows how such contract-aware modules are dramatically smaller and simpler than their contract-unaware counterparts [Mey97]. For example, the former will not contain code to check operation pre-conditions, as the designer will have verified that clients only invoke the operation when the pre-condition is satisfied.
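To make the contrast concrete, the following sketch (not from the thesis; the Account class and its feature names are hypothetical) shows the difference in Java. A contract-unaware supplier defensively re-checks its own precondition on every call; a contract-aware supplier states the precondition in its interface and relies on the designer having verified that clients respect it, so its body carries at most an assertion that documents the assumption.

    // Hypothetical illustration of design by contract, sketched in Java.
    class Account {
        private int balance;

        // Contract-unaware: the supplier defends itself against bad clients.
        int withdrawDefensive(int amount) {
            if (amount <= 0 || amount > balance) {
                throw new IllegalArgumentException("invalid amount");
            }
            balance -= amount;
            return balance;
        }

        // Contract-aware: the precondition "0 < amount <= balance" is part of
        // the interface; verified clients never violate it, so no defensive
        // code is needed (the assertion merely documents the obligation).
        int withdraw(int amount) {
            assert 0 < amount && amount <= balance;
            balance -= amount;
            return balance;
        }
    }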
Synchronization contracts afford a similar simplicity by allowing a designer to declaratively specify constraints on concurrent interaction in lieu of programming low-level synchronization mechanisms and protocols. Unlike behavioral contracts, synchronization contracts are not verified at design time, but rather are negotiated at run time. In Java, for example, when multiple client objects (running in different threads) invoke a synchronized method on the same supplier object, the clients negotiate for exclusive access to the supplier. The mechanism used to support this negotiation is a simple mutex, which each client will attempt to lock before entering the method, but neither the client modules nor the supplier module contains code for acquiring and releasing this mutex.

More powerful synchronization contracts employ more powerful synchronization mechanisms. Consider, for example, a contract that guarantees a client exclusive access to multiple suppliers. To prevent undesirable phenomena, such as starvation and deadlock, clients and suppliers must negotiate this contract using a protocol, such as one of the common two-phase protocols [RSL78, FR85, HZ92, FHRT92]. The logic for such protocols is not localized to a single client or supplier module. Thus the ability to specify this behavior declaratively, via a contract, should simplify (as well as decouple) the implementation of client and supplier modules. We believe that integrating ever more powerful models of synchronization contracts into programming languages will dramatically simplify the development and improve the reliability of multi-threaded shared-memory systems.

1.2 The Universe Model

The universe model supports a powerful class of synchronization contracts. The most basic forms are exclusion contracts, which guarantee exclusive access to a supplier S from a client C, and conditional contracts, which guarantee that C has exclusive access to S once some condition in S is satisfied. The model also supports composite contracts, which combine exclusion and conditional contracts by conjunction and disjunction. Composite contracts are used, for example, to guarantee a client exclusive access to multiple suppliers. Moreover, composite contracts may arise implicitly, as when one module m1 has an exclusion contract with a supplier module m2, which itself has an exclusion contract with another supplier m3. This transitivity of contract dependencies is essential for guaranteeing mutual exclusion without violating information hiding, especially in hierarchically layered systems [BS00]. Finally, the model also supports parameterized contracts, which prefix a contract with an enabling predicate that may refer to complex state conditions (i.e., the parameters) in the client. Parameterized contracts are used to specify how a client's needs change according to changes in the state of the client. At runtime, changes to these parameters implicitly trigger a renegotiation of the contract.

Because the underlying mechanisms to negotiate individual exclusion and conditional contracts are well understood, support for these basic contracts is common in modern object-oriented languages. By contrast, composite contracts have yet to gain widespread adoption in general purpose programming languages², presumably because the synchronization mechanisms required to dynamically negotiate them are still the subject of current research.

²A counterexample is [FP97].
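To see why such negotiation is hard to program by hand, consider a hypothetical sketch (not from the thesis) of the composite case in plain Java. Each client needs exclusive access to two suppliers at once; with the built-in synchronized mechanism the only way to compose the two exclusion contracts is to nest the blocks, and unless every client nests them in the same global order, the composition can deadlock.

    // Hypothetical illustration: composing exclusive access to two suppliers
    // with nested synchronized blocks. The names accountA/accountB are made up.
    class TransferExample {
        static final Object accountA = new Object();
        static final Object accountB = new Object();

        static void client1() {
            synchronized (accountA) {       // exclusion on supplier A ...
                synchronized (accountB) {   // ... composed with exclusion on B
                    // multi-step transaction over A and B
                }
            }
        }

        static void client2() {
            synchronized (accountB) {       // same two contracts, opposite order:
                synchronized (accountA) {   // if client1 holds A while client2
                    // holds B, neither can proceed and the threads deadlock
                }
            }
        }
    }

A run-time system that negotiates the composite contract as a whole, rather than leaving the nesting order to each client, is the kind of mechanism developed in the remainder of this thesis.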
This thesis contributes the design of a model for the general composition of synchronization contracts, and an algorithm that uses a combination of synchronization mechanisms and protocols to dynamically negotiate these composite contracts while avoiding starvation and avoiding or recovering from a large class of deadlocks. To validate the algorithm, we extended the Eiffel language with primitives to declare universe-model contracts, and we implemented a compiler and run-time system for this extended language.

1.3 Thesis Overview

The remainder of the thesis is organized as follows. First, we survey existing research in the area of concurrency, and establish the relationship between synchronization mechanisms and synchronization contracts (Chapter 2). Second, we formally introduce the universe model and its contract-specification language (Chapter 3). A major portion of the thesis (Chapter 4 through Chapter 7) describes our algorithm to dynamically negotiate contracts and schedule processes. This algorithm borrows and extends ideas from reference-counting garbage collection [BAL+01] and combines deadlock avoidance and recovery mechanisms [RSL78, HZ92] in order to negotiate contracts of the power expressible in the universe model. For clarity of exposition, we first present the algorithm for a restricted class of contracts, namely exclusion contracts and their explicit and implicit composites, and a negotiation protocol that is starvation free and that avoids a limited class of deadlocks. We then successively refine this algorithm to include a deadlock-recovery heuristic, which expands the class of preventable deadlocks, and to handle the (more general) conditional contracts. Chapter 8 then describes a set of extensions to the Eiffel programming language that make the semantics of the universe model available to the programmer. Finally, we demonstrate that the algorithm, as implemented in our extended Eiffel implementation, operates with reasonable efficiency (Chapter 9) and conclude by comparing the universe model to alternative mechanisms that support synchronization contracts and discussing follow-on open research questions (Chapter 10).

Chapter 2

State of the Art

Before defining the universe model in detail, we first survey the state of the art of current research with respect to concurrency management. Fundamental to the study of concurrency management is the distinction between the two most common execution models for concurrent systems: shared memory and message passing environments. These two models differ both in their implementation and in the programming paradigms that are available in each model. In the shared memory model, concurrent processes access shared data, and this access must be regulated to avoid race conditions [NM92] (Section 2.1). In the message passing model, each concurrent process exists in a separate dataspace and communicates with other processes only via the sending and receiving of messages. Message passing in a concurrent system is inherently non-deterministic and must be coordinated to avoid unpredictable behavior (Section 2.2). A variety of approaches have been developed based on these paradigmatic differences. Both execution models, however, assign client and server (or supplier) roles to objects and resources, where clients access servers (concurrently) to request services.
This provides another dimension along which concurrent programming models can vary, focusing either on the client or the supplier as being responsible for synchronization (Section 2.3). A major challenge with either approach is to create high-level models that support the programming of reliable concurrent systems. A key element in the design of such high-level approaches is the treatment of multi-step transactions, which perform multiple operations on one or more server objects atomically. Language support for multi-step transactions is scarce (Section 2.4) and, where available, introduces modularity problems, such as violation of information hiding, in hierarchically structured software designs (Section 2.5). We conclude the survey of current research with an overview of existing synchronization mechanisms that support multi-step transactions, and classify them according to their capabilities (Section 2.6).

2.1 Synchronization in Shared Memory Systems

The shared memory model of parallel programs assumes that processes share a common memory area. Processes communicate by writing to and reading from a shared object in that area. While communication through shared data structures is an essential feature of the shared memory model, it is also an inherent problem. Consider the code in Figure 1, which depicts two processes, P1 and P2. When P1 and P2 are running concurrently, instructions can be executed in a variety of orders. The sequence (1), (2), (3), (4) is just as possible as (1), (3), (2), (4) or (3), (1), (2), (4). Each version will result in the variables x and y containing different values. Also, statements (1) and (3) can be executed simultaneously, which makes the contents of t unpredictable. In other words, P1 and P2 interfere with each other.

    process P1                 process P2
        t := x * c1   (1)          t := y * c2   (3)
        x := t        (2)          y := t        (4)
    end process                end process

    Figure 1: Two concurrent processes

Interference between processes occurs due to race conditions [NM92], which can take the form of either data races, where two processes access the same data item simultaneously, or general races, where the execution of a process Q concurrently with another process P alters the behavior of P. For example, in Figure 1, both data races and general races can occur. Program correctness in shared memory systems is therefore generally predicated on the concept of non-interference [OG76], which requires programming mechanisms that guarantee non-interference by eliminating race conditions.

Traditionally, to guarantee non-interference, one must identify critical sections of code, where processes might interfere with one another, and then use mutual exclusion mechanisms, such as semaphores [Dij68], to protect these critical sections. Most concurrent programming languages provide higher-level constructs, such as monitors [Hoa74], which automatically lock and unlock a module on procedure entry and exit, to describe mutual exclusion mechanisms. One typically says that a shared resource or object has been locked or acquired by a process when other processes are excluded from accessing it.

Semaphores and monitors supply all the mechanisms needed to ensure non-interference. The real problem, however, is the correct identification of critical sections and the generation of correct mutual exclusion code. If critical sections are misidentified, then shared resources can be accessed concurrently, leading to unpredictable errors.
Furthermore, if mutual exclusion for more than one shared resource is needed, incorrect mutual exclusion code may introduce deadlock or starvation [Tan92]. Most high-level approaches that employ a shared memory model (including the universe model) employ mechanisms to reduce or eliminate the potential for error. In particular, all data races can be eliminated automatically by ensuring that a shared object is locked before it is accessed or modified. Automatic elimination of data races is thus a minimum non-interference guarantee that any safe synchronization mechanism can offer, though several elect not to follow the above approach for efficiency reasons [GJSB96] or to allow intra-object parallelism [CH74, Mit95].

However, this assurance cannot automatically be extended to general races. General races occur primarily during multi-step transactions, which consist of sequences of service invocations by and among multiple shared objects. We use the word transaction to suggest that the sequence should either run to completion or not at all; that is, we want to remove the possibility that service requests from other concurrent processes can interfere with the sequence. Many programming languages allow the programmer to guard transactions with a mechanism for locking an object for the duration of such a transaction. However, even if one requires that an object is locked before it is accessed (as in [Mey93]), there is no guarantee that the lock is held throughout the transaction. For instance, process P1 in the above example could be rewritten as follows, where the synchronized statement locks t for the duration of the enclosed block:

    process P1
        synchronized t do
            t := x * c1
        end
        synchronized t do
            x := t
        end
    end process

Observe that the code, after having been augmented with synchronization instructions, will not access the shared object t without locking it first; however, the program is just as vulnerable to interference as the original. While this defect is easy to detect in such a brief segment of code, common programming practice can easily hide it.

Assume, for example, that x, t, and c1 are matrices. In such a case, the matrix operations are often performed by subroutine calls, and temporary results are stored in an intermediate variable to avoid unnecessary copying of a two-dimensional array. Suppose that functions multiply and assign_from_tmp reference an intermediate variable (called t), which is global (i.e., not local to these functions). The correct code to avoid non-interference would then look as follows:

    process P1
        synchronized t do
            call multiply(x, c1)
            call assign_from_tmp(x)
        end
    end process

To write the correct synchronization code requires that the programmer inspect the internals of multiply and assign_from_tmp to derive that t needs to be locked for the duration of the transaction. This obligation is easy to overlook, and its omission is a clear violation of basic information hiding, because it exposes t. However, moving the synchronized instruction inside the subroutines for proper information hiding would allow other processes to interfere with the operation.

In summary, the shared memory model is prone to interference due to race conditions. Race conditions must be eliminated through mutual exclusion mechanisms, which cannot always guarantee the absence of general races and may violate information hiding.

2.2 Message Passing Systems

In a message passing system, each process has its own memory area, disjoint from all the others.
Processes communicate by sending messages to one another. Having exactly one process per memory area automatically eliminates all data races and is considered the major advantage of the message passing model [Hoa85]. Also, convenient formalisms exist that make it easy to reason about the correctness of such programs [Hoa85, Mil80]. However, if one wishes to harness the message passing model for concurrent programming by providing general purpose programming mechanisms, a number of new concurrency-related problems arise that do not exist in the shared memory model. In this section, we explore these problems and possible solutions to them.

The first problem is that of a copying overhead. When two processes want to access the same data structure, the data must be copied. In the case of complex data structures, the messages sent between processes can become so large that the communication overhead dwarfs the actual computation. Since memory access speed has not increased at the same rate as CPU speed, this can be a concern for time-critical applications.

    [Figure 2: Message Ordering. Time diagram of three processes P1, P2, and P3 exchanging messages, with receive events e1 through e4; diagram not reproduced here.]

The second problem is that of message ordering. Where the shared memory model has a non-determinism problem when it comes to accessing shared data, the corresponding problem in the message passing model is message ordering for computations that involve several processes. Consider the time diagram in Figure 2. Horizontal lines represent processes, arrows represent messages being sent, and the filled circles represent the events that occur when messages are received. An event being depicted to the right of another event means that it occurred at a later time. Whether event e3 occurs before event e4 (as depicted in the figure) or vice versa depends solely on the order in which the corresponding messages arrive. The underlying problem is that message passing is an asynchronous operation (meaning that messages initiate an independent thread of control in the receiver, while the sender's thread continues in parallel), whereas procedure calls in the shared memory model are synchronous operations.

    [Figure 3: Causality. Time diagram of three processes P1, P2, and P3, with events e1 through e6 illustrating a causality violation; diagram not reproduced here.]

Non-deterministic message arrival can cause problems when it violates causality. Intuitively, causality violations occur when an event e' depends on the results of an event e, but a process observes and acts upon the effects of e' before it observes and acts upon the effects of e (a formal definition can be found in [SM94]). For example, in Figure 3, process P2 sends a message (e1 to e3) to inquire about P1's state. P1 then sends a reply message (e3 to e6) with the required data. By the time the message arrives (e6), P1 has received another message (e4), which has altered the state, so that the information is out of date. Here, e4 depends on the results of e3, but P2 observes and acts upon the results of e4 in e5, before it observes and acts upon the results of e3 in e6. This is a causality violation.

The third problem is that, because of the non-determinism of message passing, it can be difficult to obtain a globally consistent view of the state of more than one process (also called a distributed global snapshot). Specifically, if we have processes P1, ..., Pn, with states S1(t), ..., Sn(t) expressed as a function of the time t, then we may wish to compute a function f that depends on several of the Si(t).
Because message transit takes an unknown, but non-zero, amount of time, the values of the Si(t) will typically be gathered at different times. If any state changes occur between the beginning and the end of the computation (a violation of causality), then the result of f may be incorrect. The general problem of computations that depend on the state of more than one process is sufficiently complex to have spawned several independent research areas [Mat93]. Specifically, the case of obtaining a consistent global snapshot has been shown to be inherently non-trivial [CT90].

Both the second and the third problem can be solved by having a message passing mechanism that preserves causality. Unfortunately, the known general purpose algorithms [Fid88, Mat89] that preserve causality use a vector of timestamps, containing one timestamp per process, and require a worst-case O(n) overhead per message, where n is the number of processes. While there also exist a number of more efficient algorithms for the area of distributed discrete-event simulations [CM81, Jef85] that preserve causality, they require that each event carry a global timestamp. This requirement limits their applicability to other domains.

Despite these problems, the message passing model has the advantage that it is very amenable to being used in combination with formal methods [BB89, Mil80]. Even when a full formalization is not possible, event-based schemes are generally so well understood that they can be analyzed with reasonable ease by modeling them with statecharts [Har87]. However, based on the existing research we conjecture that a fully automated way to handle causality-related problems may incur a substantial overhead that is linear in the number of processes.

2.3 High-Level Synchronization Paradigms

Having discussed the underlying execution models, we now turn to higher-level paradigms that provide abstractions to deal with these execution models in a more convenient fashion. Object-oriented techniques fall either in the category of passive objects or active objects. In the active object model [Nie95], every object is driven by an autonomous process, communicating with other objects via message passing. Since it is often not practical to have one separate thread of control for each object in a large system, most approaches actually have one active object offering services and controlling several non-active objects that it communicates with via synchronous communication [Car90, YT87]. In the passive object model, objects do not embody processes. Instead, processes are threads of control that exist outside the object system and that call methods of shared objects directly. When one relates these models to message passing and shared memory approaches, active objects are the object-oriented equivalent of the message passing model, whereas passive objects implement the shared memory model.

Both active and passive object systems distinguish between clients and servers. A server object is a provider of services, which can be invoked by one or more clients. In the passive object model, services are invoked by procedure or method calls; in the active object model, services are invoked by sending a message to the server. Synchronization for a client-server interaction can either be performed by the server or by the client, and synchronization mechanisms can vary considerably depending on which side is responsible for synchronization [Bri72, CH74, FP97, Hoa74, Hol99, Mey93].
The design by contract approach prefers to call servers suppliers, but server is the established term for concurrent systems. We will therefore use the term server in the context of synchronization mechanisms for an object or process that interacts concurrently with a client, and the term supplier to emphasize the role as the provider of services in a contractual context.

2.3.1 Server-side Synchronization for Passive Objects

Among the first high-level mechanisms for shared memory systems was the concept of a monitor [Hoa74], developed by Hoare and Brinch Hansen. A monitor is strictly a server-side synchronization mechanism that is associated with each shared object: each time a method in the shared object is invoked, the object is locked; upon return from the called method, the object is unlocked. Monitors incorporate the concept of condition variables, which embody an abstract condition. Processes can wait for the condition to become true or signal that the condition has become true.

Monitors localize synchronization within the shared object that is being accessed and automatically eliminate all data races. However, they cannot easily handle multi-step transactions. Specifically, a monitor cannot be locked for a sequence of service invocations or manage transactions that need to invoke services on more than one shared object without risk of interference. While this problem can be handled for transactions by migrating client code into the server (where the code becomes non-interruptible), such migration inflates the module interface of server objects and cannot solve the problem where a client needs to lock multiple servers.

Subsequent approaches attempted to correct the problem of dealing with multi-step transactions. Path expressions [CH74] allow a programmer to specify as part of the interface of a shared object all legal combinations of service invocations. Such combinations include both sequences of service invocations (thus handling multi-step transactions) and which services can be invoked concurrently (increasing parallelism when compared to monitors). The disadvantage of path expressions is that an error in the specification of a path expression can create data races. Also, with respect to multi-step transactions, a path expression has to anticipate all possible sequences of service invocations in order for a client to be able to actually use them. Should the actual access pattern by a client and the path expression disagree, this error is not necessarily detectable. Moreover, path expressions cannot handle transactions that involve multiple servers.

A recent approach that attempts to deal with multi-step transactions on the server side is the concept of synchronization rings [Hol99]. Synchronization rings are wrapper objects around server objects that are separate from the server proper, but that can intercept service invocations on one or more servers for synchronization purposes. Monitors can be simulated by having one synchronization ring for each shared object, which locks the object upon the beginning of a service invocation and unlocks it at the end. Unlike a monitor, however, a synchronization ring can intercept service invocations for more than one shared object and can synchronize access for entire multi-step transactions in a fashion similar to path expressions.
The disadvantage of synchronization rings is that they cannot guarantee the absence of data races in general, and (like path expressions) they need to anticipate and duplicate all possible sequences of service invocations by clients.

2.3.2 Client-side Synchronization for Passive Objects

Given the inherent difficulty of modeling synchronization for multi-step transactions on the server side, a fair amount of research has focused on client-side synchronization. One of the most powerful early approaches was that of conditional critical regions [Bri72]. While never implemented in their original form, conditional critical regions formed the prototype for more advanced approaches, as well as proof techniques such as the Owicki-Gries model [OG76]. A conditional critical region is a statement of the form lock R when C do S end. This statement declares that as soon as the shared object R is available and condition C becomes true, R is locked and statement S is executed. By ensuring that R can only be accessed from within a conditional critical region, data races can be avoided. Obviously, conditional critical regions can handle multi-step transactions of arbitrary length without additional effort, as long as they use only the single server R during the transaction.

An obvious extension to conditional critical regions is to let R be a set of shared objects, which are locked atomically as a group, rather than R being a single resource. In this extension, all resources in R have to be locked and C has to be true before S is executed. Such extended conditional critical regions are the core concept upon which modern client-side synchronization mechanisms such as SCOOP [Mey93, Mey97] and the Concurrent Extensions of Eiffel (CEE) [JP93] are built.¹ CEE implements a holdif statement that has the semantics of extended conditional critical regions. SCOOP implements extended conditional critical regions as procedures: each argument passed to such a procedure that is a shared object is considered to be part of R, and the precondition of the procedure functions as the equivalent of C (preconditions are explicitly written in Eiffel, the language on which SCOOP is based). The programming language Sather [Omo91] extended the concept of conditional critical regions even further, allowing the specification of several alternative sets of resources, where the acquisition of one of those alternative sets was sufficient to enter the critical region. Thus, one could express conditions where exclusive access to either one set of resources or another (or yet another, and so forth) was required. Sather's locking mechanism is very powerful, but the actual implementation allowed for data races on shared objects if the programmer accidentally forgot to include them in the lock statement.

¹While it has been argued whether SCOOP and CEE use active and passive objects, both use mechanisms that translate easily to general passive object schemes.

In general, while client-side mechanisms can handle multi-step transactions more easily than server-side mechanisms, they are frequently criticized because synchronization code needs to be replicated in each client, rather than being encapsulated in a single shared object [Blo79]. A large number of client-side synchronization mechanisms also permit data races (notably Java [Han99]).
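As a point of reference for how much machinery a single conditional critical region hides, the following sketch (not from the thesis) shows what lock R when C do S end expands to when coded by hand with modern locking primitives, here in Java. The bounded buffer plays the role of the shared object R, and "buffer not full" plays the role of the condition C; all names are hypothetical.

    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    // Hypothetical sketch of "lock R when C do S end" for a single shared object.
    class BoundedBuffer {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition changed = lock.newCondition();
        private final int[] items = new int[8];
        private int count = 0;

        void put(int x) throws InterruptedException {
            lock.lock();                        // lock R
            try {
                while (count == items.length) { // when C: here, buffer not full
                    changed.await();
                }
                items[count++] = x;             // do S
                changed.signalAll();            // wake waiters whose C may now hold
            } finally {
                lock.unlock();                  // end
            }
        }
    }

The declarative one-line form is replaced by explicit lock, await, signal, and unlock calls; in a client-side scheme this boilerplate must be repeated, correctly, in every client that touches R.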
An inherent problem in client-side synchronization mechanisms is that a client may specify that more than one server object is to be locked atomically, and naive implementations of the locking of multiple shared objects may lead to deadlock. For example, if a process P1 locks a shared object A first and another shared object B second, while another process P2 locks B first and A second, it is then possible that P1 has acquired A and P2 has acquired B, at which point P1 cannot acquire B nor can P2 acquire A, and the two processes are deadlocked. To solve the problem of acquiring multiple shared objects, one therefore has to construct a procedure for doing so that avoids deadlocks.

    [Figure 4: Enrollment Example. A diagram (taken from [BS00]) showing students Alice and Bob, the objects EnrollCSE814 and StandaloneCSE814, the courses CSE814, CSE470, and MTH472, an Accounting Subsystem, and an Authentication System; diagram not reproduced here.]

The problem of locking multiple shared objects occurs commonly in practice. Consider, for example, Figure 4, taken from [BS00]. It depicts part of the enrollment system of a fictional university. Boxes indicate objects in a passive object system, and arrows indicate that one object uses the services of another. Alice's ongoing transaction involves no fewer than five shared objects (the accounting subsystem, EnrollCSE814, CSE814, CSE470, MTH472). Bob's transaction, when it is his turn, will require at least three (the accounting subsystem, StandaloneCSE814, CSE814). The objects EnrollCSE814 and StandaloneCSE814 are distinct because they implement different synchronization contracts and provide different methods (the former requiring access to a course and its prerequisites, the latter requiring access only to the course). The enrollment scenario therefore requires the locking or acquisition of multiple shared objects.

2.3.3 Active Object Approaches

Because the focus of this thesis is on passive object schemes, we only briefly discuss the most interesting active object synchronization schemes. One of the oldest approaches to synchronization is Hoare's concept of Communicating Sequential Processes, or CSP [Hoa78]. It has been the basis both for programming languages such as OCCAM-2 [INM88] and specification languages such as LOTOS [BB89]. Its influence can also be seen in the design of Ada's Rendezvous mechanism. CSP combines concurrency control (such as parallel and alternate execution) and input and output (across message channels) as primary programming language constructs based on the message passing model, facilitating the development of concurrent programs at a higher level of abstraction. In addition, CSP was the first attempt at a process algebra. Process algebras, such as CSP, CCS [Mil80], or ACP [BK85], provide algebraic languages for the specification of processes and calculi for reasoning about their behavior.

More recently, the field of coordination languages [PA98] evolved from the idea that computation and coordination can be separated in active objects. Coordination, in its most general form, "is the process of building [concurrent] programs by gluing together active pieces" [CG92]. Hoare's Communicating Sequential Processes can be viewed as a model for a coordination language, though Linda [GCCC85] is typically considered the earliest genuine coordination language.
    process philosopher(i)
        loop
            think;
            in("room ticket");
            in("fork", i);
            in("fork", i mod 5 + 1);
            eat;
            out("fork", i);
            out("fork", i mod 5 + 1);
            out("room ticket");
        end
    end

    process main
        for i := 1 to 5 do
            out("fork", i);
            eval(philosopher(i));
            if i < 5 then
                out("room ticket", i);
            end
        end
    end

    Figure 5: Dining Philosophers in Linda

Among coordination languages, we distinguish between data-driven and control-driven languages. Linda is the prototypical example of the former: all processes are separate entities, but have access to a shared data space, called a tuple space. The two basic primitives in Linda allow the programmer to insert an item (a tuple) into the tuple space, or to retrieve an item that matches a certain pattern. Figure 5 shows the classical dining philosophers problem in Linda. Here, the out primitive inserts a tuple such as ("fork", 1) into the shared tuple space. The in primitive retrieves a tuple from the tuple space; should there be no matching tuple in the space, the process blocks until another process inserts one. The main advantage of a shared tuple space is that, as a central authority on concurrency management, it reduces the causality and global view problems of active object systems.

This solution to the dining philosophers problem is already much more straightforward than the traditional ones based on mutual exclusion (such as in [Tan92]). But the more general advantage of a coordination language is that processes can be designed separately from one another, interfacing through separate coordination interfaces.

Unlike data-based coordination languages, control-based coordination languages do not know the concept of a shared data space. In control-based coordination languages, processes are black boxes, communicating with their environment via input ports and output ports defined in their interfaces. Channels between processes then create a set of producer-consumer relationships between output ports of producers and input ports of consumers. Examples of control-based coordination languages are PCL [TGC93], the POLYLITH Software Bus [Pur94], or WRIGHT [AG97]. Of particular interest among the control-based variants are compositional coordination languages such as Strand [Pat90] or PCN [Fos96]. Rather than coordinating processes as black-box modules, these languages allow pieces of functional code to be composed to form more complex processes. Their major disadvantage is that they require programs to be written using functional or logical languages, since code with side-effects does not meet their requirements for composability.

Almost all coordination languages rely on the active object model. Like CSP, they share its advantages and shortcomings. Relying on data flow architectures, few coordination languages support coordinated multiple resource access natively (an exception is the synchronizer concept [FA93], which operates in a fashion similar to the synchronization rings discussed in Section 2.3.1).

2.4 Synchronization of Multi-step Transactions

In Section 2.1, we introduced the concept of a multi-step transaction as a sequence of service invocations by and among multiple objects to perform some task, and noted that most languages guard transactions with a mechanism for synchronizing access to an object for the duration of such a transaction, such as Java's synchronized primitive or the holdif statement of CEE. For example, in Java, the statement
synchronized (ob) { ... } directs the run-time system to execute an entire block of statements atomically with respect to the object ob. Consequently, the process may perform a multi-step transaction on ob without concern for interleaved invocations from other (concurrent) processes.

Unfortunately, a compiler cannot determine if a sequence of statements should execute as a multi-step transaction. Consequently, the proper use of such a feature cannot be checked by the compiler and therefore requires designer discipline to use correctly. Consider, for example, that the programmer forgot to encapsulate an intended multi-step transaction inside the synchronized (ob) { ... } construct. The program would compile without error, and the omission might go undetected for many runs until some unfortunate timing sequence leads to an undesired interference by another process.

In traditional programming languages, multi-step transactions and the participation of an operation in a multi-step transaction cannot be declared as part of a server module's interface. Instead, transactions must be implemented inside client code, which means that the client programmer must know that certain sequences of server operations must be protected in order to run atomically. Moreover, the use of features like synchronized violates the principle of information hiding (as we have seen in Section 2.1), because the programmer may have to expose information about the implementation of the synchronized code to properly use this feature.

Features such as synchronized are symptomatic of the generally limited support [FP97] that popular programming languages provide for separation of concurrency concerns. The module interface of ob lists a set of services with no hint of constraints on the sequence of invocations of these services. Because interfaces cannot represent sequencing constraints in client-side synchronization schemes, there is no way for a compiler to detect the absence of a synchronized statement or to enforce the proper sequencing at run time. However, to avoid an error, a programmer must know to guard a sequence of ob-service invocations from interleaved ob calls by other concurrent processes.

2.5 Synchronization and Composability

Multi-step transactions are often not confined to accessing the services of only a single server, as seen in Figure 4 in Section 2.3.2. Instead, they may access multiple shared objects, all of which need to be locked for the duration of the transaction. We now discuss how the specification of transactions that access multiple shared objects affects the modularity and composability of code.

The archetypical example for transactions that access more than one shared object is the case of the dining philosophers, where each philosopher (as a multi-step transaction) will repeatedly pick up two forks, eat with them, and put the forks down again. Textbook solutions usually involve tricky semaphore manipulations, whereas the basic constraint of the system is straightforward: whenever a philosopher eats, he requires access to both the fork on his left-hand side and the fork on his right-hand side.
In a semi-formal notation, we might express this constraint as follows:

    eating => left_fork /\ right_fork

and write the code for a philosopher as follows, where eat implements the relevant multi-step transaction:

    process philosopher
        loop
            think;
            eating := true;
            eat(left_fork, right_fork);
            eating := false;
        end
    end

From this code and the accompanying constraint, a runtime system should be able to derive proper synchronization requirements. Now, compare how this brief, elegant requirement translates to an ad-hoc monitor-based solution in Figure 6, taken from [BA90].

    monitor Fork_Monitor
        Fork: array(0..4) of Integer range 0..2 := (others => 2);
        OK_to_Eat: array(0..4) of Condition;

        procedure Take_Fork(I: Integer) is
        begin
            if Fork(I) /= 2 then
                Wait(OK_to_Eat(I));
            end if;
            Fork((I+1) mod 5) := Fork((I+1) mod 5) - 1;
            Fork((I-1) mod 5) := Fork((I-1) mod 5) - 1;
        end Take_Fork;

        procedure Release_Fork(I: Integer) is
        begin
            Fork((I+1) mod 5) := Fork((I+1) mod 5) + 1;
            Fork((I-1) mod 5) := Fork((I-1) mod 5) + 1;
            if Fork((I+1) mod 5) = 2 then
                Signal(OK_to_Eat((I+1) mod 5));
            end if;
            if Fork((I-1) mod 5) = 2 then
                Signal(OK_to_Eat((I-1) mod 5));
            end if;
        end Release_Fork;
    end Fork_Monitor;

    Figure 6: Dining Philosophers using Monitors

From the standpoint of understandability, the solution in Figure 6 is inferior to the smaller and more transparent solution above. Not only does the former fail to capture the essence of the problem, but the salient synchronization information has been merged with the program code in a way that makes it close to impossible to identify concurrent interaction. Instead of abstracting and separating this information, the information is spread across and interleaved with the computation. Observe that the encapsulation of the mechanism in a monitor does not serve to localize synchronization information. Instead, the monitor functions as a central authority to avoid deadlock problems, but contains code that belongs in the client, thereby violating information hiding. One indication of this violation is that adding more clients (philosophers) requires the modification of the monitor code. We recognize this problem as one of poor separation of concerns; the migration of client code to a central monitor is due to the exigency of preventing deadlocks, making it impossible to factor the code out.

A side effect of this poor separation of concerns is a lack of reliability. Code such as the one in Figure 6 is hard to verify, and the effect of changes is not always apparent. Moreover, there is no mechanism in the code in Figure 6 to indicate whether we have proper mutual exclusion.

Another problem encountered here is that of composability. If we were not using a central locking mechanism in the code, adding more clients would run the risk of correctness problems, most importantly race conditions and deadlocks. There are solutions that avoid the need of a central locking mechanism (as discussed in Section 2.3.2), but they are difficult to use correctly in an ad-hoc fashion. We cannot just connect modules together as we do for sequential code and have some confidence that the combination works as expected.

    [Figure 7: A Composability Problem. A diagram of two processes P1 and P2 accessing a shared object D through separate front ends A and B; diagram not reproduced here.]

This problem of composability becomes particularly visible if one considers a hierarchically layered system. How does one specify process interaction in such a way that hidden implementation details at lower layers are not exposed to higher layers?
For instance, suppose there are two processes P1 and P2, accessing a shared object D through separate front ends A and B (see Figure 7).

Unfortunately, making D visible violates the principle of information hiding—the processes should only access the objects A and B directly, and their dependence on D should not be exposed to the implementation of P1 or P2. But if D isn't visible to P1 or P2, multi-step transactions by P1 and P2 are likely to interfere with one another, because P1 and P2 do not know that they have to lock D for the duration of such a transaction. Moving the requirement to lock D to the code of A or B would lead to the same delocalization as observed in the previous example.²

To solve this problem, one needs to determine the set of shared objects that are affected by a transaction. This cannot be done in an ad-hoc fashion by the programmer without exposing information that is private to A and B. The best way to infer the set of affected shared objects would be to have the compiler or the runtime system infer the set of affected objects automatically or semi-automatically. To allow the compiler or runtime system to perform such inference requires in general that synchronization dependencies be declared, i.e., that a contractual approach be used.

²Note that while certain existing language mechanisms, such as Sather's multi-branch locks or SCOOP, can handle transactions that access a known set of shared objects, they do not effectively handle layered systems.

2.6 A Classification of Synchronization Mechanisms

We now turn to the discussion of non-trivial synchronization mechanisms for passive objects. We call a synchronization mechanism non-trivial if it can handle the atomic acquisition of multiple shared resources. Existing approaches specify the resources that are to be acquired in a variety of ways. We organize these approaches according to a hierarchy of synchronization contracts and classify the power of synchronization mechanisms according to this hierarchy.

Specifically, we say that a synchronization mechanism is obviated by a family of synchronization contracts if the behavior implemented by the mechanism can be specified declaratively using such contracts. The family of synchronization contracts that obviates a synchronization mechanism therefore provides an upper bound on the expressiveness of the mechanism without regard to implementation details. We adopt the following notational conventions to describe synchronization contracts. An exclusion contract between a client C and a supplier S is denoted C -> S, and a conditional contract between a client C and a supplier S that depends on a predicate P is denoted C -P-> S. Also, if A and A' are synchronization contracts, then A /\ A' and A \/ A' describe their explicit conjunction and disjunction, respectively. We do not explicitly consider parameterized contracts, because the predicate on which a parameterized contract depends is not itself a contract but rather an enabling condition on the actual contract.

We begin the classification by exploring two common approaches to synchronization—two-phase locking and general resource allocation—and show that these approaches are obviated by the explicit composition of exclusion contracts (Section 2.6.1). Such approaches provide the bottom-most tier in our hierarchy of synchronization mechanisms, although the underlying deadlock avoidance and recovery mechanisms are the foundation on which more powerful mechanisms have been built.
We conclude with a survey of recent research into more powerful synchronization mechanisms, specifically those that provide for the explicit composition of conditional contracts and for the explicit composition of contracts by disjunction.

2.6.1 Existing Approaches for Explicit Composition of Synchronization Contracts

The most common of the non-trivial synchronization mechanisms are obviated by the explicit composition of exclusion contracts. The client in such a contract is an active entity, such as a process or a job running within a process. Clients employ a synchronization mechanism (or collaborate with other clients according to a synchronization protocol) to gain exclusive access to one or more suppliers. Each supplier is a passive entity, such as a database record, a shared operating system resource, or in-memory data accessed as part of a session in an e-commerce application. We discuss two typical styles of such mechanisms, namely two-phase locking and resource allocation.

Additionally, there are also synchronization mechanisms that are more powerful in the sense that they cannot be obviated by the explicit composition of exclusion contracts. Instead, they can only be obviated by the explicit composition of exclusion and conditional contracts. Likewise, there exist other contractual synchronization models besides the universe model that allow for the explicit composition of conditional contracts. We discuss both of these types of contracts in Section 2.6.2.

Two-phase Locking Schemes

Two-phase locking [EGLT76] is a mechanism for consistent concurrent access to shared resources in transaction-based systems, such as databases [BK91], CORBA Concurrency Control Services [SKT+96], and multi-user collaboration systems for software development or CAD/CAM [ABAK95, Kai95]. In the two-phase locking model, a transaction consists of two phases. In the first or growing phase, all shared resources are acquired, while in the second or shrinking phase, all shared resources that have been acquired during the growing phase are released. Resources can be acquired either en bloc at the start of the growing phase, or on demand, as they are needed by the transaction, at any time during the growing phase. They are, however, never released before the beginning of the shrinking phase, and no further resources can be acquired once the shrinking phase has begun.

During the growing phase, concurrency control mechanisms are employed to ensure that no deadlock occurs. Examples of concurrency control mechanisms for two-phase locking are wound-wait [RSL78], wait-die [RSL78], cautious waiting [HZ92], running priority [FR85], immediate restart [SPG91], general waiting [BHG87], and wait-depth limited (WDL) [FHRT92]. Of these, only wound-wait is automatically deadlock- and starvation-free. Wait-die, cautious waiting, running priority, and immediate restart are deadlock-free, but not starvation-free, and must be augmented with a mechanism to avoid starvation. WDL and general waiting are not necessarily deadlock-free. To avoid deadlock using one of these schemes requires a background process [ACM87, Che95] that determines if there is a cycle of transactions waiting for one another. Because each of these algorithms vies for a fixed set {R_1, ..., R_k} of resources, they are collectively obviated by synchronization contracts of the form T → R_1 ∧ ... ∧ T → R_k, where the client is a transaction T.
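To make the growing/shrinking discipline concrete, the following is a minimal Python sketch (ours, not taken from any of the cited systems) of a two-phase-locking transaction; the Resource class, the resource names, and the fixed acquisition order used to sidestep deadlock are all assumptions made purely for illustration.

    import threading

    class Resource:
        """A shared resource protected by its own lock."""
        def __init__(self, name):
            self.name = name
            self.lock = threading.Lock()

    def run_transaction(resources, body):
        # Growing phase: acquire every needed resource before doing any work.
        # Acquiring in a fixed (sorted) order is one simple way to avoid deadlock;
        # real systems instead use schemes such as wound-wait or wait-die.
        ordered = sorted(resources, key=lambda r: r.name)
        for r in ordered:
            r.lock.acquire()
        try:
            body()                      # transaction work; no lock is released yet
        finally:
            # Shrinking phase: release everything; nothing new may be acquired now.
            for r in reversed(ordered):
                r.lock.release()

    # Example: a transaction T that needs R1 and R2, i.e. T -> R1 /\ T -> R2.
    r1, r2 = Resource("R1"), Resource("R2")
    run_transaction([r1, r2], lambda: print("transferring between R1 and R2"))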
Most approaches to two-phase locking attempt to hide the details of concurrency control from the programmer in ways that do not involve synchronization mechanisms or contracts. In particular, many transaction-based systems do not require transaction code to explicitly acquire resources before using them, thus making resource acquisition transparent to the programmer. For example, the Enterprise JavaBeans specification suggests that a resource should be automatically locked the first time it is accessed [DYK01], without being directed to do so explicitly by the programmer. Such automatic locking can eliminate the need for both explicit synchronization mechanisms and explicitly specified synchronization contracts.

Unfortunately, these automatic locking mechanisms recover from deadlock by rolling back transactions. While transaction-based systems in general support rollback, it is practically infeasible to integrate rollbacks into a general purpose programming language. Rolling back an arbitrary computation would involve storing and later recovering the entire state of a process, including processor registers, the processor stack, and the state of the operating system kernel with respect to the process, which in itself can be difficult or impossible [EAWJ96]. In addition, certain operations, such as I/O or the creation of processes, cannot be undone at all, and have to be delayed until no rollbacks are possible. This is a common requirement of rollback-based systems such as [Jef85], and requires pervasive changes to the language implementation and libraries. Finally, the need to store duplicate state information requires CPU and memory overhead, which can be considerable [CPF99].

In summary, synchronization contracts obviate a large class of synchronization mechanisms as they are commonly used in transaction-based systems. Unlike mechanisms that require rollback capability, a contractual approach can be supported by general purpose programming languages.

Resource Allocation in Distributed Systems

Synchronization mechanisms for composite synchronization contracts have also been investigated extensively for distributed message passing systems, generally under the term resource allocation. Resource allocation assumes that there is a set of processes, such that each process P has a corresponding set R of resource requirements. Most commonly, each R is simply a set {R_1, ..., R_k} of resources, all of which must be exclusively acquired by the corresponding process P. Deadlock-free acquisition protocols for such requirements can be found in [CM84, Rhe98, CS99]. This type of resource allocation protocol is obviated by synchronization contracts of the form P → R_1 ∧ ... ∧ P → R_k. Alternatively, each R can also be a set of sets {{R_{1,1}, ..., R_{1,k_1}}, ..., {R_{l,1}, ..., R_{l,k_l}}} of resources, where exclusive access to all of the resources in one of the sets {R_{i,1}, ..., R_{i,k_i}} is needed to satisfy the resource requirements of P. Examples of deadlock-free acquisition protocols for these requirements can be found in [BB98, BBF01]. They are obviated by synchronization contracts of the form (P → R_{1,1} ∧ ... ∧ P → R_{1,k_1}) ∨ ... ∨ (P → R_{l,1} ∧ ... ∧ P → R_{l,k_l}).

2.6.2 Beyond Composition of Exclusion Contracts

In addition to the common mechanisms discussed so far, a number of more powerful synchronization mechanisms have also been proposed. These more powerful mechanisms are not obviated by the arbitrary composition of exclusion contracts, but are obviated by the composition of exclusion and conditional contracts.
We present two representative synchronization mechanisms [FP97, JP93] to illustrate this point. We then describe two contractual synchronization models [Mey97, Hol99] that support the composition of conditional contracts.

One of the more powerful procedural synchronization mechanisms that has been implemented in a programming language is Sather's multi-branch locking statement [FP97]. Its basic syntax is:

    lock
      when L_{1,1}, ..., L_{1,k_1} then statement_list_1
      ...
      when L_{l,1}, ..., L_{l,k_l} then statement_list_l
    end

When executed by a process P, this statement is used to regulate exclusive or conditional access to a set of shared resources. Each L_{i,j} can be either a mutex or a semantic condition. A mutex provides exclusive access to an object S_{i,j}, whereas a semantic condition provides exclusive access to an object S_{i,j} once L_{i,j} has been acquired (if it is a mutex) or is satisfied (if it is a semantic condition).³ As soon as all of the locks and conditions of a given when branch can be locked or verified, respectively, the corresponding statement_list_i is executed and has exclusive access to all S_{i,j}. Let A_{i,j} be the exclusion contract P → S_{i,j} if L_{i,j} is a mutex, or the conditional contract P —L_{i,j}→ S_{i,j} if L_{i,j} is a semantic condition. Then each when branch of the multi-branch lock is obviated by the conjunction of those A_{i,j} that obviate the L_{i,j} of that particular branch. The entire multi-lock statement is obviated by a disjunction of these conjunctions, namely (A_{1,1} ∧ ... ∧ A_{1,k_1}) ∨ ... ∨ (A_{l,1} ∧ ... ∧ A_{l,k_l}).

³When we say that the mutex or condition provides exclusive access to an object, we describe the intended use of this construct. This use is not enforced by the language implementation. For example, Sather does not actually associate mutexes with specific shared resources, and it is legal to access a shared object without executing a lock statement at all.

In addition to the basic syntax above, Sather also allows the statement to have an else-branch that is taken when none of the when-branches can be taken. The behavior of such an else-branch cannot be obviated by synchronization contracts that conform to our existing classification.⁴

Another powerful synchronization mechanism is expressed by the holdif statement of the Concurrent Extensions of Eiffel [JP93]. The syntax of the holdif statement is holdif condition then statement_list end. The condition of the holdif statement can be an arbitrary semantic predicate involving one or more supplier objects, S_1, ..., S_n. The semantics prescribe that all the S_i are locked atomically as soon as the condition is true. Once that occurs, the statement_list will be executed. Consequently, the holdif statement is obviated by composite conditional contracts of the form P —Cond→ S_1 ∧ ... ∧ P —Cond→ S_n, where Cond is the condition of the holdif statement and P is the process executing it. Note that Cond can be a predicate that depends on the state of more than one of the S_i, such as S_1.size + S_2.size >= M, with M being a constant.

⁴However, in Section 10.1.1 we will discuss how to extend that classification to cover alternative interpretations of disjunctions that would obviate the behavior of the else-branch.
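As an aside, the blocking behavior that holdif prescribes (wait until a predicate over several suppliers holds, then run a block with exclusive access to all of them) can be approximated with an ordinary monitor. The following Python sketch is ours, not from [JP93]; the single shared condition variable guarding both suppliers and the names holdif, s1, and s2 are assumptions made for illustration.

    import threading

    _mon = threading.Condition()      # one monitor guarding the shared suppliers

    def holdif(condition, body):
        """Block until condition() holds, then run body() with the monitor held."""
        with _mon:
            while not condition():    # re-check after every state change
                _mon.wait()
            body()                    # exclusive access to the suppliers while held

    def update(action):
        """Change supplier state and wake up blocked holdif calls."""
        with _mon:
            action()
            _mon.notify_all()

    # Example: proceed once s1.size + s2.size >= M, as in the CEE example above.
    class Buffer:
        def __init__(self): self.size = 0

    s1, s2, M = Buffer(), Buffer(), 3
    threading.Thread(target=holdif,
                     args=(lambda: s1.size + s2.size >= M,
                           lambda: print("condition met"))).start()
    update(lambda: setattr(s1, "size", 2))
    update(lambda: setattr(s2, "size", 2))   # now the waiting thread proceeds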
A contractual approach that obviates the CEE holdif construct is described in the SCOOP extension of Eiffel [Mey97]. SCOOP expresses synchronization contracts implicitly through the type and contract signature of an Eiffel routine. An Eiffel routine may have arguments that are shared suppliers S_1, ..., S_n, and, optionally, a precondition Pre. Such a routine functions as a synchronization contract insofar as entry into the routine body is blocked until exclusive access to all of the S_i has been obtained and until Pre is verified. This expresses a synchronization contract of the form P → S_1 ∧ ... ∧ P → S_n if the routine has no precondition, and P —Pre→ S_1 ∧ ... ∧ P —Pre→ S_n if there is a precondition Pre. In either case, P is the process calling the routine.

The synchronization rings proposed in [Hol99] are another contractual approach. Conceptually, a synchronization ring is a wrapper around one or more shared supplier objects that intercepts access to these suppliers, permitting access only when one or more contracts (called constraints) are satisfied, and blocking access otherwise. More than one synchronization ring can be wrapped around a supplier, and in this case all the constraints provided by any of the rings must be satisfied for a process to be granted access to the supplier wrapped by these rings. The basic contracts that can be expressed by synchronization rings are exclusion and conditional contracts, and by wrapping multiple synchronization rings around the same supplier, they can effectively be composed by conjunction.⁵ An enhancement that synchronization rings provide over synchronization contracts as expressed in our classification scheme is that they allow for finer granularity of exclusion or conditional contracts, and can specify that two processes can access a shared supplier concurrently if they do not interfere with one another within that supplier.⁶

⁵The synchronization ring model also offers shorthand forms of commonly used conditional contracts as well as contracts that ensure quality of service, but these contract types are beyond the scope of this thesis.

⁶Concepts that can express such concurrent access to a shared supplier are beyond the scope of the universe model, but will be addressed in detail in Section 10.1.2.

Chapter 3

The Universe Model

The universe model supports the declarative specification and run-time negotiation of exclusion and conditional synchronization contracts, parameterized contracts, and their composites. Under this model, processes operate in conceptually disjoint data spaces, which expand and contract over the lifetime of a program in order to satisfy the synchronization contracts of each process. We first introduce these ideas informally, in the context of an example (Section 3.1), followed by a more formal description of the basic concepts (Section 3.2). Contracts take the form of data invariants, called concurrency constraints, and are associated with modules that encapsulate cohesive groups of objects, called universes (Section 3.3). A run-time system negotiates these contracts by migrating universes among the data spaces of the competing processes in order to satisfy the contracts that pertain to these universes. The run-time system also schedules or suspends processes to guarantee that data spaces contain the necessary universes while ensuring the data spaces remain disjoint (Section 3.4). A formal semantics of negotiation appears in [SBD03]. In the sequel, we refine the notion of a data space into a connected, directed graph of universes in shared memory, thus enabling migration among data spaces to be implemented by linking and unlinking edges. Moreover, we refine the functionality of the run-time system into a procedure that is invoked by each process to update its graph in a way that satisfies contractual obligations.
3.1 Motivating Example

Consider the design of a web-based whiteboard, generally known as a wiki [LC01], which is used for on-line collaboration. A wiki allows a group of authors to use a web browser to concurrently view, create, and edit text pages that are maintained in a shared repository. Each page contains marked-up text with simple formatting instructions and hyperlinks to other pages. Figure 8 depicts the design of classes to represent the pages that can be stored and manipulated in a wiki.¹ Pages can be either permanently stored on secondary storage (REAL_PAGE) or created dynamically from other data (VIRTUAL_PAGE). Virtual pages generally provide various types of information about the system, such as lists of changes, differences between two versions of a page, the local or global hyperlink structure, unreachable pages, or resource usage. To avoid accidental deletion of data, updating a real page does not simply overwrite the old version, but stores that old version in a database, so that it can be restored at a later date. In our design, the current version of a real page is always kept resident in memory, which allows for rapid browsing of pages, including global full-text searches. Of course, keeping these pages in memory requires careful concurrency management to avoid data races, starvation, and deadlock.

¹The diagram uses the UML class diagram notation [BJR99], where boxes denote classes, lines with a triangle denote generalization relationships where the triangle points to the more general class, and arrows denote references.

    Figure 8: UML model of page classes in the wiki example. (The classes shown include REAL_PAGE, VIRTUAL_PAGE, RECENT_CHANGES, SYSTEM_STATS, SITEMAP, DATABASE, CHANGE_LOG, and CROSS_REFS.)

Figure 9 depicts part of the definition of class REAL_PAGE in our extended version of Eiffel. The class definition is prefixed by the keyword universe (line 1), which indicates that instances of this class may participate in the negotiation of synchronization contracts. For brevity, we defer the formal definition of universe classes to Section 3.2. Lines 3–9 declare class attributes, of which there are three different types.
The attributes query_db and updating are called condition variables; they encode different synchronization states of instances of this class and may be referenced in the premise of a parameterized contract or the condition of a conditional contract.

     1  universe class REAL_PAGE inherit PAGE
     2  feature -- attributes
     3    id: STRING                -- name of this page
     4    current_source: STRING    -- most recent version of this page
     5    history: LIST[VERSION]    -- list of valid versions
     6    database: DATABASE
     7    change_log: CHANGE_LOG
     8    cross_refs: CROSS_REFS
     9    query_db, updating: BOOLEAN
    10  feature -- methods
    11    create_new_version(source: STRING) is
    12      local now: TIME; version: VERSION
    13      do
    14        updating := true
    15        now := get_current_time
    16        version := database.add_new_version(id, source, now)
    17        cross_refs.delete_links(id, links(current_source))
    18        current_source := source
    19        cross_refs.add_links(id, links(current_source))
    20        change_log.register_change("New Version", id, version, now)
    21        history.prepend(version)
    22        updating := false
    23      end
    24  concurrency
    25    (query_db or updating) => database
    26    updating => change_log and cross_refs
    27  end -- class REAL_PAGE

    Figure 9: Elided definition of class REAL_PAGE in (universe-model extended) Eiffel.

The attributes database, change_log, and cross_refs are called universe variables; they reference objects that are instances of universe classes.² Finally, attributes id, current_source, and history are local object references that are neither condition nor universe variables.

²For brevity, the declarations of these other classes are elided in Figure 9.

A universe is a cohesive group of objects that migrate together during contract negotiation. By virtue of being declared a universe class, instantiating REAL_PAGE creates not only an instance of the class but also a new universe in which the instance is placed. This universe will contain the actual instance of class REAL_PAGE, hereafter called the root object of the universe. Here, and in the sequel, when we refer to a universe in a context that requires an object, we mean the root object of the universe. The universe will also contain any objects referenced by any (non-universe variable) attributes of an object that is already in the universe. Thus, for example, the universe will also contain the objects referenced by attributes id, current_source, and history, along with any object references in these objects, recursively.

Lines 11–23 define a method that creates a new version of a page. This code adds the new version to the global database, registers the change in the change log, and updates the cross reference information. Synchronization is governed by the contract, or concurrency constraint, which specifies that instances of the class may access database if query_db or updating is true and that the class may also access change_log and cross_refs if updating is true (lines 25–26). Only condition and universe variables may be referenced in a synchronization contract.

Observe that create_new_version does not use low-level synchronization mechanisms to guarantee mutual exclusion. Instead, it simply manipulates the boolean attribute updating. The run-time system uses the values of updating and query_db³ to determine which objects need to be migrated into the data space of the current process, and this migration is performed automatically, without further intervention by the programmer.

³The query_db attribute is not manipulated in this example.
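To give a sense of what the run-time system derives from the concurrency constraint in Figure 9, here is a small Python sketch (ours, not part of the thesis implementation) that maps the condition variables of a REAL_PAGE universe to the set of universe references it currently requires; the string-based representation of universe variables is an assumption made purely for illustration.

    def required_universes(query_db, updating):
        """Evaluate the REAL_PAGE constraint for one abstract state.

        (query_db or updating) => database
        updating               => change_log and cross_refs
        """
        needed = set()
        if query_db or updating:
            needed.add("database")
        if updating:
            needed.update({"change_log", "cross_refs"})
        return needed

    # The assignment 'updating := true' in create_new_version therefore obliges the
    # run-time system to migrate all three universes into the process's data space:
    print(required_universes(query_db=False, updating=True))
    # -> {'database', 'change_log', 'cross_refs'}
    print(required_universes(query_db=False, updating=False))
    # -> set()  (no shared universes are needed while the page is idle)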
Conceptually, boolean variables such as query_db and updating encode the abstract state of a universe, whereas universe references, such as database, change_log, and cross_refs, denote shared resources that are needed by instances of REAL_PAGE to implement its methods. The contract relates different configurations of the abstract state to different resource requirements. Specifically, each universe reference—database, change_log, and cross_refs—denotes an exclusion contract with the referenced universe; the predicates updating and query_db or updating denote different synchronization-relevant state conditions; and the combination of these entities by implication and conjunction constitutes a parameterized composite contract.

3.2 Formal introduction to the universe model

A program in the universe model comprises one or more sequential processes, each of which is a thread of control whose data space, or realm, comprises a collection of universes that the process is said to own. A universe (and thus any object contained within a universe) cannot be owned by more than one process concurrently. In addition, a process cannot legally access an object in a universe that it does not own. For example, accessing the object referenced by the universe variable database (line 16 of Figure 9) would be illegal if this object were not in the accessing process's realm. Our model prescribes that an illegal access results in a run-time exception. Consequently, programs written using the universe model are guaranteed to be free of so-called data races as defined in [NM92].

Processes communicate indirectly using a mechanism called universe migration, through which the owner of a universe changes at run time. Over the lifetime of a program, a universe might change owners many times. Indeed, there could be times when a universe is not owned by any process at all. Process scheduling is managed by a run-time system that attempts to migrate universes so as to satisfy the requirements of synchronization contracts. A process can be scheduled only if every contract that pertains to every universe in its realm is satisfied; we call such a realm sufficient.

A programmer manipulates a universe by performing operations on a distinguished group member called the root object. Each universe has exactly one root object, and the creation/destruction of a universe coincides with the creation/destruction of its root. A root object is distinguished as such by being an instance of a universe class, which (in Eiffel) we denote using the keyword universe, as depicted in Figure 9 (line 1). Attributes of a universe class may be condition variables, universe variables, or regular attributes that are not used to declare a contract. Instantiating a universe class creates both a new object and a new universe in which to place the newly instantiated object. Instantiating a non-universe class creates a new object and places it in the same universe as the object that invoked the creation procedure to construct the newly instantiated object.

During the execution of a program, a process will occasionally execute an instruction that affects the contents of its realm. In Figure 9, for example, the assignment to the instance variable updating on line 14 affects the realm, because the contract requires database, change_log, and cross_refs to be a part of the realm whenever updating is true. Likewise, the assignment on line 22 affects the realm. When a process invokes one of these realm-affecting operations, execution of the process is suspended and the realm is adjusted by universe migration.
This adjustment involves both removing unneeded universes from the realm and adding universes that are required to make the realm sufficient. If the adjustment is successful, then the process can resume execution; otherwise, the process is blocked until its needs can be satisfied.

     1  universe class EVENT_MONITOR
     2  feature { NONE }      -- private attributes
     3    channel : CHANNEL   -- universe variable
     4    reading : BOOLEAN   -- condition variable
     5  concurrency
     6    reading => channel when channel.has_data
     7  end -- class EVENT_MONITOR

    Figure 10: Elided definition of class EVENT_MONITOR. Notice that the constraint includes a conditional universe reference.

3.3 Contract language: Concurrency constraints

Contracts in our model are expressed as concurrency constraints, which specify the conditions under which operations over objects in a given universe might access objects in other universes. A concurrency constraint is a limited propositional formula whose atomic propositions are either conditions or universe references.

3.3.1 Conditions and Condition Verification

A condition is a (full) propositional formula whose atomic propositions are condition variables. We distinguish two classes of conditions—local and remote. The syntax of a local condition is defined as follows:

    local_condition ::= true
                      | conditionVar
                      | not local_condition
                      | local_condition and local_condition
                      | local_condition or local_condition
                      | local_condition => local_condition

Here, the terminal symbol conditionVar refers to a condition variable, and the terminal symbol true is the trivial condition. Also, the usual precedences and associativities of logical operators apply.

Likewise, a remote condition is a full propositional formula whose atomic propositions are either condition variables or remote condition variables, which are condition variables of a universe that is referenced indirectly through a universe variable. More formally:

    remote_condition ::= true
                       | conditionVar
                       | universeVar . conditionVar
                       | not remote_condition
                       | remote_condition and remote_condition
                       | remote_condition or remote_condition
                       | remote_condition => remote_condition

Here, the terminal symbol universeVar refers to a universe variable, and (while not shown in this grammar) a remote condition may reference at most one universe variable. For example, u.x and u.y is a syntactically legal remote condition, whereas u.x and v.y is illegal because it references two universe variables (u and v). The atomic constraints of the concurrency constraint on line 6 in Figure 10 illustrate both types of conditions, where reading and channel.has_data denote local and remote conditions, respectively.

A condition is verified by universes, the root objects of which provide value assignments to the condition and universe variables that appear in the condition. Let u be a universe and c be a local condition whose free variables refer to condition variables in u. We say that u verifies c if, by substituting the value u.x for each condition variable x that is free in c, the resulting formula is logically valid. Remote conditions must be verified by two universes, a local universe (u) and a remote universe (v). Let c be a remote condition all of whose free variables refer either to a condition variable in u, a condition variable in v, or a distinguished universe variable x in u.
We say that u and v verify c if, by substituting the value u.y for each local condition variable y in c, and by substituting v.y for each remote condition variable x.y in c, the resulting formula is logically valid.

3.3.2 Constraints and Constraint Satisfaction

Syntactically, concurrency constraints are limited propositional formulae whose atomic propositions are either conditions or universe references, which specify mutually-exclusive access dependencies. Here, and in the sequel, we adopt a typographic distinction between conditions and constraints, using lower-case italic letters at the beginning of the alphabet (e.g., c, c') to refer to conditions and capital calligraphic letters at the beginning of the alphabet (e.g., C, C') to refer to constraints. The formal syntax of a concurrency constraint is as follows:

    constraint ::= true
                 | universeVar
                 | universeVar when remote_condition
                 | local_condition => constraint
                 | constraint and constraint
                 | constraint or constraint

A constraint may be the trivial constant true (line 1), a definite universe reference (line 2), a conditional universe reference (line 3), a contextual constraint (line 4), or a composite constraint (lines 5–6). A definite universe reference asserts that the referent is in the same realm as the client universe to which the constraint applies. The reference to change_log in line 26 of Figure 9 is an example of a definite universe reference. A conditional universe reference also asserts that the referent is in the realm, but it contains an additional condition (called a when clause) that is used to defer the migration of a universe into a realm until the condition can be verified. For example, the expression channel when channel.has_data (line 6 of Figure 10) is a conditional universe reference.⁴ A contextual constraint of the form c => C asserts C whenever condition c holds. Definite and conditional universe references are used to declare exclusion and conditional contracts, respectively,⁵ whereas contextual constraints are used to declare parameterized contracts.

⁴This example is described in detail in [SBD03]. Also, notice that the when-clause is a new feature added after the publication of [BS00]. It replaces reading => queue.has_data /\ queue with reading => queue when queue.has_data.

⁵A conditional contract C —P→ S in the universe model requires that P depend only on the state of C and S, but on no other universes.

Semantically, we say a concurrency constraint is satisfied by a universe in the context of a set of dependent universes. Here, and in the sequel, we carefully distinguish the term verify, which we use to evaluate conditions, from satisfy (or locally satisfy), which we use to evaluate constraints. Let U be a set of (root objects of) universes, u ∈ U be a specific object in this set, and C be a well-typed constraint that pertains to it. We say that C is locally satisfied by u in the context of U if C is logically valid under the following substitution:

1. Each local condition c that appears as the premise of a contextual constraint in C is replaced with the value:
   - true if u verifies c; and
   - false otherwise.

2. Each definite (resp. conditional) universe reference v (resp. v when c) in C is replaced with the value:
   - true if u.v is either null or an element of the set U;
   - false otherwise.

Observe that there are two distinct cases to consider in the local evaluation of a universe reference.
First, the referent could be null, in which case we interpret the reference as vacuously true.⁶ On the other hand, the referent might not be null, in which case we interpret the reference as true provided that the universe u' associated with the referent is contained in the context U. Observe also that an unverified when clause can only defer migration into a realm; it does not affect the local satisfiability of a constraint and thus cannot cause the referent universe to be migrated out of a realm.

⁶See [SBD03] for the rationale behind this decision.

For example, the full constraint in Figure 10 enforces the following behavior: While reading is false, nothing is required; however, once a process assigns reading the value true, if the referent of channel is non-null, then the process may not continue unless the referent is in its realm. If the referent is not already in the process's realm, its migration into the realm will be deferred until such time as the attribute channel.has_data is set (by another process) to true. This design not only ensures mutual exclusion; it also causes EVENT_MONITOR objects to block until the channel contains data. Moreover, once the channel is in the realm, it will remain there until the process sets reading to false, even if some operation on the channel causes channel.has_data to become false, which could happen, for example, if the process empties the channel. Thus, a when clause is merely a mechanism for deferring migration into a realm; it cannot be thought of as an invariant that holds whenever the referent universe is in the realm.

We use local satisfaction to formally define the sufficiency of a realm. A set U of universes is sufficient if, for every u ∈ U, we can show that it locally satisfies its associated concurrency constraint in the context of U. Sufficiency is the basis for process scheduling and universe migration, which we now discuss.

3.4 Contract Negotiation and the Run-time System

The universe model includes the concept of a run-time system, which is responsible for negotiating contracts and scheduling processes for execution. Under this model, the operational state of a process alternates between being runnable and being blocked, and a runnable process remains runnable until it executes a realm-affecting operation, i.e., it modifies a condition or universe variable in some universe in its realm. A programmer using the universe model understands the synchronization effects of her program by thinking of each such operation as trapping into the kernel of this run-time system. The idea is that a realm-affecting operation could create a new universe or modify the parameters of a parameterized or conditional contract, all of which could cause active contracts to be re-negotiated. In response to such an operation, the run-time system then renegotiates any affected contracts and returns control to a process once all of its relevant contracts are satisfied.

More precisely, a process is deemed runnable if its realm satisfies the following conditions. At minimum, the realm of a process must contain any universe whose root object is actively executing a method. That is, the realm must contain any universe whose root is the object of a method⁷ with an activation record on the call stack.
We refer to the set of such universes as the universe stack and denote it using the symbol S_p, where p is the process to which it applies. In addition to these minimum requirements, the realm of a runnable process must be sufficient and minimal. More formally, a blocked process p with realm U is deemed runnable if and only if:

    S_p ⊆ U  ∧  sufficient(U)  ∧  ¬∃U' ⊂ U . (S_p ⊆ U' ∧ sufficient(U'))

⁷By "object of a method", we mean the object referred to by the keyword this in C++, self in Smalltalk, or Current in Eiffel. An activation record is created with a this pointer that references object ob whenever a statement of the form ob.method(...) is invoked.

We refer to a realm that satisfies these conditions as a complete realm, and we refer to the process of making a blocked process runnable as updating the realm. The run-time system is responsible for updating the realms of all the processes in the system. This is accomplished by migrating universes among the realms of all contending processes in order to complete as many realms as possible. Thus, the operational semantics of process scheduling are defined entirely in terms of universe migration.⁸ A major contribution of this thesis is a distributed algorithm for negotiating universe-model contracts by simulating the migration of universes among the realms of competing processes. Because real threads operate over shared memory as opposed to private data spaces, universe migration need not involve physically copying the objects to new memory locations. Rather, each universe is attributed with a handle that identifies the owner process, and migration is simulated by manipulating that attribute. More generally, the realm-update procedure is run independently by each process in a running system, and several processes may be running the realm-update procedure for their respective realms concurrently; universe migration is implemented as a protocol of universe acquisition and release, such that processes will fairly negotiate for contended universes and a process will block if it cannot acquire all of the universes it needs; and acquisition and release are implemented by manipulating data in shared memory. The challenge to implementing such an algorithm is to efficiently represent realms and to design the algorithm so that processes running the algorithm avoid starvation and deadlock.

⁸For a detailed treatment of these semantics see [SBD03].

Chapter 4

Realm update

Our realm-update procedure, which is invoked when a process performs a realm-affecting operation, renegotiates the contracts of the invoking process and returns once this renegotiation concludes and the realm of the process is once again complete. Before delving into the algorithms used in this procedure, we first show how a realm is a directed graph of universes that satisfies a reachability property and how we can represent the realm to which a given universe belongs using a marking strategy (Section 4.1). Our algorithms maintain the reachability property using depth-first and breadth-first search procedures that consult and update these markings. The algorithm that releases unneeded universes is based on concepts from garbage collection [Wil92], specifically the use of reference-counting collection with cycle detection [BAR+03] (Section 4.3). By contrast, the algorithm that acquires new universes (Section 4.4) combines several ideas to fairly migrate universes while avoiding or automatically recovering from a large class of deadlock conditions (Chapter 5).
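The realm-as-a-graph view described above can be pictured with a small Python sketch. It is our own illustration, not the thesis implementation; the owner attribute and the convention of marking realm membership by setting it are assumptions chosen to mirror the prose.

    from collections import deque

    class Universe:
        def __init__(self, name):
            self.name = name
            self.owner = None        # process that currently owns this universe
            self.accessible = []     # outgoing edges currently required by the constraint

    def realm(universe_stack, process):
        """Compute the realm of `process`: every universe reachable from the
        universe stack along accessible edges (a breadth-first traversal)."""
        seen, queue = set(universe_stack), deque(universe_stack)
        while queue:
            u = queue.popleft()
            u.owner = process        # marking: membership is recorded in the owner field
            for v in u.accessible:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen

    # Example: a page universe whose constraint currently makes database accessible.
    page, db = Universe("page"), Universe("database")
    page.accessible.append(db)
    print({u.name for u in realm([page], process="p1")})   # -> {'page', 'database'}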
4.1 Run-time representation of realms

To motivate our implementation of realms and the algorithms that maintain them, we begin by viewing the set of universes in a running system as a directed graph. Let G represent the global universe graph, whose nodes correspond to the universes in a running system, and each of whose directed edges corresponds to a link from a source to a referent universe through a universe variable. Then a complete realm R_p of a process p is a subgraph of G that contains the universe stack S_p and all nodes reachable via a path of so-called accessible edges. An edge u → v ∈ G is accessible if v is in a minimal set of universes that are needed to locally satisfy u's constraint. In such a case, we also say that v is accessible from u. For instance, if u has the constraint (c => v1) ∧ (not c => v2) and c is true, then the edge u → v1 is accessible, whereas u → v2 is not accessible.

The function accessibles returns a set of target nodes that are accessible from a given source node:

    function accessibles( u : Universe ) returns Set[Universe];

This function is computed based on the concurrency constraint associated with u and the values of the condition variables in u. For efficiency, our compiler generates a custom accessibles function for each universe class using a symbolic encoding based on binary decision diagrams (BDDs) [Bry86]. Note that accessibles(u) is only well-defined when u is in the realm of the current process.

Because concurrency constraints may contain disjunctions, the set of accessible targets is non-deterministic. The implementation of accessibles resolves the non-determinism by choosing from among the multiple possible choices and remembering the choice. More precisely, the choice made by accessibles(u) persists until u witnesses a realm-affecting operation or is migrated to another realm.

The extent of a realm can be computed using reachability algorithms, such as breadth-first search, by recursively exploring the accessible edges that emanate from nodes on the universe stack. In the sequel, we present algorithms that update a realm dynamically, following a realm-affecting operation. To support these algorithms, each universe contains an owner attribute, which records the process that currently owns the universe, and a reference count, which records the number of universes in the same realm from which this universe is accessible. More precisely, the reference count of a universe u, with respect to a process p, is the number of accessible edges v → u where u is the target universe, plus one if u ∈ S_p. These attributes are accessed using the functions owner and RC, respectively. The function raccessibles(u) (short for "realm" accessibles) returns the set of universes that are in the set accessibles(u) and also in the same realm as u (i.e., for which the owner is the same as the owner of u).

To implement the algorithms that follow, we rely on a few basic synchronization primitives. First, we assume that every universe u and every process p has an associated mutex, which can be locked by the instructions LOCK(u) or LOCK(p), respectively, and similarly be unlocked by the instructions UNLOCK(u) or UNLOCK(p). Second, to handle waiting for conditions, we assume the presence of an abstract wait until primitive with the following syntax and semantics.
    wait until cond_1 then statement_1
    or until cond_2 then statement_2
    ...
    or until cond_n then statement_n
    end

A wait until statement suspends the current process until one of the conditions cond_i is true, at which time the corresponding statement_i is executed. We assume that when a process p executes a wait until statement, checking for conditions is an atomic operation enclosed in LOCK(p) ... UNLOCK(p) statements. If more than one cond_i is true, the first alternative in the list is chosen. We have designed the cond_i so that a straightforward semaphore-based implementation is possible.

4.2 Realm-update procedure

A realm-affecting operation causes an implicit release of a set R− of universes that are no longer required and an implicit request for a set R+ of new universes needed to complete the new realm. The semantics of the universe model guarantee that every universe in R− ∪ R+ is reachable from the universe w that witnessed the realm-affecting operation. Our realm-update procedure computes and releases R− using a procedure called realm contraction and then computes and acquires R+ using a procedure called realm completion.

Both R+ and R− can be derived from the change to accessibles(w) and the change to the universe stack S_p between the previous realm update and this one. The change to accessibles(w) could make edges inaccessible that were previously accessible from w, or vice versa. Likewise, the change in S_p could make inaccessible those universes that are no longer on the stack, or make accessible those universes that have been added to the stack since the previous realm update. Thus, updating a realm requires knowing: (1) the value of accessibles(w) immediately before and immediately after the operation, and (2) the contents of the universe stack immediately following and immediately preceding the previous and current realm-update operations, respectively.

Let old_ustack(p) be the universes in S_p immediately following the previous realm update, and let ustack(p) be the set of universes in S_p upon invocation of the current update procedure. Let old_acc be the set of universes accessible from w after the previous realm update and new_acc be the set of universes accessible from w at the beginning of this update (i.e., immediately following the most recent realm-affecting operation). R− is then the set of universes that become inaccessible as a result of deleting edges from w to universes that occur in old_ustack(p) or old_acc, but not in ustack(p) or new_acc. Conversely, R+ is the set of universes that become accessible as a result of adding edges from w to the universes that occur in ustack(p) or new_acc, but not in old_ustack(p) or old_acc. This process is implemented by procedure updateRealm:

    proc updateRealm( witness : Universe,
                      old_acc : Set[Universe],
                      new_acc : Set[Universe],
                      p       : Process ) is
      var garbage, newlinks : Set[Universe];
    begin
      garbage  := (old_ustack(p) + old_acc) - (ustack(p) + new_acc);
The remainder of this chap- ter describes the realm contraction and realm completion procedures in detail. 4.3 Realm contraction A realm is contracted by unlinking edges between universes that, as a result of a realm— affecting operation, are no longer accessible from either the witness or the universe stack. If, after being unlinked, a universe is no longer the target of any accessible edges, then the uni- verse is inaccessible to the process and is thus released and made available for migration to other realms. Of course, releasing one universe could unlink others, which may cause more universes to be released. This behavior is reminiscent of the way in which dynamically allocated memory is returned to the heap using reference-counting approaches to garbage collection. Our realm—contraction algorithm borrows from recent results in this area, specif- 59 ically reference-counting with cycle detection [MWL90, JL96, BAL+OI , BAR+O3], which we now briefly review. Simple reference-counting approaches to garbage collection cannot handle cycles of garbage because nodes in a garbage cycle will have non-zero reference counts even when these nodes are not reachable. To handle cycles, a collector must treat each node with a non-zero reference count as a potential root in a garbage cycle. The algorithms of Bacon et a1. [BAR+O3] detect cycles by performing a depth-first search of each potential cycle root, subtracting the reference counts to see if they go to 0. If the reference count of every visited node goes to zero, then a garbage cycle has been discovered, and every such node can be released. By contrast, the existence of a node with a non-zero reference count refutes the existence of a garbage cycle involving that node; thus any reference counts that were decremented during cycle detection must now be repaired. We adapted these algorithms to the needs of the universe model. Figure 1] depicts our realm-contraction algorithms. The function unl ink uses two supporting functions—dstecRef and dstncRef—to remove a link to a universe and return the set of universes that need to be released as a result of removing this link. Function df sDecRef performs a depth-first traversal of all universes reachable from a given universe u. When a universe is visited, its reference count is decremented by one, and it is added to a buffer of visited universes, which is returned to the caller. Upon completion, all universes reachable from u will have been visited along every realm—accessible edge. Moreover, the reference count of each visited universe will have been decremented by the number of incoming edges that were traversed. Function df sIncRef is used to repair the reference counts of universes that are not 60 members of a garbage cycle. Following dstecRef, any universe u E nodes with a non-zero reference count is still reachable; thus any universe v reachable from u is not part of a garbage cycle. The set of all such v is discovered by a depth-first traversal, originating at u. A side-effect of this traversal is to increment the reference count of each such v, thus repairing it. Observe that the function will be initially invoked on a universe with a non- zero reference count, and the intention is to increment the reference count of each universe reachable from this source but not increment the count of the universe itself. 
We achieve this effect using a parameter inc, which defaults to 0 when the function is invoked without providing a value for this parameter, but which is always supplied with the value 1 when the function is invoked recursively.

    function dfsDecRef( u : Universe ) returns Set[Universe] is
      var visited : Set[Universe] := ∅;
    begin
      RC(u) := RC(u) - 1;
      if (not Marked(u)) then
        Marked(u) := true;
        insert u into visited;
        for (s ∈ raccessibles(u)) do
          visited := visited ∪ dfsDecRef(s);
        end
      end
      return visited;
    end

    function unlink( t : Universe ) returns Set[Universe] is
      var nodes : Set[Universe];
    begin
      nodes := dfsDecRef(t);
      for (s ∈ nodes) do
        if (RC(s) ≠ 0 and Marked(s)) then
          nodes := nodes - dfsIncRef(s);
        end
      end
      return nodes;
    end

    function dfsIncRef( u : Universe, inc : Natural := 0 ) returns Set[Universe] is
      var liveNodes : Set[Universe] := ∅;
    begin
      RC(u) := RC(u) + inc;
      insert u into liveNodes;
      if (Marked(u)) then
        Marked(u) := false;
        for (v ∈ raccessibles(u)) do
          liveNodes := liveNodes ∪ dfsIncRef(v, 1);
        end
      end
      return liveNodes;
    end

    proc releaseAll( nodes : Set[Universe], p : Process ) is
    begin
      for (s ∈ nodes) do
        release(s, p);
      end
    end

    proc contractRealm( garbage : Set[Universe], p : Process ) is
    begin
      for (g ∈ garbage) do
        releaseAll(unlink(g), p);
      end
    end

    Figure 11: Functions and procedures for realm contraction.

The unlink function works in three phases. First, dfsDecRef is invoked to decrement the reference count of every universe reachable from the universe referenced by parameter t along every reachable edge. These universes are returned in the set nodes. Many of these nodes will have reference counts of 0; however, some of these counts may be incorrect, as dfsDecRef may have decremented the reference count of a universe whose reference count is unaffected by the unlinking of t. The for loop discovers and repairs the count of such universes by first identifying those universes s ∈ nodes with non-zero reference counts and then incrementing the reference counts of all universes reachable from each such s (using dfsIncRef(s)). Each invocation of dfsIncRef unmarks and returns a set of universes, each of which has a reference count greater than 0, and these universes are removed from nodes. Notice that the loop iterates over a copy of the set nodes, and thus the iteration is unaffected by the removal of unmarked universes from nodes in the body of the loop. When the loop terminates, every universe that remains in the set nodes will
This same optimization can be applied to all children recur- sively, provided that no child has a reference count greater than 1 prior to marking. In addition, function contrac tRea 1m optimizes its calls to unl ink by performing all of the df sDecRef operations over the universes in garbage before performing any of the dfslncRef operations. This improves the efficiency of contraction in the case when a cycle spans the set of universes reachable from two or more of the universes in garbage. 4.4 Realm completion A realm is completed by linking universes that, as a result of a realm-affecting operation, are now accessible. Linking a universe u may require linking universes that u depends upon 63 proc completeRealm( w : Universe, proc link( u : Universe, newlinks : Set[Universe], p : Process, p : Process ) is re : Natural := l) is var u : Universe; begin begin LOCK(u); witness(p) := w; if (owner(u) = p) then pending(p) := 0; RC(u) := RC(u) + rc; for (s E newlinks) do else link(s, p); acquire(u, p, re); end end; while (-realmComplete(p)) do UNLOCK(u); u := nextAcquired(p); end bstink(u,p); end end function bstink( r : Universe, p : Process ) is begin for (c e accessibles(r)) do link(c, p); end end Figure 12: Functions and procedures for realm completion. to satisfy its concurrency constraint. Thus, our algorithms (Figure 12) operate by travers- ing (breadth first) edges that are accessible from universes in the realm—specifically from a witness universe w or from a universe that has appeared on the universe stack since the previous realm update. This initial set of universes to traverse is provided by the param- eter newl inks, supplied by the call from the procedure updateRealm. As with any recursive graph traversal, our algorithm employs a marking strategy, where a universe is considered to be marked if it is part of the realm of the process whose realm is being com- pleted. Function 1 ink attempts to link a universe u using rc as the initial reference count for 64 u.1 A marked universe is linked by incrementing its reference count; whereas an unmarked universe is linked by acquiring it. Note that l ink needs to be atomic to avoid interference by other processes and thus locks the target universe for the duration of its execution. Because a process may not be able to immediately acquire a universe, acquisition is a two-step protocol of negotiation. Figure 13 depicts the different steps of negotiation as states and transitions in an hierarchical labeled-transition system (LTS).2 Process p begins negotiation for a universe u by first requesting to acquire u, as indicated by the transition acquire. If u is being used by another process, then p’s negotiation for u stalls in the Requesting state, where it cannot proceed until the other process finishes using it. Later, when the process that owns u releases it (by calling procedure release), that process will check to see if any other processes have requested acquisition for u and then explicitly notify one of these requesters. In the sequel, we describe how procedure acqui re handles this case by registering a request for u and then returning, so that p may try to acquire other universes while waiting for u. In Figure 13, this notification is depicted by the ACQUIRED transition, after which we say that p owns it. Note that for simplicity in our algorithms, we follow this two-step negotiation even when a requested universe is not in use by another process. 
Such acquisitions will also complete with a notification to the requesting process, but in this case the notifying process is the same as the requesting process. For every call to acquire, the caller (i.e., completeRealm) is obliged to wait for the notification. This is accomplished by invoking the nextAcquired function, which 1When invoked from bf sLink, the value of re will always be 1. In the sequel, we Show that link is also used after a process is forced to surrender a universe, which may occur when the universe is the target of multiple links in the realm. 2using the UML state-diagram notation [BJR99]. 65 fi - - Negotiating [Requesting ACQUIRED ={ Owned ) x J acquire finalize {Unused} Figure 13: Protocol of negotiation by a process for a universe. waits for a notification and returns the acquired universe. Once acquired, a universe is considered marked, and the search proceeds by traversing its outgoing accessible edges. This is accomplished by recursively invoking bf sLink on the newly acquired universe. After every such edge is traversed, the search terminates, and the realm has been extended by R+. We coordinate this traversal by recording, for each process, the number of pending acquisition requests (i.e., pending (p) ), and testing for termination using the function r ea lmComp l e t e, which consults this pending attribute. 4.5 Preventing Starvation and Deadlock As processes negotiate for universes, care must be taken to prevent starvation and deadlock. To this end, we adapt and integrate existing deadlock avoidance strategies and deadlock re- covery heuristics to mediate negotiations among processes for universes. Ultimately, all of these algorithms are implemented in terms of the four primitive functions—acquire, nextAcquired, realmComp lete, and rel ease—used by our realm-update proce- 66 dure. Because these algorithms collaborate by manipulating data in shared memory, there are many special cases to consider, and thus the algorithms are complex. A key concept in our understanding and development of these algorithms is the negoti- ation abstraction, which supports two dimensions of refinement that help to clearly specify and validate our algorithms. The first dimension is a form of feature refinement [FKOl], whereby a state-based specification is extended with new behavior by adding new states and transitions but never removing any of the existing states or transitions. Using feature refinement, we were able to incrementally extend the basic negotiation protocol in Fig- ure 13 to incorporate: 1. an advanced deadlock-avoidance strategy (Section 5.1); 2. an advanced deadlock-recovery heuristic (Section 6.1); and finally 3. support for negotiating conditional contracts (Chapter 7). Each of these three refinements involves expanding a state in the less refined specification to include substates and adding a small number of transitions into and among these substates. In addition to supporting feature refinement, it is also straightforward to apply more traditional data refinements that demonstrate how the abstract states and transitions in a ne- gotiation protocol are implemented as concrete data structures and algorithms. In our case, the data structures attribute the Proces s and Universe objects to which the negotiation refers, and the algorithms correspond to case logic that fleshes out the four functions we need to implement to complete the suite of realm-update procedures. 
Observe, in discussions, that we will often refer to an individual negotiation as if it were an object, as opposed to an abstract state of collaboration among multiple objects. We adopt the notation (p, u) to refer to the negotiation by process p for universe u. This simplifies presentation by enabling us to talk about a particular negotiation being "in" a given state or witnessing a transition, but bear in mind that negotiation is only an abstraction and that there is no single run-time data structure that localizes this state.

In presenting the refinements, we often need to say that a negotiation is "in" a given state. We denote the current state of a negotiation by adjoining the process and universe identifiers to the name of the abstract state. For example, Owned(p,u) says that negotiation (p, u) is in state Owned. Most of our data refinements relate predicates over concrete data structures to the abstract states of a negotiation. For example, the notation:

Owned(p,u) ≜ owner(u) = p ∧ ¬migrating(u)

specifies that negotiation (p, u) is in the state Owned if and only if the owner attribute of u equals p and the migrating attribute of u is false.

We structure the presentation of our algorithms around these two dimensions of refinement. Specifically, each of Chapters 5, 6, and 7 concerns a feature refinement of the negotiation protocol and algorithms of the section that precedes it. Moreover, each section is organized so as to first present the specification of the negotiation protocol at that level of feature refinement followed by a data refinement that shows how to implement that specification. Before continuing, we introduce one final concept that will be used in all of the subsequent refinements.

All deadlock-prevention strategies work by coercing a process to relinquish resources that it has previously acquired. In the universe model, contractual obligations limit the universes that a process can relinquish. We define the core of a process p, written Core(p), to be the universes in Realm(p) that cannot migrate out without violating a contract that is in force prior to (and is unaffected by) the operation that caused the realm to be updated. When a process is executing user code, Core(p) = R_p; whereas when a process is updating its realm, Core(p) = R_p \ R⁻, i.e., the set of universes that remain in the realm of the process after realm contraction. Also, note that our negotiation protocol represents the membership of a universe in the core of a process. Specifically, when a process p completes its realm (i.e., when the function realmComplete returns true), all negotiations for universes in R_p will finalize, which means they all simultaneously transition into the final state Core.

Chapter 5

Feature: Deadlock Avoidance

Our first refinement extends the negotiation protocol to allow multiple concurrent processes to acquire multiple (non-core) universes while avoiding starvation and deadlock. The procedure borrows ideas from a resource-allocation strategy called wound-wait, under which processes can preemptively steal resources allocated to other processes [RSL78]. The idea is that when two or more processes attempt to acquire the same universe, priority is given to the oldest process. We say that a process p is older than another process q if p began its most recent realm-update operation earlier than q. Suppose, without loss of generality, that p is older than q. If q attempts to acquire a universe currently held by p, then q waits for p.
Conversely, if p attempts to acquire a universe currently held by q, then p wounds q by signaling q to surrender the universe. Following this scheme, and provided that processes compete only for non-core universes, the oldest process in the system will eventually be able to acquire all the universes it needs and thus complete its realm. Successively, the same then applies to the second-oldest process, the third-oldest process, and so on.

5.1 Specification

Based on this strategy, we refined the negotiation protocol to allow processes to communicate by exchanging messages with one another. Under this protocol, inter-process communication is asynchronous, and there are two kinds of messages—ACQUIRED and SURRENDER—both of which are parameterized by a reference to the universe under contention. Intuitively, a process p sends a message of the form ACQUIRED(u) to a process q to signal that q is the new owner of u. Observe that p and q may be the same process, which might happen if, when attempting to acquire a universe, the process discovers that there are no contending processes. In contrast to signaling acquisition, a process p sends a message of the form SURRENDER(u) to a process q ≠ p to signal q to relinquish ownership of u.

Figure 14 refines the negotiation protocol to incorporate the wound-wait strategy. Requesting is now a hierarchical state with two sub-states. Upon entering Requesting(p,u), if no other process can legally claim ownership of u, we say u is migrating to p; otherwise we say that p is waiting for u. Entry into Migrating(p,u) coincides with a process sending an ACQUIRED(u) message to p. Thus, migrating implies that there are no obstacles to p acquiring u other than for p to actively receive and handle the message, and negotiation transitions to Owned once the message is handled. By contrast, Waiting(p,u) implies that there are obstacles to p acquiring u, e.g., if another process currently owns u.¹ Observe that when Migrating(p, u) is entered from Unused or Owned, p is both the sender and the receiver of the ACQUIRED(u) message. However, if the state is entered via the enable transition, then the message will come from a process other than p.

¹There is another kind of obstacle, namely that p is negotiating a conditional contract for u and the condition cannot be verified. At this level of feature refinement, we are only concerned with the negotiation of exclusion contracts. We deal with the unverified condition obstacle in Chapter 7.

[Figure 14: Refinement of the protocol in Figure 13 to avoid deadlocks using the wound-wait strategy. The Requesting state is split into the substates Waiting and Migrating, with transitions labeled acquire, enable, SURRENDER, ACQUIRED, and finalize.]

The refinement includes two transitions to implement the wounding of one process by a higher-priority process. Suppose a process p attempts to acquire a universe u that is owned by another process q, such that u ∉ Core(q). If p is older than q, then p will send a SURRENDER(u) message to q coincident with entry into Waiting(p,u).² The SURRENDER(q,u) transition is taken when q receives and acts on this message. In addition to causing q to release ownership of u, this transition can engender orphan(q, v) transitions in other negotiations (q, v) where u ≠ v. This can happen, for example, if v is accessible from u and the release of u causes v to be released. We refer to this process as orphaning a universe and note that a universe could be orphaned at any time during negotiation.

²This phenomenon is not depicted in the diagram.
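The following Python fragment is a minimal sketch of this decision rule, assuming that each process records the start time of its most recent realm-update operation and the set of universes in its core; locking, request queues, and message passing are omitted, so it is an illustration rather than the implementation.

def on_contention(requester, owner, universe):
    # A smaller start time means an older, higher-priority process.
    if requester.start_time < owner.start_time and universe not in owner.core:
        # The requester is older and the universe is not in the owner's core:
        # wound the owner by asking it to surrender the universe.
        return "SURRENDER"
    # Otherwise the requester must wait until the owner releases the universe.
    return "WAIT"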
Finally, the negotiation protocol assumes some inter-negotiation synchronizations that are not depicted in the diagram. For example, the transition SURRENDER(q, u) is coincident with the transition enable(p, u) when q releases universe u and p, which was waiting for u, is sent the ACQUIRED(u) message. Note that sending the ACQUIRED message is not depicted in the diagram. A similar synchronization occurs when p is waiting for u ∈ Core(q), and q executes a realm-affecting operation that causes it to release u (during realm contraction). In the sequel, we will refine the responsibility for such inter-process synchronizations into operations over universes (which are passive objects) rather than processes, which only communicate asynchronously. Finally, all of the negotiations for a given process must synchronize on the finalize transition. In the sequel, we refine the responsibility for this intra-process synchronization into an atomic operation on the process object, without reference to any of the universe objects.

5.2 Data Refinement

We now describe the primitive data structures and functions used to implement the protocol presented in Section 5.1. One category of primitives refines the abstract states of negotiation into concrete data structures that can be queried and manipulated by our algorithms (Section 5.2.1). Another category of primitives implements the message-passing infrastructure (Section 5.2.2); while a third describes some additional book-keeping attributes that are necessary for advanced deadlock avoidance. For brevity, all of the functions introduced in this section appear in the appendix.

5.2.1 State refinements

Our specification (Section 5.1) refers to several concepts that are used to define observable states in the protocol. These include process priority, the core of a realm, and the various negotiation sub-states in Figure 14. We refined these concepts into concrete data structures that attribute processes and universes, thus allowing efficient representation and observation of these abstract states.

Both process priority and membership of a universe in the core of its realm are represented using time stamps. Each process contains a start time, which is the time when the process began its most recent realm-update operation. The relative age of two processes can be established by comparing these times, e.g., if start_time(p) is less than start_time(q), then p is older than q. Likewise, each universe contains an acquisition time, which records the start time of its owner at the time the owner began to migrate the universe into its realm. To test if a universe is in the core of its owner's realm, one can simply check if the acquisition time of the universe is less than the start time of its owner. When this check returns true, we can infer that the universe was acquired by a previous realm-update operation, which means it must be in its owner's core.

The start time of a process is also used to order processes that are competing to acquire the same universe. We associate with each universe a priority queue with which to buffer requests for acquisition. Each queue element is an ordered pair that comprises a process identifier and a reference count. Elements are ordered by the start time of the process coordinate, such that the head of the queue is the oldest process (i.e., the process p with the lowest value of start_time(p)). We use the attribute rqu(u) to access the priority queue associated with universe u.
In addition to the standard enqueue and dequeue operations, these priority queues support an operation contains, which determines whether a given process is waiting inside the queue, and an operation updateQueueRC, which adjusts the reference count associated with a process that is waiting in the queue.

Finally, our algorithms must distinguish among the various states of negotiation, as depicted in Figure 14. We distinguish between states Migrating and Owned by associating a boolean attribute migrating with each universe u. This attribute is set when a universe begins to migrate into the realm of a process and reset once the process receives the ACQUIRED message and completes the migration. Negotiation is in the Waiting state if a process is in the request queue of the universe. More formally:³

Migrating(p,u) ≜ owner(u) = p ∧ migrating(u)
Owned(p,u) ≜ owner(u) = p ∧ ¬migrating(u)
Waiting(p,u) ≜ contains(rqu(u), p)

³The pair (owner(u), migrating(u)) must be tested and set atomically in order to avoid race conditions. This can be accomplished by encoding the process identifier of the owner and the migrating flag in a single machine word.

5.2.2 Inter-process messaging

We implement inter-process messaging using FIFO queues and two procedures—signal and getMessage. A process sends a message to a receiver process by invoking procedure signal, which is parameterized by the process, the message, and the receiver. Messages are stored in a message queue associated with the receiver. The message queue of a process p is accessed using the attribute msgQ(p). Function getMessage retrieves the next message in the associated process' queue, blocking if the queue is empty.⁴

⁴Pseudocode for both of these procedures appears in the appendix. In Section 6.3, we refine the getMessage function to incorporate deadlock recovery.

To monitor the progress of realm completion, we attribute each process with the number of pending requests for universes that the process has requested but does not yet own. The value of pending(p) is incremented each time p attempts to acquire an unused universe and decremented each time p successfully negotiates ownership of a new universe. Once pending(p) reaches zero and all messages have been processed (i.e., msgQ(p) is empty), then the realm is complete.

Before concluding, note that because our protocol is asynchronous, messages might become stale while they are waiting to be processed. A message ACQUIRED(u) (or SURRENDER(u)) grows stale when, in the interval between when the message is sent and when it gets processed, the receiver is forced to surrender u from its realm. Observe that a process might surrender a universe indirectly, as a side effect of being forced to surrender another universe. Suppose, for example, that a process receives a SURRENDER(v) message and that there is a realm-accessible edge v → u to another universe u. Then releasing v will decrement the reference count of u, and if this count goes to zero, u must be released. If u is released, then any pending messages of the form ACQUIRED(u) or SURRENDER(u) are stale. We handle stale messages by explicitly checking for staleness whenever we retrieve a message from the queue.

5.2.3 Additional primitives

Finally, the attribute blocked(p) contains the set of all universes on which p is blocking indefinitely. A process is blocking indefinitely on a universe if that universe is in the core of another process or has been requested by an older process. The process is blocking indefinitely in this case because it requires an action by the other process to release the universe. A universe is added to this set by calling recordBlocked(p) and removed from it by calling recordUnblocked(p). Note that while the value of blocked(p) is not necessary for the basic algorithm, it will be used in the advanced deadlock-avoidance scheme to determine potential deadlock.
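The sketch below illustrates, in Python, how this time-stamp bookkeeping supports the age and core tests and the per-universe request queue. The attribute and class names are hypothetical; the real implementation attributes the Process and Universe objects of the runtime system.

import heapq
import itertools

def is_older(p, q):
    # p began its most recent realm-update operation earlier than q.
    return p.start_time < q.start_time

def in_core(universe):
    # A universe acquired before its owner's current realm update must have
    # been acquired by a previous update, hence it lies in the owner's core.
    return universe.acquisition_time < universe.owner.start_time

class RequestQueue:
    """Priority queue of (process, reference count) pairs, oldest process first."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # tie-breaker for equal start times

    def enqueue(self, process, rc):
        heapq.heappush(self._heap,
                       (process.start_time, next(self._counter), process, rc))

    def dequeue(self):
        _, _, process, rc = heapq.heappop(self._heap)
        return process, rc

    def contains(self, process):
        return any(entry[2] is process for entry in self._heap)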
5.3 Algorithms

We defined the universe-acquisition protocol from the point of view of a given process to be the orthogonal composition of independent negotiations associated with that process. The completeRealm algorithm (Figure 12) simulates the composition of these independent negotiations by successively issuing requests to acquire universes and then looping over the acknowledgements of these acquisitions (possibly issuing new requests) until, by some measure, the realm is determined to be complete. We use this (event-driven) sequential control structure to simulate the concurrent execution of negotiations within a given process. Moreover, we assign responsibility for implementing the various negotiation transitions (Figure 14) to the procedures acquire, nextAcquired, and realmComplete, which are used as primitives in the completeRealm algorithm (Section 4.4).

We present these algorithms by first defining acquire and nextAcquired (Section 5.3.1). In terms of the negotiation protocol, acquire implements the acquire transition, and nextAcquired implements ACQUIRED and SURRENDER for negotiation (p, u). By contrast, function completeRealm implements the simultaneous transition finalize(p,u) for all u for which there exists a negotiation (p, u), using function realmComplete to decide when it is appropriate to take the transition. The implementation of realmComplete is trivial and appears in the appendix.

To implement SURRENDER in nextAcquired, we use an auxiliary function called surrender (Section 5.3.2). This procedure implements the orphan transition. Procedure surrender also uses the release procedure, which we first referenced in contractRealm. The algorithm for release (Section 5.3.3) implements the enable transition, thus completing the algorithmic refinement of the negotiation protocol.

5.3.1 Universe Acquisition

Universe acquisition is a multi-step process, initiated by invoking the procedure acquire and completed by invoking the function nextAcquired, both of which are depicted in Figure 15. Procedure acquire is invoked with three parameters—a reference u to the universe to be acquired, a reference p to the process wishing to acquire u, and a reference count rc with which to set u once it becomes part of the realm of p. Negotiation then begins, entering the Requesting state, in which there are two cases to consider. If u has no owner, then it is available for migration, in which case migration is initiated by calling enterMigrating, which sets the owner, RC, acquisition_time, and migrating attributes, and then sends an ACQUIRED(u) message to p.
proc acquire( u : Universe,
              p : Process,
              rc : Natural ) is
begin
  if (owner(u) = ⊥) then
    pending(p) := pending(p) + 1;
    enterMigrating(u, p, rc);
  else
    enterWaiting(u, p, rc);
  end;
end

proc enterMigrating( u : Universe,
                     p : Process,
                     rc : Natural ) is
begin
  Marked(u) := false;
  migrating(u), owner(u) := true, p;
  RC(u) := rc;
  acquisition_time(u) := start_time(p);
  LOCK(p);
  signal(p, ACQUIRED, u);
  UNLOCK(p);
end

proc enterWaiting( u : Universe,
                   p : Process,
                   rc : Natural ) is
begin
  request(u, p, rc);
  if (¬ wound(u, p)) then
    recordBlocked(u, p);
  end
end

proc enterOwned( u : Universe, p : Process ) is
begin
  migrating(u) := false;
  pending(p) := pending(p) - 1;
  recordUnblocked(u, p);
end

function nextAcquired( p : Process ) returns Universe is
  var msg : MessageType;
      u : Universe;
begin
  loop
    (msg, u) := getMessage(p);
    if (msg = ACQUIRED) then
      if (inMigrating(u, p)) then
        enterOwned(u, p);
        return u;
      end
    else /* msg = SURRENDER */
      if (inOwned(u, p)) then
        surrender(u, p);
      end
    end
  end
end

function wound( u : Universe, p : Process ) returns boolean is
  var wounded : boolean := false;
begin
  if ( start_time(p) < acquisition_time(u) ∧ p = head(rqu(u)) ) then
    LOCK(owner(u));
    if (u ∉ Core(owner(u))) then
      signal(owner(u), SURRENDER, u);
      wounded := true;
    end
    UNLOCK(owner(u));
  end
  return wounded;
end

Figure 15: Functions and procedures for acquiring a universe while avoiding deadlock.

Alternatively, if u has an owner, then negotiation enters the Waiting state, which is accomplished by invoking the procedure enterWaiting. Within enterWaiting there are two cases to consider: Either p has priority over the current owner of u, in which case p can preemptively steal u from its owner, or p must block, waiting for the current owner (and possibly other competing processes of higher priority) to release u. In either case, p must register in the priority queue of u, which is accomplished by invoking the procedure request, whose implementation appears in the appendix. After registering p with u, p then attempts to wound the owner of u. If p lacks the priority to wound the owner of u, then p must wait for the owner to release u. In this case, we say that p is blocked on u, and we record this fact by invoking the function recordBlocked. This information will be used in the version of these algorithms that incorporates deadlock recovery (Chapter 6).

Function wound first checks to see if the process p can legally preempt u from its owner. For the preemption to be legal, (1) p must have higher priority than the owner of u; (2) p must be at the head of the request queue of u; and (3) u must not be in the core of its owner's realm. The pseudo-code incorporates some optimizations. If p has higher priority than the owner of u, then the start time of p must be less than the start time of the owner of u. This condition can be efficiently checked by comparing the start time of p against the acquisition time of u, because the latter value equals the start time of owner(u), unless u is in the core of its owner's realm. Observe that owner(u) must be locked so that the test for u being in the core and the subsequent signal to surrender are atomic. If conditions (1)–(3) are satisfied, then p signals the owner to surrender u. If any of these checks fails, function wound returns false.

Function nextAcquired is responsible for handling messages sent to a process. Each call to this function retrieves and returns the next ACQUIRED message in the calling process' message queue.
To find an ACQUIRED message, nextAcquired might encounter zero or more SURRENDER messages, each of which is handled without returning from this function. If the queue becomes empty prior to finding an ACQUIRED message, then nextAcquired will block.

ACQUIRED messages are meant to trigger a state transition from Migrating to Owned. To handle such messages, nextAcquired first checks to see if the current state of negotiation is Migrating (by invoking function inMigrating). If so, negotiation enters the Owned state (by invoking enterOwned), and u is returned to the caller of nextAcquired. Observe that the check is necessary because, as described in Section 5.2.2, the message could have become stale while it was waiting to be serviced. By the same reasoning, nextAcquired must also check that SURRENDER(u) messages are not stale. Commensurate with the protocol, such a message is only meaningful when negotiation is in the Owned state, which we check using the function inOwned (defined in the appendix). Finally, procedure enterOwned completes the migration process by setting migrating(u) to false and recording that p no longer blocks on u, if that was the case before. In addition, the count of pending acquisition requests is decremented.

proc surrender( u : Universe, p : Process ) is
  var rc : Natural;
      unlinked : Set[Universe];
begin
  if (¬ migrating(u)) then
    for (s ∈ raccessibles(u)) do
      unlinked := unlink(s);
      releaseFringe(unlinked, p);
      releaseAll(unlinked, p);
    end
    releaseAll(accessibles(u) - raccessibles(u), p);
  end
  rc := RC(u);
  release(u, p, rc);
  link(u, p, rc);
end

proc releaseFringe( nodes : Set[Universe], p : Process ) is
begin
  for (s ∈ nodes) do
    releaseAll(accessibles(s) - raccessibles(s), p);
  end;
end

Figure 16: Procedures for surrendering a universe to another process.

5.3.2 Universe Surrender

Function nextAcquired delegates responsibility for handling the transition from Owned to Requesting to a procedure called surrender, which causes a process p to release and then immediately re-request a universe u. Once u is released, a higher-priority process will be able to acquire it, and p will then wait for u to become available again. Of course, if p owns or is currently requesting other universes that are accessible only from u, then these other universes must also be released. Thus, surrendering a universe is similar to removing all incoming accessible links to the universe; however, there are two important differences. First, the reference count of u prior to the surrender must be stored somewhere so that RC(u) can be properly reestablished once p re-acquires u. Second, and most important, the algorithms developed in Chapter 4 for recursively releasing inaccessible universes only traverse links between universes that are realm accessible (i.e., both the source and the target of the link is in the realm). However, when p is instructed to surrender u, p might be negotiating for universes that are accessible but not yet realm-accessible from u. Thus, the surrender algorithm must handle this case separately.

Figure 16 depicts the algorithm for procedure surrender, which releases and then immediately relinks a universe u. Notice that surrender may be invoked when (p, u) is in state Migrating or state Owned. While no action other than releasing and relinking u is necessary in state Migrating, in state Owned, p will have already requested universes accessible from u.
Moreover, the realm of p might comprise accessible edges that are not yet realm accessible, because the targets of these edges have not yet been acquired. We refer to the targets of such edges collectively as the fringe of the realm. Because unlink only traverses realm-accessible edges, the set of universes that it returns will not include fringe universes. Thus we must compute (and then release) these fringe universes separately. The procedure releaseFringe computes and releases fringe universes that are accessible from a set of universes that are known to be in the realm, after which all accessible edges originating at u will have been unlinked. Finally, u is released and then immediately re-linked.

Figure 17 contains a step-by-step example that illustrates the process of surrendering a universe. In this figure, black circles denote owned universes, gray circles denote requested universes (the fringe), and white circles denote unused universes. Each universe has an associated reference count listed next to it. Arrows denote accessible edges. Arrows become dashed when they are unlinked (in function dfsDecRef or release) and the reference count of the target universe is decremented. Dashed arrows become solid again when the link is reestablished (in function dfsIncRef) and the reference count of the target universe is incremented.

[Figure 17: Example: Surrendering a universe.]

The example begins with the process receiving a SURRENDER message for universe s. Because s has a realm-accessible edge s → b, function unlink is invoked on b, which then results in dfsDecRef being called on b, as depicted in Step 1. Observe that dfsDecRef does not traverse the edges d → h or f → g, since the target universes are not in raccessibles(d) or raccessibles(f), respectively. Thus the set returned by dfsDecRef(b) is {b, c, d, e, f}. Out of these universes, only c has a reference count > 0. Thus, in Step 2, dfsIncRef is called on c, which reestablishes the edges c → e and c → f. At this point, b and d are the only universes that are set to be released. In Step 3, however, the call to releaseFringe on the set {b, d} releases universe h. Step 4 then releases the universes b and d. Finally, in Step 5, s is released, followed by an immediate re-acquisition attempt via link, making it a requested universe.
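The fringe itself is simple to characterize in terms of the accessibles and raccessibles attributes. The following Python fragment is an illustrative sketch only (attribute names are hypothetical), not the traversal actually performed by releaseFringe.

def fringe(realm_universes):
    # Targets of accessible edges whose source is in the realm but which have
    # not themselves been acquired yet (they are only requested).
    result = set()
    for u in realm_universes:
        result |= u.accessibles - u.raccessibles
    return result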
5.3.3 Universe Release

Both contractRealm and surrender use the utility procedure release (Figure 18) which, when invoked on a universe u, decrements the reference count of u and migrates u out of the realm if appropriate. There are three cases to consider: Either (p, u) is in state Waiting, Migrating, or Owned. If waiting, then p is not the owner of u, and the only action to take is to update the entry in the request queue of u. This update is accomplished by procedure releaseFromQueue, which appears in the appendix. Alternately, if migrating or owned, then p is the owner of u. In either case, we decrement the reference count of u, and if it goes to zero, then u must be migrated out of the realm. This is accomplished by invoking procedure migrateNext. If (p, u) was in state Migrating on entry to release, then the number of pending requests in p must be decremented to reflect the release of u.

proc release( u : Universe,
              p : Process,
              rc : Natural := 1 ) is
begin
  LOCK(u);
  if (owner(u) ≠ p) then
    releaseFromQueue(u, p, rc);
  else
    RC(u) := RC(u) - rc;
    if (RC(u) ≤ 0) then
      if (migrating(u)) then
        pending(p) := pending(p) - 1;
      end
      migrateNext(u);
    end
  end;
  UNLOCK(u);
end

proc migrateNext( u : Universe ) is
  var newOwner : Process;
      newRC : Natural;
begin
  if (¬ empty(rqu(u))) then
    ( newOwner, newRC ) := dequeue(rqu(u));
    enterMigrating(u, newOwner, newRC);
  else
    owner(u) := ⊥;
  end
end

Figure 18: Procedures for releasing a universe from a realm.

Procedure migrateNext is responsible for actually migrating a universe out of the realm of its current owner and into the realm of a new owner if another process is waiting to acquire it. If there are waiting processes, the function retrieves a new owner and reference count from the request queue and initiates migration to this new owner by invoking enterMigrating. If the queue is empty, then the owner is set to ⊥ to denote that the universe is unused.

Chapter 6

Feature: Deadlock Recovery

We now refine the deadlock-avoiding protocol with a capability that addresses a larger class of deadlock situations. Suppose that processes p and q are contending for a universe v, that q is also contending for a universe u ∈ Core(p), and that q is older than p. Then p can neither release u nor preempt q to acquire v, and the resulting deadlock cannot be avoided using the wound-wait strategy. Our next refinement incorporates a heuristic that can automatically recover from deadlocks in which, for any processes p and q such that q is older than p:

R_q⁺ ∩ R_p⁺ ≠ ∅  ∧  R_q⁺ ∩ Core(p) ≠ ∅  ∧  R_p⁺ ∩ Core(q) = ∅

That is, when there is at least one universe that both p and q need to acquire, the universes that q seeks to acquire may overlap with p's core, but not vice versa. The heuristic borrows ideas from an approach that was first informally introduced in [BBD82] and later refined under the name cautious waiting [HZ92]. Under cautious waiting, a process that requests a resource will wait for the owner of the resource only if this owner is not also waiting. Should the owner be waiting, the requestor will restart its operation by first releasing and then attempting to re-acquire its resources. Once released, these resources can be acquired by other processes, thereby breaking the deadlock. We apply this heuristic to recover from deadlocks caused when a high-priority process (the requestor) attempts to acquire universes in the core of a lower-priority process (the owner).

6.1 Specification

To add cautious waiting to our protocol required precisely formulating a restart behavior and a restart condition that triggers this behavior. Because restarting a process is expensive, we designed the condition to forestall triggering the behavior as long as possible, relying on the avoidance capability to prevent the majority of deadlocks. This condition optimistically assumes a process is not waiting for another waiting process, barring substantial evidence to the contrary.
We formalized the condition into a direct-waiting predicate that can be refuted by any negotiation in which a process is engaged, and we formalized the behavior as the simultaneous transition of all such negotiations. Section 6.1.1 refines the individual negotiation protocol with new states and transitions that simplify the specification of the predicate and the effect of the behavior. The direct-waiting predicate is an approximation of a more intuitive condition (Section 6.1.2), which is difficult to compute efficiently. We show how the direct-waiting predicate approximates this more intuitive condition to yield an effective and efficient test (Section 6.1.3).

6.1.1 Negotiation Refinement

Figure 19 refines the negotiation protocol to support checking the restart condition and effecting the restart behavior. Waiting is now a hierarchical state with two sub-states. Upon entering Waiting(p, u), if no process with higher priority is negotiating for u and u ∉ Core(owner(u)), we say that p is preempting the owner of u; otherwise, we say p is blocking on u. A preempting negotiation can reasonably expect to transition into Migrating in the near term; whereas a blocking negotiation could block for some indeterminate amount of time. In the sequel, the existence of a preempting negotiation is sufficient (but not necessary) to refute the restart condition.

Restarting a process p consists of p: (1) releasing any universe u such that Negotiating(p, u), (2) artificially lowering its priority by updating its start time, and then (3) attempting to reacquire all of its released universes. This operation can break a deadlock by making all of p's universes available to competing processes while preventing p from immediately re-acquiring them by preemption. We refer to p's release of and low-priority re-request for a universe u as reordering the negotiation (p, u). We specify the reordering of a negotiation using a new transition called reorder, which transitions into Negotiating. While it might seem that reorder(p, u) should transition into Waiting, there are cases in which, after reordering, (p, u) could be either migrating or preempting. If u is not under contention by other processes, then p will reacquire it despite its lowered priority. Alternately, because restart is not an atomic operation, other processes could complete their realms and then release a universe that was under contention prior to the completion of the restart operation.

[Figure 19: Refinement of the protocol in Figure 14 with cautious waiting for deadlock recovery. The Waiting state is partitioned into substates Preempting and Blocking, and a reorder transition re-enters Negotiating.]

6.1.2 Restart Condition: Conceptual Definition

We define the restart condition in terms of a process's belief in its ability to continue making progress in negotiation. To make the condition precise requires some new terminology. Let p and q be distinct processes, and let W be a set of universes. We say p directly waits for W relative to q if:

1. W ≠ ∅;
2. p has requested all of the universes in W but has no reason to believe it can continue negotiating for them until they are released by q; and
3. there is no smaller set W′ ⊂ W that satisfies conditions 1 and 2.

If such a W exists, we refer to it as p's wait-for set relative to q. For example, suppose p requests a set U of universes.
Then p's wait-for set relative to q will contain U ∩ Core(q) and any universes in U for which q is a contender with higher priority. As this example illustrates, the wait-for sets of a given process need not be disjoint.

Likewise, we say p indirectly waits for W relative to q if:

1. p directly waits for W relative to q; and
2. there exists a set X and a process r such that q directly waits for X relative to r.

Observe that p and r might refer to the same process, which demonstrates that indirect waiting is necessary (but not sufficient) for even the simplest form of (unavoidable) deadlock. Our reformulation of the cautious-waiting heuristic can therefore be stated as follows:

Indirect waiting heuristic. Restart any process p whose set of outstanding requests is the union of wait-for sets relative to processes for which p is indirectly waiting.

In the sequel, we refer to the condition that triggers the restart as the indirect-waiting condition.

Checking the indirect-waiting condition is complicated by two factors. First, it could require the run-time system to check if the owner of each requested universe is directly waiting on another process, a potentially expensive operation if the number of outstanding universes is large. Second, even if the inspection of each universe is inexpensive, the inspection of the whole set is not an atomic operation. To make the check atomic would add to the complexity and expense of the overall algorithm. To avoid such potentially complex and expensive operations, we instead implement a safe approximation of the indirect-waiting condition in terms of sustained direct waiting over a time interval.

6.1.3 Restart Condition: Effective Definition

We approximate the check that a process p indirectly waits for some indeterminate process by checking whether there exists a process q such that p directly waits for q and has continued to do so for some (heuristically determined) time T ∈ ℝ⁺. This approximation works because direct waiting that is not caused by a deadlock will be resolved in finite time; whereas direct waiting due to deadlock will persist until the deadlock is broken. Thus, our approximation of the indirect-waiting heuristic can be stated as follows:

Direct waiting heuristic. Restart any process p whose set of outstanding requests is the union of wait-for sets relative to processes for which p has been directly waiting for some time interval T ∈ ℝ⁺.

In the sequel, we refer to the condition that must persist for time T as the direct-waiting condition and denote it mathematically via the function:

directWaiting : Process → Bool

The direct-waiting condition is a safe approximation of the indirect-waiting condition in the sense that the former will detect any indirect waiting that persists for at least T time units. Indirect waiting that does not persist for at least T time units could not have contributed to an unavoidable deadlock. On the other hand, the approximation can yield false positives, which incur unnecessary restart operations. While not desirable, false positives are safe because a restart cannot introduce any new direct or indirect waiting conditions, and thus no new deadlocks.

To precisely formulate the direct-waiting condition, we analyze the state space of negotiation to determine the different ways in which a given negotiation can refute the condition. Then, using the pigeonhole principle, we define directWaiting(p) to be the absence of any of these refuting cases.
Let U be the set of universes with which a requesting process p is negotiating, and let P ⊆ U be the set of pending requests, i.e., the set of universes that p has requested but does not yet own. Certainly, if P is empty, then p cannot be directly waiting, i.e.:

P = ∅ ⇒ ¬directWaiting(p)    (6.1)

Suppose further that P is not empty, and consider a universe u ∈ P. Then the negotiation (p, u) can be in only one of three states—Preempting, Blocking, or Migrating. If preempting or migrating, p has reason to believe it can continue negotiating for u. More formally:

(∃ u : P • Migrating(p, u)) ⇒ ¬directWaiting(p)    (6.2)
(∃ u : P • Preempting(p, u)) ⇒ ¬directWaiting(p)    (6.3)

Thus, for directWaiting(p) to be true, it is necessary (but not sufficient) for the negotiation corresponding to every outstanding request to be in state Blocking.

A negotiation (p, u) in Blocking can only transition out of blocking via an orphan or an enable transition. Thus, for p to believe it can continue negotiating for u, p must believe that one of these transitions is imminent. There is really no way to predict when a blocking negotiation will be enabled, as this timing is a function of the computation being performed by the owner of u and also the number of contending processes with priority over p. By contrast, the orphan transition may occasionally be predictable. Recall that (p, u) can be orphaned if p is negotiating for another universe v, from which u is reachable, and (p, v) undergoes a SURRENDER transition. Always the optimist, we treat the occurrence of a SURRENDER transition as a potential generator of orphan transitions in other negotiations. Thus, if there were some way to detect that a SURRENDER transition was imminent, such would refute the direct-waiting condition. More formally:

(∃ u : U \ P • imminentSurrender(p, u)) ⇒ ¬directWaiting(p)    (6.4)

Because the direct-waiting condition is defined in terms of universes that p has requested but does not yet own, equations 6.1, 6.2, 6.3, and 6.4 cover all of the possible refutations of directWaiting(p). That is:

¬directWaiting(p) ⟺ P = ∅    (6.5)
    ∨ ∃ u : U \ P • imminentSurrender(p, u)    (6.6)
    ∨ ∃ v : P • Migrating(p, v)    (6.7)
    ∨ ∃ w : P • Preempting(p, w)    (6.8)

We call the negation of this formula the direct-waiting predicate, but in order to simplify the data refinement below, we delay this negation.

6.2 Data Refinement

We now apply several identities and logical manipulations to refine the direct-waiting predicate into an efficient predicate over the attributes of a process object. Observe first that a SURRENDER(p, u) transition is imminent if there is a SURRENDER(u) message waiting on the message queue of p. More formally:

imminentSurrender(p, u) ≜ SURRENDER(u) ∈ msgQ(p)    (6.9)

Moreover, a negotiation (p, u) transitions to Migrating coincident with the insertion of an ACQUIRED(u) message onto the message queue of p. By combining this knowledge with lines 6.6 and 6.7, and by duplicating the disjunct in line 6.7 and rewriting it alongside line 6.8, we get:

¬directWaiting(p) ⟺ P = ∅
    ∨ size(msgQ(p)) ≥ 1
    ∨ ∃ v : P • (Migrating(p, v) ∨ Preempting(p, v))

Then by negating both sides, we get:

directWaiting(p) ⟺ empty(msgQ(p))
    ∧ P ≠ ∅
    ∧ ∀ v : P • Blocking(p, v)

Finally, observe that the algorithms developed in the previous section support the blocking-preempting distinction, as we associate with each process p a set blocked(p) that contains the universes for which p could not immediately acquire or preempt. More formally:

Blocking(p,u) ⟺ u ∈ blocked(p)

By simple substitution, we derive that the direct-waiting condition is true if and only if

empty(msgQ(p)) ∧ pending(p) > 0 ∧ pending(p) = size(blocked(p))

Thus, the restart condition can now be checked by testing that the above condition persists for a sufficiently long time.
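A minimal Python sketch of this test, using hypothetical attribute names, is shown below; in the implementation these are attributes of the Process object, and the test is combined with a timeout as described in Section 6.3.

def direct_waiting(p):
    # True iff there is no message to process, there are outstanding requests,
    # and every outstanding request is blocked.
    return (len(p.msg_queue) == 0
            and p.pending > 0
            and p.pending == len(p.blocked))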
6.3 Algorithms

We integrate our modified cautious-waiting heuristic into the deadlock-avoiding algorithms by refining the getMessage primitive. Figure 20 depicts the refinement, called getMessageCooperatively, which replaces getMessage in the overall system. In contrast to getMessage, getMessageCooperatively checks the direct-waiting predicate concurrently with waiting for message reception, and invokes a restart operation if the predicate is satisfied. This concurrent check is handled by the alternate conditions of the wait until primitive. The first alternative is taken when there is a message in the message queue for process p, in which case the message is dequeued and returned in a fashion identical to the original getMessage function. The second alternative is taken only if the first alternative fails (per our definition of wait until). Consequently, we know that (1) the message queue is empty and (2) pending(p) > 0.¹ This case checks the remaining conjunct (i.e., that the size of the blocked set equals the number of pending requests) and that the timeout has expired. If these conditions are true, then process p must be restarted.

¹Because getMessageCooperatively is called from nextAcquired only if (1) is false or (2) is true.

function getMessageCooperatively( p : Process ) returns MessageType × Universe is
begin
  loop
    wait until (¬ empty(msgQ(p))) then
      LOCK(p);
      msg, u := dequeueMsg(msgQ(p));
      UNLOCK(p);
      return ( msg, u );
    or until (size(blocked(p)) = pending(p) ∧ timeout(p)) then
      start_time(p) := current_time();
      dfsRestart(witness(p), p);
      increaseTimeout(p);
    end
  end
end

proc dfsRestart( u : Universe, p : Process ) is
begin
  for (s ∈ accessibles(u)) do
    if (¬ marked(p, s) ∧ s ∉ Core(p)) then
      if ( reorder(p, s) ) then
        dfsRestart(s, p);
      end;
    end;
  end;
end

function reorder( p : Process, u : Universe ) returns boolean is
  var pOwnsU := false;
begin
  LOCK(u);
  mark(p, u);
  if (owner(u) = p) then
    acquisition_time(u) := start_time(p);
    pOwnsU := empty(rqu(u));
    UNLOCK(u);
    if ( ¬ pOwnsU ) then
      surrender(u, p);
    end
  else
    reorderQueue(rqu(u));
    UNLOCK(u);
  end;
  return pOwnsU;
end

Figure 20: Refinement of function getMessage to support deadlock recovery.

The restart procedure must take care to avoid starvation. Consider, for example, that process p directly waits for process q, which directly waits for another process r due to conflicting requests on a single universe u. Such a situation will trigger a restart, causing p to surrender and then attempt to re-acquire u. Moreover, after surrendering u, p will update its start time, thereby lowering its priority. Now assume that immediately after p surrenders u, but before p can update its start time, the (OS-level) scheduler gives both q and r enough time to successively acquire, release, and re-request u. Then when p is scheduled and tries to re-acquire u, it will once again find itself indirectly waiting on u relative to q, and this will trigger another restart operation. Theoretically, this can continue indefinitely, leading to starvation. To avoid starvation, we manipulate the timeout interval T.
Observe that if T is large enough, starvation cannot occur, because p will be able to acquire the universes released by r and q before an indirect-waiting condition is asserted by the expiry of the timeout. Of course, if T is too large, then the time it takes to recover from a real deadlock will be at least T. To reconcile these concerns, T is initially set to a small value and increased after each restart. Eventually, T will be large enough so that in our example p will not restart before r and q have completed. At this point p becomes the highest-priority process and can preemptively acquire the universes under contention and complete its realm. A detailed discussion of this starvation-prevention mechanism can be found in [FP97], along with advice on how to select values for T.

Thus, restarting a process comprises three steps. First, the start_time of p is updated. Second, the procedure dfsRestart is invoked upon the witness of the last realm-affecting operation. Its purpose is to perform a depth-first search of R_p, starting at witness(p), updating the attributes of each universe u to reflect the reordering of (p, u). Finally, the timeout of p is increased.

Function dfsRestart has two parameters: a universe u at which to start the traversal and the restarting process p. The for-loop and outermost if-condition implement a depth-first search with a marking strategy. Core universes are considered to be marked by default², and non-core universes are marked using the procedure mark, whose implementation we elide for brevity. When an unmarked universe s is traversed, we attempt to implement the reorder(p, s) transition, after which either p owns s or it does not. If p owns s after the reordering, we invoke the traversal recursively, as negotiations involving universes accessible from s will need to be reordered.

²Because restarting a process changes its start time, the test for a universe being in the core must refer to the initial start time assigned at the beginning of realm update. In our implementation, we provide a process attribute called initial_start_time(p), which is consulted by functions dfsRestart, surrender, and wound.

Function reorder(p, u) implements the transition reorder(p,u), returning true if p owns u after the transition. There are three cases to consider. First, suppose the owner of u is p. In this case, we have to update the acquisition time of u to reflect the reprioritization of p. If u is not under contention (i.e., the request queue of u is empty), then there is nothing else to do to reorder (p, u), and reorder returns true to indicate that p (still) owns u. Otherwise, u must be surrendered.³ If p does not own u, reordering (p, u) reduces to modifying the entry for p in the request queue of u to reflect the new (lower) priority of p.

³Observe that the surrender operation checks independently that the universe is not in the core. Therefore, the acquisition time is updated in this case as well, so that surrender will not mistakenly believe that the universe is in the core.
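The following Python fragment is an illustrative sketch of the three restart steps described above; the helper names are hypothetical stand-ins for the procedures of Figure 20, not the implementation itself.

def restart(p, *, current_time, dfs_restart, increase_timeout):
    # 1. Artificially lower p's priority by updating its start time.
    p.start_time = current_time()
    # 2. Reorder every non-core negotiation reachable from the witness of the
    #    last realm-affecting operation.
    dfs_restart(p.witness, p)
    # 3. Lengthen the timeout so that repeated restarts cannot starve p.
    increase_timeout(p)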
Chapter 7

Feature: Conditional Contract Negotiation

We conclude our treatment of contract negotiation by refining the algorithms to handle conditional contracts. To motivate this treatment, observe that exclusion contracts are subsumed by conditional contracts in that an exclusion contract of the form u is semantically equivalent to a conditional contract of the form u when true. By analogy, a conditional contract with a (non-trivial) condition behaves exactly like an exclusion contract when the condition is verified. However, if the condition is not verified, then any migration that would have occurred as a result of negotiating the corresponding exclusion contract must be deferred until the condition can be verified. Negotiations that are deferred have much in common with negotiations that are blocking; thus we generalize our model of the negotiation of exclusion contracts to handle conditional contracts by refining what it means for a negotiation to be blocking.

Suppose p owns a universe v whose constraint contains a conditional universe reference of the form u when c for some (possibly trivial) condition c. Then, the negotiation (p, u) is blocking if

1. p lacks the priority to link the edge v → u into its realm; or else
2. u and v fail to verify c.

To say that p has priority to link the edge means either no other process currently owns u, or u ∉ Core(owner(u)) and p is the oldest process contending for u. Observe that if c is the trivial condition, this definition reduces to the definition of a blocking negotiation for an exclusion contract, again as expected. A deferred negotiation is one that is blocking because of condition (2).

[Figure 21: Refinement of the protocol in Figure 19 to negotiate conditional contracts. This diagram depicts the most refined negotiation protocol, in which Blocking is partitioned into substates NonDfrd and Deferred, with defer transitions from Owned and Migrating into Deferred.]

Figure 21 depicts the final (most refined) model of the negotiation state space. In this refinement, state Blocking partitions into substates Deferred and NonDfrd.¹ Briefly, a negotiation is in state NonDfrd if the blocking is caused by condition (1) above; whereas the negotiation is in state Deferred if the blocking is caused by condition (2). Observe that a negotiation over an exclusion contract can only block in state NonDfrd; whereas a negotiation over a (non-trivial) conditional contract can block in either NonDfrd or Deferred. Observe also that in a deferred negotiation, p will have checked condition c and found it not verified by u and the client of the contract; whereas in a non-deferred but blocking negotiation, p will not have checked c and will thus have no knowledge of its truth assignment.

¹Short for non-deferred.

Figure 21 also depicts two new transitions, both named defer, from Owned and Migrating into Deferred. These transitions arise when p negotiates for u to satisfy multiple contracts, one or more of which are exclusion contracts and one or more of which are conditional. Consider the case where p, while negotiating an exclusion contract, has acquired (or is in the process of acquiring) u. If p then attempts to negotiate a non-trivial conditional contract whose referent universe is u, and if u and the client fail to verify the referent condition, then p must release u, and (p, u) must enter Deferred. That is, unsatisfied conditional contracts trump exclusion contracts that refer to the same universe, for otherwise p would hold onto u but could not modify it to make the condition true.

To implement deferred negotiations and the transitions into and out of state Deferred requires associating with the negotiation any referent conditions of contracts that pertain to the universe under negotiation. Because these conditions are specific to a given negotiation, and not merely to a given universe, we store the set of them as part of the tuple that is placed in rqu(u) (i.e., in addition to the process identifier and reference count). When a process releases a universe u and looks to the request queue of u to discover which negotiation to enable, the process must first check to see if the relevant conditions are true. These additional checks involve straightforward additions to procedures link, acquire, enterWaiting, and release. Finally, observe that unavoidable deadlocks may arise due to deferred negotiations. However, because deferred is a substate of Blocking, the algorithms presented in Chapter 6 will handle those deadlocks.
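The enabling check just described can be pictured with the following Python sketch, which uses hypothetical names and simplifies the selection policy to scanning the queue in priority order; the actual additions are made to the runtime's link, acquire, enterWaiting, and release procedures.

def next_enabled(request_queue, universe):
    # Each queue entry carries the referent conditions of the contracts it was
    # registered for (in addition to the process and reference count).
    for process, rc, conditions in request_queue:   # ordered oldest-first
        if all(condition(universe) for condition in conditions):
            return process, rc                       # this negotiation is enabled
        # Some condition is not verified: the negotiation remains Deferred.
    return None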
Chapter 8

Eiffel Language Extensions for the Universe Model

We extended the Eiffel programming language with language constructs and libraries that support the universe model and implemented a compiler for these extensions. This chapter documents the language features available to a programmer who wishes to use this compiler. The core part of the language extension is the reification of universes (Section 8.1) and processes (Section 8.2) by means of language constructs and library classes. These language constructs and the supporting library classes allow the definition and instantiation of universes, processes, and their realms. Section 8.3 describes our handling of global data. Local data exists on the stack of a process and can always be accessed in a thread-safe fashion; heap data is partitioned into universes, and access to it is regulated by the mechanisms of the universe model. Conversely, access to global data—which, by definition, is accessible to all processes—requires special precautions. Finally, the universe model requires special assignment semantics (Section 8.4). Under these semantics, it is sometimes necessary to implement the assignment of an object reference to a variable by copying the referent in order to prevent inadvertent sharing of data between two processes. The chapter concludes with a summary of the new language features (Section 8.5).

8.1 Defining and Creating Universes

The reification of universes involves declaring universes via language constructs and the creation of runtime instances of universes. We declare universes by writing universe classes. Instantiating a universe class creates both an object, called a universe root, and a new universe, whose sole member is the universe root. In this section, we first describe how universe classes can be defined (explicitly, via inheritance, or by means of a type modifier) and then define the semantics of the universe creation process.

8.1.1 Defining Universes

A universe class is defined by prefixing a class definition with the keyword universe. Thus, the definition

universe class U ... end

creates a universe class named U. A universe class definition can also declare one or more concurrency constraints, introduced by the keyword concurrency. For example, the following class definition defines a universe class U2 with a concurrency constraint <constraint>.

universe class U2
concurrency
    <constraint>
end

Concurrency constraints follow the abstract syntax defined in Chapter 3. Operators have the standard Eiffel precedence and associativity [Mey92]. The new when operator is not associative and has a higher precedence level than the and operator and lower precedence than the comparison operators in standard Eiffel.
The new => operator is synonymous with the existing implies operator and has the same precedence and associativity.

In addition to the concurrency constraint language defined in Chapter 3, our extension allows a convenient shorthand for arithmetic expressions. For example, consider the universe class definition in Figure 22, which provides a simple emulation of a low-level reentrant mutex mechanism. This class provides methods to lock and unlock the universe referenced by the variable target_universe. Calls to lock and unlock can be nested, and a depth variable ensures that the universe is released only if the same number of calls to unlock as to lock have occurred. The source code in the example is complicated, because the code must keep track of the nesting depth and update a boolean condition variable accordingly.

universe class REENTRANT_LOCK
feature
    target_universe: UNIVERSE_BASE
    locked: BOOLEAN
    depth: INTEGER
feature
    lock is
        do
            if depth = 0 then
                locked := true
            end
            depth := depth + 1
        end
    unlock is
        do
            depth := depth - 1
            if depth = 0 then
                locked := false
            end
        end
concurrency
    locked :> target_universe
end -- class REENTRANT_LOCK

Figure 22: A reentrant mutex

Figure 23 shows a simpler implementation. Rather than having an explicit condition variable, the expression depth > 0 is used, greatly simplifying the code.

universe class REENTRANT_LOCK
feature
    target_universe: UNIVERSE_BASE
    depth: INTEGER
feature
    lock is
        do
            depth := depth + 1
        end -- lock
    unlock is
        do
            depth := depth - 1
        end -- unlock
concurrency
    (depth > 0) :> target_universe
end -- class REENTRANT_LOCK

Figure 23: Improved reentrant mutex

Generally, any legal Eiffel expression that uses only variables and constants of type BOOLEAN, INTEGER, REAL, and DOUBLE, arithmetic operators (including addition, subtraction, multiplication, division, and modulo), and comparison operators can be used in lieu of a boolean condition variable. The compiler will treat each such expression as though there were a separately declared boolean condition variable that is updated each time the expression changes.

8.1.2 Universe Classes and Inheritance

Universe classes can be inherited by other universe classes. When one universe class (the child) inherits from another universe class (the parent), the child inherits the instance variables, methods, class invariants, and concurrency constraints defined in the parent. For example, a universe class can inherit from the class in Figure 22 and use the lock and unlock methods to acquire and release exclusive access to the target_universe without having to redefine the concurrency constraint. In the case of multiple inheritance, the inherited constraint is the logical conjunction of the constraints of all parents. Like any other class, a parent universe class can also be deferred, which means that the class declares one or more abstract operations. (Such classes are often called abstract classes in other languages.)

If a universe class does not explicitly inherit from another universe class, then it implicitly inherits from the special class UNIVERSE_BASE, which is provided by our standard library. This predefined class provides the instance variables and methods necessary to implement the realm update algorithm (Chapter 4). Any universe class is either a direct or indirect descendant of UNIVERSE_BASE.
8.1.3 Alternate Universe Declaration

It is sometimes inconvenient to define a new class to declare a universe. For example, if a universe class contains no concurrency constraint, then its definition differs from that of a normal class only by the presence of the keyword universe. As a convenience feature, our language extension provides a shorthand notation for such simple universe classes. Given an existing type T, the type declaration universe T declares a universe class with the same class signature (i.e., methods, instance variables, and class invariants) as T. For example, the declaration

   feature
      x: universe HASH_TABLE;

declares an instance variable x of a universe class that has the same class signature as HASH_TABLE and the concurrency constraint true. As with any other universe class without an explicit inheritance clause, this class also implicitly inherits from UNIVERSE_BASE.

8.1.4 Creating Universes

Universe classes are instantiated in the same fashion as any other Eiffel class, using either the create instruction or the alternate !! syntax. Given an instance variable

   feature
      x: universe T;

where T has a constructor make with no arguments, each of the following instructions creates a universe.

   create x.make
   !! x.make

After either instruction has completed, x will contain a reference to the universe root of the newly created universe.

Observe that for the constructor (i.e., make) to be called, the newly created universe must be part of the realm of the process that executes the create instruction. Otherwise, the process would be accessing an object outside its realm. The language implementation must therefore adjust the realm of the current process to include the newly created universe. This adjustment occurs after the universe and the universe root have been created, but before the constructor is called. Similarly, after the creation instruction has been completed, the universe must again be excluded from the realm. Otherwise, it might be impossible for other processes to acquire the newly created universe.

More precisely, we adjust the realm both upon entry into and upon exit from the constructor. Upon entry, the newly created universe is pushed on the universe stack (cf. Section 3.4) and the realm update algorithm is invoked to add the universe to the realm. The constructor is then executed. Upon return from the constructor, the universe is removed from the universe stack and the realm update algorithm is invoked again to remove the universe from the realm.
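The following minimal sketch combines the shorthand declaration with universe creation. The class REGISTRY is illustrative only, and we assume for simplicity that HASH_TABLE has a parameterless constructor make.

   class REGISTRY
   feature
      table: universe HASH_TABLE
      setup is
         do
            -- creates a new universe whose sole member (and root) is the table;
            -- the universe is added to the current realm for the duration of make
            -- and removed from the realm again once the creation has completed
            create table.make
         end
   end -- class REGISTRY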
8.2 Defining and Creating Processes

Like universes, processes are reified by declaring them via language constructs and then creating run-time instances. The declaration must provide the code that the process will execute. The runtime instantiation consists of both a representation of a process¹ in the underlying operating system and a realm within which the process will execute.

¹As before, we include light-weight processes and threads in the notion of operating system processes.

8.2.1 The Constituent Parts of a Process

To define the code that a process will execute, the programmer must create a class that inherits from the predefined class PROCESS_BASE (an excerpt of which is presented in Figure 24) and that defines a start method, which contains the code that the process will execute. For example, Figure 26 defines a class EVENT_MONITOR. A process executing the start method defined in EVENT_MONITOR will enter an infinite loop, constantly retrieving events from an event channel and handling them, but never terminating.

   deferred universe class PROCESS_BASE
   feature
      start is deferred end
   end -- class PROCESS_BASE

   Figure 24: The PROCESS_BASE class

Creating a running process from such a descendant of PROCESS_BASE comprises two steps: the creation of an initial universe to populate the realm of the process, and the creation of an operating system process that executes the code provided by the start method.

The first step is performed by creating an instance of the descendant of PROCESS_BASE. This instance will be the universe root of the initial universe with which the realm of the process will be populated. We call it the process root of the process.

Second, a process in the underlying operating system must be created that will then execute the start method of the process root. This step is performed by creating an instance of the standard library class PROCESS, which represents operating system processes. (An excerpt of the class can be seen in Figure 25.) This class has a constructor called with_root, whose sole argument is the process root of the new process. Invoking the constructor will create an operating system process that will populate its realm with the supplied process root and will then execute the root's start method.

The precise semantics of the with_root constructor are as follows. Given a variable process of type PROCESS and another instance process_root of a universe class that inherits from PROCESS_BASE, the instruction

   create process.with_root(process_root)

creates a new operating system process. This new process then performs the following operations. First, it pushes the universe referenced by process_root on the universe stack and invokes the realm update algorithm. The result of the realm update will be that the process root is inserted into the realm of the new process. Second, the new process calls process_root.start. This operation is safe, because process_root is now part of the realm of the new process. Third, once the start method terminates (either normally or by an uncaught exception), the process releases all universes in its realm and terminates itself.

Note that the original process (i.e., the one that executed the create instruction) will not block while the new process performs these steps, but will continue concurrently. The result of the create instruction in the original process will be that process now references an instance of PROCESS that provides a handle for the new process. Thereafter, process can be used to manipulate the attributes of the new operating system process (such as its priority).

Creating processes in two independent steps is necessary to gain maximum control over the details of process creation.
In particular, the two-step process allows the programmer to obtain a reference to both the process root of the process and the operating system representation of the process.

   class PROCESS
   creation
      with_root
   feature { NONE }
      with_root(process_root: PROCESS_BASE) is
         do
         end
   end -- class PROCESS

   Figure 25: The PROCESS class

Otherwise, if the creation of a process root (i.e., an instance of a descendant of PROCESS_BASE) implicitly also created a new operating system process, it would not be possible for the programmer to reliably identify or otherwise reference that process, because the only output of the constructor of the process root is the process root itself. If the flexibility of the two-step process is not needed, the programmer can design simpler custom procedures to create processes.

8.2.2 An Example of Process Creation

For a non-trivial example of process creation, consider Figure 28, which uses the EVENT_MONITOR class from Figure 26 and the EVENT_GENERATOR class from Figure 27. The EVENT_GENERATOR class generates events and inserts them into an event channel. At the other end of the event channel, an instance of EVENT_MONITOR retrieves the generated events from the channel and processes them.

    1  universe class EVENT_MONITOR
    2  inherit PROCESS_BASE
    3  creation make
    4  feature
    5     event_channel: EVENT_CHANNEL
    6     reading: BOOLEAN
    7     make(channel: EVENT_CHANNEL) is
    8        do event_channel := channel end
    9     start is
   10        local event: EVENT
   11        do
   12           from until false loop -- infinite loop
   13              reading := true
   14              event := event_channel.dequeue
   15              reading := false
   16              handle(event)
   17           end
   18        end
   19  concurrency
   20     reading => event_channel when event_channel.size > 0
   21  end -- class EVENT_MONITOR

   Figure 26: An event monitor

   universe class EVENT_GENERATOR
   inherit PROCESS_BASE
   feature
      event_channel: EVENT_CHANNEL
      writing: BOOLEAN
      make(channel: EVENT_CHANNEL) is
         do event_channel := channel end
      start is
         do
         end
   concurrency
      writing => event_channel
   end -- class EVENT_GENERATOR

   Figure 27: An event generator

    1  universe class MAIN
    2  inherit PROCESS_BASE
    3  feature
    4     start is
    5        local
    6           generator: EVENT_GENERATOR
    7           monitor: EVENT_MONITOR
    8           channel: EVENT_CHANNEL
    9           monitor_process, generator_process: PROCESS
   10        do
   11           -- create universes
   12           create channel.make
   13           create monitor.make(channel)
   14           create generator.make(channel)
   15           -- create processes
   16           create monitor_process.with_root(monitor)
   17           create generator_process.with_root(generator)
   18           -- suspend self
   19           suspend
   20        end
   21  end -- class MAIN

   Figure 28: Creating processes

Initially (lines 12-14), instances of the event channel, monitor, and generator are created. Each instance is a universe root, and channel is passed as an argument to the constructors of the monitor and generator universe classes, so that they can reference it. Once the universes have been created, the next step is to create and start the processes (lines 16-17). First, the monitor process is created by instantiating PROCESS with the constructor with_root. The argument of the constructor is the root of the monitor universe. The with_root constructor builds the initial realm for the monitor process so that it includes the monitor universe and then calls monitor.start. The same procedure is then repeated for the generator process and the generator universe. Finally, the main process suspends itself via the standard library method suspend (line 19) so as not to take CPU time away from the generator and monitor processes.
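As noted above, a programmer who does not need the flexibility of the two-step protocol can wrap it in a simpler routine. The following sketch is purely illustrative; the routine name spawn is not part of the library described in this chapter.

   spawn(root: PROCESS_BASE): PROCESS is
         -- create an operating system process whose initial realm is
         -- populated with `root', and return a handle for that process
      do
         create Result.with_root(root)
      end

A caller would still create the process root itself (e.g., create monitor.make(channel)) and could then simply write monitor_process := spawn(monitor).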
The duration of the start method determines process lifetime. Just as a process starts by beginning to execute this method, a process terminates when the start method of its process root returns, either normally or via an exception. In our example, the start method of the monitor process contains an infinite loop. Because the monitor process will never return from start, it does not terminate, but runs indefinitely.

Figure 28 also shows how to define the initial process of the system. A specially designated class MAIN² describes the code and the initial realm of the initial process of the system. The runtime system will execute this process implicitly at program start, and as soon as the initial process terminates, the entire program terminates as well. Therefore, we use the suspend method to keep the program active³. Alternatively, the MAIN class could subsume the functionality of either the event monitor or the event generator process, thereby eliminating the need for that process. This design would obviate the call to suspend.

²The name of this class can be changed by setting a compiler option.
³Without the call to suspend, the program would immediately terminate.

In summary, the reification of a process requires that the programmer define a descendant of PROCESS_BASE and instantiate both that descendant and the standard library class PROCESS. The descendant provides the code for the process to execute; the instance of the descendant populates the initial realm of the process; and the instance of PROCESS reifies the underlying operating system process that executes the code of the descendant.

8.3 Accessing Global Data

Like all programming languages, Eiffel allows access to global data (as opposed to the local data of methods and the heap data that makes up the instance variables of an object). Such access can be regulated in one of two ways. First, each item of global data (which may be a single global variable or the variables of an entire module) can be treated as a universe. A reference to such data can then be assigned to the universe variable of an object, and access to the data can be regulated by means of concurrency constraints. As an alternative, storage for global data can be replicated for each process. Then each process would have its own separate instance of the global data. For example, a global variable could be replicated for each process, and each process could access and alter its own version independently without affecting the others. The second approach makes each replicated version of the storage global only for a single process (and such data is therefore often called thread-local data), whereas in the first approach, data is globally shared by the entire program (at the expense of having to explicitly regulate access to it).

In Eiffel, global data is accessed by means of so-called once functions. The syntax of a once function differs from that of a regular function by the use of the keyword once instead of do to denote the start of the function body (cf. Figure 29, line 4). A once function in standard Eiffel is evaluated only once per program run: Even if it is called several times, the result is calculated only the first time, and stored in a global location. On any subsequent call, the result that was calculated the first time is retrieved from that location and returned. In this respect, once functions implement the Singleton design pattern [GHJV95].
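For readers unfamiliar with the construct, the following fragment of standard Eiffel shows the shape of a once function; the names global_log and LOG are hypothetical.

   global_log: LOG is
      once
         create Result.make   -- executed on the first call only
      end

Every later call simply returns the LOG instance that was created on the first call.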
Note that in a concurrent system, initializing and accessing the global location must occur in a thread-safe fashion. In the universe model, the semantics of a once function depend on whether the return type of the function is a universe class or not. If the return type is a universe class, then the once function is evaluated once per program and access to the resulting universe is regulated normally via concurrency constraints. Otherwise, the once function is evaluated once per process and the result is stored in thread-local storage of the process, where it can be accessed without interference by other processes. In the sequel, we illustrate both cases.

Figure 29 shows an alternate implementation of the event channel example that uses a once function whose return type is a universe class to represent the globally shared event channel. The class EVENT_GLOBALS declares a single once function (global_channel, lines 3-6). Its body creates an instance of EVENT_CHANNEL. Result is a special variable, which holds the return value of a function and is of the same type as the function itself. It can be used like any other variable: It can be the target and source of assignments and the target of a creation instruction. Therefore, the result of global_channel will be a globally unique instance of EVENT_CHANNEL, no matter how often the function is called.

Other classes can now use the function defined in EVENT_GLOBALS by inheriting from EVENT_GLOBALS. The property that a once function will be called only once per program remains true even when it is inherited: When two different classes inherit from EVENT_GLOBALS, both will obtain the same result when calling the inherited function global_channel. In our example, both EVENT_GENERATOR and EVENT_MONITOR can therefore reference a shared event channel by accessing the global_channel function. Lines 13-16 of Figure 29 demonstrate how a constructor has to be written to use that function. The class definition of EVENT_MONITOR is very similar and has been elided for brevity.

The start method of the program can now be written without reference to the channel universe (lines 22-36). The creation of the channel universe has been omitted entirely, since it will be automatically created the first time that global_channel is called. Lines 29-30 create the monitor and generator universes, but do not pass an argument to their constructors.

Figure 30 illustrates the case where the return type of a once function is not a universe class. Here, primes is part of a hash table implementation. However, by the semantics of standard Eiffel (that primes is evaluated only once per program), the result would be shared among all processes. This could violate the basic principle that only one process may access an object at any time. To correct the behavior for functions whose type is not a universe class, we require that a once function is evaluated once per program execution only if the type of the result of the function is a universe class. If the type of the result of the function is not a universe class, then the function is evaluated once per process. The result of the function is stored in thread-local storage for each process. Therefore, no two processes can access the result of the once function at the same time.
    1  universe class EVENT_GLOBALS is
    2  feature
    3     global_channel: EVENT_CHANNEL is
    4        once
    5           create Result.make
    6        end -- global_channel
    7  end -- class EVENT_GLOBALS
    8
    9  universe class EVENT_GENERATOR
   10  inherit
   11     PROCESS_BASE; EVENT_GLOBALS
   12  creation make
   13  feature
   14     event_channel: EVENT_CHANNEL
   15     make is
   16        do event_channel := global_channel end
   17  end -- class EVENT_GENERATOR
   18
   19  universe class MAIN
   20  inherit PROCESS_BASE
   21  feature
   22     start is
   23        local
   24           generator: EVENT_GENERATOR
   25           monitor: EVENT_MONITOR
   26           monitor_process, generator_process: PROCESS
   27        do
   28           -- create universes
   29           create monitor.make
   30           create generator.make
   31           -- create processes
   32           create monitor_process.with_root(monitor)
   33           create generator_process.with_root(generator)
   34           -- suspend self
   35           suspend
   36        end
   37  end -- class MAIN

   Figure 29: Accessing global data

   class HASH_TABLE
   feature
      primes: ARRAY[INTEGER] is
         once
            -- calculate primes in Result
         end
   end -- class HASH_TABLE

   Figure 30: Thread-local once functions

Once functions therefore provide convenient access to both global and thread-local data. Global data is defined as a once function that returns an instance of a universe class. In order to access global data defined through a once function, the result of that function has to be assigned to a universe variable, and a concurrency constraint must exist that makes the universe accessible. Conversely, thread-local data is defined as a once function that returns an instance of a normal class and can be accessed safely at any time without the need to manipulate concurrency constraints. [CW02] derived similar semantics for once functions for an implementation of Eiffel's SCOOP mechanism that were omitted from the original specification [Mey93, Mey97].

8.4 Assignments and Data Access

The condition imposed by the universe model that no process access objects outside its realm requires adjustments to the semantics of assignments and method invocations. In particular, assignments of references to variables and the passing of references as actual parameters to methods can result in more than one process referencing the same object. In order to avoid uncontrolled concurrent access to such objects, one has to either guard each access with a runtime check that ensures that accessed objects are in the realm of the current process, or replicate objects so that they cannot be referenced by more than one process. In the sequel, we show how these requirements affect the semantics of assignments and method invocations.

8.4.1 Motivation

As a motivating example, consider line 14 of Figure 26. This assignment retrieves a reference to an instance of EVENT from the event-channel universe and assigns it to the event variable. If the data referenced by event were part of the channel universe, then once reading had been set to false (in line 15), the handle method (which is called in line 16) would be unable to legally access the object referenced by event. To avoid this problem, we treat any assignment where the source might be in a different universe than the destination differently from normal assignments: A copy of the source object is created and assigned to the destination variable. Thus, the handle method can safely work on event (which has been copied into the event monitor universe) without crossing realm boundaries. However, creating such a copy is potentially an expensive operation.
Not only does the object itself have to be copied, but, if any of its instance variables reference other objects, those objects may have to be copied as well, proceeding recursively through all objects that are reachable directly or indirectly from the source of the assignment. Therefore, while simple data such as an integer or string is handled efficiently by the copying approach, it can be very expensive in general. It is therefore desirable for the programmer to be able to suppress such copying when it is not warranted.

8.4.2 Shared and Normal References

We alter the semantics of assignments in our language extension to accommodate these requirements. The normal semantics of assignment prescribe that assigning a reference to a variable (a reference assignment for short) will not create a copy of the referent. In our extension, these semantics of reference assignment are preserved only when it can be guaranteed at compile time that the destination and source of the assignment lie within the same universe. Observe that for a strictly sequential program (a single process existing in a single universe) the semantics of reference assignment will not be affected at all. Thus, the alteration of reference assignment semantics only affects newly written concurrent code (where the programmer is aware of these semantics), but will leave existing sequential code intact.

In addition to altering the semantics of normal references, we also provide shared references that allow reference assignments without creating a copy, even if the source and the destination of the assignment do not lie within the same universe. Given a type T, the declaration of a variable as being of type shared T will mark the variable as a shared reference. (We call such a variable a shared variable for short.) As with the universe keyword, a class can also be declared as shared. For example, the class definition

   shared class S ... end

declares a shared class S, which has the same class signature as a class definition without the shared prefix. A variable declaration such as

   feature
      x: S

then makes the declared variable x a shared variable.

A shared variable can hold references to objects in any universe, whereas normal variables can only hold references to objects in the same universe. Note that variables whose type is a universe class behave like shared variables in that they can hold a reference to any universe root, regardless of the universe of which the referent is a root. Therefore, assigning an object reference to a shared variable or to a variable whose type is a universe class will not create a copy of the referent, even if the referent is contained in a different universe. It is the responsibility of the programmer to make sure that concurrency constraints allow access to the objects referenced by shared variables.

To illustrate the use of shared, Figure 31 shows a variant of the event monitor code from Figure 26 that uses a shared variable. The main difference is the use of the keyword shared (line 11). Because event is now shared, the call of handle(event) (line 16) will access data contained in the event channel. Therefore, the realm-affecting assignment to reading now has to occur after the call of handle(event), because otherwise the event channel universe would be removed from the realm and handle could not access it.

Because a shared variable can point to any object in the system, each access to a shared variable must be preceded by a runtime check (which is generated by the compiler).
That check verifies that the referent is part of a universe that is in the realm of the current process. Should the check fail, a runtime exception is raised. The same check is necessary when accessing an instance of a universe class. A variable that is declared to be of a universe class can, like a shared variable, reference objects in any universe. Conversely, normal references only reference objects in the same universe and do not require a check.

    1  universe class EVENT_MONITOR
    2  inherit PROCESS_BASE
    3  creation make
    4  feature
    5     event_channel: EVENT_CHANNEL
    6     reading: BOOLEAN
    7     make(channel: EVENT_CHANNEL) is
    8        do event_channel := channel end
    9     start is
   10        local
   11           event: shared EVENT
   12        do
   13           from until false loop -- infinite loop
   14              reading := true
   15              event := event_channel.dequeue
   16              handle(event)
   17              reading := false
   18           end
   19        end
   20  concurrency
   21     reading => event_channel when event_channel.size > 0
   22  end -- class EVENT_MONITOR

   Figure 31: Shared Data

8.4.3 Locality of Objects and Variables

The semantics of a reference assignment in our language extension depend on the location of the source and the destination of the assignment. In order to make these semantics precise, we define what it means for an object or variable to be local to a universe. An object is local to a universe if and only if it is contained within that universe. An instance variable of an object is local to a universe if and only if the object is contained within that universe. A local variable of a method (including its formal parameters and the special Result variable) is local to a universe if the current object of that method (i.e., the value of Current) is contained within that universe. Note that any variable or object is local to exactly one universe, which we call the containing universe of the variable or object.

We also define whether a type is local or not. A type is local if and only if any variable of that type can only reference or contain objects whose containing universe is the same as the containing universe of the variable. Otherwise, we say the type is non-local. In Eiffel, both shared and universe classes are non-local types. All other classes are local types.

8.4.4 Semantics of Assignment

Whether the source of an assignment has to be copied before being assigned to a variable depends on the source and the destination of the assignment. Semantics vary depending on whether the source and destination are local to the same universe. To generate efficient code for assignments, we infer this information at compile time from the types of the source and destination of the assignment.

An assignment var := expr in Eiffel has only a limited number of variants, which we will enumerate in the sequel along with their semantics. The destination var of the assignment can only be an instance variable or a local variable of a method (local variables include the formal parameters of the method and the special Result variable that holds the result of a function). The source expr can be either a local variable, an instance variable, or a method call. Note that operators in Eiffel are just an alternative syntax for a method call. For instance, the expression 1 + 2 is rendered as an invocation of the method infix "+" on the target 1 with argument 2. Similarly, constants can be interpreted as parameterless once functions, and access to instance variables of an object other than Current can be implemented through parameterless methods that return the contents of the variable.
We consider first the case where both var and expr are local or instance variables. If the type of var is non-local, no copying is necessary, because a variable of a non-local type can reference an object in any universe. Likewise, if both var and expr are of a local type, then no copying is necessary, because expr can only reference objects in the same universe, and var can reference objects in that same universe. However, if expr is of a non-local type and var is of a local type, then the assignment must assign a copy of expr to var.

The semantics of assignment where expr is a method call of the form target.method(arg1, ..., argn) are more complex, because target may be either local to the current universe or not, which affects the containing universe of the result of the function. First, if target is declared to be of a local type, then the semantics of the assignment are the same as if expr were an instance variable of the current object; i.e., copying is necessary if and only if the type of the result of method is a non-local type and var is of a local type. This is because the Result variable of a function that is declared to be of a local type can only reference an object local to the same universe that var is local to. Second, if target is of a non-local type, then we must assume that it is possible that the result of method is not local to the containing universe of var. While the method itself may have been declared to be of a local type, the result will be local to the containing universe of target, not the containing universe of the object invoking the method. Therefore, the result of such a method call must be copied.

8.4.5 Semantics of Argument Passing

Assignment also implicitly occurs when a reference is passed as an argument to a method. Such argument passing will assign the reference to a formal parameter of the method. Without loss of generality, we consider only methods with a single argument to derive the semantics of such implicit assignments.

Consider a method call target.method(expr) where expr is assigned to a formal parameter par of method and target is declared to be of a local type. In this case, par is local to the containing universe of the current object and the semantics are identical to those of the assignment par := expr, where par is a local variable. However, if target is declared to be of a non-local type, then the containing universe of par is the same as that of target, and par may not be local to the containing universe of expr. Under these circumstances, if par is declared to be of a local type, then the assignment has to create a copy of expr, which is then assigned to par. Conversely, if par is declared to be of a non-local type, then it can hold a reference to an object in any universe, and no copying occurs.

8.4.6 Assignment of Values

Eiffel's type system supports not only reference types, but also so-called expanded types. A variable that is declared to be of an expanded type does not contain a reference to an object, but instead contains the object itself. An expanded type in Eiffel would generally correspond to a record type in Pascal or Modula-2 or a struct type in C/C++, whereas a reference type would correspond to a pointer to a record or struct. Examples of expanded types include the basic arithmetic types, such as INTEGER or REAL, but also user-defined classes that are prefixed by the keyword expanded. The following class definition defines a class COMPLEX that is an expanded type.
   expanded class COMPLEX
   feature
      real: DOUBLE
      imaginary: DOUBLE
   end

In standard Eiffel, assignment between expanded types is performed componentwise. For example, if one has two variables z1, z2 of type COMPLEX, then the assignment z1 := z2 is performed by assigning z2.real to z1.real and z2.imaginary to z1.imaginary.

In our extension of Eiffel, assignment of values of expanded types also occurs componentwise. That is, the assignment of an expression expr to a variable var, where var and expr are of an expanded type, is replaced by an (atomic) sequence of assignments

   var.x1 := expr.x1
   ...
   var.xk := expr.xk

for each component xi of the type of var. Because expr may have side effects, it is evaluated only once for each such sequence. If an xi is itself of an expanded type, the process is repeated for that xi until the original assignment has been replaced by a sequence of assignments of either reference types or expanded types that do not contain references (such as INTEGER). The resulting sequence of assignments then follows the basic rules for reference assignments described above.

8.4.7 Deep Copying

If an assignment to a variable of a local type necessitates copying by the above rules, this is a deep or recursive copy, which copies not only the object itself, but also the objects referenced by instance variables of that object, and so forth. In this section, we define the precise semantics of the deep copy of a data structure.

The purpose of creating a deep copy is to avoid inadvertent access to an object that is not part of the realm of the current process. Consider, for instance, that the type EVENT in our example is defined as:

   class EVENT
   feature
      priority: INTEGER
      message: STRING
   end

Copying only the event object itself from the event channel to the event monitor universe would leave the message object inside the event channel universe. If handle were defined as

   class EVENT_MONITOR
   feature
      handle(event: EVENT) is
         do
            print(event.message)
         end
   end

it would access an object (event.message) that is not part of the current realm. The result of such an operation would be undefined. Note that if message were declared to be of type shared STRING, then the access would result in defined behavior, namely a runtime error if message is not part of the current realm.

A deep copy operation transfers not only the object referenced by the source of an assignment to another universe, but also any object referenced directly or indirectly by instance variables (excluding those instance variables that are of a non-local type).

We first define what objects need to be copied. For this purpose, we define a local reachability relation on objects. An object o_2 is locally reachable in one step from an object o_1 if o_1 has an instance variable of a local type that references or contains o_2. An object o_n is locally reachable from an object o_1 if there are objects o_2, ..., o_{n-1} such that o_{i+1} is locally reachable in one step from o_i for all i < n, or if o_1 = o_n. Effectively, local reachability is the reflexive, transitive closure of one-step local reachability. The set of objects that are locally reachable from o_1 can be computed by recursively traversing all instance variables that are declared to be of a local type. Instance variables that are shared or whose type is a universe class are not traversed. Thus, the deep copying procedure will not access objects outside the realm of the current process.

Let S be the set of objects locally reachable from an object source.
For all o ∈ S, we create a new object copy(o) such that copy(o) is an identical copy of o. We now have a copy of each object locally reachable from source. However, the instance variables of these copies may be incorrect. For instance, if object source had an instance variable (of a local type) that referenced ob, the corresponding instance variable of copy(source) would still reference the original ob instead of copy(ob). Therefore, these instance variables must be updated in a final pass.

For all o ∈ S, we update the instance variables of copy(o). Let v be an instance variable of copy(o). If v is declared to be of a non-local type, then it is not changed. If v is declared to be of a local type and it references an object o′ ∈ S, then v is assigned a reference to copy(o′). If v is declared to be of a local type and it references an object o′ ∉ S, then v is not changed. Finally, if v does not reference an object (e.g., if it is of type INTEGER or another expanded type that does not contain references), then v is not changed, either. After all instance variables have been updated, copy(source) contains a deep copy of source.

To illustrate this process, consider the instance diagrams in Figure 32. Each box in this diagram describes an object, with the shaded rectangle at its top describing the name and type of the object and the white rectangles describing the instance variables and their types. An arrow emanating from a white rectangle points to the object that the corresponding instance variable references. The objective of this example is to show the process of a deep copy with source being the object ob1. The set S of objects reachable from ob1 is {ob1, ob2, ob3}. The object ob4 is not reachable, since it is only referenced by the shared variable b2 of ob2. The copying process then creates objects ob1copy = copy(ob1), ob2copy = copy(ob2), and ob3copy = copy(ob3). Updating the instance variables of ob1copy changes a to reference ob2copy instead of ob2 and leaves b unchanged (since it is of a non-local type). Updating the instance variables of ob2copy changes c to reference ob1copy instead of ob1 and b to reference ob3copy instead of ob3, while b2, being of a non-local type, remains unchanged. Finally, updating the instance variables of ob3copy is not necessary, since its only instance variable is an integer, which does not reference any object.

In summary, this definition of deep copying satisfies the requirement that objects outside the realm of the current process will not be accessed. The copying algorithm itself will only access objects in the same universe as source, which is already part of that realm. Once the data structure originating at source has been copied, it is not possible for user-defined code to access objects outside the realm via copy(source), either. All objects that can be accessed via copy(source) are either local to the same universe as copy(source) itself or are referenced by a variable of a non-local type, and all accesses through variables of a non-local type are checked at runtime.

   (Instance diagram omitted: it shows the original objects ob1, ob2, ob3, and ob4 with their instance variables, and the copies ob1copy, ob2copy, and ob3copy produced by the deep copy.)

   Figure 32: Deep Copy Example

8.5 Summary of Language Extensions

Our extensions introduce the new keywords universe, concurrency, and shared and the new symbol =>, and provide additional semantics for the existing keywords when and once.
• The keyword universe begins the declaration of a universe class (Section 8.1.1 and Section 8.1.3).

• The keyword concurrency declares a concurrency constraint (Section 8.1.1).

• The keyword shared declares a shared class or variable (Section 8.4.2).

• The keyword when and the symbol => are operators for concurrency constraints (Section 8.1.1).

• The keyword once declares once functions, which have special semantics in our extension (Section 8.3).

In addition, we provide new standard library classes, called UNIVERSE_BASE, PROCESS_BASE, and PROCESS.

• UNIVERSE_BASE is an ancestor for all universe classes (Section 8.1.2).

• PROCESS_BASE is a deferred class, from which other classes inherit to define the code of a process. This descendant is then instantiated to define the process root (initial universe) of a realm (Section 8.2.1).

• The instances of PROCESS are representations of actual processes as provided by the underlying operating system (Section 8.2.1).

Finally, we alter the semantics of assignment and method invocation to prevent inadvertent concurrent access to shared data (Section 8.4).

Chapter 9

Performance Evaluation

We now turn to the issue of how well the realm update algorithm performs. The performance of each of the individual components of our algorithm (e.g., wound-wait, cautious waiting, and reference counting) has been analyzed before in isolation [BKH97, H292, BAR+03]. Given that our algorithms combine these disparate techniques in a manner that involves complex interactions, a full complexity analysis is beyond the scope of this thesis. Instead, we chose to evaluate the algorithm empirically. To this end, we have devised a parameterized benchmark, which, dependent on a set of input parameters describing realm size and degree of contention, will yield the overhead of adding universes to a realm and removing them from it. We first introduce the benchmark in Section 9.1, including the parameters and their meaning. Section 9.2 explores the results we obtain for various settings of these parameters. Section 9.3 discusses the cost of adding universes to a realm or removing them from a realm on a uniprocessor machine. This cost is essentially linear in terms of the number of universes involved, with a small cost per universe, even under artificially high levels of contention. On a symmetric multi-processor architecture, execution cost for our benchmarks is dominated by an OS-specific context switch overhead that is inherent to any completely fair scheduling approach.

9.1 The Benchmark Infrastructure

The benchmarks described in this section are generated from a simple class schema (Figure 33) and a set of input parameters. The generated code is passed to our extended Eiffel compiler, and the compiled code is then executed to estimate the overhead caused by the realm update algorithm for the given input parameters.

9.1.1 The Class Schema

The class schema (Figure 33) comprises a condition variable c (line 8), a set of universe variables u1, ..., un (line 9)¹, and a concurrency constraint c => u1 ∧ ... ∧ un (line 30)¹ over these variables. The schema also provides a procedure iterate that repeatedly flips c, thereby implicitly invoking the realm update algorithm to either add (when c becomes true) or remove (when c becomes false) the universes in the set {u1, ..., un}. The body of this procedure consists of a pair of nested loops, the outer loop of which loops forever over its body.

¹Note that the code listed in the schema is a placeholder for the actual code. The code generator instantiates this schema to produce syntactically correct code.
The body consists of three statements, the first of which is the inner loop (lines 18-20) that controls the interval between invocations of the realm update algorithm. Its duration is determined by the function interval_length (see below). The second is an instruction (line 21) to toggle the condition variable c. Changing c is a realm-affecting operation, and thus causes the invocation of the realm update algorithm. Finally, at the end of each iteration of the outer loop, the routine record_iteration is called, which is a special benchmarking routine (implemented in C and assembly language to avoid distortions of the benchmark results). Its purpose is to check whether a given number of iterations of the outer loop has been reached, and to end the benchmarking process once that happens. It does this by incrementing a global counter (shared by all threads) atomically, and if a set number of iterations is reached, it terminates the program and outputs the time spent since its inception.

The function interval_length, which determines the duration of the inner loop, and thus the time between invocations of the realm update algorithm, generates pseudo-random numbers with a fixed average. The purpose of having it generate pseudo-random numbers is to avoid having the realm update algorithm accidentally operate in lockstep with the operating system scheduler or other operating system events. The function is calibrated so that an average iteration of the outer loop takes ≈ 1000 clock cycles. The pseudo-random numbers follow the standard exponential distribution over the interval (0, ∞).

Observe that n may be zero, in which case we omit any declaration of universes in line 9 and set the concurrency constraint (line 30) to be c => true. Even though toggling the value in c has no effect on the satisfiability of this constraint, assignments to c are nevertheless realm-affecting operations. Thus, they invoke the realm update algorithm and allow us to estimate the overhead for the basic invocation.

9.1.2 Input Parameters

Three input parameters affect the generated benchmark code. The first parameter, NumberProcesses, is a positive integer that sets the number of processes. Second, DeltaRealm is a non-negative integer that controls how many universes are added to (or removed from) the realm of a process on each iteration. DeltaRealm corresponds to the value of n above. Third, Overlap is a non-negative integer less than or equal to DeltaRealm, which defines the overlap between two processes: Suppose that processes are numbered p1, ..., pk. Then the maximum realm of pi will have Overlap universes in common with the maximum realm of pi+1 for 1 ≤ i < k, and the maximum realm of pk will have Overlap universes in common with the maximum realm of p1.

9.1.3 Output Values and Precision

For any set of input parameters, we track two output values. One is the average Overhead incurred by invoking the realm update algorithm. We determine that value by running the benchmark twice, once with the realm update algorithm enabled, and once with the algorithm disabled. We disable realm updates by removing the invocation of the procedure from the code generated by the compiler for the c := not c instruction².
The difference between the two runs, divided by the number of total iterations performed, is the average overhead caused by the realm update algorithm per iteration. To minimize errors, we further average the result over several runs.

²Observe that this is safe and does not generate any runtime errors, since no process actually ever accesses any of the ui.

The other output value is SignalRate, the number of times per thousand invocations (‰) of the realm update algorithm that one process signals another process the sending of an ACQUIRED or SURRENDER message. This value is an indirect indicator for the degree of universe contention. Note that the implementation is optimized to not signal other processes for every single message sent, in order to avoid unnecessary process context switches, so it is not an indicator for the number of messages being sent between processes.

    1  universe class BENCHMARK
    2
    3  inherit
    4     PROCESS_BASE
    5
    6  feature { NONE } -- attributes
    7
    8     c: BOOLEAN
    9     u1, ..., un: UNIVERSE_BASE -- declaration will be generated for n >= 0
   10
   11  feature { ANY } -- public routines
   12
   13     iterate is
   14        local
   15           j: INTEGER
   16        do
   17           from until false loop -- infinite loop
   18              from j := interval_length until j = 0 loop
   19                 j := j - 1
   20              end
   21              c := not c -- invokes scheduler
   22              record_iteration
   23           end
   24        end
   25
   26
   27
   28
   29  concurrency
   30     c => u1 and ... and un -- constraint will be generated for n >= 0
   31
   32  end -- class BENCHMARK

   Figure 33: Schema used to generate benchmark classes.

The relative variation of Overhead is generally less than a few percent between independent runs. The absolute error for SignalRate can be as much as 0.5‰, which corresponds to one signal per two thousand scheduler invocations.

9.1.4 Configuration Options

Our runtime library can be configured by a number of parameters at compile time, some of which require a certain degree of processor support. The PORTABLE option selects an implementation that has a high degree of portability across systems that support the POSIX API [IEE96]. The FASTLOCKS option supports customized mutexes that were optimized for our purposes instead of using the POSIX implementation. Our optimization essentially removes overhead due to API differences and unnecessary options provided by the POSIX interface and the Linux implementation, but is otherwise a typical mutex implementation. Additionally, it is reduced to a single compare-and-swap call in the absence of contention. The SMP_SUPPORT option can be chosen in conjunction with the FASTLOCKS option. Disabling it removes an inherent overhead in synchronization mechanisms on symmetric multi-processors using the Intel architecture. Finally, USER_THREADS selects an experimental implementation of user level threads³, which requires compiler support to be preemptive, but eliminates synchronization overhead completely.

³The USER_THREADS option is based on the portable GNU threads library [Eng00]. It is experimental only insofar as the thread library cannot yet be used in conjunction with the Boehm-Weiser conservative garbage collector [BW88] that we are using, but is otherwise functional.

9.2 Benchmark Results

To measure the performance of our algorithm as a function of the input parameters, we have constructed a number of scenarios. Each scenario keeps all but one or two of the input parameters fixed, and varies the other parameters along a (usually linear) scale. For each scenario and set of input parameters, we list the output values. We do not explicitly list the value of SignalRate in scenarios where no contention occurs, because in that case it is always zero.
Our benchmarks were performed on an Intel Xeon processor, running at an effective frequency of 2.175 GHz, using a Linux operating system (kernel version 2.2.17 configured for uniprocessor use) and the POSIX threads implementation that is part of the GNU libc 2.3.1 implementation. We used the FASTLOCKS option and disabled SMP_SUPPORT⁴, unless specified otherwise.

⁴Enabling SMP_SUPPORT on 80x86 processors introduces a fixed overhead per synchronization primitive, caused by prefixing atomic instructions such as compare-and-swap with an instruction that locks the system bus and keeps processor cache lines synchronized. Unfortunately, that overhead varies greatly between processor types, from ≈ 30 to ≈ 160 clock cycles according to our measurements, and decreases the reproducibility of results.

The first scenario (Table 1a) attempts to capture the effect of the DeltaRealm parameter, all else being equal. For this purpose, we set Overlap to zero so as to avoid contention. The NumberProcesses parameter was kept at 2 throughout the scenario. Observe that the value of Overhead can be approximated well by a linear function.

The second scenario (Table 1b) examines various degrees of contention between resources. While still keeping the number of processes at 2, we vary the Overlap parameter from 0 through 7. DeltaRealm is kept at a constant value of 7, so that we examine every situation from no contention to 100% contention. While the non-zero SignalRate indicates that contention does indeed occur, there is no effect on the value of Overhead that exceeds the error of measurement.

   (a) Scenario 1                         (b) Scenario 2

   DeltaRealm   Overhead (cycles)         Overlap   Overhead (cycles)   SignalRate
       0             216                     0           1310             0.00‰
       1             428                     1           1376             1.49‰
       2             564                     2           1331             1.34‰
       3             756                     3           1333             1.90‰
       4             847                     4           1356             1.38‰
       5            1049                     5           1339             1.77‰
       6            1290                     6           1265             1.86‰
       7            1341                     7           1330             1.63‰

   Table 1: Scenarios 1 and 2 - No contention and with contention.

Finally, the third scenario (Table 2) measures what happens if multiple processes contend for the same set of resources. In this scenario, we track the overhead for 1 through 5 processes contending for the same set of resources. We vary NumberProcesses from 1 to 5, and for each value of NumberProcesses, we vary DeltaRealm from 1 to 5 and from 10 to 30. For each run, Overlap is set equal to DeltaRealm to create complete contention. This time, we observe a minor performance penalty that increases with the number of processes.

   Overhead (cycles)                 DeltaRealm = Overlap
   NumberProcesses      1      2      3      4      5     10     20     30
        1              412    599    721    877   1006   1637   3036   4398
        2              450    574    760    859   1065   1801   3222   4494
        3              466    597    779    880   1073   1741   3432   4585
        4              496    613    796    898   1105   1858   3391   4618
        5              512    672    840    941   1158   1906   3404   4754

   SignalRate (‰)                    DeltaRealm = Overlap
   NumberProcesses      1      2      3      4      5     10     20     30
        1             0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
        2             1.14   1.19   1.51   0.98   1.11   1.65   1.28   2.50
        3             3.21   3.38   3.07   4.98   3.71   5.15   7.21   6.45
        4             6.97   6.35   6.29   6.14   8.14   8.59   9.86  12.98
        5             9.61  10.98   9.17   9.72  11.69  12.29  14.10  20.58

   Table 2: Scenario 3 - Process contention.

9.3 Analysis
We have provided performance data for our scheduler on a uniprocessor system using stock hardware and an off-the-shelf operating system, when performing under a variety of circumstances. We now discuss the meaning of these results, what they reflect of our implementation, and why we didn't provide similar data for an SMP architecture.

9.3.1 Uniprocessor Results

In the absence of contention, the realm update algorithm has a cost that is approximately linear in the number of universes involved in an update, with the cost increasing by ≈ 160 clock cycles per universe. This is the expected outcome, since the realm update algorithm has a cost that is linear in the number of universes for that case, as long as the underlying graph is tree-shaped.

For two processes and up to 30 shared universes we do not see any increase in cost that significantly exceeds the error due to caching, pipelining, and other sources of error inherent to a modern microprocessor. In particular, for DeltaRealm = 30 the value of SignalRate does not exceed 2.5 per thousand invocations of the realm update algorithm for two threads, compared to a value of 1.14 for DeltaRealm = 1. That means that while there is minimal overhead due to contention, the cost of contention per universe is negligible in comparison.

For more than two processes, the situation is more complex. For any given number of processes, the increase in SignalRate is roughly linear in terms of DeltaRealm, and there appears to be a matching increase in Overhead, though that increase is too small compared to the inherent error due to caching, pipelining, etc. to make any reliable statements. The cost per universe appears to be largely dependent on the number of threads involved, and increases non-linearly. However, for all our test cases, the overhead per each additional thread never exceeds 5% of the cost for the case where there is no contention.

To further analyze the case where two or more threads contend for one or more universes, note that the additional cost for contention is dominated by the cost of process context switches. In our implementation, the cost of a context switch is 10-100 times that of the various synchronization primitives that we employ (usually around ≈ 4000 clock cycles for a context switch, ≈ 40+ clock cycles for a synchronization primitive). Thus, even relatively infrequent context switches can contribute significantly to the overall overhead. Context switches can occur either voluntarily, by one process transferring control to another, or involuntarily, when a time slice of the OS scheduler has ended.

Involuntary context switches are rare on a uniprocessor, and in our implementation, voluntary context switches on a uniprocessor can only occur as a side effect of involuntary context switches, due to ACQUIRE or SURRENDER messages being sent. Furthermore, note that if one process initiates a process switch by signaling the sending of such a message to another process, this can also result in a third process being scheduled instead, causing further contention, as there are now three or more processes negotiating for the acquisition of universes. It is therefore to be expected that the overhead in contention scenarios depends both on the number of universes under contention and on the number of processes contending for them.

We conclude that the realm update algorithm has a reasonable cost and scales well on uniprocessors, even for artificially large degrees of contention between processes.
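As a rough sanity check of the linearity claim (using the Table 1a figures and treating the DeltaRealm = 0 measurement as the fixed cost of a basic invocation), the fit Overhead ≈ 216 + 160 · DeltaRealm predicts 216 + 7 · 160 = 1336 clock cycles for DeltaRealm = 7, close to the measured 1341. The fit is less accurate for small realms (it predicts 376 against a measured 428 for DeltaRealm = 1), so the 160-cycle slope should be read as an approximation rather than an exact per-universe cost.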
9.3.2 SMP Architectures

We have not provided data for our benchmarks when running on an SMP architecture, because they expose a pathological case of SMP scheduling. For example, when running two processes with DeltaRealm = Overlap = 1 and AvgIntervalLength ≈ 1000 on the same architecture with an SMP-capable OS kernel, we have Overhead > 5000 clock cycles, over ten times the uniprocessor cost, and SignalRate > 950 per thousand realm updates, meaning a process context switch for almost every invocation. Of the signals delivered, over 99.9% were for universe acquisitions, and less than 0.1% for SURRENDER messages. Therefore, the increased overhead is almost entirely attributable to the increase in SignalRate and the concomitant context switches. In particular, the context switch overhead alone exceeds the synchronization cost on uniprocessors even for very large realm sizes.

This massive increase in overhead is not a shortcoming of the realm update algorithm, but simply due to lock contention; a similar increase in overhead could be observed when replacing the realm update procedure with code that locks a semaphore when c = true and unlocks it when c = false. To understand the reason, observe that on a uniprocessor, once a process P relinquishes a resource and transfers control to another process Q, P will be suspended for a full time slice, where it can't interfere with Q's operation. On an SMP architecture under the same circumstances, P continues running until it is blocked when attempting to acquire the shared resource again. Essentially, on an SMP architecture under high contention, almost every resource acquisition carries the additional cost of a context switch.

Observe that this problem need not occur if resource management is not completely fair. A uniprocessor scheduler does not have the problem because Q is allowed to acquire the resource several times before P can attempt it again. This approach sacrifices fairness for performance. To obtain similar performance in the SMP case under high contention, a resource management algorithm will have to give up a degree of fairness to avoid the above scenario.

This phenomenon occurs in practice as well. Previous research has found that applications that use blocking synchronization can spend a third of their time on context switching [KLMO91], while another study [1899] that used the SPLASH-2 benchmarks found that for some applications half the time was spent waiting on synchronization. A commonly proposed solution [KLMO91, TF02] is to use busy waiting ("spinning") instead of idle waiting to eliminate context switches in those cases where context switch overhead dominates execution time. To study this, we implemented an experimental spinning approach for our realm update algorithm by inserting a busy waiting loop that delays until u.owner ≠ nil at the beginning of request. By doing this, the overhead of the realm update became comparable to the uniprocessor case (within 10%). Unfortunately, this simple solution can potentially lead to starvation, does not scale well for multiple universes, and can unnecessarily occupy processors on multi-user machines. An alternative solution to the problem of excessive context switch overhead is to use user-level threads (which can be enabled in our implementation by setting the USER_THREADS option in the runtime system), which are limited to a single processor, but consequently cannot exploit the parallelism of SMP systems.
Further research is needed to determine the best solution for such scenarios.

Chapter 10

Conclusion and Future Research

We have presented an approach to synchronization that relies on the specification of declarative contracts, rather than the embedding of synchronization mechanisms in the procedural code of a class. This approach can be integrated easily into object-oriented languages. We have provided algorithms for the most difficult parts of its implementation, validated their efficiency, and provided a set of language extensions for Eiffel that allow a programmer to use the model. The remaining open questions concern the power of the universe model relative to existing approaches and the additional features that a contractual model may have to support in the future.

10.1 Comparison of the Universe Model to Existing Approaches

The universe model does not provide capabilities for every concept that has been discussed so far, especially not all of the features of the synchronization mechanisms and contractual models presented in Section 2.6.2. Sather, CEE, and SCOOP have features for the explicit composition of contracts that are not covered by the universe model, and which we discuss first. Second, we consider the issue of intra-object synchronization contracts, which are provided by the synchronization ring model, but not by the universe model. Conversely, the universe model supports the implicit composition of synchronization contracts, which is not available in any of the other approaches.

10.1.1 Explicit Composition beyond the Universe Model

A distinguishing feature of Sather that cannot currently be replicated in the universe model is the else clause of its multi-branch lock statement. It specifies an alternate execution path that is followed when none of the available branches can be taken. The universe model, as a contractual model, does not support such semantics. [FP97] describe three possible alternative semantics for the else-branch, none of which is clearly superior to all of the others. Should semantics similar to the else-branch be found to be desirable, a contractual approach would be to use timeout contracts, which are contracts that are satisfied after a certain time, but not before. For instance, the contract C → S1 ∨ C → S2 ∨ timeout(t) can be satisfied by acquiring either S1 or S2 within t time units, or after the expiration of t ≥ 0 time units if neither S1 nor S2 can be acquired before the timeout expires.

Beyond the synchronization contracts allowed by the universe model, both SCOOP and CEE allow the programmer to express the obligations of a conditional synchronization contract in terms of a predicate that can depend on the state of several shared objects. Conversely, the universe model only allows conditional contracts of the form C →^{P(C,S)} S, where P(C, S) depends only on the state of C and S. Restricting conditional contracts in this fashion was a conscious design decision. It simplifies the correct implementation of when clauses considerably, which is important given the already considerable complexity of the basic deadlock avoidance/recovery algorithm. There is no problem in principle with extending the language of concurrency constraints to allow the condition of when clauses to depend on the state of additional universes, except that the implementation of such a scheme would be non-trivial.
We believe that extending the language of concurrency constraints to cover more powerful conditional contracts should not be done until there is sufficient evidence that the additional power is useful in practice.

10.1.2 Intra-object Contracts

A type of synchronization contract that we have not studied in this thesis is that of intra-object synchronization contracts, which are supported by the synchronization ring model [Hol99]. Whereas our existing classification of synchronization contracts assumes that a process has either exclusive access to a supplier or no access at all, intra-object synchronization contracts specify if (and under what circumstances) multiple clients that are owned by different processes can access the same shared supplier concurrently (a classic example of this type of contract is path expressions [CH74]). A common case of intra-object concurrency is that of multiple readers, where several processes may access a shared supplier concurrently as long as they only read the supplier's data and do not modify it. This is obviously thread-safe and can significantly improve the level of concurrency.

One way to specify intra-object synchronization contracts is the concept of ports [Hol99]. Using this notion, shared objects are accessed via a specific port. For example, there might be both a read-port R, which allows only read-only access, and a write-port W, which allows read-write access. More than one process can access the same supplier via port R, but if one process accesses the supplier via port W, then no other process can access the same supplier.

While intra-object synchronization is well understood for the multiple-readers problem, one can specify more complex requirements for intra-object concurrency, which significantly complicates the negotiation of synchronization contracts. In particular, a general approach would allow the specification of an arbitrary set of ports P1, ..., Pk. One then has to define which ports can be accessed concurrently and which cannot, and has to ensure that no unwanted interference can occur, which is a non-trivial problem. As [Hol99] notes, however, intra-object concurrency is required only infrequently in practice. Considering also that intra-object contracts require a more complex runtime system to negotiate them, possibly to the point of affecting its efficiency, we have deferred the specification and implementation of intra-object synchronization contracts to future work.

10.1.3 Implicit Composition of Contracts in the Universe Model

In addition to providing synchronization contracts as a first-class language feature, the universe model exceeds the contractual capabilities of the mechanisms discussed so far by allowing for the implicit composition of contracts. Whereas the previously discussed approaches assume that the client of a synchronization contract is a process, job, or transaction, the universe model assumes that the client of a synchronization contract is a shared resource (specifically, a universe) owned by a process. By this token, the supplier of one synchronization contract can become the client of a second (and possibly third, fourth, etc.) synchronization contract. This chaining of synchronization contracts, as described by the notion of accessible links in Chapter 4, is not possible if the client and the supplier of a synchronization contract are a process and a shared resource, respectively.
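To make the notion of chained contracts concrete, the following Java sketch shows one way a shared object could publish its own obligations as a client, and how a runtime could compute the transitive set of universes that a single contract then implicitly covers. The interface, method name, and realm computation are illustrative assumptions made for this example; they are not the thesis' Eiffel extension or its actual runtime interface.

    import java.util.ArrayDeque;
    import java.util.Collection;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical: a shared object (universe) publishes the suppliers it needs
    // whenever it is itself used, i.e., its own contractual obligations as a client.
    interface Shared {
        Collection<Shared> requiredSuppliers();
    }

    final class Realms {
        // Transitively collect every universe reachable through published contracts;
        // this is the set a runtime would have to acquire on behalf of a client that
        // names only the given roots (cf. accessible links in Chapter 4).
        static Set<Shared> realmOf(Collection<Shared> roots) {
            Set<Shared> realm = new HashSet<>();
            Deque<Shared> work = new ArrayDeque<>(roots);
            while (!work.isEmpty()) {
                Shared s = work.pop();
                if (realm.add(s)) {
                    work.addAll(s.requiredSuppliers());
                }
            }
            return realm;
        }
    }

The computed closure corresponds to the realm that must be negotiated for the original client; it is obtained from published interfaces alone, which is the property discussed next.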
In addition to being able to construct chains of synchronization contracts, one must also be able to evaluate the composition of the synchronization contracts that occur along these chains, which is only feasible in a contractual model. For example, in order for the client C of an exclusion contract C → S to know about the behavior of S with respect to synchronization, the essential information about that behavior must be available through the interface of S. If S were to employ a procedural synchronization mechanism, C would have to perform procedural analysis of the code of S in order to infer the same information, which is undecidable in the general case and violates basic information-hiding principles. Thus, we conclude that language support for implicit composition is only possible if a supplier S informs its clients about the synchronization requirements that S has when it is itself acting as a client. In other words, S must publish its contractual relationships.

10.2 Open Research Questions

We conclude this discussion by describing two open research problems. First, our classification scheme provides only one way to represent disjunctions, namely the ∨ operator. As noted already in Section 10.1.1, this is insufficient to represent the else-branch of Sather's lock statement. But more generally, there are several feasible interpretations of the semantics of the disjunction of synchronization contracts, which creates an inherent ambiguity. The second problem is that the negotiation of certain composite conditional contracts may require influencing the OS scheduler in non-trivial ways. Unlike the first, the second problem is a concern for procedural and contractual approaches to synchronization alike, and may simply prove to be infeasible. We now address these two problems in turn, and show for the first how it can be solved by extending the classification scheme for synchronization contracts.

10.2.1 Treatment of Disjunctions

The ambiguity of composing synchronization contracts by disjunction arises from a subtle fairness problem in their interpretation. As an example, consider the composite contract C → S1 ∨ C → S2. Each time this contract is negotiated, the system has to acquire either S1 or S2. One has to avoid always choosing the same supplier Si, so as not to starve the other supplier. Were one to acquire the first supplier (either S1 or S2) that became available, it would be possible that S1 is always selected and S2 never, or vice versa. On the other hand, acquiring the first supplier to become available will likely result in better performance. Judging from the synchronization contract alone, without additional information, either interpretation may be the one intended by the programmer.

Both interpretations can feasibly be implemented in practice, as follows, and are therefore not purely theoretical considerations. If one has a deadlock- and starvation-free algorithm that implements synchronization contracts composed by conjunction, it is always possible to derive an algorithm for composition by disjunction as well, and either interpretation of the ∨ operator above has an effective implementation. To illustrate, consider a contract A1 ∨ ... ∨ Am, where the Ai are conjunctions of arbitrary basic contracts. For a completely fair algorithm that does not exploit parallelism, select one of the Ai at random and attempt to satisfy it. For a maximally concurrent (but possibly unfair) algorithm, attempt to satisfy the conjunction A1 ∧ ... ∧ Am.
As soon as there is a j such that Aj is satisfied, release all resources that are not needed by Aj. We call this the eager approach to negotiating disjunctions. Since neither approach can be disregarded as infeasible, the question remains which one was intended by the programmer.

In order to disambiguate between these (and possibly other) interpretations, not all types of disjunctions can be represented by the same operator ∨. Instead, a plurality of operators may be needed to encode the intent of the software designer. A related notion exists in the concurrent specification language CSP [Hoa85], which has two distinct operators to describe non-deterministic choice. Likewise, providing an or-else operator would allow synchronization contracts to obviate Sather's lock statement entirely. Whether such an extension of the language of synchronization contracts is necessary, or whether one interpretation can satisfy most situations that occur in practice, can only be answered by experience.

Regarding existing practice in dealing with the ambiguity of disjunction, those synchronization mechanisms that effectively implement composition by disjunction seem to provide only one interpretation of disjunction, which is either the eager option [BBF01], leaving fairness management up to the programmer, or a hybrid approach. An example of a hybrid approach is the implementation used by the programming language Sather [FP97]. Sather's algorithm attempts to satisfy each of the disjuncts (in random order) in turn. After attempting to satisfy one of them, the algorithm waits for a specified time interval (which increases with each attempt) before trying the next alternative. While this does not absolutely guarantee fairness, it makes starvation unlikely, as long as locks are not held for a time exceeding the initial time interval. The algorithm that we described in this thesis uses the completely fair approach, though chiefly to simplify its presentation. An implementation like Sather's would be a straightforward extension of the existing implementation.1

1 One simply has to reevaluate accessibles after each invocation of dfsRestart in Chapter 6.

10.2.2 Composition of Conditional Contracts

The fair negotiation of composite conditional contracts creates a non-trivial scheduling problem. Consider the composite contract C →^P S1 ∧ C →^Q S2. Let P(t) and Q(t) be the values of P and Q, respectively, at time t, and suppose that P(t) and Q(t) both become true infinitely often, and independently of one another. Whether P(t) ∧ Q(t) ever becomes true, however, depends entirely on scheduling: it is possible for scheduling to occur in such a fashion that P(t) is true only when Q(t) is false, and vice versa.

Thus, any negotiation mechanism that tries to satisfy the composite contract has to do one of two things. It can either simply observe P ∧ Q without interfering with the scheduling, and wait for P ∧ Q to become true, or it can attempt to control the execution of the processes that alter P and Q, steering them to a point where P ∧ Q becomes true. The observing approach has the potential to lead to starvation; the controlling approach is difficult to implement in the general case. Therefore, most existing approaches that allow for composite contracts either use a strictly observing approach (such as [CW02]) or use an observing approach augmented by simple timing heuristics that can delay processes so as to apply a minimum of control (such as [FP97]).
Our implementation likewise provides means for an observational approach with timing-based heuristics; we consider a fully controlling approach for the conjunction of conditional contracts to be an open research question, one that may prove to be infeasible.

10.3 Outlook

Our long-term research vision is that the use of synchronization contracts may fundamentally simplify the development and improve the reliability of multi-threaded shared-memory systems. Specifically, we believe that declaratively specified (and automatically negotiated) contracts will obviate the need for complex ad hoc synchronization mechanisms and protocols, thereby reducing the dominant source of complexity in these programs. As the use of powerful synchronization contracts is a relatively new programming paradigm, validating this long-term vision will require more research into building systems with the paradigm. Thus, we cannot yet make any general claims on this point.

Appendix A

Low-Level Algorithmic Details

A.1 The Message Passing Subsystem

A.1.1 Checking for realm completion

Figure 34 depicts the function realmComplete, which performs two services for the completeRealm algorithm. Procedure completeRealm invokes realmComplete to decide whether to terminate the loop over pending requests for universe acquisition. When this loop terminates, the realm must be officially completed by moving all of the acquired universes into the core of p. For efficiency reasons, realmComplete actually performs this movement into the core, in addition to returning the boolean value true to signal termination of the acquisition loop in completeRealm. More precisely, realmComplete(p) returns true if and only if:

1. ∀u : (p, u) ∈ Negotiating ⇒ (p, u) = Owned, and
2. p has no knowledge of an imminent transition within any of its negotiations.

    proc signal( p : Process, msg : MessageType, u : Universe )
    begin
        enqueueMsg(mng(p), ( msg, u ));
    end

    function realmComplete( p : Process ) returns boolean is
        var pendingMessages : boolean;
    begin
        if (pending(p) > 0) then return false; end;
        LOCK(p);
        pendingMessages := true;
        if (empty(mng(p))) then
            pendingMessages := false;
            start_time(p) := current_time();
        end;
        UNLOCK(p);
        return ¬pendingMessages;
    end

    function getMessage( p : Process ) returns MessageType × Universe is
        var msg : MessageType; u : Universe;
    begin
        wait until (¬empty(mng(p))) then
            LOCK(p);
            msg, u := dequeueMsg(mng(p));
            UNLOCK(p);
        end;
        return ( msg, u );
    end

    function inMigrating( u : Universe, p : Process ) returns bool is
    begin
        return migrating(u) ∧ owner(u) = p;
    end

    function inOwned( u : Universe, p : Process ) returns bool is
    begin
        return ¬migrating(u) ∧ owner(u) = p;
    end

Figure 34: Various algorithms for message passing and negotiation-state observation.

This condition is checked by evaluating whether p has any pending acquisition requests or pending messages. If there are no pending acquisition requests or messages in the queue, then all of the requested universes are moved into the core by updating the start time of p.

A.1.2 Message Transmission

Inter-process messages are transmitted by procedure signal and retrieved by procedure getMessage. These procedures implement message passing by appending messages to, or removing messages from, a process' message queue. Procedure signal is trivial. Procedure getMessage retrieves the first message on the queue, blocking if the queue is empty. By our definition of the wait until primitive, p is locked for the duration of the test.
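For readers more familiar with mainstream concurrency libraries, the following Java sketch shows a rough analogue of signal and getMessage built on a blocking queue. The class and the message representation are assumptions made only for this illustration; the blocking queue subsumes the roles played by LOCK/UNLOCK and the wait until primitive in Figure 34.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Hypothetical per-process message queue in the spirit of Figure 34.
    final class MessageQueue {
        enum MessageType { ACQUIRE, SURRENDER }
        record Message(MessageType type, long universeId) { }

        private final BlockingQueue<Message> queue = new LinkedBlockingQueue<>();

        // Analogue of signal(p, msg, u): enqueue the message and return immediately.
        void signal(MessageType type, long universeId) {
            queue.add(new Message(type, universeId));
        }

        // Analogue of getMessage(p): block until a message is available, then dequeue it.
        Message getMessage() throws InterruptedException {
            return queue.take();
        }
    }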
A.1.3 In-state predicates

Functions inMigrating(u, p) and inOwned(u, p) return true if (p, u) = Migrating and (p, u) = Owned, respectively. The actual conditions checked are explained in Section 5.2.1.

A.2 Releasing and Requesting Universes

A.2.1 Manipulating the Set of Blocked Universes

Procedures recordBlocked and recordUnblocked add (respectively remove) a universe to (respectively from) the set of blocked universes of a process (Figure 35). Both implementations are straightforward. Observe that if the set is changed for a process that is not the currently executing process, then this change might trigger the wait until test in function getMessageCooperatively (Figure 20). Thus, depending on the implementation of wait until, this condition might need to be signaled explicitly.

    proc request( u : Universe, p : Process, rc : Natural ) is
        var oldHead : Process := head(rqu(u));
    begin
        if (contains(rqu(u), p)) then
            updateQueueRC(rqu(u), p, rc);
        else
            enqueue(rqu(u), start_time(p), p, rc);
            pending(p) := pending(p) + 1;
            if (oldHead ≠ ⊥ ∧ head(rqu(u)) ≠ oldHead) then
                recordBlocked(u, oldHead);
            end;
        end
    end

    proc recordBlocked( u : Universe, p : Process ) is
    begin
        LOCK(p);
        blocked(p) := blocked(p) + { u };
        UNLOCK(p);
    end

    proc releaseFromQueue( u : Universe, p : Process, rc : Natural ) is
        var preempting : bool;
    begin
        LOCK(p);
        preempting := u ∉ blocked(p);
        UNLOCK(p);
        updateQueueRC(rqu(u), p, -rc);
        if (¬contains(rqu(u), p)) then
            pending(p) := pending(p) - 1;
            recordUnblocked(u, p);
            if (preempting ∧ ¬empty(rqu(u)) ∧ start_time(head(rqu(u))) < acquisition_time(u)) then
                recordUnblocked(u, head(rqu(u)));
            end
        end;
    end

    proc recordUnblocked( u : Universe, p : Process ) is
    begin
        LOCK(p);
        blocked(p) := blocked(p) - { u };
        UNLOCK(p);
    end

Figure 35: Support procedures for requesting and releasing universes.

A.2.2 Requesting a Universe

Procedure request enqueues a request, by a process p, for a universe u with reference count rc (Figure 35). The procedure begins by storing the current head of the queue in a local variable (oldHead) for later reference. It then tests whether the queue already contains the process p; if so, the reference count in the queue for that process is updated, and no further action is taken. Otherwise, p is inserted into the queue according to its start_time, and pending(p) is incremented to reflect that there is one more outstanding request. Inserting a new process into the queue could make the new process the head of the queue, for example if the priority of p exceeds that of oldHead. In this situation, oldHead must be informed that its negotiation for u is now blocking. Observe that it is possible that the negotiation was already blocking prior to the insertion, but the negotiation could also have been preempting.

A.2.3 Releasing a Process from a Queue

Procedure releaseFromQueue removes from the request queue of u some number (rc) of link requests on behalf of process p. This removal is accomplished by decrementing the reference-count entry in the tuple associated with p. If this operation removes all pending link requests on behalf of p, then the tuple associated with p is removed, in which case the number of pending requests for p must be decremented by one and u must be removed from blocked(p).
The operation becomes more complicated if the entry for p was at the head of the queue and if (p, u) was preempting, because then removing the entry for p might affect the blocked status of another process in the queue.1 Specifically, if what was the second entry in the queue becomes the new head of the queue, the process associated with this second entry might need to be unblocked. To test for this case, the procedure sets the flag preempting to record whether (p, u) = Preempting. This implies that p was at the head of the queue prior to the decrementing of link requests. If p was preempting, the queue is not empty, and the start time of the process q that is the new head of the queue is less than the acquisition time of the universe, then (q, u) must transition from Blocked to Preempting.

1 Notice that accessing the start time of the process at the head of the queue must be synchronized. Whereas the queue is ordered by the start time of the process component of its elements, we assume here that we can access this time from the tuple at the head of the queue, as opposed to accessing it from the process component of this tuple.

Bibliography

[ABAK95] D. Agrawal, J. L. Bruno, A. El Abbadi, and V. Krishnaswamy. Managing concurrent activities in collaborative environments. In Conference on Cooperative Information Systems, pages 112-124, 1995.

[ACM87] R. Agrawal, M. J. Carey, and L. W. McVoy. The performance of alternative strategies for dealing with deadlocks in database management systems. IEEE Transactions on Software Engineering, SE-13(12):1348-1363, December 1987.

[AG97] R. Allen and D. Garlan. A formal basis for architectural connection. ACM Transactions on Software Engineering and Methodology, 6(3):213-248, July 1997.

[BA90] M. Ben-Ari. Principles of Concurrent and Distributed Programming. Prentice Hall, 1990.

[BAL+01] D. F. Bacon, C. R. Attanasio, H. B. Lee, V. T. Rajan, and S. Smith. Java without the coffee breaks: a nonintrusive multiprocessor garbage collector. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation, pages 92-103. ACM Press, 2001.

[BAR+03] D. F. Bacon, C. R. Attanasio, V. T. Rajan, S. E. Smith, and H. B. Lee. A pure reference counting garbage collector. Submitted for publication, available at http://www.research.ibm.com/people/dldfblpapers.html, 2003.

[BB89] T. Bolognesi and E. Brinksma. Introduction to the ISO specification language LOTOS. In P. H. J. van Eijk, C. A. Vissers, and M. Diaz, editors, The Formal Description Technique LOTOS, pages 23-73. Elsevier Science Publishers North-Holland, 1989.

[BB98] V. Barbosa and M. Benevides. A graph-theoretic characterization of and-or deadlocks, 1998.

[BBD82] R. Balter, P. Berard, and P. Decitre. Why control of the concurrency level in distributed systems is more fundamental than deadlock management. In Proceedings of the First ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, pages 183-193. ACM Press, 1982.

[BBF01] V. C. Barbosa, M. R. F. Benevides, and A. L. Oliveira Filho. A priority dynamics for generalized drinking philosophers. Information Processing Letters, 79(4):189-195, 2001.

[BHG87] P. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley, Reading, Massachusetts, 1987.

[BJPW99] A. Beugnard, J.-M. Jézéquel, N. Plouzeau, and D. Watkins. Making components contract aware. IEEE Computer, 32(7):38-45, July 1999.

[BJR99] G. Booch, I. Jacobson, and J. Rumbaugh. Unified Modeling Language User Guide. Addison-Wesley, Reading, Massachusetts, 1999.
[BK85] J. A. Bergstra and J. W. Klop. Algebra for communicating processes with abstraction. Theoretical Computer Science, 37:77-121, 1985.

[BK91] N. S. Barghouti and G. E. Kaiser. Concurrency control in advanced database applications. ACM Computing Surveys, 23(3):269-317, 1991.

[BKH97] A. Burger, V. Kumar, and M. L. Hines. Performance of multiversion and distributed two-phase locking concurrency control mechanisms in distributed databases. Information Sciences, 96(1+2):129-152, 1997.

[Blo79] T. Bloom. Evaluating synchronisation mechanisms. In Seventh International Symposium on Operating System Principles, pages 24-32, 1979.

[Bri72] P. Brinch Hansen. Structured multiprogramming. Communications of the ACM, 15(7):574-578, July 1972.

[Bry86] R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677-691, 1986.

[BS00] R. Behrends and R. E. K. Stirewalt. The universe model: An approach for improving the modularity and reliability of concurrent programs. In D. S. Rosenblum, editor, Proceedings of the ACM SIGSOFT 8th International Symposium on the Foundations of Software Engineering (FSE-00), volume 25, 6 of ACM Software Engineering Notes, pages 20-29. ACM Press, November 8-10, 2000.

[BW88] H. Boehm and M. Weiser. Garbage collection in an uncooperative environment. Software - Practice and Experience, 18:807-820, 1988.

[Car90] D. Caromel. Concurrency: An object-oriented approach. In Jean Bezivin, Bertrand Meyer, and Jean-Marc Nerson, editors, TOOLS 2 (Technology of Object-Oriented Languages and Systems), pages 183-197. Angkor, 1990.

[CG92] N. Carriero and D. Gelernter. Coordination languages and their significance. Communications of the ACM, 35(2):97-107, February 1992.

[CH74] R. H. Campbell and A. N. Habermann. The specification of process synchronization by path expressions. In E. Gelenbe and C. Kaiser, editors, Operating Systems, Proceedings of an International Symposium, volume 16 of Lecture Notes in Computer Science, pages 89-102. Springer-Verlag, 1974.

[Che95] I.-R. Chen. Stochastic Petri net analysis of deadlock detection algorithms in transaction database systems with dynamic locking. The Computer Journal, 38(9):717-733, 1995.

[CM81] K. M. Chandy and J. Misra. Asynchronous distributed simulation via a sequence of parallel computations. CACM, 24:198-206, 1981.

[CM84] K. M. Chandy and J. Misra. The drinking philosophers problem. ACM Transactions on Programming Languages and Systems, 6(4):632-646, October 1984.

[CPF99] C. D. Carothers, K. S. Perumalla, and R. Fujimoto. The effect of state-saving in optimistic simulation on a cache-coherent non-uniform memory access architecture. In Winter Simulation Conference, pages 1624-1633, 1999.

[CS99] M. Choy and A. Singh. Dynamic resource allocation using views. Technical Report TRCS99-36, Department of Computer Science, University of California, Santa Barbara, 1999.

[CT90] C. Critchlow and K. Taylor. The inhibition spectrum and the achievement of causal consistency. In Proceedings of the 9th ACM Symposium on Principles of Distributed Computing, pages 31-42, 1990.

[CW02] M. Compton and R. Walker. A run-time system for SCOOP. Journal of Object Technology, 1(3):119-157, 2002. Special issue: TOOLS USA 2002 proceedings.

[Dij68] E. W. Dijkstra. Co-operating sequential processes. In F. Genuys, editor, Programming Languages, pages 43-112. Academic Press, N.Y., 1968.

[DYK01] L. DeMichiel, L. U. Yalcinalp, and S. Krishnan. The Enterprise JavaBeans 2.0 specification, 2001. http://java.sun.com/products/ejb/docs.html.

[EAWJ96] M. Elnozahy, L. Alvisi, Y. M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message passing systems. Technical Report CMU-CS-96-181, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, October 1996.

[EGLT76] K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger. The notions of consistency and predicate locks in a database system. Communications of the ACM, 19(11):624-633, November 1976.

[Eng00] R. S. Engelschall. Portable multithreading - the signal stack trick for user-space thread creation. In USENIX, editor, 2000 USENIX Annual Technical Conference: San Diego, CA, USA, June 18-23, 2000, pages 239-249, Berkeley, CA, USA, 2000. USENIX.

[FA93] S. Frolund and G. Agha. A language framework for multi-object coordination. In O. Nierstrasz, editor, Proceedings of the ECOOP '93 European Conference on Object-Oriented Programming, pages 346-360. Springer-Verlag, 1993.

[FHRT92] P. A. Franaszek, J. R. Haritsa, J. T. Robinson, and A. Thomasian. Distributed concurrency control with limited wait-depth. In 12th International Conference on Distributed Computing Systems, pages 160-169, Washington, DC, USA, June 1992. IEEE Computer Society Press.

[Fid88] C. Fidge. Timestamps in message-passing systems that preserve the partial ordering. Australian Computer Science Communications, 10(1), February 1988.

[FK01] K. Fisler and S. Krishnamurthi. Modular verification of collaboration-based software designs. In Proc. of the ESEC and ACM SIGSOFT Symposium on the Foundations of Software Engineering, 2001.

[Fos96] I. Foster. Compositional parallel programming languages. ACM Transactions on Programming Languages and Systems, 18(4):454-476, July 1996.

[FP97] C. Fleiner and M. Philippsen. Fair multi-branch locking of several locks. In International Conference on Parallel and Distributed Computing and Systems, pages 537-545. IASTED/ACTA Press, 1997.

[FR85] P. Franaszek and J. T. Robinson. Limitations of concurrency in transaction processing. ACM Transactions on Database Systems (TODS), 10(1):1-28, 1985.

[GCCC85] D. Gelernter, N. Carriero, S. Chandran, and S. Chang. Parallel programming in Linda. In Proceedings of the 1985 International Conference on Parallel Processing, pages 255-263, University Park, Pennsylvania, August 1985. IEEE.

[GHJV95] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Publishing Company, Reading, Massachusetts, 1995.

[GJSB96] J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java Language Specification. Addison-Wesley, 1996.

[Han99] P. Brinch Hansen. Java's insecure parallelism. ACM SIGPLAN Notices, 34, 1999.

[Har87] D. Harel. Statecharts: a visual formalism for complex systems. Science of Computer Programming, 8, 1987.

[Hoa74] C. A. R. Hoare. Monitors: An operating system structuring concept. Communications of the ACM, 17(10):549-557, October 1974.

[Hoa78] C. A. R. Hoare. Communicating sequential processes. Communications of the ACM, 21(8):666-677, August 1978.

[Hoa85] C. A. R. Hoare. Communicating Sequential Processes. Prentice/Hall International, Englewood Cliffs, New Jersey, 1985.

[Hol99] D. Holmes. Synchronisation Rings - Composable Synchronisation for Object-Oriented Systems. PhD thesis, Macquarie University, Sydney, October 1999.

[HZ92] M. Hsu and B. Zhang. Performance evaluation of cautious waiting. ACM Transactions on Database Systems, 17(3):477-512, September 1992.

[IEE96] IEEE. 1996 (ISO/IEC) [IEEE/ANSI Std 1003.1, 1996 Edition] Information Technology - Portable Operating System Interface (POSIX) - Part 1: System Application Program Interface (API) [C Language]. IEEE, New York, NY, USA, 1996.

[INM88] INMOS, Ltd. OCCAM 2 Reference Manual. Prentice-Hall, 1988.

[Jef85] D. R. Jefferson. Virtual time. ACM Transactions on Programming Languages and Systems (TOPLAS), 7(3):404-425, 1985.

[JL96] R. Jones and R. Lins. Garbage Collection. John Wiley and Sons, New York, 1996.

[JP93] G. Jalloul and J. Potter. A separate proposal for Eiffel. In Proceedings of the 2nd Conference on Technology of Object-Oriented Languages and Systems (TOOLS Pacific '93). Prentice Hall, 1993.

[JS99] D. Jiang and J. P. Singh. Scaling application performance on a cache-coherent multiprocessor. In Proceedings of the 26th Annual International Symposium on Computer Architecture, pages 305-316. ACM Press, 1999.

[Kai95] G. E. Kaiser. Cooperative transactions for multiuser environments. In Modern Database Systems, pages 409-433. ACM Press and Addison-Wesley, 1995.

[KLMO91] A. R. Karlin, K. Li, M. S. Manasse, and S. Owicki. Empirical studies of competitive spinning for a shared-memory multiprocessor. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, pages 41-55. ACM Press, 1991.

[LC01] B. Leuf and W. Cunningham. The Wiki Way: Collaboration and Sharing on the Internet. Addison-Wesley, 2001.

[Mat89] F. Mattern. Virtual time and global states of distributed systems. In Michael Cosnard, Yves Robert, Patrice Quinton, and Michael Raynal, editors, Parallel and Distributed Algorithms, pages 215-226. Elsevier Science Publishers B. V. (North Holland), Amsterdam, 1989.

[Mat93] F. Mattern. Distributed Control Algorithms (Selected Topics), pages 167-185. Springer-Verlag, 1993.

[Mey92] B. Meyer. Eiffel: the Language. Prentice Hall, 1992.

[Mey93] B. Meyer. Systematic concurrent object-oriented programming. Communications of the ACM, 36(9):56-80, 1993.

[Mey97] B. Meyer. Object-Oriented Software Construction. Prentice Hall, 1997.

[Mil80] R. Milner. A Calculus of Communicating Systems. Number 94 in LNCS. Springer-Verlag, Berlin, 1980.

[Mit95] S. Mitchell. TAO - A Model for the Integration of Concurrency and Synchronisation in Object-Oriented Programming. PhD thesis, University of York, UK, 1995.

[MWL90] A. Martinez, R. Wachenchauzer, and R. D. Lins. Cyclic reference counting with local mark-scan. Information Processing Letters, 34(1):31-35, 1990.

[Nie95] O. Nierstrasz. Regular types for active objects. In O. Nierstrasz and D. Tsichritzis, editors, Object-Oriented Software Composition, pages 99-121. Prentice-Hall, 1995.

[NM92] R. H. B. Netzer and B. P. Miller. What are race conditions?: Some issues and formalizations. ACM Letters on Programming Languages and Systems, 1(1):74-88, March 1992.

[OG76] S. Owicki and D. Gries. An axiomatic proof technique for parallel programs. Acta Informatica, 6:319-340, 1976.

[Omo91] S. M. Omohundro. The Sather language. Technical report, International Computer Science Institute, 1947 Center Street, Suite 600, Berkeley, California 94704, 1991.

[PA98] G. A. Papadopoulos and F. Arbab. Coordination models and languages. In M. V. Zelkowitz, editor, Advances in Computers, volume 46, pages 329-400. Academic Press, 1998.

[Pat90] P. C. Patton. The Strand88 Concurrent Programming Environment. Superperformance Computing Service/Strand, London, United Kingdom, March 1990. Brief No. 39.

[Pur94] J. M. Purtilo. The POLYLITH software bus. ACM Transactions on Programming Languages and Systems, 16(1):151-174, January 1994.

[Rhe98] I. Rhee. A modular algorithm for resource allocation. Distributed Computing, 11(3):157-168, 1998.

[RSL78] D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis, II. System level concurrency control for distributed database systems. ACM Transactions on Database Systems, 3(2):178-198, June 1978.

[SBD03] R. E. K. Stirewalt, R. Behrends, and L. K. Dillon. A model-based semantics for synchronization contracts in object-oriented languages. Submitted for publication in the Elsevier journal Science of Computer Programming. An online copy may be found at http://www.cse.msu.edu/~stire/Papers/scp03.pdf, 2003.

[SKTT96] J. Siegel, A. Klein, A. Thomas, D. Frantz, and H. Mirsky. CORBA Fundamentals and Programming. John Wiley and Sons, Inc., New York, 1996.

[SM94] R. Schwarz and F. Mattern. Detecting causal relationships in distributed computations: In search of the holy grail. Distributed Computing, 7(3):149-174, 1994.

[SPG91] A. Silberschatz, J. L. Peterson, and P. Galvin. Operating Systems Concepts. Addison-Wesley, third edition, 1991.

[Tan92] A. S. Tanenbaum. Modern Operating Systems. Prentice Hall, Englewood Cliffs, NJ, 1992.

[TF02] D. Tsafrir and D. G. Feitelson. Barrier synchronization on a loaded SMP using two-phase waiting algorithms. In International Parallel and Distributed Processing Symposium. IEEE, April 2002.

[TGC93] E. Tryggeseth, B. Gulla, and R. Conradi. Software configuration management in PROTEUS. In Proceedings of the 4th International Workshop on Software Configuration Management (Preprint), pages 232-240, Baltimore, Maryland, 1993.

[Wil92] P. R. Wilson. Uniprocessor garbage collection techniques. In Y. Bekkers and J. Cohen, editors, Proc. of the International Workshop on Memory Management, Lecture Notes in Computer Science 637, pages 1-42. Springer-Verlag, 1992.

[YT87] Y. Yokote and M. Tokoro. Concurrent programming in ConcurrentSmalltalk. In A. Yonezawa and M. Tokoro, editors, Object-Oriented Concurrent Programming, pages 129-158. MIT Press, Cambridge, Mass., 1987.