This is to certify that the dissertation entitled DESIGN AND ENGINEERING OF COMPLEX REAL-TIME SYSTEMS presented by Aleksandar M. Bakić has been accepted towards fulfillment of the requirements for the Doctoral degree in Computer Science & Engineering.

DESIGN AND ENGINEERING OF COMPLEX REAL-TIME SYSTEMS

By Aleksandar M. Bakić

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Computer Science and Engineering

2000

ABSTRACT

DESIGN AND ENGINEERING OF COMPLEX REAL-TIME SYSTEMS

By Aleksandar M. Bakić

Complex real-time systems are emerging parallel or distributed, heterogeneous computer systems with many disparate constraints and requirements. Their subsystems and components are designed using appropriate, multiple models from real-time scheduling theory, resource allocation, and quality-of-service management schemes. Issues in the design, engineering and deployment of complex real-time systems addressed by this research include problem-solving approaches for finding satisfactory values of system parameters according to the real-time models and their integration; dynamic reconfiguration; instrumentation; and on-line performance analysis and visualization.

Four main objectives of this research resulted in proof-of-concept contributions. A compiler-based approach to the design and engineering of complex real-time systems was designed, implemented and evaluated; it represents a systems engineering framework particularly suitable for this systems domain. A distributed instrumentation system kernel was developed and evaluated in order to investigate performance and portability issues of the instrumentation of parallel and distributed, heterogeneous systems. A comprehensive on-line performance analysis and visualization technology, for the same domain and with an emphasis on complex real-time systems, was developed that incorporates common properties of extant tools and provides a basis for advanced on-line performance analysis and visualization tools. A mainly automated technology that combines the preceding results was developed in an attempt to integrate systems engineering and on-line performance analysis and visualization in a way that facilitates on-line reconfiguration of complex real-time systems.

© Copyright 2000 by Aleksandar M. Bakić. All Rights Reserved.

To my parents

ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Matt W. Mutka, for his continued guidance and support during my research. I have been fortunate to have an experienced researcher and talented pedagogue to supervise me. He made the pursuit of my Ph.D. an enjoyable and invaluable professional experience.

I am thankful to Dr. Diane T. Rover for providing me with my first research opportunity and continuing to support me as a Ph.D. student.
Her ideas and constructive critiques in our collaboration made a difference so many times.

I thank the other members of my guidance committee, Dr. Anthony S. Wojcik and Dr. James Stapleton, for their helpful comments and suggestions. I wish to thank Dr. George C. Stockman again for extending me an offer to come to MSU in the first place. I am indebted to Dr. Lionel M. Ni for valuable lectures in practical, and Dr. Eric K. Torng in theoretical, aspects of Computer Science, as well as to other MSU professors who taught me advanced and exciting courses.

Many of my fellow students helped me and collaborated with me over the course of study. I would like to thank them all, especially Peter C. Wong, Paul A. Reed, Hugh M. Smith, Wenting Tang, Kuk-jin Lee and Abdul Waheed. I would also like to thank the CSE and ECE departments for the excellent facilities, and technical and administrative support.

I am grateful to my family for their love, support and patience. Special thanks to my wife Vera for her love and understanding, and to our little daughter Jovana for bringing joy to our lives.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction and Motivation
1.1 Focus of the Dissertation
1.2 Motivation
1.2.1 Design and Engineering of Complex Real-Time Systems
1.2.2 Distributed Instrumentation Systems
1.2.3 On-Line Performance Analysis and Visualization
1.3 Research Objectives, Activities and Contributions
1.4 Organization
2 Background and Related Work
2.1 Distributed Real-Time System Design
2.2 Constraint Logic Programming and Real-Time Systems
2.2.1 Constraint Logic Programming Background
2.2.2 Use of CLP as Support for Real-Time Systems
2.2.3 Overview of ECLiPSe
2.3 Explicit Timing Constraint Checking
2.4 Distributed Instrumentation Systems
2.5 Performance Analysis and Visualization
3 A Compiler-Based Approach to Design and Engineering of Complex Real-Time Systems
3.1 Real-Time System Specification in RTSML
3.2 Process of Compilation to CLP
3.2.1 Module common
3.2.2 Module rms
3.3 Experiments
3.3.1 Conventional CLP Approach
3.3.2 Repair-Based CLP Approach
3.4 Scalability Issues in the Approach
3.5 On the Correctness of the Approach and Compilation
3.5.1 Language Semantics and the Power of CLP Problem Solving
3.5.2 Correctness-Related Compilation Details
3.5.3 Evaluation Against the High-Integrity Compilation Criteria
3.6 Summary
4 A Portable and Flexible Distributed Instrumentation System
4.1 Objectives and Approaches
4.2 Description of BRISK
4.2.1 Architecture
4.2.2 Implementation
4.3 Evaluation of BRISK
4.3.1 Local Performance
4.3.2 Distributed Performance
4.4 Summary
5 An On-Line Performance Visualization Technology
5.1 Visual Object Architecture
5.1.1 Low-level visual object
5.1.2 High-level visual object
5.1.3 Application of visual objects to a heterogeneous system
5.2 Visual Object Markup Language (VOML)
5.2.1 Event Processing and Information Rendering Architecture (EPIRA)
5.2.2 The VOML language
5.2.3 The VOML compiler
5.3 The VOML Specification of a Simple Visual Object
5.4 Summary
6 An Integrated Approach to Real-Time System Design and On-Line Performance Visualization with Steering
6.1 Target Real-Time System
6.2 RTSML Specification
6.3 RTSML Compiler Extension
6.4 Example Run-Time Session
6.5 Summary
7 Conclusions and Future Work
7.1 Research Contributions
7.2 Future Work
APPENDICES
A RTSML Document Type Definition (excerpt)
B VOML Document Type Definition (excerpt)
BIBLIOGRAPHY

LIST OF TABLES

2.1 Performance visualization tools and systems
3.1 Ranges of system parameters
3.2 Some constraint predicates used
3.3 Scenario of parameter changes
3.4 Conventional CLP approach timings
3.5 Repair-based CLP approach timings
3.6 Successive solutions' distances
4.1 Summary of BRISK evaluation
4.2 CPU time per 6-integer NOTICE macro
4.3 Count of increases in time frame T
4.4 Peak time frame T in milliseconds

LIST OF FIGURES

1.1 Overview of the integrated approach
3.1 An example complex RT system
3.2 RTSML specification excerpts
3.3 A generic repair-based CLP program
4.1 BRISK as an instrumentation system kernel
4.2 Architecture of the BRISK instrumentation system
4.3 Diagram of the BRISK basic implementation
4.4 An example three-field NOTICE macro call (internal sensor)
4.5 In-memory structure of the event record generated by the call in Figure 4.4
4.6 EXS CPU utilization for various event rates
4.7 Measurements of the clock synchronization algorithm (8 EXS nodes, 5-second polling period, 10-minute experiment)
5.1 The design of a visual object
5.2 On-line performance visualization of the real-time multimedia application
5.3 Event Processing and Information Rendering Architecture (EPIRA)
5.4 A brief description of VOML
5.5 Code of the IR component used as lineplotrender in Figure 5.4b
5.6 VOML compilation and execution process diagram
5.7 Sketch of a VOML specification that uses remote component definitions
5.8 Event declarations
5.9 Info and control structures
5.10 View initialization
5.11 Event processing components
5.12 Template IR component
5.13 Active IR component
5.14 A snapshot of the view
6.1 Original face-tracking system's state diagram
6.2 Distributed face-tracking system
6.3 RTL-specific details of the target system
6.4 RTSML specification excerpts
6.5 Integrated visualization, repair and steering
6.6 A snapshot from the RTSML-based visual object

Chapter 1

Introduction and Motivation

A broad variety of different software systems, ranging from small, embedded controllers with microsecond response times to large, heavyweight systems with response times of seconds or even minutes, have been designated "real-time." For both a power plant control system and a global air-ticket reservation system, as well as an abundance of other diverse systems, certain common characteristics can be summarized in the following definition by Selic [100]:

Definition. A real-time system is a software system that maintains a timely and ongoing interaction with its environment.

Unlike other system resources in conventional software systems, such as memory and processing power, the (real) time cannot be controlled; only its progress can be measured. The timeliness property of a software system is a function of the timeliness of individual activities (or tasks) that the system supports. A real-time system achieves its timeliness by managing its resources accordingly. Jensen further defines "real-time-liness" of a task [62]:

Definition. A task is real-time to the degree that its completion timeliness predictability is part of its logic.

The ongoing interaction property of a software system relates to a quality that is often referred to as reactivity [77]. Translated from a logic-theoretic framework to English:

Definition. A reactive system persists across some interval of time (possibly indefinitely) during which it responds to inputs as they occur.

In this setting, the dominant feature of a real-time system is its structure, rather than its function. To reason about the above two properties of a real-time system, its structure must be fixed, while the function may be modified. Consequently, the focus of software system design shifts from the traditional, algorithmic approaches to structural ones. The structural approaches range over different domains. On one end, real-time scheduling theory is mainly concerned with low-level issues, such as physical resource allocation. On the other, formal software-engineering methods address more abstract real-time requirements, such as timing relations among different application states.

1.1 Focus of the Dissertation

Complex real-time systems are emerging parallel or distributed, heterogeneous computer systems with many disparate constraints and requirements. For example, an embedded complex real-time system may possess limited system resources (for computation, communication and I/O) that need to be shared by many of its components.
If the system or some of its com- ponents are critical for the operation of its embedding system and /or environment, they must be made fault-tolerant. The dynamics of the environment may Span from static models to stochastic ones, to unpredictable ones. Consequently, components and subsystems of a complex real-time system are de- signed using apprOpriate, multiple models from real-time scheduling theory, resource allocation and quality-of-service management schemes. Issues that arise in the design, engineering and deployment of complex real-time systems include the following. 1. Detailed models are complex and require exhaustive search through the design space to find values Of model parameters for which the system satisfies imposed requirements. Engineers of complex real-time systems often cannot afford this, and use fast heuristics or simple, ad hoc schemes instead. 2. As the above mentioned models and schemes may vary greatly in their assump- tions and goals, it is hard to analyze how they interact when the components and subsystems are to be integrated. The integration further increases the complexity of the system. 3. One design solution most likely cannot satisfy the requirements when the system parameters vary significantly and dynamically. In a dynamic system reconfig- uration, the quality Of a new design solution is almost always traded for the speed of the reconfiguration. 4. System performance data needed to decide whether and what kind of dynamic reconfiguration should be performed, has to be collected and analyzed on-line. In integrated complex real-time systems with end-to—end requirements, justified decisions can only be made using global information. 5. To collect the performance data from a distributed, heterogeneous real-time system, a portable distributed instrumentation system (IS) is needed. It has to be designed with low real-time intrusion and flexible performance in mind. 6. Although the performance data can be analyzed completely automatically, it is Often required that a human operator has insight into the system performance and makes reconfiguration decisions. To present system performance informa— tion at a high level, a performance visualization technology is crucial. Problem Statement. The central focus of research presented in this disser- tation has been a unifying, extensible, compiler-based framework for the design and engineering of complex real-time systems. Its novelty is manifest in a comprehensive application of a constraint programming technology in model—based design and integra- tion of complex real-time systems. In addition, the framework lends itself to further extensions, such as a novel integration of on-line performance visualization and dy- namic system reconfiguration. In this broader context, the dissertation research foci are in the areas of distributed system instrumentation and on-line performance visu- alization, aiming at improved testing and deployment support for complex real-time systems. The unifying property of the framework is related to the goal of supporting dis- parate real-time models of system components and their integration. The compiler- based property is aimed at allowing users Of the framework to specify complex real- time systems at a high level, and have a knowledge-based compiler that would handle the complexity behind high-level specifications. 
The extensibility property applies to both the front and back ends, and is achieved through an extensible syntax and add-on modules that cooperate in constructing a model of the whole complex real-time system. The framework supports solving for the system parameters using multiple, controllable problem-solving approaches. It also integrates an on-line performance visualization technology and facilitates creating real-time model- and system-specific performance visualizations. This integration is especially suitable for visualization-driven on-line system reconfiguration.

The general problem and solution of this dissertation should be regarded from a systems engineering point of view. As defined in [55], systems engineering is a discipline that develops and exploits structured, efficient approaches to analysis and design to solve complex engineering problems. Its focus is on methods to solve problems, not the solution of the problems per se.

In the following section, the motivation is discussed for the research in the areas of the design and engineering of complex real-time systems, distributed system instrumentation, and on-line performance visualization, respectively.

1.2 Motivation

In [104], Stankovic concludes that, despite extensive results in the area of real-time scheduling in recent years, the state of the art still provides piecemeal solutions. There are many realistic issues that have not yet been addressed in an integrated and comprehensive manner. New scheduling approaches should be analyzable and comprehensive enough to simultaneously handle multiple characteristics of real-time systems, such as: task preemptiveness, periodicity, importance, and grouping; precedence, placement, and end-to-end timing constraints; communication, resource, and fault-tolerance requirements; tight and loose deadlines; and normal and overload conditions. The solution should be integrated enough to handle the interfaces between CPU scheduling and resource allocation, I/O scheduling and CPU scheduling, CPU scheduling and real-time communication scheduling, local and distributed scheduling, and static scheduling of critical tasks and dynamic scheduling of essential and non-essential tasks.

These issues are too complex to be addressed all at once, and it will probably take years of researchers' work before the first comprehensive and tightly integrated real-time models are devised. However, there are two constituents of a solution to the general problem: one is the real-time theory (including scheduling and formal methods), and the other is systems engineering. Tools are needed that will bridge the gap between theory and practice in an efficient manner.

1.2.1 Design and Engineering of Complex Real-Time Systems

An idea behind this dissertation that has led to the choice of the main parts of the compiler-based framework (the target language and the compiler back-end) is that a complex real-time system could be regarded as a complex, highly non-linear network (or circuit), seemingly analogous to those from, e.g., control theory. (Although the similarity comes from non-linear relations between interconnected components, in particular the way changes in the operation of one element affect the operation of other elements, it is difficult to see whether analogues of the control theory approaches can help with solving real-time computer system problems. Real-time control systems, which are almost the opposite, are common nowadays [14].
There has been theoretical research [110] in applying control theory to proving the stability of self-stabilizing distributed algorithms; however, no real-time properties of algorithms or systems have been analyzed.) In such a network, real-time activities are performed, according to real-time models, across interconnected, bounded hardware resources. Relations among the activities and the resources, within subsystems and among the subsystems, are relatively simple and best termed time and resource constraints, expressed via the equations and inequalities of the real-time models. This common underlying property of real-time models has been a major motivation for adopting a constraint-programming approach to solving complex real-time system problems. To solve such a network means to solve for unknown real-time model parameters and perform the resource allocation. This inevitably requires combinatorial search in general.

A new class of programming languages combining the declarativity of Logic Programming with the efficiency of constraint solving is Constraint Logic Programming (CLP). New application areas, including combinatorial search problems such as scheduling, planning or resource allocation, which were intractable for logic programming, can now be tackled [36]. The declarativity of a Prolog-like CLP language provides an elegant way of expressing the disparate requirements of complex real-time systems in an integrated manner.

Among the research foci proposed by a group of leading experts for real-time systems [103] is the development of requirements and internal documentation which can state real-time requirements precisely and can be used for maintenance and inspection of safety-critical systems. Specifying a complex real-time system means specifying its structure and the relations among individual components and subsystems, accompanied by real-time semantics. This task can be broken into two orthogonal ones. First, the structure (such as the breakdown of computation and communication onto tasks and messages) and basic relations (such as resource allocation) are independent of the heterogeneity present in complex real-time systems that primarily comes from disparate real-time models. Their specification can be supported by a fixed syntax. Second, each real-time model defines certain semantics, and some semantics are attached to the rules of integration of two different real-time models. These rules can be viewed as a variable, model-specific syntax.

The Standard Generalized Markup Language (SGML) [42] is a meta-language and information infrastructure for structured documents and specifications that has been used, among many other applications, for maintaining technical documentation. It
The feedback in the form of collected system performance data can be analyzed on-line or most-mortem, and is necessary for validating the design assumptions. Furthermore, the performance data collected and analyzed on-line can be further fed to a problem-solving tool, in order to find new values of system param- eters that might better handle a current situation imposed onto a deployed system by the environment. It should also be possible to enforce the new parameters onto the running target system. Altogether, a distributed instrumentation system, which can be used for collecting performance data in real time and without significant in- trusion on the target complex real-time system, is necessary. Since it would have an infrastructure to link the real-time system performance data and the problem-solving tool in one way, reusing the infrastructure for steering the target system (i.e., en- forcing the new parameters) is possible. Finally, the heterogeneous and still evolving 9 nature of complex real-time systems prevents making most Of detailed design and implementation decisions of such a distributed IS at once and in advance. Hence, the needed IS should rater be designed and seen as a kernel than a final and complete implementation. Designers and users of parallel and distributed systems have applied a variety of monitoring methods and instrumentation techniques to gather information for testing, debugging, and analyzing performance and optimizing systems [112]. These methods and techniques require deveIOpment of systems for instrumentation data collection, management and analysis, which are themselves distributed systems. A distributed instrumentation system is typically specialized to an application domain and / or com- puting environment. Moreover, the distributed nature Often makes it harder to use and adapt. A significant investment in time and effort may be needed to understand the IS implementation sufficiently to port and / or configure it for another application and / or environment. An off-the-Shelf distributed IS that is robust, portable and flexible would benefit both designers and users of a wide range of parallel and distributed systems. It would allow them to instrument and begin monitoring their system rapidly, and as needed subsequently, to optimize or extend the IS for their environment. High performance and openness of the IS implementation are equally important for its success as general- purpose systems software. Obviously, the requirements of a distributed IS for instrumenting and steering complex real-time systems are broad, and even exceed those for instrumenting con- ventional parallel and distributed systems. A viable approach to the design and 10 implementation of such an IS could be one of a portable and flexible distributed IS kernel, augmented with features that are needed in the complex real-time system domain. One such IS feature, for example, is being able, to a certain degree and with cooperation with the Operating system, to schedule its computation and communica- tion activities in a way that reduces its intrusion, in the real-time sense, on the target system. 1.2.3 On-Line Performance Analysis and Visualization With the advent of computer technology, systems are becoming more and more com- plex and, at the same time, there is more and more computing power available and needed for supporting the systems themselves, in various forms of on-line analyses. 
One large class of the on-line analyses deals with the system performance, trying to summarize and/or explain various performance metrics, such as the efficiency of resource usage or the quality of service provided for higher-level activities. Advanced analyses attempt to diagnose performance problems and suggest actions that may lead to performance improvements. In the domain of complex real-time systems, the on-line analysis places emphasis on different measures of the real-time-liness, both hard and soft, individual and collective ones.

Performance analysis and visualization (PAV) tools are crucial components of an effective development cycle, as well as deployment, of parallel and distributed applications. On-line PAV is even becoming necessary for the latter. Since the amount of performance data to be analyzed and visualized increases with the size of a target parallel/distributed application, on-line PAV itself should be distributed. Heterogeneous systems, in addition, need PAV tools that provide flexible integration and configuration support for heterogeneous performance data. Extant generic and library-specific PAV tools for parallel/distributed systems can cover only low-level performance aspects, provided that the target systems fit into their generic schemes and/or use specific libraries, such as PVM [32] and MPI [31]. A wider range of performance aspects, at multiple levels, global and local, are needed to capture and visually explain the behavior of a heterogeneous system.

As the performance data are gathered from different subsystems and components of a complex real-time system by a distributed IS, they need to be analyzed according to general, but also model-specific, real-time-liness criteria. One example of a model-specific criterion is that in all sequences of n invocations of a periodic activity, at least m (m ≤ n) must meet their deadline [15]. This requires a PAV tool that can be extended to support new analyses. There are also real-time quality-of-service metrics of interest that have more complex semantics than, for example, the just-mentioned m-out-of-n criterion. Such metrics are often used for applications running atop systems that provide generic real-time support. In these cases of vertical real-time integration, automated analyses that are comprehensive and helpful are hard to devise, and it may be necessary to provide visualizations of simpler performance metrics for a human operator to analyze ad hoc instead. Finally, complex real-time systems are supposed to guarantee a degree of aggregate real-time-liness through the integration of their subsystems and components. However, if certain temporary conditions, such as overloads, violate the design assumptions, the guarantee is voided. If the assumed real-time-liness is found to have deteriorated, it may be safer to treat the situation as unpredicted than to analyze it automatically. Again, the human operator needs a potentially large amount of performance information presented in the form of visualizations in order to make a dynamic reconfiguration/recovery decision.

Altogether, on-line PAV tools can greatly benefit the testing and deployment of complex real-time systems. If well integrated with a suitable model-based problem-solving framework, they would extend the latter from a static to a dynamic one, and make it achieve an even higher level of abstraction (from textual specifications to visualizations) provided for the system designers and users.
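As a concrete illustration of the kind of extensible, model-specific check referred to above, the sketch below tests the m-out-of-n deadline criterion over a window of recorded invocation outcomes. It is written in plain Prolog, the substrate of the CLP tools used later in this dissertation; the predicate names and the met/missed encoding are hypothetical and not part of any tool described here.

```prolog
% Sketch (hypothetical names): check that at least M of the n most recent
% invocations of a periodic activity met their deadline. Outcomes is a
% list of the atoms met and missed, as an instrumentation system might
% record them.
m_out_of_n_ok(Outcomes, M) :-
    count_met(Outcomes, Met),
    Met >= M.

% count_met(+Outcomes, -Met): count the invocations that met their deadline.
count_met([], 0).
count_met([met|Rest], Met) :-
    count_met(Rest, MetRest),
    Met is MetRest + 1.
count_met([missed|Rest], Met) :-
    count_met(Rest, Met).
```

For example, the query m_out_of_n_ok([met, missed, met, met, missed], 3) succeeds, while the same window fails the criterion for M = 4.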
1.3 Research Objectives, Activities and Contributions

This section states the objectives of the research presented in the dissertation. The statements are followed by the description of activities undertaken towards meeting the objectives, and a summary of contributions that have resulted from the activities. There are four research objectives:

1. To devise a systems engineering framework particularly suitable for the design and engineering of complex real-time systems. This includes a systems specification language and multiple problem-solving approaches, as well as an automated mapping between them.

2. To investigate performance and portability issues of the instrumentation of parallel and distributed, heterogeneous systems. The performance issues include real-time properties relevant to the instrumentation of complex real-time systems.

3. To discover common properties of extant on-line PAV tools for parallel and distributed, heterogeneous systems, as well as gather desired properties of the on-line PAV viewed as a real-time middleware. Using the findings, to devise a comprehensive PAV framework as a basis for advanced extensions.

4. Finally, to integrate the systems engineering and on-line PAV frameworks in a way that facilitates on-line system steering/reconfiguration of complex real-time systems. The integrated approach should be semi-automated, and allow algorithmic and visual analyses of instrumentation data received from the target system to utilize problem-solving tools.

The activities in pursuit of the above objectives were carried out in the following order, with certain overlaps:

1. The first version of an object-oriented, on-line PAV software was designed and implemented.

2. The CLP technology was chosen as appropriate and promising for solving design and engineering problems of complex real-time systems.

3. A simple distributed instrumentation system was designed and developed.

4. Several real-time models and small test systems were used for determining the power of a publicly available CLP tool. The CLP code was either written manually or generated by utility tools.

5. The distributed IS was made more portable, augmented with performance tuning options and reconceived as an extensible IS kernel.

6. The first version of a complex real-time system specification language and its translator to CLP were designed and implemented.

7. The on-line PAV software was vertically extended into a component-based technology, and supported by a high-level language and compiler.

8. A new version of the systems specification language was introduced together with an extensible compiler. The first complex real-time system was modeled.

9. The distributed IS kernel was evaluated, and then extended to provide instrumentation control and steering.

10. The on-line PAV technology was ported, in addition to the Unix/X11 domain, to the Java/WWW domain.

11. A real-time test-bed and client-server system were prepared for evaluating the three constituents (the systems engineering framework, the on-line PAV technology, and the distributed IS kernel) of the integrated approach together.

12. The compiler for the complex real-time system specification language was rewritten and extended to generate system-specific code that is input to the on-line PAV technology.
This research has made four notable contributions to the state of the art in the design and engineering of complex real-time systems, and the instrumentation and on-line performance analysis and visualization of a wider range of parallel and distributed, heterogeneous systems:

1. A compiler-based approach to design and engineering of complex real-time systems. Based on a comprehensive usage of the CLP technology, this contribution addresses the first objective and the problem statement by (1) enabling high-level specifications of complex real-time systems that place emphasis on the system structure and the real-time models involved; (2) automated handling of model-specific details, including the integration of different real-time models; and (3) providing multiple problem-solving approaches, such as finding optimal solutions and repairing partially correct solutions.

2. A portable and flexible distributed instrumentation system kernel. This contribution addresses the second objective and part of the problem statement by providing (1) a flexible and robust tool for instrumenting complex real-time systems; (2) a base for developing elaborate, domain-specific distributed ISes; and (3) a test-bed for experimenting with IS management policies and IS-specific distributed algorithms.

3. An on-line performance visualization technology. Centered around an object-oriented, portable and distributable, adaptive, system- and application-specific on-line performance analysis and visualization framework that supports rapid prototyping, this contribution addresses the third objective and an important part of the problem statement related to the dynamic reconfiguration/steering of complex real-time systems. The technology extends the core framework with (1) a PAV-specific architecture and a very high-level language designed atop of it; and (2) a component-based approach to the development of PAV tools; which together form (3) a base for automated and advanced PAV systems, such as the last contribution in this list.

4. An integrated approach to real-time system design and on-line PAV with steering. This contribution extends the compiler from the first contribution with support for generating code that is treated as input to the PAV technology of the third contribution; it also utilizes the distributed IS kernel from the second contribution to instrument and steer complex real-time systems. The fourth objective and problem statement are addressed by a novel, integrated technology that tightly links a target complex real-time system, its specification, an appropriate PAV, and PAV-driven and model-based dynamic reconfiguration; and does so in a mainly automated manner.

Overall, all the contributions were supported with a proof of concept. Whenever the time and technical conditions permitted, performance evaluations were carried out additionally to support the contributions.

1.4 Organization

Chapter 2 discusses background and related work. Several approaches to the design of distributed real-time systems are described first. Constraint Logic Programming is introduced, and several works that use it as support for the specification and verification of real-time systems are described next. They are followed by related work on the checking of explicit timing constraints. Issues in distributed instrumentation are discussed, and an overview of several distributed ISes is presented. The chapter ends with a discussion on on-line performance analysis and visualization (PAV).
Chapter 3 presents the compiler-based approach to design and engineering of complex real-time systems, the central focus of the dissertation. First, the Real-Time System Markup Language (RTSML) is described. An example complex real-time system is presented next, followed by the process of compiling its RTSML specification to CLP. Salient characteristics of the compiler back-end and the descriptions of two real-time model-specific modules are presented. Results of experiments with the example complex real-time system are analyzed.

Chapter 4 presents a portable and flexible distributed instrumentation system kernel called BRISK. Along with the description, approaches that provided IS performance gains are discussed. The objectives of BRISK, the approaches taken in its design and implementation, the architecture and implementation details, and the results of evaluating its performance and scalability are presented.

Chapter 5 presents a software technology for on-line performance analysis and visualization of parallel and distributed, heterogeneous systems. A visual object framework is described, followed by an example of its successful use for PAV of a distributed multimedia real-time application. A PAV architecture for the visual objects, a markup language based on it called VOML, and the development environment are presented next. The chapter concludes with an example of a VOML specification and corresponding performance visualizations.

Chapter 6 presents an integrated approach to the design and engineering of complex real-time systems, their instrumentation, on-line PAV, and dynamic system reconfiguration. Figure 1.1 illustrates this approach; it also serves as a reference by showing the relations between different parts of the work presented in this dissertation. A real-world target distributed real-time system is described, followed by a description of its design and engineering performed using the approach described in Chapter 3. Technical details of the integration are explained next, and the operation of the new, integrated technology is shown on an example execution scenario.

Chapter 7 concludes the dissertation with a summary of the major contributions and future directions in areas that include complex real-time system problem-solving optimizations, extensions of the distributed IS kernel, advanced on-line PAV tools, and different integrations of the systems engineering and on-line PAV frameworks.

[Figure 1.1: Overview of the integrated approach. The diagram shows RT models and an RTSML specification feeding the RTSML compiler, which produces a CLP program (with conventional and repair-based search) and a VOML specification for the VOML compiler; a Visual Object prototype with a steering extension links, off-line and on-line, the target RT system, CLP-based dynamic reconfiguration, instrumentation and steering, and the human operator.]

Chapter 2

Background and Related Work

This chapter discusses background and previous work related to the dissertation research presented in the following chapters. The first section covers several approaches to the design of distributed real-time systems. The second section provides background in Constraint Logic Programming (CLP) and presents several works that use it as a tool in the area of real-time systems. The third section overviews other works in the area of real-time systems that deal with explicit timing constraints (as opposed to implicit ones, such as those contained in the constraint formulae from real-time scheduling theory models).
The fourth section describes several distributed instrumentation systems and their relation to BRISK. Finally, the fifth section describes related work in the area of system performance visualization, including a few more comprehensive, integrated approaches.

2.1 Distributed Real-Time System Design

As was stated in Section 1.2, the state of the art in the area of real-time scheduling still does not provide integrated and comprehensive solutions for realistic scenarios. Among the many fragmented design and engineering approaches for parallel and distributed real-time systems, some concentrate on integration and decomposition. More comprehensive approaches usually propose a search algorithm for solving for system parameters, such as branch-and-bound, simulated annealing, or even greedy search. For specific classes of distributed real-time systems, an approach may include run-time support as well. The descriptions of some related approaches are presented below, ordered approximately from simpler to more complex ones.

In [102], Spuri and Stankovic address the problem of the integration of task precedence constraints with resource sharing in real-time scheduling. Their motivation is to give more freedom to the scheduler so that more dynamic real-time systems can be supported. They derive analytical task schedulability formulae that can be applied in more real-time system situations than those previously developed.

An engineering technique for decomposing end-to-end delays in distributed real-time systems is proposed by Saksena and Hong [97]. In effect, a global distributed scheduling problem is transformed into a set of single-processor scheduling problems with local deadlines. The problem-solving approach consists of an approximate technique to quickly generate an initial solution, and an iterative method to fine-tune the initial solution.
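The flavor of such a decomposition can be stated directly as a small constraint program. The sketch below, in the ECLiPSe-style finite-domain notation introduced later in Section 2.2.3, is not Saksena and Hong's technique; it only illustrates, under assumed library names (lib(fd), #>=, #=<, labeling/1), how local deadlines that must cover local execution demands while respecting an end-to-end bound can be posed declaratively and solved by search.

```prolog
:- lib(fd).  % assumed ECLiPSe finite-domain library

% decompose(+EndToEnd, +WCETs, -Deadlines): split an end-to-end deadline
% into one local deadline per processing stage. Each local deadline must
% cover the stage's worst-case execution time, and together the local
% deadlines must not exceed the end-to-end bound.
decompose(EndToEnd, WCETs, Deadlines) :-
    length(WCETs, K),
    length(Deadlines, K),
    Deadlines :: 1..EndToEnd,        % candidate local deadlines
    lower_bounds(WCETs, Deadlines),  % each local deadline >= local WCET
    sum_expr(Deadlines, Sum),
    Sum #=< EndToEnd,                % decomposition respects the bound
    labeling(Deadlines).             % search for a concrete assignment

lower_bounds([], []).
lower_bounds([C|Cs], [D|Ds]) :-
    D #>= C,
    lower_bounds(Cs, Ds).

% Build the linear expression D1 + D2 + ... + Dk over a non-empty list.
sum_expr([D], D).
sum_expr([D, E|Ds], D + Rest) :-
    sum_expr([E|Ds], Rest).
```

A query such as decompose(100, [20, 30, 10], Ds) yields one feasible split of the end-to-end deadline 100 over three stages, with alternatives available on backtracking; a deliberate slack-distribution heuristic, in the spirit of the approach just described, would replace the blind labeling step.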
It defines abstrac- tions for representing real-time applications and capturing the fundamental prop- erties of distributed pipelining; a flow control mechanism; a decomposition of the multi-resource scheduling problem into a set of single resource scheduling problem- s with well-defined interactions; an analysis methodology to support heterogeneous scheduling policies among system resources; and a delineation of how manipulating 22 system configuration parameters affect various application timing metrics. It does not specify any particular mapping or Optimization technique. In [84], Mutka and Li describe a tool that finds feasible processor allocations for sets of rate-monotonically scheduled (RMS) tasks over a set of heterogeneous processors. It uses three different RMS tests, considers possible task blocking due to priority inversion, allows task co—allocation, and can perform task transformations if needed. Communication among the tasks is not considered. The problem solving approach is based on a branch-and-bound algorithm. An optimal solution to the problem of allocating communicating periodic tasks to heterogeneous processing nodes in a distributed real—time system by Peng et a1. is presented in [87]. The maximum normalized task response time is minimized subject tO the precedence constraints resulting from intercommunication among the tasks to be allocated. The task system is modeled with a task graph in which computation and communication modules, communication delays, and intertask precedence constraints are described. These tasks are assigned to processing nodes by using a branch-and- bound search algorithm. In [66], Kang et al. describe a distributed real-time system model with statistical, end-to-end constraints. It exploits both discrete-time Markovian analysis and real- time scheduling theory, and uses several approximations to avoid modeling the entire system. A system is modeled as a set of chains, where each chain is a distributed pipeline of tasks, and a task can represent any activity requiring non-zero load from a processor or network resource. Every chain has two end-tO-end constraints: delay and minimum allowable success rate for outputs that meet their delay constraints. The 23 search algorithm uses two heuristics, which help in significantly reducing the number of potential feasible solutions but, at the same time, can miss some and report a failure. PERTS [101] is a commercial prototyping environment based on the rate mono- tonic analysis (RMA), initially developed at the University of Illinois at Urbana— Champaign. It integrates multiple analysis tools allow a real-time system designer to test and evaluate the system against various design scenarios. Several real-time scheduling algorithms and protocols are supported, and end-to-end analysis for single— and multiple-node architectures is provided. It interfaces to Real-Time CORBA [27], ObjecTime [74] and a few more real-time software technologies. An approach due to Welch et al. for engineering time-constrained systems which must operate in dynamic environments (with potentially unknown worst-case scenar- ios and large, unpredictable variances in system parameters) is presented in [120]. A specification language was developed that enables the description of environment- dependent features. An abstract model constructed from the specifications is aug- mented dynamically with the state of environment-dependent features. 
It is also used to define techniques for quality-of-service monitoring and diagnosis, and allocation analysis. A prototype resource-management middleware was developed to experimentally evaluate the approach.

In the light of an overall approach that would be able to handle all the issues brought out by Stankovic and listed in Section 1.2, the approach presented in this dissertation attempts to address the need for a comprehensive computer-aided design and engineering framework for complex real-time systems. However, it fundamentally differs from the one that implicitly follows from the above-mentioned motivation in [102] by the same author. The latter is based on the need for a single real-time scheduling approach that would address as many realistic scenarios at once as possible. The approach presented in this dissertation aims to integrate and compose extant real-time scheduling approaches, which address only specific scenarios, into a complex, more realistic one. Intuitively, a single complex scheduling approach might not scale well in complex real-time systems, which are parallel/distributed systems that should scale with respect to the system real-time-liness. (It is easy to see that for a single resource, such as a CPU, the addition of support for more realistic scenarios in real-time systems, such as fixed priorities for activities and/or resource sharing among activities, strengthens the schedulability conditions, resulting in a lower resource utilization.) Such an approach might overly underutilize the resources, leaning toward extensive resource sharing and tight integration. In other words, an inherent property of loosely-coupled systems, such as distributed computer systems, is that the operating time scale is larger than that of tightly-coupled ones, such as symmetric multiprocessors. This is rather relevant to realistic real-time systems, in which the operating time scale affects the real-time-liness. (As a side note, anticipating as many realistic scenarios at once as possible is similar to analyzing worst cases in the hard real-time scheduling theory. For a reference, Jensen in [63] argues that hard real-time scheduling does not scale.) Another important difference between the two approaches is that Stankovic's is more scheduling-theoretic, while the one presented in this dissertation is a systems engineering approach. The former attempts to devise real-time scheduling models that would support more realistic scenarios without strengthening feasibility conditions. The latter attempts to build realistic real-time systems by placing an emphasis on the use and integration of appropriate extant real-time models.

2.2 Constraint Logic Programming and Real-Time Systems

Constraint Logic Programming tools have matured over the last decade, and solvers exist for a wide range of problem domains. For example, solvers for combinatorial problems over finite domains and sets, systems of equations and inequalities involving rational and real numbers, and solvers for systems of non-linear equations over real numbers can be integrated within a CLP tool by means of a Logic Programming language, usually extended Prolog. CLP has been used in a variety of applications, such as scheduling, resource allocation, timetabling, financial planning, frequency assignment for cellular phones, etc. [116]. Researchers in various areas of engineering have used CLP for solving practical industrial problems. Constraint Logic Programming has been used in some areas of computer-aided engineering in the recent past (e.g., in mechanical engineering [69, 107]; for VLSI design of electrical circuits [13]; in manufacturing [124]; in computer system performance analysis [75]). In the context of Artificial Intelligence, a more general framework of Constraint Programming has been used in reactive systems [35] and electro-mechanical machines [114].
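To give a minimal taste of this style before the formal treatment in the next subsections, the fragment below states a toy scheduling problem declaratively; it follows the ECLiPSe finite-domain notation described in Section 2.2.3, and the job parameters and predicate name are invented for the example.

```prolog
:- lib(fd).  % assumed ECLiPSe finite-domain library

% Two jobs sharing one processor: job 1 (WCET 20) must finish before
% job 2 (WCET 30) starts, and job 2 must complete by the deadline 100.
starts(S1, S2) :-
    [S1, S2] :: 0..100,   % candidate start times
    S2 #>= S1 + 20,       % job 1 precedes job 2
    S2 + 30 #=< 100,      % job 2 meets the deadline
    labeling([S1, S2]).   % enumerate consistent start times
```

Merely stating the two inequalities already prunes the domains of S1 and S2 by propagation; labeling then enumerates concrete schedules, S1 = 0, S2 = 20 being the first.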
Constraint Logic Programming has been used in some areas of computer-aided engineering (e.g., in mechanical en- gineering [69, 107]; for VLSI design of electrical circuits [13]; in manufacturing [124]; in computer system performance analysis [75]) in the recent past. In the context of Artificial Intelligence, a more general framework of Constraint Programming has been used in reactive systems [35] and electro—mechanical machines [114]. 26 Before more related work, which uses CLP in the area of real-time systems, is presented in Section 2.2.2, CLP background is given in Section 2.2.1. Section 2.2.3 describes ECLiPSe, the CLP tool used in the work presented in this dissertation. 2.2.1 Constraint Logic Programming Background The insight which led to the design of the CLP framework is the observation that the algorithm Of unification used in Logic Programming (LP) is a constraint solving algorithm and as such it could be combined with, or replaced by, various other con- straint solving algorithms. In other words, LP Offers the means to create a single and powerful framework for various cooperating constraint solving algorithms [116]. More formally, CLP is a many-sorted generalization of LP, in which different sorts are associated with different interpretation domains, and corresponding formulae are manipulated using predefined constraint solvers. Special classes Of formulae, called constraints, are not handled using traditional resolution, but are interpreted under a predefined specific interpretation and handled by external constraint solvers. Basic definitions of the formal CLP framework presented in [71, 56] are given below. A CLP language is built upon a set 23 Of function and constraint predicate symbols, called signature. Primitive constraint predicates (including the equality symbol =) belong to E and are interpreted with respect to a predefined interpretation structure, called E-structure, while user-defined constraint predicates are subject to the user definitions. 27 A term is an Object created using function symbols from 2 and a collection of variables. It can be either a simple variable or an application f(t1, . . . , tn) of an n- ary function symbol f E E to n terms t1, . . . , tn (n 2 0). An atom is an application p(t1, . . . , t,,) of a constraint predicate symbol p to n terms t1, . . . , tn. pr is a primitive constraint predicate, then the atom is called a primitive constraint. Every constraint is built from primitive constraints. A program is composed of a collection of clauses, where each clause has the form: headz—c [ b1,...,b,c where head and b,(i : 1, . . . , k) are user-defined atoms, while c is an arbitrary con- junction Of constraints. The symbol | is used in the body of the clause to separate the constraint part from the goals and can be read as “and,” in the same way as a comma in the body of the clause. A E-structure D consists of a set D and an interpretation function I D. A con- straint c is solvable if D [2 3(c), where the notation 3(gb) denotes the existential closure of the formula 4) (i.e., each variable in (b is within the scope of an existential quantifier). A Z-theory is a collection of closed Z-formulae (i.e., formulae built over 2). A solution 0 for c is a mapping from the variables in c to D, such that D [2 CH. The execution of a constraint program requires the use of constraint solvers ca- pable of deciding the solvability of each possible constraint formula. Resolution is extended in order to embed calls to constraint solvers. If ? — C] | g1, . . . 
, gn is a goal, and p :- c2 | b1, ..., bk is a clause in the program, then the resolvent of the goal with respect to the given clause is

?- (c1, c2, g1 = p) | b1, ..., bk, g2, ..., gn

as long as D ⊨ (c1 ∧ c2 ∧ (g1 = p)). The symbol = here is an abbreviation for the conjunction of equations between corresponding arguments of g1 and p, if g1 and p have the same predicate symbol. The constraint solver is used to test the validity of the condition on the constraints.

The idea behind the introduction of the CLP framework is that a logic-based programming language, its declarative and operational semantics, and the relationships between these semantics can be parameterized by a choice of the domain of computation and constraints. The resulting scheme defines the class of languages CLP(X) obtained by instantiating the parameter X [116]. The parameter X stands for a 4-tuple (Σ, D, L, T), where Σ is a signature, D is a Σ-structure, L is a collection of Σ-formulae and T is a first-order Σ-theory. Σ determines the predefined predicate and function symbols with their arities, D is the structure over which computation is to be performed, L is the class of constraints which can be expressed, and T is an axiomatization of some properties of D. The pair (D, L) is called a constraint domain. One such domain, FD, will be described in Section 2.2.3.

2.2.2 Use of CLP as Support for Real-Time Systems

CLP and its more general version, Constraint Programming, have successfully been used as tools for solving specific problems in the area of real-time systems, as well as in other areas with certain real-time aspects, ranging from temporal reasoning and scheduling to formal methods, to resource allocation. This fact supports the motivation, given in Section 1.2.1, for using CLP as an elegant way of expressing the disparate requirements of real-time systems.

A family of logics and associated programming languages for representing and reasoning about time is introduced in [37]. The family is conceptually simple while allowing for different models of time. Formulae can be labeled with temporal information using annotations. Unlike temporal logic [88], both qualitative and quantitative (metric) temporal reasoning about definite and indefinite information with time points and time intervals in different models of time is supported. The introduced temporal annotated logic can be made an instance of annotated constraint logic, and there is a systematic way to make a clausal fragment executable as a CLP program.

A new class of application domains for Constraint Programming is introduced in [98], due to the emergence of special real-time systems enjoying increasing popularity in the areas of automotive electronics and the aerospace industry. Real-time systems of this kind are time-triggered in the sense that their overall behavior is globally controlled by a recurring clock tick. An off-line scheduling approach maps infinite, periodic processing onto a single time window of a fixed length. The authors also
The method has been used successfully to prove automatically the correctness of a sophisticated reactive program that controls dataflow rates on ATM networks.

A CLP-based framework for the specification and verification of real-time systems, based on the notion of timed automata, is developed in [46]. A user models the ordering of real-time events as the grammar of a language accepted by a timed automaton, and real-time constraints on these events are then captured as denotations of the grammar productions specified by the user (i.e., the valuation function of the associated denotational semantics maps into the time domain). The resulting specification is a CLP program that is executable. Many interesting properties of the real-time system can be verified by posing appropriate queries to this CLP program. The approach is also constructive, in the sense that conditions can be computed under which a property will hold for a given real-time system.

In [45], the author applies CLP to a problem of global optimization of DSP application mapping onto parallel architectures. The problem is characterized by numerous resources (number of processors, bandwidth, memory size), real-time constraints (latency, sampling) and many non-linear constraints. The author also notes the capacity of CLP to compose several concurrent system models. Certain aspects of the problem are presented in [3]. It is shown how it is possible to handle and solve three system models at once, under architectural and real-time constraints: a data-partitioning model equivalent to that supported by HP Fortran; a fine-grained (macro-instruction level) scheduling model; and a capacitive, distributed shared memory model.

2.2.3 Overview of ECLiPSe

The ECLiPSe platform integrates a number of constraint solvers, including ones for solving sets of constraints over finite domains (CLP(FD)), real and rational numbers (CLP(R,Q) [52]), and intervals over real numbers (RIA) [53]. A special library called REPAIR [96] allows the user to start with a tentative solution, which can be modified, or repaired, if it turns out to be inconsistent with the constraint set.

The finite domain support consists of three libraries: one for symbolic finite domains; one for handling integer variables and numerical constraints on these variables; and one with built-in complex constraints. The second library is the major one, propagating equations and inequalities between linear expressions. A linear numeric expression is one that can be written in the form Term_1 + Term_2 + ... + Term_n, where each term can, in turn, be written as Number or Number · Variable. According to the definitions in Section 2.2.1, the set D in this case is the set Z of integers, and finite domain constraints are existential positive formulae built up from the linear expressions and the five predicates =, >, ≥, < and ≤, interpreted in Z, together with infinitely many membership predicates ∈ [a, b], one for each finite interval [a, b] of Z. The primitive constraint x ∈ [a, b] is interpreted by the empty set when b < a; similarly, x = a and x ∈ [a, a] are identified. Solving finite domain constraints is both NP-complete and very important in practice, which has favored the use of a practically efficient technique called constraint propagation.
It consists of a set of transformation rules such that each primitive ordering or equality constraint c between variables x1, ..., xn, whose domains are defined by membership constraints xi ∈ [ai, bi], induces new restrictions on these domains, resulting in new membership constraints. The reduced domain RD(xj, c) of the variable xj for c is the smallest interval [α, β] containing all m ∈ Z such that the constraint obtained by substituting m for xj in c ∧ (∧_{i≠j} xi ∈ [ai, bi]) is satisfiable in FD. For example, with x ∈ [0, 10] and y ∈ [0, 10], the constraint x + 3 ≤ y reduces the domain of x to [0, 7] and the domain of y to [3, 10].

In practice, a typical CLP(FD) program consists of three sections. First, the domains of the program variables are declared (the notation is slightly different from the above; a comma is used to separate intervals, and intervals are specified using ..). For example, the variable declaration

    Task_0_TimeAt1MIPS :: [0, 1000..2000]

could be interpreted as task 0 being either inactive (using no CPU time) or active such that it may require between 1,000 and 2,000 milliseconds of CPU time on a unit-capacity CPU. (Some program variables, a.k.a. meta-terms, may be assigned attributes other than finite domains, and multiple different, cooperating constraint solvers may be involved.) Second, constraints are stated that are used to build a constraint network at run time. As opposed to the generate-then-test approach to searching used in logic programming, CLP uses a much more efficient constrain-then-generate approach. The last program section defines the order in which the program variables will be assigned values consistent with the constraints, and the order in which those values will be tried; this is called labeling. Sometimes, stating the constraints is enough to reduce the domains of the program variables to single values and obtain solutions, if any exist. In other situations, the CLP system cannot decide the problem with this information alone, i.e., it may return domains of the program variables' values for which there might be solutions. As the program variables are assigned values in a user-specified order, their domains are further reduced by constraint propagation. If the CLP system detects that the constraints cannot be satisfied with the current value assignments, backtracking is performed and the most recent assignments are undone. Other CLP solvers may handle their constraints more or less differently than CLP(FD).
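The three sections can be illustrated with the following self-contained sketch of a toy two-task problem (hypothetical predicate and variable names; the syntax is that of the ECLiPSe finite-domain library):

    :- lib(fd).

    % A minimal example of the constrain-then-generate style described
    % above: declare domains, state constraints, then label.
    toy_allocation(P0, P1, Proc0, Proc1) :-
        [P0, P1] :: 50..80,            % task periods [ms]
        [Proc0, Proc1] :: 0..1,        % processor indices
        P0 #< P1,                      % task 0 gets the shorter period
        Proc0 #\= Proc1,               % tasks use different processors
        labeling([P0, P1, Proc0, Proc1]).

Stating P0 #< P1 alone already shrinks the domains to P0 ∈ 50..79 and P1 ∈ 51..80 by propagation, before any labeling takes place.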
2.3 Explicit Timing Constraint Checking

In the area of specification and verification of real-time systems, explicit timing constraints have been a research focus in recent years. They have been studied in two main contexts. The first is formal methods, where theories based on timing constraints have been developed, independently of real-time scheduling theory, for expressing and statically checking temporal properties of real-time systems. The other context is real-time languages and systems, where the timing constraints are checked dynamically at run time.

Major work in the context of formal methods has been done by A. Mok, F. Jahanian and colleagues. Their work started with Real-Time Logic (RTL) [59], a formal language for the specification of real-time systems. RTL formulae are constructed using addition and subtraction of event occurrence functions (which map to the time domain) with integers, (in)equality predicates, universal and existential quantifiers, and logic connectives. The semantics of RTL is based on the occurrence of events (using absolute timing, not only ordering) that arise from the execution of a real-time system, such as an event coming from the environment, the start and end of code blocks, or the assignment of values to a state variable.

Checking whether a set of timing constraints in RTL is satisfied is generally undecidable, although certain classes of RTL formulae are easier to check [57, 106]. Algorithms were developed for checking safety properties [60] and partial event traces [57]. The research was later directed toward the other context. A distributed on-line monitoring and checking tool, described in [24] and [58], allows for specifying timing properties in a subset of RTL and checking for their violation. The work describes how to store events coming from the system, how to define timing constraints based on these events, and how to evaluate the constraints in a distributed environment. A graph-based algorithm was developed that evaluates these RTL formulae whenever a new event arrives or when a timeout expires (at the earliest possible time).

In the context of dynamic timing constraint checking, a real-time system is equipped with means for detecting and/or preventing timing constraint violations. When a timing constraint violation is detected, the system tries to adapt by, for example, activating stand-by resources, rescheduling the remaining resources, or executing alternative algorithms for solving the problem under different conditions.

A methodology for specifying and checking timing constraints in a distributed object-oriented environment is presented in [40]. The focus there is on how object orientation can be utilized to simplify the specification and checking of timing constraints, and how this can be integrated into an existing programming language like C++. The methodology integrates a precompiler that generates instrumentation code, a constraint checker based on RTL-like timing constraints specified for each class, and a distributed instrumentation system. The functional and timing specifications are semantically separated. Notifications by constraint checkers, in the form of events, are used as feedback from the target system.

Another approach to dynamic timing constraint checking in distributed object-oriented systems is presented in [93]. Like the previous approach, it separates real-time requirements from individual commands in a program. The real-time requirements are timing constraints described by declarative synchronization code between the interfaces of objects, expressing common, message-based temporal coordination patterns. Objects in the system are based on the Actor model [1], and a high-level programming language construct called RTsynchronizer is defined that specifies a collection of temporal constraints between actors. The run-time system is able to dynamically enforce timing constraints, based on the principles of safe progress and unsafe wait, by, for example, delaying messages. A group of actors may be constrained by overlapping RTsynchronizers, which may be dynamically added or removed.

In [85], the work has been extended and the actors and synchronizers have been given formal operational semantics. The actor semantics interprets actor primitives as two-phase transitions between configurations, which are modeled by ordered pairs (α, μ), where α represents the actor states and μ is the set of pending messages. As for the synchronizers, a constraint configuration is similarly defined as an ordered pair (χ, ξ), where χ is a multi-set of demands for message invocations corresponding to the constraint ξ; transitions are determined by constraint firings, whenever a message invocation matches a pattern in a constraint.
A collection of synchronizers is termed an interaction constraint system configuration. Finally, the two operational semantics are composed into one. The overall model is yet to be implemented, and the authors have identified three main tasks: a (so-called constraint-directed) scheduling strategy that fits the synchronizer approach; constraint propagation by the compiler and run-time system; and the distribution of synchronizer entities.

A number of other formal methods for real-time systems treat explicit timing constraints using various time models. For static checking, the CLP approach described in this dissertation could support some of the formal methods on the grounds of common underlying frameworks such as logic and Presburger (linear) arithmetic. An interesting issue is that of integrating these models with models from real-time scheduling theory within the compiler-based framework.

While the behavior of a real-time system based on static analysis becomes undefined as soon as some of its design assumptions are violated, it is difficult to provide any guarantees for a real-time system equipped only with dynamic timing constraint checking. The approach to dynamic system reconfiguration described in this dissertation attempts to combine static analysis and constraint checking with on-line system performance analysis. Similarly to, for example, common analysis of missed task deadlines, the on-line performance analysis may include dynamic timing constraint checking as described above. Namely, parts of real-time models that allow for static analysis are evaluated in advance by the CLP tool to derive unknown system parameters. Run-time system performance degradations and constraint violations are then fed back to the CLP tool, which tries to solve for new, better values of the system parameters by taking into consideration all the static requirements that result in certain guarantees. While a run-time system for dynamic timing constraint checking can prevent some constraint violations and quickly detect unpreventable ones, it itself lacks the complete information needed to steer the target system toward a provably better configuration.

2.4 Distributed Instrumentation Systems

Many distributed ISes have been developed over the past decade (e.g., [30, 80, 47]), usually as components of larger software toolkits for the analysis of parallel/distributed systems. Only a few have been ported to multiple platforms and made available for broader usage. Several such systems are briefly discussed here. The discussion of how BRISK could be used in the context of the objectives and features of these distributed ISes is based on inference only, and their developers may have different recommendations.

The Automated Instrumentation and Monitoring System (AIMS) [123] contains xinstrument, a source-code instrumentor for Fortran77 and C, and monitor, a library of time-stamping and trace-collection routines that generates trace files. Tools for off-line processing of trace files include a utility for removing monitoring overhead and maintaining consistency of causally-related events, and a trace file animation and analysis toolkit. If xinstrument were modified to support BRISK, BRISK could replace the rest of the IS.

The Pablo [91] software instrumentation system contains an instrumenting parser with a GUI, and a performance data capture library for generating trace files in the Self-Describing Data Format (SDDF) [4]. The instrumentation software supports tracing, interval timing, and counting.
The instrumentation library monitors the instrumentation overhead and the volume of data, and can automatically adjust the intrusion on the target application by changing the monitoring method. The basic implementation of BRISK could be extended to support SDDF and automatic intrusion adjustment.

Paradyn [79] takes an approach called dynamic instrumentation for dynamically controlling the performance data to be collected. The design of the tool is based on two data abstractions: metric-focus and time-histogram. The dynamic instrumentation environment includes a compiler and code generation, structural analysis of the binary, and an instrumentation manager that allows code to be inserted into and removed from the running program. It also incorporates a strategy for describing performance information to users of high-level parallel languages. This is an example of a comprehensive, specialized distributed IS, albeit one ported to multiple platforms, which probably would not benefit fundamentally from BRISK as its kernel.

Falcon [43], a monitoring and steering system, uses a low-level sensor specification language and mechanisms for on-line capture and analysis of application-specific information about large-scale parallel programs. It includes a semi-portable binary I/O facility and a tool for imposing a partial order on an unordered event stream. The latest BRISK implementation provides generic capabilities that can be used to emulate those of the core Falcon system (i.e., not including auxiliary tools).

The JEWEL distributed measurement system [70] uses low-cost software sensors in the form of cpp macros, assumes synchronized hardware clocks, and has a data collection and reduction infrastructure based on the External Data Representation (XDR) protocol [54]. It is integrated with a configurable graphical presentation facility providing a set of single-metric views, and an interactive experiment control system. JEWEL components are based on a custom configuration language and data and control protocols. A distributed IS with JEWEL's features and a slightly different architecture could be built on top of an extended BRISK.

Vista [117], a C++ framework for the development of domain-specific distributed ISes, is based on a generic distributed IS model that defines three components of a distributed IS: (1) the local instrumentation server (LIS), (2) the IS manager (ISM), and (3) the transfer protocol (TP). In the interests of flexibility, this simple model has been adopted for the BRISK design. For performance reasons, the BRISK LIS implementation is based on JEWEL's internal and generic external sensors.

In Cristian's distributed (centrally-controlled, master-slave) clock synchronization algorithm [29], a master node polls slave nodes in so-called rounds. In each round, the master queries each slave for its current time and waits for an answer; when the answer is returned, the master computes the time difference between the pair. This querying is repeated a number of times for each slave, to average the results. At the end of each round, the master sends the time differences to the slaves so they can adjust their clocks. In effect, the master keeps the slaves' clocks synchronized relative to each other (a.k.a. internal synchronization) by maintaining their values within a maximum deviation from its own clock value (a.k.a. external synchronization).
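(In a standard formulation of the per-query estimate, reconstructed here under the assumption of roughly symmetric network delays, with [29] giving a more careful probabilistic treatment: if the master sends a query at local time t0 and receives the slave's reported clock value Ts at local time t1, the slave's offset relative to the master is estimated as

    θ ≈ Ts + (t1 − t0)/2 − t1,

i.e., the reported value plus half the measured round-trip time, compared against the master's clock at reception.)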
Furthermore, the algorithm discards measurements whose round-trip time is longer than a bound calculated from the assumed clock drift, the shortest observed round trip, and the achievable clock error. It also adjusts the number of queries in a round based on the observed probability of a successful query. A variant of this algorithm has been used in a distributed IS called DTM, described in [47].

Kimelman and Zernik in [67] present a technique for on-the-fly ordering and matching of causally-related event data records produced by a number of distinct processors engaged in multiple one-to-many and many-to-one communications. The technique is optimal in terms of the amount of space required, and in terms of the amount of additional delay incurred prior to delivery of an event data record to its ultimate destination. The time-stamps of out-of-order causally-related event data records are used, instead of a distributed clock synchronization algorithm, to estimate the clock drifts of the processors, and these estimates are used to correct the time-stamps of the preprocessed event data records.

The primary source of software-based instrumentation intrusion is the execution of additional instructions, while side-effects include changes in memory reference patterns, event reordering and even CPU stalls. Malony presents in [76] two time-based intrusion and perturbation analyses of software-based tracing on a multiprocessor. (In [76], the term perturbation is used for any change in event timing due to instrumentation, not only critical changes.) Based on two derived performance perturbation models, trace event times are adjusted to (1) remove delays due to the measured costs of instrumentation, and (2) reorder the event sequence based on knowledge of event dependencies, maintaining causality. Experimental results show that it is possible to characterize perturbations through simple models and recover the actual execution timing with up to 20% error.

It is clear that extant distributed ISes have different goals in different domains, but share many similar needs and characteristics. The related work discussed in this section supports the motivation from Section 1.2.2 for the development of BRISK as a distributed IS kernel, with an emphasis on features that are useful for instrumenting complex real-time systems.

2.5 Performance Analysis and Visualization

In this section, several related PAV tools, systems and environments are described. PAV environments are acquiring features for incorporating new analysis and display modules. Visualization environments are not only becoming extensible, but also retargetable to different analysis scenarios.

ParaGraph [49] is a graphical display tool developed by Michael T. Heath for visualizing the behavior and performance of parallel programs that use PICL (Portable Instrumented Communication Library) [122] or MPI (Message-Passing Interface) [31]. The visual animation of a parallel program is based on execution trace information gathered during an actual run of the program on a message-passing parallel computer system. The resulting trace data are replayed pictorially to provide a dynamic depiction of the behavior of the parallel program, as well as graphical summaries of its overall performance. The same performance data can be viewed from many different visual perspectives, to gain insights that might be missed by any single view.
The necessary execution trace data are produced by a tool called MPICL, developed by Pat Worley of Oak Ridge National Laboratory, which uses the profiling interface of MPI to provide timestamped records of MPI events.

Pablo took PAV environment research one step further by incorporating support for performance analysis environment prototyping [92]. The Analysis GUI component of the Pablo Performance Analysis Environment consists of a performance visualization system that provides a portable, scalable, and extensible tool for the analysis and display of data written in the Pablo Self-Defining Data Format (SDDF) [4]. The user interacts graphically with the Analysis GUI to build a data-flow graph whose nodes organize, transform, analyze, and display data read from Pablo SDDF files. Typically, the input files are generated with a trace library component.

Viz continues in this direction by focusing on the visualization technology required for application-specific performance visualizations [50]. It was created out of a need to support rapid visualization prototyping in an environment that could be extended by abstractions in the application problem domain. Viz provides this in a programming environment built on a high-level, interactive language (Scheme) that embeds a 3D graphics library (Open Inventor), and that utilizes a data-reactive model of visualization operation to capture mechanisms that have been found to be important in visualization design (e.g., constraints, controlled data flow, dynamic analysis, animation). The strength of Viz is its ability to create non-trivial visualizations rapidly and to construct libraries of 3D graphics functionality easily. Although our original focus was on parallel program and performance data visualization, Viz applies beyond these areas.

Avatar [90] is a virtual reality framework, built on the Pablo performance analysis kit, that supports multiple metaphors for displaying dynamic data. By separating the metaphor display software from the data processing and interaction components, Avatar's software architecture has allowed new display metaphors to be created quickly. Three display metaphors for performance data have been developed to date: time tunnels (time lines and event-driven graphs of task interactions), scattercubes (3D generalizations of 2D scatterplots that support analysis of high-dimensional time-varying data), and geographic displays (texture-mapped spheres with source-destination arcs and stacked bars). Avatar has been used to study two types of high-performance input/output on the Portable Parallel File System (PPFS): parallel scientific codes and WWW servers.

Rivet [18] is an information visualization environment that provides a cohesive platform for the analysis and visualization of modern computer systems. It uses a component-based architecture in which complex visualizations can be composed from simple data objects, visual objects and data transformations. Additionally, it provides powerful coordination mechanisms, which can be used to add extensive interactivity to the resulting visualizations. Rivet has been successfully applied in focused studies of a wide range of computer systems: parallel applications, superscalar processors, memory systems, and wireless networks.
Lucent Technologies' Visual Insights [115] offers a set of interactive and linked data visualization components for the Microsoft ActiveX developer market that help software developers create more flexible, animated ways to display trends in vast stores of information.

In Table 2.1, the PGRT visual objects that are described in Chapter 5 of this dissertation are compared with the related PAV tools and systems described above. While some of the latter have gone further in specific directions, such as the graphical metaphor, the design decisions taken in the approach presented in this dissertation were primarily based on the requirements stated in Section 1.2.3. (There is also a concern that insisting on the use of state-of-the-art academic software technologies in PAV tools and systems, such as lazy functional languages, could limit their accessibility to non-experts who use legacy software tools. Instead, using technologies that are gaining acceptance, such as SGML (e.g., in the Chemical Markup Language [83]) and the use of structure, components and scripting (e.g., in VRML [109]), increases the chance of the PGRT visual objects contributing to the PAV community.)

Table 2.1: Performance visualization tools and systems

  Tool/System      On/Off-line      Graphical      Underlying graphical   View classes      Reusable
                                    metaphor       technology
  ParaGraph        off-line         data-flow      X library              generic           no
  Pablo widgets    off-line         data-flow      X library              generic           yes
  Avatar           on-line          data-flow      VRML                   three metaphors   n/a
  Viz              on-line          data-reactive  Open Inventor          domain-specific   yes
  Rivet            off-line         data-flow      OpenGL                 domain-specific   yes
  Visual Insights  mostly off-line  n/a            n/a                    generic           yes
  PGRT VO          on-line          data-flow      low-level VO           domain-specific   yes
                                                   implementations

Among the few more comprehensive PAV approaches that try to link on-line PAV with system modeling, design, engineering, and steering, PERTS [101] and Falcon [43] come closest to the self-contained result of this dissertation, presented in Chapter 6. PERTS facilitates the design and engineering of real-time systems based on rate-monotonic analysis, and interfaces with Wind River's WindView visualization technology [121]. Falcon is not specifically aimed at real-time systems, but integrates an application-specific on-line monitoring system, an interactive steering mechanism, and a graphical display system, in an effort to affect the performance and/or execution behavior of large-scale parallel programs. A good source of related work is An Annotated Bibliography of Interactive Program Steering [44], which places emphasis on monitoring, information presentation and steering.

Chapter 3

A Compiler-Based Approach to Design and Engineering of Complex Real-Time Systems

This chapter describes the unifying, compiler-based approach to the design and engineering of complex real-time systems. The Real-Time System Markup Language (RTSML) is described in Section 3.1. An example of a complex real-time system accompanies the description, in order to make it easier to explain how the compiler generates CLP code in Section 3.2. Results of experiments with the example system are presented in Section 3.3. Section 3.4 briefly comments on scalability issues in the approach. Before a summary, Section 3.5 discusses correctness issues in the approach and the RTSML-to-CLP compilation.
3.1 Real-Time System Specification in RTSML

In this section, the RTSML language is described using excerpts of an example complex real-time system that is motivated by a shipboard control system described in [119]. Figure 3.1 shows the hardware configuration and a detail from the software configuration of the system. The squares represent processors, circles represent channels, and small squares labeled R represent routers; the lines connecting the processors and channels show the possible routes among the processors.

[Figure 3.1: An example complex RT system. The original diagram shows the task groups Sensors (8); Sense (8), Evaluate-and-Decide (8), Act (8); and Actuators (8), mapped over a hardware configuration of processors p0-p15, channels c0-c3 and routers.]

There are 40 periodic tasks, divided into five groups of 8 tasks each. Sensor tasks make up the first group, to be allocated over the four leftmost processors. Sense, evaluate-and-decide, and act tasks are all to be allocated over the eight processors in the middle. Actuator tasks are to be allocated over the four rightmost processors. The communication among the tasks goes from left to right: each sensor sends a periodic message to two sense tasks (as shown for task T0 sending to tasks T8 and T9); each sense task sends a periodic message to two evaluate-and-decide tasks; each evaluate-and-decide task sends a periodic message to two act tasks; and each act task sends a periodic message to two actuator tasks.

Figure 3.2: RTSML specification excerpts

     1  ...
     4  <system id="s0" type="common">
     5    <processor-group id="pg0" type="common">
     6      <processor id="p0" type="rms (mips (initial 10)
     7        (domain (10)))">
        ...
    10    <processor-group id="pg1" type="common">
    11      ...
    12      <processor id="p8" type="rms (mips (initial 10)
    13        (domain ((range 8 10))))">
        ...
    16    <task-group id="tg0" type="common">
    17      <task id="t0" type="common (period (initial 50)
    18        (domain ((range 50 80)))) (deadline (initial 50)
    19        (domain ((range 40 50)))) (time@1mips (initial 180)
    20        (domain ((range 100 300))))" processors="pg0">
        ...
    24    <message-group id="mg0" type="common">
    25      <message id="m0-8-9" ...
        ...
    29        ... tasks="t8 t9" ...>
        ...
    33    <task-group id="sense" type="common">
    34      ...
    35      <task id="t9" type="common
    36        (period (initial 1200) (domain ((range 800 1600))))
    37        (deadline (initial 1000) (domain
    38        ((range 800 1200)))) (time@1mips (initial 2300)
    39        (domain ((range 2000 4000))))" processors="pg1">
        ...
    42    <channel-group id="upper" type="common">
    43      <channel id="c0" type="rms (mbps (initial 10)
    44        (domain (10)))"> ...
        ...
    47    <routing-table id="rt0" type="common">
    48      <route id="r0-8" ...
    49        ... channels="c0 c1" ...>
        ...

Table 3.1 lists the ranges of the system parameters. For any specific values of, for example, a task's period and deadline, additional constraints will be enforced according to the real-time model. For example, the rate-monotonic model requires the deadline to be less than or equal to the period.

Table 3.1: Ranges of system parameters

  Task, message group   period [ms]   deadline [ms]   demand, length [kI, kB]
  Sensors               25-120        25-75           40-400
  Sense                 800-2500      800-1800        1500-6000
  Eval & Decide         800-2500      800-1800        1500-6000
  Act                   800-2500      800-1800        1500-6000
  Actuators             35-120        30-75           40-400
  Sensors to Sense      200-650       10-75           8-80
  Sense to E&D          200-650       10-75           8-80
  E&D to Act            200-650       10-75           8-80
  Act to Actuators      200-650       10-75           8-80
The (idealized) capacity of the processors is 10 MIPS, but the available capacity of any of the eight processors in the middle can drop to 8 MIPS; the channels have a capacity of 10 MB/s. The upper part of the last column of Table 3.1 shows processor time demands in milliseconds, assuming a capacity of 1 MIPS, which is equivalent to demands in kiloinstructions (kI); the lower part shows message lengths in kB.

Parts of the RTSML specification that are relevant to the details shown in Figure 3.1 are highlighted in Figure 3.2. The system consists of processor, channel, task and message groups, which are specified using the corresponding RTSML elements (e.g., processor-group). The attributes of the elements are used either to specify relations among the elements (e.g., the processors attribute of the task element), or to specify their properties with respect to a real-time model (the type attribute). The attributes in dotted boxes in Figure 3.2 show that task t0 (i.e., T0 in Figure 3.1) can be allocated only to a processor in processor group pg0 (lines 5 and 20; corresponding to the leftmost dashed oval rectangle in Figure 3.1), and that task t9 can be allocated only to a processor in processor group pg1 (lines 10 and 39; corresponding to the dashed oval rectangle in the middle of Figure 3.1). The attributes in solid rectangles in Figure 3.2 (lines 6, 12, 43 and 48-49) show that processors p0 and p8 can communicate via route r0-8 going across channels c0 and c1; routes are unidirectional, and there may exist more than one route between two processors. The attributes in dashed rectangles in Figure 3.2 (lines 17, 25, 29 and 35) show that message m0-8-9 is multicast by task t0 to tasks t8 and t9.

As a markup language, RTSML is not intended to capture the semantics of various real-time models. Extending its document type definition (DTD), given in Appendix A, with model-specific elements and attributes would require the core of the compiler to be modified with the addition of each new real-time model or scheme. Instead, an attribute named type is added to each RTSML element; it stores the type of the real-time model that the corresponding element instance obeys, along with model-specific parameters. In this way, it was possible to design the compiler so that elements of different real-time "types" are handled by different compiler modules, which can be added without modifying the core. In other words, these compiler modules are responsible for capturing the semantics of real-time models and their integration; it is the responsibility of a module implementor to exploit the available constraint solvers efficiently and to provide an interface (in terms of generated CLP code) for integration with other modules.

The model-specific parameters are given in the form of S-expressions, a notation close to markup, which are easily parsed. For example, at line 6 of Figure 3.2, the value of the type attribute of the first processor element will be parsed by a compiler module responsible for the Rate-Monotonic Scheduling (rms) real-time model. This module handles both computation and communication resources (i.e., processors and channels). In the case of processor p0, the module will determine that it has a constant capacity of 10 MIPS, while in the case of the processors in processor group pg1 it will determine that their initial capacities are 10 MIPS, but that only 8 MIPS may be available at times.
(This does not mean a non-deterministic model: this attribute is an input parameter in practice, whose value may vary over time, and it is beneficial to know its bounds.)

Another compiler module that has been integrated into the current implementation of the RTSML compiler is called common. It handles elements with a default, or common, real-time model assigned. In the case of tasks and messages, by default they are periodic, have deadlines, and require certain amounts of resources (CPU time required by a task, or communication time required to transfer a message). We have chosen milliseconds and kilobytes as the units for time and message length, respectively. Like the resources, tasks' and messages' parameters may take values from a finite integer domain, including ranges. The common routing model, used in this example, is static.

As new real-time models are supported, the problem of model compatibility arises. Namely, it is hard to integrate components or subsystems of a complex real-time system that have been designed separately, based on independent real-time models. The developer of a new model-specific compiler module will define how the new module integrates with the other modules. For example, how does one ensure that a periodic message arrives before its deadline across a number of rate-monotonically scheduled real-time channels? The rms module will cooperate with the common module and generate CLP code such that, for example, the message's deadline is divided by the number of channels it traverses, and the floor of the result, ⌊D/h⌋ for a message with deadline D crossing h channels, is used as the message deadline on each channel. Similarly, a stochastic version of the rate-monotonic scheduling model [111] would see a common task's fixed CPU requirement as a degenerate distribution. Vice versa, the deterministic rms model could work with tasks with stochastic CPU requirements by taking into account the upper bounds of their CPU requirement distributions. A dynamic and/or fault-tolerant routing model, over a set of processors and channels, would probably require more effort to integrate.

All the above examples can be classified as horizontal system integration. For the benefit of such integration of complex real-time systems, it might be useful to draw from approaches to the integration of low-level digital systems. Extending the discussion at the end of Section 2.1, there are numerous examples of successful integration of digital devices implementing different functional subsystems/components by means of their timing specifications. In complex digital systems, there are also multiple, special-purpose busses and multiple timing protocols. Vice versa, real-time models might be more integrable if they were devised with horizontal integration with some (not as many as possible at once) other real-time models among their goals.

Vertical integration of complex real-time systems deals mainly with coupling lower, system-level subsystems/components with higher-level software. In many cases, the real-time properties of the latter are specified using software-engineering formal methods. Future support for the integration of formal methods in RTSML is envisaged as their application to special cases of real-time system components that obey the above-mentioned models. One example of such integration, also referred to as method integration, is given in [89].
A real-time design method called HRT-HOOD [21], which constrains the designer to produce a design that is amenable to scheduling analysis (of periodic and sporadic tasks), is integrated with the Modecharts [81] formal method for real-time systems. In effect, a Modechart design is constrained so that it can be analyzed for schedulability; Modecharts are applied to only the most "important" parts of the design by applying them to the related HRT-HOOD objects. The following section will show how the CLP approach can handle a schedulability analysis, while Section 2.3 discussed the checking of RTL constraints used in Modechart specifications. Future RTSML support might include (1) if it turns out to be beneficial, syntactic extensions of the core RTSML to support the embedding of formal method expressions, and (2) adapting existing CLP-based (such as in [46]) or other verification/constraint-checking approaches to that of the compiler presented in this dissertation.

3.2 Process of Compilation to CLP

The RTSML compiler is implemented in CLOS [17] and based on an SGML conversion library called STIL [99]. Its core compiles an RTSML specification to a CLP program in four phases:

1. parsing of the RTSML system specification, and creation of a hierarchy of initial objects that contain data collected during the parsing,

2. traversing the object hierarchy and invoking the real-time model-specific modules' methods to upgrade the initial objects (by changing their class to a subclass) to module-specific objects with real-time parameter slots and methods for generating chunks of the corresponding CLP code (variable declarations and constraints),

3. structure-based (generic and module-specific) cross-linking of the objects in the hierarchy to create a complete internal system representation needed by the CLP-generating methods, and

4. controlling the CLP code generation.

Each module defines methods, according to an object-oriented compilation protocol established by the compiler core, for creating internal representations of the system structure specified in RTSML, for all the RTSML elements that have the type attribute. The internal representation of an RTSML element instance (e.g., a task), in the context of an RTSML specification, is an object of a class that corresponds to the RTSML element. The object contains

- other objects, according to the RTSML specification (e.g., a task object contains the list of processor objects to which it can be allocated, and vice versa),

- real-time parameters derived from the S-expressions in the RTSML element instance's attributes, and

- dynamically created, individual (per-instance) methods that generate parts of the CLP program:
Cost variables are used to compute the cost of a solution and possibly as arguments to optimization predicates (e.g., the number of tasks allocated the same processor could be represented by such a variable; an optimization predicate could use these variables to find a solution which minimizes the maximum number of tasks allocated to the same processor). Finally, supplementary variables are not needed as part of a solution, but must be instantiated in order to guarantee the existence of a solution (otherwise, a possible contradiction might not be inferred if these variables remain represented by non-singular domains at the end of search). Some implementation details of the two compiler modules, both of which use CLP(FD) as the target language, are described in the remainder of this section. The modules are integrated via shared program variables, which are in turn constrained 56 by both modules. In other words, the high declarativity of CLP allows that real- time subsystems be specified by laying out (in)equalities over parameters according to their models, and integrated by laying out more (in)equalities. The more these (in)equalities have the nature of simple constraints (as Opossed to, e.g., having high computational complexity), the better performance could be expected from the CLP approach. 3.2.1 Module common Task objects created by the common module, among other information, con- tain functions that generate declarations of program variables Task_z'_Period, Task-t_Deadline, Task_i_TimeAt1MIPS and Task_z'_Processor, where z' is the task index. For each task, the first three variables are parameter variables, and the last one is a solution variable. Messages are handled similarly, but since a message is to be allocated to a number of channels, additional variables need to be declared for storing the route and channels over which a message is transferred to a task, the number of hops along the route, and the per-hop deadline. Since the routing model supports multicasts, this module may, in order to better exploit available resources, generate additional variables and constraints for insuring that (a) if multiple tasks that receive a message are on the same processor, only one route is used for transferring the message to them, and (b) if a message is multicast via multiple but intersecting routes to multiple processors, at most one route can be 57 used between the intersecting points. Routing implementation details are omitted for brevity. The common module also handles processors and channels with simple, utilization- based task/ message feasibility checks. The checks may be used for, e.g., round— robin non-real-time scheduling, but are also strong enough for, e.g., real-time resource management based on the earliest-deadlinc-first (EDF) criterion. An example of use is given in Chapter 6. 3.2.2 Module rms Processor objects created by the rms module, among other information, contain functions that generate declarations of parameter variables Proc_k_MIPS, where k is the processor index. For the purpose of expressing rate-monotonic (RM) schedul- ing constraints, additional variables are used, including Task_t'_preempts.'1‘ask_j, Task_2'_on_Proc_k and the others that appear in the following examples. This module generates constraints that consider all possible allocations of tasks to processors and message to channels, and set the RM priorities to the tasks and messages. 
Hard real-time schedulability constraints are generated based on a generalized version of the well-known formula from [113] that guarantees the RM schedulability of a task set (considering the worst-case scenario of a so-called critical instant), i (VigigNil(303th.-tl Zleil < t- (3-1l 1:1 T1 58 In the formula, N is the number of tasks indexed by priority and allocated to the same processor, D,- is the deadline of task t, 03- is the time demand of task j on the processor, and T]- is the period of task j. Unlike the formula, the constraints must capture any subset of the N tasks that are actually allocated to the processor in a solution. Additionally, since the tasks’ periods are input parameters, the tasks cannot be indexed in advance; their relative priorities must be computed by the CLP program. Finally, the above formula is expressed using the following four types of constraints (see Table 3.2 for the meanings of the constraints): 1. #<= (Task_z'_Period, Task_j_Period, Task-z'_preempts_Task_j), where 2' 7t j, for determining the relative task priority, 2. ceiling_d(Task_i_on_Proc_k-Checktime, Task.j_Period, Arrivals_of .Task- j -on_Proc_k -by _Task-z' -Checkt ime) , if tasks 2' and j could be allocated to the same processor k, for determining how many instances of task j could arrive before the deadline of task 2', 3. #= (Task-z'_Processor , k , Task_z'_on_Proc_k), for determining whether task 2' is on processor k (0 = “no”, 1 = “yes”); and 4. Proc_k_MIPS * Task_z’_on_Proc-k_Checktime #>= Task_t_on_Proc_k * (Task_z'_TimeAt1MIPS + Task.j1_on_Proc_k * Task.jl_preempts_Task_t * Arrivals_of-Task.j1-on_Proc_k_by.Task-i_Checktime * Task.j1_TimeAt1MIPS + ...), for expressing the inequality in the formula (the summands, in which 3'] is replaced by j2, j3, . . ..) t 3 stands for the other The t in the formula corresponds to the Checktime variable, and the processor time demand CJ- (assuming some processor capacity) corresponds to the ratio of the normalized processor demand time (the TimeAthIPS variable) and the processor capacity (the MIPS variable). Tasks indexed by j in the formula are named j1,j2, .. . . 59 Table 3.2: Some constraint predicates used Syntax Semantics #<=(E1, E2, VB) VB 2 1 iff E1 3 E2; otherwise VB 2 0 #=(E1, E2, VB) VB 2: 1 iff E1 2 E1; otherwise VB 2 0 ceiling_d(VX , Vy, V2) ”2 = [(2)5 E1 #>= E2 E1 2 E2 En is a linear expression including program variables, V3: is a (finite-domain) program variable After the constraints have been set, the domains of the Task_2'-on_Proc_k_- Checktimes are reduced. They are subsequently further reduced as the Task_t_Processor and then Arrivals_of_Task_j_on_Proc_k.by.Task-z'_Checktime variables have been labeled. A solution could be found if all the Task_z'_on.Proc-k-- Checktimes had non-empty domains. This way of searching for values of t that satisfy the formula is notably different from the one used in some RM schedulability analysis tools (e.g., [84]), which checks the formula for values of t that are multiples of the tasks’ periods in a generate-then-test manner. If the variable Arrivals_of..Task_j..on_Proc_k_by-Task_i_Checktime is assigned the smallest value possible, then the minimum value from the resulting domain of the variable Task_t_on_Proc_k_Checktime will be an upper bound on the earliest completion time of task 2' in the worst case. The difference between the deadline of task 2' and this value is thus a lower bound on its worst-case maximum laxity. 
If the variable Arrivals_of_Task_j_on_Proc_k_by_Task_i_Checktime is assigned the smallest value possible, then the minimum value in the resulting domain of the variable Task_i_on_Proc_k_Checktime is an upper bound on the earliest completion time of task i in the worst case. The difference between the deadline of task i and this value is thus a lower bound on its worst-case maximum laxity. The smallest such lower bound among all tasks allocated to a processor, multiplied by the processor's capacity in MIPS, may be used as a lower bound on the maximum additional, unexpected load the processor can sustain without any task missing its deadline. An optimal solution would maximize the smallest lower bound on the maximum additional sustainable load among all the processors.

The handling of channels and messages is similar. The real-time communication model is similar to the one defined in [64] for short messages (no pipelining) in multi-hop real-time networks, and is based on the rate-monotonic scheduling of messages on channels. The routers are assumed to have enough processing and buffer capacity, and to ensure (by delaying) that a periodic message is made ready at equidistant points in time on all channels it traverses. The real-time communication-related constraints use some of the variables defined and computed by the routing-related code generated by the common module, such as Message_i_on_Channel_k (analogous to Task_i_on_Proc_k).

Chapter 6 presents a third compiler module, rtlrms, an extension of the rms module for modeling Real-Time Linux [12] with an RM scheduler.

3.3 Experiments

The RTSML specification of the system described in Section 3.1 was compiled to CLP code that contained 17,046 program variables and 30,466 model-based constraints. Most of these constraints were internally rewritten by ECLiPSe as several simpler constraints. The CLP code consists of predicates that, alone or combined, implement a problem-solving approach, such as a search for an optimal solution or any solution, or the repair of a solution.

The division of program variables (implemented in the two modules described in Section 3.2) was based on a natural approach to system design: the columns of Table 3.1 were represented by parameter variables (238 in total), and the task-to-processor and message-to-route allocations were represented by solution variables (104 in total). (When a message is multicast, it is allocated one route for each destination task. Allocated channels are determined from the message-to-route allocations.) There were 7,104 supplementary variables that required additional labeling. (The other variables were uniquely determined by labeling the solution variables.) For each processor and channel (22 in total), a cost variable is defined that contains a lower bound on the maximum additional load the processor (in kI) or the channel (in kB) could sustain if a task or message unexpectedly exceeded its declared resource demand, such that all tasks/messages allocated to the processor/channel still meet their deadlines. In the cost function, 1 kI is valued the same as 1 kB.

The experiments were conducted under the following scenario. The complex real-time system was specified such that the initial input parameters, described in Table 3.1, took on values from the lower halves of their ranges (i.e., domains), and the CLP program was run to solve the constraints for the output parameters, i.e., to allocate the tasks to processors and the messages to routes/channels. After the system had (hypothetically) been deployed, some input parameters were changed, and the CLP program was run to redesign the system, i.e., in this case to reallocate the tasks and messages as necessary to satisfy all the constraints again.

There are eight applications in the system, each consisting of one of the sensor tasks and the tasks that process its data, as described in Section 3.1.
Based on the communication patterns, these applications form trees rooted at their sensor tasks. Changes in the input parameters (at the task, message and processor level) were modeled by changes at the application level, such as increased load due to an increased amount of input data; reduced resource capacities due to uncontrollable activities outside the modeled system; and the need for reduced response times (e.g., reduced periods and deadlines) in, for example, emergency situations. Five such successive changes were modeled with the six input parameter assignments (IPAs) described in Table 3.3.

Table 3.3: Scenario of parameter changes

  IPA   Description
  P0    Load, i.e., computation requirements and message lengths, in the
        lower half of their domains; available processor capacity 100%
  P1    Load increases up to 15% wrt. P0 down the T0 tree
  P2    Load down the T0 tree back to normal; load increases up to 15%
        wrt. P0 down the T7 tree
  P3    Load down the T7 tree back to normal; available capacities of
        processors p4 and p11 drop to 8 MIPS
  P4    p4 and p11 capacities back to normal; deadlines reduced up to
        15% wrt. P0 down the T1 tree
  P5    Deadlines down the T1 tree back to normal; periods reduced
        (i.e., priorities increased) up to 20% wrt. P0 down the T2 tree

Two problem-solving approaches were evaluated, using the IPAs in the order specified in the above scenario:

1. the conventional CLP search, in which the parameter variables are assigned the values of the current input parameters, the constraints are set, and labelings are performed to search for consistent values of the solution variables; and

2. the repair-based CLP approach, in which the parameter variables are assigned the values of the current input parameters, the solution variables are temporarily assigned the values from the previous solution, and repair labelings are performed that modify the temporary values until all the constraints are satisfied.

The experiments were run using ECLiPSe 4.0.2 on a 300 MHz single-CPU UltraSPARC running Solaris 2.6. It is difficult to relate the constraint level to the search time; usually, the hardest problems are neither underconstrained nor overconstrained. The results are discussed in the following subsections (the search for optimal solutions has not yet been evaluated).

3.3.1 Conventional CLP Approach

Relatively consistent timings were obtained using a labeling heuristic that consists of a sequence of two partial labelings: the "smallest-domain and most-constrained first" variable ordering and "smallest-to-largest" value ordering for the solution variables, followed by the "smallest-domain first" variable ordering and "smallest-to-largest" value ordering for the supplementary variables. (Note that the value ordering is important for better approximating the costs, as described in Section 3.2.) The timings of constraint setting and of labeling until the first solution is found are shown in Table 3.4. With the "smallest-domain first" variable ordering of the solution variables, some timings were about as good as for the repair-based approach below, but some were much worse.

Even though the conventional approach does not appear fast enough to be used in an incremental manner, it appears good for one-time applications.
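The two-phase heuristic can be sketched as follows (a hypothetical driver; it assumes the variable-selection predicates deleteff/3 and deleteffc/3 of the ECLiPSe finite-domain library, which correspond to the "smallest-domain first" and "smallest-domain and most-constrained first" orderings, respectively):

    % Phase one: label the solution variables.
    label_solution([]).
    label_solution(Vars) :-
        deleteffc(Var, Vars, Rest),   % smallest-domain, most-constrained first
        indomain(Var),                % values tried smallest-to-largest
        label_solution(Rest).

    % Phase two: label the supplementary variables.
    label_supplementary([]).
    label_supplementary(Vars) :-
        deleteff(Var, Vars, Rest),    % smallest-domain first
        indomain(Var),
        label_supplementary(Rest).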
Compared to, e.g., 64 Table 3.4: Conventional CLP approach timings IPA Setting con- First so- Sustainable over- straints [s] lution [5] load [kI, kB] P0 13.11 7,350.09 10—8000, 0—130 P1 12.89 1,180.07 30—8000, 0—130 P2 13.09 137.03 10—8000, 0—130 P3 13.07 6,750.46 10—8000, 0—130 P4 12.86 333.80 20—8000, 0—120 P5 12.85 231.08 50—8000, 0—120 a search technique popular in the recent years, genetic algorithms (GA), it promises significantly better performance and ease of use on highly constrained problems, such as complex real-time systems, on which the GA perform poorly [41]. 3.3.2 Repair-Based CLP Approach A generic CLP program loop that uses the repair library of ECL‘PSe is shown in F ig- ure 3.3 for the purpose of explaining the results. The variables Parameter, Solution and Cost are lists of the corresponding program variables; the supplementary vari- ables are “shown” in the figure together with the solution variables (i.e., also in Solution) for simplicity. A notable difference from the conventional approach is that the constraints are set using the input parameters’ domains instead of their current values. The resulting constraint networks are thus weaker, but this is necessary in order to allow input parameter modifications. The repair-1abeling predicate finds conflicting variables and constraints, and tries to assign values to the variables that result in no further conflicts. The fastest repairs were obtained using a labeling heuristic for the solution variables that first 65 repair_100p :- setup_constraints_as_repairable(Parameter, Solution, Cost), ( 1abe1ing(Solution), Z find initial solution copy_term(p(Solution, Cost), S), % trim associated constraints setva1(solution, S), X store solution output_solution(Solution, Cost), fail Z undo labeling repeat, getva1(solution, p(01dSolution, Cost)), % fetch old solution modify_parameters(Parameter), Z use new IPA tent_set(Solution, 01dSolution), % set old solution as tentative repair_labeling(Solution, Cost), % repair tentative solution tent-get(p(Solution, Cost), p(NewSolution, NewCost)), copy_term(p(NewSolution, NewCost), S), X trim associated constraints setva1(solution, S), X store new solution output_solution(NewSolution, NewCost), fail 2 undo repair_labe1ing Figure 3.3: A generic repair-based CLP program assigns values to conflicting variables using the “smallest-domain first” variable and “smallest-to—largest” value ordering, and then variables involved in conflicting con- straints using the “smallest-domain first” variable and “smallest-to-largest” value ordering. This labeling heuristic was followed by another one for the supplementary variables, in which the “smallest-domain first” was replaced by the “smallest-domain and most-constrained first” variable ordering. The repair loop from Figure 3.3 was slightly modified to start with the IPA P0 and replace it by the IPAs P1, P2, . . . , P5, successively. In this way, a current solution is checked against a new IPA and, if a repair is necessary, subsequently replaced by a new solution. This corresponds to a scenario in which the input parameters of a continuously-operating complex real-time system change dynamically, and the repair 66 Table 3.5: Repair-based CLP approach timings IPA change modify_parameters [s] tent_set [s] repair_labeling [s] P0 —-) P1 34.58 40.19 199.92 Pl —> P2 34.55 40.12 64.60 P2 —> P3 34.97 42.20 67.64 P3 —> P4 36.14 40.40 68.24 P4 —> P5 34.47 40.86 215.44 loop redesigns the system on-line. 
The solution variables were stored at the end of each iteration, but the supplementary variables were not, because they are local and dependent on the solution variables. Using their values under P0 as temporary values in each iteration resulted in faster repairs. Table 3.5 shows the timings for assigning new values to the parameter variables (modify_parameters in Figure 3.3), assigning temporary values to the solution variables (tent_set), and repair labeling, relative to the moment the corresponding IPA change occurred. The first solution under P0 was obtained using the conventional approach; this search time is not accounted for.

Compared to the conventional approach, the described repair-based approach is considerably faster in incremental solving overall. However, this holds for relatively moderate parameter changes; for significant changes, this approach tends to yield timings similar to those of the conventional approach. The choice of the problem-solving approach depends on both the search time and the solving time (off- or on-line). In this approach, it is possible to further improve the search time by identifying temporarily stable parameters and incorporating their values in advance of the solving time, and also by dynamically choosing the labeling heuristics. Due to its inherent scalability [5], and having in mind that model-oblivious repair heuristics were used and that ECLiPSe is not a heavily optimized system, the repair-based approach appears good for dynamic resource allocation in large-scale complex real-time systems.

Table 3.6: Successive solutions' distances

Next IPA  Conventional approach  Repair-based approach
P1        (12, 13)               (1, 6)
P2        (13, 21)               (0, 0)
P3        (15, 24)               (11, 28)
P4        (18, 22)               (3, 11)
P5        (19, 22)               (1, 5)

As an alternative to the fast repair, the repair-based approach may also be useful for minimally perturbing a previous solution. In this case, the corresponding system reconfigurations are smaller, which is sometimes desired. Some benefits of a perturbation-optimized heuristic include

- the potential finding of a stable solution in situations where there is a pattern in the parameter changes, and

- a reduced system reconfiguration enforcement time, which could become comparable to the repair time.

A repair labeling heuristic for this purpose was similar to the one described above, the only difference being that the "smallest-to-largest" value ordering was replaced by "previous-value first" for the solution variables. The timings of this heuristic are somewhat larger overall and have a larger variance, whereas it is consistent in finding solutions that are relatively close to their predecessors. Table 3.6 compares the distances between successive solutions found by the conventional approach and by this repair heuristic. The distance between two solutions is an ordered pair (tp, mr), where tp is the number of task-to-processor allocations and mr is the number of message-to-route allocations in which the solutions differ. (Notice that the solution to P1 satisfies the constraints under P2, too. The previous, speed-optimized heuristic found different solutions to P1 and P2 because of the temporary values of the supplementary variables, which were taken from the solution to P0 when repairing P1 -> P2. These values caused (artificial) conflicts, and since the speed-optimized heuristic does not try previous values of the solution variables first, it found another solution even though the old one had satisfied the constraints. However, this is a "false alarm" kind of situation that may be traded for faster repair in other situations.)
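The distance used in Table 3.6 can be stated compactly; the following formalization is added here for clarity and is not notation from the original figures (S and S' denote two successive solutions, and proc and route are the allocation functions):

\[
d(S, S') = (tp,\; mr), \qquad
tp = \bigl|\{\, i : \mathit{proc}_S(\tau_i) \neq \mathit{proc}_{S'}(\tau_i) \,\}\bigr|, \qquad
mr = \bigl|\{\, j : \mathit{route}_S(m_j) \neq \mathit{route}_{S'}(m_j) \,\}\bigr|,
\]

where \tau_i ranges over the tasks and m_j over the (message, destination task) pairs. Under this metric, the (0, 0) entry for P2 in Table 3.6 records exactly the case discussed above, in which the previous solution already satisfies the new constraints.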
The perturbation-optimized repair-based approach resulted in fewer task-to-processor and message-to-route reallocations triggered by dynamic changes in the input system parameters.

3.4 Scalability Issues in the Approach

The efficiency of a CLP problem-solving approach is highly dependent on the structure of a problem [51]. In general, the faster the search space is pruned by constraint propagation, the easier it becomes to find a solution. In a CLP program, a complex real-time system is modeled by a number of global and local constraints and variables, corresponding to integration-related and subsystem/component-specific constraints and system parameters, respectively. For the conventional CLP approach and a tightly-integrated complex real-time system, labeling the global variables first might result in faster pruning of the search space. However, if loosely-integrated complex real-time systems were shown to scale better in the real-time sense (in light of the discussion at the end of Section 2.1), there would be relatively fewer global constraints and variables in their CLP models (i.e., their number would increase slowly with the system size), which would negatively affect the scalability of this problem structure-aware choice of labeling. On the other hand, for the repair-based CLP approach and a previous solution to a loosely-integrated complex real-time system problem, if the violated constraints were mostly local, the repair time might be almost independent of the system size. Otherwise, or if the complex real-time system were tightly-integrated, the probability of a need for a global repair (due to a violation of global constraints) would increase. In these cases, the benefit of the repair-based approach might degrade.

3.5 On the Correctness of the Approach and Compilation

This section discusses correctness issues in the approach and the RTSML-to-CLP compilation that are most relevant for users and compiler module developers. First, the semantics of the source (RTSML) and target (CLP) languages, and the power of the approach, are addressed. Next, details and the correctness of the compiler modules used in the example of Section 3.1 are discussed. Finally, the correctness of the approach and compilation are evaluated against a set of high-integrity compilation criteria.

3.5.1 Language Semantics and the Power of CLP Problem Solving

RTSML is a declarative, relational and extensible language. Individual resources (processors for computation; channels for communication and I/O), activities (tasks for computation; messages for communication and I/O), and routing information of a complex real-time system are specified in a declarative manner, using RTSML attributes. Structural and other basic relations among these components and subsystems are determined by the nesting of RTSML elements (tags) and the values of attributes other than the type attribute, and can be intuitively interpreted from the DTD given in Appendix A. (The sys and group attributes are meant to be available for use by the modules, and are not used as structural information by the compiler core.) For example, by listing the ids of processors and/or processor-groups as the value of the processors attribute of a task, a relation "can be allocated to" is established between the task and the designated processors.
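Written as a constraint, such a basic relation amounts to a simple membership condition (an illustrative formalization added here; the symbols are not taken from the DTD):

\[
\mathit{processor}(t) \in \{p_1, p_2, \dots, p_k\},
\]

where t is the task and p_1, ..., p_k are the processors, and the members of the processor-groups, listed in the task's processors attribute.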
Real-time model-specific parameters and relations among the components and subsystems are determined by the module-specific, extended syntax of the type attribute that is specified by the developer of each module.

The relations in an RTSML specification are viewed through the constraint model [65]. Unlike, e.g., the traditional relational model, in which relations are represented by ordered tuples, here they are represented by finite constraint forms, while the ordered tuples satisfying them are implicit. The dominant real-time model-specific parameters and relations naturally fit the constraint model, but it is not difficult to fit the basic relations into this model either. In the above example of the task-to-processor allocation relation, the ordered tuples are (task, processor) pairs, while the relation "can be allocated to" can be represented by a constraint, in the general mathematical sense (e.g., a set membership constraint), stating that for the given task task, processor can only be one of the designated processors. For all supported syntax extensions and real-time models:

Definition. The semantics of an RTSML specification is the set of all ordered tuples of all the system parameters such that all the basic and real-time model-specific constraints are satisfied.

The system parameters include both those figuring in the basic relations, such as the task-to-processor allocations, and real-time parameters, such as the task periods. Different compiler modules consider different system parameters as input parameters and the others as output parameters, but they are all treated equally in RTSML. The mathematical domain and constraint types over a system parameter are not restricted. In real-time scheduling theory, for example, constraints usually involve non-linear functions of real numbers.

In Section 2.2.1, it was said that an arbitrary conjunction of constraints in a CLP language is solvable if the domain of computation satisfies its existential closure. It is assumed that the satisfiability problem is decidable by the corresponding solver, which is sound and complete wrt. its domain and constraint types. Accordingly, the semantics of such a CLP program, without LP predicates, is the set of all ordered tuples of values of the quantified variables for which all the constraints are satisfied by the computation domain. The ECLiPSe system allows the integration of different constraint solvers, e.g., FD and RIA. This is achieved by declaring program variables as belonging to multiple computation domains at once, and by defining special constraint propagation handlers that make the different constraint solvers cooperate and maintain a consistency relationship between the multiple value domains of the program variables. Hence, the semantics of a CLP program with multiple integrated solvers/computation domains is defined similarly as above, and includes the multiple values of the multiply-declared variables for which all the constraints are satisfied by all the computation domains.

The goal of the RTSML-to-CLP compilation is to transform a complex real-time system problem into one that is decidable by one or more cooperating constraint solvers, by mapping from unrestricted mathematical domains and constraints to computational domains and constraints that can be handled by the constraint solver(s). In some cases, such as restricting a domain from the reals to the integers, the approach loses its completeness (and that is why the term transformation is used instead of reduction [86]): the CLP solver(s) may not find a solution to the transformed problem even if there exists a solution to the original problem.
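Set-theoretically, the distinction can be summarized as follows (a compact restatement added for clarity; Sol(.) and the decoding map \alpha, which interprets computational tuples back in the original mathematical domains, are notation introduced here):

\[
\text{soundness:}\quad \alpha\bigl(\mathrm{Sol}(T(P))\bigr) \subseteq \mathrm{Sol}(P),
\qquad
\text{completeness:}\quad \mathrm{Sol}(P) \subseteq \alpha\bigl(\mathrm{Sol}(T(P))\bigr),
\]

where P is the original RTSML problem and T(P) is its CLP transformation. The compilation is required to preserve the first inclusion; the second is, in general, given up.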
(In this section, a solution means an ordered tuple of all values of interest, of both input and output variables/system parameters.) Although the loss of completeness is difficult to avoid in practice, it can be controlled in many cases by, for example, changing the degree of approximation of real numbers by integers, and/or resorting to multiple constraint solvers with different capabilities. On the other hand, the developers of compiler modules that handle different real-time models and their integration with other models must ensure that the approach retains its soundness in all anticipated uses. In other words, a CLP program must not positively decide its transformed problem if the original problem has no solution. Besides the obvious need for a correct problem transformation (the main task of the compilation), this issue is related to the power of CLP problem solving.

Generalizing what was discussed in Section 2.2.3, a CLP solver needs enough information in order to return a (correct) solution to the transformed problem. There are correctness nuances, based on the amount of supplied information:

1. The CLP solver syntactically accepts stated constraints even if they exceed its solving power. An example is the FD constraint pair [X, Y] :: 0..10, X * Y #>= 5, because the second constraint cannot be linearized and decided. The result returned consists of the original domains of the variables, and is interpreted as "there might be solutions within the domains," which cannot be considered correct.

2. Some stated constraints do not exceed the solver's power, but the solution is not "good enough." For example, the FD constraints [X, Y] :: 0..10, X * 2 + Y * 3 #>= 5 are decidable, but the solution consists of the original variable domains. It is interpreted as "there are solutions within the domains," which cannot, for example, help linearize an additional constraint like X * 2 #<= 5.

3. By explicitly and significantly reducing the domains of some variables, the solutions to some stated constraints may become "good enough." In FD, this is done by labeling the variables (instantiating them with consistent values); in RIA, by invoking a strong-propagation algorithm to search for a small (bounded) consistent sub-interval of a variable's domain; etc.

The RTSML compiler module object-oriented interface includes a method that is supposed to list all program variables whose domains should be explicitly and significantly reduced in order to guarantee the correctness of solutions to the constraints generated by a module. The reduction is performed by CLP code generated by the compiler core. The following subsection shows excerpts of an example problem transformation and the steps taken to ensure the correctness of the CLP problem solving.

3.5.2 Correctness-Related Compilation Details

To summarize the presentation of the two compiler modules from Sections 3.2.1 and 3.2.2, the meaning of the RTSML specification of the example complex real-time system consists of all solutions to the following constraints (stated informally with reference to the preceding sections, for brevity):

- Each common task is assigned a period, deadline and normalized execution time demand in ms, and is allocated a processor.

- Each common message is assigned a period and deadline in ms, a length in kB, a source and a list of destination tasks.
- Each common route is allocated an ordered tuple of channels, and a source and destination processor.

- Each rms processor is assigned a capacity in idealized MIPS.

- For all common tasks allocated to an rms processor, (1) relative priorities are determined according to the RM criterion, and (2) the schedulability constraint in Equation 3.1 is satisfied.

- For each (message, destination task) ordered pair, a route is allocated from the processor allocated to the message's source task to the processor allocated to the destination task. Without resource usage optimizations, allocating a route to a message means allocating all of the route's channels to the message. The per-hop deadline of a message is equal to the message's deadline divided by the maximum route length over all its destination tasks.

- Each rms channel is assigned a bandwidth in MB/s.

- For all common messages allocated to an rms channel, (1) relative priorities are determined according to the RM criterion, and (2) a schedulability constraint similar to that in Equation 3.1, using the per-hop deadlines, is satisfied.

Both compiler modules use the FD solver only, so all system parameters must be approximated by integers. Referring back to the rms processor-related FD constraints listed in Section 3.2.2, the meaning of the transformed problem consists of all ordered tuples of integers that satisfy FD constraints including the following (again, informally and for illustration purposes):

- If the period of task i is less than or equal to the period of task j, the value of FD variable Task_i_preempts_Task_j is 1; otherwise it is 0.

- A user-defined, non-linear ceiling constraint predicate over FD is defined as ceiling_d(X, Y, Z) :- Z * Y #>= X, Z * Y #< X + Y. (For positive Y, these two linear inequalities force the integer Z to equal the ceiling of X/Y.)

- If task i is allocated to processor k, the value of FD variable Task_i_on_Proc_k is 1; otherwise it is 0.

- The CLP version of the RM schedulability constraint is a non-linear constraint over FD involving integer variables.

The common and rms compiler modules ensure that their variables get properly instantiated by the time a solution is found. Some variables, such as the task periods, are input parameters and are instantiated at the beginning of the search; some are trivially guaranteed to be instantiated, such as the Task_i_preempts_Task_j variables. Others ought to be labeled: solution variables, such as Task_i_Processor, and supplementary variables, such as Arrivals_of_Task_j_on_Proc_k_by_Task_i_Checktime. Finally, some variables are allowed to remain represented by non-singleton domains, such as Task_i_on_Proc_k_Checktime, which neither prevents other constraints from being solvable nor requires a specific value at all.

It is easy to see from the above representative excerpts that, assuming that all the CLP constraints are decidable, the compilation is meaning-preserving in the sense that a solution to the CLP problem corresponds to a solution to the RTSML problem. In other words, the compiler-based approach is not complete, but it is sound.

3.5.3 Evaluation Against the High-Integrity Compilation Criteria

Although the RTSML compilation is not an exercise in high-integrity compilation, the following criteria [105] help summarize this section.

1. The high-level source language must have a target-independent meaning. It must be possible to deduce the logical behavior of any particular program [specification], independent of its execution on a particular target machine.

This property follows from the definition of the purely declarative RTSML semantics.
2. This implies, among other things, that the source language must have a mathematically defined semantics. Otherwise, it is impossible to deduce what the effect of executing a particular program should be.

Although informally stated, the RTSML semantics is mathematically defined via the semantics of the supported real-time models.

3. The target machine language must also have a mathematically defined semantics. Otherwise, it is impossible to prove that the compilation translation is correct.

A CLP language also has a mathematically defined semantics.

4. The compiler from the source to the target language must be correct. Hence it must be derived from the semantics of both the source language and the target machine's language.

The compiler correctness is to be shown for each module, in a way similar, but complete, to that presented in Section 3.5.2.

5. To permit validation, the compiler for a high-integrity language must be seen to be correct. It must be written clearly, and must be clearly related to the semantics.

All compiler modules only transform constraints from RTSML to CLP, and follow the object-oriented compilation protocol to ensure that the CLP solver(s) can decide the transformed problem.

6. The target code produced by the compiler must be clear, and easily related to the source code. This gives the compilation process the visibility that is a requirement for high-integrity applications.

The target code is a text file, and the names of the CLP variables are self-explanatory.

7. The semantics for both the source and target languages must be available for peer review and criticism.

Module-specific RTSML semantics are based on extant real-time models, and the semantics of model integration is also a real-time issue. The semantics of CLP languages is a well-studied issue.

3.6 Summary

An approach to the design and engineering of complex real-time systems, based on CLP and on compilation from a high-level, domain-specific specification language, has been presented. It is oriented toward the integration of various real-time models, applied to computation, communication and I/O. One basic and one rate-monotonic technology-based system model have been implemented and briefly described.

An example complex real-time system, consisting of dozens of tasks communicating and competing for real-time scheduled computation and communication resources, has been specified, and the approach was evaluated on this example. The timings obtained show that the approach has acceptable performance overall and that the repair-based CLP approach is a promising one for coping with dynamic changes in the system parameters. Scalability issues of the CLP problem-solving approaches have been analyzed in combination with scalability issues of target complex real-time systems. Correctness issues of the compiler-based approach have been discussed, such as the completeness of the CLP-based modeling and the soundness of the CLP problem solving.

The following three chapters describe work that covers a run-time front-end of the repair-based CLP approach, namely: (1) a distributed instrumentation system to gather performance data from a complex real-time system; (2) a PAV tool to process and transform the performance data into the input parameters of the real-time system models of the target system; and (3) an integration of the first two: PAV-driven, repair-based dynamic system reconfiguration.
Based on observed changes in the input parameters, either by a human operator or automatically, the CLP tool may be asked to repair the current solution.

Chapter 4

A Portable and Flexible Distributed Instrumentation System

This chapter presents a basic, reference implementation of a portable and flexible IS called BRISK (Baseline Reduced Instrumentation System Kernel). Along with the description, the approaches that provided the IS performance gains are discussed. Section 4.1 describes the major objectives of distributed ISes, and the approaches taken in the design and implementation of BRISK. Section 4.2 details the BRISK architecture and implementation. Results of evaluating its performance and scalability are given in Section 4.3.

4.1 Objectives and Approaches

BRISK is a simple distributed IS designed with the following goals in mind:

- to be portable to a majority of operating systems and platforms, and easily used with a wide range of parallel/distributed applications and systems, and

- to provide a robust implementation base for the development of future distributed monitoring and control systems.

Figure 4.1 depicts the concept of BRISK as a distributed IS kernel that could be extended to support higher-level, more specific monitoring approaches.

Figure 4.1: BRISK as an instrumentation system kernel (kernel: portability, performance knobs, event-based monitoring, time-stamping, event ordering, clock synchronization; domain- and application-specific extensions: source-code instrumentation, monitoring approaches, adaptive intrusion adjustment, performance-information description, experiment control, performance visualization)

The design objectives concretizing the above goals can be divided into three groups:

Performance flexibility. BRISK should offer high performance under various requirements. Simple metrics, related to localized IS performance, were evaluated first, in order to implement, evaluate and optimize the relevant parts of BRISK in isolation. More complex IS performance metrics, related to IS operation on distributed applications, depend on the simple ones, plus the target application and system characteristics. Hence, "tuning knobs" were added to many of BRISK's subsystems, so that users can balance simple and complex IS performance metrics in a specific working environment.

Usage flexibility. BRISK should be able to support different monitoring applications, such as fine-grained instrumentation for testing, performance measurement for visualization and/or steering, etc. It should also be able to emulate various monitoring methods and techniques, such as hybrid (software instrumentation/hardware detection) monitoring for tracing or profiling. Without assuming specific hardware monitoring support, BRISK implements a generic software, event-based monitoring approach that allows (1) new users to start instrumenting their parallel/distributed applications quickly, and (2) then incrementally extend and/or specialize the approach.

On the other hand, the IS should be compatible with a variety of extant, independently-built tools and systems for the analysis of instrumentation data. Rather than impose a quasi-standard of instrumentation data storage upon these tools/systems (and likely, in practice, render them unusable), it should be possible to adapt or extend BRISK to support a particular interface.

Finally, different parallel/distributed applications may require very different instrumentation/experiment scenarios.
It is likely that no fixed IS control protocol and/or graphical user interface can be powerful enough to support arbitrary scenarios for testing these complex systems. All of these issues resulted in the design of BRISK as a general-purpose distributed IS kernel, based on the simple model described in Section 2.4.

Portability. BRISK, as general-purpose distributed software, should be able to operate in heterogeneous environments. This means that it should be designed and implemented assuming a minimal set of resources that are available in a majority of parallel/distributed environments. The basic implementation of BRISK relies on a small number of highly available system libraries, such as those for interprocess communication through shared memory, External Data Representation (XDR), and the TCP/IP protocols. (Shared memory IPC semantics may vary across different platforms, but this is a minor problem. It is important that there is probably no platform that does not support shared memory IPC.)

A number of important issues challenge the concept of a portable and flexible distributed IS. These include:

Degree of intrusion on the application. Due to the instrumentation overhead, a target system may exhibit different behavior. The overhead should be predictable and must not change the order and timing of critical events in the target system, i.e., cause perturbation. (In this dissertation, the term perturbation means only those changes in event timing that are significant from the target system's perspective.) It is desired, especially for real-time systems, that IS components be schedulable with the target system, so that perturbation analyses can be performed using schedulability theory. The BRISK design addresses this issue by defining a via-shared-memory event detection/notification protocol atop which different event generation schemes may be used, and by having the event processing/delivery component run as a separate process.

Global clock reference. Processes that make up a parallel/distributed system run on processors that may have non-synchronized clocks. It is very difficult to determine the precise global time and/or relative timings of events, which is mandatory for determining accurate global states and for debugging erroneous timing behavior based on the instrumentation data. By default, BRISK (1) chooses the best available system clock call for a platform, (2) converts the current local time into the Universal Coordinated Time (UTC) format, and (3) uses a distributed clock synchronization algorithm. In this way, users who can afford synchronized hardware clocks may specialize BRISK for their type of platform. BRISK also provides a tunable algorithm for coping with network delays and minimizing the chance of the instrumentation data being delivered to consumers out of order.

Throughput and latency of the instrumentation data transfer. Events of interest in a target system that are to be processed on-line may collectively form large volumes of instrumentation data and monopolize resources. On the other hand, in time-critical analysis, important events may need to be delivered to a central place as soon as possible. The IS should be able to adapt to the throughput/latency requirements of the target system. The basic BRISK implementation has a number of command-line parameters for this purpose.

Support for transparent monitoring. Adding significant amounts of instrumentation code to parallel/distributed systems by users is subject to errors.
It is important that tools can be built on top of the IS to instrument the target system automatically, so that users need only specify what to monitor, from which aspect, and at which level. The IS design should be flexible enough to allow the development of such tools. This issue actually overlaps with the BRISK usage flexibility objective. The default software event generation is amenable to automation at the source level. Different aspects of the target system's performance may be dynamically monitored after a tool has instrumented its source code, possibly with the user's guidance.

4.2 Description of BRISK

This section details the BRISK architecture and implementation.

4.2.1 Architecture

The architecture of BRISK is shown in Figure 4.2 (for less intrusion, a separate IS network is preferred, but it is not necessary). On each node, multiple user processes are instrumented using internal sensors. The internal sensors use cpp macros, described later in Section 4.2.2, to write instrumentation data records to shared memory. The shared memory is read by an external sensor, which runs as another process on the same node and may be assigned a lower priority. Both the internal sensors and the external sensor (EXS) form a local instrumentation system (LIS) that sends instrumentation data to the instrumentation system manager (ISM). The time-stamps embedded into the instrumentation data records by the internal and external sensors on different nodes are synchronized. BRISK synchronizes the LIS clocks using a modification of Cristian's clock synchronization algorithm. This algorithm invokes a master-slave strategy whereby a master polls the slaves, determines the differences between the values of its clock and the slaves' clocks, and updates the slave clocks.

Figure 4.2: Architecture of the BRISK instrumentation system (APP = application, INS = internal sensor, EXS = external sensor, ISM = instrumentation system manager, PGRT-IE = integration environment; the diagram distinguishes instrumentation data, control, and application communication flows, with temporal and causal ordering performed in the ISM)

The instrumentation data transfer protocol used between a LIS and the ISM is based on XDR, which makes BRISK amenable to a heterogeneous environment. In the ISM, the instrumentation data records are sorted on-line by time-stamp before delivery to consumers. Special, causally-related events are additionally ordered, possibly overriding incorrect time-stamps. The default output mode of the ISM is writing to a shared memory, which is then read by instrumentation data consumer tools. Besides writing to shared memory, the BRISK ISM may log instrumentation data to trace files in the PICL-compliant [122] ASCII format, or it may pass instrumentation data to a list of CORBA-enabled visual objects [6].

4.2.2 Implementation

BRISK is written using the FWEB [38] literate programming package and may be compiled by a C or C++ compiler to create two executables, exs (the external sensor) and ism (the instrumentation system manager); a header file with dynamically-typed NOTICE macros; a utility tool for creating faster and shorter, statically-typed NOTICE macros; and a tiny library for shared memory initialization and access. Figure 4.3 shows details of the implementation described below.

The main BRISK subsystems and components are described below. The presentation order follows the flow of instrumentation data from its source (the LIS) to its destination (the output of the ISM).
The LIS time-stamps events with the help of the distributed clock synchronization algorithm. The transfer protocol forwards the instrumentation data to the ISM, for on-line sorting and output.

Local instrumentation server (LIS). The BRISK NOTICE macros are internal sensors used in an application for event notifications. They are an extension of the JEWEL cpp macros, which write a data record consisting of integers to a ring-buffer data structure in shared memory.

Figure 4.3: Diagram of the BRISK basic implementation (instrumented applications write NOTICE records to shared memory; each local instrumentation system applies batching and latency control and transfers data in order to the instrumentation system manager, which performs on-line sorting (OLS) over a time-stamp-ordered heap, causally-related event (CRE) matching and event dropping, and delivers data to shared memory, trace files, or CORBA-based consumer tools)

The NOTICE macros are capable of writing heterogeneous records, with over ten basic types available for individual fields, ranging from bytes, to floats, to null-terminated strings. It is possible to modify certain parts of the NOTICE macros to affect the degree of intrusion and perturbation on the target application. The potential modifications, which are not part of the basic implementation, are based on the following.

- Regarding the intrusion, it can be lowered by, for example, partially evaluating the NOTICE macros in advance (as described at the beginning of this section), or by dropping events in certain cases. Namely, the NOTICE macros use a simple ring-buffer locking mechanism to allow multiple instrumented programs to run on the same node; this mechanism could be busy-waiting, blocking, or causing a NOTICE macro to be dropped if the lock is not free.

- Regarding the perturbation, the described lowering of the intrusion of the NOTICE macros is one way of avoiding it in an instrumented target application. Sometimes, a low level of perturbation in an instrumented target application may be acceptable; however, the perturbation issue may reappear if the instrumentation code is removed from the target application after testing. To avoid this reversed perturbation, the NOTICE macros may be slightly modified and left in the target application code. For example, they could be modified to write event data records to private memory when the application is not monitored, so as to take approximately the same time to execute; the drawback of this software approach is a change in memory reference patterns.

With the basic implementation, it is also possible to locally activate/deactivate event detection by dynamically toggling the value of a BRISK variable that is in the scope of the target application source code. Besides this simple option, BRISK provides a generic, flexible scheme for on-line remote control of the instrumentation and/or steering of the target application. Instrumentation data consumers can send data through BRISK in the other direction; these data are written to a shared memory area accessible to the EXS and the target application. The organization and interpretation of these data in the shared memory area are to be defined in a contract between the target application and a monitoring/steering tool, and can be easily automated.
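To make the shared-memory path concrete, the following C sketch shows the kind of locked ring-buffer write an internal sensor performs. It is a simplified illustration under stated assumptions, not BRISK code: shm_ring_t, ring_try_lock and ring_unlock are invented names, and the real macros additionally write typed, aligned fields and time-stamps as described in the surrounding text.

/* Illustrative sketch of an internal sensor's shared-memory write path;
 * names and layout are invented here and are not the actual BRISK
 * NOTICE macro code. Assumes len <= shm->size; handling of a reader
 * that has fallen behind is omitted. */
#include <string.h>

typedef struct {
    volatile int lock;   /* lock word shared by all instrumented processes */
    unsigned     head;   /* next write offset into buf[]                   */
    unsigned     size;   /* capacity of buf[] in bytes                     */
    char         buf[1]; /* ring storage; allocated larger than 1 byte     */
} shm_ring_t;

int  ring_try_lock(volatile int *lock);  /* hypothetical test-and-set lock */
void ring_unlock(volatile int *lock);

/* Returns 1 if the record was stored, 0 if it was dropped because the
 * lock was busy (the drop-if-not-free policy mentioned above). */
int notice_write(shm_ring_t *shm, const void *rec, unsigned len)
{
    if (!ring_try_lock(&shm->lock))
        return 0;
    const char *src   = (const char *)rec;
    unsigned    off   = shm->head;
    unsigned    first = (len < shm->size - off) ? len : shm->size - off;
    memcpy(shm->buf + off, src, first);          /* contiguous part       */
    memcpy(shm->buf, src + first, len - first);  /* wrapped tail, if any  */
    shm->head = (off + len) % shm->size;
    ring_unlock(&shm->lock);
    return 1;
}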
Besides the data types for event data, three system types are available for coordination among BRISK, instrumented distributed applications, and instrumentation data analysis tools. A "time-stamp type," X_TS, allows the user to embed BRISK's internal time-stamp into an event record. The embedded time-stamp is an eight-byte longlong_t, representing the number of microseconds of Universal Coordinated Time (UTC). The local time is read by a call to the gettimeofday library function embedded in the NOTICE macro code. (If gettimeofday is not available on a platform, a substitute should be provided.) The read value is then added to a correction value maintained by the EXS. The result is a time-stamp of the synchronized EXS clock, embedded into the record before it is sent to the ISM. The other two system types, X_REASON and X_CONSEQ, are used to mark causally-related events. The user supplies u_long identifiers for fields of these types, determining which consequence events must follow their respective reason events. If BRISK's clock synchronization algorithm fails to prevent the occurrence of so-called tachyons, i.e., consequence events that appear to happen before their reason events, events marked using these system types will be post-processed by the ISM to ensure proper ordering. Without additional information, it is not possible to handle more than pairs of causally-related events. The technique described in Section 2.4 uses other event data records, specific to the PICL library, to gather information about causally-related send/receive events in collective communication.

An example NOTICE macro call is shown in Figure 4.4 (to save space, the macro code itself is not described). It writes an event record to the shared memory with the structure shown in Figure 4.5. All fields are aligned according to their types.

char *c = "Job XYZ";
float percent_done = 58.7;
NOTICE_3(X_TS, TIMESTAMP, X_STRING, c, X_FLOAT, percent_done);

Figure 4.4: An example three-field NOTICE macro call (internal sensor)

8-byte internal time-stamp (ITS) in UTC format
1-byte X_TS type (not followed by a value; ITS will be used instead)
1-byte X_STRING type
null-terminated "Job XYZ" (8 bytes, assuming 1-byte chars)
1-byte X_FLOAT type
sizeof(float)-byte 58.7
1-byte end-of-record/end-of-buffer marker

Figure 4.5: In-memory structure of the event record generated by the call in Figure 4.4

A goal is to provide the convenience of dynamic typing to new users and, at the same time, retain the form of a macro for lower intrusion. The header file contains NOTICE macros for up to eight dynamically-typed fields, which are likely to be sufficient for many uses. More than eight dynamically-typed fields in a macro adds excessive code to a compiled instrumented application, which indirectly increases the intrusion. A utility tool accompanying the basic implementation is provided to create custom NOTICE macros having user-defined field types and to insert them into the header file. This tool effectively supports an on-demand partial evaluation/specialization of NOTICE macros that results in smaller and faster code, and thus lower intrusion. It exemplifies the flexibility of BRISK and can be used as a starting point for the other modifications of the NOTICE macros mentioned above.
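The X_TS computation lends itself to a few lines of C; this sketch follows the description above (gettimeofday plus an EXS-maintained correction), but the variable names and the way the correction value is published are assumptions, not BRISK's actual interface:

/* Sketch of X_TS time-stamp generation: local time in microseconds of
 * UTC, plus the correction maintained by the EXS (assumed here to be
 * readable from shared memory as exs_correction_us). */
#include <sys/time.h>

extern volatile long long exs_correction_us;  /* hypothetical location */

long long notice_timestamp_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, 0);  /* substitute a better clock call if available */
    long long us = (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
    return us + exs_correction_us;  /* value on the synchronized EXS clock */
}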
Distributed clock synchronization algorithm. Each LIS takes part in the distributed clock algorithm. The main difference between the BRISK algorithm and Cristian's algorithm is that the master (ISM) time is used only as a common reference point for computing the relative skews of the slave (EXS) clocks. That is, instead of directly adjusting the EXS clocks based on their skews relative to the ISM clock, as in Cristian's algorithm, the BRISK algorithm only considers the relative differences between the EXS clock values for adjustment purposes. This modification is due to the fact that, for measurement purposes, it is important that the EXS clock values be as close to each other as possible, while it is not necessary for them to be close to the ISM clock value. At the cost of small positive drifts of the EXS clocks, the algorithm described below converges faster than the original Cristian's algorithm. Other minor differences between the two algorithms are (1) the criterion for measurement discards, and the absence of some optimistic enhancements in the basic BRISK implementation, such as (2) dynamically changing the number of queries in a round based on the observed probability of a successful query, and (3) the inclusion of the EXS hardware clock drift in the adjustment calculation.

First, the EXS clock with the maximum positive skew relative to the ISM clock, i.e., the most-ahead clock, is selected, based on polling as in Cristian's algorithm. Then, the skews of the other EXS clocks, and their average, are computed relative to the selected EXS clock, as absolute values. Finally, only the EXS clocks whose relative skews are greater than the average, i.e., only those slower than the average, are advanced by a correction value. The purpose of this restriction is to account for network noise and, in a conservative manner, take care not to erroneously promote another EXS clock as the fastest one. The price of this decision is potentially slower convergence. The correction value is chosen as follows: if the average value is above a small threshold (e.g., 3 times the number of EXSes, in microseconds), the correction value is set equal to the relative skew of the corresponding EXS clock; otherwise, the correction is set to a fixed fraction of the relative skew, 0.7 in the basic implementation. This reduction of the correction value is also conservative in nature, because the EXS clocks cannot be perfectly synchronized in practice. Additionally, the algorithm was modified to detect and discard outliers occurring in a single polling period, up to a predefined number. This modification accounts for anomalies, such as unexpectedly large clock value differences obtained during the polls.
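The correction rule just described can be written out in C. The sketch below encodes the stated policy (advance only the clocks slower than the average of the absolute relative skews; full catch-up when the average exceeds roughly 3 microseconds per EXS; a 0.7 fraction otherwise); the data layout and function name are invented for the example:

/* Sketch of a BRISK-style relative-skew correction round.
 * skew[i] = |clock_i - clock_ref| in microseconds, where clock_ref is
 * the most-ahead EXS clock selected by polling; correction[i] is the
 * advance applied to EXS i's clock. Names and layout are illustrative. */
static void adjust_exs_clocks(const long long *skew, long long *correction,
                              int num_exs)
{
    long long sum = 0;
    for (int i = 0; i < num_exs; i++)
        sum += skew[i];
    long long avg = num_exs ? sum / num_exs : 0;
    long long threshold = 3LL * num_exs;  /* microseconds */

    for (int i = 0; i < num_exs; i++) {
        if (skew[i] <= avg)                /* faster than average: leave it */
            correction[i] = 0;
        else if (avg > threshold)          /* large average skew: catch up  */
            correction[i] = skew[i];
        else                               /* conservative partial catch-up */
            correction[i] = (long long)(0.7 * (double)skew[i]);
    }
}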
Transfer protocol (TP). For instrumentation data transfer between the LIS/EXS and the ISM, the XDR protocol has been chosen in the basic BRISK implementation. BRISK's data transfer protocol does not, however, use XDR in the typical way, with rpcgen and static typing, as in JEWEL. Instead, each dynamically-typed instrumentation data record is sent with a meta-information header needed for it to be correctly received. The external sensor packages instrumentation data in XDR format with the meta-information header compressed, and sends them to the ISM over a TCP stream socket. Minimizing the slack in instrumentation data messages is important, since transferring the likely large volumes of event records through the network is several orders of magnitude slower than through shared memory.

Instrumentation system manager (ISM). When the ISM receives an instrumentation data batch from an EXS, it stores it in the corresponding queue; the in-order arrival of these batches is guaranteed by the socket stream protocol. For the dynamic merging/on-line sorting and extraction of instrumentation data records from multiple queues, the ISM uses a heap having one entry per queue. Each instrumentation data record, after being extracted from the ISM's heap, is written to a shared memory buffer using the same binary structure used by the NOTICE macros. Optionally, a PICL-compliant trace record can be generated, with the time-stamps either in the UTC format or as the floating-point number of seconds elapsed since the ISM was started. The visual objects mentioned in Section 4.2.1 are components of an object-oriented framework for the development of on-line performance visualization of parallel/distributed systems. Through an optionally linked, portable implementation of CORBA 2.0 called MICO [94], the ISM can call remote visual objects' methods and pass them instrumentation data records to be processed as PICL-compliant strings. Other consumers can read the ISM's shared memory buffer, e.g., using BRISK library code that creates such strings.

On-line sorting algorithm. Before the ISM finally delivers instrumentation data to their consumers, it uses the synchronized embedded time-stamps, the current time, an estimated clock difference between the ISM and the most-ahead EXS, and a user-specified time frame T to delay the processing of each instrumentation data record for T time units after the record's creation. If the ISM detects that two successive records from different external sensors have been extracted from the heap out of order, it increases the time frame; it then exponentially decreases the time frame to reduce the latency of event processing and the amount of instrumentation data in the queues. This method of sorting results in a tradeoff between event ordering and latency.

Additionally, causally-related events are matched via a hash-table: if a consequence event record being processed does not match a reason event record with the same identifier in the hash-table, it is kept in a data structure until the corresponding reason event record is processed. When a just-arrived reason event record matches a waiting consequence event record whose time-stamp is smaller than its own, the latter's time-stamp is overridden by a larger value. Since in this case it is obvious that the clocks have not been synchronized, i.e., the time-stamps should reflect the causality, an extra round of the clock synchronization algorithm is invoked immediately. A causally-marked event of either type is kept in the data structure no longer than a specified timeout, because its peer may have been dropped.

The on-line sorting algorithm assumes that the EXS clocks are perfectly synchronized. Since this is not achievable in practice, tachyon occurrences are possible if the EXS clock value difference is greater than the time taken between the causally-related events, e.g., the time it takes to send a message between two nodes. (For example, when node A receives a message from node B, the event of receiving the message is causally related to the event of sending the message. Node A's clock could be t1 units behind node B's clock, but it also takes t2, possibly smaller than t1, time units for the message to arrive.) Tachyons can pass through the ISM if causally-related events in the target application are not marked using BRISK. In fact, instrumenting some causally-related events using BRISK may help BRISK maintain better EXS clock synchronization. (Note that users may define application-specific causally-related events that are not necessarily send and receive events. However, there is a danger that incorrect causality assumptions may deteriorate the BRISK clock synchronization.) The extra synchronization rounds would, in turn, reduce the probability of tachyon occurrences resulting from other causally-related events.
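Before moving to the evaluation, the adaptive window at the heart of the on-line sorting algorithm can be sketched as follows; the concrete growth and decay factors, the bounds and the names are assumptions, since the text above fixes only the direction of the adjustments (increase on an out-of-order extraction, exponential decrease otherwise):

/* Illustrative update of the ISM's sorting window: each record is
 * delayed time_frame_us microseconds past its creation time before
 * delivery. The factors 2 and 9/10 and the bounds are invented here. */
static long long time_frame_us = 10000;  /* initial window */

void update_time_frame(int out_of_order_detected)
{
    if (out_of_order_detected)
        time_frame_us *= 2;              /* ordering was violated: back off */
    else if (time_frame_us > 1000)
        time_frame_us = time_frame_us * 9 / 10;  /* exponential decay */
}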
4.3 Evaluation of BRISK

Table 4.1 summarizes the relationships between (1) the design criteria in Section 4.1, and (2) the actual BRISK architecture and implementation presented in the previous section, together with the experimental results presented in this section. The relevance of a result with respect to a criterion can be low (L), moderate (M), high (H) or none (-). Similarly, the success of a result in satisfying the corresponding criterion can be poor (P), fair (F), high/with special attention (H) or not applicable (-).

Table 4.1: Summary of BRISK evaluation (entries are Relevance/Success)

Design criteria          LIS    TP     ISM    NOTICE EXS CPU Event      Event   Clock  On-line
                         design design design timing usage   throughput latency sync   sorting
Performance flexibility  H/H    M/F    H/H    H/F    H/H     H/F        H/H     H/H
Usage flexibility        H/H    M/F    H/H    M/F    L/F     M/H        M/F     H/H    H/F
Portability              H/H    H/H    H/H    -      -/-     -/-        -/-     H/H    M/H
Low intrusion            H/F    H/F    L/F    H/F    H/H     H/F        M/F     L/F    L/-
Global clock reference   L/F    L/-    L/F    M/F    M/F     M/F        H/F     H/H    H/H
Throughput and latency   M/F    H/H    H/H    L/F    M/F     H/H        H/F     L/-    L/-
Transparent monitoring   H/H    L/-    M/H    L/-    L/-     L/-        L/-     L/-    L/F

The table entries containing at least one "H" follow from the preceding and following presentations. The "H/F" entries indicate aspects in which the basic implementation should be specialized in order to satisfy the criteria in extreme cases of target system behaviors and/or monitoring requirements. The "M/F" and "L/F" entries, based on the available knowledge, may also imply rare use cases in which BRISK may need a design or implementation modification in order to satisfy a criterion that is negatively affected by the basic implementation.

Experiments have been conducted with BRISK using two configurations. The first configuration consists of one external sensor; the ranges of several simple performance metrics were measured. The second configuration includes multiple external sensors on different nodes; the system scalability was measured, as well as the quality of the clock synchronization and of the dynamic on-line sorting. In both configurations, loop-based synthetic applications were used that were instrumented using NOTICE macros having six fields of type integer. Including the time-stamp and type information, each instrumentation data record requires 40 bytes in the XDR-based transfer protocol. The experiments were executed primarily on Sun Ultra-1 workstations running Solaris 2.5.1 within a 155 Mbps local ATM network. Eight nodes were available for measurements under the second configuration.

4.3.1 Local Performance

NOTICE macro (internal sensor) timing. The CPU time taken by a six-integer NOTICE macro to write to shared memory is given in Table 4.2 for three platforms.
The experiments were performed with 100,000 macro calls. The timings compare well to those of the simpler, low-intrusion JEWEL macros. Hence, with the options of NOTICE macro modifications for further reducing the degree of intrusion and the possibility of perturbation on the application, BRISK approaches an optimum that can be achieved with software, portable event generation and detection.

Table 4.2: CPU time per 6-integer NOTICE macro (per-call timings in us; the platforms include a 150 MHz SGI Onyx)

<data-field name="key">
<data-field name="value">
<variable name="currenttime" type="real">
<variable name="assoclist" type="list">
<variable name="palette" type="list">
<control-structures>
<variable name="beepcount" type="int" init="0">
(define (beep) (display #\Bel))
</utility-code>
<event-processing>
<ep-component name="onescalarprocess" inputs="onescalar.key.value"
              infos="currenttime assoclist">
<info-rendition>
<ir-component name="lineplotrender" views="lineplotview"
              infos="currenttime assoclist palette" controls="beepcount">

(a) Higher-level elements    (b) Relations among elements of a VOML specification

Figure 5.4: A brief description of VOML

Scheme is a standardized language with simple syntax and clean semantics that is very suitable for describing the semantics of EP and IR components. Hence, a decision was made to embed Scheme into VOML markup. Combining markup and a programming language, typically Java in WWW-related markup languages, is not a new idea. However, the integration of VOML and Scheme is tighter, as can be seen from the code example in Figure 5.5. Unlike script-augmented HTML files, which are final documents to be "executed," VOML specifications are to be compiled.

Namely, within the Scheme code defining the semantics of a component, there may exist references to "formal parameters": info structures, control structures, events (in EP components) or views (in IR components).

<description>
This IR component draws a line-plot of multiple scalars over time, in
the supplied view (^0). Only lines with the last-update time equal to
the current time are drawn. Once 10 lines have been drawn, a short
sound (beep) is generated. The info structures consist of the current
time ($0, a non-negative real number), a multi-scalar association list
($1, indexed by non-negative integer keys), and a color palette ($2, a
list of strings -- color names). Each value in the association list is
a 4-element vector: #(old-time old-value new-time new-value). The key
of each value in the multi-scalar association list is used to index the
color. When all colors are exhausted, the line thickness is increased
to distinguish between different scalars. A counter is used as a
control structure (%0) for generating sounds.
</description>
(let ((palette-len (length $2)))
  (alist-for-each
    (lambda (scalar-id scalar)
      (if (= $0 (vector-ref scalar 2))
          (begin
            (set! %0 (+ %0 1))
            (if (= %0 10)
                (begin (beep) (set! %0 0)))
            (line view="^0"
                  from="(vector-ref scalar 0) (vector-ref scalar 1)"
                  to="$0 (vector-ref scalar 3)"
                  thick="(+ (quotient scalar-id palette-len) 1)"
                  color="(nth (modulo scalar-id palette-len) $2)"
                  adapt="yes" clip="margin">)))
    $1))

Figure 5.5: Code of the IR component used as lineplotrender in Figure 5.4b

The reference notation is Tn[/m], where

- T is $ for info structures, % for control structures, and ^ for events and views;

- n is the position of the formal parameter in the corresponding parameter list (e.g., $0 corresponds to currenttime in the last line of Figure 5.4b, because it is the 0-th argument supplied via the infos attribute);

- the optional /m is used for referencing individual fields of an event. (Currently, VOML only supports PICL [122] compliant events, i.e., lists whose first two elements are integers that determine the record and event type.)
For example, an occurrence of ^0/1 within the code of EP component onescalarprocess in Figure 5.4b would reference the field value of the data event onescalar.

The info and control structures are translated into special global variables by the compiler. Effectively, they are "passed" to EP and IR components by reference when listed in the infos and controls attributes of the enclosing VOML element. In this way, a reusable component may be written, tested, and placed into a library. In an SGML system, such components may be kept as external SGML entities and used in VOML specifications of different HLVOs by simply referencing them by name.

EP components tend to be application-specific, as they process application-specific event records. To make them more reusable, the element preprocess-inputs is provided, which allows for specifying "glue logic" (as Scheme expressions) for data events specific to a new application. Namely, before an existing EP component is referenced (i.e., used), any fields of the data events it processes may be arbitrarily preprocessed. For example, an EP component that updates info structures for a simple line-plot visualization (e.g., the Scheme code just under <ep-component name="onescalarprocess" ...> in Figure 5.4b, which updates the info structures to be rendered by the IR component code in Figure 5.5) can be used to visualize the frame rate of a multimedia application. The glue logic in this case could be a function that divides the number of frames received in a time interval by the length of the time interval, and whose result would be assigned to the second field (named value in the case of the default data event onescalar). (Assume that the number of frames is contained in a data event field, and that the time interval is kept in an info structure.)

Similarly, different library IR components that are parameterized may be combined in interesting ways over a number of views. Additionally, they may be given attributes to determine their higher-level behavior. (Currently, this behavior is supported in the HLVO implementation by auxiliary Scheme code; it might also be supported by a drawing-optimizing LLVO class.) One such attribute is named refresh, which currently can have any combination of the values resize, rescale and update. If either of the first two values is used for an IR component, the component will redraw its contents if any of the views it draws to gets resized or rescaled. This is useful for raster-based LLVO class implementations, where resizing or rescaling an image is lossy. If update is used, the component will undraw what it drew the last time, before proceeding to render the contents of the info structures again. Certain higher-level behavior, which would by default ignore any control structures (e.g., the effect of update does not depend on the actual IR component code), can also be controlled by an enable attribute that takes a Scheme expression evaluating to a Boolean value. When combining IR components, the HLVO developer may define their execution order.

5.2.3 The VOML Compiler

The VOML compiler is built on top of an SGML transformation library called STIL [99] and consists of the following components.
Figure 5.6: VOML compilation and execution process diagram (the SGML parser, with its entity manager, reads the SGML declaration, the VOML document type definition, a catalog and external entities, and produces a parse tree in ESIS format; the VOML compiler, built on top of STIL, turns the tree into an HLVO executed under GUILE)

SGML parser. The sgmls parser [25] is used as the front-end that parses an SGML declaration, the VOML DTD, and the external entities used in a VOML specification of an HLVO.

STIL library. This library is written for the clisp [48] implementation of Common Lisp with CLOS. It allows traversing a parse tree created by the SGML parser, and defining "hooks" (semantic actions) that are called during the traversal.

VOML validating parser. One part of this component consists of the hooks called by the STIL library. The other part consists of CLOS objects that contain code and other information relevant to the EPIRA components of the HLVO specification being compiled. The hooks process the VOML elements (including their contents in Scheme), their relations and attributes, and build the CLOS objects.

VOML code generator. This component "tangles" the plain, application- and visualization-specific Scheme code from a VOML specification with code that it generates for integration with the run-time environment. The latter includes a graphical user interface for accessing and modifying selected info and control structures, managing views, registering VOs with routines that supply data events, etc.

Figure 5.6 shows the compilation and execution process of a VOML specification in the GUILE-based environment. Processes are shown as oval rectangles, while input, intermediate and output files are shown as rectangles. Solid lines denote the process of file inclusion, while dashed lines denote references in VOML and Scheme files.

While an SGML parser uses an entity manager to find components of a document which are referenced as external entities (such as library EP and IR components) within its virtual storage system, the SGML standard itself does not specify how to implement one. A WWW-enabled entity manager would further enlarge the PAV information infrastructure and facilitate automated monitoring and PAV of globally distributed applications. Figure 5.7 shows an example of how component definitions could be fetched off a WWW site by the entity manager, to be included for compilation. In the example, the vendor of an imaginary software product, whose performance is to be visualized, keeps the latest implementations of an EP and an IR component for the product, ready to be used in an HLVO specification. (It is assumed that the information about the components' interfaces is available.)

<!DOCTYPE VOML PUBLIC "-//MSU-PGRT//DTD VOML 1.0//EN" [
<!ENTITY SoftwareXYZep SYSTEM "http://vendor.com/voml/XYZep.voml">
<!ENTITY SoftwareXYZir SYSTEM "http://vendor.com/voml/XYZir.voml">
]>
<voml>
<event-processing>
<ep-component name="XYZep" inputs="mydata1.f1.f2 ..." ...>
&SoftwareXYZep;
</ep-component>
<info-rendition>
<ir-component name="XYZir" views="myview1 ..." ...>
&SoftwareXYZir;
</ir-component>

Figure 5.7: Sketch of a VOML specification that uses remote component definitions

5.3 The VOML Specification of a Simple Visual Object

In this section, the main parts of the VOML specification of a simple VO, with a view similar to the last one of the VO shown in Figure 5.2, are presented and commented on.
The VO receives performance data events from a distributed application, generated whenever a node is (1) added or (2) removed, and periodically to carry profile data from each node. The event declaration section is shown in Figure 5.8. The record and event types of the first two events are taken from the PICL specification [122], while the profile event belongs to an extension of PICL. When some fields are skipped (i.e., ignored), the index attribute is used to specify the position of the next declared field.

    <event-declarations>
      <data-event name="addnode" rtype="pg-entry" etype="-901">
        <data-field name="ts" type="int">
        <data-field name="node-id" type="int">
      </data-event>
      <data-event name="rmnode" rtype="pg-exit" etype="-901">
        <data-field name="ts" type="real">
        <data-field name="node-id" type="int">
      </data-event>
      <data-event name="node-prf" rtype="entry" etype="3141">
        <data-field name="ts" type="real">
        <data-field name="node-id" index="3" type="int">
        <data-field name="node-type" index="5" type="int">
        <data-field name="rkbps" type="real">
        <data-field name="tb" type="real">
        <data-field name="used" type="real">
        <data-field name="fps" type="real">
        <data-field name="packets" type="int">
        <data-field name="pack-used" type="int">
      </data-event>
    </event-declarations>

Figure 5.8: Event declarations

The info and control structure specifications are shown in Figure 5.9. The info variable numofnodes (although redundant) keeps the current number of communicating nodes; nodes is an association list that keeps the previous and current profile of each node; nodeno keeps the (non-negative) node id from the latest profile event. The control variable nodechange indicates whether the number of nodes has changed (meaning that the view has to be updated, as will be seen later).

    <info-structures>
      <variable name="numofnodes" type="int">
      <variable name="nodes" type="list">
      <variable name="nodeno" type="int" init="-1">
    </info-structures>
    <control-structures>
      <variable name="nodechange" type="boolean">
    </control-structures>

Figure 5.9: Info and control structures

Next comes the utility code section, which is omitted for brevity. It contains the function getfontname, which returns a font name from a list of available fonts, given some hints. Besides, the functions id-get, id-put and id-rem, which are used to manipulate the nodes association list, may be defined in this section (in this case, they are defined in and exported from another module, available in the run-time environment).
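Although the actual definitions live in that separate module, their assumed behavior is simple; a minimal Guile Scheme sketch (not the module's actual code) could be:

    ;; Association-list helpers used by the EP components below.  id-get
    ;; returns the (id . value) pair for a node identifier, id-put adds
    ;; or replaces it, and id-rem removes it.
    (use-modules (srfi srfi-1))   ; for filter in older Guile versions

    (define (id-get lst id) (assv id lst))
    (define (id-rem lst id)
      (filter (lambda (pair) (not (eqv? (car pair) id))) lst))
    (define (id-put lst id val)
      (cons (cons id val) (id-rem lst id)))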
The view initialization section is shown in Figure 5.10. The (only) view, BU-View, is 700 by 700 pixels large, and through it a rectangle in the world coordinate system from (-10, -10) to (110, 120) is visible. In this example, the view neither scrolls nor zooms. The control variable nodechange is set to true only to trigger the drawing of the switch in the beginning.

    <view-initializations>
      <view name="BU-View" window="700 700" world="-10 110 -10 120"
            controls="nodechange">
        <description>Switch, nodes and bandwidth utilizations</description>
        (set! %0 #t)
      </view>
    </view-initializations>

Figure 5.10: View initialization

There is one EP component for each event, although one could implement, for example, only one for all three events (depending on the desired granularity when creating a component library). They are shown in Figure 5.11 as updating the info and control structures according to the event declarations. Fields of an event are listed using a notation in which the event name is followed by some of its fields' names, delimited by periods.

    <event-processing>
    <ep-component name="rmnode" inputs="rmnode.ts.node-id"
        infos="numofnodes nodes nodeno" controls="nodechange">
    <description>Remove a node</description>
    <input name="'0">
      (set! $1 (id-rem $1 '0/1))
      (set! $0 (- $0 1))
      (set! $2 -1)
      (set! %0 #t)
    </input>
    </ep-component>
    <ep-component name="addnode" inputs="addnode.ts.node-id"
        infos="numofnodes nodes nodeno" controls="nodechange">
    <description>Add a node, reset the infos</description>
    <input name="'0">
      (set! $1 (id-put $1 '0/1
                 (cons (vector '0/0 '0/1 0 0 0 0 0 0 0)
                       (vector '0/0 '0/1 0 0 0 0 0 0 0))))
      (set! $0 (+ $0 1))
      (set! $2 -1)
      (set! %0 #t)
    </input>
    </ep-component>
    <ep-component name="nodeprofile"
        inputs="node-prf.ts.node-id.node-type.rkbps.tb.used.fps.packets.pack-used"
        infos="numofnodes nodes nodeno">
    <description>Update a node's infos</description>
    <input name="'0">
      (let ((old-info (cdr (id-get $1 '0/1)))
            (new-info (vector '0/0 '0/1 '0/2 '0/3 '0/4 '0/5 '0/6 '0/7 '0/8)))
        (set! $1 (id-put $1 '0/1 (cons old-info new-info))))
      (set! $2 '0/1)
    </input>
    </ep-component>
    </event-processing>

Figure 5.11: Event processing components

In the nodeprofile EP component, all the event fields declared above are used. It is not necessary to use them all, nor in the same order as declared; the 'm/n notation uses the order(s) given in the inputs attribute. It can be seen that the field node-id is used as the key, and that the value field in the nodes association list is a pair of vectors keeping the previous and current profile of a node. In this example, only the current profile will be used, but in a more complex VO both the previous and the current one may be needed.

Finally, the information rendering section consists of two IR components. The nodes-ir IR component, which is shown in Figure 5.12 and will be executed first, writes text and draws the switch and as many "PGRT globes" around it as there are active nodes, connected with the switch via thick red lines. (In the prototype implementation of the VOML compiler, the execution order of the IR components is opposite to the order in which they appear in a specification.)

    <ir-component name="nodes-ir" views="BU-View" infos="numofnodes"
        controls="nodechange" refresh="update resize" buffer="yes"
        enable="%0">
    <description>Switch, nodes, connections</description>
    (let* ((width (list-ref viewinfo 5))
           (height (list-ref viewinfo 6))
           (size (inexact->exact (max (/ width 40) (/ height 40))))
           (font (getfontname "fonttable" "courier" size "bold")))
      <text view="'0" coords="50 107" halign="center" font="font"
            fcolor='"black"' content='"Bandwidth Utilization"'>
      <figure view="'0" filename='"bggif/switch.gif"' orig-origin="0 0"
              orig-extents="0 0" world-origin="45 45" world-extents="10 10">
      (let* ((nodenum (- $0 1))
             (step (if (gt $0 0) (/ 6.28 nodenum) 0)))
        (if (gt nodenum 0)
            (let loop ((num nodenum))
              (let* ((angle (* num step))
                     (sine (sin angle))
                     (cosine (cos angle)))
                <figure view="'0" filename='"bggif/node.gif"'
                        orig-origin="0 0" orig-extents="0 0"
                        world-origin="(+ 45 (* 45 sine)) (+ 45 (* 45 cosine))"
                        world-extents="10 10">
                <line view="'0" from="(+ 50 (* 6 sine)) (+ 50 (* 6 cosine))"
                      to="(+ 50 (* 39 sine)) (+ 50 (* 39 cosine))"
                      color='"red"' thick="12">
                (if (gt num 0) (loop (- num 1))))))))
    <end-with>(set! %0 #f)</end-with>

Figure 5.12: Template IR component
The enable attribute specifies that this IR component should be executed whenever the number of nodes has changed. The refresh attribute adds that everything the IR component drew last time should be redrawn when the view BU-View is resized. It also specifies that everything the IR component drew last time has to be undrawn before something new is drawn (whenever the IR component is enabled). The buffer attribute is used to make the IR component draw in memory only until it is done, and then flush the contents of the memory to the screen. This is useful for making the rendering smoother and faster when there are many graphical objects to be drawn. This IR component resets the nodechange control variable in its second phase, so that other IR components that depend on the value of this variable may be added safely. In Scheme, the HLVO prototyping language embedded in VOML, this second phase is implemented using the delay and force primitives (a sketch of this idiom is given at the end of this section). This allows for mixing the first- and second-phase IR code as if there were only one phase, and avoids the write-after-read hazard at a high level. (Care must be taken when there are circular/mutual dependencies among control structure updates.)

The other IR component is executed each time an event is received, and it draws a thick blue line on top of a thick red line (drawn in the same place as the thick red line drawn by the previous IR component, so that it can effectively be undrawn), showing the relative bandwidth used by a node (i.e., a portion of the received bandwidth; the line is drawn after a profile event was received that resulted in setting the nodeno info variable to a non-negative value). If the node is the server, it draws a thick green line instead of a blue one, showing the relative bandwidth consumed by the server. The IR component code is shown in Figure 5.13. The refresh attribute treats the resizing of the view the same as above.

    <ir-component name="bu-ir" views="BU-View"
        infos="numofnodes nodes nodeno" refresh="resize">
    <description>Bandwidth utilization</description>
    (if (gt $2 -1)
        (let* ((angle (* $2 (/ 6.28 $0)))
               (sine (sin angle))
               (cosine (cos angle))
               (new-info (cdr (id-get $1 $2)))
               (node-t (vector-ref new-info 2))
               (newkbps (vector-ref new-info 3))
               (newtotal (vector-ref new-info 4))
               (newused (vector-ref new-info 5))
               (sent (/ newkbps newtotal))
               (mag (/ newused newkbps)))
          <line view="'0" from="(+ 50 (* 6 sine)) (+ 50 (* 6 cosine))"
                to="(+ 50 (* 39 sine)) (+ 50 (* 39 cosine))"
                color='"red"' thick="12">
          (if (= node-t 2)
              <line view="'0" from="(+ 50 (* 6 sine)) (+ 50 (* 6 cosine))"
                    to="(+ 50 (* (+ 6 mag) sine)) (+ 50 (* (+ 6 mag) cosine))"
                    color='"blue"' thick="10">
              <line view="'0" from="(+ 50 (* 6 sine)) (+ 50 (* 6 cosine))"
                    to="(+ 50 (* (+ 6 mag) sine)) (+ 50 (* (+ 6 mag) cosine))"
                    color='"green"' thick="10">)
          (set! $2 -1)))
    </ir-component>

Figure 5.13: Active IR component

A snapshot of the view is given in Figure 5.14. Note that not all of the VOML features have been shown in this example.

[Figure 5.14: A snapshot of the view — a screen shot of the BU-View window showing the "Bandwidth Utilization" display.]
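Before moving on, the two-phase mechanism mentioned above deserves a concrete illustration. The following is a sketch of the underlying delay/force idiom, not code generated by the actual compiler; nodechange stands in for the %0 control variable of the example.

    (define nodechange #t)

    ;; Promises registered during the first phase.
    (define second-phase '())
    (define (defer! promise)
      (set! second-phase (cons promise second-phase)))

    ;; Written inline in an IR component, next to first-phase drawing
    ;; code: the control-variable reset is wrapped in a promise.
    (defer! (delay (set! nodechange #f)))

    ;; Run by the environment only after all IR components' first
    ;; phases, avoiding the write-after-read hazard on control
    ;; structures read by other IR components.
    (for-each force (reverse second-phase))
    (set! second-phase '())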
5.4 Summary

A novel PAV technology intended to satisfy the growing needs of researchers and users of heterogeneous parallel and distributed systems has been presented. Salient characteristics of the technology include support for rapid prototyping and automated design of PAV tools, object orientation, distributability, portability, code reuse and flexibility. An example of a high-level visual object specification in VOML has been presented that shows some of the features of the language.

Chapter 6

An Integrated Approach to Real-Time System Design and On-Line Performance Visualization with Steering

This chapter presents an approach to the integration of the two technologies presented in Chapters 3 and 5, which use BRISK, presented in Chapter 4, for instrumenting and steering a target complex real-time system. First, in Section 6.1, an example real-world target distributed real-time system is described. Its design and engineering using the CLP approach/technology is described in Section 6.2. Next, technical details about the integration of the two technologies are explained in Section 6.3. Finally, the operation of the new, integrated technology is shown on an example scenario in Section 6.4.

6.1 Target Real-Time System

The target real-time system used to demonstrate the approach is a modification of a real-time face-tracking system [11]. The original system runs on a single workstation equipped with an inexpensive "eye" video camera. It analyzes video frames and finds the workstation user's face and, once the face has been found, the eyes (as well as other features, such as the eyebrows and nose tip). By tracking the head movements and gaze direction in real time, the system can be used as a basis for hands-free application control, a graphics cursor driver, etc.

[Figure 6.1: Original face-tracking system's state diagram — states FD_FIRST, FD_TRACK and FD_PREDICT, with transitions labeled by "grab", "match"/"no match", "predict"/"do not predict" and "recovered"/"not recovered".]

The state diagram of the original face-tracking system is shown in Figure 6.1. The most relevant facts about it are summarized below.

• The task of finding the user's eyes ("match") is relatively lightweight, and is the main part of the system's "inner loop." It assumes that the user's face has been located, and is invoked as frequently as possible. It usually suffices to invoke this task at a frequency between 15 and 20 Hz in order to smoothly track the eyes. The eyes can be located even if the user's face moves a little relative to the assumed face location. In cases of short temporary problems with finding the eyes, the system uses a computationally simple Kalman filter to extrapolate the eye coordinates ("predict").

• The most time-consuming task is that of finding the user's face ("face"). As the main part of the system's "outer loop," it is invoked after the user's face has moved significantly enough for the eye-finding task to fail to locate the eyes. This usually happens after 5 to 20 successful eye findings, possibly helped by a few predictions.

Real-time characterization of the original system. It is a soft real-time system in which, besides a required minimum average tracking frequency that depends on the workstation user's head movements, the extrapolation and face-recovery mechanisms play important roles. The latter is the most critical task because it interrupts the tracking for a user-noticeable time interval. However, on a workstation it is too expensive to look for the current location of the user's face more often than only when necessary. While the original system achieves the required tracking frequencies on an average workstation, it does so at the cost of fully utilizing the CPU.
Its modification presented here is a transformation into a client-server system. The motivation comes from two facts: by making the target real-time system distributed, (1) the approach to design and engineering of complex, distributed real-time systems from this dissertation can be applied to a non-trivial case, and (2) the CPUs of several workstations in a fast local-area network can be drastically off-loaded.

[Figure 6.2: Distributed face-tracking system — on the server side, the periodic "find face area" and "find eyes" tasks wait for the next period, read new full or face-area images when ready, and return coordinates; on the client side, the client grabs and sends images, receives the servers' responses, updates the state diagram (or the state machine implementation with condition variables), and waits for the next period.]

The original state diagram has been modified, and the tasks of face and eye finding have been made largely decoupled periodic real-time tasks executing on the same or on two different Real-Time Linux (RTL) [12, 78] server PCs. The new system is shown in Figure 6.2. A main modification of the state diagram consists of replacing all "grab, face/match" actions by returns (in the single-threaded case, where the state diagram is a function) or by waits on the corresponding signal variable (in the multi-threaded case, where the state diagram is a separate thread). The invocation frequencies of the real-time tasks are high enough to achieve a satisfactory face-tracking quality. At the same frequencies, they receive full and face-only raw images from the client workstations, and return the coordinates of the face and eyes, respectively. The client receives the coordinates, forwards new face coordinates to the eye-finding task, extrapolates eye coordinates if necessary, and possibly passes them to an application.

Real-time characterization of the distributed system. It is a soft real-time system centered around a hard real-time subsystem running on the RTL server PCs. Behavior based on periods (fixed over an interval of time) had to replace the reactive one, because the delay and jitter introduced by two transfers over the network (to immediately send the image data and receive the coordinates) would stop the tracking for a moment and make it less smooth. The hard real-time subsystem acts as a clock for the whole system. The clients periodically send the image data into a pipeline and receive the coordinates out of it with more delay but less jitter; they can estimate the jitter and take advantage of that. Except for sudden head movements, the periodic face-finding task is less critical, because the face coordinates are kept more up-to-date. Even with a noticeable delay, the tracking can be useful and considered real-time if the delay is almost constant. Besides these main design choices, there are implementation details that partially compensate for the lack of real-time support in legacy workstations and networking hardware. The chosen target system should be viewed mainly as an example. There may be other real-time systems, similar in structure to this one, for which the motivation for a similar transformation may be stronger. Section 6.2 describes how the target system has been engineered using the compiler-based approach from this dissertation.

6.2 RTSML Specification

In the transformed target real-time system, described in the previous section, up to eight pairs of CPU-demanding face- and eye-finding tasks execute on three RTL server PCs. They have been implemented as real-time tasks, which are scheduled using a rate-monotonic scheduler.
The Linux kernel is also scheduled by RTL, as a background task that is allocated the CPU when no real-time tasks are ready. Figure 6.3 sketches the RTL-specific details (the arrows represent data and control flow).

[Figure 6.3: RTL-specific details of the target system — the face finder and eye finder run on the real-time kernel alongside the Linux kernel and other Linux processes, communicating through shared memory and RT FIFOs; the RT task controller and real-time kernel sit above the interrupt control hardware.]

Due to the current lack of networking (TCP/IP) support in RTL, the inputs of the real-time tasks—messages with images from the corresponding clients' cameras—are received by a Linux network interface card (NIC) driver (see [95] for more information) and read by a proxy Linux process. The proxy process stores them in a shared memory area (unmapped and hidden from the Linux kernel) accessible to the real-time tasks. In order for the NIC driver not to lose packets of the incoming messages, causing their retransmission, it must be allowed to process them very frequently (1–2 kHz, depending on the NIC hardware). Since the driver cannot be isolated from the rest of Linux (and treated as a real-time task), the whole Linux kernel must be allocated the CPU very frequently and allowed enough time in each turn so that the driver's interrupt handling routine—for which RTL leaves NIC interrupts pending—can copy the incoming packets from the NIC. This required a modification of the default rate-monotonic scheduler, such that it (1) treats Linux similarly to a periodic real-time task, and also (2) allocates the CPU to it when no real-time tasks are ready. Unlike the real-time tasks, which return control to the scheduler when they finish, the periodic invocations of Linux need to be stopped by RTL's timer handler after their allowed time has expired.

Furthermore, a real-time analysis of the traffic coming into an RTL server from a fast Ethernet-based LAN is very limited. For example, it is not possible to guarantee that before each instance of a real-time task is allocated the CPU, a fresh input will be available in the shared memory. However, at least a soft real-time, average-case analysis can be beneficial when applied to the TCP/IP packet assembly in the Linux kernel and to the storing of messages in the shared memory by the proxy task (the proxy task receives the messages via stream sockets). Even though the Linux kernel is allocated the CPU very frequently, its allowed time in each turn is only supposed to be long enough to handle the outstanding packets in the NIC buffers. Part of this time may actually be given by the Linux kernel to the TCP/IP subsystem and the proxy task, but in a conservative analysis it is assumed negligible. Within an interval of time, the time needed by the proxy task to prepare the inputs for the real-time tasks is (conservatively) approximated as linearly proportional to the sum of the lengths of all the messages expected to arrive within that interval. A similar approximation can be made for the task of TCP/IP packet assembly, where small but possible bursts/jitter should be taken into account. Altogether, for this purpose a constant K_cp [ms/kB] can be defined whose value is a measured normalized input-preparation time on a given RTL server PC.

To reconcile the above RTL-specific requirements with the exact rate-monotonic schedulability analysis, a theorem due to Lehoczky, Sha and Ding [113] is presented for a start (T_i, C_i and D_i are the period, time demand and deadline of task τ_i, respectively):
Theorem. Let a periodic task set τ_1, τ_2, ..., τ_n be given in priority order and scheduled by a fixed-priority scheduling algorithm using those priorities. If D_i ≤ T_i, then τ_i will meet all its deadlines under all task phasings if and only if

$$\min_{0 < t \le D_i} \; \frac{1}{t} \sum_{j=1}^{i} C_j \left\lceil \frac{t}{T_j} \right\rceil \;\le\; 1. \tag{6.1}$$

The entire task set is schedulable under the worst-case phasing if and only if

$$\max_{1 \le i \le n} \; \min_{0 < t \le D_i} \; \frac{1}{t} \sum_{j=1}^{i} C_j \left\lceil \frac{t}{T_j} \right\rceil \;\le\; 1. \tag{6.2}$$

The time demand within an interval of time t_cp ≤ T_{n+1} for preparing the inputs from the messages expected to arrive within that interval, under the assumption that t_cp is large enough for an average-case analysis, is given by the following formula:

$$C_{n+1}(t_{cp}) \;=\; t_{cp}\, K_{cp} \sum_{i=1}^{n} \frac{L_i}{T_i}, \tag{6.3}$$

where L_i is the length of τ_i's input. Equation 6.1 has been chosen as the feasibility test for task τ_{n+1}, with C_{n+1} = C_{n+1}(T_{n+1}) and T_{n+1} slightly larger than the largest of the other periods. A stronger test could, for example, have multiple deadlines for partial preparations (so that each instance of a real-time task gets a fresh input before it is allocated the CPU):

$$\max_{1 \le i \le n} \; \min_{0 < t \le T_i} \; \frac{1}{t} \left[\, \sum_{j=1}^{i} C_j \left\lceil \frac{t}{T_j} \right\rceil + C_{n+1}(t) \right] \;\le\; 1. \tag{6.4}$$

However, such a stronger and more complex test approaches a worst-case analysis as t_cp becomes small for the average-case analysis. Since the incoming network traffic is non-deterministic, a worst-case analysis is not worth pursuing. On the other hand, allowing t_cp to become too large would make the chosen feasibility test degenerate into a simple CPU utilization check, which is unnecessarily weak. Finally, the chosen feasibility test requires only a minimal extension to the original rate-monotonic schedulability analysis.

The soft real-time analysis of the network traffic is simple. The whole LAN is viewed as a single communication channel. While the message latency cannot be bounded in a non-real-time LAN, one can hope that it will be relatively low if the LAN is not very loaded. By observation, e.g., by looking at local FTP throughput rates, an upper bound for the allowed bandwidth B can be set. Then, the following formula is used to check whether the messages will likely arrive from all the clients to all the servers with low latency:

$$B \;\ge\; \sum_{i=1}^{n} \frac{L_i}{T_i}. \tag{6.5}$$

The client programs execute on various non-real-time operating systems, one client per computer. They are now computationally light, with the average CPU time needed per video frame depending also on the application built on top of the face-tracking task. In addition, the CPU time needed by the OS for copying and sending messages to the server(s) is modeled similarly to the Linux background task on an RTL server PC. The total CPU utilization by a client can be limited using the following formula:

$$S_i \;\ge\; \frac{C_{\mathrm{norm\,cl},i}}{T_{\mathrm{el},i}} + \frac{C_{\mathrm{norm\,cp},i}}{1000}, \tag{6.6}$$

where S_i is the allowed dedicated CPU capacity of the computer on which client i executes (the CPU capacity in idealized MIPS is used instead of utilization, because the RTSML compiler modules expect it as one of the parameters). C_norm cl,i is the normalized (at 1 MIPS) average CPU time demand of the client within an interval of time equal to the period of the corresponding, more frequent, eye-finding task, T_el,i (conservatively estimated). C_norm cp,i [ms] is the normalized message-copying demand time within a unit of time (1000 ms), obtained using a formula similar to Equation 6.3.

Here ends the modeling of the distributed real-time face-tracking system. Two RTSML compiler modules have been used to compile parts of the system's specification:
1. the default module common, for simple periodic tasks and messages, non-real-time channels, and processors with utilization checks only (using generalizations of Equation 6.5 and Equation 6.6), and with static routing. The client tasks, which are not trivially periodic, have been approximated by periodic ones; this explains the form of Equation 6.6.

2. the module rtlrms, for RTL-augmented processors with rate-monotonic scheduling (using a generalization of Equation 6.2).

Both the rtlrms and common modules internally create, for each processor, a common task to approximately model the CPU time demand for unaccounted message processing on that processor. The above-mentioned generalizations allow for all possible task-to-processor and message-to-route allocations, variable CPU capacities, message multicasting, etc. The common and rtlrms modules generate CLP code in which satisfactory resource allocations are to be found by applying possibly multiple problem-solving approaches. In general, other modules may search for satisfactory values of some real-time parameters, such as periods. In the context of CLP and problem solving, one distinguishes between input parameters (e.g., tasks' periods) and output solutions (e.g., task-to-processor allocations). Figure 6.4 shows excerpts from the RTSML specification of the target system.

     1  ...
     3  <head>
     4    Distributed Face Tracking Application
     5  </head>
     6  <body>
     7  <system id="ft" type="common">
     8  <processor-group id="pg0" type="common">
     9    <processor id="p0" type="rtlrms (mips (initial 450) (domain (450)))
    10                                    (norm-copy-time 5)">
    11    ...
    13    <processor id="p2" type="rtlrms (mips (initial 200) (domain (200)))
    14                                    (norm-copy-time 5)">
    15  </processor-group>
    16  ...
    17  <channel id="c0" type="common (mbps (initial 4)
    18                                      (domain ((range 2 5))))">
    19  ...
    21  <processor id="ws0" type="common (mips (initial 100) (domain (50 100)))
    22                                   (norm-copy-time 6)">
    23  ...
    27  <task-group id="servers" type="common">
        ...
    33    <task id="ste0" type="common
    34       (period (initial 59) (domain ((range 50 100))))
    35       (deadline (initial 59) (domain ((range 50 100))))
    36       (time01mips (initial 4050) (domain (0 (range 3600 4500))))"
    37       processors="pg0">
    38    ...
    47  ...
    48  <message-group id="mg0" type="common">
    49    <message id="mf0" type="common
    50       (period (inherit-from stf0))
    51       (deadline (inherit-from stf0))
    52       (length (initial 225) (domain (0 225)))"
    53       source="ct0" destinations="stf0">
    54    <message id="me0" type="common
    55       (period (inherit-from ste0))
    56       (deadline (inherit-from ste0))
    57       (length (initial 7) (domain (0 (range 4 14))))"
    58       source="ct0" destinations="ste0">
    59    ...
    60  </message-group>
    61  <routing-table id="rt0" type="common">
    62    <route id="r00" type="common" source="ws0" destination="p0"
    63           channels="c0">
    64    ...
    65  </routing-table>
    66  ...
    68  ...

Figure 6.4: RTSML specification excerpts

In line 7 of Figure 6.4, common means that there is no special relationship between the subsystems of the system with identifier ft. The same holds for the subsystems themselves (processor, channel, task and message groups) and for routing. The processors in group pg0, whose descriptions start in line 8, are the ones running RTL with the modified rate-monotonic scheduler, and the real-time model that they obey is identified by the name of the corresponding compiler module, rtlrms. After this identifier follows a list of model parameters (given using S-expressions) of the processor being described: its capacity in estimated idealized MIPS (the initial value and the domain, which may contain single integer values and ranges of integers) and the normalized message-copying demand time (norm-copy-time) at the unit CPU capacity (1 MIPS).
Parameters that have domains of possible values can be changed at run time (in the case of the mips parameter, changing it means increasing/decreasing the CPU utilization share allocated to the tasks of the target system). In line 17, channel c0 models the 100 Mbps LAN that we have used for the clients and servers: it can be loaded with 2–5 MBps, depending on other users' usage. Eight client workstations are described in lines 20–26 as common, i.e., as having a non-real-time OS. Eight pairs of face- and eye-finding real-time tasks are described in lines 27–39. The ranges of their periods correspond to the ranges of the usual frequencies of the face- and eye-finding tasks in the original system. There are no special deadline constraints, so the deadlines can be equal to the periods. The time01mips parameter is the normalized CPU time demand in milliseconds, and its value is the maximum measured time demand in typical face-tracking experiments, multiplied by the CPU's specified capacity in idealized MIPS. Depending on the actual workstation user's head/eye movements, this maximum value can vary within a range. A value of 0 means that the task is not running at all. The processors attribute states that the real-time tasks can only be allocated to the processors in group pg0. Eight client tasks are described in lines 40–47. Their periods and deadlines are inherited from the corresponding eye-finding server tasks for the purpose of modeling, but they are also enforced by a proxy task at run time, as will be described in Section 6.4. The inherit-from specifier is a convention provided by the module common for use with tasks and messages. In lines 48–60, the messages carrying the full and the face-only raw images are described. The length of a full 24-bit video frame needed for the face finding is 225 kB; the maximum length of a message carrying a typical 8-bit face area needed for the eye finding is 7 kB, but it can vary as the user moves closer to or farther from the camera. An implicit convention of the common module is that the length of a message is 0 if its source task's time demand is 0, which means that the message is not sent at all. Finally, in lines 61–65, routes (in general, series of channels) are statically specified for each source-destination pair of processors.
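For intuition, the numbers from the excerpt can be plugged into the feasibility tests of the previous section. The following sketch is plain Guile Scheme written for this discussion—not output of the compiler modules, which generate CLP code—implementing Equations 6.1 and 6.3 and checking the eye-finding task ste0 on processor p0:

    ;; Tasks are (period . demand) pairs in milliseconds.
    (use-modules (srfi srfi-1))   ; filter, append-map

    ;; Workload ratio at time t: (1/t) * sum_j C_j * ceil(t/T_j).
    (define (load-at tasks t)
      (/ (apply + (map (lambda (task)
                         (* (cdr task) (ceiling (/ t (car task)))))
                       tasks))
         t))

    ;; Equation 6.1: the lowest-priority task meets its deadline d iff
    ;; the minimum workload ratio over the scheduling points 0 < t <= d
    ;; is at most 1; checking the multiples of the periods up to d
    ;; (plus d itself) suffices.
    (define (feasible? tasks d)
      (let* ((multiples (append-map (lambda (task)
                                      (map (lambda (k) (* k (car task)))
                                           (iota (quotient d (car task)) 1)))
                                    tasks))
             (points (cons d (filter (lambda (t) (<= t d)) multiples))))
        (<= (apply min (map (lambda (t) (load-at tasks t)) points)) 1)))

    ;; Equation 6.3: input-preparation demand over an interval tcp,
    ;; with Kcp in ms/kB, message lengths in kB and periods in ms.
    (define (input-prep-demand tcp kcp lengths periods)
      (* tcp kcp (apply + (map / lengths periods))))

    ;; Figure 6.4's numbers: task ste0 on the 450-MIPS processor p0
    ;; needs C = 4050/450 = 9 ms every T = 59 ms (utilization ~0.15).
    (display (feasible? '((59 . 9)) 59)) (newline)   ; => #t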
6.3 RTSML Compiler Extension

This section describes the coupling of the RTSML/CLP and the VOML-based on-line PAV technology, and the extensions of the RTSML compiler made for this purpose. In the coupling, the PGRT-TIE environment with visual objects also acts as an intermediary between the target system, a human operator and the CLP engine, as shown in Figure 6.5.

[Figure 6.5: Integrated visualization, repair and steering — the visual object and its views inside PGRT-TIE receive performance data from, and send steering commands to, the target system on the RTL server PCs (Linux/RTL) via BRISK (ISM), and exchange CLP parameters and solutions with the ECLiPSe engine.]

The performance data arrive at the environment, and steering commands are sent to the target system, via BRISK. Either the human operator or a visual object alone may trigger a repair in response to, for example, deteriorated performance of the target system. The core of the compiler and the three currently available modules have been extended to support the creation of VOML specifications as well. From an RTSML specification, the compiler generates a VOML specification with one visual object for each system element. (That is, a target system may actually consist of multiple, disjoint or related, systems in the RTSML notation.) This is achieved by

• adding slots to the classes of the initial objects that contain (1) data needed for graphical rendering of a system's structure and of the model-specific performance information of its subsystems and components, and (2) chunks of model-specific VOML/Scheme code for processing events and rendering performance information, as well as for preparing the parameter inputs and possibly automatic repair triggering;

• modifying the second compilation phase both to update some VOML data slots and to augment the added methods to update some VOML data in the CLP code generation phase;

• additional cross-linking of the VOML data in the third compilation phase; and

• extending the fourth compilation phase with the splicing of all the VOML/Scheme data and code together into a VOML file.

The correspondence between the CLP and VOML/Scheme data and code is not always direct, because the latter must in some cases produce less abstract information suitable for graphical rendering. Some VOML data, which are stored in the info and control structures, are compile-time transformations of the information supplied about the real-time parameters; Scheme code, embedded in VOML elements, is also needed in cases when similar transformations need to be performed on-line by a created visual object. For example, while in the CLP domain messages are thought of as being sent via routes, a graphical view of a distributed system's physical resources should depict channels instead of routes. Or, while different tasks may have different ranges of values for their periods, a view with normalized ranges and current values may be easier to comprehend.

In addition to the module-specific VOML/Scheme data and code, there is a convention that enables the creation of several standard views for any modeled system. Namely, each subsystem has to provide common VOML data: the name of the real-time model obeyed; an identifier; and a list of its components. Similarly, each component has to provide: the name of the real-time model obeyed; an identifier; the subsystem identifier; a list of allocated resources or activities; and the minimum, maximum and initial/current goodness value. After these common data, any module may or may not provide additional data and accompanying code. The standard views for an RTSML system include: resource (processors and channels) and activity (tasks and messages) graphs, generated using a graph-drawing algorithm [34]; and one view for each of the subsystem types—processor, channel, task and message group—graphically presenting the contained components and their goodness values. All the views are inter-linked: the user can, for example, click on a task in the activity graph, and the task view will pop up; the user can then click on the task view to open another view of the corresponding task group, etc.

Note, on the other hand, that using this scheme, an RTSML compiler module could be used for implementing only a specific performance visualization aspect. For example, instead of using common as the value of the type attribute of a task-group element, one could use the name of a "real-time model" that sets no additional constraints over its tasks, but only creates a view in which specific performance aspects, such as the tasks' collective timeliness, are visualized on-line. (It is assumed that a compiler module handling the task-group element would "know how" to extract the relevant data from the task objects based on the task class.)
This section concludes with the description of an example VOML extension of the RTSML compiler for supporting common tasks (to save space, the complete code is omitted). The list of modifications is as follows:

• the module-specific class i-task-common, which inherits from the initial-object class i-task, specifies shared (per-class) slots containing the following VOML/Scheme code:

  – event declarations: a callback event common-task-click for clicking on the common-task view; and a data event common-task-instance that carries the information about a task instance's identifier, start time, response time, and time demand (a sketch of such a declaration is given after this list);

  – control structures: a Boolean (distinguishing between false and any other value) variable current-common-task for remembering which task's information is rendered in the common-task view; and a Boolean variable common-task-updated considered for enabling an IR component common-task-updated-ir, which updates the graphical presentation of the current-common-task's performance information;

  – info structures: an association list common-tasks that contains module-specific, on-line processed information about all common tasks;

  – view initializations: the title, size and span of the common-task view;

  – event processing: an EP component common-task-click-ep for responding to the user's click on "task group" or (allocated) "processor" in the common-task view, by updating one of the predefined control structures selected-task-group and selected-processor; an EP component common-task-instance-ep for processing the common-task-instance events and updating the common-tasks association list with new minimum, moving-average and maximum values of the period, response time and time demand; and an EP component common-repair for responding to a pre-condition event repair-needed by setting the new values of some of the parameters, as needed for the CLP repair described above, based on all the common tasks' moving-average periods and time demands; and

  – information rendering: an IR component common-task-ir for rendering in the common-task view the expected performance information as previously computed by the CLP engine; and the IR component common-task-updated-ir, mentioned above, for rendering the current performance information so that it can be compared against the expected one;

• one of the individual (per-instance) CLP code generation methods, which outputs lists of CLP variables, is augmented to update the mappings between the CLP parameter and solution variables and the data (period, deadline, demand and processor allocation) kept in the VOML info structures, so that the expected performance data and resource allocation computed by the CLP engine can be loaded into the info structures after an initial solution has been found or after a repair (note that immediately after a repair, the expected performance information is consistent with the previously measured one);

• the module-specific method used in the second compilation phase, as described at the beginning of this section, is augmented to update task-related data, kept in a predefined slot of the class i-system, that will be stored in VOML info structures, and to collect all the VOML/Scheme code from the i-task-common class' shared slots into predefined slots of the class i-system; and

• the data and code from the above-mentioned slots of the i-system class are spliced into a VOML file in the final compilation phase.
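For concreteness, the generated declaration of the common-task-instance data event might look roughly as follows; the rtype value and the exact field names are illustrative assumptions—only the carried quantities are prescribed by the description above—and the etype value is deliberately left elided.

    <data-event name="common-task-instance" rtype="entry" etype="...">
      <data-field name="task-id"  type="int">
      <data-field name="start"    type="real">
      <data-field name="response" type="real">
      <data-field name="demand"   type="real">
    </data-event>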
6.4 Example Run-Time Session

This section completes the presentation of the integration by giving some more detail about the target system and the integrated approach that has been adopted for experimentation, and by showing an excerpt from the operation of the former being steered by the latter.

A middleware has been developed that consists of three Linux proxy tasks, described in Section 6.2, one of them acting as the main proxy for the face-tracking clients. The proxy tasks are located in a dedicated LAN that is connected to a larger LAN via a switch. All the proxies and clients are instrumented using BRISK. The proxies also receive steering commands via BRISK and control the real-time tasks' periods and RTL server PC allocations, which they forward to the clients. The following is the sequence of steps that initiates a run-time session:

1. the PGRT-TIE environment is started on a control host computer;

2. on the host computer, outside PGRT-TIE, the human operator starts the BRISK Instrumentation System Manager (ISM);

3. on all the RTL server PCs and computers where the face-tracking clients will be running, the BRISK External Sensor (EXS) is started; on the RTL server PCs, the EXSes are given unique, known identifiers;

4. the proxies are started and given the same identifiers as those of the EXSes running on the same RTL server PCs;

5. in PGRT-TIE, the human operator starts BRISK as a so-called data source, from which the visual object will receive instrumentation data records (as performance data events);

6. in PGRT-TIE, the human operator loads a Scheme file generated by the VOML compiler, containing the visual object according to the target system's RTSML specification;

7. by opening, for example, the resource graph view of the visual object, and by clicking on a processor, the first (callback) event is generated; before the processor view is opened, the visual object starts the ECLiPSe engine on another computer via the Guile implementation of the Expect library [72], and asks it to find a feasible resource allocation according to the initial real-time parameters in the RTSML specification;

8. after the solution received from ECLiPSe is loaded into the info structures, a special, manually added event-processing component passes the task-to-processor allocation mapping and the periods of the real-time tasks, via the BRISK interface of PGRT-TIE and the ISM, to the EXSes running on the RTL server PCs;

9. the proxies read the allocation and periods from a memory area shared between them and the EXSes, start/stop the real-time tasks, and forward this information to the clients, too;

10. the face-tracking clients are started, and they connect to the main proxy task, from which they receive unique client identifiers and the addresses of the RTL server PCs that will run the corresponding face- and eye-finding tasks, according to the initial resource allocation; subsequently, the clients establish connections with the proxies running on the assigned RTL server PCs;

11. the visual object starts receiving performance data from the proxy tasks (about the real-time tasks, such as the common-task-instance event mentioned in Section 6.3) and from the clients.

The above-mentioned special event-processing component has knowledge about the correspondence between the proxy and real-time task identifiers on one hand, and the identifiers of the processors belonging to the group pg0 and of the tasks belonging to the group servers in the RTSML specification, on the other.
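That correspondence might be represented, for example, as simple association lists. This is a hypothetical sketch: the numeric BRISK-side identifiers are invented here, while the RTSML identifiers come from Figure 6.4.

    ;; BRISK EXS/proxy and real-time task identifiers (left, invented)
    ;; mapped to RTSML identifiers from the specification (right).
    (define exs->processor '((101 . "p0") (102 . "p2")))
    (define rt-task->task  '((1 . "ste0")))

    (define (rtsml-id table key) (cdr (assv key table)))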
Since the steering is integrated with the performance visualization, and only indirectly with the design and engineering approach, the RTSML compiler has no knowledge about a particular steering mechanism. Note, again, the disparity among the various variables involved: (1) the period and time demand of a task are system parameters, and can be monitored; (2) the deadline is also a system parameter, but is not supposed to be monitored; (3) the goodness of a task is defined as its laxity (deadline minus response time, one of the fields of the common-task-instance event), which can be monitored but is not a system parameter; etc. Probably most of the related knowledge that can be embedded in an RTSML compiler module concerns which variables are monitorable, regardless of whether they belong to the parameters or the solution. This would help code reuse and separate design/engineering from steering concerns.

Once the run-time session has been initiated, the visual object allows the human operator to visually compare the expected and actual system performance. The common-task view, mentioned in Section 6.3, is shown in Figure 6.6, together with a few other views and the GUI.

In order to receive the coordinates of the new face area and of the face features within the current face area at the requested frequencies, the clients must send the messages to the real-time tasks in a pipelined fashion, with up to a predefined maximum number of outstanding messages in the LAN and the servers' buffers. On the other hand, if a message does not arrive in time to be processed by a real-time task, the real-time task immediately returns the CPU to the RTL scheduler and waits for the next period. In this case, the sending of a common-task-instance event via BRISK is skipped. Too frequent skips can be noticed by the moving-average task period becoming significantly larger than the projected period (a small sketch of such a check is given at the end of this section). Similarly, if it becomes harder to find a workstation user's face or eyes, the time demand of the corresponding real-time task(s) may become larger than the expected maximum. The clients may also send data about their perceived quality of service, which, for example, may drop if the face tracking needs higher face- or eye-finding frequencies; these data may be processed and visualized by another visual object, possibly on another computer. In situations like these, the human operator may decide to trigger a repair by pressing the "Trigger Repair" button in the GUI, which results in passing the measured performance information as new parameters to ECLiPSe.

[Figure 6.6: A snapshot from the RTSML-based visual object — the common-task view with the expected and measured task performance, several other views, and the GUI with controls including the "Trigger Repair" button.]

With a more elaborate GUI, generated indirectly by the RTSML compiler modules, the human operator could be allowed to manually override the new parameters. If a solution is found by ECLiPSe, it is enforced upon the target system in a way similar to steps 8–9 in the above sequence. Depending on the complexity of the target system model and the values of the parameters, the repair approach's solving time may vary from about the same as solving from scratch to about an order of magnitude faster than solving from scratch, as discussed in Section 3.3.2.
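Returning to the skip-detection heuristic mentioned above, it could be realized inside an EP component by a check of the following kind (a sketch; the smoothing factor and the 25% margin are illustrative choices, not values from the implementation):

    ;; Exponential moving average of the observed task period, and a
    ;; predicate suggesting a repair when it drifts too far above the
    ;; period projected by the CLP engine.
    (define (update-avg avg sample) (+ (* 0.9 avg) (* 0.1 sample)))

    (define (repair-needed? avg-period projected-period)
      (> avg-period (* 1.25 projected-period)))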
6.5 Summary

A novel technology was presented that integrates multiple other approaches: the design and engineering of complex real-time systems; support for their dynamic reconfiguration; and on-line PAV with steering/reconfiguration. The presentation included an example distributed real-time system and details about how the involved technologies are used in an integrated manner to help system developers and users achieve better system performance.

Chapter 7

Conclusions and Future Work

This dissertation describes work in four areas related to heterogeneous, parallel and distributed real-time systems. The main focus is on a compiler-based approach to the design and engineering of complex real-time systems, which is augmented by instrumentation, performance analysis and visualization, and dynamic system reconfiguration.

7.1 Research Contributions

Based on an analogy with non-linear circuits in other areas, such as control theory and digital systems, the design and engineering approach for complex real-time systems presented in this dissertation views a complex real-time system as a network of integrated and interoperating subsystems and components. It attempts to help in developing distributed, heterogeneous real-time systems that can operate in more realistic scenarios by integrating specialized parts that abide by extant models from real-time theory. The observation that real-time requirements are best stated as time and resource constraints, expressed via the equations and inequalities of the real-time models, led to the application of CLP as a promising engineering tool for specifying (through integration) and solving real-time design problems.

An SGML-based specification language (RTSML) has been developed that supports the integration of heterogeneous components and subsystems into complex real-time systems. Its compiler generates CLP code and can be extended with new modules to support other real-time models. An especially interesting and important aspect of the CLP-based problem solving is the repair-based approach. The experiments showed that it is promising for coping with moderate dynamic changes in the system parameters. Portions of this work were published in [8].

In the area of distributed instrumentation systems, the concept of a portable and flexible IS kernel was proven by the development and evaluation of BRISK. It is expected to be beneficial for instrumenting heterogeneous distributed real-time systems due to its ability to adapt and tune itself so as to minimize the intrusion on a real-time target system. Portions of this work were published in [10].

A novel on-line performance analysis and visualization technology for heterogeneous parallel and distributed systems was developed. It supports rapid PAV prototyping, automated design of PAV tools, object orientation, distributability, portability, code reuse and flexibility. A high-level visual object specification language (VOML) has been developed. Its compiler generates visual object code that receives instrumentation data from BRISK. Portions of this work were published in [9].

The RTSML compiler was extended with support for the developed on-line PAV technology. The BRISK IS kernel is also utilized in a novel, mainly automated, integrated technology that includes the design and engineering of complex real-time systems, system-specific PAV, and PAV-driven, model-based dynamic reconfiguration.
7.2 Future Work

Work on the complex real-time system design and engineering based on the approach presented in this dissertation may include:

• the addition of new compiler modules for other real-time models, with an emphasis on stochastic ones;

• static (e.g., introduction of redundant constraints) and dynamic (e.g., support for user/module-specified labeling priorities) CLP search/repair optimization; and

• potential extensions to RTSML (e.g., broader support for end-to-end constraints; finer-than-task granularity of real-time software; support for formal methods).

The basic BRISK implementation may be extended in several ways. First of all, hierarchical instrumentation data collection, preprocessing and reduction—via the notion of a local ISM—would make it usable for large-scale parallel/distributed systems. For programmable, run-time control of BRISK's components, it may be extended with a portable interpreter [73] over the existing communication infrastructure for BRISK control. (It is assumed that the resources needed for the IS control are insignificant compared to those used for the main IS functions under average operating conditions.) A system type may be added to event records to allow for low-latency, out-of-band events. The NOTICE macros could be made multithread-safe. The support for causally-related events may be extended with one-to-many and many-to-one dependencies, such as those described in [67]. As needed, more interfaces to consumer tools would be added, e.g., SDDF trace files and DCOM [20]. More evaluations ought to be done, especially of latency and of causally-related events.

Besides potential extensions to the VOML language and enhancements of the underlying VO implementations, the most promising aspect of the on-line PAV technology is its serving as a base for the development of advanced on-line PAV tools. The EPIRA and the declarative style of composing an HLVO from reusable components allow for easy automated generation of domain-specific on-line PAV tools, analogous to the one presented in Chapter 6. On the other hand, the VOML compiler could be extended into a more powerful tool, such as an expert system for, e.g., pattern-based, on-line PAV composition.

The application of the PAV-driven dynamic reconfiguration of a complex real-time system could be extended. For example, there could be multiple, distributed and cooperating visual objects and CLP tools that would exploit the repair-based dynamic system reconfiguration for different purposes, such as gradual performance improvement, or simultaneously trying multiple, different repairs (with different assumptions and/or repair heuristics) and deciding among them, etc.

APPENDICES

Appendix A

RTSML Document Type Definition (excerpt)

This appendix provides the precise definition of RTSML, which is subject to changes and extensions in the future. The header part of the DTD is omitted to save space. Element attributes marked #IMPLIED are inferred from the context by the compiler, unless specified by the user.

<!-- Document Body -->

Appendix B

VOML Document Type Definition (excerpt)

This appendix provides the precise definition of VOML, which is subject to changes and extensions in the future. The header part of the DTD is omitted to save space. Element attributes marked #IMPLIED are inferred from the context by the compiler, unless specified by the user.
"(intlreallstringlbooleanllistlvector)"> "(checkbuttonltextboxlslidebar)"> "(clicklkeystrokelresizeIrescalelgui)"> "(entrylexitlatomiclpg-entrylpg-exitlpg-atomic)"> "INPUTIPRECDNDITIONlPOSTCONDITIDN") "POINTILINEIRECTANGLEIPOLYGONIARCIELLIPSEITEXTIFIGURE") "OPEN-VIEUICLOSE-VIEWISCROLLIRESIZEIRESCALEIVIEH-INFOI SNAPSHOTITEXT-EXTENTS"> "(leftlrightlupldown)"> "(endvalldisplacement)"> "(yeslno)"> 170 171 view from to color fillc adapt clip > view points color fillc adapt clip > CDATA CDATA Xboolval chipval CDATA CDATA CDATA CDATA CDATA Zboolval chipval