I: .2 l
inf.
31.))

“f.

.3!

a. .: :5
£3.23!

mm.
.

 

1....- 5,
3|!!-

guns} n

‘ 59.....ruﬁ
i. x 2.1:}!
4.! 3.5.3....

3.. n: n. In

thuh‘aﬁﬂ

1):»..1 ‘
:5 .2.»

.15!
.ﬁl|).‘
1 (1.3-...i

. 2. ‘ . .3513:

:ny.|)s..o.au§i ‘53.

.. {550.511 n‘nvx‘vuav
ax.:§.c.....=ia
i; .. 115i

talk}... r!) v
got.) :5).
3 .l I.

@mﬁwﬂ érﬁkzwm

V. . ‘1: . \

 

 

THESE

i. IIIIIIIIIIIIIIIIIIIIIII III III III“

1293 0138 87 9899

This is to certify that the

dissertation entitled

The Integration of an On-line Parallel Debugger
with a Visualization Methodology
for Modeling Expected Behavior

presented by
Joseph L. Sharnowski

has been accepted towards fulﬁllment
of the requirements for

PhD degree in Computer Sc ience

“5275614 “(C/W

lJlajor professor

Date (71/91/05

MSU is an Afﬁrmative Actinnr’EquaI Opportunity Institution 0.12771

 

 

LIBRARY
Michigan State
University

 

 

 

PLACE N RETURN BOXto romovo thio chockout from your rooord.
TO AVOID FINES rotum on or botoro duo duo.

DATE DUE DATE DUE DATE DUE

     

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

usu loAnAfﬂnnotivo Action/Emu Oppommﬂylnoﬂtuionm *

 

THE INTEGRATION OF AN ON-LINE
PARALLEL DEBUGGER WITH A
VISUALIZATION METHODOLOGY FOR
MODELING EXPECTED BEHAVIOR

By

Joseph L. Shamowskz'

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

Computer Science Department

1995

ABSTRACT

THE INTEGRATION OF AN ON-LINE
PARALLEL DEBUGGER WITH A
VISUALIZATION METHODOLOGY FOR
MODELING EXPECTED BEHAVIOR

By

Joseph L. Sharnowskz'

Post-mortem visualizations of program execution are useful for debugging the
complex behavior of parallel programs. However, the effectiveness of the visualizations
is limited by the characteristics of the information that they present. First, the
monitor intrusion and data storage constraints restrict the amount of data that
may be recorded during the program execution, and thus restrict the corresponding
information presented in the visualizations. Second, the type of information presented
by visualizations often limits how well their representations match the programmer’s
conceptual model of the expected program behavior. The models used for visualizing
program behavior typically have focused on depicting low-level events, such as
message-passing or procedure calls. These models impose upon the programmer the
burden of establishing a match between the graphically-displayed events and the
patterns of expected behavior for the program.

This dissertation presents newly developed solutions that address these two

problems. In order to relax the problems associated with the acquisition of data, we
use an on-line approach to debugging in contrast to a post-mortem strategy. Based on
this approach, the ﬁrst main contribution of this research is the development of new
strategies for applying visualization to the operation of an on-line parallel debugger.
In particular, we developed a visualization-based approach for the selection of causal
distributed breakpoints. In addition, we developed several techniques for improving
the scalability of visualizations that represent process communication and integrated
these techniques into a visualization-based debugging environment for supporting
top-down examination of program states.

The other main contribution of this research is the development of a methodology
for modeling visualizations based on the expected behavior of the program, where the
model of expected behavior is generated from the program’s formal specification. In
particular, we developed a technique for modeling expected behavior in visualizations
of communication between processing elements, as well as techniques for illustrating
the current location in a program execution in terms of the formal specification. In
addition, a technique has been developed for using the formal specifications of a

program’s data structures to guide the generation of data visualizations.

Copyright © by

Joseph L. Sharnowski
1995

Dedicated to the memory of my grandmothers,
Helen Lewandowski and Julia Sharnowski,

whose faith in God was an inspiration to everyone around them.

ACKNOWLEDGMENTS

I would sincerely like to thank my advisor, Dr. Betty H. C. Cheng, for all of her
help and guidance during my Ph.D. program. Beyond providing direction, she also
was an endless source of encouragement. It has indeed been both a pleasure and an
honor to have been her student.

I thank the other members of my committee, Dr. Lionel M. Ni, Dr. Diane Thiede
Rover, and Dr. David Yen, for their efforts on my behalf. I also wish to thank
NSF, as this work has been supported in part by the NSF Grants CCR-9209873 and
OCR-9407318.

I wish to thank my family and friends for always being there for me. In particular,
I thank my parents for being both supportive and for being ideal role models. I also
wish to thank my Goddaughter Sarah, for being a source of both pride and joy. Last,

but hardly least, I am deeply grateful to Rana, for everything she has added to my
life, and all she will still add.

vi

TABLE OF CONTENTS

LIST OF FIGURES

1

Introduction
1.1 Motivations ................................
1.2 Research Contributions ..........................

1.3 Organization of Dissertation .......................

Background Information
2.1 Specifying Parallel Programs with LOTOS ...............
2.2 Cholesky Factorization Program .....................

Overview of the GOLD Debugging Framework
3.1 A Top-down Debugging Procedure ...................
3.2 The General Architecture of GOLD ...................

Visualization-based Breakpointing
4.1 Distributed Breakpoints .........................
4.2 Local Breakpoints .............................

Scalable Visualization Techniques
5.1 Handling Large Numbers of Processes ..................

5.2 Handling a Large Degree of Process Communication ..........

Visualization-based Examination of Program States

6.1 Types of Visualizations Supported by GOLD ..............
6.2 Top-down Coordination of Visualizations ................
The Modeling of Expected Behavior in Program Visualizations
7.1 Overview of Panorama ..........................
7.2 Data Collection Step ...........................

7.3 Graphical Depiction Step .........................
7.4 Debugging Example: Cholesky Factorization ..............

vii

ix

tomb—‘H

10
10
21

25
26
27

34
34
43

46
46
48

54
54
64

69
70
71
77
80

7.5 Integration with GOLD ......................... 84

8 Data Visualizations from Formal Speciﬁcations 95
8.1 The Cell Model .............................. 95
8.2 Mapping Speciﬁcations to the Cell Model ................ 99
8.3 An Example Application ......................... 104
8.4 Improvement to the Mapping Strategy ................. 107

9 Related Work 110
9.1 On-line Parallel Debuggers Supporting Visualization .......... 110
9.2 Models for Visualization of Program Execution ............ 117

10 Conclusions and Future Investigations 123

BIBLIOGRAPHY 127

viii

2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8

3.1
3.2
3.3
3.4

4.1
4.2
4.3
4.4
4.5
4.6

5.1
5.2
5.3
5.4
5.5

6.1
6.2
6.3
6.4

LIST OF FIGURES

Simple parallel computing environment ................. l5
Speciﬁcation for transmitting a number between P1 and P2 ..... 16
Behavior expression for the overall speciﬁcation ............ 18
Recursive creation of worker nodes ................... 19
Process deﬁnition for a channel ..................... 19
Speciﬁcation of nodes for the number-doubling program ........ 20
Header and behavior expression for For_Sub-Main-Loop ........ 23
Speciﬁcation of the For_Sub-Communication process .......... 24
The central operations interface ..................... 28
The architecture of GOLD ........................ 29
The local operations interface ...................... 31
The status grid window .......................... 33
Stopping global execution using a triggering breakpoint ........ 36
Example of a causal distributed breakpoint ............... 37
Interface for insertion of causal distributed breakpoints ........ 39
Overlapping causal distributed breakpoints ............... 42
Example of a non-causal distributed breakpoint ............ 44
Interface for insertion of local breakpoints ............... 45
Popup used to ﬁlter the processes .................... 47
Space—time diagram before process-ﬁltering has been applied ..... 49
Space-time diagram after process-ﬁltering has been applied ...... 50
Example of a single point-of-view communication graph ........ 51
Modifying the time axis scale to “zoom in” ............... 53
Example of a call graph ......................... 55
Example of a source code listing ..................... 56
Textual display of data structures .................... 57
Interface for obtainng additional variable information ......... 59

ix

6.5
6.6
6.7
6.8
6.9
6.10
6.11

7.1
7.2

7.3
7.4
7.5
7.6
7.7
7.8

7.9

7.10
7.11
7.12
7.13
7.14

8.1
8.2
8.3
8.4
8.5

8.6
8.7

9.1

9.2

Graphical depiction of a one-dimensional array ............. 60

Graphical depiction of a two-dimensional array ............. 61
Interface for ﬁltering data values ..................... 62
Filtered version of a two-dimensional array ............... 62
Examination of changing values for a two-dimensional array ..... 63
Visualization hierarchy for examining a breakpoint state ....... 65
Options for examining a breakpoint state from a space-time diagram 66
General execution ﬂow of PANORAMA .................. 71
Subtree of processes for the speciﬁcation of the host node in the

number-doubling program ........................ 73
Example of the mapping procedure ................... 76
BC-graph of the execution of the number-doubling program ..... 79
BC-graph of only message-passing events ................ 82
BC-graph of both active processes and message-passing events . . . . 83
BC-graph of the ﬁltered events ..................... 84
Windows for speciﬁcation and source code corresponding to

questionable process ........................... 85
The interface for the improved mapping tool .............. 87
BC-graph interface for causal distributed breakpoints ......... 89
The ﬁltering/ clustering interface ..................... 90
BC-graph interface for GOLD ...................... 92
Modeling expected behavior in a call graph ............... 93
Comparison of speciﬁcation listing to source code listing ....... 94
Two-dimensional array trait ....................... 101
Technique for ﬁnding the dimension of an ADT from its trait ..... 103
Maximum match with the minimum cost ................ 104
Construction of the sort matrix from the match matrix ........ 105

A visualization of column sorting for the implementation of the
assignment algorithm ........................... 106
The visualization of the match matrix after the misbehaving column sort107

A data visualization for a counter .................... 108

Typical layout for an application-speciﬁc visualization of the sorting
problem .................................. 120

Sample class hierarchy of Voyeur visualizations ............. 121

CHAPTER 1

Introduction

Parallel computing offers a promising approach for achieving the high performance
required by computationally-intensive applications. The widespread use of this
approach, however, has been limited by the difﬁculties associated with the
development of parallel programs. This dissertation addresses one of the main

difﬁculties, speciﬁcally the task of debugging a parallel program.

1 .1 Motivations

Debugging a sequential program is a difﬁcult task, as it relies on insight for knowing
where to look for the cause of an error. Such insight is often only achieved after
years of experience in program development [1]. Debugging a parallel program
presents an even more difﬁcult challenge [2], as communication between processing
elements complicates the task of locating the cause of an error. In order to
simplify the development of parallel programs, support tools must be developed
for handling this additional complexity. Several obstacles make the development
of such tools difﬁcult, including the nondeterministic behavior of parallel programs,
where different executions may result in different sequences of interactions between

processing elements. In addition, parallel programs are also susceptible to the probe

effect, where an attempt to monitor a program’s behavior may actually change the

behavior by introducing delays that change the order of synchronizations.

1.1.1 Formal Methods

An indirect solution for addressing the difﬁculty of parallel debugging is to minimize
the number of errors that must be detected and corrected. The application of formal
methods to the process of software development provides a means for achieving this
solution. Formal methods are mathematically-based techniques that are used to
describe and reason about properties of software systems. The description of the
properties are presented using a notation called a formal speciﬁcation language, and
the document in which the properties are described is called a formal speciﬁcation
[3]. The formal speciﬁcation is an abstraction of the software system, where
the implementation details are intentionally omitted. Using formal speciﬁcations
facilitates the early evaluation of a software design through the use of formal reasoning
techniques [3, 4, 5]. Additional beneﬁts are that the well-deﬁned syntax and semantics
of formal speciﬁcation languages makes their manipulation amenable to automation.

The use of formal speciﬁcations is unfortunately unable to entirely eliminate the
possibility of errors in the implementation. For example, even if the speciﬁcation
accurately represents a problem, the process of constructing an implementation for the
speciﬁcation is subject to human coding errors. In particular, the expected behavior of
the program, as described by the formal speciﬁcation, may be inconsistent with the
actual behavior, as revealed during the program’s execution. Thus, despite the beneﬁts
of using formal methods for program development, we must still address the problem
of the additional complexity associated with parallel debugging, as debugging tools are
still necessary for eliminating errors that are introduced during the implementation

stage.

1.1.2 Using Visualization to Debug Errors

Visualization has been shown to be an effective approach for representing the complex
behavior of a parallel program [6, 7, 8, 9, 10, 11]. Large quantities of event data from
the program execution can be encapsulated in compact graphical representations.
Such visualizations convey program execution from a global perspective, where
the communication between processing elements is depicted graphically. These
visualizations reveal patterns and discrepancies in the event data more readily than
corresponding textual output.

One major difﬁculty in the use of visualization is that the nondeterministic
behavior of parallel programs and the probe effect have both hindered the
development of on-line visualization-based debugging tools. Such tools are intended
to visualize the execution of the program as it is running, but, in the case of the
probe effect, in particular, the overhead required to perform the visualization may
signiﬁcantly alter the execution behavior of the program. The typical approach
to building visualization—based debugging tools has been, instead, to use a post-
mortem strategy, where trace data of important events is collected during program
execution, but the graphical depiction of the data is performed off-line after execution
is complete. By postponing the depiction of the event data until after the program
execution completes, the probe effect is not as large as compared to when the
visualizations are rendered on-line. The disadvantage to this approach, though, is
that the amount of collected data must be limited, else the probe effect will become
a signiﬁcant factor, and, in addition, the generated data may potentially exceed the
available storage.

Recent advances have been made in the area of replay techniques, where
deterministic re-execution of a parallel program is guaranteed by forcing

communication between processing elements to occur in the same order as in the

original execution [12, 13, 14, 15]. Such techniques are useful for the construction
of on-line debuggers, where cyclic debugging strategies focus on running a program
repeatedly, each time making additional progress towards pinpointing the location of
an error. Since a cyclic debugging approach is only meaningful when deterministic
execution is guaranteed, the use of replay techniques for providing such determinism
is beneﬁcial. In addition, since the replay mechanism guarantees that re-execution
of the program will be deterministic, the probe effect is prevented from altering the
behavior of the program.

Replay techniques facilitate the development of visualization-based, on-line
debuggers. Since execution behavior is not affected by the probe effect, on-line
visualization is able to produce meaningful results. An additional beneﬁt of this
approach is that data storage constraints are no longer a factor since the data is
being visualized as it is generated, in contrast to a post-mortem strategy where the
data is ﬁrst stored and then visualized after program execution is complete. Currently,
however, the use of visualization in on-line debugging environments has been limited
to the depiction of communication between processing elements [16, 17, 18, 19, 20],
mainly because replay-based techniques have only recently become well-established.

Another major difﬁculty in the use of visualization is ﬁnding a graphical
representation for the event data that ﬁts the programmer’s conceptual model of
the problem at hand [21]. Event data often consists of low-level events such as
message-passing or procedure calls, yet visualizations based directly on these events
are often difﬁcult to correlate with the expected behavior of the program. Such
visualizations lack any use of abstraction to model the low-level events in terms of
high-level behavior, such as stages in the algorithm. In order to debug any errors,
the programmer must manually establish a match between the graphically-displayed

events and the patterns of expected behavior for the program.

1.2 Research Contributions

Thesis Statement: Visualizations and formal speciﬁcations may be applied in new
ways to facilitate the diﬁicult task of parallel debugging. In particular, on-line,
replay-based debugging techniques offer new opportunities for applying visualization
to support typical debugging operations, such as setting breakpoints and examining a
program’s state. In addition, formal speciﬁcations may be applied to parallel debugging
for modeling a program’s expected behavior, facilitating a comparison between expected

and actual behavior.

This dissertation presents two major contributions to parallel debugging

technology:

1. The development of new visualization techniques for use in an on-line parallel

debugger.

2. The development of a methodology for using the formal speciﬁcation of a

program as a basis for visualizing the program execution.

The research discussed in this dissertation speciﬁcally applies visualization to the

following tasks of an on-line parallel debugger:

o Visualization-based selection of distributed breakpoints: Many of the typical
on-line debugging operations are more complicated in a parallel debugger as
compared to a sequential debugger since multiple processes must be considered
instead of a single one. For example, causal distributed breakpoints [22] is one
approach that has been used for stopping a parallel program at a particular
point in its execution. In this approach, a sequential breakpoint in a single
process initiates the stop, where the remainder of the processes are halted

and restored to the earliest state reﬂecting all events that occurred before the

sequential breakpoint. A graphical representation for this type of breakpoint
scheme is superior to a textual representation, as the logical ordering of events
between processes is depicted well by a two-dimensional plot. This type of
two-dimensional plot is known as a space-time diagram [17, 23, 24], where
processes are on one axis and time is on the other, with communication shown
as lines between the relevant (process, time) pairs. This dissertation presents a
visualization—based approach for setting distributed breakpoints, where a space-
time diagram is used to graphically illustrate where each process is located in
its execution when a distributed breakpoint stops the program [9, 11]. The
programmer may use this interface to insert distributed breakpoints, simplifying

the otherwise difﬁcult operation.

Scalable techniques for visualization of communication events: For the cases
of programs that use a large number of processes or have a large degree of
process communication, visualizations may become cluttered, thereby limiting
their beneﬁt. The effectiveness of visualizations is determined by how well
they reveal patterns and discrepancies, regardless of how much information
is represented. This dissertation describes GUI-based ﬁltering techniques
for scaling visualizations of communication [10, 11]. A unique characteristic
of these techniques is that they are integrated in a top-down debugging
framework, where ﬁltering the degree of information presented in visualizations
of communication is the ﬁrst step when converging towards the source of an

error .

Top-down use of visualizations for examination of program states: The
advantages of visualizing multiple aspects of program execution have been
discussed previously [23]. A replay-based, on-line debugging strategy is well—

suited for providing multiple visualizations, as the large volume of data required

to render the visualizations can be acquired without being limited by the probe
effect and data storage constraints. A top-down approach for using a variety
of visualizations to debug incorrect program behavior may thus be constructed
[9, ll, 25]. For example, in a case where incorrect communication between
processes is suspected, a visualization of communication events would be useful
for initially inspecting the message-passing behavior, where ﬁltering may be
applied as described above. As the location of the suspected error is narrowed to
a single process or a small subset of processes, the execution of those particular
processes may individually be visualized using a (procedure) call graph or a
source code listing. Finally, if a particular data structure must be inspected,
data visualizations may be used. In the case of data visualizations, ﬁltering
may be useful for highlighting particular instances of data. Several ﬁltering
possibilities are provided, including choices for ﬁltering values that exceed some
maximum value, that fall below some minimum value, that are equal to some

critical value, or that are within some critical range.

The task of debugging a parallel program is facilitated by an environment
that supports comparisons between the actual behavior of the program and its
expected behavior. The second contribution of this dissertation is a new approach to
visualizing the execution of a parallel program in the context of the program’s formal
speciﬁcation, where the formal speciﬁcation is used to model the expected behavior

of the program. The following items were addressed for this research contribution.

o Modeling expected behavior in program visualizations: Visualizations that depict
various aspects of the program state are useful for identifying inconsistencies
that may indicate the source of an error. In order for these visualizations to be
useful for debugging purposes, however, the programmer must be able to ﬁnd a

match between the depicted behavior and the expected program behavior. To

facilitate this task, we developed an approach [8, 26] for visualizing the execution
of a parallel program in the context of the program’s formal speciﬁcation,
written in the speciﬁcation language LOTOS (Language of Temporal Ordering
Speciﬁcations) [27, 28]. A LOTOS speciﬁcation is used to represent an
abstraction of the program, thus providing a model of the program’s expected
behavior. In the case of communication visualizations, each LOTOS process
may be used to represent a phase in the computation, and thus the correlation
of message-passing events with computation phases is established by overlapping
a representation of the active LOTOS processes onto the diagram. In the case
of graphically representing the current location in a single process, the location
may be represented within a visualization based on the formal speciﬁcation. In
this latter case, a call graph visualization may be used to depict call relations
between the LOTOS processes, while a speciﬁcation listing may be used to

portray the execution ﬂow.

Selection of Data Visualizations Using Formal Speciﬁcations: Similar to the
item discussed above, visualizations of program data may also be generated
based on the formal speciﬁcation of the program. In this case, however,
the formal speciﬁcation is used to guide the selection of an appropriate data
visualization, rather than modeling the visualization in terms of items from the
formal speciﬁcation. Speciﬁcally, a method for deriving appropriate parametric
data visualizations from formal speciﬁcations is presented, where this method
maps formal speciﬁcations of a program’s data structures to a graphical unit
termed a cell model [7, 29]. The cell model consists of several components used
to characterize a data structure. A rule-based system is used for selecting
appropriate debugging visualizations depending on the contents of the cell

model.

For both areas of research described above, we speciﬁcally considered the case
of programs written for distributed-memory systems, where communication between

nodes is via message-passing [30].

1.3 Organization of Dissertation

The remainder of this dissertation is organized as follows. Chapter 2 gives background
information on the LOTOS speciﬁcation language and the Cholesky factorization
program used to generate most of the visualizations shown in this dissertation.
Chapter 3 presents an overview of the Graphical On-Line Debugger (GOLD), a
prototype debugging environment that was developed as a testbed for several of the
research contributions discussed in this dissertation. The discussion of research results
is divided into several chapters, where each chapter presents results targeted towards
accomplishing one of the research contributions discussed above. In particular,
Chapter 4 describes a technique for using visualizations to guide the operation of
setting a causal distributed breakpoint for a set of processes. Chapter 5 presents a
scalable approach for using visualizations to represent process communication, where
this approach focuses on using simple, GUI-based ﬁltering techniques. Chapter 6
introduces a strategy for coordinating the use of different types of visualizations when
examining a program state. Chapter 7 discusses a technique for modeling expected
behavior in program visualizations. Chapter 8 discusses a method for selecting data
visualizations based on the formal speciﬁcations of the program’s data structures.
Related work is presented in Chapter 9. Finally, conclusions and future investigations

are discussed in Chapter 10.

CHAPTER 2

Background Information

This chapter discusses background information relevant to concepts that appear later
in this dissertation. First, an introduction to the LOTOS speciﬁcation language
is presented, in particular demonstrating how it may be used for the speciﬁcation of
parallel programs. Second, a brief introduction to the Cholesky factorization program
is provided, where this program was used to generate most of the visualizations shown

in this dissertation.

2.1 Specifying Parallel Programs with LOTOS

LOTOS [27, 28] is a formal speciﬁcation language speciﬁcally designed to specify
protocols and services. The concepts of LOTOS are general in nature, however,
thus making the language useful for a wide variety of other tasks, including the
speciﬁcation of parallel programs. Several features of LOTOS inﬂuenced its use
for our research: it has been internationally standardized, thus promising a stable
deﬁnition; it is relatively user-friendly compared to other speciﬁcation languages, as
provided through the usage of keywords and hierarchically-structured speciﬁcations;
and there exist public-domain tools for working with LOTOS, including syntax and

semantics checkers. In the following discussion, we present a brief introduction to

10

11

LOTOS, followed by a description of how the language may be used to specify a
parallel computing environment. We also include the speciﬁcation of a simple parallel

program.

2.1.1 Introduction to LOTOS

The following is an abbreviated introduction to the LOTOS speciﬁcation language.
We limit our overview to the concepts and notation relevant to the speciﬁcation of
parallel and distributed systems. For a more thorough introduction to LOTOS, the
reader may refer to the tutorials provided in [28, 31].

LOTOS consists of two components:

o A data type component, based on the algebraic speciﬁcation language ACT ONE
[32].

o A control component, based on the process algebras CCS [33] and CSP [34].

The data type component supports the speciﬁcation of data structures. The
deﬁnition of a data type includes a sort name used to identify instances of the data
type, a list of operations that may be performed on the data type, and a set of
equations that describe properties of the operations. In order to facilitate the task of
constructing speciﬁcations, LOTOS supports a library of standard data types, such
as boolean and natural number.

The control component of LOTOS permits a system to be described in terms

1 is an entity that may execute either external

of process deﬁnitions, where a process
actions (events) or internal, unobservable actions. In the former case, synchronization
with the other processes in the system is required, taking place at synchronization

points called gates. The system as a whole is viewed as a single process, possibly

 

1The meaning of a LOTOS process is distinguished from the meaning of an operating system
process, where, in the latter case, a process refers to the execution of a sequential program.

12

composed of several subprocesses. Each subprocess is a process in itself, and thus the
entire system may be viewed as a hierarchy of processes.

The general form of a LOTOS action consists of three components:

o A gate name

o A list of attributes, where each attribute can either be a value offered at the
gate (labeled with “1”) or a variable accepted at the gate (labeled with “7”)

o An optional predicate that establishes a condition on the variables that can be
accepted

An example action is the following (where “Nat” is the natural numbers sort):
g l3 ?V:Nat [V < 2]

The predicate for this action asserts that the variable V can only accept values that
are less than 2, namely 0 or 1 since V is of sort natural number. Thus, the action is
equivalent to one that provides a choice between the two actions “g 13 10” and “g 13
ll”, offering the values 3 and 0 or 3 and 1, respectively, at gate g.

Processes communicate by synchronizing on common actions. For example, if a
process P is ready to execute the action “g 13” at the same time that a process Q is
ready to execute the action “g ?Y:Nat [ Y < 7 ]”, then synchronization may occur
between processes P and Q on the common action “g l3”.

Two special processes are deﬁned in LOTOS: stop and exit. The process stop,
representing a completely inactive process, is invoked by an active process in order
to complete a transition to an inactive state. The process exit is a process with the
sole purpose of executing a successful termination action, after which it invokes the
process stop.

A behavior expression models the activity of a process. Behavior expressions
are built from actions and other behavior expressions by using a predeﬁned set of
operators. The operators relevant to the speciﬁcations in this dissertation are the

following (where c and d are actions, and B is a behavior expression):

13

o Action preﬁx operator: written as a semicolon “;”
This operator supports the sequential composition of actions and behavior
expressions. For example, “c ; d ; B” means that action c is followed by action
d, which, in turn, is followed by behavior expression B.

o Choice operator: written as “[ ]”
This operator denotes the choice between two or more actions or behavior
expressions. For example, “( c ; B ) [] ( d ; B )” represents a choice between
either action c followed by behavior expression B, or action d followed by
behavior expression B.

o Parallel composition operators: written as “|[G]|”, III, and I]

The selective parallel operator “|[G]|” is used to express parallelism between two
behavior expressions, where G is the set of gates on which synchronization must
occur. Any actions in the two behavior expressions occurring at gates not in
the set C may interleave. The special case in which G is empty is represented
by the interleaving operator III, and the special case in which 6' contains the set
of all gates is represented by the full synchronization operator II. In order to
illustrate the use of the parallel composition operators, consider the case where
c denotes the action “g 13” and d denotes the action “g l2”. Then “( c ; exit
) |[g]| ( d ; exit )” is equivalent to stop, since synchronization cannot occur
between “g 13” and “g 12”. However, “( c ; exit ) ||| ( d ; exit )” is equivalent
to “( c ; d ; exit ) [ ] ( d ; c ; exit )” since the interleaving operator requires
no synchronization.

A behavior expression may be preceded by a guard, where the guard must be true
in order for the expression to be invoked. The guard has the form of a predicate

followed by an arrow. For example, in the following guarded expression, behavior

expression B will only be invoked if the value of X is equal to 2:
[ X = 2 ] -—» B

A typical structure for a LOTOS speciﬁcation is of the following form, where
functionality is either exit or noexit, depending on whether the process is able to

successfully terminate [28]:

speciﬁcation name_of_spec [gate list](parameter list): functionality
( type deﬁnitions )

behavior
(behavior expression for the overall speciﬁcation)

where

14

(process deﬁnitions)
endspec

where each process deﬁnition has the form:

process name_of_process [gate list](parameter list): functionality :=
(behavior expression for the process)

where
(subprocess deﬁnitions)

endproc

2.1.2 Speciﬁcation of the Parallel Computing Environment

A LOTOS speciﬁcation for a parallel program requires the speciﬁcation of the entire
environment of the program, where the environment consists of both the processing
elements where the computation occurs and the medium through which the processing
elements can communicate. For the research discussed in this dissertation, we
consider the case of parallel programs written for distributed-memory systems, where
communication channels interconnect the nodes to provide a medium for message-
passing. A general technique for the LOTOS speciﬁcation of such environments has
been developed [35], from which we adapted our particular approach. We outline the
key aspects of our approach below.

A simple example of a parallel computing environment is two processing elements
P1 and P2 connected via two unidirectional channels C l and C2, as shown in
Figure 2.1. The synchronization points between the processing elements and the
channels are the gates labeled send and recv. In accordance with previously developed
notation [35], an action that sends a message Mesg from a processing element labeled

Sender to a processing element labeled Rcvr may be formatted as:
send lSender chvr lMesg

Similarly, the action that receives the message that was sent as described above may

be formatted as:

15
recv lSender chvr ?Mesg:Message

where Mesg is a variable whose value is set through synchronization.

 

 

send C1 recv

 

 

 

 

recv C2 send

 

 

 

 

 

 

Figure 2.1. Simple parallel computing environment

Consider a program for the computing environment given in Figure 2.1, whose
purpose is to send a natural number from P1 to P2, and then send the same number
back again from P2 to P1. The speciﬁcation for this program is shown in Figure 2.2.
The behavior expression for the overall speciﬁcation states that the actions of the
processing elements (P1 and P2) may interleave, and the actions of the channels
(C 1 and C2) may also interleave, but the actions between the processing elements
and the channels must fully synchronize (i.e., synchronize on both the send and recv
gates). By examining the behavior expressions in the four process deﬁnitions of the
speciﬁcation, we observe that the ﬁrst synchronization occurs between P1 and C1 at
the send gate. This synchronization causes the variable Msg in the action “send lPl
1P2 ?Msg:Nat” to accept the value Some_Num offered by the action “send lPl lP2
lSome_Num”, effectively representing the passing of a message from the processing
element P1 to the channel C 1. Similarly, the next three synchronizations that occur
pass the message from C 1 to P2, then from P2 to CZ, and ﬁnally from C2 to P1, at

which point all four of the LOTOS processes P1, P2, Cl, and C2 have successfully

16

 

speciﬁcation simple_send-and_return(Some-Num:Nat): exit
library N aturalNumber endlib
behavior

( P1[send,recv](Some-Num) ||| P2[send,recv] )

||
( C1[send,recv] III C2[send,recv] )
)

where
process P1[send,recv](Some_Num:Nat):exit :2
send !P1 !P2 lSome_Num ; recv !P2 1P1 ?ReturnMsg:Nat ; exit
endproc
process P2[send,recv]:exit :=
recv 1P1 1P2 ?Received_Num:Nat ; send 1P2 iPl lReceived_Num ; exit
endproc
process C1[send,recv]:exit :=
send lPl 1P2 ?Msg:Nat ; recv lPl lP2 lMsg ; exit
endproc
process C2[send,recv]:exit :=
send 1P2 1P1 ?Msg:Nat ; recv lP2 lPl lMsg ; exit
endproc
endspec

Figure 2.2. Speciﬁcation for transmitting a number between P1 and P2

 

 

 

terminated, such that the speciﬁcation may also successfully terminate.

The model that we use for a general parallel computing environment supports an
unlimited number of processing elements, where each pair of processing elements is
connected by two unidirectional channels, as shown in Figure 2.1 for the case of P1
and P2. We assign each processing element with a natural number for its name, where
the advantage of this naming scheme is discussed below. The number zero is assigned
to a special processing element called the host node, whose purpose is to handle

management duties for the set of processing elements, such as assigning tasks or data

17

sets to the other processing elements. We refer to the other processing elements in the
environment as worker nodes, assigning them the names 1, . . . , TOTAL- WORKERS,
where T0 TA L_ WORKERS refers to the number of worker nodes in the environment.
The purpose of worker nodes is to collectively complete the main computation for a
problem by each performing a portion of it. For clarity purposes, we distinguish
between the following types of channels: those that connect the host to the worker
nodes and those that connect pairs of worker nodes. We assume for simplicity that
processing elements do not fail, that messages are not lost, and that unbounded
message size is supported by the environment.

The behavior expression for the overall speciﬁcation of the computing environment
indicates that the actions of the host and worker nodes interleave, and it also indicates
that the actions of the two types of channels interleave. But it requires the actions
between the nodes and the channels to fully synchronize, as was the case for the
speciﬁcation shown in Figure 2.2. The behavior expression for the overall speciﬁcation
is shown in Figure 2.3.

The individual instances of worker nodes and the two types of channels are created
using recursion, based on the naming scheme for the processing elements, as shown
in Figure 2.4. The NodeCtr variable is used to recursively count in descending order
through all the “names” of the worker nodes, from TOTAL_ WORKERS to one (the
process deﬁnitions for creating the instances of the channels are similar in nature).

The required behavior of a channel is to forward a message from a sender to a
receiver, and then make a recursive call to itself in order to wait for the next message
that needs to be forwarded. An example of the process deﬁnition for a channel for
the case of a channel from a host to a worker is shown in Figure 2.5.

The speciﬁcation of the behavior of the Host_Node and Worker_Node processes
depends on the particular program that is being speciﬁed.

The full speciﬁcation for the parallel computing environment is presented in the

18

 

Host_Node[send,recv] (TOTAL_WORKERS)

| I I
All_Worker_Nodes[send,recv] (TOTAL-WORKERS,TOTAL_WORKERS)

All_Host_to_Worker_Channels [send,recv] (TOTAL_WORKERS)

| | |
All_Worker_to_Worker_Channels[send,recv](TOTAL_WORKERS,

TOTAL_WORKERS)

Figure 2.3. Behavior expression for the overall speciﬁcation

 

 

 

Appendix.

2.1.3 Speciﬁcation of a Simple Program

The speciﬁcation discussed above for a parallel computing environment is generic
in nature, such that it may be used for specifying any parallel program that
communicates via message-passing. Each particular program will have unique process
deﬁnitions for the host and worker nodes, but the remainder of the speciﬁcation will
be approximately the same.

A typical structure for a message-passing parallel program is for the host node to
distribute work to each of the worker nodes, where the worker nodes then perform
their assigned duties, perhaps communicating among themselves to do so. Finally,
the worker nodes return the results to the host nodes, thus completing that stage of
computation. In the case of multiple stages of computation, the host node may then

assign a new set of tasks to the worker nodes, and the cycle continues.

19

 

process All_Worker_Nodes[send,recv](NUM-WRKRS,NodeCtr:Nat):
noexit :=
[NodeCtr > 0] —+
Worker_Node[send,recv] (NU M_WRKRS ,NodeCtr)
|||
All_Worker_Nodes[send,recv](NUM_WRKRS,NodeCtr - 1)
endproc (* All_ Worker_Nodes *)

Figure 2.4. Recursive creation of worker nodes

 

 

 

 

process Host_to_Worker_Channel[send,recv](Sender,Rcvr:Nat): noexit :=
send lSender chvr ?Msg:Message;
recv lSender chvr lMsg;
Host_to_Worker_Channel[send,recv](Sender,Rcvr);

endproc {”‘ Host_to- Worker_ Channel *)

Figure 2.5. Process deﬁnition for a channel

 

 

 

A simple example of a message-passing parallel program that ﬁts the above
description is one in which the host node sends each worker node a natural number
equal to the value of the worker node’s name, where the worker node then doubles
the received number and returns the result to the host. The speciﬁcations for
the host and worker node processes of this number-doubling program are shown
in Figure 2.6. The behavior expression for the host node illustrates the use of
process decomposition, where the subprocesses Send_Numbers and Receive_Replies

perform the tasks of sending messages to the worker nodes and receiving the results,

20

 

 

process Host-Node[send,recv](NUM_WRKRS:Nat): exit :=

Send_Numbers[send,recv](NUM_WRKRS);
Receive_Replies[send,recv](NUM_WRKRS);
exit
)
where
(* Send_Numbers uses recursion to send a number to each worker node *)
process Send_Numbers[send,recv](NodeCterat): exit :=
[NodeCtr > 0] —+
(
send !0 lNodeCtr lNodeCtr ;
Send-Numbers[send,recv] (NodeCtr - 1)

)
[l

[NodeCtr = 0] —> exit
endproc (* Send_Numbers *)

(* Receive_Replies uses recursion to receive numbers from worker nodes *)
process Receive_Replies[send,recv](NodeCtr:Nat): exit :=
[NodeCtr > 0] —->

(* Since Sender is a variable, receive the replies in any order *)
recv ?Sender:Nat 10 ?Reply:Nat ;
Receive_Replies[send,recv](NodeCtr - 1)

)
[l

[NodeCtr = 0] -i exit
endproc (* Receive_Replies *)
endproc (* Host-Node *)

process Worker_Node[send,recv](NUM_WRKRS,MY_NUM:Nat): exit z:
(* Receive a number from the host node *)
recv 10 lMY_NUM ?Value:Nat ;
(* Send twice the value of the number back to the host node 4*)
send lMY_NUM l0 l(Value+Value) ;
exit
endproc (* Worker_Node *)

Figure 2.6. Speciﬁcation of nodes for the number-doubling program

 

 

21

respectively. The Send_Numbers process requires the messages to be sent to the
worker nodes in decreasing order of the value of the worker nodes’ names, but the

Receive_Replies process receives the results in any order.

2.2 Cholesky Factorization Program

The application we used to generate most of the visualizations shown in this
dissertation is the Cholesky factorization program supplied with the PVM 2.4 [36]
distribution, modiﬁed to run under PVM 3.3. The executions occurred on clusters
of identical, ethernet-connected SUN SPARCstation 10 workstations. In the brief
background discussion below, we present an informal description of the parallelized

program, followed by LOTOS speciﬁcations of relevant portions of the program.

2.2.1 Informal Description of the Program

Cholesky factorization considers the special case in which a matrix A is both
symmetric and positive deﬁnite. In this case, matrix A has a factorization of the
form A = LLT, where L is a lower triangular matrix. This factorization is known as
the Cholesky factorization.

The program that we use to compute the Cholesky factorization is a Column-
Cholesky [37] implementation, in which the processing elements are each assigned an
approximately equal number of columns for the computational tasks, although not
necessarily consecutive. The implementation consists of three phases: synchronous
Cholesky factorization, forward substitution, and backward substitution. A full
discussion of these phases is beyond the scope of this dissertation, although the
interested reader may refer to other references [37, 38]. In the following presentation,
we focus on the message-passing events for each of the phases. We do not discuss the

contents of the messages, but rather focus on the patterns in which they are sent. The

22

matrix we consider is of size n X n, where the columns are numbered 0, . . . , n — 1. The
number of available worker node processes is represented by TOTAL_ WORKERS.

All three phases of the program contain message-passing events within loops that
use the column number as the index variable. In the case of the synchronous Cholesky
factorization phase, the iteration is in order of increasing column number, from 0 to
n — 1. During each iteration of the loop, the processing element checks if it is assigned
the column corresponding to the value of the index variable. If so, the processing
element sends a broadcast message to the other processing elements. Otherwise, it
receives the message broadcast from the processing element that actually is assigned
the column corresponding to the value of the index variable.

The message-passing behavior in the backward substitution phase is similar to
that described above for the synchronous Cholesky factorization phase, except that
the iteration is in order of decreasing column number, from n — 1 to 1 (column zero
is skipped).

The forward substitution phase iterates in order of increasing column number,
from 1 to n — 1 (column zero is again skipped), but the message-passing behavior is
different from that of the other two phases. In this case, if the processing element
determines that it is assigned the column corresponding to the value of the index
variable, then it waits to receive values sent from each of the other processing elements.

Otherwise, it sends a message to the processing element that is assigned the column.

2.2.2 LOTOS Speciﬁcations for the Program

In this example, the host node process participates in the computation only during
initialization, where the worker node processes perform the majority of the work.
The behavior expression for the worker node processes consists of the sequential
composition of an initialization process with three processes corresponding to the

three stages of the program:

23

Initialization[send,recv]((List of Local Variables));
Synchronous_Cholesky_Factorization[send,recv]((List of Local Variables));
Forward_Substitution[send,recv]((List of Local Variables)) ;
Backward_Substitution[send,recv]((List of Local Variables))

The error we consider below is located within the Forward_Substitution process,
and so we focus our discussion on that process. Its behavior expression consists of
a set of initialization operations, followed by the main loop that iterates in order of

increasing column number:

For_Sub_Pre—loop[send,recv]((List of Local Variables));
For_Sub_Main_Loop[send,recv](l,n-l,(List of Local Variables))

where the header and behavior expression for the For_Sub_Main_Loop process is
shown in Figure 2.7. The CurrentCol variable within the For_Sub_Main_Loop process
is used to iterate in order of increasing column number through recursive calls to the

For_Sub_Main_ Loop process.

 

process For_Sub_Main_Loop[send,recv](CurrentColzNat,MAX_COL:Nat,
(List of Local Variables)): exit :=
[ CurrentCoI S MAX_COL] —i
(
For_Sub_Pre—communication[send,recv]((List of Local Variables));
For-Sub_Communication[send,recv](CurrentCol,(List of Local Variables));
For,Sub_Post-communication[send,recv]((List of Local Variables));
For_Sub_Main_,Loop[send,recv](CurrentCol+1,MAX_COL,
(List of Local Variables))

)

[l
[CurrentCol > MAX_COL] —» exit

Figure 2.7. Header and behavior expression for For_Sub-Main-Loop

 

 

 

Finally, the speciﬁcation of the For_Sub_Communication process is shown in

24

Figure 2.8. The behavior expression for this process is divided into two choices

 

process For_Sub_Communication[send,recv](CurrentColzNat,
(List of Local Variables)): exit :=
[CurrentCol E MY-COL_SET] —»
For_Sub_Receive_Messages[send,recv](0,TOTAL_WORKERS);
ll
( [CurrentCol ¢ MY-COL_SET] —+
send lMY-NUM lOwner(CurrentCol) lMsgzMessage ;
exit )
where
process For_Sub_Receive_Messages[send,recv](NodeCtr:N at,
TOTAL_,WORKERS:Nat): exit :2
[NodeCtr < (TOTAL-WORKERS - 1)] —»
(* Since Sender is a variable, receive messages in any order *)
( recv ?Sender:Nat lMY_NUM ?Msg:Message ;
For_Sub_Receive_Messages[send,recv](NodeCtr + 1,
TOTAL_WORKERS) )
ll
[NodeCtr = (TOTAL_WORKERS - 1)] —+ exit
endproc (* For.Sub_Receive_Messages *)
endproc (* For.Sub_Communication *)

Figure 2.8. Speciﬁcation of the For_Sub_Communication process

 

 

 

that determine whether the worker node should participate in send or receive
actions. The selection between the two choices is based on whether the CurrentCol
variable belongs to the set of columns assigned to the worker node that has invoked
the process. In the case where the column is assigned to a worker node, the

recursive For.Sub_Receive_Messages process is used to receive TOTAL_ WORKERS-

] messages sent by the other worker nodes.

CHAPTER 3

Overview of the GOLD Debugging

Framework

In order to provide a context for the presentation of the research results, this chapter
describes GOLD [11], a graphically-based debugger that we developed to give a proof
of concept demonstration of our research. GOLD may be used to debug parallel
and distributed programs written in C and compiled using the GNU compiler gcc,
and has currently been ported to Sun workstations running either SunOS 4.1.x or
Solaris. GOLD supports programs that have been written using PVM 3.3 [39]
message-passing primitives, where the programs consist of a single host process that
spawns a collection of worker processes. A PVM program is typically designed to
parallelize a computation by having the host process distribute work to a collection
of worker processes. In this chapter, the top-down procedure for debugging using
GOLD is presented. Next, the general architecture of GOLD, which facilitates top-

down debugging, is discussed.

25

26
3.1 A Top-down Debugging Procedure

GOLD is designed to support a top-down procedure for locating the source of an

error. The steps in the top-down procedure are as follows.

1. Construct a hypothesis regarding the potential location of the source of the
error.

2. Establish one or more program states to examine in order to prove or disprove
the hypothesis.

3. For each program state from Step 2:

(a) Execute the program until the desired program state is reached.

(b) Examine the program state using a top-down approach, such that the
state is investigated from a broad scope initially, but the scope narrows as
debugging proceeds in order to examine the potential source of an error in
greater detail.

4. If the source of the error has been located, then quit; Else, construct a new
hypothesis based on what has been learned so far, and return to Step 2.

Therefore, the procedure for debugging is top-down in two respects. First, the
overall procedure uses a top-down strategy for deriving hypotheses that incrementally
make progress towards the source of an error. Second, Step 3(b) of the procedure
uses a top-down strategy for examining a particular program state.

The visualization-based features of the GOLD debugging environment are
designed to support the top-down debugging procedure. In particular, the
visualization-based approach to breakpointing supports the task of stopping the
program in a speciﬁc state (Step 2), while the strategy for coordinating the application
of different types of visualizations supports the examination of a program state (Step

3(b)). The GUI-based techniques for ﬁltering communication events are useful when

27

a hypothesis concerns incorrect message-passing behavior, where it is helpful to depict

only the relevant communication activity during Step 3(b).

3.2 The General Architecture of GOLD

The top-down procedure described above deﬁnes an iterative approach to debugging.
The basis for iterative debugging is to execute a program repeatedly, each time
gathering more information to be used towards identifying the source of an error. The
difﬁculty in using an iterative approach for parallel debugging is that any attempt to
monitor the behavior of a parallel program may actually change its behavior. Referred
to as the probe effect [40], the debugging routines could potentially cause the order
of message-passing transactions between processes to change such that the erroneous
behavior may be difﬁcult to reproduce.

Replay strategies [12] have been devised for providing deterministic re-executions
of parallel programs. The basis of a replay strategy is to record the communication
order between processes during an initial program run. No additional debugging
operations are performed during the initial run, thus minimizing the possibility that
the erroneous behavior will be concealed. Any additional program runs are guaranteed
to be equivalent by forcing the communication to occur in the same order as in
the initial run. Replay strategies thus facilitate an iterative approach to parallel
debugging.

The basic operation of GOLD is based on a replay strategy. In order to debug a
PVM program, the program is recompiled with an instrumented version of the PVM
library, where the instrumentation handles all details of the replay-based operation.
An initial run of the recompiled program is ﬁrst used to collect the necessary data
for supporting program replay, where the program is run using its normal command-

line parameters. During the re-executions, the program is executed by typing gold

28

followed by the name of the program and its command-line parameters. For example,
if the program is typically invoked by typing “host dataﬁle”, then the following line

would be used to invoke the re-execution:
gold host datafile

The replay-based operation of GOLD ensures deterministic re—execution, and so
GOLD has been constructed as an on-line debugger, where debugging operations are
performed during the re-executions. This approach is in contrast to a post-mortem
strategy for debugging, where important events are recorded during a program
execution, but the examination of the event data is performed off-line after execution
is complete. Although the overhead required by GOLD to perform on-line debugging
operations may slow the execution of the program to a small degree, the replay-based
operation guarantees that the behavior will maintain its equivalence with the initial
run.

A Central Operations Module (COM) is used to control the global activity during
the execution of a parallel program. The interface for this module is given in

Figure 3.1. Menu items for setting or removing a breakpoint distributed across

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

is] .m:{ Fig;
l File hekpoints Options Elp 7
,ilExewtion of the distributed processes:
] Continue | [ Halt |
- Exeoinetion of Distributed State:
[ Space-tine Diegreo I [ Status Grid I
Fmsmbuted Status: nu processes are stopped. "

 

 

Figure 3.1. The central operations interface

29

the set of processes are accessible through the Central Operations Interface (C01),
and options are available for activating visualizations to examine the state of the
set of processes. The interface also provides options for controlling the execution
of the parallel program, including Continue for restarting the execution after the
processes have all stopped and Halt for terminating execution. (A “start” option is
not applicable for the case of PVM programs, since the host process is started when
gold is initially activated, and the worker processes are started during the execution
of the host process via a pvm_ spawn system call.) A Status line indicates the current
execution status for the set of processes.

The execution of individual processes is managed by Local Operations Modules
(LOMs), one per process. An overview of the architecture for the GOLD debugging

environment is shown in Figure 3.2, where the relationship between the COM and

 

I (for Process 0)

LOM
(for Procea O)

 

 

 

LOM

CO. / (for Process 1)
00M

 

 

 

 

 

 

 

 

 

 
 

 

  
 

gdb debugger

(for Process n)

 

 

 
 

 

—— . AlwaysecliveduringaGOLDdebugglngsession
———— a Acﬂvetedonenopﬂonalbesisbymeprogrammer

 

 

 

Figure 3.2. The architecture of GOLD

30

the LOMs is illustrated. The LOM uses the GNU debugger gdb to perform low-level
debugging operations, such as inserting breakpoints and determining the values of
particular variables. As such, a gdb process directly controls the execution of the
PVM process, and the LOM controls the execution of the gdb process. The LOM
modules communicate with the COM in order to coordinate the execution of the
parallel program. For example, in order to continue execution after the processes
have all stopped, the programmer would select the Continue button from the COI
(Figure 3.1), and the COM will then send an appropriate message to each of the
LOMs. Each LOM will then request the local debugger to continue execution of its
PVM process.

The top-down procedure for debugging facilitates the programmer’s ability to
move towards the source of an error. As such, the programmer will often narrow
the search to the execution of a particular process. In order to examine this local
execution more closely, a local operations interface (LOI) is supplied, as shown in
Figure 3.3. Various options are provided for controlling the execution of the single
process, including Start for initiating execution, Continue for restarting execution
after the process has been stopped at a local breakpoint, and Step and Next for
continuing execution one line at a time, where Step sequences through the individual
statements of a called function and Next treats a function call as a single line.
A Status line indicates the execution status for the process, and additional ﬁelds
indicate the current point of execution in terms of the source code, including the
function, the source code ﬁle, and the line number within the source code ﬁle. The
LOI also offers options for managing breakpoints and for examining the state of the
local process. In particular, menu items are available for setting or removing a local
breakpoint, and call graphs and source code listings may be activated to examine the
current state of the process. The bottom portion of the LOI displays messages that

are generated by the local debugger or by the execution of the program (e.g., output

31

 

    
 
   
      
     
     
   
   
   
   
       
     
 
   
     
    
   
       

  

 

' f???ff“‘fié‘ii‘i‘fif*im‘i' ' In *r‘.;';f?i-'jif ,z -- ~- = - -
Breakpoints Options

    

 

 

IExecution of this process only:

iiCurrent Point of Execution:

Function: nain
Source Code File: cholhost.c
Line Number: 43

 

 

iExanination of Current State:
3 l 9611 GFGPh I I Source Code Listing I

 

 

 

 

 

 

aStatus: Stopped at a breakpoint.

MESSAGES FROM gdb & PROGRAM EXECUTION:

: GOD is Free soFtuare and you are welcome to distribute copies of it

.3 under certain conditions: type "show copying” to see the conditions.

é There is absolutely no warranty For GDB: type “shoe warranty" for details
3 GDB 4.12 (spare-sun-solar132.3),

5; Copyright 1994 Free Software Foundation, Inc...

 

 

OUTPUT 2:

Breakpoint 1. oain (argc=4, argv=0xeFFFF464) at ../cholhost.c:24
24 ii (ergo < 4) {

INPUT 3:

I Breakpoint 2, nain (argc=4, argv=0xeFFFF470) at ../cholhost.c:43
I 43 nsgtype = 13001:

 

 

 

 

 

 

 

Figure 3.3. The local operations interface

32

from a call to printf).

The LOI for a process is activated as an option by the programmer, such that
it remains deactivated by default. By using this approach, the potential problem
of ﬂooding the windowing environment with a large number of active interfaces
(windows) is avoided. In contrast, however, the COI remains active throughout the
debugging session, as it is required for controlling the overall execution of the parallel
program.

A status grid window is provided via the COI for displaying the current status
of each of the PVM processes used in a particular program execution, where
different color shadings represent different status values (e.g., running, stopped by
a breakpoint, etc.). A LOM notiﬁes the COM whenever a change in the status of its
local process occurs, and the status grid is then updated accordingly. An example
of the status grid for an execution using a host process and eight worker processes is
shown in Figure 3.4. (The host process is always assigned the node number 0, while
the worker processes are assigned the node numbers 1, . . . , n for the case of n worker
processes.) In the case shown in Figure 3.4, process 6 is currently running, while the

remainder of processes are each stopped at a breakpoint.

33

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

. ' (A': .
“. 7A,!" \
.1. ‘:. ,{rx
1.: ,' ‘-
. .v I-V
u r'

 

 

 

 

 

Stopped (via a bredtpoint or single-stepping)

m11., misuse gid- -- lulu
STRTLB [F Tl-E INOIVIIIH. PRIIIESSES:
a mom
1 .2‘ "iii 3 i“. "f 5 7 s
LEGEND:
Not stated I Process died

 

 

 

l

 

Disoiss I I Help

 

 

 

Figure 3.4. The status grid window

 

CHAPTER 4

Visualization-based Breakpointing

When attempting to determine the source of an error, the programmer marks a
particular program state for examination by placing a breakpoint at that state. This

chapter discusses the graphically-based approaches for establishing breakpoints using

GOLD [9, 11].

4.1 Distributed Breakpoints

The task of setting a distributed breakpoint for a set of processes has been considered
previously [22, 41, 42, 43]. The novelty of our approach is the use of visualization
for supporting the operation of setting a breakpoint. The discussion of distributed
breakpoints is presented in two parts, where we ﬁrst present background information,

and then discuss the visualization—based approach used by GOLD.

4.1.1 Background Information

Process communication in a message-passing parallel program occurs via send and
receive events. Other types of events that occur local to a single process are referred
to as internal events [42]. The entire collection of events that occur during a program

execution is partially ordered by the “happened before” relation, denoted “—r” [44].

34

35

This relation is deﬁned as follows.

1. If a and b are events in the same process, and b follows a, then a —+ b.

2. If a is a send event in one process, and b is the corresponding receive event in

another process, then a —+ b.

3. Ifa—rbandb—rc,thena—+c.

If a 71+ b, b 71+ a, and a 75 b, then a and b are referred to as concurrent events.

The state of a process (or a collection of processes) that has been stopped is
referred to as the breakpoint state. A common technique for stopping the execution
of a group of processes is to have a breakpoint located in a single process trigger
the operation. We refer to this particular process as the triggering process, and
the relevant breakpoint is known as the triggering breakpoint. When the triggering
breakpoint is reached, the remaining processes may be stopped using a variety of
strategies. One such strategy is to have the triggering process send an appropriate
message to each of the other processes requesting them to stop. However, this strategy
is unfavorable for debugging purposes, as it may hide the source of an error. For
example, consider the case of the two processes P and Q shown in Figure 4.1. For
this case, a triggering breakpoint occurs in process P at the location marked with
an “X”, where a message (indicated by a dotted arrow) is then sent to process Q in
order to stop it at time tq,. However, in order to determine if process communication
led to an error at point “X” in process P, process Q should be examined at time tq,,
as that was its state when it last had a causal effect on process P. By waiting until
tq, to examine process Q, any state information regarding the message that was sent
at tq, may change in the intervening time.

Another strategy by which a process may stop the global execution when it reaches

a triggering breakpoint is to simply let the other processes run until they block waiting

36

 

 

tPI ”2 s
P b x c
': — : Normal ' event
Processes / \ . 's ' WWW
Q a ‘ d ....... . top request
to‘ 102 toa
Time

Figure 4.1. Stopping global execution using a triggering breakpoint

for some event to occur in the triggering process. For example, in the case shown
in Figure 4.1, the triggering breakpoint at point “X” will eventually cause process Q
to block when it attempts the receive event d, since the send event c will not have
occurred yet. We can also observe from this example that process Q is again allowed
to proceed beyond the last state that had a causal effect on process P, and thus this
strategy for stopping the global execution has the same disadvantage as the previous
one.

In contrast, the strategy we use for stopping the global execution of a group of
processes is based on a causal distributed breakpoint [22]. This type of breakpoint is
one in which a single process stops at a triggering breakpoint and the remaining
processes are stopped at the last event that “happened before” the triggering
breakpoint. This last event is referred to as a causal breakpoint event. A causal
distributed breakpoint stops each process at the last point that had a causal effect on
the execution of the triggering process. An example of a causal distributed breakpoint
is shown in Figure 4.2. The triggering breakpoint is located at point “X” of process P,
and the dashed line indicates where each process is stopped by the causal distributed
breakpoint.

GOLD computes a causal distributed breakpoint by ﬁrst executing the initial
program run. At the end of the normal execution of the initial run, each process

completes a post-processing stage before it terminates, where a dependency vector is

37

 

 

 

 

I
a d h I
" \ i
f / : Message-passing event
Processes O b I r/
/ //’ 9 ——— : Locationotbreakpointstates
R c 0:
Time

Figure 4.2. Example of a causal distributed breakpoint

recorded for each send and receive event performed by that process. The send and
receive events for a process are assigned consecutively—numbered event indices, such

that a dependency vector for event e,- of process i is deﬁned as follows [22]:

BK” = (61,62, . . . , 6”)

where n is the number of processes. Component 6.- of the vector is set equal to 6;.
Each message is appended with the value of the current event index from the sender,
such that the value of component 6,, j 75 i, is set equal to the value of the event index
from the last message received by process i from process j. For the case in which no
message has been passed from process j to process i yet, the value of 6, is set equal
to .L, a value less than all possible event indices.

At the end of the post-processing stage, the dependency vectors from each process
are copied to a common location accessible by the COM. The combined collection
of dependency vectors is used for computing the causal distributed breakpoints. A
dependency vector, however, only records the direct causal dependencies that exist
at a particular event position. In order to compute a causal distributed breakpoint,
transitive dependencies must also be considered. The visualization-based operation
for setting a causal distributed breakpoint uses an algorithm that was speciﬁcally

designed to consider transitive dependencies during the computation of a causal

38

distributed breakpoint [22].

4.1.2 Using Visualizations to Set Causal Distributed

Breakpoints

GOLD uses a visualization-based approach to support the operation of setting a causal
distributed breakpoint [9, 11]. We refer to the operation of setting a breakpoint
as the insertion of the breakpoint. The breakpoint menu from the COI (shown
in Figure 3.1) may be used to activate the breakpoint insertion interface shown in
Figure 4.3. A primary component of this interface is the communication graph,
referred to as a space-time diagram [44]. A space-time diagram plots processes along
one axis and time along the other, where messages are shown as lines between the
relevant (process, time) pairs. In this particular case, the displayed communication
events are those that were recorded in the replay data during the initial program
run. The outcome is a graph of communication patterns that is used to assist the
programmer in appropriately placing causal distributed breakpoints.

Only event indices are recorded during the collection of replay data. In particular,
the time that each event occurs is not recorded. The relative placement of the events
in the space-time diagram-is thus based solely on the “happened before” relation,
where the “time” of an event is based on its relative time. The time between a send
event and its corresponding receive event is one time unit, and concurrent events are
plotted at the same position in time. An alternative technique for the collection of
replay data would be to record a timestamp with each send or receive event, such
that the space-time diagram could more accurately represent the relative occurrence
of the events. However, the disadvantage of capturing timestamps during the initial
program run is that the level of monitoring intrusion is increased, likewise increasing

the possibility of a probe effect, and thus this technique is not used.

39

 

 

 

 

 

, : I:\\\III”
Iris 4.": ei'I-\\l|l
'71! :I-I“
lM/II

I/ I

   
    
   
   
    

n
b 0‘
“’ -—

I“
(’1
ii

  
      
 

. ‘

\‘I

I

U
i
I

   

1

 

O

O

I

 

 

(D
N
I i

x
\

 

 

 

 

 

I '4 ”

l

I

i

l

I. .-
i

i

I

 

E e
V
S a ii
18 35
r I H E
L-- :
o triggering breakpoint event I Between calls to oholFo and Forstb
OCeusolbredeointevent Inetueen ceilsteterstbendbeekwb

O Progenitor-defined bredtpoint event

 

 

 

[um] {Ra-oval

 

 

Camel Distributed Breakmints that are Ctrrently lmerted:
TRIGSERIK; PREIZESS THE (F TRIGGEle REFKPOINT

y
I
I
i O 35
I
I
E

 

 

 

 

 

 

I Oisniss I I Help I

L’Q‘fﬁh “WWWW \'u~\'\-o-v.‘,. aux .. . .-."~\‘n ..‘,-,..' .~-... - ... u. .- - I”. M, u... --r-...,-,....; nu H... .; ,

 

Figure 4.3. Interface for insertion of causal distributed breakpoints

40

A potential location for a causal distributed breakpoint may be evaluated by
selecting an appropriate send or receive event in the space-time diagram. Scroll
bars have been included on both axes of the space-time diagram so that it can be
completely examined for potential locations of breakpoints. Upon selection of a send
or receive event, a dark-colored, ﬁlled circle is placed at the position of the event to
mark it as the location of the triggering breakpoint. The algorithm for computing the
causal distributed breakpoint is then executed, where the result of this computation
is depicted by placing an empty circle along each of the other process lines at the
location of the last event that had a causal effect on the execution of the triggering
process. (The lighter-colored, ﬁlled circles represent the location of programmer-
deﬁned breakpoint events, which will be discussed below.) For example, in Figure 4.3,
the triggering breakpoint is at a receive event for process 8, and empty circles for
depicting the causal breakpoint events have been drawn accordingly along the other
process lines. (The causal breakpoint event for process 0 is located at an event earlier
than the displayed time frame.)

Setting causal distributed breakpoints based solely on a visualization of message-
passing events may be difﬁcult, as it depends on the ability to locate relevant message-
passing patterns. In order to simplify the task, we offer an option for generating
programmer-deﬁned marker events when the replay data is captured. Speciﬁcally, a
GOLD-supported system call may be added to the source code in appropriate places,
where this system call includes a textual description of the marker event as one of
its parameters. During the initial program run, the corresponding marker events are
collected along with the replay data. When the space-time diagram is rendered, the
marker events are depicted using shaded boxes, where different colors shadings are
used to distinguish marker events. For example, Figure 4.3 shows a case in which
two marker events were used, highlighting when the program is between the Cholesky

factorization and forward substitution phases, and between the forward substitution

41

and backward substitution phases. A legend of the different marker events is depicted
below the space-time diagram, where the textual descriptions of the marker events
are used to label the corresponding shaded boxes. The marker events may also be
selected as the triggering breakpoint event for a causal distributed breakpoint.

The selection of an event in the space-time diagram gives the programmer an
opportunity to evaluate the causal dependencies for a triggering breakpoint, but
does not actually set the causal distributed breakpoint, such that it would stop the
program during an execution. In order to complete the insertion of the breakpoint,
the programmer may select the Insert button. At that time, the list of inserted
breakpoints will be updated. For example, in the case of Figure 4.3, the causal
distributed breakpoint depicted in the space-time diagram has already been inserted,
as indicated by the entry in the list (lower window of Figure 4.3). When the Insert
button is selected, the COM also records the event number at which each process
should stop in order to satisfy the causal distributed breakpoint. As the program is
running, each LOM maintains a counter that tallies the number of message-passing
and marker events that have occurred for the particular PVM process, and will stop
the execution of the process when the value of the counter matches the event count
for a distributed breakpoint. If the programmer decides an inserted breakpoint is no
longer necessary, then the appropriate entry in the list of inserted breakpoints may
be selected and the Remove option chosen.

One of the special cases addressed by GOLD is when causal distributed
breakpoints overlap. During an execution where multiple breakpoints are set, the
program stops at the breakpoints according to the relative order of the triggering
breakpoints, where this order is computed according to the “happened before” relation
for events recorded in the replay data. In a case where causal distributed breakpoints
overlap, however, particular processes may not be able to stop at the appropriate

location for a later breakpoint after the stopping location of an initial breakpoint has

42

been reached. For example, consider the two causal distributed breakpoints shown
in Figure 4.4, where the triggering breakpoints are located at points “X” and “0”.
If the execution initially stops at the causal distributed breakpoint triggered at point
“X” (i.e., process P stops at point “X”, process Q stops after executing event 6, and
process R stops after executing event a), process P has already passed its stopping
location for the second breakpoint triggered at point “0” (i.e., event c), and so the
requirements of that causal distributed breakpoint cannot be satisfied. A similar
conﬂict occurs for process R if the program initially stops at the breakpoint triggered
at point “0”. In order to avoid this type of conﬂict, GOLD checks for overlapping
causal distributed breakpoints when the Insert button is selected, and prevents the
insertion of a breakpoint that it determines to be overlapping with a breakpoint that

was inserted earlier.

 

— : Message-passingevem

——— :Locaﬁondbmakpolntstates

 

 

Time

Figure 4.4. Overlapping causal distributed breakpoints

GOLD provides the programmer with an option for stopping the execution of
a process at an event time different from that computed by the causal distributed

breakpoint algorithm} This option is useful for stopping the program at non-causal

 

1The interface for inserting causal distributed breakpoint supports this option by using the
different mouse buttons for separate purposes. Speciﬁcally, the left mouse button is used to select
the location of a triggering breakpoint, and the middle mouse button is used to mark an alternative
breakpoint event.

43

distributed breakpoints. For example, Figure 4.5 illustrates a case where processes
1—8 are each stopped after the first message-passing event (initiated by process 8)
in the backward substitution phase. (The causal breakpoint event for process 0 is
again located at an event earlier than the displayed time frame.) The location of

programmer-defined breakpoint events are marked by lighter-colored, ﬁlled circles.

4.2 Local Breakpoints

An operation for inserting a local breakpoint is provided by the LOI, accessible from
the breakpo int menu. The interface that is activated for inserting a local breakpoint
consists of three main regions, as shown in Figure 4.6. The top region contains a
call graph that the programmer uses to select the appropriate function in which the
breakpoint should be placed. When the function is selected, the source code for that
function is displayed in the middle window. The programmer may set a breakpoint
by selecting the appropriate location in the source code listing, and then choosing the
Insert button. The bottom region of the interface is a list of the local breakpoints
that have been inserted for the process. For example, in the case shown in Figure 4.6,
a local breakpoint has been inserted for process 0 (i.e., the host process) immediately
after the call to pvm_spawn. An option for removing a breakpoint is also provided.
A particular case in which setting local breakpoints may be useful is when a process
has stopped at a distributed breakpoint event, but it is then desirable to examine the
sequence of states immediately following that event. In that case, local breakpoints
may be inserted for controlling the execution of the process. The process may then
be run separately from the other processes in the system, using the execution options
accessible from the LOI. The process may be run on an individual basis in this manner
until it blocks, waiting to receive a message from another process, or until completion

if no other receive events occur.

44

D Triming bredtpoint event I Betwen calls to cholf‘o and For-sub

0 Canal bredtpoint event I Betwen calls to Forstb and backed:

D Wanner-defined bredcpoint event

Vmi’rgllwéwm

Distributed Breakpoints that re erently Inserted:
TRIGGERING PROCESS TIME OF TRIGGER“: BRERKPOINT
8 44

Dismiss

 

Figure 4.5. Example of a non-causal distributed breakpoint

45

 

 

 

 

 

 

w‘ ‘ VWCA‘S'

 

 

 

 

exit“): 4t
}
blksize = etoi(*++ergv):
if (blksize > n/p) blksize = n/p :

lt enroll in pvn */
so = pvn_ngtid():

; pvn_speun('cholnode', (cherI*)0, 0. ", nprocs, tids): 5;;i
— 55
’ nsgtgpe = 16001: 3;}
pvn_initsend(PvnDataDeFault)z ijg

 

 

 

 

 

 

[ Insert I l Renove I
: Breekpoint inserted in Function ’nein' (line 42 0F cholhost.c)
éLocel Breakpoints that are Currently Inserted:
In nein (cholhoshc): at line 42

 

 

 

P

 

5‘

 

 

 

 

 

i Dismiss Help i

   

 

Figure 4.6. Interface for insertion of local breakpoints

CHAPTER 5

Scalable Visualization Techniques

We have developed several techniques for improving the scalability of visualizations
that represent process communication [10, 11]. The basis for these techniques are
ﬁltering mechanisms, where the programmer is provided with a simple interface for
controlling the information that is depicted. Two cases in particular have been
considered. First, we developed a set of visualization-based features to handle
programs that make use of a large number of processes. Second, we deveIOped
an approach for visualizing program executions that incur a large degree of process
communication. These two orthogonal sets of features may also be used cooperatively

for the case of programs that grow in size in both dimensions.

5.1 Handling Large Numbers of Processes

Space-time diagrams are used by GOLD to depict process communication during a
program execution. Previously, we described how a space-time diagram is used to
facilitate the insertion of causal distributed breakpoints. The space-time diagram
that depicts communication during the program execution is similar, except that the
time of an event is based on actual time, in contrast to a relative time. Each LOM

sends a message to the COM whenever a send or receive event occurs. The COM

46

47

orders these messages as they are received, waiting to record receive events until
the corresponding send events arrive, and ensuring that the events at each node are
recorded in the order that they occur. When an event is able to be recorded (i.e.,
all events that “happened before” the event are already recorded), a timestamp is
recorded with the event, where this timestamp is used as the time of the event when
constructing the space-time diagram. This timestamping mechanism ensures accurate
event ordering, which is critical for debugging.

Programs that use large numbers of processes particularly inﬂuence the
effectiveness of a space-time diagram, as it becomes difficult to detect patterns and
inconsistencies in a space-time diagram when the number of elements plotted along the
process axis grows large. In order to handle this shortcoming of space-time diagrams,
we take advantage of a strategy used by many programmers when debugging. That
is, in most cases, the programmer has a general idea of the location of the source of
an error, which means that typically the behavior of only a subset of the relevant
processes is of interest during debugging. When this case occurs, the “Process Filter”

window shown in Figure 5.1 may be activated for ﬁltering what processes are depicted

 

 

 

ml ~31}: 'ProcessFil-ter ' '[9 g
Select the: .

if ODepict all processes

 

: ¢Depict only processes For which marker events occw

5; Q Depict only processes in the Following ren9e(s):

IHI

e 4-

 

 

 

 

 

 

 

 

 

 

Figure 5.1. Popup used to ﬁlter the processes

48

along the process axis. One of the selections shown in the window is an option to
specify one or more ranges of processes that should be depicted along the process
axis. For the case shown in Figure 5.1, processes 3—6 were selected as the range of
processes to be depicted, where the scenario was for an execution of the Cholesky
factorization program that used 16 worker processes. The unﬁltered version of the -
space-time diagram is shown in Figure 5.2, while the space-time diagram that has had
ﬁltering applied is shown in Figure 5.3. (The depicted time frame is the same in both
ﬁgures.) Clearly, the visualization shown in Figure 5.2 is too congested to be useful,
where, in contrast, the programmer may readily locate patterns of communication in
Figure 5.3.

An additional option shown in the popup in Figure 5.1 is one to depict only the
processes for which a programmer-deﬁned marker event occurs. This option is useful
when only a subset of processes participate in some phase of the program execution,
where a marker event has been added to record participation in the phase. As in the
above case, the number of processes along the process axis of the space-time diagram

may be signiﬁcantly reduced by using this option.

5.2 Handling a Large Degree of Process
Communication

Similar to the case above, the effectiveness of space-time diagrams is affected when a
program incurs a large degree of process communication. In this case, the detection
of patterns and inconsistencies is complicated since the space-time diagram becomes
“cluttered” with a large quantity of activity. Filtering options are again useful for
handling this situation, where, for this case, the filters are designed to eliminate
the depiction of all but a subset of messages. In particular, a single process is

selected as the process of interest, and a visualization referred to as a “Single Point-

49

Filters Options

p

LDC)‘IO')U'I#WMHO

HHHHHHH
mmA-NNHO

T I H E (nsec)

For Herker Events:
I Detueen cells to cholFo and Forsub

I Between cells to Forsd: and becksub

 

Figure 5.2. Space-time diagram before process-ﬁltering has been applied

P

3
a;

e? -:- V' '

:—T‘I'.""’."' 3350 '-:v.'-'.-'.'

MWN-t I? :"k 0“ '25-”:

A: (1%?“ A‘.‘..5.V.ﬁ’PY/Xs . '

.‘i ~.¢‘-a~'o.- calm/xx

$6.99 '.'-".’."-Z¢“<¢'a.'.‘—,\'Y.¢r'-

   
 

50

 

 

 

 

Filters Options

 

 

 

.t

 

 

 

 

 

 

 

 

 

_, Yuk? _.-/-. 3.1 1.3g, r: .-\ ,\;1)':‘r.’~ ' ,
9.111,; 3(1),“);.;.~.;j~:-;.;._-‘n-.~’1v~ -.;,/x‘.//),’% ‘“

43319 63859

T I H E (nsec)

Legend For Marker Events:

 

,i .5
I
.wWWWWJNMY-mﬁ-m twee“.

~. .a. \vsaecosﬁnevz‘

"vi-“- KIM -

 

n-J'ﬁfm'lrff ,‘f .‘h ”.7“ 56!. ‘2' c - '.' 1"{11 .-."~‘.

I Between cells to cholFo end Forstb

.Betueencellstoi-‘orsibmdbeckstb

 

{Aime arm‘s P:<\Kt\'h":-'_r'-‘.'

‘u.

 

 

I Disniss I I Help I

 

 

Figure 5.3. Space-time diagram after process-ﬁltering has been applied

avaxexexnkoramNt-é-m vkvxi-r uvmm awn“. w .-

r>.¢MX¢N£-Fua 'JN

V's v--.<~w‘ 5s

AV“ A I";

565. MﬂWMWWWMEWA’KvAW“ nun:- M We; Arc-x;

51

of-View (SPV) communication graph” is activated to depict only the messages that
are either received at or sent from that particular process. An example of this type
of visualization is shown in Figure 5.4, where process 10 has been selected as the
process of interest. (The event data for Figure 5.4 is the same as that used to render
the visualizations shown in Figures 5.2 and 5.3, where approximately the same time

frame is depicted.)

 

 

 

yam . 3' , . .'.‘ .'. '-'.‘-‘\.'<‘ , ._ J. .‘-'. '._._ 3. _._ $.39", '.' _.

" m )h' AVEW‘E

 

 

 

‘k'owﬁ‘M‘QWh‘CMav/KWU '33:: Name's. ._ ~ _-." . 473' .f-o- '.-. 3!)‘-I‘-‘".'<>'.' .-". .‘."I _ ~ :-:-‘-'J ‘6 rm»:

 

cam

XWi-IV 313')

”3“."

 

i-N'l.

 

 

 

 

a

 

 

I Biseiss I I Help I

 

 

{106w} VAWWMWWWmmmmmmwmm»nwam%xe MMWW’

{KBKOM W.$MWWJN~ ukW\’£W€Q”>xVA-‘

 

E

Figure 5.4. Example of a single point-of-view communication graph

52

One of the features of an SPV graph is the simple approach for activating it. An
SPV graph may initially be activated by selecting the appropriate process axis in a
space—time diagram with the middle mouse button. In order to change the process
of interest, the applicable process axis may be selected in the SPV graph, and the
SPV graph will then be updated accordingly. This visualization-based approach for
ﬁltering messages provides the programmer with a simple, efﬁcient mechanism for
examining process communication when a large degree of communication occurs.

Additional options are offered for changing the scale of either the process or time
axes. These options are useful for either the case of a large number of processes or
a large degree of process communication. The programmer may use these options
to “zoom in” towards a particular region of a space-time diagram. For example,
Figure 5.5 illustrates a case in which the time axis scale for the space-time diagram
shown in Figure 5.3 was modiﬁed. By changing the time axis scale, the message-
passing events immediately before the backward substitution phase could be examined

more closely.

 

53

 

    
 

I

m

   

A

v

 

Whm>lom<mmmW-EWWWWW"- 23v. WWW 10H! m
U) m
0|

'0'

Filters Options

Legend f‘or Marker Events:

   

 

 

 

 

 

 

 

 

 

 

 

T I H E {nsec)

 

. I Between cells to cholFo end Forsib

 

 

 

 

 

 

 

Figure 5.5. Modifying the time axis scale to “zoom in”

CHAPTER 6

Visualization-based Examination

of Program States

Distinct types of visualizations are used to represent different characteristics of a
parallel program execution. For example, space-time diagrams are designed to depict
the communication between processes across time, whereas other visualizations are
targeted towards representing the execution in terms of the source code structure
[45, 46]. These different types of visualizations each have individual advantages,
and so may be applied for different purposes during debugging. In this chapter, we
present guidelines for coordinating the use of the different visualizations for examining
program states [9, 11, 25]. We ﬁrst introduce the different types of visualizations
supported by GOLD, and we then discuss how they are applied in a top-down,

integrated manner.

6.1 Types of Visualizations Supported by GOLD

The initial execution of a program using GOLD is dedicated to the collection of
replay data, but the programmer may then run the program any additional number

of times in order to examine particular states. The different visualizations used for

54

55

examining program states may be categorized into four types, described as follows.
First, communication graphs are used to examine communication patterns during a
program execution. Both space—time diagrams and SPV graphs are supported for
depicting message-passing events as they occur. In the case of space-time diagrams,
any programmer-deﬁned marker events are also depicted.

Second, a call graph is used to depict the current location in the execution of a
particular process. The function hierarchy (i.e., a tree structure) is used to graphically
represent the location, where the path from the root node to the node representing
the current function is highlighted. An example of a call graph is shown in Figure 6.1,

where the current location in the execution is within the function forsub.

 

 

 

 

 

 

 

 

 

 

 

 

1

 

 

 

 

 

I Dismiss I I Help I

 

 

 

 

Figure 6.1. Example of a call graph

56

A third type of visualization supported by GOLD is a source code listing. This
type of visualization is also used to depict the current location in the execution of a
particular process. In this case, however, the location is displayed via a listing of the
relevant source code, where the current location is highlighted. Although source code
listings have a textual format, as opposed to a graphical format, they are handled
as a visualization since their use is consistent with the use of the graphically-based
visualizations. An example of a source code listing is illustrated in Figure 6.2, where

the current location within the function forsub is shown.

 

 

 

 

 

)
pvn_ukaloet( tt2, . 1 ):
t+=t2:

}

}
else {
pvn_initsend( PvnDataDeFault ):
pvn_kaloat( at, 1. 1 ):
nsgtype = k:
pvn_send(tids[nep[k]], nsgtype):
}

bl,” -= t :
bIJ] /= icoIIJ] ,

J++ 0

I

O

i becksub ( n, nrows, nap, nyrows, row, b ,ne, nprocs)
: int me, n, nrows, leap, Inyrows :
Float "row, Ilb :

int nprocs:

 

 

 

 

 

 

 

Source Code File: mime LExenine Selected Variable I

 

 

 

 

 

I Dismiss I l H°1P I

Figure 6.2. Example of a source code listing

57

Finally, data visualizations are used to inspect the contents of data structures.
They are especially useful for the case in which the other types of visualizations
have been used to narrow the source of a suspected error to the operations on a
particular data structure. The values of one- and two-dimensional arrays are depicted
graphically, while the values of other types of data structures are printed textually.
For the latter case, the contents of data structures located in the same process are
displayed in a common window. For example, Figure 6.3 shows a case in which the

values of several of the variables from the function forsub are displayed.

 

 

 

 

Process Index: 3

 

ne = 2
t = -o.ooez7515225
n = 24

 

 

 

 

 

 

 

 

 

 

Figure 6.3. Textual display of data structures

For the depiction of one— or two-dimensional arrays, each array is displayed in a
separate window. If the selected array has been declared with static dimension(s)
in the current function, then the size of that array is accessible via local debugger

commands. However, if the array is dynamically allocated, or passed as a pointer to

58

the current function, the size is not readily accessible, and so a separate interface is
activated to obtain additional information from the programmer. For example, in the
case of the source code listing in Figure 6.2, the variable map was passed as a pointer
to the function f orsub. Upon selecting that variable for examination, the “Additional
Variable Information” interface shown in Figure 6.4 was activated. This interface lets
the programmer indicate whether the selected variable (i.e., a pointer) points to a
single element, a one-dimensional array, a two—dimensional array, or to some other
data structure. If the selection for a one-dimensional array or a two-dimensional array
is chosen, then additional size information must also be entered in the appropriate
textﬁeld. In the case shown in Figure 6.4, the selection for a one-dimensional array
has been chosen for the variable map, and the size has been set equal to the variable
11. (The variable n is also passed as a variable to the function forsub.) By entering
this size information in the appropriate textﬁeld and selecting the Apply button, the
LOM has enough information to query the local debugger for the values of the array
elements.

The one-dimensional array visualization corresponding to the variable map is
shown in Figure 6.5. In this case, the array values are plotted as a bar chart, where
the length of a bar represents its relative magnitude. The precise values of particular
elements may be examined by selecting the corresponding bar (thus highlighting
the bar), such that a key beneﬁt of the array visualization is that it provides a
simple format for probing the values of the array elements. In the case shown in
Figure 6.5, the value of element 21 has just been probed, where the bar for element
21 is highlighted and its value is displayed.

An example of a two-dimensional array is shown in Figure 6.6. In this case,
the array values for the variable output are plotted in a two-dimensional grid,
where different color shadings represent different ranges of values. (In the particular

case shown in Figure 6.6, all the array values fall in either the top or the bottom

59

 

:1 - ' i

mI mg): Additional Variable InFornation . [a In
Process Index: 3

 

 

 

Not enough inFornation was available to Fully determine the type

oF variable "nap . Please choose the wmriate selection
that corresponds to "nap“:

0 Single element
9 [he-dimeional array

0 Two-dimensional array
0 Other

Enter umber oF elements in First dimension (nay be an
expression that includes local integer variables):

 

 

first-er want-er a? ailments In seam likewise: {may he or:
ewe-r seal-.34: that iralissiea Ian's}. image! as: wales} ;

 

 

 

 

 

 

 

 

- 99919 I Cancel .

 

 

 

 

 

 

Figure 6.4. Interface for obtaining additional variable information

range of the four possibilities listed in the legend.) Similar to the case of a one-
dimensional array, the precise value of a particular element may be examined by
selecting the corresponding rectangle, where the rectangle is then highlighted and the
corresponding value is displayed.

One of the features of the array visualizations is an option for ﬁltering the values
in order to highlight particular instances of data. Several ﬁltering possibilities are
provided, including choices for ﬁltering values that exceed some maximum value,

that fall below some minimum value, that are equal to some critical value, or that

60

 

 

 

 

 

 

 

 

 

 

 

 

cl €6.11; ﬂue-Dimional My Visualization lalaII
Process Index: 3
Saree Code File: cholnode.c Vridale: up I Filter Date I I Save Visualization I
23
l
V i
R i
L L
u I
E l
S i
o _ l
0
MERRY INDICES 23
Value 0F nap[21] = 21.000000
I “am” I ”'MHZF' -

 

 

 

Figure 6.5. Graphical depiction of a one—dimensional array

are within some critical range. The interface for ﬁltering data values is shown in
Figure 6.7, where a ﬁlter has been selected for highlighting negative data values in
the two-dimensional array visualization that was shown in Figure 6.6. The ﬁltered
version of the two—dimensional array visualization is shown in Figure 6.8.

An additional feature of the array visualizations is an option for saving the current
contents of a visualization. This option is useful for examining changes in an array as
the program executes. The saved version of the array visualization may be compared
to an array visualization that is activated at a later point in the program execution.
For example, Figure 6.9 illustrates a case in which the array visualization shown in
Figure 6.8 was saved, and then compared to an array visualization for the variable

output at a later point in the program execution.

61

 

 

43L EOLD: Tau "' ‘ ‘ May Visualization l“I-_J1I
Process Index: 2

SareetodeFile: cholnode.c Variable: wtput IFilter Data | ISaveVisualizationI

 

 

 

 

 

 

 

 

 

 

 

 

 

LEGEND:
u
I Value >= 1.06
I [:1 1.06 > Value >= 0.71
N
n [:I 0.71 > Value >= 0.35
I I 0.35 > Value
c
E
s
COLUMN INDICES
Value oF wtputiZlIiil = —0.002337
o. ~

 

 

Figure 6.6. Graphical depiction of a two-dimensional array

The depiction of the space-time diagram used for state examination is handled by
the COM. The LOMs notify the COM whenever a message-passing or marker event
occurs, such that the space-time diagram may be updated. In contrast, the LOMs
handle the rendering of the other three types of visualizations. The information that
is contained in call graphs, source code listings, and data visualizations is conﬁned to
a single process, and therefore communication with the COM is unnecessary.

The communication graphs are continuously updated during program execution.
When the program stops at a causal distributed breakpoint, scroll bars along the
time axes of the space-time diagram and the SPV graph may be used to examine the

communication events throughout the execution history. As such, the COM stores an

62

 

 

 

 

 

 

 

 

 

L—‘I GOLD: Filter Prra Data 9 1.! I
Process Index: 2
Please select the appropriate Filter to apply, where only
values matching the selected Filter should be shown:

0 None
C) Value >
0 Value =
Q value < ”M... .WMWH V.W.,........l
ORange: < Value < i

_ in 04.8321 I I

 

 

 

 

 

 

 

 

 

Soiree Code File: cholnode.c Vriable: output 5 Filter Data I iSave Visualization I

H62

mmn

 

I Hatch

.Noﬂatch

 

 

 

 

 

 

 

COLUMNINDICES
FilteringsuccessFul.

 

 

 

 

l Dis-is: M

      

 

Figure 6.8. Filtered version of a two—dimensional array

9.11:7...“ ‘Maw ' ' lJleI

 

63

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

   
 

      

  

 

 

 

 

  

 

 

   

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

. _ as: a _ .25.. a .7, , ..
8083. u "8:835 ea 83, ll!” _ 383 w
mug—ax“ zzzqou
9480.? " ”8:89:33 we 33>
m w 0 ~ 9 z u z x = .— o u
m em
a i a
u w. u
~ _-_—__-a<...\ u
.312. : em; _
as... a
7133323; 2i H ”sash, senses 8588858 c
g3 8328: 0.8835 ”3&3;
no
ml.

 

 

 

 

 

Figure 6.9. Examination of changing values for a two-dimensional array

64

entry for each communication event that occurs in order to enable the programmer to
browse through the entire collection at any point during the execution of the program.
In contrast, call graphs, source code listings, and data visualizations are only updated
when the execution reaches a breakpoint (either a causal distributed breakpoint or a
local breakpoint). These visualizations reﬂect the current state of the program, and
so do not require the storage of execution data across time, yielding a savings in data
storage requirements. This feature is a side-effect of using an on-line approach to
visualization as opposed to a post-mortem approach. In the case of a post-mortem
strategy, any data that is potentially of interest must be stored during the program

execution, such that it may later be visualized after execution is complete.

6.2 Top-down Coordination of Visualizations

The different types of visualizations are applied in a top-down manner for examining
a program state. That is, the communication graphs and status grid are used as
the top-level visualizations, where the application of the other types of visualizations
is based on the selection of process axes or events from a communication graph.
Visualizations may be applied according to the hierarchy shown in Figure 6.10. Upon
selection of a process axis in a communication graph, the programmer is given the
options listed in the “Process Selection Window” shown in Figure 6.11, where the
program is currently stopped at the causal distributed breakpoint that was illustrated
in Figure 4.5. In particular, options are provided for activating an SPV graph, an
LOI, an LOI and a call graph, or an LOI and a source code listing for the selected
process. The advantage of activating the LOI at the same time as a call graph or a
source code listing is that the relevant process may be executed independently of the
other processes for examining a particular sequence of states. Alternatively, a call

graph or source code listing may be activated as an option from the LOI.

65

 

 

 

 

 

3
3
é? 5‘ 4g
3
5

Call
Graph

 

 

 

 

 

 

Communication Graphs
Space-time Diagram SPV Graph .
Causal Distributed I
Breakpoints interface
Local rations
neg...

 

 

 

Figure 6.10. Visualization hierarchy for examining a breakpoint state

66

 

%

 

33333358 838.

828 as ozone 3 £8 Sosa I

 

noon-v m x a ._.

venom are. so use...

 

 

 

 

 

 

EE

 

 

 

 

 

889: Lo... 9.5»: 38 8.58 v5 :20

 

saga 8a £3 :3 2. :20
882a be and serous essence 38.5
£5 838a; 51928 2850

 

 

 

35):”!

 

 

 

 

 

 

 

 

Figure 6.11. Options for examining a breakpoint state from a space-time diagram

 

67

The “Process Selection Window” may also be activated by selecting the
appropriate process from the status grid. This option may be useful, for example, if
the status grid indicates a process has died. The relevant LOI, call graph, or source
code listing may be activated for determining the point at which the particular process
died.

An additional option is provided when an endpoint of a message-passing event
is selected from a communication graph, where the causal distributed breakpoint
interface will then be activated. In that case, the causal distributed breakpoint that
has the selected event as its triggering breakpoint is illustrated. If the programmer
decides to insert the illustrated breakpoint, then the breakpoint will not have any
effect on the current execution (since the relevant point in the execution has already
passed), but will apply for future re—executions.

The hierarchy shown in Figure 6.10 indicates that additional levels of visualizations
may be activated based on the selection of items in call graphs and source code listings.
If a function node is selected in a call graph, then a source code listing corresponding
to that function is displayed. If a variable name is selected from the source code
listing, then an appropriate data visualization is depicted.

The hierarchy for applying visualizations shown in Figure 6.10 is based on the
scope of the different visualizations. As the scope of the visualizations narrows,
the level of detail revealed by them increases. For example, space-time diagrams
depict global program activity (i.e., a broad scope), but they reveal only a low level
of detail regarding the execution of each process. In contrast, a data visualization
depicts detailed information regarding the contents of a data structure for a particular
process, but its scope is consequently limited to that data structure.

Therefore, we coordinate the use of numerous types of visualizations in order
to obtain a top-down approach for the examination of a program state, where the

examination initially proceeds from a broad scope, but where the scope narrows as

68

the potential source of an error is investigated. Speciﬁcally, a communication graph
is initially used to depict the overall message-passing activity in the system, where
increasingly greater detail is revealed by examining lower-level visualizations, such as
call graphs, source code listings, and data visualizations, all of which can be activated
at the local process level. By using this approach, the programmer may systematically

converge towards the source of an error.

CHAPTER 7

The Modeling of Expected
Behavior in Program

Visualizations

This chapter discusses a method for using the LOTOS speciﬁcation of a parallel
program to model expected behavior in visualizations of the program’s execution
[8, 26]. For the case of visualizing communication between processing elements}
each LOTOS process represents a phase in the computation, where we overlap a
graphical representation of the phases onto a space-time diagram. For the case
in which a program is stopped at a breakpoint, we have developed techniques for
visualizing the current location in terms of the formal speciﬁcation, either via a
visualization of the LOTOS process hierarchy or via a speciﬁcation listing. This
overall methodology facilitates the task of debugging a parallel program by supporting
graphical comparisons between the actual behavior of the program and its expected

behavior. Prior to the development of GOLD, we developed a prototype debugging

environment, PANORAMA, to implement the expected behavior methodology. An

 

1Throughout this chapter, “processing element” refers to an operating system process, while
“process” refers to a LOTOS process.

69

70

overview of PANORAMA is presented in Section 7.1. Section 7.2 discusses how
PANORAMA supports the collection of data, and Section 7.3 discusses how it supports
the modeling of expected behavior in program visualizations. Section 7.4 considers
a practical example where we demonstrate how PAN DRAMA can be used to detect
message-passing errors. Section 7.5 discusses the integration of the expected behavior
modeling techniques into the GOLD debugging environment, including improvements

and extensions to the original methodology.

7 .1 Overview of Panorama

PANORAMA is a tool for visualizing the execution of a message-passing parallel
program within a model of the program’s expected behavior. There are two main
steps required to visualize program execution using PAN ORAMA: data collection and
graphical depiction. Below, we present an overview of how these two steps may be
used to generate program visualizations, while the details regarding each of the steps
are discussed in turn in the subsequent subsections.

The general execution ﬂow for PANORAMA is shown in Figure 7.1. PANORAMA
incorporates a post-mortem visualization strategy, where trace data of important
events is collected during program execution, while the graphical depiction of the
data takes place off-line after execution is complete. Post-mortem approaches such
as this one have been widely applied by visualization tools [24, 47, 48, 49, 50].

The ﬁrst step towards visualizing program execution is to collect the appropriate
data for rendering the visualizations. Both the LOTOS speciﬁcation and the
source code of the program are required as input for accomplishing this task. The
speciﬁcation is analyzed during the data collection step to derive a model of the
expected behavior for the program, where this model is recorded in the form of

expected behavior data. This expected behavior model is not only used in the

71

 

   
    

 

  

 

 

 

 

 

 

 

 

LOTOS Expected
S eci cation Behavior Data ,
p ﬂ Data > (3139111931 Visualizations
Collection I Depiction >
Step F Step
Source Trace Data
Code (Based on the Expected
Behavioral Model)

 

 

 

Figure 7.1. General execution ﬂow of PANORAMA

graphical depiction step, but it also guides the instrumentation of relevant events
in the source code. The instrumented version of the source code is then compiled
and executed, where the instrumentation generates trace data during the program
execution.

Both the expected behavior data and the trace data generated by the data
collection step are used as input to the graphical depiction step. The graphical
depiction step uses the two types of data to render a visualization of the program
execution that models the expected behavior of the program. Specifically, a mapping
is maintained between the expected behavior model and the trace data, thus providing
a means for visualizing the trace data in terms of high-level events from the LOTOS
speciﬁcation. Such a visualization strategy is useful for debugging communication

errors, which we illustrate by example in Section 7.4.

7 .2 Data Collection Step

The collection of data is the ﬁrst step towards visualizing program execution using
PANORAMA. In this subsection, we discuss the collection of the two types of data
used by PANORAMA’S graphical depiction step: the expected behavior data and the

trace data. In particular, we discuss how these two types of data are generated using

72

the LOTOS speciﬁcation and source code of the program as input.

7.2.1 Collection of Expected Behavior Data

A LOTOS speciﬁcation is an abstraction of a program, from which a model of the
program’s expected behavior may be derived. In order to facilitate the task of
debugging message-passing errors, PANORAMA’s expected behavior model focuses
on capturing the occurrences of the message-passing events in a program. By
concentrating on the aspects of a speciﬁcation related to message-passing events, the
expected behavior model forms an abbreviated representation of the speciﬁcation,
while retaining a level of abstraction in the speciﬁcation.

PANORAMA generates the expected behavior model by calculating the hierarchy
of processes within the LOTOS speciﬁcation. The hierarchy forms a tree structure,
where the name of the speciﬁcation is at the root of the tree. As each process is
added to the tree, the message-passing actions that occur in the behavior expression
of that process are recorded with it. For example, Figure 7.2 shows the subtree
of processes for the speciﬁcation of the host node in the number-doubling program
from Section 2.1.3, where the message-passing actions that occur in the speciﬁcation
are listed with their corresponding processes. After ﬁnishing the calculation of the
expected behavior model (i.e., the tree structure), PANORAMA stores the model,
where the stored version is known as expected behavior data.

Recording the message-passing actions according to where they appear in the
process hierarchy structure enables message-passing actions to be categorized. This
approach effectively treats the message-passing actions that occur in separate process
deﬁnitions as being of different types. For example, in the speciﬁcation of the number-
doubling program that was shown in Figure 2.6, PANORAMA categorizes the action
to send a message located in the Send_Numbers process differently from the action

to send a message located in the Worker_Node process, where the names of the

73

process deﬁnitions are used to identify these different categories of message-sending
actions. Later, we will show how this approach allows us to visualize the different
types of message-passing events that occur during a program execution in terms of
their corresponding locations in the expected behavior model.

The strategy for generating the expected behavior model is based primarily on
the nesting of process deﬁnitions, where this nesting structure is provided by default
in a LOTOS speciﬁcation. One aspect of LOTOS speciﬁcations that PANORAMA
considers during the generation of the model is that there are no reserved names for
the gates used in message-passing actions. Because LOTOS does not have reserved
keywords for this case, the programmer is allowed to explicitly declare gate names,
or, by default, send and recv will be used. Given the names of the gates, PANORAMA
is then able to identify the message-passing actions in the behavior expression of each

process deﬁnition.

7.2.2 Collection of Trace Data

In order to visualize a program’s execution, instrumentation must be used to identify

relevant events for the program to record when it is executed. The trace data

 

 

I Host_Node

 

 

 

 

l l

Send_Numbers Receive_Replies
send !0 !NodeCtr lNodeCtr recv ?Sender:Nat !O ?Reply:Nat

 

 

 

 

 

 

 

 

Figure 7.2. Subtree of processes for the speciﬁcation of the host node in the number-
doubling program

74

generated via instrumentation may later be graphically depicted in the form of
visualizations. In the following discussion, we explain how PANORAMA supports
each of the three stages of trace data collection: instrumentation, program execution

(during which trace data is generated), and post-processing.

Instrumentation

PAN DRAMA uses a software instrumentation approach for the collection of trace data,
where appropriate statements are added to the source code in order to generate the
relevant trace data during program execution. The method for adding these data
collection statements is based directly on the expected behavior model. Speciﬁcally,
the method allows the programmer to map items from the expected behavior model
to their corresponding instrumentation points in the source code via a graphical
interface. The beneﬁt of this approach is that the generated trace data may be
graphically depicted in terms of the expected behavior model, which will be discussed
in detail in Section 7.3. The programmer begins the mapping procedure by selecting
an entry from a nested listing of the processes, where the selection of the particular
process is based on the programmer’s decision to instrument the source code events
corresponding to that process. The programmer is then requested by PAN ORAMA to

indicate instrumentation points in the source code for each of the following events:

1. Entry point of the process: The location in the source code at which point the
implementation of the process’ behavior expression begins

2. Exit point of the process: The location in the source code at which point the
implementation of the process’ behavior expression ends

3. Message-sending actions for that process: The message-sending statements in
the source code that correspond to each of the message-sending actions in the
behavior expression of the process

4. Message-receiving actions for that process: The message-receiving statements
in the source code that correspond to each of the message—receiving actions in
the behavior expression of the process

75

Given that a LOTOS speciﬁcation is an abstract representation of the
implementation, there may not be a direct correspondence between the speciﬁcation
and the source code. Therefore, for each of the events listed above, the programmer
may indicate an approximate location in the source code if an exact location is
not available. For example, broadcasts are often implemented within a parallel
programming environment as a single procedure call, but LOTOS requires each
message that is sent to be speciﬁed as a separate action. In this case, PANORAMA
provides the programmer with a special option for indicating a many-to-one mapping.

After selecting instrumentation points for each of these events, the programmer
may then start the mapping procedure again for a new process from the expected
behavior model. An example of this mapping procedure is shown in Figure 7.3, where
the Receive-Replies process from the number-doubling program is being handled. The
upper window lists the events to be instrumented for the Receive_Replies process,
while the lower window displays the source code for the node program of the number-
doubling program. The button labeled “Instrument Program” is used to perform a
speciﬁc event instrumentation after items have been selected from both the upper
and lower windows. The button labeled “Select new process” allows a new process
to be chosen for instrumentation from a nested listing of the processes.

After the programmer has performed the mapping procedure for each process
of interest, PANORAMA handles the underlying details of adding the data
collection statements to the source code. Currently, PANORAMA provides software
instrumentation support for programs that use PVM (Parallel Virtual Machine)
version 2.4 [36] message-passing primitives. Both C and Fortran coding options are
handled by PANORAMA.

A signiﬁcant advantage to PANORAMA’S support of a software instrumentation
approach is that the data collection statements are capable of generating auxiliary

information as part of a traced event, where the auxiliary information is used to map

O O l U I O I ‘ D

: “H v .. -' i 1 a
. aaaaaaaaa
-:-:-:- :- i-rom-M-i-t-M-z - - .

0 II. .O'III'OOI I 0.10 .I .I

a o'u 9 ° a o'a'n‘ﬂv'a‘r'a'a‘ 'n' a‘e'a'e'o “I .50.

, a“ . .,.,._._._._.,.,-..,.,._. ._.‘ . . _ . .
l'. I C .I.I'I.l.w.o'-.w.n'1.l'p.a.- U'I‘I.h'h.~.l'l O U . I I O O V l I
'0‘...‘ Q'O‘I-Q.O.IIQ.‘.O‘IIO.I.I C lllllll

 

 

Events to instrunent:

 

Receivejeplies

: Entry Point
Receive_Replies : Exit Point
Receive_Replies : Receive action: recv ?Sender:Nat IO ?Reply:Nat i

i
i
I
i
i
i
l
l
i
i
i

 

 

a.- wooe- Immauaa so. use

Select an event Froa the above list and then indicate
the corresponding instruaentation point in the source code:

 

n

/a
Node program.

coo-emu a...mom.a...no.mun-cacao."Doomoneounmwomu-euaa-umwmouuwanew”...-amoooomamennmmomemou-mno—lmnnmanonueuu on a so
a
.-

 

 

ea
00
a.
ION”... a...accoun-enema-mooeenIIo-ounaum

Figure 7.3. Example of the mapping procedure

the event to its corresponding location in the source code and formal speciﬁcation.
However, the disadvantage of software instrumentation is that it shares the processor
with the application program, and therefore may produce interference in terms of
both execution time and the ordering of message—passing events between processors.

Less-intrusive forms of program instrumentation are possible, such as hardware
instrumentation where specialized components are used to collect trace data. The
disadvantage of using hardware instrumentation is the difﬁculty of establishing a
mapping between the low-level form of the collected data and the high-level form
of the source code and program speciﬁcation. Our visualization strategy relies
on the existence of a mapping between the collected trace data and the expected

behavior model, and thus a software instrumentation approach has been chosen for

77

PANORAMA. Although this approach may potentially alter the execution behavior
of the program, PANORAMA attempts to avoid this possibility by allowing the
programmer to selectively add instrumentation for only the processes in the expected

behavior model that are of interest.

Program Execution

The instrumented version of the source code may be compiled and executed, where
each processing element produces a ﬁle of trace data during the program execution.
The merging of these trace ﬁles is performed by PANORAMA in the post-processing

stage.

Post-processing of the Trace Data

The post-processing stage performs a time-ordered merge of the trace ﬁles that were
generated by the processing elements during the program execution stage. Since
distributed-memory systems generally lack a synchronized global clock, additional
analysis is then performed to adjust the ordering to be consistent with the happened-
before relation. In the PAN ORAMA framework, it is guaranteed that no message-

passing events are listed in the time-ordering as being received before they are sent.

7 .3 Graphical Depiction Step

The graphical depiction step of PANORAMA uses both the expected behavior data
and the trace data to render a visualization of the program execution that models the
expected behavior of the program. In this subsection, we discuss the visualization
strategy used by PAN ORAMA, including a description of the major features available
for generating visualizations.

Space-time diagrams display communication events between processing elements

78

across time. One axis of the diagram represents the processing elements, while the
other axis represents time. PANORAMA uses an enhanced version of a space-time
diagram for graphically depicting program execution, where the enhancement is an
overlapping of the active LOTOS processes onto the diagram. (A process is considered
“active” between its recorded entry and exit times.) Since both message—passing
events and process entry / exit times are depicted by the diagram, it effectively displays
each of the items from the expected behavior model across time as the corresponding
events appear in the trace data. By graphically depicting the trace data in terms
of items from the expected behavior model, the diagram facilitates a comparison
between the actual behavior and expected behavior of the program. Thus, we call
this diagram a Behavior Comparison graph, or BC—graph for short.

The BC-graph is constructed by using arcs to represent message-passing events
and shaded rectangles to represent the active processes. (The shaded portion of a
rectangle represents the interval between the entry and exit times of the corresponding
process.) Different shading patterns are used to distinguish between active processes.
The graphical depiction of the events is performed by a playback strategy, where the
programmer may choose either to sequence through the events in a step mode or have
PANORAMA provide a simulated replay. For example, a BC-graph of the execution of
the number-doubling program is illustrated in Figure 7.4. This visualization shows
that six processing elements were involved in the computation, consisting of one
host (by default, labeled “0”) and ﬁve worker nodes (labeled “1-5”). The shaded
rectangles represent the intervals of time throughout the program execution during
which the Send_Numbers, Receive_Replies, and Worker_Node processes were active.
The message-passing events that occurred during the program execution are depicted
in the model of active processes, thus facilitating a visual mapping of the message—
passing events to their corresponding location in the expected behavioral model.

PANORAMA’s visualization strategy offers several features that facilitate the

79

iii“ .Slieiiliiiitiiiihillf if 9.... 221 =
i Data Control:

 

 

 

Replay Control:

 

 

 

 

 

 

 

 

 

l \ \
0 ,1 I.

84 417

T I H E (in as)
Filtered Processes:
- SendJiunbers - Receive_Raplies
a HorkerJiode

 

 

OZHWMMGOW'U
m—izmxmr-m
M

 

 

 

Figure 7.4. BC-graph of the execution of the number-doubling program

debugging of message-passing errors. One of the key features is that the graphical
elements that represent the items from the expected behavior model may be selected
using the cursor, at which time windows are activated that display the portions of
the source code and LOTOS speciﬁcation that correspond to the selected element.
This feature provides a simple method for the programmer to investigate a potential
error that was detected in the visualization. Another feature is the ability to perform
selective ﬁltering of the events to be depicted, where the processes in the expected
behavioral model are used as the basis for the selection. PANORAMA’s default
graphical depiction strategy is to display the entire set of events recorded in the

trace data, potentially leading to congestion. In order to reduce the congestion, the

80

programmer may choose relevant processes from a listing of the hierarchy of processes,
such that the events will be displayed exclusively for the chosen processes. Filtering
options are also provided for displaying only the active processes or only the message-
passing events. Finally, a BC-graph provides an abstraction (clustering) mechanism
for displaying a subtree of active processes by their parent process, thus reducing
the congestion that may be caused by displaying the individual active processes (as
depicted by the shaded rectangles). This last feature may be used in combination with
the selective ﬁltering option to provide a top-down approach to debugging. Examples

of several of these debugging features are given in the following section.

7 .4 Debugging Example: Cholesky Factorization

Space—time diagrams are useful for locating errors in simple message-passing patterns.
However, for cases in which the message-passing patterns between processing elements
are either complex or nonexistent, the beneﬁt of space-time diagrams is limited. In
this section, we illustrate an example in which PANORAMA facilitates the task of
debugging a message-passing error, where the error is not readily detected when
viewing space-time diagrams. The application we use is the Cholesky factorization
program. The trace data for this example was obtained by running the program on a
cluster of eight identical ethernet-connected SUN SPARCstation 1 workstations. In
the course of the discussion, we provide several PANORAMA-generated visualizations
that demonstrate the debugging features of PANORAMA.

The error that we investigate is located in the communication step of the forward
substitution phase. Speciﬁcally, the processing element that is assigned the column
corresponding to the current value of the index variable must wait to receive messages
from TOTAL_ WORKERS - 1 worker node processes, where a looping construct is

used to implement the receipt of multiple messages. We consider the case in which

81

the looping construct is implemented erroneously, such that it waits for messages
from T OTAL_ WORKERS worker node processes instead of TOTAL_ WORKERS —
1. Since only TOTAL_ WORKERS — 1 worker nodes send messages, the processing
element that is assigned the column enters a deadlock state since it is waiting for a
message that will never arrive. The other worker nodes are able to proceed to the
next iteration, but each one eventually enters the deadlock state upon executing an
iteration in which it is assigned the current column. Since the processing elements
enter deadlock states, our discussion below considers partial trace ﬁles that were
generated by the worker nodes before they entered the deadlock state.

The complexity of the message-passing patterns complicates the task of locating
the error in the forward substitution phase. For example, a BC-graph of only message-
passing events at the point of deadlock is shown in Figure 7.5. (The host node,
represented by processing element “0”, does not communicate with the worker nodes
after the initialization phase, and, thus, there are no message-passing events shown
for processing element “0”.) This particular visualization perspective is equivalent to
that provided by space-time diagrams.

By using the clustering mechanism, the BC-graph may be used to display the
communication behavior in a model of the high-level stages of the program. For
example, Figure 7.6 shows the message-passing behavior in the synchronous Cholesky
factorization and forward substitution phases of the program, where phases are
distinguished by the level of shading. By using this type of visualization perspective
for initially examining the program execution, the source of the problem may be
narrowed down to the forward substitution phase.

In order to gain a better understanding of the erroneous message-passing behavior
in the forward substitution phase, ﬁltering may be used to depict key events. For
example, Figure 7.7 shows the case in which ﬁlters have been applied to depict only the

active For.Sub_Receive_Messages processes along with the relevant message-passing

 

82

Data Control:
Replay Control:

8

m—dzmzmt-rn

p
R
0
C
E
S
S
I
N
G

T I H E (in ms)
Filtered Processes:
SgnchronousJZholeskgjactorization

g Forwa‘d_5¢.bstitution

 

Figure 7.5. BC-graph of only message-passing events

events.

The visualization in Figure 7.7 shows that the processing elements are each waiting
in the For_Sub_Receive_Messages process at the point where the information in the
partial trace ﬁles is exhausted. (The depicted events are the last events recorded for
each of the processing elements, at which point progress apparently stops since none
of the processing elements completed the execution of the program.) The behavior of
processing element “1” is of particular interest, as we see that it received one message
from each of the other processing elements (i.e., the expected behavior, as deﬁned
by the For_Sub_Receive_Messages process for a worker node whose CurrentCol

E M Y_ COL_SET), yet it failed to exit the For_Sub_Receive_Messages process as

83

Data Control:
Replay Control:

8

P
R
D
C
E
S
S
I
N
G

m—(zmzrnI—rn

T I H E (in ms)
Filtered Processes:
Sgnchronous_Choleskg_Factorization

a Foruard_Substitution

 

Figure 7.6. BC-graph of both active processes and message-passing events

it should have at approximately the time 994 milliseconds. Upon locating this
questionable behavior, the programmer may use the cursor to select any of the
graphical elements that represent the process, at which time windows are displayed
showing the speciﬁcation and source code corresponding to the questionable event,
as is shown in Figure 7.8. Through a comparison of the contents of the two windows,
an inconsistency may be detected in the bound for the number of messages to be
received, where the speciﬁcation states a bound equal to TOTAL_WORKERS — 1,
but the source code states a bound equal to TOTAL_ WORKERS. This inconsistency

explains the cause of the deadlock problem.

84

Data Control:
Replag Control:

8

m—Izmzmr-rn

p
R
0
C
E
S
S
l
N
G

T I H E (in ms)
Filtered Processes:
a For_Stb_Receive_Hessa9ec

 

Figure 7.7. BC-graph of the ﬁltered events

7.5 Integration with GOLD

GOLD is designed as on-line debugger, in contrast to PANORAMA, which is a
post-mortem debugger. As such, several features were added to the expected
behavior modeling approach when it was integrated into GOLD, where these features
speciﬁcally take advantage of the on-line debugging framework. In addition, some
improvements to the interfaces were made, in particular for the mapping tool. These

improvements and extensions are discussed below.

85

 

 

 

«Nance
«atwudvaoocg
2va

“3:95

V ?L “aghevvw “on: .8...

v no: N 3:31

 

 

3533336 Eu: 8.1x!
Announcuazuﬂaubug 8.36.8
A-SS-oﬂguguLOWV 08%.!
A§n§.8o¢-9mnb.tvoo§
9:8 A: SangudSC n .3335:
A

up; o bivn§.§38amﬁzugasiﬂaubu
“Eng :1 C: «in: >09.

c
A- "Augie? w biog
iubgvn§\§u§ 2:3 3 bk 088k

 

ewwjnljjﬂ]

3:ch2_»

inguggubu I
“gm “.832...

J

 

|.u_l|.u::|uzr—¢n

 

ill

 

 

(LIZDUMJ‘DWHZU

 

 

 

 

7|s§ulJ_|§ﬂ|l=lllJ

 

Figure 7.8. Windows for speciﬁcation and source code corresponding to questionable

process

86

7.5.1 An Improved Mapping Tool

Several improvements were made for the expected behavior mapping tool when it
was integrated into the GOLD debugging environment. For example, the PAN ORAMA
version of the mapping tool used a nested listing to show the different levels in the
hierarchy of a LOTOS speciﬁcation. The nested listing has been replaced with a call
graph structure, that graphically illustrates the hierarchical structure. For example,
the interface for the improved mapping tool is shown in Figure 7.9, where the top
left window depicts the hierarchy of processes from the LOTOS speciﬁcation of the
Cholesky factorization program. The structure of the source code is also illustrated
via a call graph, as shown in the top right window. By selecting a node from the
speciﬁcation call graph, the events that may be instrumented for the corresponding
process are shown in the lower left window. By selecting a node from the program call
graph, the corresponding location in the source code listing is displayed in the lower
right window. An event is instrumented by making a selection from both the lower
left and lower right windows, and then selecting the Instrument Program button.
An additional improvement to the mapping tool is that it now assists the
programmer in ﬁnding the appropriate mapping between an expected behavior event
and the corresponding source code. In particular, when the programmer selects a
node from the speciﬁcation call graph, the mapping tool searches for a matching
node in the program call graph, and will update the “Program Call Graph” and
“Source Code” displays accordingly if it ﬁnds a match. If no match is found, the

“Program Call Graph” and “Source Code” displays are left unchanged.

87

   

            
               

: a a .3 33:?
2 23833.2 c.8523}.

A
A
“Nvubv
: H A .3. assassin
“a: .d . 235! J. cam-.2
. D. n 9.5.53
v Aznagcvnﬁc 8.
I'll!
v .8 u 352; t
W Ezsfiéa: . :3 2 :11319 n c c 8a

u o n a 03853:. £212: out"; >02 "Em E389.

 

 
     
              

3-8.83-3.3 new $8?

wAzxucvxuung—vgou

 

i. .lx-lltlllllvli:.tilllL

s Jim-ﬁaW _a84-££_-ea-EL _ 83.5%-; W
M u

Wanna-Esm— TﬁBSﬂa-Em Tenses-3.2.5
W _ l.

W J‘s-E
W

 

 

 

 

 

 

 

Figure 7.9. The interface for the improved mapping tool

88

7.5.2 A BC-graph Interface for Causal Distributed

Breakpoints

The trace data generated during the initial program run is used to support program
replay. For programs that have been instrumented using the mapping tool, additional
trace data is generated by each instrumented event. This additional trace data
is used to render a BC-graph interface for setting causal distributed breakpoints.
An example of this interface is shown in Figure 7.10 for the case of the Cholesky
factorization program. The breakpoint that was originally illustrated in Figure 4.3
has now been inserted using the BC-graph interface, where different color shadings
distinguish between LOTOS processes. The advantage of using a BC-graph interface
for setting causal distributed breakpoints is that the modeling of expected behavior
can assist the programmer in determining the appropriate breakpoint location.

The BC-graph supported by PANORAMA offers ﬁltering and clustering options for
controlling the display of events. The BC-graph interfaces supported by GOLD also
offer ﬁltering and clustering options, where the interface for this feature is shown
in Figure 7.11. (The operation of the interface differs depending on whether it was
activated via the Filter or Cluster menu options.) In the case shown in Figure 7.11,
the highlighted processes represent those for which events have been ﬁltered. In
addition, the subtree of processes rooted at the Synchronous_Cholesky node have
been clustered, such that events that occur for any of the processes in the subtree

will be displayed as occurring for the Synchronous_ Cholesky process.

7.5.3 Expected Behavior Visualizations

Several visualizations were integrated into the GOLD debugging environment for
modeling expected behavior. The BC-graph was one of these, where it is now used to

depict communication activity as the events occur, rather than from a post-mortem

89

 

‘ . . ‘-_.'.‘.'.‘.‘.'. ,r.._..,... q.» n. --..-_-. -_ , . .................. . . \ltl\U-("--. . .‘e.' . . 4 .
. . ,-_- . .v . . .. . .. . . , . H ,. .. 5.‘ .‘. .... .' -.. 3...: H‘.,-._ . _“ , . . , . . _-,',-.'_-_'-_ ' .,.. .....-...._. . ...,... . .‘.
' '- .. ' . ' . ‘ . . . ’ . .. i .a'h_:.'-“.‘ ' ,x' ‘ '.'. 'u‘ . I - ‘
‘ #33 III 1‘ ' n-mam I . 0 t J- . m. 4. . .- -’ " 1 o r I -
‘ 4 "'l-i‘ “-' '. ' -' ‘-‘. -' - . K - - ‘-'.~'- ' ‘l ‘ ' A A .‘ -- '- . -‘ - v- . t .d

Filter Cluster

‘ ’1'-'.'-'.'~‘.' '.‘.‘.' ..‘. ...........

 

a:

 

 

A

: ‘WAhmf-me umw.+zam-:W

 
   
 
  
    
  
 
 

 

 

R E 1

o L 2

C E 3

is - ‘ main-‘II_
s E 5 -II\IIEII*
S - 6 .Imiluini—
- - 7 -I-I-I-I‘\

- S a —-_--.

. A-w' yeww- I,- 1'. '13 I to 4 . {'“Y’RZI'”. (\‘x(-\_\:4\:1;‘.AA 2
" U. "y; 5‘" .t‘ /< r5 S53$§Z§ﬂ5§jﬁz\f-yﬁgf: {cg-:1: :x 9‘
' him-1M zMéascu‘mm-rq ,5:.;.;.‘.‘._z.,;., ‘

- For LOTUS processes:
0 Triggering breakpoint event I Witnleekgjactorizatim

O Causal breakpoint event I Fmabstitutim

 

e mam-donned Wm event I BackuardjtbstitutiomFacta‘izatim

Carnal Distributed Breakpoints that re Currently Inserted:
TRIGGERINS FRUIESS TII‘E (I: TRIGGERINS MINT

8 S2

 

 

 

 

   

 

)

 

‘

WWWWWWMWWWWMWWmﬁVWmeﬂWWSMﬂWWNmmW-ﬁwmﬂﬁWWAVMIkA17MWAV'.(‘V.\thr.«a

 

 

 

 

 

 

I Dieniss I l Help I

L...

 

Figure 7.10. BC-graph interface for causal distributed breakpoints

90

 

 

 

   
           

 

 

Fast-88-318“— gagged-sh.
_ _
FEES-318m— gag-eagle.
_ _

_ Riser-Rioﬁ a-SSW-Qa-su _

_

—2 93333653.; Sausﬂunemovbgm :an gungm
_ _ a

 

 

 

 

Ens—Easasg _§5-§_.W§-3-ﬁi

Eta-3-8341 _ u-Eba-BJSIE
_ _

 

 

 

 

 

Figure 7.11. The ﬁltering/ clustering interface

91

perspective. An example of the BC—graph interface is shown in Figure 7.12, where
the program is currently stopped at the causal distributed breakpoint illustrated in
Figure 7.10.

Call graphs and source code listings have been shown to be effective for illustrating
the current point in the program execution. Similar to these, additional visualization
have been developed for illustrating the current location in the speciﬁcation. When
the program stops at a breakpoint, visualizations may be activated for depicting
both the current location within the expected behavior call graph and the current
location within the speciﬁcation listing. These locations are computed according to
the last instrumented event that occurred. An example of an expected behavior call
graph is shown in Figure 7.13, where the program is currently stopped at the causal
distributed breakpoint illustrated in Figure 7.10. Figure 7.13 illustrates that the last
instrumented event that occurred for processing element 5 was within the process
For_Sub_ Communication.

Figure 7.14 shows an example of a speciﬁcation listing. In this case, the last
instrumented event that occurred was a send event. Figure 7.14 also shows the
source code listing for the same point in the execution, illustrating the portion of
the program that implements the send event. This scenario demonstrates a simple
approach for comparing expected behavior, as represented by the speciﬁcation listing,

to the corresponding implementation, as represented by the source code listing.

 

 

 

 

 

mi

 

Filter Cluster

 

 

a L gl-IIlI—lllll_

 

:lﬂlliﬂllm-IIIN-Ill-
élk‘llﬂllll—ZI'IHII’A'AIIIllll-
I“Iﬂﬂi’ilﬂ‘llIIM/lﬁllﬂ-IIIlllll-

ﬁlulmrz-iipI/nmsnlmun-l
:ILIInImmI/lzwiv‘mnm
leIIlIV—II/A—l-W

 

» :.£$>,¢%'M8W€<€<

 

Legend For LOTOS processes:

 

T I M E (usec)

 

 

 

7 Host_Node
I Sgnchronousjholeskyfectorizat

I Foruard_Substitution

ion

I Backuerd_Subst i tut i on_Factor i zat i on

 

 

 

l Disﬁrs; |

I”, " Help I

 

 

 

 

Figure 7.12. BC-graph interface for GOLD

93

 

.. ._ _=_ 3_-_._ . 3. _._ ». . W”. ._..'.l._.ﬂ_-.._-\:I... _‘ __..,-. __.--‘..._,.-__. ._- _..._e.;... -_u...,.h__.r. “HI...”
. . .. - - ~ . . - <1 - . . .. . ' , y ._. . ._, .. '._,_. g. < _. _-,_ .- v.-_.,-_._ ,'-' ._. ,1 K 3.3-39,.» x. ._'. ,'. . J, _ --'.'m;'.-.'.\-'-‘A'-‘~'-‘.'-"."A'.'4‘.- .‘ , .._-y..:
, - -- " w' .e 4 -1' a. ‘1 1" ‘. ‘ g A‘s-*5: :-: ;:~:.:--...-.-.:.;.;r;1:;:;:;:;:-:\_:--._-.:.;._-.;.;.;:;.;:;:;:-:-:c~;.1-:.:.;:::_~:::-,

, . .. ..I _ 4" LL -;- A W,“ . ”Au-3......" .- ~.._.-....-, __.... ......

 

 

 

I EmeraIMIrog‘anI
I

 

 

 

E amide I InllJlorklerMJ

I
I Horkeerde I
l

Witnleskg] FMSLIJstitutimI mien-mm]
l

i
rWﬁimLoop I rForijJlairLLooPI [Mi-EbrinioopI I:

 

 

     

 

 

 

 

 

 

 

 

 

 

 

 

I WWicatimI for- 5d: _Colnnicatio] IIaclLStb. ConnicatiI

|

 

 

 

 

[WWW Ire».3ub_aei=eive-neeej Rees-mam]

 

 

 

 

 

 

 

 

 

 

Figure 7.13. Modeling expected behavior in a call graph

94

 

.435 ?§5«§u§§y§§

_ 3.: _1 _ 8.83 * w

. k). .... Agagzggssggxﬁﬁg aihgcoﬁgiﬁg wisim

 

 

 

 

g; n§§§.§§§

733% 22E 0.2% “.2383. a E

’ ’mmwm WE?

 

' mg

 

 

l .
m 7,. .1

 

 

«000cm: as"
Agg.£§-838§
uaxo A: “Ag u mxmgmaznachohv u canonozma
A
guuouozvn>ooc~neonHnooaoaozuosaooomunamugou
. nomaauozumoxp :zzu>:. uozugoucomp anon
w as» o>.oooc todnoagoa o a“ Loecom oocgm av

v
A: Haaummmxaazunchahv vngauouozu
vn>ooc.ucoauoomcauazuoa“oooanaam com canoes;
A

mum” “ usogmxu .aunu .oaocc cc to: on“ .

ﬂaw Auoocac ~o... a .30; .uaocmn tau: ~uaoc: ~c v asaxoon W

m _ A

a mi “Amm .mvuconnoooocnuuo—nxaa
m_ “Awe” touvuco uncoogn ap—nxaa

N A A

.1 ”H. u I...

u nag—co: ux aqua

“ a nu "nun

v Au: nu nxuaocv u“ 11w” x ,M , .K1.

:0: .3 £38.. .zsssﬂziﬁdda
. «x u 09.500:
W 2 a J .3 38:?
a, 2 23333.3. v2.3.5.3
v 8:.

      

 

A: "hwmndoon>z cg an: _ouucocgzuH v

“Agasﬁsnatikasugggu
- A W, A- “51.8.; 5 385:9
5: .32. g8}: ..1 - - v
s?& A H.“ ouvgvngsgusuusag 3 .3“— “Mg

onus:
2 H A .3. 38:49.5.

 

anxo M
“3: 2823533. 5:... Hal M

 

 

 

 

 

 

-q:n.u;¢ma¢uwwmmmwmsnu»“Lamp.” ..Hrm¢~1.umh.x .gmmm
m «xuucm noQOOLm wrmrllrlrrrrrssrw»mngwwamwmmquhHHHLAmunm.S. ,

mug“yamunusaaagauxmwasswnuasepxaamxmmw1:111:21w.”vita“1.5.“1.1..rhwap¢»wxnrxamaymaszwxnuas

 

 

my” “Ame“ .Ha . onmomac tau vaoocununcan “ m «xuu:_ acocodm meannooogm

 

Figure 7.14. Comparison of speciﬁcation listing to source code listing

CHAPTER 8

Data Visualizations from Formal

Speciﬁcations

The Lockheed Integrated Visualization Environment (LIVE) project [7, 29, 51] uses
formal speciﬁcations to facilitate the selection of data visualizations. LIVE maps
formal speciﬁcations of a program’s data structures to a graphical unit termed a
cell model. The cell model consists of several components used to characterize a
data structure. LIVE uses a rule-based system for selecting appropriate debugging
visualizations depending on the contents of the cell model. In this chapter, we describe
the cell model, show how the formal speciﬁcation of a data structure is mapped to the
cell model, and illustrate how the cell model is used to select visualizations that are
useful for debugging. We conclude with the discussion of an improvement to the cell
model approach, where the programmer may provide input regarding the selection of

the application-speciﬁc visualization appropriate for the particular case.

8.1 The Cell Model

The cell model is used to formally capture the characteristics of a data structure

so that the mapping from data structure specification to visualization is feasible

95

96

[7, 29, 51]. The model contains several components that cumulatively provide enough
information to distinguish among data structures with different visual representations.
Four components are used to accomplish this task: dimension, size, complexity, and

access ports. Each of these four components are discussed in detail below.

8.1.1 Dimension

The dimension component of the cell model is used to categorize the structural layout
of the overall data structure. The value of the dimension component is used to
determine the layout of the visualizations used for a particular data structure. The
value for this component is a natural number if the data structure layout contains an
array-like ordering, as is the case with arrays, stacks, queues, and deques. If the data
structure does not contain an array-like ordering, then it is considered dimensionless
and therefore classiﬁed according to its cell-connectivity, i.e., the relationship between
the cells in the data structure. There are many different types of cell-connectivity,

with the following high-level classiﬁcation:

0 None: There is no relation between the cells in the data structure. The set
data structure is an example of such a case since the elements in a set are not

necessarily related.

0 Hierarchical: The relation between cells is such that when a new cell is added to
the data structure, it must be at the lowest level in the hierarchy and it cannot
have as a child a cell that was added earlier to the data structure. A tree is one
example of a data structure with this type of cell-connectivity. This type can
be further decomposed by differentiating according to the maximum number of

children a cell can have.

0 Random, symmetric: A cell can have a relation to any other cell in the data

structure. The symmetric property indicates that if there is a relationship

97

between two cells, then that relationship holds in both directions (i.e., it is
directed in both directions). For example, an undirected graph ﬁts in this

category.

0 Random, antisymmetric: A cell can have a relation to any other cell in the data
structure. The antisymmetric property indicates that if there is a relationship

between two cells, it must be in only one direction.

0 Random, not necessarily symmetric: A cell can have a relation to any other cell
in the data structure. The not necessarily symmetric property indicates that if
there is a relationship between two cells, then it may be in either one or both
directions. A directed graph is an example of a data structure with this type of

cell—connectivity.

8.1.2 Size

The size component is used to denote the capacity of the data structure. The
determination of this particular component is optional since the data structure
capacity is not always available from the speciﬁcation. If the capacity of a data
structure is available from the speciﬁcation, then it may be used to ﬁx the boundaries
of the visualizations of that data structure. The availability of the capacity is
especially advantageous in the case of data structures usually considered to have

a static size, as is the case, for example, with arrays.

8.1.3 Complexity

The complexity component is used to denote the number of values that are stored
within each cell of the data structure. An appropriate glyph (graphical object) may
be chosen to represent the cell values given the complexity of the cell as well as the

dimensions of the values within the cell. It should be noted that the values within

98

each cell need not be simple data types with dimension zero (e.g. integers, ﬂoats,
etc.). For example, given a one-dimensional array of queues (i.e., each element in the
array is an entire queue), the complexity of each cell is one and yet the value within
a cell is an entire queue (dimension equal to one). In this particular case, the value
within a cell can be visualized using a representation for a queue of cells.

For cases in which the dimensions of the values within a cell are all zero, relatively
simple glyphs (compared to the above example) may be used to model the values.
Depending on their design, such glyphs may have several distinguishable features,
where each feature may be used to model a different property. For example, a height
tile from a height ﬁeld visualization has at least two distinguishable features: its height
and its color. A height tile may thus be used as the glyph for data structures with a
complexity of one or two, where the data values within the cells each have dimension
zero. As another example, an arrow has at least four distinguishable features: its
height, its color, and its a: and y directions. An arrow may thus be used as the glyph
for data structures with a complexity of one, two, three, or four, where the data values

within the cells once again each have dimension zero.

8.1.4 Access Ports

Access ports are the constraints placed on entering or removing data elements from
a data structure. This component was added to the cell model to differentiate
between data structures such as stacks, queues, and one-dimensional arrays, where
the dimension, size, and complexity may all be identical yet different visual
representations are desirable. Currently, the classiﬁcation of access ports are limited
to the following types, although additional types may be added in the future as the

visualization of more complex data structures are investigated:

0 Input/ Output (I/O) access from a single cell: A single cell in the data structure

99

is used for input and output to the overall data structure. The particular cell
may change, but we will still have only a single cell handling 1/0 at any time.

This scenario is applicable in the case of a stack.

0 I/O access from two cells: This type of access is the same as the previous,
except that two cells are handling 1/0 at any time. This scenario would apply

in the case of a deque.

0 I/O access from greater than two but less than all cells: This case is the

generalization of the above two.

0 Random I/O access: Each cell has input / output access. This type of access
would apply in the case of an array, where indices are used to map to a particular

cell.

0 Input access from one cell in the data structure and output access from another
cell: This type of access permits the input and output functions to be separated.

This scenario is applicable to a queue.

The limitation of the cell model approach is that it considers only structural
characteristics of data types. In order to obtain a complete understanding of the
manipulation of data in a parallel program, operations on the data structures should

also be considered.

8.2 Mapping Speciﬁcations to the Cell Model

This section deﬁnes a mapping from the formal speciﬁcation of a data structure to
the dimension component of the cell model, considering only data structures that
have a dimension that can be represented by a natural number (i.e., that are not

dimensionless). Extensions to the procedure for mapping to the size, complexity,

100

and access ports components of the cell model have not yet been considered. For
this particular work, the formal speciﬁcation is written using the Larch speciﬁcation
language [52]. (Although we used Larch for this portion of the investigations, we
favor LOTOS for the other investigations. The features of LOTOS that inﬂuenced
this decision are discussed in Section 2.1.)

Larch is a two-tiered speciﬁcation language, where each Larch speciﬁcation has
components written in two languages: one designed for a speciﬁc programming
language (interface) and the other common to all programming languages (shared).
The former is referred to as a Larch Interface Language (LIL), and the latter is the
Larch Shared Language (LSL). The LSL is an algebraic speciﬁcation language that
allows a programmer to formally specify data abstractions, known as traits, that are
independent of program state and programming language. A signiﬁcant advantage
to using traits is that they are reusable when writing new speciﬁcations. The LIL is
used to specify what can be observed about the behavior of components written in
a particular programming language. In the following discussion, we limit our use to
the LSL. A further discussion of the LIL portion of Larch speciﬁcations is presented
by Wing [53].

Each Larch trait corresponds to an abstract data type (ADT). An ADT is a class
of values and an associated collection of operators that act on those values, where the
properties of the operators are speciﬁed using only axioms [54]. The operators of an
ADT may be grouped into three sets: constructors, modiﬁers, and observers. The set
of constructors are used to produce all possible values of the particular ADT, where
the range of each constructor is of a type known as the distinguished sort} The set
of modiﬁers are the non—constructor operators whose range is the distinguished sort.

Lastly, the observers are those operators that contain the distinguished sort in their

 

1The term “sort” is used in order to avoid confusion with the similar concept “type” from
programming languages.

101

domain, while their range consists of a sort other than the distinguished sort.

An example of a Larch trait is shown in Figure 8.1 for the case of a two-dimensional

 

Array(A, DX, DY, E) : trait
introduces
{}z —+ A
bind : A, DX, DY, E —» A
apply : A, DX, DY—> E
deﬁned : A, DX, DY—i Bool
asserts
A generated by {}, bind
A partitioned by apply, deﬁned
V a : A, di, d1i, d2i : DX, dj, de, d2j: DY, e: E
apply(bind(a, d2i, d2j, e), d1i, dlj) == if d1i = d2i /\ d1j= d2j
then 6 else apply(a, d1i, de)
- deﬁneda}, dz; dj)
deﬁned(bind{a, d2i, d2j, e), d1i, de) ==
(d1i = d2i /\ d1j= d2j) V deﬁned(a, d1i, de)
implies
converts apply, deﬁned
exempting ‘v’ di: DX, dj : DY apply{{},di,dj)

Figure 8.1. Two-dimensional array trait

 

array. The portion of the trait labeled by the introduces clause contains a list of the
operators for the particular trait. The operators for the two-dimensional array trait
are {}, bind, apply, and deﬁned. Each operator is immediately followed by its signature
that speciﬁes the sorts of the operator’s domain and range. The range is always a
single sort (3;), but the domain may consist of zero or more sorts (55,1, . . ”SQ-,1“).
Algebraically, It,- represents the number of sorts in the domain of the signature for the

operator op,, where the signature is represented by:

102
Si,la'-°aSi,k.' —> 5;, k; 2 0

The “A generated by” clause indicates that the set of operators immediately
following it are the constructors for the sort (in the case of the two—dimensional array
trait, {} and bind).

The above description of the Larch speciﬁcation language is a high-level overview,
where many details have been omitted for the sake of brevity. For further details
regarding the language, the relevant technical reports [52, 55] may be consulted.

The procedure that is used to map from a Larch trait to the dimension component
of the cell model is shown in Figure 8.2. Step 2 of the procedure tests for the case in
which there exists a single constructor of the form op :—> M for the sort M, where
the domain of the signature is empty. This situation occurs when the data structure
can only be assigned a single value, in which case its dimension is zero. An example
of this type of data structure is the Boolean value true.

Step 4 of the procedure tests for the existence of a constructor for which a “new”
value of the data structure is generated by joining two or more “old” values; for
instance, the insertion of a tree at the leaf of another tree. The computation of
the dimension for this case is beyond the current scope of our technique, and so,
consequently, the procedure returns a value of DIMENSIONLESS.

Step 6 of the procedure tests for the existence of indices. Two is subtracted from
the total count of domain sorts in a signature in order to account for one occurrence
of M, plus one occurrence of a sort representing an element to be added to M. Only
one of the sorts is an element to be added since the operators have been constructed in
an abstract fashion. (If it were possible to add two different types of elements to the
data structure, then two different constructor operators would have been used.) The
other sorts in the domain of this particular constructor are thus indices for placement

of the element, and so the number of these other sorts is equal to the dimension of

103

 

INPUT:

The trait that deﬁnes the sort M corresponding to the ADT.

OUTPUT: The dimension of the ADT, where the value DIMENSIONLESS indicates
that the data structure lacks a dimensional ordering computable by this

technique.
PROCEDURE:

1. Determine the set of sorts found in the domains of the constructors’
signatures.

2. If the set of sorts found in step 1 is empty, then return dimension equal to
zero and stop.

3. Count the maximum number of times that the sort M appears as a domain
sort within a constructor’s signature.

4. If the result from step 3 is greater than one, then return dimension equal
to DIMENSIONLESS and stop.

5. Count the maximum number of domain sorts within any of the
constructors’ signatures.

6. If the result from step 5 is greater than or equal to three, then return
dimension equal to the count minus two and stop.

7. If the set found in step 1 contains a sort not equal to M and that sort
appears as the range for one of the signatures, then return dimension equal
to one and stop.

8. Return dimension equal to DIMENSIONLESS and stop.

Figure 8.2. Technique for ﬁnding the dimension of an ADT from its trait

 

the data structure. For example, for the trait of a two-dimensional array, shown in

Figure 8.1, step 5 of the procedure determines the maximum number of domain sorts

within a constructor to be four for the case of the constructor bind. The procedure

will thus return the expected value of two.

Step 7 of the procedure tests if there is an ordering imposed on the data structure,

such that the cells of the data structure may be depicted in a one-dimensional layout.

In order to determine if there is such an ordering, we use the rule that if one of the

104

non-constructor operators permits access to a particular element within the container,
then that operator imposes an ordering. Examples of data structures for which this
rule applies are stacks and queues.

Finally, step 8 of the procedure handles the case in which no ordering was found

for the ADT. Speciﬁcally, the procedure returns dimension equal to DIMENSIONLESS.

8.3 An Example Application

This section describes how LIVE—generated visualizations based on Larch
speciﬁcations are used to detect an error in the implementation of a column-sorting
routine for the assignment problem. The assignment problem is identical to the
(weighted) bipartite graph matching problem, which is deﬁned as: G = (V, E) is an
undirected bipartite graph such that V can be partitioned into two disjoint sets .5'
and T and all edges have one end point in S and the other in T. A set M of edges is a
matching for G if each vertex in G is the end point of at most one edge in M. In the
weighted bipartite matching, a weight w(i, j), is associated with each edge (i, j), and
the cost of a matching is the sum of the weights of its edges [56]. For this problem,
the maximum matching is the one with the minimum cost. For example, the arcs,
represented by ones in the match matrix, shown in Figure 8.3, produce the maximum

match with the minimum cost for the corresponding weight matrix.

 

 

 

 

 

 

 

 

 

 

 

ll 22 4 76

32 95 2 8

100 55 6 7

333 25 88 0 0
Weight Matrix Match Matrix

Figure 8.3. Maximum match with the minimum cost

105

A C / Paris [57] implementation of the algorithm was adapted for the case of Single
Instruction Multiple Data (SIMD) machines? A critical component of the algorithm is
the sorting routine. The strategy of this routine is to sort columns by the row position
of matches, where the ﬁrst row equals 0, the second row equals 1, and so on. The row
positions of the matched arcs are used to label the columns of a sort matrix S, where
zeros are placed in the positions that do not contain a match. The preprocessing
of the sort matrix 5' involves spreading the row numbers of the matching arcs along
the columns. Next, inﬁnity values, termed inﬁnites, are substituted for zeros. An
example is shown in Figure 8.4. The columns of this sort matrix are then sorted in

ascending order. The weights matrix and the match matrix are sorted based on how

 

     

 

S' is sorted.

Row #
0
l
2
3

Match Matrix Sort Matrix Sort Matrix
Before Preprocessing After Preprocassing

Figure 8.4. Construction of the sort matrix from the match matrix

The underlying data structure for a matrix is a two-dimensional array, and so the
trait in Figure 8.1 applies. Using LIVE’s rule-based system [51], the programmer is
able to generate visualizations for this problem such as those in Figures 8.5 and 8.6.

An error is detected in the visualizations of the match matrix3 in Figures 8.5
and 8.6, where a white dot represents a match. More speciﬁcally, as the sorting

progresses, the white dot in the ﬁrst row moves in the opposite direction of the other

 

2See [56] for details of this algorithm and the results obtained from its implementation.
3These visualizations are for a 128 x 128 matrix.

106

white dots (Figure 8.5) until it eventually ends up at the far right end of the matched
columns upon completion of the sorting (Figure 8.6). Since the columns are sorted
according to match row positions, this particular white dot should instead migrate to

the ﬁrst column.

White dot from ﬁrst row

 

Figure 8.5. A visualization of column sorting for the implementation of the assignment
algorithm

From our earlier discussion, we recall that a column without a match will contain
all zeros and, thus, be used to set the corresponding column of the sort matrix to
inﬁnites. The sorting routine will position these columns to the right of the columns
containing a match, as is shown in Figure 8.6. Noting that the misbehaving column
is sorted similar to the columns that do not contain a match, we hypothesize that

the corresponding column in the sort matrix has been incorrectly ﬁlled with inﬁnites.

107

White dot from ﬁrst row

 

Figure 8.6. The visualization of the match matrix after the misbehaving column sort

(Visualizations of the sort matrix were used to conﬁrm this hypothesis, as discussed
in [58].) Backtracking in the procedure to ﬁnd an explanation for the error, we also
recall that row indices start at zero. Since any column of the match matrix containing
all zeros will trigger its corresponding column in the sort matrix to be set to inﬁnites,
it is concluded that the bug was caused by inﬁnites being incorrectly substituted for

the zeroth row position.

8.4 Improvement to the Mapping Strategy

The procedure for determining an appropriate data visualization can be improved by
letting the programmer guide the ﬁnal selection of the visualization. The procedure
discussed earlier may be used to initially determine the correct category of data

visualizations, based on the static characteristics of the data structure. At that

108

point, a. graphical user interface may then be used to let the programmer select the
application-speciﬁc visualization appropriate for the particular case.

As an example, we consider the case in which the algorithm discussed earlier
determined that a particular data structure was of type ‘integer’, based on the
information in the trait. There are several possibilities for how an ‘integer’ may
be used, where, in each case, the programmer may have a different conceptual idea
of how the data visualization should illustrate the respective operations. Some of
the possible uses for an integer are the following, where the characteristics of an

appropriate data visualization are discussed in each case:

0 Counter: The basic operation is to increment the integer, where the initial value
of the integer is zero. A data visualization for illustrating this use of an integer
could be similar to that shown in Figure 8.7, where the increment operation is

illustrated by “ﬁlling” the gauge by one unit.

 

 

 

Dis-is:

 

 

 

Figure 8.7. A data visualization for a counter

109

0 Array index: An integer used as an array index may be either incremented
or decremented, although the value of the index should always lie within the
bounds of the array. If the array index extends outside the bounds, then the
application-speciﬁc visualization should highlight the error. (For example, this
case may occur if a program references position n + l of an array that contains
only n elements.) In order to illustrate this type of data visualization, the
number of elements in the array must be determined, in addition to the value

of the integer.

e Other uses for an integer: A default case should be provided, where the textual

value of the integer is simply printed.

CHAPTER 9

Related Work

In this chapter, we discuss work related to the thesis research. The discussion
is organized into two sections corresponding to the two main contributions of the
research. Speciﬁcally, we present investigations into the integration of visualization-
based functionality with on-line parallel debuggers, and we then present alternative

modeling techniques for visualizations of program execution.

9.1 On—line Parallel Debuggers Supporting
Visualization

Replay techniques offer a promising approach for providing deterministic execution
in on-line parallel debugging tools [12, 13, 14, 15]. However, many common debugger
operations, such as breakpointing and the testing of predicates, present problems of
their own for the case of a parallel programming environment, and thus must be
resolved before a completely functional on-line parallel debugger is feasible. Methods
for performing such operations have only recently been offered for the case of a
parallel environment [22, 41, 42, 59]. Consequently, the builders of visualization-
based parallel debuggers have typically designed their tools based on a post-mortem

approach, where development methods are better established than for the case of

110

111

on-line debugging [6]. Thus, the use of visualization in on-line parallel debuggers is
still relatively modest at this time. We discuss below the small number of cases where
visualization has been integrated into the design of an on-line debugger, illustrating
in each case how visualization has supplemented the usual operations of an on-line
debugger. In all cases, the use of visualization is mainly limited to the depiction of
message-passing behavior. Where applicable, we also discuss scalable techniques that

have been applied for handling executions with large numbers of processes.

9.1.1 Bugnet

Bugnet [16, 60] provides a variety of on-line debugging capabilities for the case of
distributed programs that communicate via message-passing. The design of Bugnet
is based on a replay strategy, where the messages along with their contents are
saved during an initial program execution. In addition, checkpoints are periodically
recorded during the initial execution so that replay can begin at intermediate program
states rather than from the beginning of the program. During replay, the user
can re-execute either a single processing element or a selected subset of processing
elements, where the messages from any processing elements not being replayed are
supplied by the message log. Although the replay mechanism of Bugnet is useful for
debugging, many other common on—line debugging operations are not offered, such
as breakpointing.

Bugnet supports visualization of program execution by allowing the user to
arrange groups of processing elements (depicted as boxes) in an arrangement desirable
for viewing the interaction between the processing elements. As the program executes,
the beginning of a message-passing event is shown by having the sending processing
element darken, followed by a line appearing between the sending processing element
and the receiving processing element, and then having the receiving processing

element darken. The ending of the message-passing event is shown by having

112

the sending processing element lighten, the line between the processing elements
disappear, and ﬁnally by having the receiving processing element lighten. The
limitations of this visualization strategy are that it does not scale well, it does not show
message-passing activity across time effectively, and it is difﬁcult for the programmer

to relate the visualizations to the expected behavior for the program.

9.1.2 Idd

Idd [17] is another example of an on-line debugging tool for distributed, message-
passing programs. One of the key features of Idd is that it allows breakpoints to
be set in the source code, but with the limitation that processing elements are not
guaranteed to stop in a consistent state. Another feature of Idd is that it allows
the programmer to deﬁne assertions for the global message-passing activity, where
a supervisor program automatically stops program execution when an assertion is
violated. At that time, the programmer is able to inspect the various processing
elements in order to search for the error.

The monitoring component of Idd supports the visualization of message-passing
events as they occur. These events are displayed across time in the form of a space-
time diagram, where the y-axis of the diagram represents the processing element
number, the x—axis represents the time, and lines between points in the diagram
represent message-passing activity. Scroll bars are provided on each of the axes
to allow the programmer to go either forward or backward through the processing
elements or time. This visualization approach supports scalability and depiction of
behavior across time better than that provided by Bugnet, but the programmer is still
left with the task of ﬁnding a match between the expected behavior for the program

and the behavior that is actually depicted.

113

9.1.3 VISTOP/TOPSYS

The visualization tool VIS TOP is part of the integrated tool environment TOPS Y5 for
programming message-passing multiprocessors, designed speciﬁcally for the parallel
programming library MMK (Multiprocessor Multitasking Kernel) [18]. An on-
line mode for the TOPSYS monitoring system allows breakpoints to be set in an
application, where state inspection is possible when the breakpoint is reached during
program execution. An additional option provided by TOPSYS is that variables may
be modiﬁed when breakpoints are reached in order to test ﬁxes. TOPSYS also offers
an alternative post-mortem mode, where traces are recorded during the program
execution to later be visualized off-line.

The visualizations provided by VISTOP depict program execution in terms of the
basic object types of MMK: the task, the mailbox, and the semaphore. Icons are
used to distinguish between each of these object types. The communication of a task
with a mailbox or a semaphore is indicated by an arrow. A thin arrow represents the
actual communication, while a thick arrow represents the case where a task is waiting
for a communication request to be satisﬁed. The limitations of VISTOP are the same
as for Bugnet: lack of scalability, poor depiction of behavior across time, and the
difﬁculty of relating the visualizations to the expected behavior of the program. In
addition, since VISTOP is speciﬁcally oriented towards depicting the basic objects
of MMK, its visualization approach is not easily portable to other message-passing

programming environments.

9.1.4 ParaRex

In the cases of Bugnet, Idd, and TOPSYS, program visualization was provided as
an additional feature of an on-line parallel debugging tool. In the case of ParaRex

[19], however, the design strategy was to integrate the post-mortem visualization tool

114

ParaGraph [24] with the on-line debugging capabilities of DECON, a debugger for the
distributed-memory machine iPSC / 2. The result of this integration was a debugging
tool that used the features of ParaGraph to provide on-line visualization, while
simultaneously allowing examination of variables through the features of DECON.
An execution replay technique was used as the underlying mechanism for providing
deterministic execution, and, thus, meaningful results.

The visualization options offered by ParaRex are, of course, identical to those
of ParaGraph. Speciﬁcally, ParaGraph provides a large variety of visualizations,
covering many different aspects of a program’s execution. Examples of the
information visualized include message-passing activity, the degree of parallelism
at any instant, and statistical data, such as the total volume of messages.
One disadvantage of the visualization approach of ParaRex is that ParaGraph’s
visualizations are actually oriented towards locating performance problems rather
than debugging functional errors. In particular, as was the case for the three tools
discussed earlier in this section, the visualizations of ParaRex are not easily related
to the expected behavior of the program. Another disadvantage of ParaRex is that
DECON has been designed speciﬁcally for the iPSC / 2, and thus the overall approach

used by ParaRex is not easily portable.

9.1 .5 Panorama

The design strategy of the tool Panorama [20] was similar to that of ParaRex, in
that visualization capabilities were integrated with the functionality of an on-line
debugger} However, in the case of Panorama, a key design goal was portability across a

variety of parallel platforms. Thus, the developers of Panorama have designed generic

 

1Despite the resemblance in names, the tool Panorama should not be confused with the tool
PANORAMA, discussed earlier in Section 7. Both tools were developed in approximately the same
time frame, and unfortunately were assigned similar names.

115

debugger functions to interface between the base debugger of a parallel machine
and the visualization component of Panorama. The base debuggers that Panorama
supports are the ipd debugger from the iPSC / 860 and the ndb debugger from the
Ncube.

Two visualizations are offered by Panorama: a time-line visualization and a
processor map visualization. The time-line visualization is similar to a space—time
diagram, except that send and receive events are distinguished by using a circle to
represent a send and a square to represent a receive. The processor map visualization
is an alternative portrayal of message-passing activity, where the programmer may
place the processors (shown as boxes) in an arrangement desirable for viewing their
interaction, with the interconnections between processors shown as lines between the
boxes. In this case, message-passing activity is depicted by labeling each incoming
line with the number of messages waiting to be read from the neighboring processor.
Although this latter visualization provides a different perspective for viewing message-
passing behavior, neither visualization offered by Panorama relates such behavior to

the expected behavior of the program.

9.1.6 PPT/xipd

A program phase tree (PPT) [46] is a graphical representation of program structure,
where the programmer may modify this representation in order to portray program
activities in a particular manner. A PPT is, by itself, not a tool, but rather a type
of visualization. The research discussed in [46] illustrates how PPT visualization
capabilities were added to the xipd on-line debugger for the iPSC / 860.

In contrast to the visualizations provided by the tools discussed above, a PPT
focuses on illustrating the entries and exits from the various phases of a program
rather than message-passing activity. The initial form of a PPT is a standard call

graph, but the programmer can delete nodes or edges and collapse subtrees until

116

a desired arrangement is achieved. During program execution, as each processing
element invokes a procedure corresponding to an entry point of a node, a new line
is drawn between the caller and callee nodes, where multiple colors are used to
distinguish between the invocations by different processing elements.

A key advantage of a PPT is that it provides the programmer with the ﬂexibility
to visualize program execution in a format that resembles the expected behavior of the
program, although it is limited to structures based on the call graph of the program.
Some disadvantages of a PPT are that it does not illustrate behavior across time well,

nor does it show the interaction between processing elements.

9.1.7 Prism

Prism [61] provides on-line debugging capabilities for message—passing programs,
speciﬁcally developed for the CM-5. A main visualization supported by Prism is
a “where tree”, a call graph structure that illustrates where each process is located
in its execution. As such, the “where tree” shows the global status of the system
in a single visualization. However, a “where tree” does not illustrate communication
between the processes.

One of the features of Prism is an option for ﬁltering which processes should be
targeted for debugging. A language is provided for specifying lists and ranges of
processes to which debugging commands should be issued. Although the language is
extensive, the disadvantage of this approach is that a programmer must ﬁrst learn

the language before process ﬁltering can be applied.

117
9.2 Models for Visualization of Program
Execution

In Section 7, we discussed a strategy for visualizing a parallel program’s execution
using a model of expected behavior, where the expected behavior model was
constructed based on the LOTOS speciﬁcation of the program. We then showed
that debugging of incorrect message-passing communication is facilitated by modeling
visualizations in this manner. We also described, in Section 8, a method for generating
debugging visualizations of data structures, where, in that case, the visualizations
were modeled based on the Larch speciﬁcations of a program’s abstract data types.
We will now discuss additional models that have been used to construct visualizations

oriented towards debugging incorrect behavior in parallel programs.

9.2.1 Event-oriented Models

A common strategy for the construction of visualizations is to use a model based
directly on the event data generated during the program execution. The main
advantage of this approach is that little or no post-processing of the event data
is required, but the main disadvantage is that the programmer must manually
ﬁnd a correlation between the graphically-represented events and the conceptual
understanding of the program. An example of a visualization constructed according
to an event-oriented model is a space-time diagram, as was discussed in Section 7.3.
In this case, the event data recorded during the program execution is used to directly
render a visualization of communication between processing elements across time.
Examples of visualization tools that offer space-time diagrams or other visualizations
constructed according to an event-oriented model are ParaGraph [24], Idd [17], the

Moviola system [23], and Radar [62].

118

9.2.2 Clustering Techniques

An approach that has been used to reduce the congestion in a visualization of event
data is to cluster groups of low-level events into user-deﬁned high-level events. The
visualizations of program execution are then modeled in terms of these high-level
events. For example, the tool Belvedere [63, 64] visually abstracts low-level behavior
of a program by using the Event Definition Language (EDL) [65] to deﬁne the pattern
of low-level events that constitute a particular high-level event. An event recognizer
is used to identify occurrences of the high-level events among a stream of low-level
events. The visualizations produced by Belvedere depict program execution in the
context of the high-level events. There is an overhead cost in this approach, however,
since the programmer must learn and apply EDL to construct the high-level events. A
similar tool, Ariadne [66], also supports the deﬁnition of high-level events in terms of
low-level events, but suffers the same disadvantage in that the programmer must learn
and apply its event modeling language. The approach that we used in the PANORAMA
project avoids this additional overhead by modeling the high-level events of a program
in terms of the program’s formal speciﬁcation, where this speciﬁcation has already

been created during the design phase of the program.

9.2.3 Source Code Models

One approach that has been used to simplify the task of ﬁnding a correlation between
the program visualizations and one’s understanding of the program is to model the
visualizations in terms of the source code structure. An example of a visualization
constructed according to this approach is a dynamic call graph, where the nodes in the
graph represent high-level program objects such as subroutines and the lines between
nodes represent call relations. Tools which provide dynamic call graphs include Faust

[67], Schedule [68], and the visualization system of Zimmermann, Perrenoud, and

119

Schiper [69].

An enhanced version of a dynamic call graph is a program phase tree (PPT) [46],
as was discussed in Section 9.1.6. This type of visualization provides the programmer
with the ﬂexibility to modify a standard call graph to more closely reﬂect a particular
understanding of the call relations in the program.

Another type of visualization modeled in terms of the source code structure is a
ﬂow diagram. For example, PF-View [45] represents program execution as a series
of icons linked linearly to show the ﬂow of execution, where each icon corresponds
either to a parallel construct or to serial code. The programmer may click on an
icon to expand it, revealing the individual processors involved in the corresponding
computation.

The disadvantage of using dynamic call graphs or ﬂow diagrams is that the
level of insight that they can provide is limited to aspects of call or ﬂow relations
within a program. If the incorrect behavior is not based entirely on these relations,
then these visualizations, when used alone, have limited utility. When used in an
environment that supports multiple visualization windows, however, the information
they provide may be correlated with the information from other visualizations to
provide a beneﬁcial overall view of program execution. The advantages of using
multiple visualizations to depict a program’s execution are discussed by LeBlanc,

Mellor-Crummey, and Fowler [23].

9.2.4 Application-speciﬁc Techniques

Application-speciﬁc visualizations are those that have been designed to visually
model one’s natural understanding of a program. Examples of tools that support
application-speciﬁc visualizations are Voyeur [70], POLKA [71], and ParaGraph [24].
An example of a typical application-speciﬁc visualization is one used to depict the

execution of a sorting program, consisting of the graphical image of an aligned set

120

of bars (rectangles), where the height of each bar reﬂects the magnitude of the
corresponding number. The layout for such a visualization would be similar to that
shown in Figure 9.1, where ten bars (corresponding to numbers) are depicted in
a random order. As the sorting algorithm proceeds, the bars exchange positions
accordingly, until, at the end of the execution, they are all ordered in either increasing

or decreasing order.

 

 

MAGNITUDE

 

 

INDEX

 

 

 

Figure 9.1. Typical layout for an application-speciﬁc visualization of the sorting
problem

The main advantage of application-speciﬁc visualizations is that they are designed
to reﬂect one’s conceptual understanding of a problem, but the main disadvantage is
that a new visualization must be created for each new problem investigated. Since
window programming is tedious, the time requirements for creating application-
speciﬁc visualizations may outweigh their beneﬁt. In order to make the application-
speciﬁc visualization approach feasible, tools are often developed to simplify the

task of creating new visualizations. For example, Voyeur [70] is an application-

121

speciﬁc visualization tool that provides a hierarchical library of visualizations. The
creation of new visualizations is simpliﬁed by starting with the inherited code from
a visualization higher up in the hierarchy. A sample of the class hierarchy of
visualizations for Voyeur is shown in Figure 9.2. As an example, the x,y class of
visualizations have the characteristic that they represent state by placing graphical
elements on an x ,y coordinate space that ﬁlls the drawing area. The icon visualization
is constrained to using a grid with integer-valued coordinates, while, in contrast, the
vector visualization is constrained to using real-valued coordinates. For both the icon
visualization and the vector visualization, the behavior may be derived from the base
x,y class. A time-savings is achieved by starting with the code from the base x,y

class when deriving the icon and vector classes.

 

/base\
x,y text boxes
vector icon trace linked-list

 

 

 

Figure 9.2. Sample class hierarchy of Voyeur visualizations

9.2.5 Property-oriented Models

In Sections 7 and 8, we discussed methods that we have used to model visualizations
in terms of the formal speciﬁcation of a program. Another similar approach is to
model visualizations in terms of formal properties used in correctness proofs of a

program. For example, Roman and Cox have developed a method for mapping from

122

the shared dataspace paradigm to appropriate visualizations [72, 73]. The shared
dataspace paradigm is one in which processing elements have access to a common
content-addressable data structure (typically a set of tuples) whose components may
be examined, inserted, and deleted. The mapping used by Roman and Cox is from
shared dataspace program states to graphical objects that may be rendered as images.
In particular, formal properties expressed in a logical calculus are modeled within
the images, such that interpretation of the images may be done in the context
of the properties used to reason about the computation. A disadvantage of the
approach presented by Roman and Cox is that the mapping method is limited to
the shared dataspace paradigm. Popular parallel programming paradigms such as

shared variables and message-passing are not handled.

CHAPTER 10

Conclusions and Future

Investigations

This dissertation has discussed research that provided two main contributions to the
area of visualization-based parallel debugging. First, we developed new techniques
for using visualization to support the operation of an on-line parallel debugger
[9, 11]. The results included a visualization-based approach for the insertion of
distributed breakpoints, where the difﬁcult task of setting distributed breakpoints
is reduced to a simple, graphical operation. For example, in order to insert a causal
distributed breakpoint, the programmer selects an appropriate event from a space-
time diagram, where the graphical interface is then updated to reﬂect the location of
the last event along each process axis that had a causal effect on the execution of the
triggering process. The programmer may use this interface to evaluate the location of
potential breakpoints, and to insert those breakpoints that are found to be relevant
to the current problem. In order to assist the programmer in setting a distributed
breakpoint, we also provided support for marker events. The programmer may use
marker events to instrument the program at appropriate points, where the generated
trace data from the marker events is then used to mark the relevant execution points

in the space-time diagram. This feature facilitates the task of ﬁnding the appropriate

123

124

location for inserting a distributed breakpoint.

Another technique that was developed to support the operation of an on-line
parallel debugger was scalable techniques for visualizing communication events [10,
11]. The basis for these techniques was to use ﬁltering to reduce the amount of
information presented in the visualizations. The ﬁltering options are provided via a
simple graphical user interface, such that the programmer may easily direct which
events should be ﬁltered. A key outcome of this approach was a new visualization
referred to as a Single Point-of-View (SPV) communication graph, in which only the
events received at or sent from a selected process are depicted.

An additional outcome of our research is the integration of several types of
visualizations into a common environment for supporting the top-down examination
of program states [9, ll, 25]. These visualizations included communication graphs
(i.e., space-time diagrams and SPV graphs), call graphs, source code listings, and data
visualizations. In the case of the data visualizations, additional ﬁltering options were
provided to highlight particular instances of data, such as all values that exceed some
maximum value, or all values that are within some range. These visualizations are
applied in a framework where program execution is initially examined from a broad
scope, using visualizations such as communication graphs. In order to determine
the potential source of an error, the user may narrow the scope of the visualizations
that are used, applying call graphs, source code listings, or data visualizations, in
that order. Graphical support is provided for coordinating the use of the individual
visualizations. This visualization-based framework is beneﬁcial since it supports a
methodical approach for using visualizations to debug a parallel program.

The second main contribution of this dissertation was the development of a
methodology for visualizing a program’s execution using the formal speciﬁcation of the
program as a model of expected behavior. One outcome of this research was a strategy

for modeling expected behavior in space-time diagrams [8, 26]. A visualization

125

referred to as a BC-graph was developed, where shaded rectangles representing
LOTOS processes were overlapped onto a space-time diagram of the message-passing
events. This visualization may be used to compare the expected behavior, as
represented by the LOTOS processes, to the actual behavior, as represented by the
message-passing events. We also developed techniques for visualizing the current
location in a program execution in terms of the formal speciﬁcation. When the
program stops at a breakpoint, visualizations may be activated for depicting the
current location of a process either within the call graph of the speciﬁcation or within
the speciﬁcation listing. These visualizations provide a simple approach for visualizing
the current program activity within a model of the program’s expected behavior.

An additional outcome of this research was a technique for using the formal
speciﬁcations of a program’s data structures to guide the generation of data
visualizations [7, 29]. In particular, the formal speciﬁcations are mapped to a
graphical unit termed a cell model. The cell model consists of several components
used to characterize a data structure. A rule-based system is then used to select
appropriate debugging visualizations depending on the contents of the cell model.

The research contributions discussed above were implemented in prototype
debugging environments, where the main purpose of these environments was to give
a proof of concept demonstration of the research results. The debugging environment
in which most of the contributions were implemented was the Graphical On-Line
Debugger (GOLD), a parallel debugger for PVM message-passing programs. GOLD
may be used to debug a program as it is running, where a replay strategy is used
to guarantee that re—executions of the program will be deterministic. By developing
GOLD using a replay approach, visualization could be applied liberally, since events
were guaranteed to occur in the same relative order as in the initial execution.

Two other debuggers, PANORAMA and LIVE, were developed prior to the

development of GOLD, where the purpose of these debuggers was to demonstrate

126

various aspects of the methodology for using a program’s formal speciﬁcation as a
model for visualizing the program’s execution. These debuggers were useful for these
demonstrations, but could not be used for the entire span of our research due to
certain limitations. In particular, PANORAMA was developed according to a post-
mortem approach, where monitor intrusion and data storage constraints limited the
visualization capabilities. In the other case, LIVE was developed for SIMD programs,
where a SIMD program has a single instruction stream. Thus, LIVE could not be
used to debug programs that use multiple instruction streams. These limitations
motivated the development of GOLD.

Future investigations should continue to address scalability issues for program
visualization, particularly considering the cases where programs use a large number
of processes or large data structures. Although this dissertation has presented several
ﬁltering options for handling these cases, the scalability of parallel debuggers must
continue to improve in order to efﬁciently support the needs of parallel programmers.
For example, visualization-based techniques may be explored for clustering processes,
such that the number of separate items plotted along the vertical axis of a space-time
diagram is reduced. In addition, visualization-based techniques for examining large
data structures may be explored, such as improving the ﬁltering techniques for one-
and two-dimensional arrays.

An additional area for future investigation is to strengthen the tool support
for parallel program speciﬁcation. The speciﬁcation of parallel programs is a
growing science, yet tool support must still be improved. In the particular case of
LOTOS, tools that would be beneﬁcial for supporting the expected behavior modeling
techniques discussed in this dissertation include editors, syntax checkers, semantic
checkers, and simulators. (Although there exist tools that perform these functions

for LOTOS, they are mostly at a prototype level.)

BIBLIOGRAPHY

BIBLIOGRAPHY

[1] Keijiro Araki, Zengo Furukawa, and Jingde Cheng. A general framework for
debugging. IEEE Software, pages 14—20, May 1991.

[2] Charles E. McDowell and David P. Helmbold. Debugging concurrent programs.
ACM Computing Surveys, 21(4):593—622, December 1989.

[3] Jeannette M. Wing. A speciﬁer’s introduction to formal methods. IEEE
Computer, pages 8—24, September 1990.

[4] Betty H. C. Cheng. Synthesis of procedural abstractions from formal
speciﬁcations. In Proc. of COMPSAC ’91, pages 149—154, Tokyo, Japan, Sept.
1991.

[5] Betty H. C. Cheng. Applying formal methods in automated software
development. Journal of Computer and Software Engineering, 2(2):]37-164,
1994.

[6] Eileen Kraemer and John T. Stasko. The visualization of parallel systems: An
overview. Journal of Parallel and Distributed Computing, 18:105—117, 1993.

[7] Mark V. LaPolla, Joseph L. Sharnowski, Betty H. C. Cheng, and Kevin
Anderson. Data parallel program visualizations from formal speciﬁcations.

Journal of Parallel and Distributed Computing, 18:252—257, 1993.

[8] Joseph L. Sharnowski and Betty H. C. Cheng. A formal approach to modeling
expected behavior in parallel program visualizations. In PARLE ’94: Parallel
Architectures and Languages Europe, Lecture Notes in Computer Science, vol.
817, pages 202-213, Athens, Greece, July 1994. Springer-Verlag.

[9] Joseph L. Sharnowski and Betty H. C. Cheng. A visualization—based environment
for top-down debugging of parallel programs. Accepted to appear in the
Proceedings of the 9th International Parallel Processing Symposium (IPPS ’95).

127

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

128

Joseph L. Sharnowski and Betty H. C. Cheng. A scalable approach to
visualization-based parallel debugging. Technical Report CPS-94-67, Michigan
State University, December 1994. Submitted for publication.

Joseph L. Sharnowski and Betty H. C. Cheng. GOLD: A Graphical, On-Line
Debugger for parallel programs. Technical Report CPS-95-16, Michigan State
University, April 1995.

Thomas J. LeBlanc and John M. Mellor-Crummey. Debugging parallel programs
with instant replay. IEEE Transactions on Computers, C-36(4):471-—481, April
1987.

Richard H. Carver and Kuo-Chung Tai. Replay and testing for concurrent
programs. IEEE Software, pages 86—74, March 1991.

Robert H. B. Netzer and Barton P. Miller. Optimal tracing and replay for
debugging message-passing parallel programs. In Supercomputing ’92, pages 502—
511, 1992.

Robert H. B. Netzer and Jian Xu. Adaptive message logging for incremental
program replay. IEEE Parallel 69' Distributed Technology, pages 32—39, November
1993.

Larry D. Wittie. Debugging distributed C programs by real time replay.
SIGPLAN Notices, 24(1):57—67, January 1989.

Paul K. Harter, Jr., Dennis M. Heimbigner, and Roger King. Idd: An interactive
distributed debugger. In Proc. of IEEE 1985 Distributed Computing Systems,
pages 498—506, 1985.

Thomas Bemmerl and Peter Braun. Visualization of message passing parallel
programs with the TOPSYS parallel programming environment. Journal of
Parallel and Distributed Computing, 18:118—128, 1993.

Eric Leu and André Schiper. Execution replay: A mechanism for integrating
a visualization tool with a symbolic debugger. In Proceedings of 2nd Joint
International Conference on Vector and Parallel Processing (CONPAR 92 -
VAPPV), Lecture Notes in Computer Science, vol. 634, pages 55—66, Lyon,
France, September 1992. Springer-Verlag.

John May and Francine Berman. Panorama: A portable, extensible parallel
debugger. In Proceedings of ACM/ONR Workshop on Parallel and Distributed
Debugging, pages 96—106, San Diego, California, May 1993.

129

[21] Cherri M. Pancake and Sue Utter. Models for visualization in parallel debuggers.
In Proceedings of 1989 Supercomputing Conference, pages 627—636, 1989.

[22] Jerry Fowler and Willy Zwaenepoel. Causal distributed breakpoints. In
Proceedings of 10th International Conference on Distributed Computing Systems,
pages 134—141, Paris, France, May 1990.

[23] Thomas J. LeBlanc, John M. Mellor-Crummey, and Robert J. Fowler. Analyzing
parallel program executions using multiple views. Journal of Parallel and
Distributed Computing, 9:203—217, 1990.

[24] Michael T. Heath and Jennifer A. Etheridge. Visualizing the performance of
parallel programs. IEEE Software, pages 29—39, September 1991.

[25] Joseph L. Sharnowski and Betty H. C. Cheng. A top-down approach to
visualization-based debugging of parallel programs, November 1994. Presented

at the poster session of Supercomputing ’94.

[26] Joseph L. Sharnowski and Betty H. C. Cheng. A formally-based expected
behavior model for parallel program visualizations, April 1994. Submitted for

publication.

[27] International Organization for Standardization, IS 8807. LOTOS: A formal
description technique based on the temporal ordering of observational behavior,
1989.

[28] Tommaso Bolognesi and Ed Brinksma. Introduction to the ISO speciﬁcation
language LOTOS. Computer Networks and ISDN Systems, 14(1):25-59, 1987.

[29] Joseph L. Sharnowski, Betty H. C. Cheng, and Mark V. LaPolla. Mapping
formal speciﬁcations to parallel program visualizations. In Proceedings of
Minnowbrook Workshop on Software Engineering for Parallel Computing, pages
29—34, Minnowbrook Conference Center, New York, August 10-13, 1992.

[30] Geoffrey C. Fox, Mark A. Johnson, Gregory A. Lyzenga, Steve W. Otto, John K.
Salmon, and David W. Walker. Solving Problems on Concurrent Processors,
volume 1. Prentice Hall, Englewood Cliffs, New Jersey, 1988.

[31] Luigi Logrippo, Mohammed Faci, and Mazen Haj-Hussein. An introduction to
LOTOS: learning by examples. Computer Networks and ISDN Systems, 23:325—
342, 1992.

130

[32] Hartmut Ehrig and Bernd Mahr. Fundamentals of Algebraic Speciﬁcation.
Springer-Verlag, Berlin, 1985.

[33] Robin Milner. A calculus of communicating systems. In Lecture Notes in

Computer Science 92. Springer-Verlag, 1980.

[34] C.A.R. Hoare. Communicating Sequential Processes. Prentice Hall International,
Englewood Cliffs, New Jersey, 1985.

[35] Mazen Haj-Hussein and Luigi Logrippo. Specifying distributed algorithms in

LOTOS. To appear in Revue reseaux et informatique repartie.

[36] Adam Beguelin, Jack Dongarra, Al Geist, Robert Manchek, and Vaidy
Sunderam. A users’ guide to PVM: Parallel Virtual Machine. Technical Report
ORNL/TM-11826, Oak Ridge National Laboratory, July 1991.

[37] Alan George, Michael T. Heath, and Joseph Liu. Parallel Cholesky factorization
on a shared-memory multiprocessor. Linear Algebra and Its Applications, 77:165-
187, 1986.

[38] A. Gaber Mohamed, Geoffrey C. Fox, Gregor von Laszewski, Manish Parashar,
Tomasz Haupt, Kim Mills, Ying-Hua Lu, Neng-Tan Lin, and Nang-kang Yeh.
Applications benchmark set for Fortran-D and High Performance Fortran.
Technical Report SCCS 327, Northeast Parallel Architectures Center, Syracuse

University.

[39] Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Robert Manchek, and
Vaidy Sunderam. PVM 3 user’s guide and reference manual. Technical Report
ORNL/TM-12187, Oak Ridge National Laboratory, May 1994.

[40] Jason Gait. A probe effect in concurrent programs. Software — Practice and
Experience, l6(3):225—233, March 1986.

[41] Barton P. Miller and Jong-Deok Choi. Breakpoints and halting in distributed
programs. In 8th International Conference on Distributed Computing Systems,
pages 316—323, San Jose, California, 1988.

[42] Yoshifumi Manabe and Makoto Imase. Global conditions in debugging
distributed programs. Journal of Parallel and Distributed Computing, 15:62—
69, 1992.

131

[43] Dieter Haban and Wolfgang Weigel. Global events and global breakpoints in
distributed systems. In Proceedings of the 2lst Annual Hawaii International

Conference on System Sciences, Volume II, Software Track, pages 166-175,
January 1988.

[44] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system.
Communications of the ACM, 21(7):558—-565, July 1978.

[45] Sue Utter-Honig and Cherri M. Pancake. Graphical animation of parallel Fortran
programs. In Proceedings of Supercomputing ’91, pages 491—500, Albuquerque,
New Mexico, November 1991.

[46] Cherri M. Pancake. Customizable portrayals of program structure. In Proceedings

of ACM/ONR Workshop on Parallel and Distributed Debugging, pages 64~74,
San Diego, California, May 1993.

[47] Alva L. Couch. Seecube user’s manual. Technical report, Tufts Univ. Dept. of
Computer Science, 1987.

[48] Jack Dongarra, Orlie Brewer, James Arthur Kohl, and Samuel Fineberg. A tool
to aid in the design, implementation, and understanding of matrix algorithms for
parallel processors. Journal of Parallel and Distributed Computing, 9:185—202,
1990.

[49] Daniel A. Reed, Ruth A. Aydt, Tara M. Madhyastha, Roger J. Noe, Keith A.
Shields, and Bradley W. Schwartz. The Pablo performance analysis environment.

Technical report, Department of Computer Science, University of Illinois, 1992.

[50] Allen D. Malony, David H. Hammerslag, and David Jablonowski. Traceview: A
trace visualization tool. IEEE Software, pages 19—28, September 1991.

[51] Mark V. LaPolla. Towards a theory of abstractions and visualizations for

debugging massively parallel programs. In Hawaii International Conference on
System Sciences-25, 1992.

[52] John V. Guttag, James J. Horning, and Jeannette M. Wing. Larch in ﬁve easy
pieces. Technical Report 5, Digital Equipment Corporation Systems Research
Center, July 1985.

[53] Jeannette M. Wing. Writing Larch interface language speciﬁcations. ACM
Transactions on Programming Languages and Systems, 9(1):1—24, January 1987.

[54]

[55]

[56]

[57]

[58]

[59]

[60]

[61]

[62]

[63]

[64]

132

John V. Guttag, Ellis Horowitz, and David R. Musser. Abstract data types and
software validation. Communications of the ACM, 21(12):1048—1064, December
1978.

John V. Guttag, James J. Horning, and Andrés Modet. Report on the
Larch shared language: Version 2.3. Technical Report 58, Digital Equipment
Corporation Systems Research Center, April 1990.

M.L. Brady, K.K. Jung, M.V. LaPolla, H.T. Nguyen, R. aghavan, and
R. Subramonian. The assignment problem on parallel architectures. In The
Ist DIMACS Intl. Algorithm Implementation Challenge: Problm Deﬁnitions and
Speciﬁcations, 1992.

Thinking Machines Corp., Cambridge, Massachusetts. Paris Reference Manual,
version 6.0 edition, Feb. 1991.

Mark Vincent LaPolla, Joseph L. Sharnowski, Betty H. C. Cheng, and Kevin
Anderson. Using formal speciﬁcations to generate visualizations of data
parallelism. Technical Report CPS-92-05, Michigan State University, July 1992.

Robert Cooper and Keith Marzullo. Consistent detection of global predicates. In
Proc. of the ACM/ONR Workshop on Parallel and Distributed Debugging, pages
167—174, 1991.

R. Curtis and L. Wittie. BugNet: A debugging system for parallel programming
environments. In Proc. of 1982 Distributed Computing Systems, pages 394—399,
1982.

Steve Sistare, Don Allen, Rich Bowker, Karen Jourdenais, Josh Simons, and
Rich Title. A scalable debugger for massively parallel message-passing programs.
IEEE Parallel 8 Distributed Technology, pages 50—56, Summer 1994.

Richard J. LeBlanc and Arnold D. Robbins. Event-driven monitoring of
distributed programs. In Proc. 5th International Conference on Distributed
Computing Systems, pages 515-522, May 1985.

Alfred A. Hough and Janice E. Cuny. Belvedere: Prototype of a pattern-oriented
debugger for highly parallel computation. In 1987 International Conference on
Parallel Processing, pages 735—738, 1987.

Alfred A. Hough and Janice E. Cuny. Initial experiences with a pattern-oriented
parallel debugger. In Proceedings of the ACM SIGPLAN/SIGOPS Workshop on
Parallel and Distr. Debugging, pages 195—205, January 1989.

[65]

[66]

[67]

[68]

[69]

[70]

[71]

[72]

[73]

133

Peter Bates. Debugging heterogeneous distributed systems using event-based
models of behavior. In Proceedings of the ACM SIGPLAN/SIGOPS Workshop
on Parallel and Distributed Debugging, pages 11—22, January 1989.

Janice Cuny, George Forman, Alfred Hough, Joydip Kundu, Calvin Lin,
Lawrence Snyder, and David Stemple. The Ariadne debugger: Scalable
application of event-based abstraction. In Proceedings of ACM/ONR Workshop
on Parallel and Distributed Debugging, pages 85—95, San Diego, California, May
1993.

Vincent A. Guarna, Jr., Dennis Gannon, David Jablonowski, Allen D. Malony,

and Yogesh Gaur. Faust: An integrated environment for parallel programming.

IEEE Software, pages 20-27, July 1989.

J.J. Dongarra and DC. Sorensen. SCHEDULE: Tools for developing and
analyzing parallel fortran programs. In L. Jamieson, D. Gannon, and

R. Douglass, editors, The Characteristics of Parallel Fortran Programs, pages
363—394. The MIT Press, Cambridge, MA, 1987.

M. Zimmermann, F. Perrenoud, and A. Schiper. Understanding concurrent
programming through program animation. In Proc. of 19th SIGCSE Technical
Symposium on Computer Science Education, pages 27—31, 1988.

David Socha, Mary L. Bailey, and David Notkin. Voyeur: Graphical views of
parallel programs. SIGPLAN Notices, 24(1):206—215, January 1989.

John T. Stasko and Eileen Kraemer. A methodology for building application-
speciﬁc visualizations of parallel programs. Journal of Parallel and Distributed
Computing, 18:258-264, 1993.

Gruia—Catalin Roman and Kenneth C. Cox. A declarative approach to visualizing
concurrent computations. IEEE Computer, pages 25—36, October 1989.

Gruia—Catalin Roman and Kenneth C. Cox. Declarative visualization in the
shared dataspace paradigm. In 1989 IEEE 11th Intl. Conf. on Software
Engineering, pages 34—43, 1989.

APPENDIX

134
APPENDIX: LOTOS Speciﬁcation of the Parallel
Computing Environment

In this appendix, we provide the full LOTOS speciﬁcation for the parallel computing
environment that was discussed in Section 2.1.2. The process deﬁnitions for
Host_ Node and Worker_ Node are intentionally left blank, since their behavior depends

on the particular program that is being speciﬁed.

speciﬁcation Message-Passing_Program(TOTAL_WORKERS:Nat): noexit
(* List of Program-Speciﬁc Variables should be added to the parameter list *)

library NaturalNumber endlib

behavior

(
Host_Node[send,recv] (TOTAL_WORK ERS)

lll
All_Worker_Nodes[send,recv](TOTAL_WORKERS,TOTAL_WORKERS)

All_Host_to_Worker_,Channels[send,recv](TOTAL_WORKERS)
| | I
All_Worker_to_Worker_Channels[send,recv](TOTAL_WORKERS,
TOTAL_WORKERS)
)

where

process Host_Node[send,recv](NUM_WRKRS:Nat): exit :=

(
(* Fill in deﬁnition for Host Node *)

)
endproc (* Host_Node *)

process All_Worker_Nodes[send,recv](NUM-WRKRS,N0deCtr:N at): noexit :=
[NodeCtr > 0] -—>

(
Worker_N ode [send,recv] (N UM_WRKRS,N odeCtr)

Ill
All_Worker_Nodes[send,recv](NUM_WRKRS,NodeCtr - 1)

135

)

where
process Worker-Node[send,recv](NUM_WRKRS,MY_NUM:Nat): exit :=

(

)
endproc (* Worker_Node *)

endproc (* AIL Worker_Nodes *)

(* Fill in deﬁnition for Worker Node *)

process All_Host_to_Worker_Channels[send,recv](NodeCterat): noexit :=
[NodeCtr > 0] —;

(
Host_to-Worker_Channel [send,recv] (N odeC tr,0)

| | |
Host_to_Worker_Channel[send,recv] (0,NodeCtr)

Ill
All_Host_to_Worker_Channels[send,recv](NodeCtr - 1)

)
where
process Host_to-Worker_Channel[send,recv](Sender,Rcvr:Nat): noexit :=
(* Replace “Message” with the name of an appropriate sort *)
send lSender chvr ?Msg:Message;
recv lSender chvr lMsg;
Host_to_Worker_Channel[send,recv](Sender,Rcvr)
endproc (* Host_to_ Worker_ Channel *)
endproc (* All_Host_to_ Worker_ Channels *)

process All_Worker_to_Worker_Channels[send,recv](NodeCtr,
NUM-WRKRS:Nat): noexit :=
[NodeCtr > 0] —+

(
Single_Worker_Channels[send,recv] (NodeCtr,NUM_WRKRS)

Ill
All_Worker_to_Worker_Channels[send,recv] (NodeCtr - 1,NUM_WRKRS)

)
where
process Single_Worker_Channels[send,recv](S_Node,R_Ctr:N at): noexit :=
(* Do not create a self-loop channel *)

[R_Ctr > 0 A S_Node 75 R-Ctr] —»
(
Worker_to_Worker_Channel[send,recv](S_Node,R-Ctr)

Ill
Single-Worker_Channels[send,recv](S_Node,R_Ctr - 1)

)
[l

136

[R_Ctr > 0 /\ S_Node = R_Ctr] ——>
(

)

where
process Worker_to_Worker_Channel[send,recv](Sender,
Rcvr:Nat): noexit :2
(* Replace “Message” with the name of an appropriate sort *)
send lSender !Rcvr ?Msg:Message;
recv lSender !Rcvr lMsg;
Worker_to-Worker_Channel [send,recv] (Sender, Rcvr)
endproc (* Worker_to_ Worker_,Channel *)
endproc (* Single_ Worker_ Channels *)
endproc (* All_ Worker_to_ Worker_ Channels *)

Single_Worker_Channels[send,recv](S_Node,R_Ctr - 1)

endspec (* Message-Passing-Program *)

HICHIGQN STATE UNIV. LIBRARIES
[HIHHIINHII][1]][I[[NIWIIIHH[Willi]
31293013879899