~ ,
:‘F-a -———

0-; T . v, 1 -- ‘ w . . D .- .H -<v—-v-v- , ..v . ... . . ,_,_ _

. ,- -. 3; ,

, . , . "3.3.;

V ' - ' . 1 "‘5?!

. . _ 77 x_~'<

«sud.»

Muutmocassme usme A ;
PRGGRAM smucruaas ‘ ‘ .

Dzssertatmn for the Degree of Ph D
MlDHTGAU STATE UUWERRITY
EDWARD LOUIS LAMEE

‘ 1974 ‘ -

 

v—v

......

 

 

TTTTTLTTTTTTTTTTTTTTTTT I J
.3 UAivetsity ,

 

 

ABSTRACT
MULTIPROCESSING USING PROGRAM STRUCTURES
BY
Edward Louis Lamie

A model which is capable of representing computations on a multi-
processing system is deveIOped. The model takes the appearance of a
directed graph where each node contains both a data vector and a mapping
vector. The capy of a node concept is introduced and is used extensively
throughout the dissertation. various preperties of the model are investi-
gated including equivalence classes of input data and maximally parallel
form. Solutions to the problems of repeatability, interference, and
deadlock are presented. Properties of models which have restrictions

placed on available resources are also studied.

MULTIPROCESSING USING

PROGRAM STRUCTURES

By
Edward Louis Lamie

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science

197#

ACKNOWLEDGEMENTS

I am deeply grateful to Professor Lew Greenberg for helping me to
develOp many of the ideas in this dissertation and also for serving as
my committee chairman. I am also grateful to Professors Harry Hedges,
Marshall Hestenes, and Carl Page for serving on my doctoral committee.

I am also indebted to my wife, Mary Ellen, for her patience and

understanding throughout my years of study.

11

TABLE OF CONTENTS

LIST OF FLGURES
LIST OF SYMBOLS
CHAPTER I INTRODUCTION
1.1 Background
1 .2 Previous Work
1.3 Statement of Problem
CHAPTER II FORMAL DEFINITION
2.1 Model Specification
2.2 Examples
2.3 Additional Features
CHAPTER III COMPUTABILITY
5.1 Implementation of a Flowchart Schema
3.2 Equivalence Classes of Data
5.3 An Analysis Formalism
CHAPTER IV CONDITIONS FOR REPEATABILITY
CHAPTER V IMPLICATIONS
5.1 Elimination of Classical Problems
5.2 Finite Program Structures
5.3 Capy Free Program Structures
5.# Comparison of FPS and CFPS
CHAPTER VI SUMMARY AND CONCLUSIONS
6.1 Summary

6.2 Conclusions

iii

10

36

53
71

96

6 .3 Future Work
BIBLIOGRAPHY

TABLE or CONTENTS (continued)

iv

1.1
2.1

2.2

2.7

2.8

2.9
2.11

3.1a

3.1b
3.2

3.3

3.#

3.5

LIST OF FIGURES

Organization of the ILLIAC IV
A pictorial representation of node N1

Trigonometric problem using two dimensional
coordinates

Conventional flowchart for Newton-Raphson algorithm
Program Structure for Newton-Raphson algorithm
A "more parallel" representation of Figure 2.#

Implementation of a Finite State Machine using
a conventional flowchart

Implementation of Finite State Machine using
Program Structure

Definition of mappings used in the Program
Structure of Figure 2.7

Use of a Program Structure to perform table lookup
Directed Graph containing a cycle
Flowchart Schema containing a loop

Segment of a Flowchart Schema which can be processed
in parallel

Implementation of parallelism in (a)
Example of a node that is also a junction

Use of a transition statement to describe the
transmittal of information to a junction

Use of a transition statement to describe the action
taken by a node containing a multiple mapping

A transition diagram made from transition statements
of Program Structure in Figure 2.#

V

15

22

26
28

29

32
3H
3h
39

39
41

#9

52

A.1
h.2

4.3
w.
m5
5.1

LIsr or FIGURES (continued)

Example of a non-Repeatable Program Structure

Example of a cycle-free, junction-free Program
Structure which is not repeatable

Program Structure which contains a critical race
Example of "poor" parallelism
Example of Maximally Parallel Form

Functional Diagram of a Pipeline Processor

v1

55
58

6A
66
67
88

02
erg

U
y.

:1
Ho

0\

#(B)

ewi )

3..

2° u; Lu :4 c: e:

w
0

|n<®

OH

2;

2

LIST OF SYMBOLS

Lease

node N1

the data vector of Ni

the mapping vector of N1

is an element of some set or vector
undefined element

number of elements in B

predicate to determine whether D; is defined
the jth copy of Ni

set of known nodes

set of unknown nodes

set of active nodes

there exist(s)

such that

an Operation on the elements of B and C
performance of a mapping

for all

is a subset of

set of input nodes

set of output nodes

Nk is a junction

index set of all nodes

classification set for all nodes

v11

10
1o
10
1o
11
11
13
16
17
17
17
1?
17
18
19

31
32

32

T

A

vi
FPS
Size(Ni)
Mam
t 3' 1‘1
D

CFPS

LIST OF SYMBOLS (continued)

set of test values

logical and

value vector of Ni

Finite Program Structures

amount of memory required by Ni

total amount of memory available

sets used to describe A and K

set of deadlock free execution sequences

Copy Free Program Structures

V111

B‘oB’oSggE

78

81

82
85

CHAPTER I

INTRODUCTION

1.1 Background

Since the first electronic digital computer was introduced in the
l9#0's, there have been numerous increases in device speed and reliability.
Most of these advances can be directly attributed to hardware technology
such as miniaturization. However, technology appears to be approaching
a physical limitation as it attempts to greatly increase device speed.
Consider the fact that electricity can travel approximately one foot in
one nanosecond and many of today's computers Operate with speeds in the
order of tens of nanoseconds. Future advances in technology such as
large scale integration will undoubtedly increase device speed. However,
major improvements in device speed and reliability will probably occur as a
result of increased parallelism.

Parallel Operation is not a new concept. It has long been present in
many hardware devices. For example, one technique to increase the speed
of adders and multipliers is to perform many of the bit Operations in
parallel.f Another way to improve system performance is to have the
input/Output devices and their channels Operate in parallel with the central
processing unit. Still another method of increasing computational capa-
bility is to develOp a system which has two or more processors which can
Operate in parallel. The ILLIAC IV is an example of such a system (see
Figure 1.1). It consists of four quadrants wherein each quadrant contains

sixty-four processing (or arithmetic) elements and one control unit. The

 

* See "Design of parallel binary adders", Hellerman (2R)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

64 PE's 64 PE's
1 control 1 control
unit unit
6% PE's 64 PE's
1 control 1 control
unit unit
Parallel

access disk

 

 

 

 

Figure 1.1 Organization of the ILLIAC IV

3
four quadrants share a uniprocessor (a Burroughs 6500) and a large disc

memory. Each of the four quadrants can work independently or they can
be joined together in a single array to process large problems. A de-
tailed discussion of the ILLIAC IV can be found in (4, 35).

Besides speed and reliability, there is another important reason
to investigate parallel processing (the term multiprocessing is used
synonomously). That reason is economics. As computer systems become
more complex, effective utilization of the system's resources is necessary
in order for the system to be cost effective. Time sharing and real time
processing present problems where all of these considerations must be 2
taken into account.

That multiprocessing is an important topic is evidenced by the fact
that discussion of this problem appears in most recent texts on Operating
systems (6,14,#5). Even though the justification for using multiprocessing
may be apparent, the techniques fer establishing such systems are not well
develOped. Organizing computations so that a number of processors can be
simultaneously working on them is a complex task. Organizing the compu-
tations so that all n processors of an n processor system are effectively
utilized is an even more complex task.

To illustrate one type of problem which can arise, consider the
following data transformation D1 and D2 where x, y, and z are storage

locations and f, g, and h are Operators.

D1 f(x)—-x
g(x)--y
D2 h(y)-4>z

The data transtrmations D1 and D2 must be done sequentially, but if the

Operations in D1 are performed concurrently, then the result of D2 will

1,
vary depending on the speed with which the Operations in D1 are performed.
For example, if Operation f is performed quickly, then it is possible to
Change the value of x before Operation g is begun. Another problem which
can arise is that f will attempt to change the value of x at the same

instant that g is attempting to access the value of x.

1.2 Previous Work

Petri nets were develOped by C. A. Petri (45) and are a means of
representing concurrent Operations. A Petri net is a labelled directed
graph that has only two node types call "places" and "transitions".
Performing an Operation is analogous to "firing" a transition in Petri
terminology. If conditions are met as specified by the places, the
transitions can be fired asynchronously. Petri nets have been used to
study conditions leading to deadlock (a situation in which it becomes
impossible to fire any transitions). Special cases of’Petri nets have
been analyzed and solutions to the deadlock problem have been develOped.
However, for many problems including the mutual exclusion relationship
between two transitions, only the general Petri net can be used. Many
of the concepts established by Petri can be found in other models including
Luconi (3A), Karp and Miller (29), Rodriguez (#1), and Slutz (#3). A
major drawback of Petri nets is that they are not sufficiently general
to handle parallel computations. A good example of this is the fact that
the "not" Operation (a transition fires if and only if a place if empty)
cannot be implemented.

Karp and Miller (29) have introduced a mathematical model for
parallel computation which is called "Parallel program schemata". The

model is an asynchronous system which consists of a set of Operations

5

which Operate on a set of memory locations. Each schemata can be defined
by two directed graphs, a data flow graph and a control graph. The data
flow graph specifies the domain and range (in terms of memory locations)
of each Operation. Thus there are two types of nodes in this graph, one
type to represent Operations and the other to represent memory locations.
Similarly, there are two types of nodes in a control graph, one to represent
Operations and the other to represent control states. The control graph is
used to specify the order in which the Operations will be initiated. More
specifically, the control states determine when Operations can be initiated.
The control graph is similar to a Petri net. The model developed by Slutz
(#3) is a generalization of the Karp and Miller model. Among other results
obtained by Karp and Miller, decision procedures were established fOr such
properties as equivalence, determinacy, and boundedness.

Dijkstra (19) has prOposed a method of communication between two
or more processors that share memory. All of the processors have access
to special memory locations called semaphores. The semaphores contain
information that is used to block a processor from entering its "critical
state" if any other processor is in its critical state. Thus, the processors
can communicate with each other, but they are prevented from interfering
with each other, i.e., having two processors in their critical states at
the same time. Dijkstra also considers the problem of "deadly embrace",
i.e., two processors need to go into their critical states but each is
waiting for the other to go first. Neither processor ever gets to its
critical state because each is saying, in effect: After you, after you,
after you,.......... Dijkstra is not concerned with the writing of parallel

programs, but rather specification of a system which Operates in parallel.

6

Rodriguez (#1) has introduced a parallel program model which he calls
"Program graphs". In this model, the computation elements are repre-
sented by nodes of a directed graph. Storage and transmission of
information are represented by the links between the nodes. The activation
of a node depends entirely on information residing in links that point

to that node. At any point in time, there may be many nodes that are
active. LOOping is achieved with the help of Special node types and
connection rules are specified so that determinism is assured. Program
graphs are shown to be deterministic in general, i.e., for any compu-
tation the final state is unique if started from the same initial state.

Luconi (3#) has prOposed a model fbr'representing communicating
processes which he calls "Asynchronous Computational Structures". These
structures have the ability of sharing memory and allowing the processes
to proceed concurrently. Depicted graphically, Operators reside in named
nodes while information resides in links which are also given a node-like
representation. There are no explicit timing constraints placed on the
Operators, so they can act asynchronously with respect to each other.
Complete functionality (a form of determinism) is proven based on several
conditions. The work of Rodriguez is given as an example of a computational
structure.

Slutz (#3) has develOped a model for parallel algorithms which he
calls "Flow Graph Schemata". The model is depicted as two directed
graphs, one for data flow and the other for control. In the data flow
graph, memory cells are represented by circular nodes. As in other
models, the functions may act asynchronously with respect to each other.
It is also possible for two or more applications of the same function

to be in progress concurrently. This behavior, called pipelining, uses

7

a FIFO queue to store invocations of any function. Conditions for
determinism are established and equivalence of certain classes of Flow
Graph Schemata are investigated.

Some attempts to model parallel computations have taken the approach
of adding parallel instructions to an existing programming language such
as ALGOL or PL/I. Anderson (2) has prOposed use of the M3931
statements as one method of explicitly declaring that two or more proce-
dures may be executed in parallel. Similar approaches have been taken
by Dijkstra (l9), Conway (l6), and Dennis and van Horn (18). This type
of approach is satisfactory as long as the parallel procedures do not
Operate on common variables. If the procedures do Operate on common
variables, then precautions must be taken to avoid certain timing pro-
blems such as the possibility of simultaneously changing the value of
a common variable. Anderson (2) has stated that his terminate, 222222:
and release statements solve this problem for his approach.

Although previous papers have prOposed various solutions to problems
posed by multiprocessing, none of them are widely used. This is due
principally to a lack of generality or the cumbersomeness of the models.
It is difficult to imagine any large scale problem being programmed using
any of the previously discussed models. The model described in chapter II
is intended to be useful in the creation of programs that are easily
written and can be efficiently executed on multiprocessor systems. As

such it is a programming scheme rather than a model for machine design.

1.3 Statement of Problem
This research is centered around the develOpment of a model fOr

parallel computation which is called "Program Structures". The model

8

is more general, more powerful, and conceptually simpler than the pre-

vious models discussed. Some problems that plague other models are not
relevant here. For example, it is shown that the problem of two processors
being in their critical states simultaneously is not really a problem at
all. Consequently, other problems such as the deadly embrace also disappear
in this model.

This dissertation is directed toward the creation of algorithms or
programs which can be executed on a multiprocessing system. The model
that achieves this situation takes on the appearance of a directed graph
where the nodes contain both information and transformations or mappings
of that imformation. The directed arcs are used to indicate the trans-
mission of the transformed information to other nodes. There is no
explicit control mechanism established to direct the execution of the
program. Control is considered to be taken care of by the prOblem
description itself and thus is an integral part of the parallel program.

Shared memory is used in this model. However, many traditional
problems are avoided by the use of a new feature call the "cOpy" of a
node. In previous models, there could only be one access to a given
piece of information in memory allowed at any one time. Using the
concept of OOpied nodes, there is no limit to the number of simultaneous
accesses to a given piece of information in memory. Besides eliminating
many previous problems, use of this concept has the effect of speeding
up the execution of a program.

A formal description of the model is presented, along with several
examples. Capabilities of the model are discussed as well as a proof
that the model is capable of implementing any flowchart schema. It is

shown that for some Program Structures the set of input data can be

9
placed in equivalence classes. An analysis formalism is develOped to

aid in the discussion of various prOperties of Program Structures.
Conditions fer a form of determinism (called repeatability) are estab-
lished and maximal parallelism is investigated. Solutions to problems

of interference and deadlock are presented. Finally, prOperties of
Finite Program Strutures and COpy Free Program Structures are investigated

and conditions for equivalence are presented.

CHAPTER II

FORMAL DEFINITION

2.1 Model Specification

The purpose of this section is to describe Program Structures as
well as some of their components and attributes. In order to do this,
the simplest element, the node, is defined first. After that, the
prOperties of the node are defined and then the definition of Program
Structure is presented.

As has been stated before, this model is depicted as a directed
graph wherein the nodes contain both data values and mappings between
nodes. The directed arcs are used to indicate which nodes are to have

information mapped into them.

Definition 2.1

A 2292 Ni is a set defined as follows:
Ni = {D1, Mi} where D1 is an n-tuple which contains intonation
associated with Ni, where ne'O, and M1 is an m-tuple whose components
are mappings of D1 into various nodes, where ma 0. If f 6 Mi, then f
is a rule which uses some or all of the data in D1'" to determine some
or all of the data values of some Dj. D1 is referred to as the d_a_t_a_

vector and M1 is referred to as the mapping vector.’

For example, Figure 2.1 shows a pictorial representation of a node.

In this example, node N1 might be used to represent a point in space that

 

'1' The set notation f 6 M1 is also used to denote some element in the

m-tuple M1.
10

11
passes information on to two other nodes. The mappings f and g would
be previously defined.
In order to facilitate the discussion of prOperties of nodes, several

notations and definitions are now introduced.

Definition 2.2
An element x e D1 is said to be defined if it has a value assigned
to it. Otherwise, that element is said to be undefined. For convenience,

undefined elements will be denoted by a question mark "?".

Definition 2.3

Given a set or vector B, let #(B) be defined to be n, where n is the

number of elements in B.

For example, given D1 6. N1, then #(D1) = j where D1 is a j-tuple.
Frequently there may be many elements in D1 or M1 and it is often
necessary to refer to a Specific element. The next definition provides

the notation to accomplish this.

Definition 2.#
. i .th i . i . .
Given D , the 3 element of D is denoted by Dj' Similarly, the

kth element of M1 is denoted by M; or ft.

It has been established that any element in D1 is either defined
or is undefined. This property is very important when the type of a node
is being determined. The next definition provides notation that can be

used in classifying a node.

12

 

 

 

 

 

 

 

1)1 .
M].
X
i (ml—+133
N y
g:D1-—*-Dk
Z

 

 

 

 

 

 

 

 

Figure 2.1 A pictorial representation of node N1

13
Definition 2.5,

Given D32 6 D1, let @(DE) be defined as follows:
i 1 if D3 is defined
@(D,) a
J

0 otherwise

Now it is possible to determine the type classification of any
node. This classification is not only fundamental, but essential to

the asynchronous Operation of Program Strutures.

Definition 2.6 i
i i #(D ) i i i
If D é N and E @(Dj) = #(D ), then node N is classified
i=1
as a known node. Furthermore, if a node is not a known node, then it is

classified as an unknown node.

This definition simply states that if every element in Di is
defined, then node N1 is a known node.

During the execution of a Program Structure, nodes will have values
mapped into their data vectors. Whenever a node has values specified
for every element in its data vector, then that node becomes a known
node. Until that happens, the node is classified as unknown. The
reason for making this distinction is that when the execution of a
Program Struture is defined, only known nodes can be processed.

The mapping vector of a node is used to transform or manipulate
information in the data vector and then place it in other nodes. The
mapping vector can be the empty set or it can contain one or more

mappings. In any case, the contents of the mapping vector remains

IA

unchanged throughout the execution of a Program Structure.

Definition egg
The domain of a node N1 will be denoted by Domain (N1). The

gaggg of a node N1 will be denoted by Range (N1). For any node N1,

Domain (N1) D1 where D1 e N1
Range (Nl) = [Dal DJ 6 NJ where N;] is an unknown node and

£33 D1 -»D3}

The primary consequence of this definition is that if f is a mapping
from Ni into Nj, then Nj must be an unknown node. This further eliminates
the possibility of mapping a known node into another known node and changing
the result of a previous computation.

As an example, consider the trigonometric problem in Figure 2.2 where

(x1, y1, d1) and (x2, y2, d2) are known and (x , ) is unknown. The

’3

quantities <11 and d2 are direction vectors and (xi, yi) is a two dimen-

sional coordinate.

The problem can be solved by using two known nodes N1 and N2, and
two unknown nodes N3 and N“. The mappings f, g, and h are used and are

defined as follows:

f(D1) = (D1, D2, Dg, ?, ?, ?) where ? indicates an
undefined value
g(D2) = (?, 2, ?, D2 D2 D2)

1’ 2’ 3

h(D5)

9
(x3, y3, .) where h calculates x3 and y3

From the preceding exmle it is apparent that for any M1 6 NJ",

Ml consists of zero or more mappings from Dlé M1 into the range of N1.

15

 

 

Figure 2.2 Trigonometric problem using two dimensional coordinates

16
In most models of parallel processing, much attention is given to
the control of execution of the model. In some cases, such as the models
prOposed by Rodriguez and Slutz, control is maintained by the use of
control nodes or even a control graph. In the model being proposed here,
control is not a major concern because the data being processed will
serve implicitly as the control. The next concept which is defined will

help achieve this effect.

Definition 2.8
A cOpy of a node N1 is another node which has the identical mapping
vector M1, the same number of elements in the data vector D1, but possibly

the contents of one or more of the data elements differ.

The notion of a cOpy of a node will be used extensively in the
execution of Program Structures. Since it is possible to have a number
of cOpies of a node, some additional notation is necessary to eliminate

any possible confusion.

Definition 2.3
.th i . i,j .

The 3 cOpy of node N will be denoted by N . In particular,
the original node N1 will be denoted by NJ"0 or simply as N1. The same
notation will be used for the mapping and data vectors as well.

Having established these preliminary concepts, it is now possible

to present the definition for the Program Structure.

Definition 2.10

A Program Structure is a triple (K, U, A) where

17

FR
ll

fulﬁl I N"3 is a known node} and K is called the

set of known nodes

U = {N1’0 ' NJ"0 is an unknown node and U is called the
set of unknown nodes, the set U never decreases in size

and the data elements in each node of U are never changed.

A = iTNl’J ' N1’3 is a cOpy of some node in U, j > 0, N1’3
is unknown, and 3 x a D1'3 3 @(x) = 1} and A is called

the set of active nodes

The most striking difference between Program Structures and earlier
models of parallel computation is the simplicity of the former. This
simplicity does have a price, i.e., the amount of storage required by
Program Structures would probably be greater than that of most other
models if they were implemented on a multiprocessing system. On the
other hand, the amount of internal overhead would most likely be dras-
tically reduced and many problems encountered in other models are
easily overcome by Program Strutures as demonstrated in Chapter V.

The set of known nodes is analogous to a set of "current states"
using Finite State Machine terminology. Since a Finite State Machine is
a sequential machine, there is at most one current state at any instant
in time. Since a Program Structure is intended to resemble a parallel
machine, there can be many current states at any point in time. Similarly,
the set of active nodes is analogous to a set of "next states". The set
of unknown nodes does not have a direct correlation to Finite State

Machines, but it can be regarded as a "master file".

18
When a node maps information to another node it might be possible
that the contents of a previously defined element of a data vector
would be altered. To avoid this situation, mappings will be performed
in a special manner. In order to define how mappings are carried out,

the following Operation on sets is required.

Definition 2.11
Given n-tuples B and C where #(B) = #(C), B & C is the Operation

on the components of B and C defined as follows:
If 3 bié B, cié c 3@(bi)=@(ci)=lthenB&C=B

else B & C = D where #(D) = #(B) and

 

hi if @(ci) = o
D: Ed. I d.= c. if@(b.)=O
1 1 l 1
for i = l,2,....#(B) L? if @(bi) = @(ci) = o

Forexample,ifB= {1,2,1}, '23 andC= {?,?,3,?}
thenB&C={1,2,3,?} .

Similarly, ifB= {1, 2, 2,2} andC= {an}, 2, 1?
thenB&C= {1,2,?,?}

Definition 2.12

k,m

The mapping szl’J-v D is performed as follows:

0 .
1. A copy of Nk’o is placed in A only if another copy of Nk’ is

19
not in A or if the information portion of those cOpies in A have
values defined in the positions which would be defined by f

2. f(Di’j) = (d1, d ...... 9 dn) is evaluated where n =.#(Dk’m)

29
and N1’3 is a known node
3. Dk‘r «- Dk’r 8. f(D1'3) for all Nk’ré A where r > o

4. If all the elements of data vector Dk’r are defined, then node

Nk’r becomes a known node and is removed from A and added to K

The definition of mapping above makes it impossible for any node to
alter the results of a previously processed node. It is assumed that
creation of copies is an indivisible Operation. If two nodes simultaneously
map to a common node and a cOpy does not exist, then only one cOpy would

be created and placed in A.

Definition 2.13

The performance of the mapping as defined above will be deonted by

Dk’m 6 Hum) where rznl'J —* Dk’m.

The execution of a Program Structure is simply a matter of visiting
the nodes in the set K. When there are no more nodes left to visit,
execution terminates. When visiting a node, all of the mappings in the
mapping vector are performed and information is passed to the specified
nodes. If there are two or more nodes in the set K, then these nodes
may be visited in any order. When visiting a node, it is first removed

from the set K; after being visited, it is destroyed. The following

definitions formalize these notations.

Definition 2.14
A node is said to be processed when that node is visited and all of

its mappings are performed.

The next definition specifies the Operation of a Program Structure.
Before proceeding with it, note that whenever an unknown node has all
of the elements of its data vector defined, it becomes a known node and

is added to the set K.

Definition 2.15
A Program Structure is said to be executed when the nodes in the
set K are processed. Execution is terminated when the set K becomes

empty and there is no node currently being processed.

It is not necessary to prescribe a procedure for executing a
Program Structure since only known nodes can be evaluated. However,
the following prOposition presents a possible algorithm for executing

a Program Structure.

Proposition 2.1
Any Program Structure can be executed using the following algorithm:

1. If K = O and there is no node currently being processed, then

go to step 6
2. Delete a node Nl’k from the set K
3. V f e Ml’k (w.l.o.g. assume f:D1'k--- D5"m where DJ’m 5

Range (Nl’k)) perform Dj,me f(D1’k)
A. If any nodes in A become known, delete them from A and add them

to K

21
5. Go to step 1

6. Terminate

Proof: Obvious from definition 2.15.

2.2 Examples

Several examples of Program Structures are now given in order to
make clear some of the preceding definitions. A conventional flowchart
is shown and compared to the equivalent Program Structure in each example.

The first example is the implementation of the Newton-Raphson root
finding algorithm, i.e., given an initial guess x, "improve" x until a
Specified degree of accuracy a is met:

f(xi) . . . .
= x. - where the initial guess 18 x0
f' xi)

The values for the initial guess x, the accuracy a, and a control
value n are input. The conventional flowchart appears in Figure 2.3
while the corresponding Program Structure appears in Figure 2.#.

In figure 2.h the root of the equation will appear in node N2 if a
root is found within n iterations. OtherWise, the most recent value
computed for x will appear in node N“. At any rate, when execution is
terminated either node N2 or node Nl+ (but not both) will appear in set A
and that result could be used by still another Program Structure.
Similarly, in Figure 2.3 the conventional flowchart could be a subprogram
which is used to communicate with a main program or other subprograms.

As a numerical example, suppose that a root is to be fOund for the
equation x2 - 5x + 3 = O. For illustrative purposes, it will be assumed

that each node will require one time unit and only the contents of the

22

 

 

 

 

INPUT

x, a, n

 

 

 

 

 

 

 

 

 

 

 

 

FPX*— f'(x)
x-v-x - FX/FPX

n-n—n-1

 

 

 

 

 
   

 

 

yes
OUTPUT
'error'

_I

j
cm

Figure 2.3 Conventional flowchart for Newton-Raphson algorithm

 

 

23

 

N2 N3
NA
The mappings are defined as follows:
1 1, (x-dx,a) if
f1(x, a, n) = (x, a) f1 (x,dx,a,n) = ldxl >- a

(x, ?) otherwise
f;(x, a, n) = (n)

f2(x, a) = (x, %§%%7, a, ?) f:(x, dx, a, n) = (n)

(7, ?, ?, n-1) if n > O

(?, ?, ?, ?) otherwise

13(n) 2

Figure 2.# Program Structure for Newton-Raphson algorithm

24
data vector of a node will be displayed. The contents of set U will not

be shown since it remains constant.

TIME Set K Set A

0 N1,1 5 (x)
.001 (a)

10 (n)

1 N2,1 5 (x)
.001 (a)

113'1 {10 (n)

2 5 (x)
N151 .6 (dx)

.001 (a)

9 (n)

3 .2 4.4 (x)
.001 (a)

. 113'2 { 9 (n)

4 N4,2 4.4 (x)
.095 (dx)

.001 (a)

8 (n)

5 N20} 4-305 (X)
.001 (a)

N313 {8 (n)

6 4.305 (x)
Nth} .002 (dx)

.001 (a)

7 (n)

,4 4.303 (x)
N2 if? (a)

In this example, the root is found after three iterations and the
result is placed in Nz’h. If there were no real roots to the equation
or if the root was not found after n iterations, the last computed value

would be placed in a COpy of Na. COpies of nodes N2 and N3 are in the set K

25
at the same time on several occasions; when this happens it does not
matter which node is processed first. This is characteristic of the
asynchronous behavior of a Program Structure.

The Program Structure described in this example could be made "more
parallel"‘f by breaking node N2 into two nodes thus potentially allowing
more nodes to be processed in parallel. The maps in each of the new
nodes could compute f(x) and f'(x) respectively. A third node could be
used to compute dx. The Program Structure in Figure 2.5 is an example
that shows how this might be done. Although this Program Structure
contains more nodes than the previous one, the execution time would
undoubtedly decrease if more than one processor is available to process
the nodes. This would be due to the fact that more Operations could be
performed simultaneously.

The next example of a Program Structure is the implementation of a
Finite State Machine. A definition of Finite State Machines is first

presented.

Definition 2.16
A Finite State Machine M is a quintuple M = (S, I, s, 5’, E? where
S is a finite set of states
I is a finite set of input symbols
8 is the start state (868)
5 is the transition (next state) function

F is a set of final states (FE S)

 

f Maximal Parallelism will be discussed in Chapter IV.

26

 

 

 

Figure 2.5 A "more parallel" representation of Figure 2.4

27

The input to the machine can be a sequence of symbols on a tape or
simply a string of symbols. Since they are equivalent notions, the
conventional flowchart will input from a tape and the Program Structure
will input from a string. In the conventional flowchart in Figure 2.6
the tape is considered to be device one. The tape is assumed to be in
position to read the first symbol prior to execution. Symbols are read
from the tape one at a time, reading from left to right. Furthermore,
each symbol x is read at most once and the tape will advance to the next
symbol after each read. In this example, this process continues until
the machine enters a final state or when there are no more symbols left
to read. In the Program Structure of Figures 2.7 and 2.8, a string I is
used rather than a tape so the symbols are read from the string in a
similar fashion.

Note that ‘6 (x,s) = ? if there is no next state associated with
x and 8. Also note that the null input is a valid input. If the machine
ends in a final state, that state will be found in node N8 of the Program
Structure. If no node N8 exists in set A when execution steps, then a

final state was not reached.

2.3 Additional Features

No mention has been made of the concepts of input and output. This
has been the case since Program Structures are inherently capable of
input and output Operations. The following definitions merely formalize

this capability.

28

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

[omin/
(“5)

Figure 2.6 Implementation of Finite State Machine using a
Conventional Flowchart

 

 

 

 

 

 

 

 

 

 

7
.3 .1
s 6
N7 I A. f
x

 

Figure 2.7 Implementation of a Finite State Machine using a
Program‘Structure

 

30

f1(s) = (3)
ram = (I)
r3(s) = (s, ?, 2)

f:(8’ I) = (8)

f:(s, I) = (I)

f6(I, X) = (?, I. X)
‘3“, 1, x) = ( 5(s,x), I)

7 (3,?) ifxéF
f2(s, I, x) =

(?, '2) otherwise

Figure 2.8 Definition of mappings used in the Program Structure of

Figure 2.7

31
Definition 2.12

The input to a Program Structure is a set I of nodes such that
I = {Ni ' Ni 6 K prior to execution of the Program Structure}
In other words, the input to a Program Structure is the set
of all nodes which are known before any node is processed. The output

from a Program Structure is defined next.

Definition 2.18

The output from a Program Structure is a set 0 of nodes Nl’J such

that

‘I. Nl’J 6 A when execution is terminated
2. Nl’J is the last capy of N1

3. N1 is specified prior to execution

To give an example of input and output nodes a sequential table
search algorithm is implemented. Given a table T which contains n
elements and a key x, determine whether or not x is in T and if it is,
output the position (or index i) that it occupies in the table T.

Figure 2.9 shows a Program Structure which performs table lookup.

1 2

Nodes N and N are input nodes and nodes N3 and N“ are used as output

nodes. If the table search is successful the position of x will be found
in the last copy of N“. If x is not in the table T, then the value n+1
will be found in the last copy of N3.

{wt NZ}

{N3, N4}

In other words: I

O

32

    
  

f1(x, 1) = (x, 1, '2, '2)

r2(T, n) = (2, 2, n, T)

 

(x,i+1,n,T)
4 if i'< n
r (x,i,n,T) = (x,i,n,T) if x é T(i)
(x,i+1,?,'l‘) f3(x,i,n,T) =
otherwise (x,i,?,T) otherwise

 

 

Pant-“N

Figure 2.9 Use of a Program Structure to perform table lookup

33
This means that the output from a Program Structure can be any

arbitrary set of nodes. These nodes are always the last cOpies of nodes
residing in set A when execution is terminated.

There are no unconditional branching features in the nodes of a
Program Structure. However, there is a capability for performing
leaping Operations or iterations which has been apparent in most of
the preceding examples. This capability involves the notion of cycle,

which is defined next.

Definition 2.12

A Program Structure contains a.gy§;g if there exist nodes Ni and N‘-j
such that when Ni becomes known then Nj will become known only if N1 is
processed and cOpies of Ni will become known only if N5 is processed, and

so on. For example: Ni’1 éNjn 7-? 19,2: Nj"2 9Ni’3=> .....f

The concept of cycle'here is analogous to cycles in graph theory
and IOOps in flowchart schemata. This analogy is depicted in Figure 2.10
and in Figure 2.11.

Given any Program Structure, it is not immediately apparent whether
or not execution will halt in a finite number of steps. The next definition

introduces terminology that will be used to discuss this problem.

Definition 2.20
A Program Structure is said to be solvable if execution Of that
Program Structure terminates in a finite number of steps. Otherwise,

a Program Structure is said to be unsolvable.

 

.r Ni’j=>>Nk’m means Nk'm will become known only if Nl’J is processed.

34

Figure 2.10 Directed Graph containing a cycle

 

C D

 

 

 

 

 

 

 

 

 

 

 

 

 

 

C)

Figure 2.11 Flowchart Schema containing a 100p

35
In general, Program Structures may not necessarily be solvable,

but there is one type that is.

Proposition 2.2
Any Program Structure which is cycle-free (contains no cycles)

is solvable.

21:923.:

Assume that there are a total of n nodes in the Program Structure.
Since the Program Structure is cycle-free, this means that each node will
be processed at most one time. Hence, the Program Structure will terminate

execution after processing a maximum of n nodes.

CHAPTER III

COMPUTABILITY

3.1 Implementation of Flowchart Schema

Most of the preceding discussion has concentrated on the internal

aspects of Program Structures. In this section, it is shown that

Program Structures can be used to implement any Flowchart Schema as well

as incorporating any inherent parallelism in the Flowchart Schema.

Definition 3.1

A Flowchart Schema (FS) is a 4-tuple (M, N, p, f) where

M is a finite set of memory cells
N = {5, e, B, A} is a finite set of nodes where

s is the starting node (i.e., the node to be processed first)

 

e is the ending node (i.e., the last node to be processed)
B = (b1, b2, b3, ....) is an n-tuple of branching nodes
A = (a1, a2, a3, ....) is an m-tuple of assignment nodes
and each assignment node modifies the value of one or
more memory cells.
p is an n-tuple Of predicates corresponding with the elements of B
f:N x M - N where f is a control function which Specifies
which node is to be processed next and where
f(x,d) = y where ye N - e (6. means don't care)

andx=sors€ A

f(x’ pi(m)) = y, e N - e where x 6 B, m s M, pi(m) 15 T

yze N - e where pi(m) is F

36

37

Theorem 331

A Program Structure can implement any Flowchart Schema.
arses

Since only known nodes can be processed in a Program Structure, the
data vector of each node will be M. The input node will have every element
in its data vector specified. Each node Ni in the Program Structure that
corresponds to an assignment node will have one mapping and it will be
of the form fi:Di--'-Dj and where fi(M) = M'. M' denotes M after assign-
ments have been made. Each node N1 in the Program Structure that corres-
ponds to a branching node will have two mappings of the form

fiwi-eDj where 5,114) = {M' if pi(m) is T

? otherwise

rial-wk where r204) = {w if pi(m) is F
? otherwise
The mapping for the ending node N1 would have the form f1:D1-—-~D1 where

f1(M) = ?. Hence, any Flowchart Schema can be rewritten as a Program

Structure.

In practice, the implementation of a specific FS could be much
less cumbersome. In general, nothing would be gained by such an

implementation unless some degree of parallelism were introduced.

Definition 3.2
The set of all memory cells referenced by assignment node ai will

be denoted by Ci'

38
Propositiong3.2
If a1 and aj are any two sequential assignment nodes of a Flowchart
Schema and 01" C3 = 0, then ai and aj can be processed in parallel
without altering the final outcome.
222.91:
Since the processing of one node does not affect the other, the nodes

may be processed in any order.

Thus any parallelism detected at the node level can be implemented
in the Program Structure. In Figure 3.1(a) a1 and aj are sequential
assignment nodes of a Flowchart Schema where Ci/l Cj = ﬂ. When the
Program Structure is created, this parallelism is incorporated by changing

several mappings as indicated by Figure 3.1(b).

3.2 Equivalence Classes of Data
In this sectLon, the set of all inputs are placed into equivalence

classes. To accomplish this, some preliminary definitions are first made.

Definition43g3
A node is called a junction if it must receive information from

two or more nodes in order to become known.

Figure 3.2 provides an illustration of a node that is a junction.
In case node N5 is a junction since it must receive information from both
nodes N3 and NI+ before it will become known. Notice also that if node

N1 is a junction, then #(D1)12 2. However, the converse is not true.

39

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

i
1 J
CinC.=¢ I! N
a.
J
Nk
8k
A
(a) (b)

Figure 3.1 (a) Segment of a Flowchart Schema which can be processed
in parallel

(b) Implementation of parallelism in (a)

Definition_§.4

If node N‘ is a junction, then it is denoted by 8N1.

Definition 3.5

 

Given an input set I, an execution sequence is the series of

 

nodes which must be processed in order to produce values in a given

output set 0.

Definitiong3.6
Two execution sequences are said to be equivalent if they always

produce identical values in the output set 0 from a given input set I.

The processing of a cycle can be regarded as the processing of a
single node when relating to execution sequences. For notational pur-

poses, if a series of nodes Ni, .... , N3 are to be processed n times

 

(a cycle), then that cycle can be referred to as N1, .... , N3.

Proposition 3:3
If N1 and Nj are non-junction nodes that precede a junction SNk,
then the following execution sequences are equivalent:
(1) Ni, N3, SNk
(2) N3, Ni, 8Nk
2221::
If Ni and Nj are both known nodes, then the order in which they

are processed is immaterial.

Ln

y)

 

Figure 3.2 Example of a node that is also a junction

42

The next definition helps in the discussion of equivalence

classes of input data which follows shortly.

Definitiong3.2
R is the relation on inputs which has the meaning "has an equivalent

execution sequence as".

Thus for any two inputs I1 and IJ to a Program Structure, I1 R IJ
means that the execution sequence generated by I1 is equivalent to the

execution sequence generated by Ij.

Theoremp3.4

The relation R is an equivalence relation if and only if there is a
unique output set associated with each input set.
2:223

Follows immediately from definition 3.7 and the definition Of an

equivalence relation.

Definition 3.8

[11] = [1'] I I1 R 13} where R is an equivalence relation.

This means that E Ii] is the set of all inputs that have equivalent
execution sequences as Ii. This also means that each input to a Program
Structure is placed in one and only one such set.

That an equivalence relation partitions its field is a well known
prOperty. Consequently, the collection of sets E Ii] forms a

partition on the set of all inputs to a Program Structure.

43

Pictorially, the set of all inputs can be regarded as a data space
and the individual inputs can be visualized as data points. The disjoint
collection of data points can be thought of as partitions. A similar
type of analogy can be drawn for the set of all outputs. The major
difference is that the set of all outputs can be rigorously predefined.

It has been an easy matter to show that the set of inputs can be
partitioned. However, it is difficult to show how these partitions can
be formed. The following comment and example are intended to add insight
to this problem. From the definition, it follows immediately that the
number of equivalence classes of an input set is the number of distinct
(non-equivalent) executions sequences.

An example is now being given that shows the appearance of an
execution sequence as well as its meaning. The example uses the
Newton-Raphson root finding algorithm of Figure 2.4.

There are three distinct execution sequences possible in that Program

Structure:
1 2 3 . . . . .
1. N N N - if this execution sequence 18 obtained
then the input value of n was 0
2. N1 N2 N3 8N N3 - If this execution sequence is obtained
then a root was not found after n iterations
3. N1 N2 N3 8N N2 - if this execution sequence is obtained

then a root was found within n interations

Even though there are an infinite number of possible inputs to the
Program Structure, all of these inputs can be partitioned into three

equivalence classes. The three equivalence classes are:

1+1.

1. Ex, n50] where x is any value

2. Ea, n70] where a is an element of the set of all inputs
that will not result in finding a root

3. Eb, n> (D where b is an element of the set of all inputs

that will result in finding a root

In this example, it is not very difficult to partition the set of
inputs. The difficulty arises when a Specific input is to be assigned
to an equivalence class. Obviously, there are two conditions which
determine which equivalence class an input is to be placed into: Nodes
which contain two or more mappings and nodes which contain "complex"

mappings. An example of a complex mapping is as follows:

5 (x, x+y) if x1< 3
f (ng) =
(?, x+y) otherwise

This makes the following prOposition obvious.

PrOposition 3.§_
Every cycle which terminates in a finite number of steps has at

least one node which contains a complex mapping.

The problem of partitioning input data would be greatly simplified
if Program Structures contained no cycles. In the following proposition,

that assumption is made.

Propositionp3.6

If a cycle free Program Structure contains a total of n nodes and

45

has no junctions, then the maximum number of possible execution sequences
n-l n-

is l + ES i,) .
i=1

has:

In a cycle free Program Structure no node is processed more than once.
Since there must be at least one input node, i.e., a node that is known
prior to execution, then there is at least one node to be processed which
leaves n-l nodes left to form execution sequences. The number of permu-
tations which can be formed is (n-1)! but the number of execution sequences
is less than this because an execution sequence and its permutation are
regarded as the same. To prOperly count the maximum number of possible
execution sequences, combinations of n-1 nodes must be calculated for
the possibilities of processing 1, 2, 3, ..... n-1 additonal nodes.

To state this more formally, the maximum number of possible execution

sequences is:

(‘2‘) + (.3)

3.3 An Analysis Formalism

There are a number of aSpects of Program Structures which can be
investigated and analyzed. However, due to the graph-like appearance
of Program Structures, it becomes difficult to perform any analysis on

them. In order to circumvent this difficulty, a formalism is now

introduced.

46
Definition_3.3

The index set of all nodes will be denoted by N =iﬁ, 2, 3, ... , n}

Definition_3.10

 

The classificiation set for all nodes will be denoted by

 

x < 1 => the node is unknown
C=[xl0§xf1where and
x = 1 =>-the node is known

Definitiong3.11
The set of test values on the mappings of any particular node will

be denoted by

T = {0, 1’ 3, 1+, 0000...}

A series of transition statements using the above definitions can be
made which describe the behavior of some Program Structure. In the
illustration below, when node N3 is processed, it causes node N# to become

known.

This action can be described by the transition statement

(3, 1, 0) -'-(4, 1, 0)

47
Definition 3.12

The transition statement (n, c, t)-"(n', c', t') describes the

action taken when node Nn is processed and where

n, n' 6 N
c, c' 52 C
t, t' 6. T

Each 3-tuple of a transition statement will be referred to as a state.

Since unknown nodes are not processed, they must not affect the
contents of any other node. The following definition specifies the

transition statement for that possibility.

Definitiong3.13
If node N“ is unknown, then (n, 0, d) —~ (n, 0, d) where the symbol

d represents "don't care".

The above definition can be made more general by testing the

classification of a particular node.

thinition_3,14

If x -< 1 for any node Nn , then (n, x, d)-—r-(n, x, d)

A more difficult problem arises when transition statements are
used to describe the transmittal of information to a junction or the
action by a node with a multiple mapping. For example, in Figure 3.3

node N8 will become known only if it receives information from both

48
nodes N6 and node N7. One possible way of describing this notion is by

using the following two transition statements:

(6, 1, d) .... (8, 1, d)
(7, 1, d) ——'-(8, 1, d)
Clearly this is unacceptable since it implies that either node N6 or
node N7 can make node N8 become known.
Another possible way of describing this action is by using the

following two transition statements:

(6, 1, d) -—-—> (8, 0, d)
(7, 1, d) -—-(8, 0, d)
This is perhaps a better solution, but it too is unacceptable since it
fails to communicate any information about the behavior of the nodes
To represent a junction, notation will be used that is similar to

"conditional expressions".

Definition 3.15

If node N1 is a junction and it must receive information from nodes
Nj and Nk to become known, then this action is described by the following
transition statement: (j, 1, d) A (k, 1, d) ——-» (i, 1, d) where A
denotes logical and. In other words, (i, 1, d) will be realized only if

the condition indicated is true.

Figure 3.4 gives an example showing how a transition statement is

used to handle a multiple mapping.

(6, 1, d) A (7, 1, d)—+(8, 1, d)

Figure 3.3 Use of a transition statement to describe the transmittal

of information to a junction

SO

(1, 1, d)—+— (2, 1, d) A (3, 1, d)

Figure 3.4 Use of a transition statement to describe the action

taken by a node containing a multiple mapping

51
Definitiong3.l6

If node N1 contains a multiple mapping to nodes NJ and Nk then the

action taken by N1 can be described by the following transition statement:

(1, 1, d) ——~(j, d, d) A (k, d, d)

It is assumed that definitions 3.14 and 3.15 can be generalized to
handle more than two nodes.

Figure 2.4 displays a Program Structure which is used to implement
the Newton-Raphson root finding algorithms. The following transition

statements describe the behavior of that Program Structure.

(d, 0, d)-—> (d, 0, d)

(1, 1, d)——_. (2, 1, d) A (3, 1, d)
(2, 1, d)——-. (A, , 2)
(3, 1, 0)
(4, 0, 2) 1K (4,

 

0

(3, 1, 1)—————. (A, 0, 3)

0, 0)
0

, 3)

 

(1+, 1, d)
(4, 1, 1)———» (2, 1, d) A (3, 1, d)
(4, 1, 0)-———>(3, 0, d)
Note that the third value of the triple can be used to describe
the behavior of a complex mapping. A choice of "next states" is
possible depending on that test value.
Figure 3.5 shows how a transition diagram can be drawn based on the

above transition statements.

52

 

 

Figure 3.5 A transition diagram made form transition statements

of Program Structure in Figure 2.4

CHAPTER IV

CONDITIONS FOR REPEATABILITY

The purpose of this chapter is to investigate ways in which the
uniqueness of the output of a Program Structure can be assured. In
addition to discussing determinism and repeatability, maximal parallelism

is studied.

Definition 4.1

A Program Structure is said to be deterministic if the execution
sequence generated by a particular set of values in the input set is
unique. Fer the purpose of this definition, execution sequences
N1 N2 N3 and N1 N 3 N2 are considered to be different even though one

execution sequence is a permutation of the other.

It should be obvious that in most cases Program Structures are not
deterministic. Determinism is not necessarily a desirable property of
Program Structures. In fact, a deterministic Program Structure has very

little capability for parallel processing.

Proposition 4.1

Any Program Structure can be modified so that it is deterministic.
2:929

If there is more than one input node, rewrite the Program Structure
so that there is only one input node. If any node contains two or more
mappings, then rewrite the mappings so that only one node can become

known as a result of processing any one node.

53

54
A lesser form of determinism, call repeatability, is frequently
the more desirable prOperty of Program Structures because parallelism

is achieved and yet the end result is unique.

Definition 4.2

A solvable Program Structure is said to be repeatable if the values
in the output set are always identical regardless of the number of times
the Program Structure is executed using a given set of values in the

input set.

The notion of repeatability has been called "weakly determinate" (14),
"output functional" (34), or even "deterministic" (43) in other models.
Clearly, a deterministic Program Structure is repeatable.

Program Structures are not always repeatable as the example in
Figure 4.1 illustrates. Although the purpose of that Program Structure
may not be clear, it is clear that the output node N5 may contain the
value 2, 3, 4, 5, 6, or 7 depending on when nodes N1 and NI+ are processed.
The reason for this apparently chaotic situation is that it is possible
to have up to six cOpies of node N# in the active set A at some instant
in time. If node N1 is processed after these six cOpies are placed in A,
then there are six known nodes - all cOpies of node N4. The order in
which these known nodes are processed will determine the final output
value in node N5.

However, all is not lost. The Program Structure in Figure 4.1 can
be made repeatable by changing one mapping:

('2, ?) if i< 6
rim) =

(i,?) otherwise

55

f1(1) = (?9 1) f2(1) = (1)

   
 
 

A

 

f2(i) = (i, '2) i .

:3

fI+(i, 3) a (1+3 , 2)

I: {111,112 } (i+1)ifi<6
f2“) =(

(?) otherwise
0: {N5} ‘

Figure 4.1 Example of a non-Repeatable Program Structure

 

56
The effect of the above mapping is that node N3 passes information
to node Nu only when the cycle is completely finished. Non-repeatability
frequently occurs when a node receives information (which makes it known)
both from a cycle and a node outside the cycle. This suggests a condition
to be applied which can make all Program Structures repeatable. If a
cycle could be "unraveled" so that execution of a cycle could be regarded
as execution of a series of nodes which then passes information on to

another node, then that Program Structure can be thought of as having no

cycles.

The following defintion suggests a type of cycle which is very

useful in develOping repeatable Program Structures.

Definition 4.3

A closed cycle is a cycle such that a node outside the cycle will
receive information from that cycle at most once each time the cycle is
executed. Also, there is one node in the cycle which contains a complex
mapping and this will determine whether the cycle will be executed once

again or whether information is to be passed to a node outside the cycle.

A Program Structure which contains only closed cycles behaves like
a cycle-free Program Structure. Although use of closed cycles can be
helpful in creating repeatable Program Structures, it is not in itself
a complete assurance of repeatability. Fer example, in Figure 4.2 a
cycle-free Program Structure is displayed that is not repeatable. The
output value in node Nu will depend on the order in which nodes N1 ,

2

N , and N3 are executed. The execution sequence N1 N3 N2 N3 will

57

produce a different output value than the execution sequence N2 N3 N1 N3.
Note also in this example that the Program Structure does not contain a
junction even though node N3 may graphically appear as such.

This suggests a condition to be applied to cycle-free Program

Structures which will guarantee repeatability.

Theorem 4.2

A cycle-free solvable Program Structure (or one that contains only
closed cycles) is repeatable if for any node NJ.- where fj :Dj-r-Diiand
fk :Dk—rDi, then rjwj) a fk(Dk) ,4. fj(Dj).
sass

To guarantee repeatability of a cycle-free Program Structure, the
theorem simply states that no two mappings may map values into the same
element of a data vector.

Case 1. The Program Structure contains no junctions. Each time a
node is processed, it causes 0 or more nodes to become
known. Since there is no communication or sharing of
information between the nodes, the set of output values
is not affected by the order in which the nodes are
processed. A

Case 2. The Program Structure contains one or more junctions.
Because of the condition stated in the theorem, the
information that must be placed in a junction to make it
become known is unique regardless of the order in which

the nodes that precede the junction are processed. This

has the effect of ensuring that certain nodes will be

58

  

f1(x) = (x)

9(2) = (z, '2)

 

assume x # y

Figure 4.2 Example of a cycle-free, junction-free Program Structure

which is not repeatable.

59

processed before the junction itself is processed.
Consequently, this will have no effect on the set of

values placed in the output set 0.

Note that in both Case 1 and Case 2 above there is only one
execution sequence for any given set of values in the input set I.

This is a necessary, but not a sufficient condition for repeatability.
Although the above theorem adds insight into the nature of repeat-
ability of Program Structures, it contains very restrictive conditions. So

that repeatability can be more easily described, the next concept is

introduced.

Definition 4.4
A critical race occurs whenever there are two or more copies of the

same node in the set A or the set K during execution of a Program Structure.

The term "critical race" is chosen because it resembles a like
situation described in switching theory. The purpose is to show that if
execution of a Program Structure can result in no critical races, then
that Program Structure is repeatable. Note that the Program Structure
illustrated in Figure 4.2 does contain a critical race and it is not
repeatable. The critical race occurs because it is possible to have two
N3

cOpies of node in the set K at the same time. The value placed in the

output set will depend on the order in which the nodes are processed.

Definition 4.5

The value vector of a node N1 will be denoted by V1 = (d:, d:,...,d;)

60

where d; is used to represent the value of Dl’J.

Lemma 4.2

Given an input set I, if the value vector Vi is unique for any
node Ni , then that Program Structure is repeatable.
areas:

Suppose Nj is any output node. Since Vj is unique, the values in

the output set are unique.

Theorem 4.4

If critical races cannot occur during the execution of a solvable
Program Structure, then that Program Structure is repeatable.
areas:

The absence of critical races guarantees that all cOpies of any

i 1
’ , N , 0...... ’

node N1 are processed in a unique order, i.e., N
Nl’m . Thus the values mapped to each cOpy are also unique. This
further implies that the value vector V1 for any node N1 is unique,

and by Lemma 4.3, the Program Structure is repeatable.

The analysis formalism which was develOped in the previous chapter
can be used to determine whether or not a Program Structure contains any
critical races. For example, consider Figure 4.1. The transition
statements for that Program Structure are:

(1,1,d)-———>- (A,0,1)
(2,1,d)-——>- (3,1,d)

(3,1,1)—'—*(3g19d) A (A’QOQB)

61
(3,1,0) ——‘.’ (“9093)

(1+,091) A (4,0,3) _'—+ (Sgogd)

If it is possible to write a series of transitions from the above
statements that will lead to a node more than once before that node is
transformed to another, then the Program Structure contains a critical
race. From the above transition statements, the following series of

transitions is possible:

(2,1,d) —+ (3,1,d)——>- (3,1,1)—+ (3,1,d) A (4,0,3)

'—+(3,1’1)——.'(3919d) A (4,0,3) eeeeoeeoeeo

The state (4,0,3) has been referenced more than once before being
transformed to another node. This means that there can be two or more
cOpies of node N“ in the set A at the same time. This implies that a
critical race exists. Results using the above appear at the end of the

chapter.

The analysis formalism can also be used to help determine the
number of equivalence classes present in a Program Structure. The
number of equivalence classes is closely related to the number of
output nodes. In particular, consider the Program Structure of

Figure 2.4. The transition statements are:

(1,1,d)———>—(2,1,d) ‘A (3,1,d)
(2,1,d) -——- (4,0,2)
(3,1,1)——>(l+,0,3)

(3,1,0) —__+ (Bgogd)

62
(4,0,2) A (4,0,3) -—->- (4,1,d)

(4,1,1)—+(2,1,d) A (3,1,d)

(4.1.0) ———>- (2,0,d) A (3,1,d)

The following series is possible given the above transition statements:

(1) (1,1,d)—->-(2,1,d) A (3,1,d)—>(4,0,2) A (4,0,3)
—~(4,1,d)—~(4,1,1)—+—(2,1,d) A (3,1,d)-+......
(Infinite cycle)
(2) (1,1,d)—>(2,1,d) A (3,1,d)—+(4,0,2) A (3,0,d)
(3) (1,1,d)——*—(2,1,d) A (3,1,d)—+(u,0,2) A (1+,0,3)

——~(4,1,d)-—"'(2,0,d) A (3,1,d)—#0505)

Using the analysis formalism, it appears that the Program Structure
can enter an infinite cycle. However, after studying the Program Structure,
this should not happen since there is a control built in to prevent such
a possibility. This may appear to be a serious anomaly, but in practice
it could be very worthwhile in detecting possible infinite cycles.

At any rate, the analysis formalism indicates that the Program Structure
can terminate in one of three ways. This further suggests that there
are only three distinct execution sequences, and hence three equivalence
classes.

If a Program Structure does contain a critical race, it may be
possible to make an alteration to it in order to insure repeatability.

In Figure 4.3 there is an example of a Program Structure which contains

a critical race. The purpose of the Program Structure is to implement a

Finite State Machine. Upon close inspection, it should be apparent that
6

nodes N5.and N could be processed repeatedly. This could cause multiple

COpies of node N7 to appear in the set A. When these cOpies become known,

65

the order in which they are processed will determine the set of values in
the output set. This Program Structure can be made repeatable by adding
a "delay" node N4 to insure that each input symbol prOperly correlates
with the current state. The "fixed up" Program Structure is the one which

appears in Figure 2.7.

Definition 4.6
Two execution sequences are said to be similar if each contains

exactly the same occurences of nodes as the other.

PrOposition 4:5
A solvable Program Structure is repeatable if for any given input
set the following prOperties hold:
1. All possible execution sequences are similar to each other
2. For any nodes Ni’p and Ni"q where p <.q, Ni’p
always precedes Ni’q in all execution sequences.
magi:
Clearly if the above prOperties hold for any Program Structure,

then that Program Structure does not have any critical races. Hence,

repeatability is assured.

One of the objectives in using multiprocessing is to decrease the
processing time of programs. To achieve this, as many nodes as possible
should be processed concurrently. This naturally leads to some considerations
about the "degree of parallelism" in a Program Structure. In other words,

a high degree of parallelism would imply less processing time.

64

 

 

 

 

 

Mappings are assumed to be similar to those in Figure 2.7

Figure 4.3 Program Structure which contains a critical race

65

Definition 4.7

A Program Structure is maximally_parallel if for every node which

 

contains more than one mapping, then all of the mappings of that node

are identity mappings.

An identity mapping is one that performs no manipulation of inferma-
tion, but rather cOpies information into another node. For the purpose
of the above definition, complex mappings will be considered as identity

mappings. The following is an example of an identity mapping:
f3(a, b, c, d) = (?, c, ?, a)

In Figures 4.4 and 4.5 a comparison is made between two segments of
a Program Structure. Fer the sake of comparison, assume that identity
mappings require one unit of time and the mappings of Figure 4.4 each
require ten units of time. This means that the program segment in Figure
4.4 could be executed in a minimum of thirty time units. By comparison,
the program segment of Figure 4.5 could be executed in a minimum of thirteen
time units. If a number of processors are available, it then becomes very

advantageous to have a Program Structure in maximally parallel form.

Proposition 4.6

Any Program Structure can be rewritten so that it is in maximally
parallel form.
sass:

The proof consists of a procedure which will effectively force any
Program Structure to be in maximally parallel form.

For each node N1 that contains a multiple mapping and n of the

66

aft.
“3
p

 

f: , f: , f; are non-identity mappings

Figure 4.4 Example of "poor" parallelism

oft-

67

.I'h-
m'lr
orb-

 

NP Nq “r

 

 

 

#1

3 9

Assume fh fh

1 2 , are identity mappings and
I

fp,fq,frarethesameasfi,f:,f;,inFigure4.4

Figure 4.5 EMple of Maximally Parallel Form

68
mappings (n 2 1) are nondi'dentity mappings, perform the following:
1. Create n new nodes that have the same data vector as N1
2. Each of the new nodes will have only one mopping, i.e., each
new node will have one of the n non-identity mappings
3. Replace the non-identity mappings in N1 with identity mappings
that map the data vector of Ni into each of the n new nodes
When this procedure is chleted, the Program Structure will be in maximally

parallel ferm.

Proposition 4.7

The maximally parallel form of any repeatable Program Structure is
also repeatable.
seer

Assume the proposition is false and then show a contradiction. Fer
any node N:1 in the original Program Structure, the value vector V3 is
identical to that of the corresponding node in the rewritten Program
Structure. This is true because the only alteration to the original node
was to change some of the mappings to identify mappings. Then for at
least one node Ni, the value vector V1 is not unique. Suppose N1 is a
new node that was created from original node NJ. Then the value vector
v3 is not unique either since the mapping from N3 to N1 is an identity

mapping. Hence a contradiction.

Much of the previous discussion of repeatability has utilized various
aspects of a Program Structure after it has been executed. It would be of
great practical value if it were known that a particular Program Structure

was repeatable before it was executed. The analysis formalism will be used

69
for this purpose.

Definition 4.8
When writing transition statements, a state is considered current
under either of the following circumstances:
1. The node being described is an input node and that state has
not yet appeared on the left side of a transition statement
2. Any state referenced by the right side of a transition statement

until that state appears on the left side of a transition statement

Definition 4.9
A path is a series of transiton statements which leads to a particular

current state.

Proposition 4.8

If x is any state and there exists one or more paths to x while x
is current, then that Program Structure contains a critical race.
e21:

If n represents any node, then x is of the form (n, 0, a) or (n, 1, a)
where a is some test value. Since there are two or more paths to x, this
means that during execution of the Program Structure there can be two or
more cOpies of node n in set A or set K concurrently. Hence, a critical

race exists.

Similar arguments can be made for the following propositions.

70
Proposition 4.3

If n represents any node, a f b, and states (n, 1, a) and (n, 1, b)

are both current, than that Program Structure contains a critical race.

Proposition 4.10
If n represents any node, test values a and b are not necessarily
equal, and states (n, 1, a) and (n, 0, b) are both current, than that

Program Structure contains a critical race.

Thus it has been shown that by using the analysis formalism, a
mechanical procedure can be develOped which can detect the presence of
critical races in any Program Structure before it has been executed. This

can be of considerable value when dealing with large programs.

CHAPTER V

IMPLICATIONS

5.1 Elimination of Classical Problems

In this section, solutions to such problems as interference among
nodes and deadlock will be presented. The problem of interference has
received a great deal of attention in the literature, particularly by

Dijkstra (19) and Gilbert and Chandler (22).

Definition 5.1

Two nodes Ni and Nj are said to interfere with node Nk if all of
the following hold:

1. {inf—wk and tjznj—vnk

2. fi(Di) & £5033) = r1031)

3. N1 and N3 can be in set K concurrently

Interference is a problem unique to multiprocessing in that it can
only arise when two or more processors are trying to simultaneously change
the value of some location in memory. The cOpy concept of Program Struc-
tures reduces the seriousness of this problem but it does not entirely
eliminate it.

There are only two ways in which interference can arise in the
execution of a Program Structure. In the first case, two copies of the
same node are processed concurrently. Note that more than one node can
be mapping information into a particular node simultaneously. Consequently,
the timing can be such that both cOpies are mapping into the same node.

The second case is similar to the first except different nodes map into

71

72

the same node concurrently.

Even in Program Structures the problem of interference is a potentially
dangerous one. Not only does this situation prevent repeatability, it may
also destroy any value produced by the execution of a Program Structure.
If two nodes map into the same data element at the same time, the result
will be indeterminate; in fact the result may cause erroneous action to be
taken.

Solution of this problem is accomplished in two steps. The first
step establishes a mechanism by which all potentially interfering nodes
are detected. The second step develOps a procedure to eliminate inter-
fering nodes. Most other approaches to this problem simply do not allow
more than one processor to simultaneously access shared memory. The
mechanism used to insure this is called mutual exclusion. The approach
taken here is not so restrictive; interference will be eliminated and
yet many processors can be accessing shared memory simultaneously.

Definition 5.1 specifies the criteria which causes interference so
the problem of detecting possible interference becomes one of determining
whether or not there are any nodes which meet that criteria. This
determiniation can be accomplished in large part by using the analysis
formalism. All nodes which meet the first criterion (i.e., all Ni,

Nj that map information into Nk) can be detected as follows:

(i, d, d)—->-(k, d, d)

(j. d, d)—->(k, d, d)

This amounts to a checking of transition statements to find all nodes that
map into the same node. This of course does not guarantee interference.

In fact, there is no problem if each data element in Nk has one or less

ln‘, ‘

 

 

73

mappings that reference it. Fer an example of this, consider junctions.
there are always two or more nodes that map into a junction and this in
itself does not necessarily lead to interference.

The second criterion of the definition determines whether or not there
is one or less mapping to each data element of node Nk. If fi(Di) & fj(Dj)==
fi(Di) and criterion one has been met, then there is at least one data
element in Nk that is referenced by both Ni and Nj. Even though both of
the first two criteria have been met there is still no assurance that
interference is even possible. The third criterion establishes that
possibility. If both Ni and N:1 are known concurrently, then they can
be processed concurrently and furthermore it is possible that they will
map information into the same data element of Nk concurrently.

By using the analysis formalism, it is possible to determine whether
or not nodes Ni and Nj can be known concurrently. If there exist paths
to NJ" and N3 where the path to N1 does not depend on Nj and the path to
Nj does not depend on Ni, then Ni and Nj can be known concurrently. As
in the first criterion, this becomes a matter of mechanically checking
the paths to N1 and Nj and making certain the above condition holds.

The preceding discussion is summarized in the following theorem.

Theoremg5,1

There exists an effective procedure which can detect all potentially

interfering nodes.

The solution to this problem is actually a matter of prOperly
synchronizing the processing of certain nodes. When a Program Structure

is being executed, the nodes in the set K can be processed in any order.

74

To eliminate interference, all that is necessary is to insure that poten-
tially interfering nodes are not processed concurrently. Since it is
possible to detect all potentially interfering nodes, suppose a list L of
these nodes is created. Furthermore, suppose that this list L is created
such that each element of L is a set of nodes which potentially interfere
with the same node. Interference will be eliminated if the following
change is made in the processing of known nodes.
For any known node Ni, that node can be processed only if either
of the following conditions hold:
a. For allxéL, Niel x OR
b. For all x 6 L where N1 6 x, no other element of x is
currently being processed when Ni is being processed
(i.e., prevent concurrent execution of two or more
potentially interfering nodes)

The above discussion constitutes the proof for the following theorem.

Theorem 5:2

 

The execution of any Program Structure can be synchronized so that

interference of nodes is eliminated.

There is another way in which interference can be eliminated. This
method has the advantage of not altering the way in which nodes are
processed. Suppose nodes Ni and N3 potentially interfere with node Np.
All that is necessary to prevent interference is to create a new node

Nq that is identical to NP and for all f1 e 111

that map into Np, change
the mappings so that they map into Nq. It may be necessary to repeat

this process a number of times and it may substantially increase the

75

size of the Pregram Structure. However, execution time would most
likely decrease since the resulting Program Structure would be in a
more parallel form.

Although it is possible that a repeatable Program Structure may
contain interfering nodes, it is not necessary to synchronize execution
in order to eliminate interference. By definition, a repeatable Program
Structure will yield the same output set each time it is executed.
Consequently, if there are interfering nodes, then the interference
they create is incidental and does not contribute in any way to the
values obtained in the output set.

Quite often, the solution to one problem leads to the creation of
another problem. In the situation created by Theorem 5.2, it must be
known whether or not the restraints imposed will reslut in perpetual

blocking of all processors.

Definition,5.2
A Program Structure is said to be deadlocked when it becomes

impossible to process any of the known nodes.

The concept of deadlock is analogous to the "deadly embrace"
described by Dijkstra (19) and Luconi (34) as well as the "hang
up state" discussed by Petri (45) and Rodriguez (41).

Suppose a Program Structure contains interfering nodes and the
synchronizing scheme suggested by Theorem 5.2 is used to eliminate
them. Processing of any interfering node Ni will be delayed only

as long as there are one or more known nodes, then it will always be

76
possible to process at least one node. Execution time may be increased,
but deadlock will not occur.

Before discussing the deadlock problem for Program Structures, this
problem will be first viewed with reapect to other models. In particular,
attention is given to problems arising in the use of shared memory and
shared resources.

Traditionally, when a processor has access to shared memory, it can
read or write anywhere in that memory. Since the memory is used as a
means of communication between processors, safeguards must be maintained
to prevent two or more processors from accessing shared memory concurrently.
However, such a mechanism can have the effect of preventing all processors
from ever accessing the shared memory and deadlock results.

Another way deadlock can occur is when processors share resources,
such as input or output devices. For example, suppose one processor has
control over the one available reader and needs to get control of the
printer before it can finish. At the same time, another processor has
control of the one available printer and needs to get control of the reader
before it can finish. Neither processor can proceed and deadlock results.

The most common approaches to the deadlock problem have been to either
develop an algorithm to prevent it or to simply allow it to happen and then
take corrective action. The former approach has received much attention;
one such widely known algorithm is the "banker's algorithm" of Dijkstra (19).

The concept of shared resources is not relevant when discussing Program
Structures. The mappings in each node perform manipulations on information
only; furthermore, the concept of input and output does not rely on external

devices.

77

Program Structures do use shared memory. However, the concept is
somewhat different than previous approaches. When a processor is accessing
shared memory, it can only access the information contained in the node it
is currently processing. Consequently, many processors can be accessing
shared memory simultaneously and not affect the final outcome. It is also
possible for two processors to access identical information simultaneously.
The "cOpy of a node" concept makes this possible. This modularity makes
it unnecessary to create algorithms whose purpose is to prevent two or more
processors from accessing shared memory concurrently.

The problem of deadlock is a serious matter in previous multiprocessing
models. However, this problem is of no consequence for general Program
Structures.T' No restrictions have been made as to the amount of available
memory or even the number of available processors. There is no configuration
which can prevent the processing of nodes in the set K. Hence, the next

prOposition follows immediately.

Proposition<5.3
A Program Structure with no restriction on the amount of available

resources cannot be deadlocked.

512 Finite Program Structures
This thesis has investigated Program Structures in their most general
form. In this section, a restricted form of Program Structures will be

studied with an emphasis on problems associated with this form.

 

1’ The next sections deal with special cases of Program Structures where

deadlocks can occur.

78
Definition 2.3
A Finite Program Structure (FPS) is a Program Structure which has

access to a finite amount of memory and a finite number of processors.

The definition does not imply that an FPS must deal with a finite
number of nodes. Indeed, an unsolvable FPS is infinite in this sense.
Besides limiting the number of processors, the definition states that
there is a limitation on the total amount of memory available to store

all the unknown, active, and known nodes.

Definition45.4
The amount of memory required by any node N:L will be denoted by
Size (N1). The total amount of memory available to a particular FPS

will be denoted by the mnemonic Mem.

There are several trivial ways in which deadlock can occur in an
FPS. If the number of available processors is zero, then nodes in the
set K cannot be processed. Also, if the amount of memory required by
any node exceeds Mem, then deadlock results because the set U cannot
be placed in memory. These trivial cases will not be considered further.
The "cOpy of a node" concept has proved to be a very powerful feature
in that it can help to eliminate interference between nodes. There is a
negative aspect to this in that imprOper usage can cause deadlocks. Creation

of unnecessary cOpies can occupy all available memory and lead to deadlock.

Assumed Conditions rggarding_FPS

79

1. There is at least one processor available for use.

2. Mem 2 3 .2 Size (Ni)
N‘eU

The only way that deadlock can occur in an FPS is by attempting to
occupy more memory than is available. This can happen if too many cOpies
of nodes are created. It is possible to state that for certain FPS
deadlocks can not occur. For the others, a dilemma is faced. If these
FPS are executed, then some will become deadlocked and some will not. Some
care must be exercised when dealing witthPS that might lead to deadlock.
If an FPS does become deadlocked, recovering from it can become a very
expensive propositon.

This suggests two approaches to the problem. If an FPS cannot
become deadlocked, then there is no problem and no restrictions need
to be placed on the execution of that FPS. This implies that for any
FPS there must be some method of determining whether or not a given FPS
is deadlock free. A method of detecting deadlock free FPS is presented
shortly. If an FPS cannot be classified as deadlock free then it is
necessary to alter the execution of that FPS. By prOperly processing
the nodes, it may be possible for such an FPS to not end in deadlock.

Of course, there are some FPS that are hOpeless. That is, no matter
what precautions are taken, deadlock will result. This tapic is also

discussed shortly.

Theorem 5.4
If a Finite Program Structure contains no critical races and assumed

conditions 1 and 2 hold, then that Finite Program Structure is free from

deadlocks.

80
has:

Elimination of critical races implies that there cannot be more than
one copy of any node in set A and/or set K. However, when a known node
is being processed it is no longer in the set K but it still requires
use of memory. Since it is permissible for a node to map information
into a cOpy of itself, it is possible that memory would be required
for the amount of two cOpies of that node. Since it is assumed that

Mem 2. 3 2 Size (N1) then there is an adequate amount of memory and

N16 U

it is impossible for deadlock to occur.

By using the analysis formalism it can be readily determined whether
any FPS contains a critical race. According to proposition 4.10 all that
is necessary is to examine the transition statements. If an FPS contains
a critical race, then there exists one or more paths to a current state.

In other words, if the transition statement (x, 1, d)-*-(i, 0, x) is
possible while (i, 0, x) is a current state, then a critical race exists.

Theorem 5.4 has shown that it is possible to determine whether any
FPS is free from deadlock. However, this is not adequate since many FPS
may not fall into this category. If an FPS cannot be classified as deadlock
free, this does not necessarily mean that it is certain to end in
deadlock. A stronger result is needed in order to determine whether
it is possible for any given FPS to be executed and avoid deadlock.

There will be few restrictions placed on the FPS under consideration.
Although properties of execution sequences are utilized, it is not necessary
to make any assumptions about them. Since a general solution to the dead-
lock problem is sought, the results must hold regardless of whether the

FPS is repeatable or even solvable.

81
Cycles and junctions are permitted and it is required that all cycles
be closed cycles. This will have the effect of assuring a finite number
of execution sequences for unsolvable FPS. Note that every unsolvable
FPS contains at least one cycle. The presence of a cycle or even an in-
finite cycle does not pose any difficulty in studying the deadlock problem.
The nodes of a closed cycle can be treated as nodes outside a cycle when

considering memory usage. For example, given the following execution

 

sequence which contains a closed cycle: N1 N2 N“ N5 N6 N7 N8N10 N11

The nodes in the closed cycle do not have to receive any special considera-
tion. Each node in the cycle would have to be checked (just as any other
node) to determine whether memory had been exhausted. Further, it is
immaterial how many times the cycle would be executed. If deadlock does
not occur during the first pass through the cycle, then deadlock cannot

occur after n passes where 1 5. n5. 0° .

Definition 5 .5

tj = {(x, y) I x is a node designator, y is a set of node
designators such that for all i 6 y there exists f1:D1-—~Dx}
T1 = {t3 I where tj is a description of the sets A and K after

the jth node in execution sequence i has been processed}

Since t j is a description of the sets A and K, for any x e t j where
x = (a, b), the elements of b are unique. However, it is possible to have
y 5 t3. where y = (c, d) and a = c. For each node in A or K during execution
sequence i there is one or more corresponding entries in Ti. It is a rela-

tively easy matter to determine whether or not a node is known. If

 

82

x 6 t where x = (a, b) and b = (b1, b2, ....., bm) then Na is a known

3

node if there exists a transition statement of the form:

(b1, d, d) A (be, d, d) A A (bm, d, d)—+(a. 1, d)

Proposition,5,5

A critical race does not occur in execution sequence i if for all
tjé Ti and for all x 2 t3. where x = (a, b) there does not exist yé tj
where y = (c, d) and a = c.

Proof:

Follows immediately from the definition of a critical race.

Definition 5.6

D = {- i ' execution sequence i does not lead to deadlock}

The elements of D can be determined by computing the amount of
memory required by nodes represented in each Tl. Fer convenience when
performing this computation, it can be assumed that only one processor

is being used.

Proposition 5.6

If a Finite Program Structure is deadlock free when using one
processor, then it is deadlock free when using two or more processors.
2.1-ass

There must always be at least one processor able to completely
process some node. If this were not true, deadlock could arise when using
only one processor. When multiple processors are used, it could be

possible for all but one to be idle. However, all nodes will ultimately

 

83

be processed and deadlock cannot occur.

Proposition 5.7

If there are n execution sequences for a given Finite Program
Structure and #(D) = n then that Finite Program Structure is free from
deadlock.
magi:

Since every execution sequence generated by the FPS does not lead
to deadlock, then clearly the Finite Program Structure is free from

deadlock.

Proposition_5.8

Suppose there are m equivalence classes of input data for any
given repeatable Finite Program Structure. Let E1 represent the set
of execution sequences for equivalence class i (1 5 i 5 m). For any
equivalence class i, if there exists x E E1 such that x 6 D then that
Finite Program Structure can avoid deadlock.
are

By definition, any two execution sequences in Ei are equivalent.
Then for all inputs in equivalence class i, there exists an execution
sequence which will achieve the desired output and be able to avoid
deadlock. Since the preposition states that this holds for any

equivalence class, then it is possible to process any input and not

end in deadlock.

A corollary to preposition 5.8 is now presented which considers

both repeatable and non-repeatable FPS.

84

Corollggy 5.2
A Finite Program Structure can avoid deadlock if for any execution

sequence either it or an equivalent execution sequence does not lead to
deadlock.
Proof:

Follows immediately from preposition 5.8.

The previous discussion and propositions can be summarized in the

following theorem.

Theorem45.10

It is decidable whether any Finite Program Structure can avoid

deadlock if all cycles are closed cycles.

Much of the previous discussion has centered on the detection of
potential deadlock. When it is possible for an FPS to avoid deadlock
then processing of nodes must be synchronized prOperly. This implies
that the order of processing the nodes is predetermined (i.e., knowing
the execution sequences) or making an adjustment to the processing
algorithm so that the prOper order of processing nodes would always be
Observed.

When a deadlock does arise, it may be possible to preempt the
processing of some node in an attempt to recover from this situation.

In other words, when a processor is about to perferm its mappings and
finds no available memory it can return that node to the set K and attempt
to process a different node. However, if prOper synchronization is used,

preemption of nodes will not be necessary.

85
533 Copy Free Program Structures
In this section, properties of a Program Structure in which cOpies

are not allowed are investigated.

Definitiong5.?
A Copy Free Program Structure (CFPS) is an FPS where each node

appears exactly once and must be in Set U, A, or K.

In order to utilize the prOperties of the "cepy" concept, an

analogous prOperty is now defined.

Definitiong5.8
Each element of the data vector of any node of a COpy Free Program

Structure is a finite length first-in first-out data queue.

Many of the results established for Program Structures are appli—
cable here. Fcr example, a node is known if each of its data queues
contains one or more elements. The current value is always the value
at the front of the queue. In the mapping fi:Di-----D3 where fi(Di) =
(a, ?, b), the first and third elements ofDJ are referred to as defined
data queues for this mapping. When a mapping is performed at least one
element is added to the rear of each defined data queue. Entries are
duplicated so that the queue length of each defined data queue equals

the maximum queue length of the data vector.

Proposition 5.11

A COpy Free Program Structure is repeatable if for any input set the

86
data values arrive at the data queues in a unique predetermined order.
freer
Assume that the value of an output node is composed of the elements
at the rear of the data queues. This preposition then follows immediately

from the definition of repeatability.

A special case of the above proposition would be a situation where
the maximum length of any data queue is one. Of course, if this condition
is met the CFPS is repeatable.

Although preposition 5.11 may seem very restrictive, in practice
it would be a more desirable condition to attain rather than the special
case discussed above. This would be the case since it would be possible
to achieve greater utilization of the processors. In fact, the condition
expressed in the epecial case implies that there is no need to use data
queues, but rather use single valued variables.

In the previous two sections, the problems of interference and
deadlock were discussed. Although the results obtained could be applied

here as well, an additional comment is necessary with respect to deadlock.

Propositiong5.12

A COpy Free Program Structure can be deadlocked only if the maximum
length of one or more data queues is exceeded.
freer

Deadlock can only occur when it becomes impossible to process any
of the known nodes. There are two ways that this can happen:

1. All available processors are delayed because they are processing

nodes which are attempting to map information into one or more

87
nodes which have exceeded their maximum queue length.
2. All currently known nodes are being processed and are delayed
for the same reason as (1).
In either case, deadlock occurs as a result of attempting to exceed

the maximum length for one or more data queues.

Note that the exceeding of a maximum queue length does not of
itself guarantee deadlock. It is possible for a CFPS to "recover" when
this situation arises simply by processing the node(s) with maximum
queue length a number of times while there are processors still avail-
able. If this situation does arise, it can seriously degrade system
performance since at least one processor must be idle as long as that
situation exists.

Although Program Structures are intended to represent programming
processes rather than a scheme for machine design, the implementation
of CFPS resembles existing hardware systems. Such a system is the
CDC STAR which is a pipeline processor and is described by Hill and
Peterson (25). In the pipeline computer (depicted in Figure 5.1) there
is a shared memory and a number Of arithmetic units which communicate
with the memory via the input and output pipelines. There are many
technical considerations that are overlooked such as methods of insuring
the determinacy of the system. Also, the pipeline processor might have
a number of Special purpose elements in the arithmetic unit.

When considering functional characteristics, the pipeline processor
bears a great deal of resemblence to CFPS. In fact, the pipeline processor
is a special case of CFPS. The data queues represent the input pipeline
and the mappings represent the output pipeline. The major difference is

that the processors of a CFPS are all identical. This suggests that the

88

 

 

 

 

 

 

 

 

 

 

 

Memory
A
1
Input Control Output
Pipeline Pipeline
A
r
Arithmetic
Units

 

 

 

Figure 5.1 Functional Diagram of a Pipeline Processor

89
cost of a CFPS would be higher but this would be compensated by added

flexibility and possibly better thruput.

5.4 Comparison of FPS and CFPS

The Program Structure in its most general form is clearly free from
the deadlock problems exhibited by FPS and CFPS. However, in a practical
sense, it is impossible to implement the general Program Structure due
primarily to limitations on memory size. Consequently, the two special
cases of Program Structure are contrasted in this section.

In both cases, repeatability is assured based on similar conditions
established for general Program Structures. Deadlock can be detected
based on finiteness conditions although it is easier to predict for FPS.

The two major areas of comparison are Speed of execution and amount
of memory required. If there are n nodes in a CFPS, then the maximum
number of nodes which can be processed concurrently is n assuming that
processors are available. For FPS, the maximum could be much higher
because a number of cOpies of the same node could be processed simul-
taneously assuming available processors and memory.

The advantage gained by increased Speed of FPS is somewhat compensated
by increased use of memory. Although additional memory is needed for the
data queues of a CFPS, it would in general be less than that required to
create cOpies of nodes.

When a person describes a solution to a problem by means of a Program
Structure, he may not be aware of whether he is using an FPS or a CFPS.

It may be possible to obtain different results depending on which was used.

Definition<519
A Finite Program Structure and a COpy Free Program Structure are

equivalent if for any input set the output set is identical for both.

It may seem that if two structures contain exactly the same components
then they would be equivalent. This is not always the case with FPS and
CFPS. Suppose there is a node N4 which has exactly three mappings to it
defined as follows:

f1(a, b, c) = (a, ?, b, c)

f2(d, e)

f3(g, h)

(?, d, e, ?)

(g, h, ?, h)

In an FPS, there would be three cepies of node N8 created and none of
those cepies would become known. However, in the CFPS enough information
has been mapped so that node N4 can become known twice. Clearly, this

can effect the values placed in the output set.

Theorem 5.13
A Finite Program Structure is equivalent to a COpy Free Program

Structure if both of the following hold:

1. Any node N1 is contained in both structures

2. Neither structure contains any critical races
has:

Having the same node in both structures implies that not only are
the node designators identical, but the mapping vectors and the capacity
of the data vectors are identical as well. Showing that for any given

input set the execution sequence generated by the Finite Program Structure

91
must be equivalent to that generated by the Cepy Free Program Structure

is sufficient to prove this theorem.

Suppose it is possible for the FPS to generate an execution sequence
that is not equivalent to that generated by the Cepy Free Program Structure
using the same input set. Then there must exist at least one node
Ni’j in the execution sequences of the Finite Program Structure and the
Cepy Free Program Structure which has one or more data elements whose
values are not identical. This can only happen if at some time there are
two or more cepies of the same node in the set K or A or if there are
two or more paths to Ni'j. In either case, this contradicts the fact
that the Finite Program Structure and the Cepy Free Program Structure
are repeatable (absence of critical races implies repeatability). Hence
given any input set the execution sequences generated by the Finite
Program Structure are equivalent to the execution sequences generated

by the Cepy Free Program Structure.

It is not always possible to create a structure which can be used
interchangeably as either an FPS or a CFPS. The last two prepositions
in this section show that it is possible to take any repeatable FPS and

construct an equivalent CFPS and vice versa.

Propositiong5.14

For any repeatable Finite Program Structure there exists an equivalent
COpy Free Program Structure.
22223

An equivalent Cepy Free Program Structure will be constructed

92
from some or all of the nodes of the given Finite Program Structure.
If the Finite Program Structure does not contain any critical races then
an equivalent Cepy Free Program Structure would contain exactly the nodes
of the Finite Program Structure. Assuming that critical races do exist,
it may be necessary to alter some of the nodes of the Finite Program
Structure in order to construct an equivalent Cepy Free Program Structure.
This may be necessary because there are no cepies of nodes in a COpy Free
Program Structure. An example has been presented earlier in this section
showing how this can be a problem.

A Cepy Free Program Structure can be made to behave like an Finite
Program Structure by changing the mappings so that a Special distinguishable
marker (say ‘) would replace all undefined elements. The multiple elements
in the data queues would simulate cepies of nodes. Every node that had
at least one element in its data queue would be regarded as a known node.
The processing of nodes would have to prOperly account for this.

Nodes would only be processed when all elements of the data vector
would be defined by merging one or more disjoint rows of the data queues.
After the node is processed all rows involved in the merging would be
deleted. These alterations provide the capability of allowing the Cepy
Free Program Structure to mimic the Operation of the Finite Program
Structure. The only Operation the Cepy Free Program Structure will not
be capable of implementing is processing two copies of the same node
concurrently. Clearly, this Operation is not required to achieve

equivalence.

As an example of how the above alterations could be carried out,

93

N5 N3

consider node which has information mapped to it from nodes

15033)
27w?) (2», c, c)
f9(b9) = (?, e, ?)

(a, ?, b)

After nodes N3 , N7 , and N9 have been processed, the data vector of node
N5

 

 

 

would appear as: 5
D

a e b

* c d

w e *

 

 

 

 

5 could be merged to provide a data vector

5 D5

The first and third rows of D
whose elements are all defined. After node N has been processed,
would contain only one row; the first and third would be deleted.

Note that if the mappings were not modified as suggested in the

 

 

 

preposition, D5 would appear as follows after nodes N3, N7 , and N9
have been precessed: 5
D
a c b
7 c d
? e 7

 

 

 

 

In this case, when node N5 is processed the data vector would centain

(a, c, b) rather than (a, e, b).

94
Proposition_5.15

For any repeatable Cepy Free Program Structure there exists an
equivalent Finite Program Structure.
23.92::

If there is never more than one entry in any data queue in the COpy
Free Program Structure, then critical races do not exist and an equivalent
Finite Program Structure can be constructed by using exactly these nodes
of the Cepy Free Program Structure.

Assuming that multiple entries in the data queues are possible, an
equivalent Finite Program Structure will be constructed. The nodes of the
COpy Free Program Structure will be used and various alterations will
be made to them. Performance of mappings can cause different actions
to be taken depending on which structure is being used. Another problem
which can arise is that it is possible for more than one cepy of a node
to be known at the same time in a Finite Program Structure whereas the
queueing discipline of the COpy Free Program Structure prevents this.

The multiple cepy problem can be eliminated by changing the execution
algorithm so that the oldest copy of a node is always processed first. This
forces the nodes of a Finite Program Structure to be processed in a
first-in, first-out manner which is identical to the manner they would

be processed in a Cepy Free Program Structure.

The other problem to be resolved is forcing the mappings of the
structure to behave identically. If any mapping to any node maps to n
elements of the data vector (u > 1) all that is necessary is to rewrite
that mapping in terms of n mappings where each maps to exactly one

element. The original mapping would then be replaced by n distinct

95
mappings to the same node. Although there may be many more mappings in the

Finite Program Structure, these additional mappings will assure duplicate
results for the Finite Program Structure. Clearly, a Finite Program
Structure has been constructed that behaves as though it were a repeatable

COpy Free Program Structure.

Note that there are other ways in which an equivalent FPS can be
constructed that do not involve the changing of the execution algorithm.
One such method would be the invocation of preposition 4.1, i.e., making
the FPS deterministic. The second step of this process would be to
rewrite the mappings as was done in the previous preposition. In practice,
it would not be desirable to make an FPS deterministic since only one

node could be processed at any time.

CHAPTER VI
SUMMARY AND CONCLUSIONS

6.1 Summagy

This dissertation has presented a novel approach to the problem of
multiprocessing. The notions of shared memory and asynchronous Operation
are widely used in other models. However, in most cases, these techniques
are limited because of the necessity to prevent two or more processors
from concurrently accessing the same memory location.

The concept of the cepy of a node is a new notion. Although seemingly
innocuous, it is this feature that eliminates many of the classical problems
of parallel processing as well as simplifying the entire process.

Chapter I presents an introduction and discusses previous work. A
statement of the problem is also included.

A formal definition of Program Structures is presented in Chapter II
interspersed with several motivational examples.

Some of the capabilities of Program Structures are discussed in

Chapter III and it is shown how the set of all inputs of certain Program
Structures can be partitioned in equivalence classes. A formalism is
introduced which can be used in the analysis of Program Structures. This
formalism proves to be most helpful as a tool and as a notational aid
when studying various SSpects of Program Structures.

Chapter IV discusses criteria necessary to ensure that the output
produced by a Program Structure is unique. This prOperty is called
repeatability and it is assured based on the absence of any critical
races. The maximally parallel form is investigated and it is shown that

any Program Structure can be written in this form.

96

97

The problems of interference and deadlock are presented in Chapter
V and solutions to these problems are discussed. Finite Program
Structures and Cepy Free Program Structures are defined and their
prOperties are studied with an emphasis on elimination of deadlock.
Finally, Finite Program Structures and Cepy Free Program Structures are

compared and conditions for equivalence of these structures are established.

6.2 Conclusions

The intention of this dissertation was to develOp a multiprocessing
model that was relatively easy to use and free from the problem associated
with earlier models. Whether or not the model discussed here is easy to
use may be a value judgement but clearly many of the classical prOblems
have been either eliminated or diminished. A major innovation in this
dissertation is the development of the concept of a cepy of a node.
It is this prOperty that makes the notion of shared memory take on a new
meaning.

It is no longer necessary to be concerned with the possibility
of two or more processors accessing shared memory simultaneously.
Consequently, there is no need to establish procedures for mutual
exclusion. It is the copy concept that reduces the seriousness of
problems such as repeatability, interference, and deadlock as well as
aiding in finding solutions for these problems.

Other innovations are the creation of a single node type and
the elimination of a separate control structure. Some models have over
ten node types each with precisely defined input, output and computation,
see (1, 3, 36, 41). Many models have employed two structures to represent

parallel computation; one structure to represent data relationships and

98
the other to represent the control mechanism of the model. In this
dissertation, there is only one structure and it serves as both a represen-
tation of data and control. Not only is the model highly readable, it
is also very usable in a wide variety of applications as suggested by the

examples in Chapter II.

6.3 Future Work

A t0pic which has received a great deal of attention in recent years
is "Structured Programming". This is intended to be a technique for
sequential programming where only three types of control structures are
allowed: Linear sequence, selection, and repetition. Selection refers
to conditional Operation and repetition refers to 100ping. These three
types of Operations are closely related to Program Structures and further
investigation could formalize these relationships. If it can be shown that
Program Structures can implement and structured program then many of the
results any techniques recently develOped in this area can be utilized.

In particular, study of translator writing systems using this marriage
could prove most interesting. The paper by Lincoln (32) presents some
ideas in this direction.

Another worthy area of research with respect to Program Structures is
the study of efficiency. No material has been presented which considers
the Speed of execution of a Program Structure or even the amount of memory
required. There can be numerous "bottlenecks" which can degrade system
performance. If these bottlenecks can be detected, various scheduling
schemes could be develOped that would either eliminate them or at least
minimize their effect. Since the amount of memory required by an FPS or a

CFPS is critical, a study of minimum memory requirements would also be in order.

99

Finally, another possible area of investigation is proving program
correctness of Program Structures. The problem of proving correctness
should be simplified somewhat due to the modular construction of Program
Structures. There have been a number of tools developed in this dissertation
such as the analysis formalism and equivalence classes of data which could
be most useful. Assertions about the behavior of Program Structures could
be made at the node level and this should have the effect of reducing the
complexity of the problem and making it possible to verify large scale

programs.

BIBLIOGRAPHY

10.

11.

12.

13.

BIBLIOGRAPHY

Adams, D. A., "A Model for Parallel Computations", Parallel
Processor Systems,_Technologies and Applications, L. C. Hobbs,
et al, Spartan Books, New York, pp. 311-333, 1970.

Anderson, James P., "Program Structures for Parallel Processing",

Baer, J. L., "A Survey of some Theoretical Aspects of Multiprocessing",
ComputinggSurveys, Vol. 5. PP. 31-80. March 1973.

Barnes, G. H., et al, "The ILLIAC IV Computer", IEEE Transactions on
C uters, pg. 7%, August 1968.

Bredt, T. H., et al, "Analysis and Synthesis of Control Mechanisms
for Parallel Processes", Parallel Processor SystemslgTechnolggies

237-295. 1970.

Brinch Hansen, Per, "Concurrent Programming Concepts", Qggpgtigg_
Surve , Vol. 5, pp. 223-245, December 1973,

Brinch Hansen, Per, Qperating System Principles, Prentice-Hall,
New Jersey, 1973.

Brinsfield, w. A. and Miller, R. E., "On the Composition of Parallel
Program Schemata", IEEE Conference Record of the 1971 Twelfth Annual
Symposium on Switching and Automata Theory. pp. 20-23, 1971.

Cadiou, J. M. and Levy, J. J., "Heehanizable Proofs about Parallel
Processes", IEEE Conference Record of the 1973 Fourteenth Annual
Symposium on Switchingiand Automata Theory. PP. EEZZS, 1973.

Campeau, J. 0., "Communication and the Sequential Problems in the
Parallel Processor", Parallel Processor_Systems, Technologies and
Applications, L. C. Hobbs, et al, Spartan Books, New York, pp.
215-23h, 1970.

Chen, T. C., "Parallelism, Pipelining, and Computer Efficiency",
Computer Design, Vol. 10, pp. 69-76, January 1971.

Chen, Y. E. And Epley, D. L., "Memory Requirements in a Multiprocessing
Environment", Journal of the ACM, Vol. 19, pp. 57-69, January 1972.

CODASYL DeveIOpment Committee, "An Information Algebra", Comm. ACM,
Vol. 5. pp. 190-204, April 1962.

100

1h.

15.

16.

17.

18.

19.

20.

21.

22.

23.

2#.

25.

26.

27.

28.

29.

Coffman, E. G. Jr. and Denning, P. J., Operating Systems Theory,
Prentice-Hall, New Jersey, 1973.

Constable, R. L. and Gries, D., "On Classes of Program Schemata",
IEEE Conference Record of the 1971 Twelfth Annual Symposium on
Switching_and Automata Theory, pp. 5-19, 1971.

Conway, M. R., "A Multiprocessor System Design", AFIPS Conference
Proceedings 1963 Fall Joint Computer Conference, Vol. 29, Spartan
Books, New York, pp.’1ho-1LS, 1963.

 

Denning, P. J., "The Working Set Model for Program Behavior",
Comm. ACM, Vol. 11, pp. 323-333, May 1968.

Dennis, J. B. and van Horn, E. C., "Programming Semantics for
Multiprogrammed Computations", Comm, ACM, Vol. 9, pp. 1R3-155,
March 1966.

Dijkstra, E. N., "COOperating Sequential Processes", Proggammigg
Iégggaggg, Gunuys, F., ed, Academic Press, New York, 19 .

Dijkstra, E. N., "The Structure of "THE" - Multiprogramming System“,
comm. ACM’ V01. 11, pp. 3l+1-31+6, My 1968.

Fleck, A. C., "Towards a Theory of Data Structures", Journal of
Computer and System Sciences, Vol. 5, pp. #75-488. 1971-

Gilbert, P. and Chandler, N. J., "Interference between Communicating
Parallel Processes", Comm. ACM, Vol. 15, pp. #27-437, June 1972.

Habermann, A. N., "Synchronization of Communicating Processes",
com. A04, V01. 15, pp. 171-176, ””611 1972.

Hellerman, R., Digital Computer System Principles, HoGraw-Hill,
New York, 1973.

Hill, F. J. and Peterson, G. R.,.Digital Systems: Hardware Organization
and Desi , John Wiley and Sons, New York, 1973.

Hoare, C. A. R., "A Structured Paging System", Computer Journal,
Vol. 16, pp. 209-219, August 1973.

Holt, R. C., "Some Deadlock Pr0perties of Computer Systems",
Computing Surveys, vol. 4, pp. 179-195, March 1972.

Joyner, w. H. Jr., "Automatic Theorem-Proving and the Decision
Problem", IEEE Conference Record of the 1973 Fourteenth Annual
Symposium on Switching and Automata Theory, pp. 159-166, 1973.

Karp, R. H. and Miller, R. E., "Parallel Program Schemata",
Journal of Computer and System Sciences, Vol. 3. pp. 197-195.
May 1969.

101

30. Keller, R. M., "0n Maximally Parallel Schemata", IEEE Conference
Record of the 1970 Eleventh Annual Symposium on Switching_and
Automata Theory, pp. 32-50. 1970.

31. Knuth, D. E., The Art of Computer Programming, Vol. 1, Addison-Wesley
Massachusetts, 1969.

32. Lincoln, N., "Parallel Programming Techniques fbr Compilers",
SIGPLAN Notices, pp. 18-31, October 1970.

33. Llewellyn, J. A., "The Deadly Embrace - a Finite State Model Approach",
Computer Journal, Vol. 16, pp. 223-225, August 1973.

34. Luconi, F. L., Asynchronous Control Structures, M. I. T. Project MAC
report MAC-TR-49, M.I.T., Massachusetts, 1968.

35- McIntyre, D. E., "An Introduction to the ILLIAC IV Computer",
Datamation, April 1970.

36. Miller, R. E., "A Comparison of some Theoretical Models of Parallel
Computation", IEEE Transactions on Computers, pp. 710-717, 1973.

37. Organick, E. I., The Multics System, M.I.T. Press, Massachusetts, 1972.

38. Rabin, M. 0. and Scott, D., "Finite Automata and their Decision Problems",
IBM Journal of Research and DevelOpment, Vol. 3, pp. 114-125, April 1959.

39- Randell, B. and Kuehner, C. J., "Dynamic Storage Allocation Systems",
Comm. ACM, Vol. 11, pp. 297-306, May 1968.

40. Regis, R. C., "Multiserver Queueing Models of Multiprocessing Systems",
IEEE Transactions on Computers, pp. 736-745, 1973.

41. Rodriguez, J. E., A Graph Model for Parallel Computations, M.I.T. Project
MAC report MAC-TR-64, M.I.T., Massachusetts, 1969.

42. Rutledge, J. D., "On Ianov's Program Schema", Journal of the ACM,
pp. 1-9, January 1964.

 

43. Slutz, D. R., The Flow Graph Schemata Model of Parallel Computation,
M.I.T. Project MAC report MAC-TR-53, M.I.T., Massachusetts, 1963.

44. Stone, H. S., "A Pipeline Push-down Stack Computer", Parallel
Processor Systems, Technologies and Applications, L. C. Hobbs, et al,
Spartan Books, New York, pp. 235-249, 1970.

45. Tsichritzis, D. C. and Bernstein, P. A., Operating Systems, Academic
Press, New York, 1974.

102

 

"‘mmmmIn“