TOWARD A GENERAL THEORY OF UNICAST-BASED MULTICAST COMMUNICATION

By

Barbara D. Birchler

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science

1997

ABSTRACT

TOWARD A GENERAL THEORY OF UNICAST-BASED MULTICAST COMMUNICATION

By Barbara D. Birchler

A common communication pattern that arises in many parallel and distributed computing applications is multicast, or one-to-many communication. In multicast, a single node in the system sends a message to multiple destinations in the system. Due to the complexity of implementing multicast in hardware, most machines offer only unicast, or one-to-one communication, in hardware. As a result, multicast must be implemented in software. A common software technique is Unicast-Based Multicast (UBM), in which multicast is accomplished by issuing multiple hardware unicast messages. In a UBM implementation, the number of nodes that can be informed can at most double in each time step, thus the lower bound on the time to complete multicast is ⌈log₂(d + 1)⌉, where d is the number of multicast destinations.
The goal of our research is to develop a general ⌈log₂(d + 1)⌉ UBM implementation for realistic direct network systems. Current direct network systems obey the line-switching model of communication and employ an oblivious routing scheme. In the line-switching model, the paths that are used to send messages simultaneously in the network must be edge-disjoint. The paths that are used to deliver messages are determined by the routing scheme of the network. In an oblivious routing scheme, each pair of nodes in the network must use a specific prescribed path to deliver a message between them. We introduce two realistic classes of oblivious routing schemes and develop line-switching UBM algorithms for arbitrary topologies that use these schemes. First, we address source limited inclusive routing schemes. We show that these routing schemes can be modeled by directed tree topologies. Within the context of directed trees, we show that a line-switching UBM algorithm can be closely approximated by a node-switching algorithm, in which paths used simultaneously must be node-disjoint as opposed to edge-disjoint. We then develop node-disjoint UBM algorithms for arbitrary directed tree topologies. The second class of routing schemes are shortest path oblivious routing schemes, in which the oblivious path between any pair of nodes must be a shortest path in the network. We develop a ⌈log₂(d + 1)⌉ UBM implementation for arbitrary direct networks that employ a shortest path routing scheme. This result provides a theoretical basis that unifies the previous results on optimal multicast algorithms in specific direct network topologies. In addition, this result provides system designers with some simple, intuitive rules for creating routing schemes that guarantee multicast can be performed in minimum time.

© Copyright 1997 by Barbara D. Birchler. All Rights Reserved.

To Edgar, who always says I can do it.

ACKNOWLEDGMENTS

This dissertation is the product of prayers, meaningful conversations, tears, and laughter, not only of my own, but of many others to whom I am forever in debt. The words that I write here cannot fully express my thanks, but I want you all to know that I truly appreciate the love, patience, and commitment that you have shown me these past three (or really five) years. I want to thank the members of my committee. This, of course, would not have been possible without them. I thank Dr. Ni for giving insight into the "practical" and for his willingness to lend an ear and give advice whenever I asked. I will always aspire to give lectures as exciting as Dr. Palmer's "weird stuff" lectures. I pray that he never loses his excitement for graph theory. He appreciates mathematical beauty, and that is truly inspiring. I think Dr. Palmer appreciated my crazy graph cupcakes and attire more than anybody, and that really means a lot to me even if it seems like a small thing. Of course most of the credit goes to my advisors. I thank Dr. Eric Torng for many insights and for lots of patience. From him I learned many valuable lessons about being a student, a researcher, and an advisor. Finally, if it were not for Dr. Esfahanian's confidence in my abilities, I would never have attempted to earn a PhD. He has been the best kind of mentor a student could hope for. He shows great excitement for his work (of course it's very easy to get excited about graph theory), he is a concerned teacher and advisor, and he has always shown me the utmost respect. Thank you.
vi I also owe many thanks to my family — my immediate family and my church family. I thank Christ Presbyterian Church in Pennsylvania for remembering me even though I live so far away, and I thank University Reformed Church for giving me a new home. A special thanks to the URC youth group, the Koinonia class, and the SIGAWOTS for their constant prayers and for helping me put everything into proper perspective. I thank my parents for their prayers and support, and for instilling in me that “protestant work ethic” that brought me here. I thank my brothers Rob (Four) and Pete (If-n-Six), the coolest numbers I have ever known, and my sister Deb for providing me with a get-away whenever I needed it and for her helpful insights about the many challenges life always seems to have to offer. I give a HUGE thanks to my nephew Ashwin and my niece Marissa ‘ who remind me of what is really important in life. They have never been anything but a joy to me. Perhaps my most heartfelt thanks goes out to my friends who have been through it all along side me. I thank the “old” lunch group - Steve Turner, MP Jolly, Ron Sass and Edgar Kalns - who taught me how to be a grad student when I was totally clueless. I’m especially grateful to Ron - we have been through a LOT for a LONG time. I thank him for his willingness to listen and to share and to rescue me from Latex and all the other “computer stuff” that gives me (and never anyone else) problems. I also thank the “qualifier group” - Chuck Severance, Mark Brehob, Steve Wagner (icii), Natawut Nupairoj, Hugh Smith, and Jerry Gannod. We suffered together and we rejoiced together, but most importantly we LEARNED together!! I thank my friend, icii. He is certainly the most innovative, creative, intelligent individual I know. We’ve been through thick and thin together. He was there when everyone else was and when nobody else was. To him I am forever grateful. Thanks for proof reading my papers (even the yucky notation and tedious proofs), for listening to me when I was a psych and letting me use your handkerchief, for working with me, for playing with me, for showing me your smiles and your frowns, and just for being an ic. “Friendship lasts forever.” vii I don’t think there are words to express my gratitude to Edgar Kalns. He was my inspiration for this task. Edgar was a great example for me to follow. He always shows great confidence in my abilities, and he always knows how to motivate me and pick me up when I’m down. His love and support have been a constant source of strength and comfort to me. He also has a “plethora” of endearing qualities that are certainly too numerous to mention here. Edgars, es tev milu. Finally, I have saved the best for last. My deepest thanks to Jerry. Thanks for your prayers, your advice, your jokes, your hugs, your patience, for reminding me that worry accomplishes nothing: “Who of you by worrying can add a single hour to his life? Since you cannot do this very little thing, why do you won'y about the rest?” (Luke 12:25-26) and most of all for your eternal commitment to see me though this and through every other challenge I will face. I love you, Jerry - not just that either— I’m crazy about you!! viii TABLE OF CONTENTS LIST OF TABLES LIST OF FIGURES 1 Introduction 1.1 Motivation ......................................... 1.2 Contributions ....................................... 1.2.1 Source Limited Inclusive Routing Schemes ...................... 1.2.2 Shortest Path Routing Schemes ............................ 
1.3 Organization of the Dissertation ............................. 2 Background 2.1 Direct Network Model .................................. 2.2 Switching Techniques ................................... 2.3 Direct Network Topologies ................................ 2.3.1 Mesh .......................................... 2.3.2 Toms .......................................... 2.3.3 Hypercube ....................................... 2.3.4 Symmetric Communication .............................. 2.4 Graph Theoretic Formulation ............................... 2.5 Multicast Classification .................................. 2.5.1 Implementation ..................................... 2.5.2 Software Methods ................................... 2.5.3 Switching ........................................ 2.5.4 Routing Schemes .................................... 2.5.5 Routing Information .................................. 2.6 Legal UBM Calling Schedules .............................. 2.6.1 Inter-Class Relationships ................................ 2.7 Performance Measures .................................. 2.7.1 Optimality Implications ................................ 3 Related Work 3.1 Unicast-Based Multicast Communication in Direct Networks .............. 3.1.1 Neighbor-Switching (Class 7) ............................. 3.1.2 Line-Switching (Classes 1-3) .............................. 3.1.3 Node-Switching (Classes 4—6) ............................. 3.2 Tree-Based and Path-Based Communicaton (Classes 8-9) ................ 3.3 Different Communicaton Models ............................. 3.4 Multicast in Systems Not Based on Direct Networks .................. 3.5 Matching Problems in Graphs ...... . ........................ 3.6 Summary ......................................... ix 5. \l‘l GMWN—fl E: t—In-it—It—tt—bu—It—i—tu—t QQO‘D’UN-‘F‘i-‘NO 19 28 34 36 36 38 39 42 4 Source Limited Inclusive Routing 44 4.1 Oblivious Routing Models ................................ 44 4.1.1 Difficulties in Modeling General Oblivious Routing .................. 45 4.1.2 Inclusive Routing .................................... 47 4.1.3 Source Limited Routing Information .......................... 49 4.2 Properties of Multicast in Directed Trees ......................... 50 4.2.1 Definitions ....................................... 51 4.2.2 Decomposing Algorithms ............................... 52 4.2.3 Relating Node-switching to Line-switching ...................... 53 4.2.4 Reducing Multicast to Broadcast ............................ 55 4.2.5 Lower Bound on Broadcast in k-ary Ditrees ...................... 57 4.3 Summary . . ‘ ....................................... 61 5 Node~Switching Algorithms for Directed Trees 62 5.1 Centroid Algorithm .................................... 63 5.1.1 Algorithm Description ................................. 63 5.1.2 Time Complexity .................................... 65 ' 5.1.3 CA in Meshes and Hypercubes ............................. 66 5.2 Smart Centroid Algorithm ................................ 71 5.2.1 Algorithm Description ................................. 72 5.2.2 Time Complexity .................................... 75 5.3 Optimal Decomposing Algorithm ............................ 77 5.3.1 Definitions ....................................... 77 5.3.2 The Optimal Substructure ............................... 80 5.3.3 Computing Root Schedules and Labor Vectors .................... 84 5.3.4 Computing |ND(T)| .................................. 
86 5.3.5 Time Complexity .................................... 90 5.4 Summary ......................................... 92 6 Shortest Path Oblivious Routing 93 6.1 The Generalized Path-Matching Problem ........................ 93 6.2 Sufficient Conditions for Perfect GDP-matching ..................... 95 6.3 Multicast Algorithm using GDP-matching ........................ 99 6.4 Summary ......................................... 102 7 Conclusions and Future Work 104 7.1 Conclusions ........................................ 104 7.1.1 Source Limited Inclusive Routing ........................... 104 7.1.2 Shortest Path Oblivious Routing ............................ 105 7.2 Future work ........................................ 106 7.2.1 Restricted Routing Model ............................... 106 7.2.2 Time Efficient Line-Switching UBM Algorithms ................... 106 7.2.3 Traffic Efficient Line-Switching UBM Algorithms ...... ' ............ 107 7.2.4 Partially Restricted Routing .............................. 108 BIBLIOGRAPHY 109 LIST OF TABLES 4.1 Oblivious Routing Table ................................. 46 4.2 Source Limited Oblivious Routing Table ......................... 50 5 .1 Time Complexity for ODA ................................ 91 xi LIST OF FIGURES 2.1 Direct network model .................................. 2.2 A 2 x 3 two~dimensional mesh .............................. 2.3 A 2 x 3 two-dimensional torus .............................. 2.4 A three-dimensional hypercube .............................. 2.5 Digraph representations of direct networks ....................... 2.6 Graph model of a 4 x 4 Mesh .............................. 2.7 Graph models of direct networks ............................. 2.8 Multicast problem classfication .............................. 2.9 Oblivious routing schemes. ............................... 2.10 Legality relationships for UBM Calling Schedules .................... 3.1 Multicast in node-switching model. ........................... 3.2 Broadcast trees for K3. .................................. 3.3 Broadcast center of a tree. ................................ 3.4 Multicast subgraph ..................................... 3.5 Matchings in graphs. ................................... 4.1 Graph and Digraph representations for rings ....................... 4.2 Multigraph model of oblivious routing ...... . .................... 4.3 Oblivious routing paths .................................. 4.4 Ditree T with source node r ............................... 4.5 Node-switching vs. Line-switching ............................ 4.6 Mapping multicast to broadcast. ............................. 4.7 Broadcast requires DU: log,‘ n). ............................. 5.1 Centroid Algorithm at work. ............................... 5.2 Directed tree representing 2" x 2'” mesh. ........................ 5.3 The binomial tree 3);. .................................. 5.4 Ditree representation of a Iii-node hypercube. ...................... 5.5 CA performs poorly. ................................... 5.6 T.-’s for broadcast schedules C" (T) and 02(T). ..................... 6.1 Cycle on 8 vertices ..................................... 6.2 Two conflicting paths. ...................... p ............ xii ll 12 12 14 16' 66 Chapter 1 Introduction 1.1 Motivation The grand challenges in scientific computing, e. g., pharmaceutical design, ocean modeling, and viscous fluid dynamics, can only be solved by computers that can provide Terafiops of computing power. 
Parallel computers are considered the most promising technology to solve such problems [30]. Because many (hundreds to thousands of processors) work together to solve problems, ef- ficient communication among the nodes of a multicomputer is a vital component of system per- formance. One communication pattern that arises in parallel systems is multicast, or one-to-many communication. In multicast, one processor, called the source, sends a single message to multiple destinations in the system. Unicast and Broadcast are special cases of multicast. Unicast, also called one-to-one communication, involves a single source and a single destination. In broadcast, or one-to-all communication, the source sends a message to all of the other nodes in the network. Multicast communication is found in various applications such as parallel search algorithms [16], graph algorithms [34], and barrier synchronization [39]. Multicast is also used for cache-coherency protocols in distributed shared memory systems” [36], and it has been defined in the MP1 (Message 1 2 Passing Interface) standard [22] as an essential operation for implementing other collective commu- nication operations such as all-to-all gather and global reduction in distributed memory machines. Because multicast communication is a component of many applications in parallel systems, efficient multicast implementation has been the subject of much study. This dissertation addresses multicast communication paradigms for an important class of par- allel systems, namely those that use a direct network interconnection structure [45]. Due to the complexity of hardware implementations of one-to-many communication, most existing direct net- work systems only support unicast communication in hardware. Thus, multicast must be provided in software. A common software technique is Unicast—Based Multicast (UBM), in which multicast. is accomplished by issuing multiple hardware unicast messages. A UBM implementation consists of a number of time steps in which nodes that have previously received the multicast message send the message to uninformed nodes. Ideally, the UBM implementation should maximize the amount of parallel communication; that is, all the informed nodes should send messages during a single time step in order to reduce the total number of steps needed to inform all destinations. Thus, the lower bound on the time to complete multicast is [lg(d + 1)] steps‘, where d is the number of multicast destinations. The objective of this research is to find a general [lg(d + 1)] UBM implementation for realistic direct network systems. 1.2 Contributions We study the problem of finding efficient line-switching UBM algorithms for arbitrary direct net- work systems that use an oblivious routing scheme. In the line-switching model of communication, all of the paths that are used to deliver messages simultaneously must be edge-disjoint. We focus on the line-switching communication model because it most closely fits popular direct network ar- ‘We use the convention lg n = log, n. 3 chitectures; however, we will use other communication models as stepping stones toward efficient line-switching implementations. In addition, we address oblivious routing schemes, in which each pair of nodes in the network must use a specific prescribed path to deliver a message between them. The routing schemes in current systems, such as xy-routing in a mesh and e-cube routing in a hy- percube, are examples of oblivious routing schemes. 
Finally, we address arbitrary topologies with two goals in mind. First, an algorithm for an arbitrary topology will be portable, i.e., it can be used on any multicomputer topology. For example, a multicast algorithm developed for an arbitrary topology can be used on a Mesh, Hypercube, Torus, etc. The second reason we consider arbitrary topologies is that a general multicast algorithm gives practitioners insight about defining the best routing scheme for an arbitrary system. Routing schemes such as xy-routing and e-cube routing exist for particular multiprocessor systems; however, there is no general routing scheme for arbitrary topologies. The main contributions of this dissertation are the introduction of two important classes of routing techniques for direct network systems, namely Source Limited Inclusive Routing and Shortest Path Routing, and the development of UBM algorithms for systems that employ these techniques. 1.2.1 Source Limited Inclusive Routing Schemes Definitions and Properties We define a realistic class of oblivious routing schemes called source limited inclusive routing schemes. This class of routing schemes makes two assumptions. First, it assumes that each node is only aware of its own oblivious routing paths. We call such a routing table a source limited routing table. Next, it assumes the system uses an inclusive routing scheme, a natural class of oblivious routing schemes used in most current multicomputers. We show that directed trees can be used to model source limited routing tables for an arbitrary topology that uses inclusive routing, and that a directed tree model has a smaller storage requirement than a general routing table. Furthermore, as long as the maximum outdegree of the directed tree is not too large, this source limited routing table contains many (but not all) of the oblivious routing paths used by other nodes in the network. Our first lower bound result shows that UBM in an arbitrary topology that supports oblivious routing is fundamentally different than UBM in an arbitrary topology that supports free routing. Specifically, we show that if the source limited routing table at a node is a complete, balanced, k-ary tree T with |V(T)| nodes, a lower bound on the number of time steps necessary for performing a broadcast from this node is Ω(k log_k |V(T)|), and we show this lower bound can be easily extended to apply to algorithms which have complete routing tables. As a first step toward solving the general lower bound line-switching multicast problem, we show that the minimum length node-disjoint multicast schedule in any directed tree T is at most twice as long as the minimum length edge-disjoint multicast schedule for ditree T. We then show that we can reduce the problem of finding an optimal node-disjoint multicast schedule in a directed tree T to the problem of finding an optimal node-disjoint broadcast schedule in a related ditree T'. Node-Disjoint UBM Algorithms for Ditrees Because we can reduce multicast in a ditree to an equivalent broadcast problem, we focus on UBM algorithms for broadcast in directed tree topologies. We develop two polynomial time approximation algorithms, CA and SCA, for performing broadcast in directed trees under the node-switching model of communication. The biggest advantage of the CA algorithm is its simplicity. Its best case time complexity is O(n lg n) (which it achieves for broadcast in hypercubes, meshes and tori), and its worst case complexity is O(n²).
The second approximation algorithm (SCA) has slightly higher complexity (0(n2) in the best case and 0(n3) in the worst case), but it may also produce much shorter calling schedules than the CA algorithm. The broadcast schedules produced by SCA are 5 guaranteed to be no more than twice the length of an optimal length broadcast schedule. We also describe a 0(n3) algorithm that always produces optimal length node-disjoint broadcast schedules for directed tree topologies. This can easily be transformed to an edge-disjoint broadcast schedule with length at most twice that of an optimal edge-disjoint schedule. Because ditrees rep- resent a class of routing schemes (source limited inclusive routing schemes), our algorithms can be applied to any topologies in that class (e. g., hypercubes, meshes, and tori) as well as other arbitrary topologies. 1.2.2 Shortest Path Routing Schemes The Generalized Path-Matching Problem We also introduce a generalization of previously studied matching problems for undirected graphs. We define the Generalized Path-Matching (GP-matching) problem, which includes both the edge- matching problem and the path-matching problem. In a GP-matching, a subset X of vertices is matched using paths from a specified set P. We are interested in finding a perfect GP-matching, or one that matches all the vertices of X. Additionally, we focus on the edge-disjoint GP-matching problem (or GDP-matching problem). Sufiicient conditions for the existence of a perfect GDP- matching are given. It is shown that if shortest paths are used to match vertices, a perfect GDP- matching always exists. A UBM Algorithm for Shortest Path Routing Schemes We show that the GDP-matching problem can be used to create an optimal UBM algorithm for an arbitrary direct network that uses a shortest path routing scheme. Because meshes, hypercubes and tori typically use shortest path routing schemes, this result implies that optimal UBM can be done in these direct network topologies. Thus, this result provides a theoretical basis that unifies all 6 previous results on optimal multicast algorithms in specific direct network topologies [21, 46, 47]. In addition, this result provides system designers with some simple, intuitive rules for creating routing schemes that guarantee multicast can be performed in minimum time, and it gives insight about updating routing schemes when adding on to a network incrementally. 1.3 Organization of the Dissertation Chapter 2 describes the background on multicast communication in direct network systems. Chap ter 3 discusses related work in multicast communication and matching problems in graphs. Chap. ter 4 describes source limited inclusive routing and several of its properties. Chapter 5 gives two ap- proximation algorithms and one optimal algorithm for multicast in ditrees under the node-switching model of communication. Chapter 6 describes the generalized path-matching problem and its appli- cation to multicast in shortest path oblivious routing systems. Chapter 7 concludes the dissertation and discusses several areas for future investigation. Chapter 2 Background 2.1 Direct Network Model Efficient multicast communication is needed in parallel and distributed systems with widely varying architectures. The specific organization of the communication network used by the system restricts the manner in which multicast can be implemented. In this dissertation, we study multicast com- munication in networks that are based on a direct network interconnection structure. 
Many direct network systems, for example the Intel Touchstone DELTA and the Intel Paragon [30], use the generic node architecture shown in Figure 2.1(a). Each node consists of a processor, local memory, and a separate router to handle communication-related tasks. Each router has several input and output channels. The router uses internal channels to communicate with its associated processor, and external channels are used for communication between nodes in the network. Normally, each input channel is paired with a corresponding output channel. Thus, the number of input and output channels is identical. If there is only one (input, output)-pair of internal channels, the system is called a one-port architecture. In a one-port architecture, a node can send only one message at a time and receive only one message at a time.
Figure 2.1: Direct network model. (a) Generic node architecture; (b) a 5-node network.
Because memory is not shared, all communication in a direct network system is done via message passing. Communication between nodes is either direct or indirect. Two nodes can communicate directly if an external output channel of one node is connected to an external input channel of the other node. For example, Figure 2.1(b) shows a possible interconnection of five nodes. In this network, node 1 can communicate directly with node 2. In contrast, sending a message from node 1 to node 3 requires indirect communication because other nodes must be used to relay the message. There are two noteworthy characteristics of systems that use the generic node design of Figure 2.1(a). First, the router is typically designed using a crossbar, which permits simultaneous transmission of several distinct messages provided each incoming message requires a different output channel. Second, the separation of the router from the local processor makes the router capable of relaying messages on its external channels without interrupting the local processor, allowing simultaneous computation and communication within each node [45]. To illustrate these concepts, consider the network in Figure 2.1(b). Suppose node 1 sends a message to node 3, using node 2 as an intermediate node (shown by the dotted line). Node 3 can simultaneously send a message to node 5, also using node 2 as an intermediate node (shown by the dashed line), because it requires a different external channel of node 2's router. Furthermore, the computation taking place on node 2's processor is not interrupted because only node 2's router is being used to relay messages. 2.2 Switching Techniques When a message is sent between two nodes in a network, the message travels along a routing path (which may be a direct communication link or a sequence of several communication links) in the network. The time needed to deliver the message depends on the characteristics of the underlying system. One factor that contributes to the message transmission time is the network latency of the system, or elapsed time between when the head of the message enters the network and the time when the tail of the message is received. Network latency is highly dependent on the type of switching mechanism that the system employs. Essentially two types of switching have been used: store-and-forward switching and cut-through switching. Store-and-Forward Switching Early direct networks used store-and-forward switching techniques.
In a system that uses store-and-forward switching, an entire message is transferred from a node v to one of its neighbors w. If the message is not destined for w, i.e., w is an intermediate node, the message is forwarded to one of w's neighbors when a buffer becomes available. The network latency for the store-and-forward switching technique is (L/b)d, where L is the message length, b is the channel bandwidth, and d is the distance (number of links) the message must traverse. Because each message must be entirely received at a node before it can be forwarded, the amount of time needed to deliver the message is proportional to the path length d. For a more detailed discussion of routing algorithms based on store-and-forward switching see [48]. Cut-Through Switching Cut-through switching was developed to reduce message transmission time. A cut-through scheme supports non-local communication primitives; that is, a node v can send a message to a non-neighboring node w in "unit" time. This is possible because messages are not stored at intermediate nodes unless the next required channel is unavailable. The message is sent along the path in a pipelined fashion. The network latency for cut-through switching is (L_h/b)d + L/b, where L_h is the length of the header, b is the channel bandwidth, d is the length of the path traversed, and L is the total length of the message. If L_h is significantly smaller than L, then the dominant term is L/b. That is, network latency is relatively unaffected by d. In a multicast implementation, each node is sending essentially the same message (the message headers may be slightly different, but this will not significantly affect the message length). Thus, we can assume that any (reasonably sized) multicast message can be delivered in unit time, where the "unit" is L/b, regardless of the length of the path traversed by the message [45]. Two common routing schemes that use the cut-through approach are circuit routing and wormhole routing. In circuit routing, before a node v can send a message to a node w, v establishes a circuit, or a path, from v to w. The message is then sent along this path in a pipelined fashion. Because the channels on the circuit are reserved, no buffers are needed at intermediate nodes. In wormhole routing, a packet is divided into a number of flits (flow control digits). The header flit advances along the delivery path and the other flits follow in a pipelined fashion. If the header is blocked, then all trailing flits are also blocked. Consequently, small flit buffers are needed at intermediate nodes. Current commercial multicomputers such as the Cray T3D, NCUBE-2 [44], Intel Paragon [31], and IBM SP-2 [50] use wormhole routing. Ni and McKinley provide a thorough survey of wormhole routing techniques [45]. 2.3 Direct Network Topologies In a direct network system, the interconnection of the external input and output channels of the routers determines the topology of the system. Three popular network topologies that we will refer to often throughout this dissertation are: meshes, tori, and hypercubes. 2.3.1 Mesh In an n x m 2-dimensional mesh, the nodes of the network are labeled by ordered pairs (v_i, v_j), where 0 ≤ i ≤ n − 1 and 0 ≤ j ≤ m − 1. Each node has a direct connection to at most four other nodes, namely (v_{i+1}, v_j), (v_{i−1}, v_j), (v_i, v_{j+1}), and (v_i, v_{j−1}) (if they exist). Figure 2.2 shows an example of a 2 x 3 mesh.
Figure 2.2: A 2 x 3 two-dimensional mesh
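As a concrete illustration of the adjacency rule just described, the following short Python sketch (ours, not part of the original text; the function name and the representation of nodes as (i, j) index pairs are illustrative assumptions) computes the neighbors of a node in an n x m two-dimensional mesh.

```python
def mesh_neighbors(i, j, n, m):
    """Neighbors of node (i, j) in an n x m two-dimensional mesh.

    A node is directly connected to the nodes that differ by one in exactly
    one coordinate, provided those nodes exist (there is no wrap-around).
    """
    candidates = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    return [(a, b) for (a, b) in candidates if 0 <= a < n and 0 <= b < m]

# In the 2 x 3 mesh of Figure 2.2, a corner node has two neighbors and a
# node in the middle of a row has three.
print(mesh_neighbors(0, 0, 2, 3))   # [(1, 0), (0, 1)]
print(mesh_neighbors(0, 1, 2, 3))   # [(1, 1), (0, 2), (0, 0)]
```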
Higher dimension meshes are defined similarly. An n-dimensional mesh has nodes that are labeled by n-tuples (v_1, . . . , v_n). Each node has a direct connection to at most 2n other nodes that differ by one in each component of its n-tuple. 2.3.2 Torus An n-dimensional torus is an n-dimensional mesh with wrap around edges. For example, in a two-dimensional torus each node (v_i, v_j) has exactly four direct connections to nodes (v_{(i+1) mod n}, v_j), (v_{(i−1) mod n}, v_j), (v_i, v_{(j+1) mod m}), and (v_i, v_{(j−1) mod m}). The edges that did not exist for the mesh topology are the wrap around edges. Figure 2.3 shows an example of a 2 x 3 torus. The wrap around edges are shown in bold lines. Higher dimensional tori are defined similarly.
Figure 2.3: A 2 x 3 two-dimensional torus
2.3.3 Hypercube An n-dimensional hypercube has 2^n nodes labeled by all possible n-bit binary strings. Each node has direct connections to n other nodes. Specifically, each node is connected to all nodes whose bit strings differ in exactly one bit. Figure 2.4 shows an example of a 3-dimensional hypercube topology.
Figure 2.4: A three-dimensional hypercube
2.3.4 Symmetric Communication Meshes, Hypercubes, and Tori have the property that if node x has an external output channel connected to an external input channel of node y, then y also has an external output channel connected to an external input channel of node x. Topologies with this property are said to have symmetric (or "two-way") communication between nodes x and y. In an arbitrary topology that uses the generic node architecture, communication may not be symmetric. For example, in the network shown in Figure 2.1(b) node 1 has a direct connection to node 2, but node 2 does not have a direct communication link to node 1. We will assume arbitrary topologies do not have symmetric communication unless otherwise stated. 2.4 Graph Theoretic Formulation When developing a multicast algorithm, it is convenient to have a mathematical model of the system. In particular, the topology of a direct network system can be modeled by a graph, and algorithms are developed using the graph model. In the system described in Section 2.1, each of the communication links from an external output channel of one node to an external input channel of another node represents a one-way communication link. Consequently, communication is not necessarily symmetric. Networks with this organization are best modeled by directed graphs, or digraphs. That is, a direct network topology is modeled by a digraph G = (V, A), where V represents the nodes in the network and A represents the communication links. For example, the digraph in Figure 2.5(a) models the network pictured in Figure 2.1(b), and Figure 2.5(b) is a digraph that represents a 4 x 4 mesh topology. Topologies that provide symmetric communication can be modeled by an undirected graph G = (V, E), where V represents the nodes of the network and E represents two-way communication links between nodes.
Figure 2.5: Digraph representations of direct networks. (a) The network in Figure 2.1(b); (b) a 4 x 4 mesh.
In particular, if edge e = v_i v_j ∈ E(G) then v_i and v_j can send messages directly to each other along the edge e. We assume G is connected and simple, i.e., there are no loops or parallel edges. The undirected graph model of a 4 x 4 mesh is seen in Figure 2.6.
Figure 2.6: Graph model of a 4 x 4 mesh
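To make the graph-theoretic model concrete, the following sketch (an illustrative Python fragment of our own, under the assumption that vertex labels are stored as bit strings) builds the undirected graph G = (V, E) of an n-dimensional hypercube: the vertices are the n-bit strings, and two vertices are adjacent exactly when their labels differ in one bit. The same pattern, with a different neighbor rule, yields the mesh and torus graphs of Figures 2.6 and 2.7.

```python
from itertools import product

def hypercube_graph(n):
    """Undirected graph (V, E) of the n-dimensional hypercube.

    V is a list of n-bit label strings; E is a set of frozensets {u, v},
    one for each pair of labels that differ in exactly one bit position.
    """
    V = ["".join(bits) for bits in product("01", repeat=n)]
    E = set()
    for v in V:
        for k in range(n):
            u = v[:k] + ("1" if v[k] == "0" else "0") + v[k + 1:]
            E.add(frozenset((v, u)))   # undirected edge, added once per pair
    return V, E

V, E = hypercube_graph(3)
print(len(V), len(E))                    # 8 vertices, 12 edges (Figure 2.4)
print(frozenset(("110", "010")) in E)    # True: the labels differ in one bit
```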
The undirected graph of Figure 2.6 is equivalent to the digraph model in Figure 2.5(b). For the sake of simplicity, we will use a general graph model to introduce main concepts and use directed graphs in special cases described later. The graph models for a 4 x 4 torus and a three-dimensional hypercube are shown in Figures 2.7(a) and 2.7(b). Notice that these graphs are much easier to decipher than Figures 2.3 and 2.4.
Figure 2.7: Graph models of direct networks. (a) A 4 x 4 torus; (b) a 3-dimensional hypercube.
In a direct network, messages are sent between nodes using direct or indirect communication. The route that the message follows corresponds to a path in the graph model. A message is sent from source node v_i to destination node v_j along an (i, j)-path. A sequence p = v_1 v_2 . . . v_n is a path¹ if there is an edge between node v_i and node v_{i+1}, and if v_i ≠ v_j for all i, j. We call p an (i, j)-path if v_1 = v_i and v_n = v_j; the path is denoted by p(i, j). Direct communication implies that the path p(i, j) used to send a message from node v_i to node v_j has length one, i.e., p(i, j) is precisely the edge e = v_i v_j. In indirect communication, an (i, j)-path uses intermediate nodes to forward a message to its destination, i.e., p(i, j) = v_i . . . v_k . . . v_j, where v_k is an intermediate node. To initiate communication between nodes in a direct network, a message passing request is made. A message-passing request is an ordered pair M = (S, D), where S ⊆ V(G) is the source set and D ⊆ V(G) is the destination set. Message passing paradigms are classified according to the sizes of the source and destination sets. The multicast communication paradigm that we described above has the characteristics that |S| = 1 and |D| ≥ 1. Two specific cases of multicast are unicast and broadcast. Unicast, or one-to-one, communication involves a single source and a single destination, i.e., |S| = |D| = 1. In broadcast, also called one-to-all communication, D = V(G) − S. Thus, the multicast problem is formulated in graph theoretic terms as follows: Given a graph G = (V, E) and a message passing request M = (s, D) in G, find an efficient implementation for M in G. ¹A path is typically defined as an alternating sequence of vertices and edges. Since we consider only simple graphs, a path can be defined by a sequence of vertices. 2.5 Multicast Classification Before a multicast algorithm can be developed for a specific system represented by a graph, the physical characteristics of the system must be defined, and any constraints that these characteristics impose must be incorporated into the graph model and multicast implementation. Figure 2.8 shows a classification of different implementation methods for various constraints that have been defined for direct network systems. We describe each component of the classification in the following sections.
Figure 2.8: Multicast problem classification
Our research addresses only 1-port architectures.
However, the same classification applies to k-port architectures, in which each node has k pairs of internal channels. Let Class 1_k refer to a system with the characteristics of Class 1 in Figure 2.8 and a k-port architecture. We will assume all problems refer to a 1-port architecture unless otherwise stated. Thus, Class 1 (with no subscript) is understood to be Class 1_1. 2.5.1 Implementation Like any operation, multicast communication can be implemented in either hardware or software. The NCUBE-2 offered a restricted multicast implementation in hardware; however, the hardware implementation was not deadlock free and was therefore disabled [40]. Due to the complexity of hardware implementations of multicast, most existing direct network systems only support unicast in hardware. Consequently, multicast must be offered in software. We address only software implementation of multicast communication in this dissertation. 2.5.2 Software Methods Several software methods have been discussed in the literature: Unicast-Based Multicast (UBM), path-based multicast, and tree-based multicast. We describe each of these in the following sections; however, our research addresses only UBM implementations. Unicast-Based Multicast Unicast-Based Multicast (UBM), in which multicast is accomplished by issuing multiple hardware unicast messages, has become a popular software technique for implementing multicast in direct network systems [21, 46, 47, 2, 4, 5]. Farley describes a calling schedule for implementing broadcast communication [20, 21]². A calling schedule consists of several time steps of unit length during which one or more unicasts can be executed in parallel. Each unicast is characterized by three parameters: source, destination, and time. In order to clearly indicate the path used to perform a unicast, we represent a unicast as an ordered quadruple (i, j, p(i, j), t) which is read as, "Node i sends a message to node j along path p(i, j) during time step t." For a multicast request M = (S, D), where S = {s} and D = {d_1, d_2, . . . , d_n}, we define the informed set of M at time step t with respect to a calling schedule C, denoted I_t^C(M), to be the set of all nodes from S ∪ D that have received the message after time step t is executed under calling schedule C. Thus, I_0^C(M) = {s} for any C and any M. The length of a calling schedule is the number of time steps in which calls are made. To illustrate these concepts, consider multicast request M = ({v1}, {v2, v3, v5}) in the graph of Figure 2.5(a). The calling schedule C = {(v1, v2, v1v2, 1), (v1, v5, v1v4v5, 2), (v2, v3, v2v3, 2)} implements M in G. One unicast call from v1 to v2 is made during the first step. During the second step, two unicast calls are made: one from v1 to v5 and another from v2 to v3. Thus, the informed sets are: I_0^C = {v1}, I_1^C = {v1, v2}, I_2^C = {v1, v2, v3, v5}, and the length of C is two. ²Farley assumes a graph model. All concepts are easily extended to a digraph model by making edges directed. Tree-Based and Path-Based Multicast In both the tree-based and path-based approaches, the multicast destinations are specified in the message header. In the tree-based approach, a spanning tree rooted at the source node is found, and the multicast message is propagated along the tree. In particular, the message follows a common path until a branching point is encountered; the message is then replicated and sent along each branch, where the message header contains only the destinations of the branch to which it is sent.
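As a rough illustration of this replicate-and-split behavior, the sketch below (illustrative Python of our own; the spanning tree, node names, and helper functions are hypothetical and not taken from the text) propagates a multicast message down a spanning tree, with each copy of the message carrying only the destinations that lie in its branch.

```python
def subtree_nodes(tree, root):
    """Set of nodes in the subtree rooted at `root` (tree maps node -> children)."""
    nodes = {root}
    for child in tree.get(root, []):
        nodes |= subtree_nodes(tree, child)
    return nodes

def tree_multicast(tree, node, destinations, delivered):
    """Tree-based multicast: forward along the tree, splitting the header.

    At each branching point the message is replicated, and the copy sent to a
    child carries only the destinations in that child's subtree.  `delivered`
    collects the nodes whose router copies the message to the local processor.
    """
    if node in destinations:
        delivered.add(node)
    for child in tree.get(node, []):
        branch = destinations & subtree_nodes(tree, child)
        if branch:
            tree_multicast(tree, child, branch, delivered)

# Hypothetical spanning tree rooted at source "s".
tree = {"s": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
delivered = set()
tree_multicast(tree, "s", {"c", "e"}, delivered)
print(delivered)   # {'c', 'e'}
```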
In the path-based approach, the message is sent along a path (or multiple paths) that goes through each destination node. If the router discovers that the message is destined for its associated proces- sor, it both copies the incoming message to the local processor and forwards the message along the path. Unlike the tree-based approach, no branching occurs in this approach. Both of these methods are concerned with finding trees or paths that are deadlock free. There is no other measure of “efficiency” involved in these approaches. In addition, these appromhes may required some additional hardware to support replication capabilities. 2.5.3 Switching The edges of the graph (or arcs of the digraph) model described in Section 2.4 represent the physical lines of communication between the nodes in the direct network system. Various characteristics of the system, such as hardware and routing techniques, determine when the physical communication lines can and cannot be used. In a UBM calling schedule, messages may be sent simulle be- tween multiple source-destination pairs. One constraint that a UBM algorithm must consider when scheduling simultaneous unicast calls is the communication model that applies to the direct network system. Farley defined three different communication models [21]. The distinguishing character- istic in the three variations is the requirements on the paths that can be used during simultaneous message transmissions. In the line-switching model, all the paths used simultaneously must be edge- 20 disjoint. In the node-switching model, all the paths used simultaneously must be node-disjoint. In the neighbor-switching model, all communication is direct, i. e., each path used consists of a single edge. As the names suggest, these communication models depend on the type of switching used by the direct network. Store-and-forward switching is best described by the neighbor-switching model. In the neighbor-switching model, a message can only be sent to a neighboring node, and the time needed to send a message to a neighboring node is considered to be one unit. The line-switching model is applicable to current wormhole-routed, commercial multicomputers with generic nodes as depicted in Figure 2.1(a). Any call requires unit time because message transmission is relatively ' independent of path length. In addition, each router can relay several messages simultaneously provided the messages do not require the same external channels (i.e., paths are edge-disjoint). Thus, two calls that use the same node can be made simultaneously, as long as the external channels used in each path are disjoint. Because most current commercial multicomputers use cut-through techniques, we will focus on the line-switching model of communication. Node-switching does not have an analogous switching technique; however, notice that the node-switching criterion also satisfies the line-switching criterion. 2.5.4 Routing Schemes In the network in Figure 2.1(b), there may be multiple paths along which a message can be delivered between two nodes. For example, node 1 can send a message to node 3 using node 2 as an interme- diate node, or node 1 could also use both 4 and 5 as intermediate nodes to deliver a message to node 3. The path along which the message is delivered depends on the routing scheme used. A routing scheme R of a direct network is a collection of all permissible paths along which a message can be delivered for each pair of nodes in the network. 
If a routing scheme R includes every (i, j)-path for each v_i and v_j, the system is said to use free routing [46]. If, on the other hand, R includes exactly one (i, j)-path for every pair of nodes v_i and v_j, the system is said to use oblivious routing [35] or restricted routing [46]. UBM algorithms are highly dependent on the routing mechanism used by the underlying system. In practice, parallel systems use oblivious routing rather than free routing. Current systems based on mesh and hypercube topologies use an oblivious routing scheme called dimension-ordered routing. For example, two dimensional mesh topologies typically use the xy-routing scheme. Suppose a message is being sent from source node (x_i, y_i) to destination node (x_j, y_j) in a 2D-mesh. The message is first routed in the x-direction (along row x_i) until it reaches column y_j. The message is then routed in the y direction (in column y_j) until it reaches its destination. To deliver a message from node (2, 1) to node (4, 3) in Figure 2.9(a), the path shown by dashed lines is used. The xy-routing scheme is oblivious because the specified path is unique and is used regardless of network conditions.
Figure 2.9: Oblivious routing schemes. (a) xy-routing in a 2D-mesh; (b) e-cube routing in a hypercube.
The Intel Paragon [31], a multiprocessor computer with a two dimensional mesh network, utilizes xy-routing. Note that a similar dimension-ordered routing can be defined for torus topologies. Hypercube topologies typically use e-cube routing, in which the message also travels through the cube dimensions one at a time. Figure 2.9(b) shows the oblivious e-cube routing path used to deliver a message from node 110 to node 001 in a three dimensional hypercube. The NCUBE-2 [44], a multiprocessor computer with a hypercube network, utilizes e-cube routing. The third type of routing combines both of the above models. In partially-restricted routing, part of the path is restricted and part of the path is free. This model was motivated by turnaround routing in multistage cube networks. In turnaround routing, any path can be used in the forward direction, but a specific path must be used in the backward direction [53]. We can define a similar type of routing in direct networks. 2.5.5 Routing Information The final issue in the multicast classification of Figure 2.8 addresses the amount of routing information stored at each node. Ideally, each node has complete routing information. In other words, each node knows the path used to route a message between any pair of nodes in the network. If routing information is stored as a routing table, the storage requirements for complete information can be quite large. In order to reduce storage requirements, each node may contain only partial routing information. The least amount of information that a node can have is to store only the paths it uses to route messages to other nodes in the network. Thus, a node v_i knows how to route messages to all nodes in the network but does not know how node v_j routes a message to node v_k. We will refer to this as source limited routing information. Complete routing information and source limited routing information represent the two extremes of the amount of information that can be stored. There may be several levels between these extremes.
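As an illustration of the two preceding subsections, the sketch below (illustrative Python of our own; the table layout is a simplification we assume for the example) generates the oblivious xy-routing path between two mesh nodes and then contrasts a complete routing table with the source limited table kept at a single node.

```python
from itertools import product

def xy_path(src, dst):
    """Oblivious xy-routing path in a 2D mesh, as a list of (row, column) nodes.

    The message first travels along the source row until it reaches the
    destination column, then along that column to the destination.
    """
    (xi, yi), (xj, yj) = src, dst
    path = [(xi, yi)]
    step = 1 if yj >= yi else -1
    for y in range(yi + step, yj + step, step):   # first leg: stay in row xi
        path.append((xi, y))
    step = 1 if xj >= xi else -1
    for x in range(xi + step, xj + step, step):   # second leg: stay in column yj
        path.append((x, yj))
    return path

print(xy_path((2, 1), (4, 3)))   # [(2, 1), (2, 2), (2, 3), (3, 3), (4, 3)]

# A complete routing table stores one oblivious path for every ordered pair of
# nodes; a source limited table keeps only the rows whose source is one node.
nodes = list(product(range(4), range(4)))                        # a 4 x 4 mesh
complete_table = {(u, v): xy_path(u, v) for u in nodes for v in nodes if u != v}
source_limited = {v: complete_table[((2, 1), v)] for v in nodes if v != (2, 1)}
print(len(complete_table), len(source_limited))                  # 240 and 15 entries
```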
2.6 Legal UBM Calling Schedules Finding a UBM algorithm for multicast request M = (s, D) in a graph G is equivalent to finding a legal calling schedule for M. There are five requirements for a calling schedule C to be legal: that any path used to send a message must be contained in the routing scheme, that all the paths used to perform a unicast in any given time step t must be source and destination disjoint (i.e., a node can only inform one destination in a time step, and a node can only be informed once in a time step), that only previously informed nodes can be the source of any unicast, that any destination node receives the message exactly once, and that all destinations in D are informed within a finite amount of time. In generalizing from broadcasting to multicasting, we add the requirement that only nodes in the source and destination set of multicast request M may be the source or destination of a unicast. We present below the formal requirements for a legal calling schedule C that implements a multicast request M under the line-switching model.³ The requirements for legal calling schedules that implement node- and neighbor-switching are identical except for an appropriate modification to condition (vii).
(i) If (i, j, p(i, j), t) ∈ C then p(i, j) ∈ R.
(ii) If (i, j, p(i, j), t) ∈ C and (m, n, p(m, n), t) ∈ C then i ≠ m and j ≠ n.
(iii) For all (i, j, p(i, j), t) ∈ C, i ∈ I_{t−1}^C(M).
(iv) For all (i, j, p(i, j), t) ∈ C, j ∉ I_{t−1}^C(M).
(v) There exists a time step t such that I_t^C(M) = S ∪ D. We call t the length of C.
(vi) For all (i, j, p(i, j), t) ∈ C, i, j ∈ S ∪ D.
(vii) If (i, j, p(i, j), t) ∈ C and (m, n, p(m, n), t) ∈ C are distinct unicasts, then p(i, j) and p(m, n) are edge-disjoint.
³Farley's requirements for a corresponding legal line-switching calling schedule include only conditions (iii) and (vii).
2.6.1 Inter-Class Relationships We focus on UBM algorithms for direct network systems. However, there are seven distinct classes within the UBM framework. An obvious question that arises is: What is the relationship between the classes? Figure 2.10 shows the relationships between legal UBM schedules.
Figure 2.10: Legality relationships for UBM Calling Schedules
The paths used in any step of a legal node-switching schedule are node-disjoint. Thus, the paths are automatically edge-disjoint. This implies that any legal node-switching schedule is also a legal line-switching schedule. Similarly, because the neighbor-switching model only allows calls to be made along paths of length one, a legal neighbor-switching schedule is inherently node-disjoint. Thus, a legal neighbor-switching schedule is also a legal node-switching schedule (and therefore a legal line-switching schedule as well). With respect to the routing strategy, any legal schedule for a system that uses oblivious routing is also a legal schedule for a system that employs free routing since the oblivious routing paths are a subset of the free routing paths. Similarly, any legal schedule that has access to only partial information is a legal schedule in a system that has complete information. 2.7 Performance Measures Once we have determined the constraints of the system, there may be several legal calling schedules that satisfy a multicast request.
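Before turning to how competing schedules are compared, the conditions of Section 2.6 can be made concrete with a small checker. The sketch below (illustrative Python of our own; the data representation is an assumption, and paths are treated as undirected node sequences for simplicity) verifies the source/destination disjointness condition (ii), the informed-source conditions (iii) and (iv), and the edge-disjointness condition (vii) for the example schedule of Section 2.5.2, and reports the informed sets after each step.

```python
def path_edges(path):
    """Edges traversed by a path given as a sequence of nodes."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def is_legal_step(calls):
    """Conditions (ii) and (vii) for the unicasts of a single time step.

    Each call is (source, destination, path).  Sources must be distinct,
    destinations must be distinct, and the paths must be pairwise
    edge-disjoint (line-switching model).
    """
    sources = [c[0] for c in calls]
    dests = [c[1] for c in calls]
    if len(set(sources)) != len(sources) or len(set(dests)) != len(dests):
        return False
    used = set()
    for _, _, path in calls:
        edges = path_edges(path)
        if used & edges:
            return False
        used |= edges
    return True

def informed_sets(source, schedule):
    """Informed set after each time step of a calling schedule.

    `schedule` is a list of time steps, each a list of (source, dest, path)."""
    informed = {source}
    history = [set(informed)]
    for calls in schedule:
        assert is_legal_step(calls)                      # conditions (ii), (vii)
        assert all(s in informed and d not in informed   # conditions (iii), (iv)
                   for s, d, _ in calls)
        informed |= {d for _, d, _ in calls}
        history.append(set(informed))
    return history

# The example schedule of Section 2.5.2 (source v1, destinations v2, v3, v5).
schedule = [
    [("v1", "v2", ["v1", "v2"])],
    [("v1", "v5", ["v1", "v4", "v5"]), ("v2", "v3", ["v2", "v3"])],
]
print(informed_sets("v1", schedule))
# [{'v1'}, {'v1', 'v2'}, {'v1', 'v2', 'v3', 'v5'}]
```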
Thus, we need to evaluate the performance of a UBM algorithm so ' that the most efficient calling schedule can be selected. In general, efficient algorithms should use as little of the system’s resources as possible. We consider two resources: time and traffic. Because communication overhead is so costly, it is de- sirable to minimize the amount of time (or number of steps) necessary to complete a multicast request. More formally, one objective is to find an algorithm that always produces legal calling schedules of minimum length. The lower bound on the number of steps for a legal UBM schedule is [lg(|D| + 1)] . A lower bound UBM implementation will require exactly [lg(|D| + 1)]; however, an optimal UBM schedule may require more steps if the lower bound is not achievable in the particular system in which the algorithm is used. Thus, there is an important distinction between optimal and lower bound UBM implementations. We define a time efi‘icient UBM algorithm to be one that mini- mizes the number of steps needed to perform the multicast in the particular system (i.e., an optimal implementation). Another way to measure performance is by the number of communication links used. Using a small number of communication lines keeps the traffic low and allows better overall system perfor- mance. Thus, another objective is to minimize the sum of the length of the paths used in a legal calling schedule. We will call a UBM algorithm that minimizes the number of communication lines 26 that are used trafi‘ic efi‘icient. 2.7.1 Optimality Implications In Section 2.6.1, we showed the relationships between legal calling schedules for the different classes of UBM implementations. One may ask: Is there a similar relationship between optimal UBM implementations? The implications shown in Figure 2.10 are exactly the same with respect to lower bound implementations. Because no calling schedule can use fewer than [lg(|D| + 1)] steps, a lower bound implementation 0 that is legal for one class will clearly be a lower bound implementation for any other class in which 0 is a legal implementation. Unfortunately, these implications do not hold for optimal implementations. For example, an optimal schedule for a system in Class 3 may require more than I'lg(|D| + 1)] step. Because additional routing information is available for Class 2, an optimal schedule for a system in Class 2 may require only [lg(|D| + 1)] steps. A calling schedule 0' is legal for both classes, but not Optimal for both classes. Very little is known about the optimality implications for the classes. This dissertation begins to answer this question. Chapter 3 Related Work Multicast, in particular broadcast, is one of the most widely studied communication paradigms in networks. In this chapter, we give a brief survey of the various multicast problems that have been addressed, focusing on unicast-based multicast implementations. Much of the early work in multicast communication considered the problem of broadcasting in multicomputer systems that assumed the neighbor-switching model. We describe this work in Section 3.1.1. As circuit routing techniques such as wormhole routing became popular, attention turned to multicast under the line- switching model. We discuss work in this area in Section 3.1.2. In the UBM algorithms we describe, it is assumed that any message can be delivered in unit time. In recent years, several authors have used other models in attempt to more accurately model message transmission time on real networks. 
We describe several of these variations in Section 3.3. In addition, parallel systems that are not based on direct networks are gaining popularity. For example, the IBM SP—2 is based on an indirect network, called a multistage interconnection network, as opposed to a direct network. We discuss multicast techniques that have been developed for these and other networks in Section 3.4. Fmally, because parallel machines and netwOrks are often modeled by graphs, many problems 27 28 in parallel and distributed processing are closely related to problems in graph theory. In Chapter 6 we will show how a path-matching problem in graphs has direct application to multicast communi- cation. Thus, in Section 3.5 we provide a brief discussion of previous matching problems in graphs. 3.1 Unicast-Based Multicast Communication in Direct Networks 3.1.1 Neighbor-Switching (Class 7) In the neighbor-switching model, a node may only call one of its neighbors. For example, node (1, 1) in Figure 3.1 can only make calls to nodes (1, 2) and (2, 1). In this model, we assume that the A cost to deliver a message is uniform for all communication links, i.e., a message can be delivered in unit time. 1.1 1,2 1 .4 ( b ( 1% ( .3)/.\ (1 ) J N \J | . source node (2.1) (2.2) (2.3) (2.4) , , C} \ {D <1.) ® destination node ' i - - xy-routing path (3.4) (4.4) Figure 3.1: Multicast in node-switching model. Before circuit routing became popular, most networks used store-and-forward routing. As mentioned in Section 2.5.3, networks that use store-and-forward routing are best modeled by the neighbor-switching assumption. Consequently, much of the early work on multicast in networks assumed the neighbor-switching communication model. Notice that the concept of free routing vs. oblivious routing is not applicable in the neighbor-switching model. There is only one “path” (of length one) between a node v and its neighbor w. Thus, node v is both free to use any available path 29 and restricted to a specific path when sending a message to w. One drawback of the neighbor-switching model is that not all subsets of nodes may be reached using only local calls. For example, suppose a multicast request with S = {(3, 2)} and D = {(1,2), (1,4), (2,2), (3, 1), (3,3), (4,3), (4,4)} is made in the network in Figure 3.1. Under the neighbor-switching assumption, there is no way to deliver the message to node (1, 4) using only nodes in S U D. Thus, it is not surprising that most of the work that uses the neighbor-switching model concentrates on the broadcast problem, where S U D = V(G). Other authors have allowed vertices that are not source or destination vertices to be involved so that multicast can be addressed. In this case, nodes that do not need the message are interrupted and required to forward the message. . Slater et al. show that the problem of determining the minimum amount of time required to broadcast from an arbitrary vertex of an arbitrary graph is NP-complete [49]. Because determining a minimum length broadcast schedule in an arbitrary graph is NP-complete, researchers have focused on other aspects of the problem. In the context of neighbor-switching, two classes of problems have been considered. The first problem is to find an efficient way to broadcast in a given graph. The second problem is to build a graph in which broadcast can always be done efiiciently. Finding Efficient UBM Broadcast Algorithms In general, a broadcast request is made in a given graph, and a calling schedule (see Section 2.6) is used to implement the broadcast. 
The source node makes a call to one of its neighbors, and the other nodes may also make calls to neighbors after they have received the message. This continues until all nodes are informed. Typically, the edges that the messages travel along make up a broadcast tree. Figure 3.2 shows two broadcast trees for K 3, where the source node is node 1. The number on each edge shows the time step during which the edge is used to deliver a message. The broadcast tree in 3.2(b) requires seven steps while the tree in 3.2(c) requires only three. Many broadcast trees (a) Complete graph K a (b) 7-step broadcast tree Figure 3.2: Broadcast trees for Kg. exist for a single graph. One objective that has been studied is to choose the “best” broadcast tree. Because determining the minimum amount of time required to broadcast from an arbitrary ver- tex of an arbitrary graph is NP-complete, Slater et al. focus on broadcasting in trees. In particular, they develop an algorithm for finding the broadcast center of a tree. In general, a center of a graph is defined as follows: (1) each vertex is assigned a value according to some measure, e.g., weight, distance, or time; (2) the center is the set of vertices that have minimum value. In [49], the measure being minimized is the number of time steps needed to broadcast from a vertex. That is, the broad- cast center of a tree is the set of vertices in the tree from which broadcast can be completed in the least amount of time. Slater et al. Show that the broadcast center of a tree is always a star with two or more vertices. For example, consider the tree in Figure 3.3(a). The number shown beside each node is the broadcast time fiom that node. Thus, the subgraph shown in bold lines is the broadcast center of the tree. Koh and Tcha [43] use a different measure to find the broadcast center of a tree. They minimize the average time at which a vertex in the tree receives the message; this is called the minisum criterion, whereas the function used by Slater et al. is called the minimax criterion. The average time that each node receives the broadcast message is shown in Figure 3.3(b), and the center under 31 (a) Broadcast center under minimax (b) Broadcast carter under minisum Figure 3.3: Broadcast center of a tree. the minisum criterion is again shown by bold lines. The center is not the same as that for the minimax criterion, however this is not always the case. Bharath-Kumar and Jaffe [8] consider multicast in arbitrary networks, and they focus on traffic efficiency rather than on time. As mentioned earlier, not all subsets of nodes can be reached using local calls. Thus, some of the nodes used to deliver the message may not be destination nodes. Furthermore, the authors permit any subgraph to be used to deliver messages. Suppose a node must deliver a message to a non-neighboring node. A common technique is for the node to use local information to determine which link it will use to forward the message (we called this source limited information in Section 2.5.5), and therefore which node is the next to be responsible for delivering the message. Because the routing is done in this distributed manner, any subgraph may be used for the multicast. For example, Figure 3.4 shows a graph that is used for multicast that is not a tree. Node 1 is the originator of the message, and it is responsible for sending messages to nodes 6 and 7. Node 1 chooses to send the message destined for node 6 through intermediate node 2 and to send the message destined for 7 through intermediate node 5. 
Node 2 sends the message directly to node 6. Node 5 choose to send the message destined for 7 through node 6, thus creating a cycle in the multicast graph. 32 . SOUI‘CC @ destination Multicast Graph Figure 3.4: Multicast subgraph. Mthin this context, Bharath-Kumar and Jaffe define two criteria used to measure the efficiency of the multicast implementation. Network Cost (NC) measures the number of communication links ' used to deliver the message and Destination Cost (DC) measures the average delay experienced by each destination. Their goal is to minimize NC because applications like file transfer do not rely heavily on message delay. The authors explain that, while optimizing DC is relatively easy, opti- mizing NC is an NP-complete problem. They discuss the feasibility of using DC optimal algorithms to approximate NC optimal solutions, and they devise several heuristics for minimizing NC. Multi-port Architectures (Class 7),) Finding efficient neighbor-switching UBM algorithms has also been studied in systems with a multi- port architecture, i.e., in systems where each node can send or receive more than one messages at a time. Choi and Esfahanian [14] study multicasting in such systems. They model the network as a graph and consider both time and traffic efficiency. In particular, they propose the Optimal Communication Tree (OCI') problem, in which a broadcast tree T that preserves the distances in G from each node to the source and that has the fewest number of nodes is to be found. In general, OCT is NP-complete. Choi and Esfahanian show that the OCT problem is NP-complete even for the n-cube and for graphs whose maximum degree is at most three. They also present some heuristics 33 for the OCI‘ problem. Efficient Broadcast Graphs Other work under the neighbor-switching assumption has focused on characterizing the types of graphs in which broadcast can always be done efficiently. The broadcast time from a vertex v in a graph G is the minimum number of time steps needed to broadcast a message from v. The broadcast time of a graph G, b(G), is the maximum broadcast time from any vertex in G. Because the number of informed nodes can at most double in each step, b(Kn) = [lg 71]. However, some edges can ’ be removed from Kn, and the resulting graph still has broadcast time I" lg n]. A minimal broadcast graph on n vertices is a graph G with the properties that b(G) = [lg n], but any subgraph G’ of G has b(G’) > [lg 73]. Furthermore, B (n) is the minimum number of edges in any minimal broadcast graph, and a minimum broadcast graph (mbg) on n vertices is a minimal broadcast graph with B(n) edges. An mbg provides a network topology in which broadcast can be completed from every node in a minimum number of time steps. Farley et al. determine B(n) for n _<_ 15 and for graphs with 2" vertices [19]. B(n) has been calculated for other specific values of n as well [42, 7], however it appears very difficult to calcu- late B(n) in general. Thus, many authors have focussed on constructing sparse broadcast graphs. A sparse broadcast graph contains a small number of edges that is not necessarily the minimum number of edges of a broadcast graph [20, 13, 26]. Not all of the broadcast graphs found pro- vide practical network configurations. In an attempt to make more realistic mbgs, construction of sparse broadcast graphs with bounded maximum degree has also been studied [37, 6]. All of there- sults mentioned above were for undirected broadcast graphs. 
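Although computing the broadcast time from a vertex of an arbitrary graph is NP-complete, it can be computed exactly for trees, the setting of the tree-broadcasting work discussed above. The sketch below (Python; the adjacency-list input format is an assumption made for illustration) uses the well-known bottom-up rule for trees: a vertex whose subtrees require times t_1 ≥ t_2 ≥ ⋯ ≥ t_d, served one child per step in that order, finishes at max_i (i + t_i). Maximizing the result over all choices of originator gives the broadcast time b(T) of a tree T.

```python
def tree_broadcast_time(adj, root):
    """Minimum number of neighbor-switching steps to broadcast from `root`
    in a tree given as an adjacency list {vertex: iterable of neighbours}.

    Rule: a vertex whose subtrees need times t_1 >= t_2 >= ... >= t_d
    (served in that order, one child per step) finishes at max_i (i + t_i).
    """
    def time_from(v, parent):
        child_times = sorted(
            (time_from(u, v) for u in adj[v] if u != parent),
            reverse=True,
        )
        return max((i + t for i, t in enumerate(child_times, start=1)), default=0)

    return time_from(root, None)

# Example: a star with centre 0 and three leaves needs three steps,
# and a path on four vertices broadcast from an end also needs three.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
assert tree_broadcast_time(star, 0) == 3
assert tree_broadcast_time(path, 0) == 3
```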
Liestrnan and Peters found minimum broadcast directed graphs for some specific values of n [38]. Variations of this problem have also been considered. Gargano et al. study the mbg problem in 34 the presence of failures [25]. In particular, they want to find Bk(n), the minimum number of edges in any graph on n vertices than can broadcast from any node in |' lg n] time steps in the presence of up to 1: transmission failures. 3.1.2 Line-Switching (Classes 13) In Section 2.5.3 we showed that current direct networks, in which each node has a processor and separate router, allow “long distance” calls to non-neighboring nodes to be made in unit time. Un- like the neighbor-switching model, several physical paths may exist between a source node 3 and destination node d. If the system allows any (.9, d)-path to be used, it is said to employ a free routing . scheme. If the system specifies a unique (.9, d)-path for sending a message fiom s to d, the system is said to use oblivious [35] or restricted routing [46]. Free Routing (Class 1) Farley first addressed line-broadcasting (broadcasting in a general graph under the line-switching assumption) [21]. He shows that under the line-switching assumption, broadcasting can be com- pleted in minimum time (|' lg n] time units ) in any connected network of n nodes that employs free routing, regardless of message originator. McKinley, et al. addressed the problem of performing multicast in wormhole-routed direct networks [46]. They showed that there exists a lower bound implementation for any multicast request M = (s, D), i.e., it requires exactly flg(|D| + 1)] steps to deliver the message to all destinations in D, in systems that admit free unicast communication, regardless of topology. The general idea of their algorithm is as follows. First, construct a trail that includes the source node and all the destinations nodes. The message is delivered from the source to a destination at the “middle” of the trail, and the trail is then broken into two equal length subtrails each of which contains an 35 informed node. This is continued recursively until all destinations are informed, thus [lg(|D] + 1)] steps are required. Neither Farley nor McKinley et al. consider traffic efficiency. Kane and Peters [32] consider time and traffic efficient broadcasting in cycles. They construct algorithms that always use a minimum number of phases ([lg n], where n is the number of nodes in the network) and that also minimize the number of communication links used in each phase. Oblivious Routing With respect to time efficiency, problems in Class 1 have been solved, i.e., optimal algorithms have been found for line-switching systems that allow fiee routing. Unfortunately, most commercial machines offer only oblivious unicast routing, particularly in wormhole routed networks where deadlock would result if free routing were used. Consequently, it is important to study efficient UBM algorithms for systems that employ oblivious routing. No popular topologies for multicomputers are the mesh and the hypercube. Both of these systems use an oblivious routing technique called dimension-ordered routing. Two commonly used dimension-ordered routing schemes are zy-routing and e-cube routing. In rig-routing, used in a two-dimensional mesh, a message moves first in the :1: direction and then in the y direction. For example, to deliver a message from node (3,2) to node (1,4) in Figure 3.1 the path shown by dashed lines is used. 
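To illustrate dimension-ordered routing, the sketch below (Python) generates the unique xy-routing path between two mesh nodes, treating the first coordinate as the x dimension; the coordinate convention and the function name are assumptions made only for this example.

```python
def xy_route(src, dst):
    """Return the unique xy-routing path from src to dst in a 2-D mesh.

    Nodes are (x, y) coordinate pairs; the message first travels along the
    x dimension until the x coordinates agree, then along the y dimension.
    """
    (sx, sy), (dx, dy) = src, dst
    path = [(sx, sy)]
    x, y = sx, sy
    while x != dx:                      # correct the x coordinate first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                      # then correct the y coordinate
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

# Example from the text: the path from (3, 2) to (1, 4).
print(xy_route((3, 2), (1, 4)))
# [(3, 2), (2, 2), (1, 2), (1, 3), (1, 4)]
```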
The hypercube uses e-cube routing, in which the message travels through the dimensions one at a time. McKinley et al. [46] use the calling schedule implementation described in Section 2.6 to develop lower-bound (on time) algorithms for doing multicast in hypercubes and meshes. Their algorithm was later extended to do multicast in torus networks [47]. 36 Multi-port Architectures (Class 2),) Variations of this problem have also been studied. McKinley and Trefftz study broadcast in all-port wormhole-routed hypercubes [41]. All-port architectures have the same number of internal and external channels (see Figure 2.1(a)). They develop the double tree (DT) algorithm, and show that it completes broadcast in [’21] steps in an n-cube. 3.1.3 Node-Switching (Classes 4-6) Because the node-switching model does not have a corresponding switching technique in direct networks it has not been studied as extensively as the other models. Farley addressed node-switching ' algorithms for Class 4, i.e., for systems that allow free routing [21]. He shows that it may be impossible to find a minimum length, i.e., [lg n], node-disjoint broadcast schedule in an arbitrary tree with n nodes. However, he also shows that for all 11 there does exist a tree with n nodes that allows minimum time node-disjoint broadcasting to be done. 3.2 Dee-Based and Path-Based Communicaton (Classes 8-9) As discussed in Chapter 1, there are also multicast algorithms that are not based on making several unicast calls, i.e., non-UBM algorithms. In tree-based and path-based multicast, the destinations are encoded in the header, and the router examines the header and capies or forwards the mes- sage. Because messages that traverse multiple communication links can hold several links at a time, deadlock can occur if the paths are not chosen carefirlly. In [11], Byrd et al. describe three tree-based multicast protocols for deadlock-free multicast, however no specific algorithms are given. Lin and Ni show that in wormhole-routed multicomputers tree-based multicast algorithms can suffer from deadlock [40]. In particular, they show that deadlock can occur in the tree-based multicast algorithm implemented in the nCUBE-Z. In [39], Lin et al. give 37 a tree-based algorithm for 2-D meshes that is deadlock-free if double channels are used; however, they explain that double channels incur extra cost and the performance of the tree-based algorithm in a 2-D mesh is inferior to other path-based approaches. Lin et al. propose three deadlock free path-based multicast algorithms for any network that contains a Hamilton Path, and discuss the performance of these algorithms in 2-D meshes [39] and in hypercubes [40]. They divide the network into high-channel and low-channel subnetworks. The dual-path algorithm sends messages along two paths; one containing the destinations with labels less than the source, the other containing the destinations that are greater than the source. The multi-path algorithm is similar except that it divides the network into multiple subnetworks and uses multiple paths to deliver messages. The fixed-path algorithm is similar to the dual-path algorithm except that it uses longer paths that are the same regardless of the destination set. They use traffic to evaluate the two algorithms. In [39], they show the performance of the algorithms for an 8 x 8 mesh. They conclude that the dual-path algorithm is more efficient in that the network latency is relatively stable for any number of destinations. The work by Lin et al. 
was extended by Tseng et al. [51]. The algorithms in [40, 39] are based on Hamilton paths in the multicomputer network. If faults occur in the network, they may destroy the Hamilton paths. Consequently, they propose a trip-based, rather than path-based, model for multicast communication that does not rely on the existence of a Hamilton path. The trip-based model is similar to the path-based approach. A trip that contains the source and destinations is found (nodes can appear multiple times in the trip). Forward and backward virtual graphs are constructed from the trip, and messages are sent along the forward and backward graph to the appropriate destinations. The path-based algorithms are a special case in which the trip is a Hamilton path. Tseng et al. discuss the performance of their algorithm in hypercubes, and show its adaptability in the presence of faults. 38 Boppana et al. [9] propose a deadlock-free multicast algorithm for wormhole routed mesh net- works. They assume that each node has only one pair of channels associated with each source and destination for sending and receiving messages. They define compatibility between unicast and multicast routing algorithms. A unicast and multicast algorithm are compatible if there are no dead- locks between the messages routed by them. In the system Boppana et al. describe, the dual-path and multi-path algorithms described above are not compatible with e-cube routing. They propose the column-path routing algorithm which is compatible with e-cube routing and compare it to dual- path and multi-path. They show that its traffic efficiency is comparable to that of the dual-path algorithm. 3.3 Different Communicaton Models The work we have discussed so far has assumed that a message is delivered in a single step, or in “unit” time. While this is a fairly good model, other frameworks have been developed to more closely model communication on actual machines. Such models include the Postal model, the Log? model, and the Parameterized model. In the Postal model, a message transmission requires A units. Thus, a message sent at time t will arrive at its destination at time t+ A - 1. In [10], a method is given for constructing optimal multicast trees under this model. The authors assume a logical complete-graph topology, i.e., they assume that communication is uniform between any pair of nodes in the network. The LogP model is a generalization of the Postal model. In the LogP model, L is the latency (time to send a message), 0 is the overhead (time that a processor is involved in transmitting or receiving and cannot perform other operations), and g is the gap (time between successive message transmissions). Karp et al. develop optimal broadcast implementations under this model [33]. Although the LogP model is more general than the Postal model, it also has shortcomings in modeling communication on real machines. Park, 39 et al. introduce the Parameterized model to more closely represent actual communication. In this model, message transmission is comprised of three parameters: the sending latency (tang), the receiving latency (new), and the network latency (tact). Unfortunately, each of these parameters is rather difficult and costly to measure. Thus, the easily measured parameters tend, or end-to-end latency, and thorn. holding time, are introduced. As in [10], Park, et al. assume a logical complete- graph topology. They describe methods for constructing optimal multicast trees for one-port and multiport architectures under the Parameterized model. 
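As a small illustration of how these models charge for a single transmission, the sketch below (Python) computes delivery times under the Postal model and under a commonly used reading of the LogP and Parameterized models; the function names and the exact way the LogP and Parameterized parameters are combined are assumptions made for this example rather than definitions taken from the cited papers.

```python
def postal_arrival(t_send, lam):
    """Postal model: a message sent at time t arrives at time t + lambda - 1."""
    return t_send + lam - 1

def logp_delivery(L, o, g, messages=1):
    """LogP model, one short message per send: here we take the one-way
    delivery time to be o + L + o and require successive sends from one
    processor to be max(g, o) apart (a common reading, assumed for this sketch)."""
    last_injection = (messages - 1) * max(g, o)
    return last_injection + o + L + o

def parameterized_delivery(t_send, t_net, t_recv):
    """Parameterized model: summing the sending, network, and receiving
    latencies is a simplification assumed for this sketch."""
    return t_send + t_net + t_recv

# Example: with lambda = 5, a postal message sent at time 2 arrives at time 6;
# with L = 6, o = 2, g = 4, a single LogP message is delivered at time 10.
print(postal_arrival(2, 5), logp_delivery(6, 2, 4))
```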
In each of the above models, communication time is considered to be uniform regardless of the message size or the lenth of the path being traversed by the message. Fraigniaud and Peters [23] ' describe a method for measuring communication time that relies on message size and path length. In particular, they model the time to send a message of length 8 over a path of length d as a + d6 + £1, where or represents startup time, 6 represents the switching delay, and r is the propagation time. They prepose a circuit-switched algorithm for a torus that requires time d lg(n)a + d§6 + d lg(n)£r under this model of communication. 3.4 Multicast in Systems Not Based on Direct Networks Multicast has also been studied for various parallel and distributed systems that are not based on a direct network interconnection structure. Gaber and Mansour study broadcast in radio networks [1]. In this problem, if two stations neighboring a node transmit simultaneously, their messages will collide, and neither of the messages will be received at the node. Thus, a node can receive a message only if exactly one of its neighbors transmits during a time slot. Gaber and Mansour develop a broadcast algorithm that requires (2(D) time slots for sufficiently large D, where D is the diameter of the graph. Huang and McKinley study multicast communication in ATM networks used for parallel com- 4o puting [29]. They show that multicast trees can provide a substantial benefit over a separate ad- dressing technique, in which the source sends the message sequentially to each destination. Multicast for systems based on indirect networks has also been addressed. Xu et al. describe a software technique for multicast in multistage cube networks that use turnaround routing [53]. In particular, they develop the U -min algorithm for optimal multicast in such networks. The technique used is similar to the multicast algorithms developed for mesh and hypercube topologies in [46]. For a multicast request M = (s, D), the destination addresses are put into a lexicographically-ordered chain. The destination in the “middle” of the chain is informed, and the chain is divided into two equal subchains. This process is repeated recursively for [lg(|D| + 1)] until all destinations are ' informed. Xu et al. implement their algorithm for an IBM SP—l and show its superiority over other existing techniques such as Chameleon [27] and MPI-F [24] broadcast. 3.5 Matching Problems in Graphs A matching in an undirected graph G = (V, E) is a set M C E(G) of pairwise independent edges (i. e., no two edges share an endpoint). The endpoints of each edge e E M are said to be matched. We will refer to M as an edge-matching in G. For example, the bold edges in the graphs of Figure 3.5 are edge-matchings. Edge-matching has been used to model and solve many types of problems. For example, consider a graph in which vertices represent jobs and workers, and there is an edge between job j and worker w if worker w is able to perform job j. An edge-matching in such a graph represents jobs that can be done simultaneously by qualified workers. It is often desirable to find a maximum matching, where the largest possible number of vertices are matched. The matchings shown in Figure 3.5 are maximum matchings. In the example where jobs are matched to workers, a maximum matching represents the largest number of jobs that can be done simultaneously. A maximum edge-matching in a graph may leave some vertices unmatched. 
If every vertex 41 2 3 1 I 4 8 4‘ 5 7 6 (a) Perfect edge-matching (b) No perfect edge-matching Figure 3.5: Matchings in graphs. in the graph can be matched, the matching is called a perfect matching (if there is an odd number of vertices then exactly one vertex is left unmatched.) 'Ihe graph in Figure 3.5(a) has a perfect matching, but the graph in Figure 3.5(b) does not. Wu and Manber introduce a generalization of an edge-matching called a path-matching [52]. In a path-matching, vertices of G are joined by simple paths rather than edges. Specifically, a path-matching M in G is a set of simple paths in which no two paths share the same end vertices. For path-matching, the definitions for matched vertices and perfect matchings are analogous to the definitions used in edge-matching, where an edge is replaced with a path. In Figure 3.5(b) the bold edges together with the paths shown by dashed lines represent a perfect path-matching in the graph. Various types of path-matchings can be defined by putting constraints on the paths contained in the matching M. For example, a disjoint path-matching (or DP-matching) is a path-matching in which paths in M are pairwise edge-disjoint. 'Ihe path-matching in Figure 3.5(b) is a DP-matching. Each of the matching problems described above has several commonly studied variations. The first variation is finding maximum (or perfect) matchings. As seen in Figure 3.5, perfect edge- matchings do not exist in all graphs, but several algorithms exist for finding maximum edge- 42 matchings in graphs [12]. Wu and Manber show that, unlike the case for edge-matchings, at least one perfect DP-matching exists in every connected graph, and they describe an algorithm for find- ing such a perfect DP-matching [52]. A second widely studied variation is further refinement of the maximum matching problem. In particular, several maximum matchings may exist in a graph, and the “best” of these matchings must be determined. This problem arises in weighted graphs, where a maximum matching that has minimum cost is desirable. There are two common ways to define the cost of a matching: as the sum of the weights of the edges in the matching or as the max- imum weight edge in the matching (this is also called a bottleneck matching). Gabow and Tarjan provide a polynomial algorithm for finding a bottleneck edge-matching in a graph (i.e., a maximum ' edge-matching whose maximum weight edge is as small as possible). Wu and Manber study both these measures of cost for path-matchings [52]. They define the min-sum and min-max variations of the DP—matching problem. In the min-sum problem, the objective is to minimize the sum of the weights of all edges used in the path-matching. In the min-max version, the objective is to minimize the maximal cost of a path in a matching, where the cost of a path is determined by the sum of the weights of its edges. Wu and Manber show that the min-max problem is NP-complete for general graphs. 'lhey also provide two polynomial algorithms for finding perfect min-max DP-matchings in trees. Although the bottleneck DP-matching problem is NP-complete, Datta and Sen show that it can be closely approximated In particular, they give a l-approximation algorithm for the bottleneck DP-matching problem [15]. 3.6 Summary Recall that in Chapter 1 we discussed various components of the multicast problem: network model, communication models, routing types and performance measures. The bulk of the work has been done using a graph model for the network. 
This) necessarily implies that the network is symmetric, 43 i.e., if a: can communicate directely with y then y can also communicate directly with 2:. Very little has been done in directed graphs, which model non-symmetric networks. Both the neighbor-switching and line—switching communication models have been studied. Finding time efficient and traffic efficient multicast algorithms in systems that use neighbor- switching is NP-complete. However, neighbor-switching does not accurately model current routing techniques. Direct networks that use circuit routing techniques, such as wormhole routing, conform to the line-switching communication model. Because several paths may exist between a source and destination pair, many authors have considered free routing. Optimal multicast algorithms have been developed for arbitrary topologies that admit free routing. However, most commercial multi- ' computers use some type of oblivious routing. Optimal time UBM algorithms have been developed for hypercubes, meshes, and tori. However, none of the results for oblivious routing techniques are general, i.e., they do not apply to any topology that uses an arbitrary oblivious routing scheme. Finally, two metrics have been used to measure the performance of multicast algorithms: time and traffic. Very little of the previous work in multicast communication has addressed traffic effi- ciency. In particular, the problem of creating traffic efficient UBM algorithms has not been studied for arbitrary systems that admit free or oblivious routing. Chapter 4 Source Limited Inclusive Routing The first problem we address is finding time efficient line-switching UBM calling schedules in arbi- trary topologies that use arbitrary oblivious routing schemes. In Section 4.1 we show that arbitrary oblivious routing schemes are difficult to model graphiwa and may exhibit poor performance. We discuss characteristics of oblivious routing schemes and describe a realistic class of oblivious routing schemes called inclusive routing schemes. In Section 4.1.3, we show that if we are given only source limited routing information that we can model these inclusive routing schemes as di- rected trees. In Section 4.2 we present several properties of multicast communication in directed tree topologies. In Section 4.2.3, we show that node-switching UBM algorithms provide a good approximation to line-switching algorithms in directed tree topologies, and in Section 4.2.5 we give a lower bound for broadcast in k-ary ditrees. The results presented in this chapter can be found in [2. 3. 4]. 4.1 Oblivious Routing Models In the previous work on multicast, authors modeled routing information as a graph. In order to develop correct multicast algorithms, the graph model should represent the routing information 44 45 accurately, completely, and unambiguously. Current graph models address free routing, however no model has been developed for oblivious routing. 4.1.1 Difficulties in Modeling General Oblivious Routing Most of the previous work described in Chapter 3 was done using an undirected graph model. An undirected graph models a network in which communication is symmetric and free-routing is used. For example, consider the graph in Figure 4.1(a) that represents a bidirectional ring. 'Ihe (a) Bidirectional ring (b) Unidirectional ring Figure 4.1: Graph and Digraph representations for rings graph represents the routing information accurately, i.e., there is no path in the graph that does not correspond to a physical connection in the network. 
'Ihe graph also includes complete routing information, in that there are no physical routes between nodes that are not represented in the graph. Finally, because free-routing is used, any path in the graph can be chosen to route a message. This is unambiguously shown in the graph. Unfortunately, the graph model can not be used to represent networks that do not provide sym- metric communication. Suppose that we have a unidirectional ring in which the unidirectional links are from node i to node (i + 1)mod n, where n is the number of nodes. The undirected graph in 46 Figure 4.1(a) does not accurately represent the routing information. For example, there is an edge from node 5 to node 4, but we can not deliver a message directly from 5 to 4 in the unidirectional ring. 'Ihus, a different model must be used to capture the routing information. The directed graph in Figure 4.1(b) can be used to represent a unidirectional ring that supports free routing. Both the undirected and directed graph models are used to represent systems that support free routing. If the system employs an oblivious routing technique, the situation changes. For example, suppose the non-symmetric direct network from Figure 2.1(b) uses the oblivious routing shown in Table 4.1. While the digraph model in Figure 2.5(a) is accurate and complete, it is not unambiguous. Table 4.1: Oblivious Routing Table Oblivious Routing Table source destination routing path source destination routing path ”1 ”2 ”1”2 ”3 ”4 ”3”1”4 ”1 ”3 ”1”2”5”3 ”3 ”5 ”1”4”5 ”r ”4 ”1”4 ”4 ”1 ”4”1 ”1 ”5 ”1”2”5 ”4 ”2 ”4”5”3”2 ”2 ”1 ”2”5”3”1 ”4 ”3 ”4”5”3 ”2 ”3 ”2”5”3 ”4 ”5 ”4”5 ”2 ”4 ”2”5”4 ”5 ”1 ”5”4”1 ”2 ”5 ”2”5 ”5 ”2 ”5”3”2 ”3 ”1 ”3”1 ”5 ”3 ”5”3 ”3 ”2 ”3”2 ”5 ”4 ”5”4 There are three paths from node 111 to node 123 in the digraph (v1v2v3, v1v2v5v3, and 0111412503), but it is not clear that the path 1211220503 must be used to send a message. We have explored other graph models to solve the problem of ambiguity that exists in the di- graph model. For example, one may use a multigraphl , in which each edge is labeled with the (source,destination)-pair that uses it. Figure 4.2 shows the multigraph that corresponds to the obliv- ious routing in Table 4.1. Unfortunately, a multigraph model can be unmanageable for even small IIn a multigraph there can be multiple edges between a pair of vertices. (5.4) Figure 4.2: Multigraph model of oblivious routing direct networks. Furthermore, multiple edges represent a single physical communication line. ’Ihus, the notion of physically edge-disjoint paths is not present in the multigraph model. We have discussed three graph models: undirected graphs, directed graphs, and multigraphs. None of these models is able to represent arbitrary oblivious routing schemes accurately, completely, and unambiguously. In addition to being difficult to model, arbitrary oblivious routing schemes may lead to extremely poor performance. For example, consider a ring network where the directed path usedtotransmitfromany nodevtoany nodewisthelongerofthetwopathsintheringfrom node v to node w. It is easy to see that performing a multicast in the ring with this oblivious routing scheme must take 0(a) steps, where n is the number of destination nodes, as almost every pair of directed paths share an edge. 4.1.2 Inclusive Routing In order to develop a realistic model for oblivious routing, we need to consider the defining charac- teristics of the routing and model the Characteristics that best fit practical systems. 
One characteristic 48 to consider is the properties of the paths specified by the oblivious routing. In a general oblivious routing R, R contains exactly one (12,-, 1),) routing dipath for all i and j. A routing dipath may or may not provide additional information about other routing dipaths in the oblivious routing. Suppose the oblivious routing path p(i, j) uses intermediate node 11),, as shown in Figure 4.3. In an arbitrary oblivious routing scheme, the p(lc, j) routing path may or may not use any of the vertices or edges along p(i, j). On the other hand, p(lc, j) may be precisely the subpath of p(i, j) ----- p(kJ) —— p(id') Figure 4.3: Oblivious routing paths from up to 12,-. 'Ihus, one property of oblivious routing is whether or not a path gives any information about other oblivious routing paths. On one extreme, p(i, j) gives no information about any other routing paths, i.e., p(i, j) is completely arbitrary. In the previous section we showed that arbitrary schemes are diffith to model and can exhibit poor performance. Consequently, assuming that a dipath p(i, j) provides no additional information about other routing paths is unrealistic. 'Ihe other option is that p(i, j) does give some information about other routing paths. We define inclusive routing schemes, a natural, realistic class of oblivious routing schemes. In an inclusive routing scheme, if the directed path used by node v to transmit to node w is (120,121, . . . ,vk) where v0 = v and v), = w, then the directed path used by node 1).-to transmit to node Uj for 0 5 i < j S k is (22,-, . . . ,Uj). Note that both zy-routing on a mesh and e-cube routing on a hypercube are specific examples of inclusive routing schemes. 49 4.1.3 Source Limited Routing Information Another defining characteristic of a system that uses oblivious routing is the amount of routing information available to the router in each node. Nodes can store complete or partial routing in- formation. Ideally, the router has complete routing information. However, in massively parallel systems, which have a large number of nodes, storing all routing information may be impractical. 'Ihus, another option is that each router has only partial routing information. It is necessary for each node to contain the routing paths it uses to deliver messages to all other nodes in the system. Given this minimal amount of information, a node v.- knows how to route a message to an arbitrary node Uj, but v,- does not have information regarding the dipaths that 12,- uses to send a message to it or any other arbitrary node. We will refer to routing tables of this form as source limited routing tables. One clear disadvantage of storing only partial information is that helpful routing information is unavailable. For example, other informed nodes may be able to aid 11,- in performing a multicast, but v,- cannot detect this using its limited routing information. However, source limited schemes can store enough information to give reasonably good performance. For example, Bharath-Kumar and Jaffe develop heuristics for NC routing, an NP-complete problem, using only source limited2 information [8]. 'Ihe heuristics performed well even though complete routing information was not used. We restrict our attention to inclusive routing schemes because they seem to capture the class of oblivious routing schemes likely to be implemented in real networks. 
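A routing table can be tested for the inclusive property directly from the definition: every interior segment of a prescribed path must itself be the prescribed path between its own endpoints. The sketch below (Python; the dictionary representation of the routing scheme is an assumption made for illustration, and it presumes the table contains an entry for every ordered pair of nodes that occurs as segment endpoints) performs this check.

```python
def is_inclusive(routing):
    """Check the inclusive-routing property.

    routing maps an ordered pair (v, w) to its prescribed dipath, given as a
    node sequence [v, ..., w].  The scheme is inclusive if, for every path
    (v_0, ..., v_k) in the table and every 0 <= i < j <= k, the segment
    (v_i, ..., v_j) is exactly the prescribed (v_i, v_j) routing path.
    """
    for path in routing.values():
        k = len(path) - 1
        for i in range(k):
            for j in range(i + 1, k + 1):
                if routing.get((path[i], path[j])) != path[i:j + 1]:
                    return False
    return True
```

The "longer of the two ring paths" scheme described earlier fails this test, since the interior of a longest path is not itself a longest path, while xy-routing and e-cube routing pass it, as noted above.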
Furdiermore, we concentrate on routing schemes that only have access to source limited routing information for three reasons: first, storing complete routing information may have unrealistic storage requirements; second, using only partial routing information has yielded good results in other multicast problems [8]; third, when used in conjunction with inclusive routing, there is a corresponding graph model that represents all tharath-Kumar and Jaffe use the term local information. 50 the routing paths from the source to other nodes and some additional routing paths as well. We use directed trees to model source limited inclusive routing. We use the definition of directed trees given in [18]. First, we define a root of a digraph. A digraph G(V, E) is said to have a root r if r E V, and there is a (r, v)-dipath for all 1) E V. A digraph is called a directed tree, or ditree, if it has a root, and its underlying graph is a tree. We use a source’s routing information to construct a directed tree that models the oblivious dipaths used in the inclusive routing scheme. Specifically, the ditree is rooted at the source node, and the dipaths in the directed tree correspond to the oblivious dipaths used by the source to send messages to each destination. For example, consider a direct network with eight nodes where the directed paths used by the source node r to communicate with. the other seven nodes are shown in Table 4.2. We can represent all these dipaths using the directed tree in Figure 4.4. 'Ihe problem of computing a minimum length multicast schedule in a network Table 4.2: Source Limited Oblivious Routing Table Routes from source node r destination dipath v1 rvl - ”2 T”2 1);; ma ”4 7”?”4 ”5 T”2”5 ”6 T”2”6 ”7 W2”6”7 that employs an inclusive routing scheme given only source limited routing information is exactly the problem of computing a minimum length edge-disjoint multicast schedule in a directed tree T. 4.2 Properties of Multicast in Directed Trees Because source limited inclusive routing schemes can be modeled as directed trees, we focus on the problem of performing multicast in directed tree topologies. We have not resolved the complexity of 51 Figure 4.4: Ditree T with source node r finding a minimum length edge-disjoint UBM schedule for ditrees, but in Section 4.2.3 we show that solving the related problem of computing a minimum length node-disjoint broadcast schedule in a . directed tree T provides a good approximate solution to the edge-disjoint problem. In this section we present some properties that are useful for understanding and analyzing multicast communication in directed tree topologies. 4.2.1 Definitions Definition 4.2.1 A vertex v in a rooted ditree T with root r is at level i if and only if the length (in number of edges) ofthe dipath p(r, v) in T is 1'. Definition 4.2.2 The largest integer h for which there is a vertex at level h in a mated ditree is called its height Let H (T) denote the height of ditree T. Definition 4.2.3 Let T be a directed tree. If directed edge on e E(T), then u is called a child of v, and v is called the parent of it. Let childrenfiv) denote the set {ulu is a child of v in T}, and parentfiu) denote node u's parent in T. If there is a dipath from v to u in T, then u is called a deecendantofv, andv is calledan ancestor ofu. 
Definition 4.2.4 For any multicast algorithm A, let A(M, T) denote the multicast schedule pro- duced by A for any multicast request M = (1', D) in any directed tree T rooted at r, and let |A(M, T)| denote the length of the multicast schedule. When M is the broadcast problem, we only write A(T) and |A(T) |. We say that A is a node-switching or line-switching algorithm ifA(M, T) is a legal node-switching or line-switching multicast schedule, respectively, for any multicast re- quest M in any directed tree T. Definition 4.2.5 Let OPT-ED, OPT-ND, and OPT-NB denote the set of multicast algorithms that always generate minimum length legal edge-disjoint, node-disjoint, and neighbor-switching multi- 52 cast schedules, respectively. Let ED, ND, and NB denote arbitrary elements of OPT-ED, OPT-ND, and OPT-NB, respectively. Definition 4.2.6 Let IT | denote the number of nodes in T. 4.2.2 Decomposing Algorithms An attractive feature of node-switching that makes analysis relatively easy is that each unicast de- composes the original multicast problem into two destination disjoint multicast subproblems. Con- sequently, node-switching algorithms with this property are called decomposing algorithms, for- mally defined below. Definition 4.2.7 Let T be a directed tree. For 1) E V(T), (1:) denotes the subtree of T induced by v and all its descendants. Definition 4.2.8 A node-switching algorithm A is a decomposing algorithm if it produces a broad- cast schedule in which no node ever calls a descendant of a previously informed node. That is, the only informed node on any dipath used by any call in A(T) for all ditrees T is the source of the call. To illustrate Definition 4.2.8, suppose we want to find a calling schedule 0 to implement mul- ticast request M = (r, D), where D is the destination set in directed tree T. Let the call (r, v, 1) be the single unicast in the first step of C. After the call is made, we can decompose M into two subproblems: implementing multicast request M " = (r, D — V((v))) in directed tree T — (v), and implementing multicast request M ” = (v, D — V(T - (v))) in ditree (v). 'Ihe following result shows that we can restrict our attention to decomposing algorithms with no penalty. Lemma 4.2.1 There exists a decomposing algorithm A in OPT-ND. Proof: Let A e OPT-ND be a non-decomposing multicast algorithm for implementing multicast request M in ditree T, and let (r, v, 1) be the first call in A(M, T). We will construct a decomposing algorithm A’ such that A’ (M, T) has the same length as A(M, T) for any multicast request M in 53 ditree T. Note that all calls from nodes in T - (v) to nodes in (u) must pass through node v. 'Ihus, only one such call can occur during any step t, and node it cannot be the source of a call during step t. We construct A’ by replacing all calls of the form (2:, y, t) e A(M , T), where a: e V(T — (22)) and y 6 V((v)) with the call (it, y, t). 'Ihe same argument applies to the ditrees (v) and T — (v). 'Ihus, we construct multicast schedule A’ (M, T) recursively as described above. ’Ihe resulting multicast algorithm A’ is a decomposing algorithm, and A’ (M, T) has the same length as A(M, T). 'Ihus, A’ 6 OPT—ND. Cl 4.2.3 Relating Node-switching to Line-switching We now address the relationship between OPT-ND and OPT-ED. Specifically, we show that a minimum length node-switching multicast schedule is at most twice as long as a minimum length line-switching schedule. 
Lemma 4.2.2 There exists a family of ditrees such that |ND(T)| = (2 − 2/n)|ED(T)| for every n-node ditree T in the family.

Proof: Consider the multicast request M = (v1, {v2, v3, . . . , vn}) (i.e., broadcast) in the ditree T pictured in Figure 4.5. It is clear that under the node-switching model |ND(T)| = n − 1. Under the line-switching model, v1 can inform v2, and then both v1 and v2 can make calls simultaneously until all destinations have been informed. Thus, |ED(T)| = 1 + (n − 2)/2 = n/2. Comparing these gives

|ND(T)| / |ED(T)| = (n − 1) / (n/2) = 2 − 2/n.  □

Figure 4.5: Node-switching vs. Line-switching.

Lemma 4.2.3 For all directed trees T and multicast requests M, |ND(M, T)| ≤ 2|ED(M, T)|.

Proof: We first state two facts about the directed paths used to perform unicasts in any time step of a legal line-switching calling schedule. First, for any uninformed node v, at most one unicast dipath will use that node. Since the tree is directed, a call through v must come through the directed edge (parent_T(v), v). Only one call is permitted to use this edge. Second, for any informed node v, at most two unicast paths can use it. Again, one call can use the edge (parent_T(v), v). Since v is informed, it is the only other node that can use node v without using edge (parent_T(v), v). Thus, if two calls use v, it must be the source of a message. Now consider all unicasts (s, d, t), i.e., unicasts occurring in time step t, of a legal line-switching calling schedule. Let C_t be the set of all such unicasts. Because of the previous facts, the graph induced by the unicast dipaths in C_t is a forest, say F, of directed trees in which the maximum outdegree of any node is two. Furthermore, only leaves in F are destinations of unicast calls in step t. We will simulate the unicast calls of C_t using node-disjoint paths. For each leaf l in F, make the nearest informed ancestor responsible for informing l. Since any such informed node can be responsible for at most two leaves, we can simulate the unicasts of C_t in at most two steps using node-disjoint paths, and the lemma follows.  □

From Lemma 4.2.3, we know that a proof that |ND(M, T)| ≥ c immediately translates into a proof that |ED(M, T)| ≥ c/2. Similarly, if we find an algorithm A such that |A(M, T)| ≤ c|ND(M, T)| for any multicast request M on ditree T, then we know |A(M, T)| ≤ 2c|ED(M, T)| for any multicast request M on ditree T. Therefore, we focus on lower bounding and approximating the minimum length of an optimal node-disjoint multicast schedule in the remainder of this paper, keeping in mind that we can directly use these results to approximate the minimum length of an edge-disjoint multicast schedule.

4.2.4 Reducing Multicast to Broadcast

Within the setting of node-switching algorithms, we now show that we can reduce any multicast request M in ditree T to an equivalent broadcast request in ditree f(T), where f is defined below. Therefore, we are particularly interested in efficient implementations for broadcast requests in ditrees.

Definition 4.2.9 For any ditree T and multicast request M = (r, D), define ditree f(T) as follows. First, let f(T) = T. Suppose there are i levels in T. Consider each node v in level i sequentially. If v ∉ D, then let children_{f(T)}(parent_T(v)) be children_{f(T)}(parent_T(v)) ∪ children_T(v) − {v} and remove v from T. Continue this process for levels i − 1, . . . , 1.

Figure 4.6 gives an example of the mapping f.
The multicast request M = (r, {02, v3,v4,v5,v7, vs, ”10}) in directed tree T is transformed to its corresponding broadcast re- quest in ditree f (T). Figure 4.6: Mapping'multicast to broadcast. 56 Lemma4.2.4 For any multicast request M = (r,D) in any directed tree T, |ND( f (T))| = |ND(M,T)|. Proof: Let ”(a y) denote the directed path p(m, y) in a ditree T. It is clear from Definition 4.2.9 that for any node a: E {r} U D, :r: 6 p,(7~)(i,j) ifand only ifs: E pp(i,j); the mapping f simply eliminates nondestination nodes from pfii, j) and leaves all destination nodes in the same order. First, we show that if C' is a legal node-switching calling schedule for implementing M in T, then C is also a legal node-switching calling schedule for implementing broadcast in f (T). This implies that |ND(MrTll Z |ND(f(T))|- Let C be a legal node-switching calling schedule for implementing M in T. Furthermore, let (i,j, t) 6 C(M,T) and (k,l,t) E C(M,T). Suppose thatp,(T)(i,j) andpf(7~)(lc,l) intersect at node 1:. Then, a: 6 {r} U D. We know that since :1: E pf(T)(i,j) and a: 6 pf(7~)(k,l), a: is also on both m(i,j) and pT(k,l). Thus, pT(i,j) and pp(k,l) are not node-disjoint, contradicting the fact that C is a legal node-switching calling schedule. Thus, our assumption that p I(T) (i, j) and p !(T) (k, l) intersect is false, and C is a legal node-switching calling schedule for implementing broadcast in f(T). Next, we show that the converse is true. That is, if C is a legal node-switching calling schedule for implementing broadcast in f (T), then C is also a legal node-switching calling schedule for implementing M in T. This implies that |ND( f (T))| 2 |ND(M,T)|. Let C be a legal node-switching calling schedule for implementing broadcast in f (T), and let (i,j, t) and (k, l, t) be two unicast calls that occur in C(f(T)). Suppose thatpT(i,j) and pp(k,l) intersect. Let node :1: be the first vertex that m(i,j) and pT(k,l) havetin common. It must be the case that :1: ¢ {r} U D, otherwise p,(T)(i, j) and p,(T)(k, l) intersect. Thus, there are two distinct directed edges or: (on pp(i, j )) and um (on pfik, l)). This results in a contradiction, because each node 2: in a ditree can have at most one parent. Thus, our assumption that m(i,j) and pp(k,l) 57 intersect is false, and C is a legal node-switching calling schedule for implementing M in T. Since |ND(M,T)I 2 |ND( f (T))| and |ND(f(T))| Z |ND(M,T)I, we can conclude that |ND(f(T))| = |ND(M,T)I- 0 4.2.5 Lower Bound on Broadcast in k-ary Ditrees Recall from Chapter 3 that for undirected graphs (and therefore undirected trees) there is always a line-switching broadcast schedule of length [lg n] , where n is the number of nodes in the graph. This is not always the case in directed trees. In this section, we prove that perfonning broad- ’ cast in any full k-ary directed tree T" of order it requires 0(k log,‘ n) time steps under the line- switching model. We prove this lower bound by showing that |NB(T")| = klog,c n. We then show that |NB(T)| _<_ |ND(T)| + H(T) - 1 for all ditrees T. Since H(T") = log, n, it fol— lows that |ND(T")| = (2(lclog,g n). Combining this result with Lemma 4.2.3, we show that IED(T")| = “(k 103:. n)- First we give a lower bound on the length of an optimal neighbor-switching algorithm that implements broadcast. Lemma 4.2.5 |NB(T")| = kllog,c (71)]. where T" is a full hoary ditree of order 7:. Proof: Let T" be a full k-ary ditree of order it. 
Since each node in the ditree must inform all of its children sequentially, the last node in level i receives the broadcast message at time k · i. Since the last node to receive the message is at level ⌊log_k n⌋, it follows that the time to complete broadcast in T^k is k⌊log_k n⌋.  □

Now we show the relationship between |NB| and |ND|.

Lemma 4.2.6 For all directed trees T, |NB(T)| ≤ |ND(T)| + H(T) − 1.

Proof: We prove this by induction on H(T). The base case when H(T) = 1 is obvious. Now consider the inductive case. Assume that the lemma holds for all directed trees T′ where H(T′) ≤ k. Now consider an arbitrary directed tree T with H(T) = k + 1. Let the root of T have m children. Name each of the m subtrees rooted at the children of T as T_i for 1 ≤ i ≤ m such that |ND(T_1)| ≥ |ND(T_2)| ≥ ⋯ ≥ |ND(T_m)|. Note that in each step of ND, the root of T can only inform a node in one of the m subtrees. Thus, ND cannot begin informing nodes in m − 1 of the subtrees until the second step. It cannot begin informing nodes in m − 2 of the subtrees until the third step, and so on. This implies that

|ND(T)| ≥ max{|ND(T_1)|, |ND(T_2)| + 1, . . . , |ND(T_i)| + i − 1, . . . , |ND(T_m)| + m − 1}.   (4.1)

We define a neighbor-switching algorithm for broadcast as follows. The root of T informs the root of T_1 in time step 1, the root of T_2 in time step 2, etc. In general, the root of T informs the root of T_i in time step i. An optimal neighbor-switching algorithm, NB, clearly does at least as well as the algorithm we have defined, so

|NB(T)| ≤ max{|NB(T_1)| + 1, |NB(T_2)| + 2, . . . , |NB(T_i)| + i, . . . , |NB(T_m)| + m}.   (4.2)

Let i be the index that maximizes |NB(T_i)| + i for 1 ≤ i ≤ m. From Equations 4.2 and 4.1, we know that |NB(T)| ≤ |NB(T_i)| + i and |ND(T)| ≥ |ND(T_i)| + i − 1, respectively. In addition, we know that H(T_i) ≤ H(T) − 1 for all 1 ≤ i ≤ m. We apply our inductive hypothesis and these facts to complete our proof as follows.

|NB(T)| ≤ |NB(T_i)| + i ≤ (|ND(T_i)| + H(T_i) − 1) + i ≤ |ND(T)| + H(T_i) ≤ |ND(T)| + H(T) − 1.  □

We use the previous results to derive a lower bound for the length of an optimal line-switching algorithm in a directed tree T.

Theorem 4.2.1 |ED(T^k)| ≥ (1 + (k − 1) log_k n)/2 = Ω(k log_k n) for any full k-ary directed tree T^k of order n.

Proof: From Lemma 4.2.3, we know that |ED(T^k)| ≥ (1/2)|ND(T^k)|. Using Lemma 4.2.6, we have |ED(T^k)| ≥ (1/2)(|NB(T^k)| − H(T^k) + 1). Substituting from Lemma 4.2.5, we get |ED(T^k)| ≥ (1/2)(k log_k n − H(T^k) + 1). Finally, since the height of a full k-ary tree is log_k n, it follows that

|ED(T^k)| ≥ (1/2)(k log_k n − log_k n + 1) = (1 + (k − 1) log_k n)/2.  □

We now extend the lower bound from Theorem 4.2.1 to a lower bound for edge-disjoint broadcasting in general digraphs.

Corollary 4.2.1 There exists a digraph G with n nodes and with maximum indegree and outdegree of k such that |ED(G)| = Θ(k log_k n).

We construct a digraph G as follows. Let T_1 and T_2 be full k-ary trees of order n/2, rooted at r_1 and r_2, respectively. Orient all edges of T_1 away from r_1, and orient all edges of T_2 toward r_2. We call r_1 the source root and r_2 the sink root. For each node v in T_1 with outdegree zero, make a directed edge vw, where w is the corresponding node with indegree zero in T_2. Finally, add directed edge r_2r_1. Figure 4.7 shows an example of digraph G when k is three.

Figure 4.7: Broadcast requires Ω(k log_k n) steps.

Suppose that our routing scheme R includes all the dipaths in G. We then have a routing scheme in which each node has at least one dipath to any other node in the network.
Thus, ifwecanlowerboundthetimetoperforrn broadcast under the line-switching model in this network, we also lower bound the time to perform broadcast under the line-switching assumption in this network with some restricted routing scheme. The key observation is that at most one dipath in any time step can use edge rzrl. Thus, performing broadcast in this line-switched network from the source root, n, can be done no faster than twice as fast as performing broadcast in a full k-ary tree. ’ The number of nodes in the network modeled by 61 G is twice the number of nodes in one of the full k-ary trees. Thus, the general lower bound result follows. 4.3 Summary The line-switching model of communication represents networks that use circuit roofing techniques such as wormhole routing. Although the multicast problem in such systems has been solved using a graph model, the graph model cannot accurately represent the oblivious routing information used in current multicomputer systems. We show that directed trees can be used to represent a realistic class of routing schemes called source limited inclusive routing schemes. We show that multicast requests in such systems can be transformed into equivalent broadcast requests with only a constant factor difference in the length of the calling schedule. We then address broadcast in directed trees, showing that ED(T) = RU: log,‘ n) for the complete, balanced k-ary directed tree T. Chapter 5 Node-Switching Algorithms for Directed Ti'ees In Chapter 4, we described a realistic class of routing schemes, called source limited inclusive routing schemes, that can be modeled by directed trees. In addition, we showed that, in the context of node-switching UBM algorithms, a multicast request in a ditree T is equivalent to a broadcast request in a related ditree T. Thus, we can limit our attention to performing broadcast in ditrees. In this chapter, we describe three node-switching UBM algorithms for broadcast in directed trees. In Sections 5.1 and 5.2 we describe two approximation algorithms for finding minimum length node- switching schedules for arbitrary ditrees. The Smart Centroid Algorithm, presented in Section 5.2, produces calling schedules that are at most twice the length of an optimal node-switching schedule. In Section 5.3, we present an optimal node-switching UBM algorithm for broadcast in directed HUGS. 62 63 5.1 Centroid Algorithm In Section 2.5.4, we describd the technique used for generating a minimum length line-switching calling schedule for a multicast request M = (s, D) in a direct network that admits free routing. First, a trail containing the source and all destinations is constructed. Then, the trail is recursively broken into two trails of equal length during each time step. Breaking the problem in half at each step leads to [lg(|D| + 1)] steps to complete the multicast request. In this section, we present the Centroid Algorithm (CA) [3], which is built on a similar idea. At each step we attempt to break the directed tree representing the routing information into two nearly equal subtrees. We find an . edge whose removal leaves two nearly equal sized ditrees. This edge will be adjacent to a centroid point of the underlying tree. We formally define the following terms that are used in the Centroid Algorithm. 5.1.1 Algorithm Description Definition 5.1.1 An end vertex is a vertex that has degree 1. Definition 5.1.2 A‘ branch at a vertex v of a tree T is a maximal subtree containing 1) as an end vertex. 
Definition 5.1.3 The weight at a vertex v of T is the maximum number of edges in any branch at 0. Definition 5.1.4 A vertex c is a centroid point of a tree T if c has minimum weight, and the centroid of T consists of all such vertices. To create a node-disjoint broadcast schedule S for a ditree T, the assignment S = GA(T, 1, 0) must be made. That is, the initial parameter for CA are the entire ditree T rooted at r, time step 1, and an empty calling schedule. The CA algorithm recursively creates a node-disjoint broadcast schedule for T as follows: 64 Centroid Algorithm [CA] (T: ditree rooted at r; t: time step; S: calling schedule) If |V(T)| = 1 return S Find a centroid point c of T Let Cr,G2,...Gmbethe components ofT—c Let T1 = 05, where [Ci] Z Ile Vj #i If? 6 T1 then S = SU {(136, t)} else (a)Letc’betherootofT1 (b)5=3U{(r.C’.t)} Leth =T-T1 7. S = CA(T1,t + 1,3) UCA(T2,t + 1,3) .V'PP’N?‘ 9‘ Suppose that T is a ditree with maximum outdegree 1:. CA is a simple recursive algorithm that works as follows. In each step, we find a centroid point c of the underlying tree of T. Removal of c dividesTintomsubtrees, whereO _<_ m g k+1. Weidentifythelargestoneofthesemsubtrees, andcallitTl. IftherootofT1 isachildofc, then informtherootole. Iftherootole equals the root of T, then inform c. Now divide the problem into the two subproblems T1 and T - T1, and continue. Figure 5.1 shows the operation of CA on a broadcast request in an eight node ditree. The dark nodes are informed before the time step shown under each set of directed trees. First, the node to be step 1 step 2 step 3 Figure 5.1: Centroid Algorithm at work. 65 called in the first step must be determined. Node 3 is chosen as a centroid point of T. Since, the root (node 1) is in T1, the centroid point of T is informed in step 1, i.e., the call (1,3,1) is made. The tree is then broken into two smaller trees, one rooted at node 1 and the other rooted at node 3, and the algorithm continues. For the ditree in Figure 5.1, CA produces an optimal calling node-disjoint schedule ( lg(7 + 1) = 3 time steps). Theorem 5.1.1 |CA(T)| = 0(loggfl |T|) for any directed tree T with maximum outdegree 1:. Proof: Let c be a centroid point of T’s underlying tree. Removal of c leaves at most k + 1 , components. Let T’ be the largest of the k + 1 components. Because c is a centroid point, |T’| 5 L121 [28, 17]. Because T’ is the largest ofthe I: + 1 subtrees formed around c, |T’| 2 1111,31. Thus, removal ofT’ from T gives, |T — TI 5 |T| - Lag—1 = 12%|“ + Fifi: Thus, the maximum size of the problem decreases by a factor of at least pkg in each time step, and the result follows. [3 5.1.2 Time Complexity Let TCA (n) be the worst case time complexity of the centroid algorithm on a ditree with n nodes. Fust, note that finding the centroid of a tree requires 0(a) time. Thus, we have the following recurrence relation for TCA. TCA(n) = aln + TCAUWI) + TCA(n — kn) + (12, where 0 < k < 1 T0) = 013 The CA algorithm is similar to the traditional QuickSort algorithm. In the worst case, the ditree T will be divided into two ditrees of size n - 1 and 1 (e.g., when T is a star). Thus, CA has worst case complexity of 0(n2). However, when the ditree T has a uniform structure and can be divided 66 into nearly equal sized subtrees, the complexity is 0(n lg n). While the centroid algorithm does not produce optimal length broadcast schedules for all ditrees, its greatest advantages are its simplicity and speed for well-balanced ditrees. 
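A compact sketch of CA follows, assuming the ditree is given as a dict mapping each node to its list of children; all function names are ours, and the sketch only mirrors the steps of the pseudocode above rather than being an efficient implementation.

```python
def nodes_of(children, root):
    """All nodes of the ditree rooted at `root`."""
    stack, out = [root], []
    while stack:
        v = stack.pop()
        out.append(v)
        stack.extend(children.get(v, []))
    return out

def centroid_point(children, root):
    """A vertex of minimum weight, where the weight of v is the maximum number of
    edges in any branch at v (Definitions 5.1.3 and 5.1.4)."""
    order = nodes_of(children, root)
    n = len(order)
    size = {v: 1 for v in order}
    for v in reversed(order):                       # children are sized before parents
        for c in children.get(v, []):
            size[v] += size[c]
    weight = lambda v: max([size[c] for c in children.get(v, [])] + [n - size[v]])
    return min(order, key=weight)

def split_at(children, root, sub_root):
    """Split T into the subtree rooted at sub_root and the remainder rooted at root."""
    sub_nodes = set(nodes_of(children, sub_root))
    sub = {v: list(children.get(v, [])) for v in sub_nodes}
    rest = {v: [c for c in children.get(v, []) if c not in sub_nodes]
            for v in nodes_of(children, root) if v not in sub_nodes}
    return sub, rest

def centroid_algorithm(children, root, t=1, schedule=None):
    """CA sketch: returns a node-disjoint broadcast schedule as (caller, callee, step)."""
    if schedule is None:
        schedule = []
    if len(nodes_of(children, root)) == 1:
        return schedule
    c = centroid_point(children, root)
    child_sizes = {v: len(nodes_of(children, v)) for v in children.get(c, [])}
    root_side = (len(nodes_of(children, root)) - len(nodes_of(children, c))
                 if c != root else 0)
    if root_side >= max(child_sizes.values(), default=0):
        # T1, the largest component of T - c, contains the root: the root calls c.
        schedule.append((root, c, t))
        sub, rest = split_at(children, root, c)
        centroid_algorithm(rest, root, t + 1, schedule)    # T1
        centroid_algorithm(sub, c, t + 1, schedule)        # T2 = T - T1
    else:
        # Otherwise the root calls c', the root of the largest component below c.
        c_prime = max(child_sizes, key=child_sizes.get)
        schedule.append((root, c_prime, t))
        sub, rest = split_at(children, root, c_prime)
        centroid_algorithm(sub, c_prime, t + 1, schedule)  # T1
        centroid_algorithm(rest, root, t + 1, schedule)    # T2 = T - T1
    return schedule
```

The recursion mirrors steps 2 through 7 of the pseudocode above; ties between equal-sized components are broken arbitrarily.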
In the next section we show that the Centroid Algorithm produces optimal length broadcast schedules for some well known direct network topologies. 5.1.3 CA in Meshes and Hypercubes Because the number of messages that are sent in each step of a multicast schedule for M = (s, D) can at most double, the ideal number of time steps needed to complete a multicast is [lg(lDl + 1)]. ' The centroid algorithm runs in 0(|D| lg |D|) time and produces Optimal length, i.e., [lg(lDl + 1)], calling schedules for implementing broadcast in both mesh and hypercube topologies. Meshes use a specific type of inclusive routing called ary-routing. Under this scheme, a message is sent first in the x direction until it reaches the correct column. Then, the message is sent in the y direction until it reaches its destination. Because xy-routing is an inclusive routing scheme, we can represent source limited :ry-routing information using a directed tree. Figure 5.2 shows the directed tree that Viiiiiiiiomm l l 2 nodes Figure 5.2: Directed tree representing 2" x 2'" mesh. corresponds to the restricted routing of a 4 x 8 mesh when the node shown in black is the source of the broadcast. We show that the centroid algorithm produces an optimal line-switching calling 67 schedule for broadcasting in 2" x 2'" meshes. Theorem 5.1.2 CA produces optimal length node-disjoint broadcast schedules in 2" x 2'" meshes. Proof: Suppose we have a broadcast request from node 3 = (row, col) in a 2" x 2’" mesh. We prove by induction on m that CA produces an optimal length broadcast schedule for a tree representing the restricted routing of a mesh. The base case is m = 0, in which we have a 2" x 2° mesh, i.e., a path with 2" nodes. Note that any centroid point of a path of length n has weight [12!]. Thus, at each step CA will break the path into two paths with 23"- nodes. Consequently, CA will produce a broadcast schedule of length ' lg 2" = k, which is optimal. Assume that CA produces optimal length broadcast schedules for all 2" x 2’ meshes, where 0 _<_ i _<_ m. Now, we prove that CA produces an optimal calling schedule for a ditree represent- ing the restricted routing for a 2" x 2m“ mesh. Suppose that the source of the broadcast is at mesh coordinate (row, col). The centroid of the directed tree representing the mesh consists of the following two centroid points: (1) (row, 2323) 2m+1 (2) (row, T + 1) 2m+1 First, consider case (1). If col > T, then the centroid point (row, 251‘) is called in the first step. If col 5 flail, then (row, gfl + 1), i.e., the root of T1, is called in the first step. Now consider case (2). Ifcol < 1311 + 1, then the centroid point (row, E; +1) is called in the first step. Otherwise node (row, #) is called. In either case, after the first call the original ditree is partitioned into two subtrees that represent meshes of size 2" x 2'”. Thus, our inductive hypothesis holds for the two subtrees. Let C(k, m) be the length of the broadcast schedule produced by CA for a 2" x 2'” mesh. Then, 68 C(k,m +1) = G(k,m) +1 lg(2" x 2’") +1 = lg2" +lg2'" +lg2 lg 2" + lg 2'"+1 lg(zh X 2m+1) Thus, CA produces an optimal length edge-disjoint broadcast schedule for all 2" x 2'" meshes, for all m 2 0. Because the directed tree that represents the restricted routing in a 2" x 2'" torus is identical to - a directed tree that represents a 2" x 2'” mesh, we have the following corollary to Theorem 5.1.2. Corollary 5.1.1 CA produces optimal length edge-disjoint broadcast schedules in 2" x 2’" tori. 
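To make the mesh case concrete, the sketch below builds the ditree induced by source-limited xy-routing in a rows × cols mesh, in the spirit of Figure 5.2; the coordinate convention (x along the source's row first, then y along a column) and the function name are our assumptions.

```python
def xy_routing_ditree(rows, cols, source):
    """Children lists of the ditree induced by source-limited xy-routing in a
    rows x cols mesh: a spine along the source's row (the x phase), with a
    directed column path hanging off every spine node (the y phase)."""
    sr, sc = source
    children = {(r, c): [] for r in range(rows) for c in range(cols)}
    for c in range(cols):                       # x phase: along the source's row
        if c < sc:
            children[(sr, c + 1)].append((sr, c))
        elif c > sc:
            children[(sr, c - 1)].append((sr, c))
    for c in range(cols):                       # y phase: up and down each column
        for r in range(rows):
            if r < sr:
                children[(r + 1, c)].append((r, c))
            elif r > sr:
                children[(r - 1, c)].append((r, c))
    return children

# e.g. xy_routing_ditree(4, 8, (0, 0)) gives the ditree for a broadcast from corner (0, 0).
```

The resulting dict can be fed directly to a centroid-style routine such as the CA sketch given earlier.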
Hypercube topologies also use a specific kind of inclusive routing called dimension-ordered routing. In this scheme, the dimensions of the hypercube are ordered, and a message follows edges to its destination in increasing order of dimension. The restricted routing information in a hypercube with 2" nodes can be represented by the directed binomial tree Bk, defined as follows. Definition 5.1.5 The binomial tree Bo consists of a single node. The binomial tree B), consists of two binomial trees Bk..1 that are linked together such that the root of one is the lefimost child of the root of the other: Figure 5.3 shows the general form of a binomial tree. Figure 5.4 shows the directed tree that represents the restricted routing in a 16-node hypercube when broadcasting from node 0001. It is easy to verify that the ditree in Figure 5.4 is the directed binomial tree 34. We show that CA creates an optimal calling schedule to implement a broadcast request from the root r in a directed binomial tree, i.e., a hypercube. 69 Figure 5.3: The binomial tree Bk. 01m 1100 0101 1101 9 . Source /T 0 Destination 0000 1000 WI 1001 is d 0110 1110 0111 1111 (X110 1010 (Kill 1011 Figure 5.4: Ditree representation of a 16-node hypercube. 70 Theorem 5.1.3 CA produces optimal length edge-disjoint broadcast schedules in k-dimensional hypercubes. Proof: We prove by induction on k that CA produces an optimal length broadcast schedule in the directed binomial tree Hg. The base case k = 0 is obvious. Assume that when k = n, that CA produces an optimal length broadcast schedule in the directed binomial tree B”, where n 2 0. We must show that when k = n + 1, that CA produces an optimal length broadcast schedule in the directed binomial tree Bu“. First, note that the centroid of Ba“ contains two nodes: (1) the root of B...” (2) the leftmost child of the root of 3,,l , i.e., the root of a 8,. subtree. For example, in Figure 5.3, nodes r and r’ are in the centroid. In either case, the first unicast call produced by CA is (r, r’,p(r, r’), 1). Then, the tree is divided into two 8,. subtrees, one rooted at r and the other rooted at r’. We know from our induction hypothesis that CA produces optimal calling schedules for the 3,, subtrees. Let G (n) be the length of the broadcast schedule for 3,, produced by CA. Then, G(n +1) = C(n) +1 lg(2") + 1 = (n) + l = lg(2n+l) Thus, CA produces an optimal length edge-disjoint broadcast schedule for Bk, for all h 2 0. CI 71 5.2 Smart Centroid Algorithm While the Centroid Algorithm works well on many trees, particularly those which are relatively uniform in structure or those which have small maximum outdegree, it can perform quite poorly on others. The problem results from the fact that the algorithm always chooses to inform the subtree with the maximum number of nodes first. In some cases, this is a poor choice, as another subtree with slightly fewer nodes may require more time to perform a broadcast. To illustrate this concept, consider the tree T in Figure 5.5, which has nh nodes. The root of T is its centroid point, and it has T: c pathsoflength (n-1)/k ”e e e C) 0 paths .r of length , 0 . . n 1 |:_: : : mil 2.:— - _l l is so a; o a; J [(k-1)n] nodes (n-l) nodes Figure 5.5: CA performs poorly. hchildren. Thefirstk- 1 children areallrootsofsubtreesthatarepathsoflength n. Thelcthchild istherootofatreewiththesamestructure asTbuthasonlyn - 1 nodes. Thecentroidalgorithm will produce a calling schedule of length (1(lclog,c |T|). 
whereas the optimal calling schedule will have length 0(k + log), |T|). We present below a new algorithm which we call the Smart Centroid Algorithm (SCA) [3]. Similar to the CA algorithm, S CA is a recursive algorithm that creates a node—disjoint broadcast schedule S for a ditree T using the statement S = SCA(T, 0, 0). SCA is also based on finding a centroid of T, however SCA employs a better heuristic than the GA algorithm for determining 72 which child of the centroid point to inform first. Con sequently, we are able to guarantee that |SCA(T)| S 4|ED(T)| for all directed trees T. 5.2.1 Algorithm Description We describe two algorithms: Smart Centroid Algorithm (S CA) and Smart Time (ST). ST returns the number of time steps necessary to complete broadcasting using SCA, i.e., |SCA(T)|. S CA generates a calling schedule and uses ST to determine the order in which unicast calls are to be made. The Smart Centroid Algorithm works as follows. After identifying a centroid point c which has outdegree m - 1, we decompose the problem into the subproblems Cr , Cg, . . . , Cm determined by the components of T—c. We compute |SCA(C,-)| for each i g m. The root begins to inform these components in non-increasing order of calling schedule length. When the component containing the root of T is the next subtree to be informed, the root informs the centroid c. In subsequent steps, c is responsible for informing its remaining uninformed children, freeing the root to inform its own component. Smart Time [ST] (T: ditree rooted at r) If [V(T)| = 1 return 0 Find a centroid point c of T Let Cr, 02, . . . , Cm be the components ofT -— e Let node c,- be the root of component C,- For i = 1 to m do Let max,- = ST on 0:, rooted at c,- Reorder maxi, 1 S i g m. such that marl 2 771.0152 2 2 max", 7. Return maxlSi5m{max,- + i} $09939?" 9‘ Lemma 5.2.1 ISCA(T)] S |ED(T)I + lg ITI for all directed trees T. Proof: We prove this by induction on ITI. 73 Smart Centroid Algorithm [S CA] (T: ditree rooted at r; t: time step; S: calling schedule) If|V(T)| = 1 return S Find a centroid point e of T Let C1, Cg, . . . , Cm be the components of T — c Let node c.- be the root of component C,- Fori = 1 to m do Let time,- = ST(C,') 6. Reorder the components such that timer 2 timeg 2 2 timem 7. Lett' = 1 8. While(r;éc,-)do 9. Lett = t-l-l 10. S=SU{(r,c,-,t)} 11. S = SCA(C,-,t,S) 12. Leti = i + 1 End while 13. S=SU{(r,c,t+l)} 14. S: SCA(C,',t+1,5)USCA(T-Ujsng,t+1,5) V'P‘P’PT‘ The base case when |T| = 1 is obvious. Now consider the inductive case. Assume that the lemma holds for all directed trees T’ where IT’I g j. We use the inductive hypothesis to show that the lemma holds for any directed tree T, where j < IT] 3 2j. Consider an arbitrary directed tree T with j < [TI 3 2j. Find a centroid point of this tree T. For the sake of simplicity, assume it is the root (the other case is handled in a nearly identical fashion). Let the root have m children. Break the tree into the m distinct subtrees rooted at the children of the root. Name these m subtrees so that IS CA(T1)| Z I S CA(Tg)| 2 -- - 2 |SCA(Tm)|. Note |T§| g [Ii] 3 j for 1 5 i g m, so the induction hypothesis applies to each of these subtrees. Note also ISCA(T)| = maX{ISCA(Tt)I +1.ISCA(T2)| + 2. . - - . ISCA(Tt)i + i. - - - r ISCA(Tm)! + m}- Now let o(i) be a permutation on {1, - - - ,m} such that |ND(T,(1))| 2 |ND(T,(2))| _>_ _>_ |ND(T,,(m))|. Note, in any time step of ND the root can only inform nodes in one of these m subtrees. 
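A small sketch of the combination rule in step 7 of ST above, using our own function name and a plain list of component times:

```python
def smart_time_combine(component_times):
    """Step 7 of ST: sort the components' broadcast times in non-increasing order
    and return max_i (time_i + i), with components indexed from 1. The root starts
    the slowest component first, so its own delay i is added to each component."""
    ordered = sorted(component_times, reverse=True)
    return max((t + i for i, t in enumerate(ordered, start=1)), default=0)

# e.g. smart_time_combine([3, 3, 1]) == 5: the two slow components finish at
# steps 3 + 1 and 3 + 2, and the quick one at 1 + 3.
```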
This implies that |ND(T)| Z max{|ND(Ta(1))|v - - - a |ND(Ta(i))I +7; - 1: - - ° v |ND(Ta(m))I + m _1}' To relate these two sets of values, we know from our induction hypothesis that |ND(T1)| 2 13mm): —1gj. so woman! 2 ISCA(T1)! —1gj. Similarly. |ND(T2)I 2 ISCA(T2)I — 1g,- and |ND(T1)I Z |50A(T1)| - 131' Z |30A(Tz)! -1gJ°. so |ND(T0(2))| 2 |50A(T2)l - 131'. In general. |ND(Ta(.-))| 2 ISCA(72)I -1gj. Now let 1 5 i S m be the value such that |SCA(T)| = |.S'C'A(T.-)| + i. Combining this with the above fact, we have iND(Ta(i))I Z |50A(Ti)l - 181' |ND(Ta(i))l Z ISCAlTN - i “131' |ND(Ta(i))l + i + lgj Z ISCA(T)| We know that |ND(T)| 2 |ND(T,(,))| + i - 1, thus |50A(T)| (|ND(Ta(i))l +i -- 1) + 1 “U S g |ND(T)|+1+lgj s |ND(T)|+lg21' S |ND(T)I+lngl and we are done. Lemma 5.2.2 IS CA(T)| S 2|ND(T)| for all directed tree: T. Proof: This follows from Lemma 5.2.1 and the fact that |ND(T)| _>_ lg |T|. (:1 Theorem 5.2.1 IS CA (T)! g 4|ED(T)| for all directed trees T. 75 Proof: This follows from Lemma 5.2.2 and Lemma 4.2.3. C] If the centroid point divides T into subtrees in which the largest tree requires the most time to perform broadcast, then the CA and SCA algorithms will produce the same broadcast schedule for T. For example, in the directed binomial tree 8.3, there are two centroid points. In either case, B], will be divided into two Bk_1 directed trees. Thus, CA and S CA produce the same broadcast schedules for binomial trees (or equivalently for hypercube topologies). It is easy to see that CA and S CA also produce the same broadcast schedules for mesh topologies. This leads to the following corollaries of Theorem 5.1.2 and Theorem 5.1.3. Corollary 5.2.1 S CA produces Optimal length edge-disjoint broadcast schedules in 2" x 2'" meshes (and tori). Corollary 5.2.2 S CA produces optimal length edge-disjoint broadcast schedules in k-dimensional hypercubes. 5.2.2 Time Complexity Because 5 CA employs a better heuristic than CA we can guarantee that it will produce a broadcast schedule with length at most twice the optimal. However, this improvement over the Centroid Algorithm necessitates an increase in algorithm complexity. We show that S CA has worst case complexity of 0(113). First, we must calculate the complexity of the Smart Time algorithm. Suppose removal of the centroid divides the tree into m components. Then we have the following recurrence relation for the time complexity of ST, denoted T37"- aln + TST(al(n —1))+---+ T57(am(n —1))+ Cgmlgm + 03171 + 024, where 2'" (11:1 12:1 TSTU) = 05 TST(”) 76 Similar to the CA algorithm, the worst case occurs when m - 1 components are size 1, and one component is size n — m. Thus, the worst case complexity for ST is T5107.) = out + T(n - m) + 02171. lg m + 03171 + a4. Solving this recurrence gives T51(n) = 0(112). We use this to calculate the worst case complexity, TSCA (n), of the Smart Centroid Algorithm. In step 5, ST is called m times. In the ideal case, each of the m components has size %, and the complexity of ST is 0(a). Since ST is called 0(a) times, step 5 has complexity 0(n2). The worst case _for the ST algorithm is when m — 1 components have size 1 and the remaining component has size n - 1 - m. In this case the complexity of step Sis 0(n2) + (m -1)O(1). Since m S n -1, the complexity of step 5 is 0(n2) in all cases. We have the following recurrence relation for the worst case complexity of SCA. 
T301101) = (1117.2 + agn lgn + (1311 + TSCA (n -1)+ (14 TSCAU) = as I Solving this recurrence gives, TSCA(”) = 0013)- In the best case, n Isa/1(a) == 01112+agnlgn+a3n+2TscA(§)+a4 = 0m?) 77 While S CA’s complexity is not markedly worse than that of the CA algorithm, direct networks with uniform directed tree representations (such as meshes, hypercubes, and tori) can save time using the CA algorithm to calculate edge-disjoint broadcast schedules. For general topologies that use inclusive routing, S 0.4 will provide a better approximation. 5.3 Optimal Decomposing Algorithm Neither the CA algorithm nor the SCA algorithm is guaranteed to produce a minimum length broad- cast schedule for every directed tree. In this section we present a 0(n3) algorithm that always produces minimum length broadcast schedules [4]. 5.3.1 Definitions Recall from Lemma 4.2.1 that there is an optimal node-disjoint multicast algorithm that is a de- composing algorithm. It should be clear that of the decomposing algorithms, we can focus on those algorithms in which the root is not idle until all its children are informed. We now define some terms that we will use to both describe our optimal decomposing algorithm and to prove its optimality. We will illustrate these definitions using the following two minimum length broadcast schedules for the ditree of Figure 4.4. Note that these are the only minimum length broadcast schedules for T. 61(T) : {(r1v2!l))(r'v112)!(v21v612)1(r’v313)9(Uziv‘93)’(v6!0793)3(W’v5!4)} 02(T) = {(r,v5,1),(r,v2,2),(v3,v7, 2), (r, v1.3). (v2,v4,3), (r, 123,4), (v2,v5,4)} Definition 5.3.1 Let C (T) be a node~disjoint broadcast schedule for ditree T. We define the root schedule C, (T) of C(T).to be {(12, w, t) E C(T) I v = r}, i.e., the set ofcalls made by the root. Forl S i S |C,(T)|, we define 1),-(Cr (T)) = w where (r,w, i) 6 CAT). i.e., the i‘h node informed directly by the root. For 0 S i S lCr(T)|, we define T.- = T — U;=1 (12j(Cr(T))). i.e., the subtree of T that the root still has responsibility to inform afler the ith time step of C (T). 78 C}(T) {(23:22,l),(r,v1,2),(r,v3,3)} C3(T) = {(T, v5i1)r(rr 112,2),(1', U1,3),(7’,‘U3,4)} .Zoqfle Gm Q 000 0 T T T T 0 l 2 3 :E:M “9 Q T T T T T 0 l 2 3 4 Figure 5.6: T,"s for broadcast schedules Cl(T) and 02(T). Definition 5.3.2 Suppose C (T;) is a broadcast schedule for ditree T, with root 1'; such that (mu, 1) E C(T.) We define C(T)-+1), where Ti.” = T,- — (v), to be the broadcast schedule C (T.) modified as follows. All calls that involve nodes in (v) are deleted. All remaining calLs have their time steps decremented by I . Since To = T. 01(T1) = {(rsvlvl)i(rtv3s2)}' 02(7)) {(r,v2, l), (r,v1,2), (v2,v4, 2), (ram, 3), (v2, v5,3)} Definition 5.3.3 Let n = lCrSTH- The labor vector of broadcast schedule C (T) in ditree T. denoted chT), is the vector VC(T) = (|C(To)|, |C(T1)|,;. . , |C(Tn)|). Let (VC(T)).- be the ith element ofVC(T), i.e., (VC(T)); = |C(T,--1)|. Note that (VC(T))1 = |C(To)| = |C(T)| and that (VC(T))n+I = lC(Tn)l = 0' 79 from = (4, 2, 1, 0) Vera) = (4, 3, 2, 1, 0) Definition 5.3.4 A broadcast schedule C (T) is a least labor broadcast schedule if @(T) is lexico- graphically minimal over all broadcast schedules for T. The corresponding vector Vc (T) is called a least labor vector: In general, let LLl/(T) denote the least labor vector of T. Definition 5.3.5 Let M LS (T) be the set of all minimum length broadcast schedules for ditree T, and let LLS (T) be the set of all least labor broadcast schedules for ditree T. 
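To illustrate Definitions 5.3.1 through 5.3.3, here is a small sketch that extracts the root schedule and labor vector from a given broadcast schedule; the (caller, callee, step) triple representation and the function names are assumptions of ours.

```python
def subtree_of(children, v):
    """Vertex set of <v>, the subtree rooted at v."""
    stack, seen = [v], set()
    while stack:
        u = stack.pop()
        seen.add(u)
        stack.extend(children.get(u, []))
    return seen

def root_schedule(calls, root):
    """C_r(T): the calls made by the root, in time-step order."""
    return sorted((c for c in calls if c[0] == root), key=lambda c: c[2])

def labor_vector(children, root, calls):
    """(|C(T_0)|, ..., |C(T_n)|): after the root's i-th call, remove the subtree it
    informed, drop the calls inside it, shift the remaining time steps down by i,
    and record the length of what is left (Definitions 5.3.2 and 5.3.3)."""
    r_calls = root_schedule(calls, root)
    remaining = list(calls)
    removed = set()
    vector = []
    for i in range(len(r_calls) + 1):
        length = max((t for (_, _, t) in remaining), default=i) - i
        vector.append(max(length, 0))
        if i < len(r_calls):
            removed |= subtree_of(children, r_calls[i][1])
            remaining = [c for c in remaining
                         if c[0] not in removed and c[1] not in removed]
    return vector
```

Run on the ditree of Figure 4.4 with the schedule C1(T) above, this should reproduce the labor vector (4, 2, 1, 0) quoted in the text.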
It is not hard to verify that both C 1 (T) and 02(T) are in M LS (T) and that C 1(T) E LLS (T) while 02(T) g LLS(T). Definition 5.3.6 We define the set of minimum first calls ( MF C) of ditree T = (r) to be MFC(T) = U {(r,v.-, l) E C}. C6MLS(T) Definition 5.3.7 We define the set of least labor first calls ( LLF C) of ditree T = (r) to be LLFC(T)= U {(r,u,-,1)eC}. CELLS(T) In our example, MFC(T) = {(mvz. 1), (73116, 1)} LLFC(T) = {(r,v2, 1)} We now describe a decomposing algorithm ODA which is in OPT-ND. The algorithm ODA will work on a tree T in two phases, one bottom up and one top down. Let U be a node in T with p children (1,- for 1 g i _<_ p. In the bottom up phase, ODA computes the root schedule Sr((v)) and the least labor vector 173((v)) of a least labor broadcast schedule S ((21)) for ditree (1;) using the root schedules and the least labor vectors of least labor broadcast schedules S ((d,)) previously computed for ditrees ((1,) for 1 g i 5 p. In the top down phase, ODA pieces together the various 80 root schedules and least labor vectors to compute a least labor broadcast schedule ODA(T) for the entire tree T. 5.3.2 The Optimal Substructure The heart of the bottom up phase of ODA lies in determining which node of a directed tree T = (r) should be informed in the first time step. The following two lemmas are used to determine which node to call in the first time step. Lemma 5.3.1 Let T and T ' be ditrees rooted at nodes r and 1", respectively. Let each of r and 1" have p children d1, d2, . . . , d, and d’1 , dfz, . . . , d}, respectively, with the following properties: (i) The subtrees (dg) and (d:) are isomorphic for all i < p (ii) LLV((dpl) S LLV((d§,)). Then. LLV(T) S LLV(T’). Proof: Let C E LLS(T’). The basic structure of our proof is as follows. Let 0' 6 LLFC(T’). We show that there exists a schedule F for T with the property that I71: (T) S l7c(T’). In order to do this, we will choose a node v 6 V(T) such that iF(T)| = maXHND(. |ND((dp))|. this implies IF(T)| = max{|ND((dp))|. |ND(T-(v))|+1}- Since we have assumed that |ND((dp))| = |ND((d;,))|, we ate assured that lF(T)l S |C(T')| if LLV(T - (v)) _<. LLV(T’ - (v’>)- Since 11 e (tip) and v’ e (4,), condition (2) is still true of T - (v) and T’ — (v’). Let w e LLFC((dp), then LLV((dp —. (w)) 5 LLV((dI’D) -— (in). Because it e LLFC((dp)) and condition (it), LLV((dp) — (v)) _<_ LLV((dg) — (w)). ntetetote, LLV((dp) — (v)) s, LLV((d;) — (v’)) and condition (ii) holds for T — (v) and T’ — (v’). rhctcrote, by the induction hypothesis, LLV(T - (v)) s LLV(T’ — (in). (ii) Suppose |ND((d,))| < |ND((d;,))|. We choose v = d,,, i.e., we make the call (1', d,, 1) in T. Since T — (v) is a subtree ofT’ — (v’), LLV(T - (1))) _<_ LLV(T’ — (in). We now must insure that |F(T)| s |C(T’)|. Since LLV(T - (v)) s LLV(T' - (v’l). we know |ND(T - MN 5 |ND(T - (v’))l. nieterote, we only need to show that |ND(_(v))| + 1 g lam). By our assumption, IND()l- This means that |ND((v)l + 1 s |ND(_ LLV((d;)) Z -- - _>_ LLV((dp)), andlet(d1,v, l) e LLFC((dfi), then (i) If |ND(T)| = |ND((d1))I. then (rill, 1) E LLFC(T)- (ii) If |ND(T)| > |ND((d1))|, then (r, (11, l) 6 LLFC(T). Proof: There are two cases to consider. First, let |ND(T)| = |ND((d1))|. Clearly, we must call a node to in (d1) with the property that |ND((w))| g |ND((d1))| — 1; otherwise we cannot inform all 83 nodes in (d1) in |ND((d1))| time steps. For all such to, LLV((dt) — (1))) g LLV((dt) — (10)) by the definition ofa LLFC. Thus, by Lemma 5.3.1 LLV(T — (~u)) S LLV(T — (w)), and (r, v, 1) E LLFC (T). 
In the second case, |ND(T)| > |ND((d1))|. Let C be a calling schedule that calls ii; in the first step and then follows a least labor schedule for the remaining subtrees, and let F be a calling schedule that calls some w aé (it in the first step and then follows a least labor calling schedule for the remaining subtrees. We will show that 17C(T) g VAT), and consequently that (1', d1, 1) E LLFC(T). . Let T1 = T — (d1) and T{ = T — (to). It follows that, |C(T)|=max{|ND(Tt)|ilND( |ND((dl)|. we know that |ND(T1)I Z |ND(T)| -1 Z lND((d1))I.ThUSIC(T)I = IND(Ti)l+1- Similar reasoning leads us to conclude |ND(T{)| 2 |ND((w))| and |F(T)| = |ND(T()| + 1. We now show that LLV(TI) g LLV(Tf). There are three cases to consider: (1) Let in E V((d1)). Clearly LLV(T)) S LLV(TI’). 84 (2) Let w = d;. where i sé 1. By Lemma 5.3.1 and our assumption that LLV((d1)) Z LLV((d,)) roti > 1, LLV(T)) g LLV(Tf). (3) Letw 6 V((d,)), wherei # 1. Then we know LLV(Tl’) _>_ LLV(T - (do). Combining this with case (2) from above, we have LLV(T)) g LLV(T — (d,)) g LLV(Tf). Since LLV(TI) g LLV(Tl’) for all w, it follows that (r, (it, 1) E LLFC(T). 13 Lemma 5.3.2 describes the first call made by the root in a least labor broadcast schedule. The fol- lowing decomposition pr0perty of least labor broadcast schedules will help us to apply Lemma 5.3.2 to the construction of complete least labor root schedules for any ditree T. Lemma 5.3.3 Let L(T) be a least labor broadcast schedule for ditree T with (r, v, 1) E L(T). Then L(T - (v)) is a least labor broadcast schedule for T — (12). Proof: This follows in a straightforward manner from the definition of L(T - (1))) and the definition of least labor broadcast schedules. Cl 5.3.3 Computing Root Schedules and Labor Vectors We now describe. how we utilize Lemma 5.3.2 to compute the root schedule and the labor vector of a least labor broadcast schedule for any directed tree T with root node r. In all of the following algo- rithms we assume that r has p children (11,. . . ,dp and that we are given the root schedules Sr((d,-)) and labor vectors 95((d,)) of a least labor broadcast schedule S ((d,)) for all (1,. Furthermore, the children ofr are ordered such that 93((d1)) Z 2 175((d,)) 2 Z 5((dp)). First, we define the procedure FIRSTCALL which decides which node u should be informed by r in the first time step. The correctness of FIRSTCALL, assuming the procedure TIME returns 85 the value of |ND(T)|, follows immediately from Lemma 5.3.2. FIRSTCALL (T) 1. Call TIMECI') which returns |ND(T)|. 2- If|ND(T)| = |ND((d1))l. the" Set v = ”1(Sr((dl)) Else Set it = d1. 3. Return v and |ND(T)|. Next, we define the procedure ROOT SCI-[ED which repeatedly uses FIRSTCALL to cre- ate the root schedule Sr (T) and the labor vector I75 (T) of a least labor broadcast schedule S (T) for any directed tree T with root node 1'. The key to our ability to repeatedly apply FIRSTCALL stems from Lemma 5.3.3 which shows us how to quickly compute the root schedule and labor vector of a least labor broadcast schedule for (all) -— (v) in steps (5a) and (5b) of ROOTSCHED. 86 ROOT SCI-{ED (T) l. Initialize S,(T) = 0 and Visa) = () andt = 1. 2. Call FIRSTCALL“) which returns v and L. 3. Update 5.0“) and Vs (T) as follows (a) S,(T) = S..(T) U (r,v,t) (b) Append L to 175 (T) 4. If v = (11 then modify T as follows (a) Eliminate child <11 and its associated root schedule and labor vector. (b) For 2 S i g p, rename child d,- (and associated root schedule and labor vector) to be child d,--1 . (c) Decrement p by one. 
(d) If p > 0 increment t and Goto Step 2. Otherwise append 0 to 175 (T) and Exit. 5. Else modify T as follows (a) Replace S,((d1)) with S,((dt) - (22)). (1:) Replace feud,» with 175%) - (11)). (c) Rename children sothat 95((d1)) 2 2 95((d,)) 2 ~ -- 2 95((dp)). (d) Increment t and Goto Step 2. 5.3.4 Computing |ND(T)| We now describe procedure TIME and prove its correctness. TIMECF) l. Setn = |ND((dt))| 2. Call CHECK(n, T)) 3. If CHECK returns “yes" then Return n. Else Increment n by 1 and goto 2 87 CHECK(n, T) 1. If n = 0 and r has children, return “no" 2. If n 2 0 and r has no children, return “yes" 3. Else (a) If |OP'T-ND((d1))| > it. return “no” (b) If |OPT-ND((d1))| < n, return(CHECK(n - l, T — (d1)). (c) If |OP'I—ND((d1))| = 11. let u = v1(S,((d1))). Return(CHECK(n — 1, T — (v))). Lemma 5.3.4 For all ditrees T and all n _>_ O, CHECKln.T) returns “yes" if |ND(T)| S n, and CHECK(n,T) returns “no" if |ND(T)| > 11, Proof: We will prove this lemma by induction on the variable 11. Let D; = (all). The base case is when n = 0. Suppose |ND(T)| = 0. Clearly, this means there are no nodes to inform, i.e., the root of T has no children. In this case, step 2 of CHECK applies, and “yes" is returned. Suppose |ND(T)| > 0. This means that r has children left to inform. In this case, step 1 of CHECK applies, and “no” is returned. Thus, for all ditrees T, CHECK(O,T) retums “yes” if |ND(T)| s 0. and CHECK(O,T) returns “no" if lND(T)| > 0. Now assume that for all ditrees T and for n _<_ k, CHECK(n,T) returns “yes" if |ND(T)| S n, and CHECK(n,T) returns “no" if |ND(T)| > n. We now show that for all ditrees T, CHECKU: +1,T) returns “yes" if |ND(T)| g k + 1, and CHECKU: + 1,T) returns “no" if |ND(T)| > k + 1. Suppose |ND(T)| > k + 1. There are three possibilities to consider: (:1) Suppose |ND(D1)| > k + 1. Then step 3(a) applies, and “no" is returned. (b) Suppose |ND(D1)| < k + 1. Then Step 3(b) applies, and CHECK(k + 1,T) is ex- actly CHECKUc, T — D1). Since |ND(D1)| _<_ k and 1ND(T)| > k + 1, we conclude 88 |ND(T - (v))| _>_ k +1. By our induction hypothesis, CHECK(k, T — Dr). and consequently CHECK(k + 1, T), returns “no.” (c) Suppose |ND(DI)| = k + 1. Then step 3(c) applies, and CHECKUC + 1, T) is exactly CHECK(lc,T— (1))), where v E LLFC(DI). Since 12 6 LLFC(Dt) and |ND(Dl)| = k+1, we can conclude that ND((v))| 5 Is. Using the same argument from case (b), it follows that |ND(T — (v))| _>_ k + 1. By our induction hypothesis, CHECKUc, T — (1))) returns “no”, and consequently CHECK“: + 1,T), returns “no." Thus, we have shown that for all ditrees T, if IND(T)| > k + 1, CHECK“: + 1, T) returns “no.” Now suppose that |ND(T)| S k + 1. Clearly |ND(DI)| 5 k + 1, so we only need to consider cases (b) and (c) from above. (b) If |ND(D1)| < I: + 1, then CHECKUc + 1,T) is exactly CHECK(k, T - D1). From (C) Thus, Lemma 5.3.2, we know that there exists a v e DI such that v E LLFC(T). This, along with the fact that |ND(T)| S k + 1, implies |ND(T — (1r))| 5 1:. Since T - D) is a sub- tree of T — (12). it follows that |ND(T - D1)| 5 la. Thus, by our induction hypothesis, CHECK(k,T — Dt) returns “yes," and consequently CHECKUc + 1,T) also returns “yes." If |ND(Dl)| = k + 1, then CHECKUC + 1,T) is exactly CHECK(k, T — (12)) where v E LLFC(Dl). Because |ND(DI)I = k + 1 and 1r 6 LLFC(Dl) we know |ND(v)| S 1:. Furthermore, we know from Lemma 5.3.2 that v 6 LLFC (T). Using this along with the fact that |ND(T)| S k + 1, implies |ND(T — (11))! g k. 
Thus, by our induction hypothesis, we know CHECK(k, T - (11)), and consequently CHECKUc + 1,T), returns “yes.” we have shown that for all ditrees T, if |ND(T)| S k + 1, CHECKUc + 1, T) returns “yes.” By the principle of mathematical induction, we have shown that for all ditrees T and all n 2 0, CHECK(n, T) returns “yes” if IND(T)! g n and CHECK(n,T) returns “no” if |ND(T)| > n. C] 89 The remainder of algorithm ODA can be implemented in a straightforward manner. CREATE and SCHED are the procedures that implement the bottom up and top down phases of algorithm ODA, respectively. CREATE simply calls ROOT SCHED in a bottom up fashion after creating empty root vectors and labor vectors equal to (0) for all nodes with outdegree 0. SCI-{ED uses these least labor vectors and root schedules to create a complete minimum length node-disjoint broadcast schedule for the ditree T. Optimal Decomposing Algorithm ODA I. Call CREATE(T) to makeleast labor root schedules and least labor vectors for all (12) in T. 2. Call SCHED(T) using the entire output of CREATE(T) to create the broadcast schedule ODA(T). CREATE(T) 1. For all nodes 12 with outdegree 0, make S,((v)) = G and 175((v)) = 0. 2. Initialize Completed = {v I outdegree(v) = 0} 3. While Completed a6 V(T) do the following (a) For all vertices v in V(T) — Completed (i) If all the children of v are in the set Completed then call ROOTSCHEDU‘) (ii) Add v to the set Completed. SCHED (T) l. Initialize D(T) = 5,-(T) and Completed = {r}. 2. While V(T) - Completed 96 G (a) If all ancestors of v are in Completed (i) Call PROCESS(v, D(T)) (ii) Add 1) to the set Completed paocess (1r, D(T)) 1. time = 1', s.t. (w,1r,i) 6 D(T) 2. j = minj s.t. (w,v,-(D,.(v)),k) ¢ D(T) for all k, w. 3. Fort = 1' t0 lDr((v))| (a) D(T) = D(T) U (w,v¢(D,((v))),time + t —j + 1), where (w,v¢(D,((v))),t) E D,((v)). Lemma 5.3.5 Let T be a ditree rooted at r. CREATEYT) makes the root schedule and the labor vector of a least labor broadcast schedule for (v) for all 11 E V(T). Proof: This follows from Lemmas 5.3.2, 5.3.3 and 5.3.4. D This leads us to our main result. The optimal algorithm ODA uses the least labor root schedules and vectors for all (u) in T to create an optimal node-disjoint broadcast schedule for T. Theorem 5.3.1 Algorithm ODA creates a minimum length node-disjoint broadcast schedule 0DA(T) for any directed tree T. Proof: This follows from Lemma 5.3.5 and the correctness of the procedure SCHED(T). Cl 5.3.5 Time Complexity Table 5.1 gives the worst case time complexity for all the algorithms used by the Optimal Decom- posing Algorithm (ODA). We first calculate the time complexity of the CHECK algorithm. Let TCHECKUI) be the time complexity, where n is the number of nodes in ditree T. We have the following recurrence relation for the worst cast time complexity, ' 91 Table 5.1: Time Complexity for ODA Worst Case Complexities Algorithm Complexity CHECK 0(a) TIME 0(n) FIRSTCALL 0(n) ROOTSCHED 0(n2) CREATE on?) SCHED 0(52) PROCESS 0(n) ODA 0(n3) TCHECK(") = a1 + Tonecxln - 1) TCHECKU) = 02 Solving this recurrence gives us TCHECtrl") = 00")- Given the complexity of CHECK, it is easy to see that the TIME algorithm also has complexity 0(n). This implies that algorithm FIRSTCALL has time complexity 0(a) as well. Now we can calculate the complexity of ROOTSCHED. The bulk of the work is done in step 2, in which algorithm FIRSTCALL is called. Steps 4(d) and 5(d) precede step 2. It is clear that the maximum number of times that 4(d) and 5(d) can occur is n. 
Thus, ROOTSCHED has complexity 0(n2) . Given the complexity of ROOI‘SCHED, we are able to calculate the complexity of CREATE. CREATE can call ROOTSCHED at most 11 times, thus worst case complexity is 0(n3). 'Ihe complexity of PROCESS is 0(n) since j in step 2 can be no larger than 11. Because PROCESS is called at most 11 times by SCHED, the complexity of SCI-[ED is 0(n2). 'Ihus, the complexity of ODA is 0(n3 + 712) = 0(n3). 5.4 Summary We describe two polynomial time approximation algorithms, CA and SCA, for performing broad- cast in directed trees under the node-switching model of communication. The biggest advantage of the CA algorithm is its simplicity. Its best case time complexity is 0(n log n) (which it achieves for broadcast in hypercubes, meshes and tori), and its worst case complexity is 0(n2). While the SCA algorithm has slightly higher complexity (0(n2) in the best case and 0(n3) in the worst case), it also may produce much shorter calling schedules. 'Ihe broadcast schedules produced by SCA are guaranteed to be no more than twice the length of an optimal length broadcast schedule. Fi- nally, we describe a 0(n3) algorithm that always produces minimum length node-disjoint broadcast schedules for directed tree topologies. This can easily be transformed to an edge disjoint broadcast schedule with length at most twice that of an optimal edge disjoint schedule. Chapter 6 Shortest Path Oblivious Routing 6.1 The Generalized Path-Matching Problem In Chapter 3 we described matching problems that have been previously studied. Traditionally, a matching (or edge-matching) in a graph G = (V, E) is defined as a set of pairwise independent edges. Wu and Manber introduced a generalization of an edge-matching called a path-matching. In a path-matching, vertices of G are joined by simple paths rather than edges [52]. Specifically, a path-matching M in a graph G is a set of simple paths in which no two paths share the same end vertices. Of particular interest is the problem of finding a perfect path-matching in which paths in M are pairwise edge-disjoint (a DP-matching). In this Chapter, we introduce a generalized path-matching, which includes both edge-matchings and path-matchings. We Show how a generalized path-matching can be used to create an optimal . UBM algorithm for arbitrary topologies that employ a shortest path routing scheme. We will define a generalized path-matching for a symmetric digraph rather than an undirected graph. A directed graph G = (V, A) is symmetric if for every arc ij 6 A(G), we have are ji 6 11(0) as well. A generalized path-matching is defined as follows: 93 94 Definition 6.1.1 Given a symmetric digraph graph G = (V, A), a set P of directed paths in G, and a set X S; V(G) of vertices, we call Mp g P a generalized path-matching (GP-Matching) on (G, P, X) if the origin and terminus of each directed path p 6 Mp are distinct vertices from X, and no two dipaths have a common origin or terminus. Note that every undirected graph has a corresponding symmetric digraph representation ob- tained by replacing each edge ab with two directed arcs ab and ba. Thus, a GP-matching can be thought of as being defined for an undirected graph, except the paths used to match vertices are directed. Directed paths are used to distinguish an 1' j-path from a ji-path. That is, if the prescribed set P of paths contains a directed ij-path, it does not necessarily contain a directed ji-path. Thus, we will assume an ij-path refers to a directed path with origin 1' and terminus j. 
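As a concrete reading of Definition 6.1.1, the following sketch checks whether a set of directed paths (each a tuple of vertices) is a GP-matching on (G, P, X); the representation and names are ours, and membership of each path in G is taken to be implied by membership in the prescribed set P.

```python
def is_gp_matching(candidate, P, X):
    """Definition 6.1.1: every dipath comes from the prescribed set P, its origin and
    terminus are distinct vertices of X, and no two dipaths share an origin or share
    a terminus. Paths are tuples of vertices, e.g. (i, u, v, j) for an ij-path."""
    X, P = set(X), set(P)
    origins, termini = set(), set()
    for p in candidate:
        o, t = p[0], p[-1]
        if p not in P or o == t or o not in X or t not in X:
            return False
        if o in origins or t in termini:
            return False
        origins.add(o)
        termini.add(t)
    return True

def is_perfect_gp_matching(candidate, P, X):
    """A perfect GP-matching additionally matches every vertex of X."""
    endpoints = {p[0] for p in candidate} | {p[-1] for p in candidate}
    return is_gp_matching(candidate, P, X) and endpoints == set(X)
```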
In the generalized case, a perfect GP-matching is one that matches all vertices from X. Both the edge-matching problem and the path-matching problem are specific cases of the GP-matching problem. In the edge-matching problem, X = V(G) and P consists of all paths in G that are length one (i.e., P = A(G), the arcs in the symmetric digraph that corresponds to graph G). Thus, a GP-matching MA for (G, V, A) is exactly an edge-matching in G. Similarly, if P(G) denotes the set of all simple dipaths in G = (V, A), then the GP-matching M p(G) for (G, V, P(G)) is exactly a path-matching in G. In Section 3.5, we described two commonly studied variations of matching problems in graphs. The first variation is finding maximum (or perfect) matchings, and the second variation is find- ing minimum cost maximum matchings. We will focus on a slightly different variation; we wish to identify properties that guarantee a perfect GP-matching exists. We are particularly interested in per- fect GP—matchings in which the directed paths are pairwise edge-disjoint (perfect GDP-matchings). TWO directed paths p(z'o,$¢) and p(ymyt) are edge-disjoint if they do not have any of their un- derlying edges in common. Unlike DP-matchings, perfect GDP-matchings do not always exist. In Section 6.2, we give sufficient conditions that guarantee an input instance (G, P, X) has a perfect 9S GDP-matching. In particular, we show that if P contains a shortest path between each pair of ver- tices in X, then there exists at least one perfect GDP-matching for (G, P, X). In addition, the proof provides a method for finding a perfect GDP-matching. 'Ihe GDP-matching problem has a direct application in multicast communication for direct net- work systems. In a line-switching UBM schedule, several messages may be sent concurrently along edge-disjoint paths in the network. In Section 6.3, we show how GDP-matchings can be used to maximize parallel communication and therefore to allow an optimal line-switching multicast algo- rithm to be constructed for direct network systems. 6.2 Sufficient Conditions for Perfect GDP-matching Since every connected graph contains a perfect DP-matching, we know that when X = V(G) and P contains all the paths in G, that there is a perfect GDP-matching for (G, P, X). It is not true in general that a perfect GDP-matching exists. For example, consider the graph C3, pictured in Figure 6.1. Suppose X = {1,2, 3,8}. Let pm, j) be the longest ij-path, pc(i, j) be the “clockwise” 1 2 6 5 Figure 6.1: Cycle on 8 vertices. ij-path, and p3(i, j) be a shortest ij-path in 03. Furthermore, let P1 = {PL(i.J') Virj E X} P2 {p6(irj) Vi,j€X} P3 = {p5('rj) Vii]. E X} 96 Clearly no perfect GDP-matching exists for (03, P1 , X) because every pair of paths must share a common underlying edge. If we use the paths in P2, M p2 = {pc(l, 2), pc(3, 8)} is a perfect GDP- matching for (03, P2, X). If we use P3, the vertices matched in the previous example, using paths from P2, cannot be matched by disjoint paths from P3 (both p3(l, 2) and p3(3, 8) use the underlying edge between vertex 1 and vertex 2), but Mp3 = {ps(l, 8),ps(2, 3)} is a perfect GDP-matching for (Ca, P3,X). Because perfect GDP-matchings do not always exist, a natural question that arises is: “under what conditions is there a perfect GDP-matching for instance (G, P, X )7” In this section, we give sufficient conditions for the existence of a perfect GDP-matching. 
In particular, we show that if X i is any subset of vertices and P contains a shortest path for each pair of vertices in X, then there is l a perfect GDP-matching for (G, P, X). The following Lemma is the key for finding edge-disjoint pathsthatmatchpairsofverticesinX. Definition 6.2.1 Let |p(i, j)| denote the length of path p(i, 3'). Lemma 6.2.1 Let G = (V, A) be a symmetric digraph, X Q V(G) a subset of vertices, and P be a set of directed paths that contains a shortest path for all pairs of vertices in X. Further; let p(zc, 3;) denote a shortest xy-pathfi'om set P. If there are two paths p(zl , m) and p(zz, m) that contain a common underlying edge, then either |P($1.$2)| + MINA/all < |P($1.y1)|+lp($2ty2)| 0r |P($1.1I2)| + [p(x'hylll < |P($1sy1)| + |P(22.y2)|- Proof: Suppose that directed paths p(zl, 111) and p(zg, 312) are not edge-disjoint, and that they both contain the underlying edge ab. WLOG, assume that a occurs before b on p(xl,y1). Then there are two cases to consider, 1. 0 occurs before b on p(xg, yg) as pictured in Figure 6.2(a). 97 2. b occurs before a on p(zg, 312) as pictured in Flgure 6.2(b). 1' fix" ‘9 a__,”.." 3'1 x, a b "'H y, .” .I" ‘9 3’2 (a) Case 1 (b) Case 2 figure 6.2: Two conflicting paths. In the first case, [p(xt.yt)| = |p(xt.a)| + [p(a.b)| + lp(b.yt)l lP($2ty2)| = |P($2.a)| + Watbll + L901, 112” lp(zt.yt)| + |p(22.y2)| = lp(xt.a)l + lp(a.b)l + [p(b.yt)| +lp(z2.a)l + lp(a.b)l + may.” ' IP($1tl/1)l + |P($2sy2)| = (who): + |P($2sa)|) +(lP(bsy1)| + lp(b.y2)l) + 2 IV lp($lryl)l + lp(321y2)l [D(31t32)1+lplyltl/2)l + 2 lp(xt.yt)l + lp(32.y2)l > lp($t.z2)l + lp(yt.y2)| Similarly, in the second case, |P($1.y1)| + |P($2.y2)| > |P($1sy2)| + |P($2.y1)|- 98 Lemma 6.2.1 leads naturally to a method for finding a perfect GDP-matching for (G, P, X). Theorem 6.2.1 Let G = (V, A) be a symmetric digraph, and let X Q V(G) be a set of vertices from G. If P contains a shortest i j -path for all i, j e X , then there is a perfect GDP-matching for (G, P, X). ‘ Proof: We describe a polynomial-time algorithm for finding a perfect GDP-matching for (G, P, X) when P contains a shortest ij-path for all i,j E X. Let X = {2:1,. . . ,xn}, and let p(m,-,zj) be an arbitrary shortest ij-path from P. A perfect GDP-matching contains 1: = [12!] paths. We initialize a matching M (0) to be a set of paths between - It arbitrary pairs of vertices in X. WLOG, let M(0) = uf___,p,-, where p, = p(m,, ow). Initially, M (0) may not contain pairwise edge-disjoint paths. We describe a procedure that transforms M (0) into a perfect GDP-matching after several iterations. Let M (i) be the matching associated with iteration i in the procedure, and assume the path 12,- has source .9,- and terminus tj. We transform M (0) into a GDP-matching as follows: 1. Let iteration = 0 2. If there exist paths p,- = (3;, t,-) and p,- = (3,, tj) from M (iteration) that both contain the underlying edge ab, then increment iteration and use Lemma 6.2.1 to create a new matching M (iteration) as follows: IfarcabEpgandareabEpJ-Jhen M (iteration) = M (iteration — 1) — p,- — pj + p(sg, Sj) + p(ti, tj) ElseifarcabEpgandarebaEijhen M(iteration) = M(iteration - 1) — p.- — pj + p(s,, tj) + p(sj, ti) Else there is no edge contention. STOP. 3. Repeat step 2. 99 If the procedure terminates with iteration = i, M (i) is a perfect GDP-matching for (G, P, X) because all k paths in M (i) are pairwise edge disjoint. Thus, we only need to show that the proce- dure terminates. Define size[M(i)] = Z lpl. 
Because each path p in M (i) must contain at least one edge, 106M“) ‘v’i size[M (1)] Z k. (6.1) By Lemma 6.2.1, Vi size[M (i + 1)] < size[M (1)] (6.2) Combining the facts in Equations 6.1 and 6.2 guarantees that the procedure must terminate. D 6.3 Multicast Algorithm using GDP-matching 'Ihe GDP-matching problem has a direct application to multicast communication. We present an optimal multicast algorithm that is based on Theorem 6.2.1. In a UBM implementation, the lower bound on the number of steps required to complete a multicast request M = (s, D) is [lg(lDl + 1)]. 'Ihus, a multicast schedule with length [lg(IDl + 1)] is optimal. Ideally, we want to find an optimal UBM schedule 0 for implementing any multicast request M in any direct network topology represented by a graph G. Recall from Chapter 3 that optimal UBM algorithms have been developed for arbitrary network topologies that employ free routing, for mesh tapologies that use xy-routing, for hypercube topologies that use e-cube routing, and for tours topologies that use dimension ordered routing. Unfortunately, multicast cannot always be performed in [lg(lDl + 1)] 100 steps. A natural question to ask is, “Under what conditions can UBM be done in the optimal number of time steps, [lg(lDI + 1)] ?” In this section, we use Theorem 6.2.1 to show that any routing scheme that includes a shortest i j-path for all pairs (i, j) allows optimal line-switching UBM in any network topology that can be represented by a symmetric digraph (i. e., a direct network that provides symmetric communication). First, some definitions are in order. Definition 6.3.1 Let disa(i, j) be the length of a shortest i j -path in G. Definition 6.3.2 A muting scheme R is called a shortest path routing scheme if for every pair of nodes v; and 1);, R contains a routing path pp(i,j) oflength disa(i,j). Theorem 6.3.1 There is an optimal line-switching UBM implementation for any shortest path mut- ing scheme in any arbitrary symmetric graph topology. Proof: We give a constructive proof, i. e., we describe a method to create a multicast schedule of length [lg(lDl + 1)] for any multicast request M = (s, D) in any system represented by a symmetric digraph and that employs a shortest path routing scheme R. Let pp(s, d) be the routing pathusedbthosendamessage from nodestonode d. In order to achieve multicast in [lg(lDl + 1)) time steps, the number of informed nodes must double in each time step. Thus, in the last step, half of the destination nodes are already informed and must make calls to the other half of the destinations. This is the underlying principle of the technique that we describe. A multicast schedule is built from the bottom up. That is, the calls in time step [lg(lDI + 1)] are scheduled first, then the calls in step [lg(lDl + 1)] — l, and so on. We use Theorem 6.2.1 to create the optimal multicast schedule. The set X in Theorem 6.2.1 corresponds to the set of nodes involved in the multicast request (i.e., s U D). The set of paths P is exactly the routing paths included in R. Because R contains a shortest path for each possible source-destination pair, Theorem 6.2.1 guar- antees that a perfect GDP-matching can be found for (G, R, s U D). Let the perfect GDP-matching 101 Mp be {pp(sl,d1), . . . ,pR(3k,dk)}, where k = [1%]. The paths in Mp represent the calls that will be made in the last step of the multicast schedule. Let S = Uf=lsg. Since the nodes in S are sources of unicast calls in step [lg(lDI + 1)], these nodes must be informed in [lg(IDI + 1)] - 1 steps. 
Essentially, we have a new multicast request M’ = (s, D’), where D’ = S — {s}. We find a perfect GDP-matching for s U D’ as described above, and continue this process for a total of [lg(IDI + 1)] steps, when the single call in the first step of the multicast algorithm is scheduled. D Theorem 6.3.1 provides the theoretical basis for all the previous work on optimal multicast in direct network topologies. In [21, 46], optimal UBM algorithms are given for for any arbitrary network that uses free routing. In addition, optimal UBM algorithms for meshes that use try-routing, hypercubes that use e-cube routing, and tori that use dimension ordered torus routing have been developed [46, 47]. Each of these is a corollary to Theorem 6.3.1. Corollary 6.3.1 There is an optimal line-switching UBM implementation for all M = (s, D) in an arbitrary topology that employs free routing. Corollary 6.3.2 There is an optimal line-switching UBM implementation for all M = (s, D) in any mesh topology that employs reg-routing. Corollary 6.3.3 There is an optimal line-switching UBM implementation for all M = (s, D) in any hypercube topology that employs e-cube routing. Corollary 6.3.4 There is an optimal line-switching UBM implementation for all M = (s, D) in any torus topology that employs dimension-ordered torus routing. In addition, Theorem 6.3.1 gives guidelines for developing routing strategies. The theorem states that if shortest paths are used to route messages between all pairs of nodes, this is sufficient to guarantee an optimal multicast implementation. Current routing techniques use shortest paths. Us- ing shortest paths can reduce overall message transmission latency and can reduce network traffic. In addition, shortest paths require less space to encode in routing table. In essence, Theorem 6.3.1 102 says that a simple, natural routing scheme in an arbitrary topology allows optimal multicast perfor- mance. Finally, Theorem 6.3.1 has implications for updating routing schemes as new nodes are incre- mentally added on to the system. Ifnew nodes are added to a system, routing tables must be updated to include the routes for new nodes. As long as all routing paths are shortest paths in the network, multicast can be done in optimal time. First, we must verify if existing routing paths are still shortest paths after the additional nodes are added. Then, any shortest path can be used for routes containing new nodes and to replace original roofing paths that are no longer shortest paths. No particular structure or symmetry must be maintained by the routing paths, making it quite simple to update the ' routing scheme. 6.4 Summary This chapter describes a generalization of previously studied matching problems for undirected graphs. The Generalized Path-Matching (GP-matching) problem is defined, which includes both (the edge-matching . problem and the path-matching problem. In a GP-matching, a subset X of vertices is matched using paths from a specified set P. We are interested in finding a perfect GP- matching, or one that matches all the vertices of X. Additionally, we focus on the edge-disjoint GP-matching problem (or GDP-matching problem). Sufficient conditions for the existence of a perfect GDP-matching are given. It is shown that if shortest paths are used to match vertices, a perfect GDP-matching always exists. Sufficient conditions for optimal multicast communication in parallel systems based on direct networks are a direct consequence of the GDP-matching result. 
We showed that in a symmetric direct network system, a shortest path routing scheme allows optimal UBM communication in any arbitrary topology. Because meshes, hypercubes, and tori typically use shortest path routing schemes, this result shows that optimal UBM can be done in these direct network systems. Thus, this result provides a theoretical basis that unifies the optimal algorithms previously developed for these specific topologies [46, 47]. Since using shortest paths to route messages is sufficient to guarantee optimal time multicast, designing routing strategies for arbitrary topologies can be straightforward. Shortest paths should be used for routing, but routing paths do not need to be uniform or symmetric. Additionally, this result implies that adding on to a network incrementally should be relatively simple. Old routes may remain unchanged, and new routes only need to satisfy the shortest path property. Finally, the proof provides a technique for constructing an optimal length multicast schedule for any multicast request in an arbitrary topology that uses shortest path routing.

Chapter 7

Conclusions and Future Work

The goal of our research is to provide a general UBM approach for efficient multicast communication in direct networks. This dissertation describes significant progress toward that end. We present two general models for realistic direct network systems and provide multicast algorithms for arbitrary topologies. We summarize these contributions in Section 7.1. Ideally, we want to develop time and traffic efficient UBM algorithms for arbitrary restricted routing systems in general topologies. In Section 7.2, we outline several open problems that merit future investigation.

7.1 Conclusions

7.1.1 Source Limited Inclusive Routing

Source limited inclusive routing schemes represent a realistic class of routing schemes for several reasons. First, all routing schemes that are implemented on current machines are inclusive routing schemes. Second, storing only source limited routing information allows for the smallest possible storage requirements for routing information. Furthermore, because all routing paths contain additional routing information within them, much of the system-wide routing information is available even though nodes store only source limited information. Finally, source limited inclusive routing schemes have a natural graph theoretic model, namely directed trees, that makes them easy to work with.

Although the complexity of determining optimal line-switching UBM schedules for directed trees has not been determined, we show that node-switching UBM schedules provide a very good approximation for line-switching UBM schedules in ditrees. In addition, decomposing node-switching algorithms in a ditree have several advantages. First, multicast problems in ditrees under the node-switching assumption can be transformed to equivalent broadcast problems. Consequently, we can focus on broadcast in directed tree topologies. Additionally, decomposing algorithms allow a ditree to be broken into two disjoint ditrees after a unicast call has been made. Because the resulting problems are disjoint, we know that the unicast paths used to deliver messages in different steps of the UBM schedule will not interfere with each other. Thus, the UBM algorithm does not need to be synchronous (i.e., it does not need a barrier at the end of each time step).
Finally, the decomposing nature of node-switching algorithms makes analysis of these algorithms relatively simple.

7.1.2 Shortest Path Oblivious Routing

Although storing source limited information has very small storage requirements, helpful routing information may not be available, and therefore optimal algorithms cannot be determined. In the ideal case, all routing information is available to each node of the network. If complete information is stored, shortest path oblivious routing schemes are the most realistic schemes to use. Using shortest paths to deliver messages can reduce overall network traffic, and it also minimizes the storage for complete routing information.

We show that GDP-matchings can be used to create an optimal UBM algorithm for arbitrary direct network systems that employ a shortest path routing scheme. This result provides a theoretical basis that unifies the previous results on optimal multicast algorithms in specific direct network topologies [21, 46, 47]. In addition, this result provides system designers with some simple, intuitive rules for creating routing schemes that guarantee multicast can be performed in minimum time.

7.2 Future work

7.2.1 Restricted Routing Model

In Chapter 4, we described the difficulties associated with finding a graph model for arbitrary restricted routing schemes. In addition, we present a ditree model for source limited inclusive routing schemes. The clear disadvantage of this model is that it does not use complete routing information to create multicast schedules. If complete routing information is available, we may be able to use this information to reduce the number of time steps needed to satisfy a multicast request. Other realistic assumptions or different modeling techniques may lead to a model that can represent arbitrary restricted routing information completely, accurately, and unambiguously.

7.2.2 Time Efficient Line-Switching UBM Algorithms

Our initial results have concentrated on time efficient multicast algorithms for source limited inclusive routing schemes and for shortest path oblivious routing schemes. This work stemmed from related work on time efficient multicast in systems that allow free routing. We present an optimal time line-switching UBM algorithm for shortest path schemes, and we show that we can approximate time efficient multicast for source limited inclusive routing schemes under the line-switching model; however, the existence of an optimal time line-switching algorithm for ditrees remains an open problem and deserves further investigation. In addition, we need to address the larger problem of optimal time line-switching UBM algorithms in other restricted routing schemes. Thus, we want to develop an optimal polynomial time UBM algorithm for arbitrary restricted routing schemes, or
As discussed in Chapter 3, systems that assume free routing have lower bound implementations for any multicast request. We will first consider the existing UBM algorithm proposed by McKinley et al. [46]. This algorithm is optimally time efficient; however, it does not consider traffic efficiency. Incorporating the notion of traffic efficiency into the existing algorithm can be explored. The algorithm relies on using an Euler trail to determine the unicast calls that are made. The traffic efficiency is greatly influenced by the Euler trail that is chosen. Thus, it is important to characterize the Euler trail that produces the most traffic efficient multicast schedule. Preliminary results not presented in this dissertation show that using the Euler trail method described by McKinley et al. is not optimally traffic efficient. In some networks, an optimal traffic efficient multicast schedule cannot be generated by any of the possible Euler trails. In order to produce optimal traffic efficient multicast schedules, some other method must be used. Such methods merit further investigation.

7.2.4 Partially Restricted Routing

In addition to multicast in oblivious routing schemes, the issues of multicast in partially restricted routing schemes can be explored. We described one type of partially restricted routing in which part of the path is restricted and part of the path is free. This was inspired by turnaround routing in multistage cube networks. Other types of partial restriction may also be used. For example, we can restrict several of the nodes a path must travel through, but not the path itself. Very little is known about multicast performance for partially restricted routing schemes.

Bibliography

[1] Iris Gaber and Yishay Mansour. Broadcast in radio networks. In Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 577-585, 1995.
[2] Barbara D. Birchler, Abdol-Hossein Esfahanian, and Eric Torng. Toward a general theory of unicast-based multicast communication. In Manfred Nagl, editor, Lecture Notes in Computer Science, volume 1017, pages 237-251. Springer-Verlag, 1995.
[3] Barbara D. Birchler, Abdol-Hossein Esfahanian, and Eric Torng. Toward a general theory of unicast-based multicast communication. Technical Report CPS-95-3, Michigan State University, March 1995.
[4] Barbara D. Birchler, Abdol-Hossein Esfahanian, and Eric Torng. Information dissemination in restricted routing networks. In The International Symposium on Combinatorics and Applications, pages 33-43, June 1996.
[5] Barbara D. Birchler, Abdol-Hossein Esfahanian, and Eric Torng. Sufficient conditions for optimal multicast communication. In Proceedings of the 1997 International Conference on Parallel Processing, pages 390-393, August 1997.
[6] Jean-Claude Bermond, Pavol Hell, Arthur Liestman, and Joseph Peters. Broadcasting in bounded degree graphs. SIAM J. Disc. Math., 5(1):10-24, February 1992.
[7] Jean-Claude Bermond, Pavol Hell, Arthur Liestman, and Joseph Peters. Sparse broadcast graphs. Discrete Applied Mathematics, 36:97-130, 1992.
[8] Kadaba Bharath-Kumar and Jeffrey M. Jaffe. Routing to multiple destinations in computer networks. IEEE Transactions on Communications, 31(3):343-351, March 1983.
[9] Rajendra V. Boppana, Suresh Chalasani, and C. S. Raghavendra.
On multicast wormhole routing in multicomputer networks. In Proceedings of the 6th IEEE Symposium on Parallel and Distributed Processing, pages 722-729, 1994.
[10] Jehoshua Bruck, Luc De Coster, Natalie Dewulf, Ching-Tien Ho, and Rudy Lauwereins. On the design and implementation of broadcast and global combine operations using the postal model. IEEE Transactions on Parallel and Distributed Systems, 7(3):256-265, March 1996.
[11] Gregory T. Byrd, Nakul P. Saraiya, and Bruce A. Delagi. Multicast communication in multiprocessor systems. In Proceedings of the 1989 International Conference on Parallel Processing, pages I-196-I-200, 1989.
[12] Gary Chartrand and Ortrud R. Oellermann. Applied and Algorithmic Graph Theory. International Series in Pure and Applied Mathematics. McGraw-Hill, Inc., New York, 1993.
[13] S. C. Chan and A. L. Liestman. Constructing minimal broadcast networks. J. Combin. Inform. System Sci., 10:110-122, 1985.
[14] Hyeong-Ah Choi and Abdol-Hossein Esfahanian. A message-routing strategy for multicomputer systems. Networks, 22:627-646, 1992.
[15] Alak K. Datta and Ranjan K. Sen. 1-approximation algorithm for bottleneck disjoint path matching. Information Processing Letters, 55:41-44, 1995.
[16] R. F. DeMara and D. I. Moldovan. Performance indices for parallel marker-propagation. In Proceedings of the 1991 International Conference on Parallel Processing, pages 658-659, St. Charles, Illinois, August 1991.
[17] Narsingh Deo. Graph Theory with Applications to Engineering and Computer Science. Series in Automatic Computation. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1974.
[18] Shimon Even. Graph Algorithms. Computer Science Press, Inc., Rockville, MD, 1979.
[19] A. Farley, S. Hedetniemi, S. Mitchell, and A. Proskurowski. Minimum broadcast graphs. Discrete Mathematics, 25:189-193, 1979.
[20] A. M. Farley. Minimal broadcast networks. Networks, 9:313-332, 1979.
[21] Arthur M. Farley. Minimum-time line broadcast networks. Networks, 10:59-70, 1980.
[22] Message Passing Interface Forum. Message passing interface forum. Technical Report CS-93-214, University of Tennessee, November 1993.
[23] Pierre Fraigniaud and Joseph Peters. Structured communication in torus networks. In Hawaii Int. Conf. on Systems Science, pages 584-593. IEEE, January 1995.
[24] H. Franke. MPI-F: An MPI implementation for IBM SP-1, February 1994. Available by anonymous ftp from info.mcs.anl.gov.
[25] Luisa Gargano, Arthur Liestman, Joseph Peters, and Dana Richards. Reliable broadcasting. Discrete Applied Mathematics, 53:135-148, 1994.
[26] M. Grigni and D. Peleg. Tight bounds on minimum broadcast networks. SIAM J. Discrete Math., 4:207-222, 1991.
[27] W. Gropp and B. Smith. Users manual for the chameleon parallel programming tools. Technical Report ANL-93/23, Argonne National Laboratory, June 1993.
[28] Frank Harary. Graph Theory. Addison-Wesley Series in Mathematics. Addison-Wesley Publishing Company, Reading, Massachusetts, 1969.
[29] Chengchang Huang and Philip K. McKinley. Communication issues in parallel computing across ATM networks. IEEE Parallel and Distributed Technology, pages 73-86, Winter 1994.
[30] Kai Hwang. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill Series in Computer Science. McGraw-Hill, Inc., New York, 1993.
[31] Intel Supercomputer Systems Division, Intel Corporation, Beaverton, OR 97006. Paragon XP/S Product Overview, 1991.
[32] Jave O. Kane and Joseph G. Peters. Line broadcasting in cycles.
Technical Report CMPT TR 94-11, School of Computing Sciences, Simon Fraser University, Burnaby, B.C., Canada, December 1994.
[33] R. Karp, A. Sahay, E. Santos, and K. Schauser. Optimal broadcast and summation in the LogP model. In Fifth Symposium on Parallel Algorithms and Architectures, June 1993.
[34] V. Kumar and V. Singh. Scalability of parallel algorithms for the all-pairs shortest path problem. Technical Report ACT-OODS-058-90, Rev. 1, MCC, January 1991.
[35] F. T. Leighton. Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann Publishers, San Mateo, CA, 1992.
[36] Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 148-159, May 1990.
[37] Arthur L. Liestman and Joseph G. Peters. Broadcast networks of bounded degree. SIAM J. Disc. Math., 1(4):531-540, November 1988.
[38] Arthur L. Liestman and Joseph G. Peters. Minimum broadcast digraphs. Discrete Applied Mathematics, 37/38:401-419, 1992.
[39] Xiaola Lin, Philip K. McKinley, and Lionel M. Ni. Deadlock-free multicast wormhole routing in 2-D mesh multicomputers. IEEE Transactions on Parallel and Distributed Systems, 5(8):793-804, August 1994.
[40] Xiaola Lin and Lionel M. Ni. Deadlock-free multicast wormhole routing in multicomputer networks. In The 18th Annual International Symposium on Computer Architecture, pages 116-125, May 1991.
[41] Philip K. McKinley and Christian Trefftz. Efficient broadcast in all-port wormhole-routed hypercubes. In Proceedings of the 1993 International Conference on Parallel Processing, volume 2, pages 288-291, St. Charles, Illinois, August 1993.
[42] S. Mitchell and S. Hedetniemi. A census of minimum broadcast graphs. J. Combin. Inform. System Sci., 5:141-151, 1980.
[43] Jae-moon Koh and Dong-wan Tcha. Local broadcasting in a tree under minisum criterion. Networks, 23:71-79, 1993.
[44] NCUBE Company. NCUBE 6400 Processor Manual, 1990.
[45] L. M. Ni and P. K. McKinley. A survey of wormhole routing techniques in direct networks. IEEE Computer, 26:62-76, February 1993.
[46] P. K. McKinley, H. Xu, A.-H. Esfahanian, and L. M. Ni. Unicast-based multicast communication in wormhole-routed networks. IEEE Transactions on Parallel and Distributed Systems, 5(12):1252-1265, December 1994.
[47] David F. Robinson, Philip K. McKinley, and Betty H. C. Cheng. Optimal multicast communication in wormhole-routed torus networks. In Proceedings of the 1994 International Conference on Parallel Processing, pages I-134-I-141, 1994.
[48] S. A. Felperin et al. Routing techniques for massively parallel communications. Proceedings of the IEEE, 79(4):488-503, April 1991.
[49] P. J. Slater, E. J. Cockayne, and S. T. Hedetniemi. Information dissemination in trees. SIAM Journal on Computing, 10(4):692-701, November 1981.
[50] Craig B. Stunkel, Dennis G. Shea, Bulent Abali, Mark Atkins, Carl A. Bender, Don G. Grice, Peter H. Hochschild, Douglas J. Joseph, Ben J. Nathanson, Richard A. Swetz, Robert F. Stucke, Michael Tsao, and Philip R. Varker. The SP2 high-performance switch. IBM Systems Journal, 34(5):185-204, 1995.
[51] Yu-Chee Tseng, Dhabaleswar K. Panda, and Ten-Hwang Lai. A trip-based multicasting model in wormhole-routed networks with virtual channels. IEEE Transactions on Parallel and Distributed Systems, 7(2):138-150, February 1996.
[52] Sun Wu and Udi Manber. Path-matching problems.
Algorithmica, 8(2):89-101, 1992.
[53] Hong Xu, Ya-Dong Gui, and Lionel M. Ni. Optimal software multicast in wormhole-routed multistage networks. In Proceedings of the 1994 Supercomputing Conference, pages 703-712, November 1994.