I" . . - .
1'. 1 ‘\" o I. '1
{1’ .1

‘ ... ‘." - o 3

w‘ —4

 

A-
.4...

#1.

--
f
I
\f’
0" r.

a
-

:‘W‘f‘

3".“ p
O

.L_

mass
3

2‘

:-
131' I I, .
Wt" .,
{ "I, a '
K' _’;. - .- .
1“} ' .
’1' (“ 1 .
. .' . . '. . " 1“ -
" $5.923?“ ”1:2". "I 11
'1 33";‘2‘63' 3' ' , .. - -2
.1" - .
3 a;ﬁtf"?‘~.htu.:-J~,A 1,; . .' ' I
‘1 "‘71131*-/‘.’>; 7. ' 1 _ .- - .. 5
H ‘ v-V- “ﬁ’ﬁfng- “1'7.” . ‘ . I ‘ .
To 'ﬁ II . ' v.14x11 .uhl l1 I .‘u‘u, 9 .q, o ‘. ‘ ..,
' 2' 1“,,” ' .r‘ ”1",.1" J 11' '.1 ' '1'." n l u. q.
(m. 1‘ '_ . . .1
l ' k,’ ‘ -.r.v,x 7131*“)5'. :23! ‘7‘: I“, 21.0.. n 3"

‘.l"

1.1”. ;: 91.11111; r ,1” I ;: .
‘ '. .1
ﬁﬁﬂ'111'0.1~1.-.:l._

 
 
 
 
    
      
   

_ , {x
' ’4"{5:ﬁ'{g' '
1?} .n‘: .
g . _.E
A
13.21”:

.r‘i1b'h' 111?";1' If”;""- . ." '11,, "E1“.
' ' ‘ (gig)? «yummy.

11"!“ h .1

Elf .'P:Ed‘®]vbm_
"Huff; “$311515

 

1 . ‘ ' ‘ - '
.0. If
. 11o . ' .0. ~00 k~v~1.v\ ‘5’.
. I ..’ I "\’ O O‘..-
. , v 0 O. “I" I 'Lw. ‘ ‘ O
"C |- ’..L :'..‘ '3 1" ,' 4 . .
on - . 1.}:1‘L4‘J .‘5’ ‘ .
. -' I '3 a .0
. n 11"“... ’vun‘, . 1V,
‘ ' . .‘9 d o I
. . , ' )5... '.v;~b’.’ . I'
‘ . ' I 0‘ . :T . V." .rv.’ .
a
. c 1"- (WEI-{‘2’ ,' '. I
U} '31.!“ 3 ,
I . ... r 1. ; ... ‘
ml .'-- , a
- .- 11'" . 1““. "1.3”“. . -.
0,1 . 1' _ . , , ‘ .
IO": ""1 ' ,lnY '1’... ‘r ‘ ‘v .
.- ’ I ' "o . .-
l . i ‘ '7 W . 1 1 1 . g
a . . ‘3'- .. 0 -- HI 'I‘ z
., t ' o
.. - .r M.
O I 1' ' ' '
u o» . .
‘ I 0 .U " : ‘ d
. . 0 In]. . ' .
1 I‘ o . -I- , 3
x, ' ‘ . '
.. 4 i. I
- O ..l, . .'. f
O h . I}
k u 3
t . ‘
I ‘ ; , '
It. ' . I - -
- - O
u ‘ I‘ . _ .
I ' .. . 1 _ 1‘
V -- l.; - - .‘1
- ‘ '.. 5-. . ‘.§
03‘ . ._ r‘: I!
1 IL - I I I'
" - ..-’u‘<’.ﬂ’.'. -‘ '
1‘1”" HI-
| "4;- "I D
1 "1‘. T...
.--v—.:

        
 

1‘ .
.ﬂHI'H 'Irﬂ“‘. Il‘l ’ ..

 

. . 421'? “téﬁﬁ
.‘e’uo' "3%,: 4' :Q‘L‘GIS; 1"." ru‘fcrltcﬂ'

5&3“:

1.5414 619:?” #1..

 

_'1'..I

”‘5
1K -
$5
'0.

‘1? E“
L?
11,.

{3*

RR.
‘44.

in la-
‘33:"

“9‘8, ‘ rt :1
5" L -“ “‘1'? at” I
J. J .3 I v; .‘
“12:”; L5?!) {13"} N I ‘3 ' .ﬂ'r‘ fa
gal-F1 ' .114 .5; : ‘ ‘ >‘.— ,
’7: ‘ If. F {‘41 ‘
“and“ :’:;,,l- '»’:‘., (F 1-51"

Ilill'lllllllllllillﬁHills

3 1293 00877 3537

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

This is to certify that the

dissertation entitled

GRAPH WEDDING AND DATA COMMUNICATION IN A
LARGE FAMILY OF NETWORK TOPOLOGIES

presented by

Jenshiuh Liu

has been accepted towards fulﬁllment
of the requirements for

DOCTOR—— degree in W

A4,]

Major professor

Date 7% Zﬁg /fja

MSU is an Afﬁrmative Action/Equal Opportunity Institution 0-12771

 

 

LlBRARY ‘
| Michigan State
1 Unlverslty

 

 

 

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or betore one due.

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

MSU to An Afﬁrmative Action/Equal Opportunity Institution
emunS-m

GRAPH EMBEDDING AND DATA
COMMUNICATION IN A LARGE FAMILY OF
NETWORK TOPOLOGIES

By

J enshiuh Liu

A DISSERTATION
Submitted to
Michigan State University

in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science

August, 1992

.4 W- {5‘53

ABSTRACT

GRAPH EMBEDDING AND DATA
COMMUNICATION IN A LARGE FAMILY OF
NETWORK TOPOLOGIES

By

J enshiuh Li'u.

In a parallel computer or a distributed system the interconnection pattern (net-
work topology, interconnection topology) of the communicating modules is one of the
key factors that determine the efﬁciency, reliability, and applicability of the system.
MOst previous results in the network design area are about a single kind of inter-
connection topology; relatively few have considered fault-tolerance. Here we present
new results about graph embedding and data communication for a large family of
network topologies, which consists of the generalized Fibonacci cube [63, 80] and the
Incomplete Hypercube [68]. This family includes the hypercube as a special case.
However, unlike the hypercube, both the generalized Fibonacci cube and the Incom-
plete Hypercube are irregular graphs in general. The Incomplete Hypercube offers
unrestrictive network size; however, it suffers from a low degree of fault-tolerance
under certain situations. On the other hand, the generalized Fibonacci cube offers

structural recurrences and also provides a guaranteed degree of fault tolerance. This

family of network topologies has two major areas of application: (1) Fault Tolerance —
each member network serves as an alternative structure for reconﬁguring a hypercube
in the presence of faults, and (2) Incremental Expansion - allowing a system to add
nodes in small increments, which is an important advantage for systems that evolve
with time.

To investigate their applications in parallel or distributed systems, we present a
collection of provably efﬁcient graph embedding and data communication algorithms
for this large family of network topologies. The study of graph embedding is to de-
termine whether these networks can ﬂexibly simulate other common networks. The
results can be applied to the emulation of a given network on a family of networks;
analysis of these results offers insights about the structural relationships between dif-
ferent networks. On the other hand, the study of common communication primitives
provides efﬁcient tools for developing application algorithms for this family of net-
works; it also offers insights about the relations between forms (structural properties)
and functions (algorithms). Specifically, we demonstrate how to embed linear array,
ring, mesh, and hypercube on the generalized Fibonacci cube; We also show how
to embed linear array and ring on the Incomplete Hypercube. In addition, efﬁcient
algorithms for single node broadcasting/accumulation, single node scatter/gather,
and multinode broadcasting/ accumulation are presented for both the generalized Fi—
bonacci cube and the Incomplete Hypercube. All of these algorithms are carefully

analyzed under both the all- and one-port models.

Copyright © by
J enshiuh Liu
August, 1992

To my parents and my wife

ACKNOWLEDGEMENTS

I would like to thank Professor Herman D. Hughes, my thesis co-advisor, for his
guidance, intellectual advice and support. Many thanks to Dr. Moon Jung Chung for
his suggestions and careful reading the manuscript. I am grateful to my other guidance
committee members, Professor Carl V. Page, Professor T. Y. Li and Professor Richard
0. Hill, for their help, encouragement and guidance.

I am indebted to my wife Bi-Yun for her support, understanding and encourage-
ment during my stay at MSU. Without her love and support, this thesis would not
have been possible. I am grateful to my parents for their support, to my sons Sean
and Andy for giving me a lot of joy and inspiration during the writing of this thesis.

vi

TABLE OF CONTENTS

LIST OF TABLES ix
LIST OF FIGURES x
1 INTRODUCTION 1
1.1 Network Topology ‘ ............................ 2
1.2 Motivation ................................. 4
1.2.1 Hypercube ............................. 5

1.2.2 Incomplete Hypercube and Generalized Fibonacci Cube . . . . 6

1.2.3 Graph Embedding and Data Communication .......... 7

1.3 Overview .................................. 8

2 PRELIMINARIES 11
2.1 Incomplete Hypercube .......................... 11
2.2 Generalized Fibonacci Cube ....................... 12
2.3 A Characterization of Generalized Fibonacci Cube ........... 15
2.4 Basic Properties of Generalized Fibonacci Cube ............ 19
2.4.1 Number of Edges ......................... 20

2.4.2 Node Degrees ........................... 23

2.5 Summary of Results ........................... 25

GRAPH EMBEDDING IN GENERALIZED FIBONACCI CUBE 27

3.1
3.2
3.3

3.4
3.5

Hamiltonian Path ............................. 29
Cycles ................................... 31
Mesh .................................... 37
3.3.1 2D Mesh .............................. 37
3.3.2 3D Mesh .............................. 44
Hypercube ................................. 45
Summary of Results ........................... 46

vii

4 DATA COMMUNICATION IN GENERALIZED FIBONACCI

CUBE 49
4.1 Routing .................................. 51
4.1.1 Shortest path ........................... 52
4.1.2 Deadlock .............................. 56
4.2 Single Node Broadcasting ............ i ............ 57
4.3 Analysis of Single Node Broadcasting .................. 64
4.3.1 All-port Model .......................... 64
4.3.2 One-port Model .......................... 65
4.3.3 Optimality of One-port Algorithm ................ 73
4.4 Single Node Accumulation ........................ 75
4.5 Single Node Gather and Scatter ..................... 76
4.5.1 One-port Model .......................... 80
4.5.2 All-port Model .......................... 81
4.6 Multinode Broadcasting and Accumulation ............... 83
4.7 Summary and Remarks .......................... 85

5 GRAPH EMBEDDING AND DATA COMMUNICATION IN IN-

COMPLETE HYPERCUBE 88
5.1 Hamiltonian Problem ........................... 90
5.1.1 Cycles ............................... 91

5.1.2 Hamiltonian Path .................. . ....... 93

5.2 I Routing and Single Node Broadcasting ................. 94
5.3 Analysis of Single Node Broadcasting .................. 97
5.3.1 All-port Model .......................... 97

5.3.2 One-port Model .......................... 97

5.4 Single Node Accumulation ........................ 104
5.5 Single Node Gather and Scatter ..................... 104
5.6 Multinode Broadcasting and Accumulation ............... 106
5.7 Summary of Results ........................... 107

6 CONCLUSION AND FUTURE WORK 109
6.1 Main Results and Contributions ..................... 109
6.2 Future Work ................................ 111
BIBLIOGRAPHY 114

viii

2.1
2.2
2.3

3.1

4.1

5.1
5.2

LIST OF TABLES

Example of generalized Fibonacci numbers ................
Examples of 2—FCs and 3-FCs .......................

A comparison of the hypercube and generalized Fibonacci cube. . . .
Embedding of Hypercubes .........................

Time complexities for communication primitives in I‘fr ........

5—bit Binary Reﬂected Gray Codes. ...................

Time complexities for communication primitives in I N. ........

ix

13
14
26

46
87

91
108

LIST OF FIGURES

1.1 The composition of the family of network topologies. .........
2.1 Structure of Is. ..............................
2.2 Second-order Fibonacci cube of dimensions 2 to 6 ............
2.3 Third-order Fibonacci cube of dimensions 3 to 7. ...........
3.1 F: contains a cycle of length 20 ......................
3.2 Embedding of cycles in F3; : (a) a decomposition of F2, (b) Case 1, and

(c) Case 2 ..................................
3.3 Embedding of an F {E21 x F9] and an FF] x FF] meshes in F3. .....
3.4 Dividing an (m + n)-bit 2-FC into an m- and n-bit 2-FCs. ......
3.5 Dividing an (m + n)-bit 3-FC into an m- and n-bit 3-F Cs. ......
3.6 Dividing an (m + n)-bit 4-FC into an m— and n-bit 4-FCs. ......
3.7 Embedding of meshes in Pic ........................
3.8 Embedding of meshes in F3. . . ' ........... I ..........
3.9 Two different labelings of F3. ......................
3.10 Embedding of meshes in Pic ........................
3.11 Dividing an (1+ m + n)-bit 2-FC into an I-, m- and n-bit 2-FCs. . . .
3.12 Dividing an (I + m + n)-bit 3-FC into an l-, m- and n-bit 3-FCs. . . .
4.1 Routing path from 01010 to 10101 in [12,. ................
4.2 An SBT rooted at node 0000. ......................
4.3 Broadcast from node 01010 in I‘?, .....................
4.4 Construction of (b) STF:(01100) from (a) SBT5(01100). .......
4.5 Type-a node and its type-ﬂ child .....................
4.6 A portion of ST F:(011011). .......................
4.7 One-port broadcasting in F3, from node 00000. .............
4.8 An optimal one-port broadcasting in I? from node 00000 ........
4.9 Single node gathering in B4. .......................
4.10 (a) STF:(01100) and (b) a subgraph of SBT5(01100) ..........
4.11 One-port multinode broadcasting through the longest cycle. .....

12
16
16

32

35
37
38
40
41
42
42
43
43
44
48

55
58
61
69
71
73
73
74
77
79
86

5.1 Embedding of longest cycles in (a) I22, and in (b) 130 .......... 92

5.2 Embedding of a cycle with length 22 in 123 ................ 93
5.3 Embedding of a Hamiltonian path in I23 ................. 94
5.4 Routing path from node 011 to 100 in Is ................. 95
5.5 A broadcasting from node 0101 in In. ................. 96
5.6 One-port broadcasting from node 0101 in In. ‘ ............. 98
5.7 Construction of STIao(01110) from SBT5(01110). ........... 98
5.8 Construction of STIgg(01110) from SBT5(01110). ........... 99
5.9 Type-a node and its type-ﬂ child ..................... 102
5.10 STIu(1010). ............................... 103
5.11 (a) 5T122(01110) and (b) a subgraph of SBT5(01110) .......... 105

xi

CHAPTER 1

INTRODUCTION

High performance computers have become indispensable for many application areas
such as engineering design, system modeling and simulation, medical and basic science
research [4] [13] [27] [30] [35] [49] [65] [74] [104] [114] . Two fundamental approaches
have been employed to improve the speed of computers. The ﬁrst approach is to
increase the speed of the circuitry used in building the computers. The second ap-
proach, which is commonly called parallel processing, is to use multiple processors in
executing multiple operations concurrently. The speed of light imposes a fundamental
limit on the ﬁrst approach, whereas the number of processors is virtually unlimited.
Hence parallel processing appears to be the more promising approach.

There are two basic models for studying the parallel computers. In the Parallel
Random Access Machine (PRAM) model [45] (also referred to as the Shared-Memory
model), multiple processors can randomly read or write any locations of a shared
memory, and processors communicate by writing/ reading through the shared mem-
ory. On the other hand, in the Distributed-Memory (DM) model, a memory module
is associated with each processor, and two processors communicate by explicitly send-
ing/ receiving messages. With the current technology, it is sometimes criticized that
the PRAM model is unrealistic because for a large system the cost of the intercon-

nection network will become prohibitively high [2]. Therefore, the DM model is much

more suitable for a large-scale parallel system. Since a DM model based on the com-
plete graph may not be cost-effective, networks with relatively sparse interconnections
are often used for the DM systems. There are many types of networks, and each one
offers different performance. Therefore, it is important, to investigate the relations
between the structural characteristics and the performance under common functional

criteria.

1.1 Network Topology

Graphs are often used to model the network topology (i.e., the pattern of the intercon-
nections) of the DM parallel / distributed systems, where a node represents a processor
and an edge (link) between two nodes denotes the communication link between the
corresponding processors. The following criteria are commonly applied to evaluate

network topologies for the DM multiprocessor systems.

1. Degree. The degree of a graph is the maximum number of edges incident on
a node. A network of a large degree usually implies more alternative paths
between a pair of nodes and hence a larger bandwidth. However, a large degree

may also result in a higher hardware cost or difficulty in implementation.

2. Diameter. The distance between a pair of nodes is the smallest number of
edges that have to be traversed in order to reach one node from the other. The
diameter of a graph G is the maximum distance between any pair of nodes
in G. The diameter is often a lower bound on the time required to perform
certain calculations on a network. This is because one processor i may need the
information residing on another processor j, which has to be transferred from
j to i. Therefore, the distance between two nodes determines a lower bound of

the communication time. In general, a small diameter is desirable.

3. Bisection width. The bisection width of a graph G with N nodes is the
minimum number of edges that have to be removed in order to dissect G into
two subgraphs with [IV/2] and [N / 2] nodes respectively. The bisection width
is often a critical factor in determining the speed with which a network can
perform computations that involve a major exchange of information between
two parts of the network. For example, to sort a list of data in a network into
certain order, the data items residing in one part of the network may need to
be swapped with those residing in the other part. Clearly, under this situation,
this operation has to be achieved through the set of links connecting the two
halves. Therefore, the bisection width of the network determines a lower bound

on the time required for sorting on this network.

4. Expansibility. The expansibility of a network measures the ease to add or
delete nodes to/from the network. This consideration is important especially
when the network is still evolving or when it is necessary to construct the system

incrementally.

5. Fault-tolerance. With a large number of processors/ links in systems, the
probability of having a processor or link failure increases significantly. It is
desirable to have a network that provides independent paths between all pairs

of nodes and hence a higher degree of fault-tolerance.

Notice that some of the criteria mentioned above are actually related or even
in conﬂict, hence they cannot be improved without compromising the others. For
example, a low degree usually translates into a large diameter, low bisection width,
fewer alternative paths and hence low fault-tolerance. The designer, therefore, will
have to balance between the cost constraints and the performance requirements of

the network.

1.2 Motivation

A graph is regular if each node has the same degree [108]. Most network topologies for
the DM model are based on regular graphs [2] [46] [75] [97]. Very limited knowledge
has been accumulated for irregular network topologies fOr lack of appropriates tools.
The driving force behind this work is to further our knowledge on network topologies
that display less regularity. Also, instead of proposing a set of unrelated networks,
we aim to provide a family of networks, where the interconnection patterns and the
performance features can be characterized by a set of parameters.

The family of network topologies studied here consists of the generalized Fibonacci
cube [63, 80] and the Incomplete Hypercube [68]. Both the Generalized Fibonacci
Cube (GFC) and the Incomplete Hypercube (IHC) are irregular topologies in general.
However, both of them contain the hypercube (HC) as a subset; moreover, each
member of this family (i.e., a generalized Fibonacci cube or an Incomplete Hypercube)
is a subgraph of a hypercube. Figure 1.1 shows the constitution of this family (see
Chapter 2 for details). The generalized Fibonacci cube stems from the Fibonacci cube
(FC) proposed by Hsu [61, 62], which is also a subset of the generalized Fibonacci

cube. In this thesis, we will study efﬁcient graph embedding and data communication

Figure 1.1. The composition of the family of network topologies.

for the members of this family.

1.2.1 Hypercube

Numerous network topologies have been proposed and studied. Among them, the
trees, meshes, meshes of trees, and the hypercube have drawn the most attentions (see,
for example, [2] [46] [75] [97]). Many useful computations, such as matrix operations
[2] [75] [97] and image processing [32] [33] [91], can be naturally mapped into trees,
meshes, and meshes of trees. However, trees, meshes, and meshes of trees suffer
from a low bisection width or a large diameter, which prevents the development of
logarithmic-time parallel algorithms for fundamental applications such as sorting.

The hypercube (or Boolean cube, binary n-cube, n-cube) of dimension n (denoted
by B”) consists of 2" nodes and n2"'1 edges. The nodes are labeled from 0 to 2" — 1
and each node has an n-bit binary code (called address) such that two nodes are
connected if and only if the binary codes of their labels diﬂer in exactly one bit. Re-
cently, the hypercube has become a popular network topology for parallel processing
[55] [103] [106], because it has many appealing properties such as the logarithmic
diameter and node degree, high bisection width, and ease to embed other common
structures [19] [20] [39] [44] [60] [75] [101] [112] [113]. Intel’s iPSC, NCUBE, Ametek
14 and FPS-T series are a few examples of commercially available parallel computers
based on the hypercube topology. Many applications have been successfully devel-
oped for the hypercube, e. g., image processing [31] [98] [99], matrix algebra [18] [36]
[47], Particle-in-Cell (PIC) [86], and others [46].

Although the hypercube is a powerful and popular topology for parallel computa-
tion, different variations have been proposed to improve certain characteristics, such
as the diameter and node degree, of the hypercube. The approaches can be generally
divided into the following three categories: (1) twisted hypercube [42] [56], (2) en-

hanced hypercube [40] [41] [111], and (3) bounded-degree hypercube-derivatives e.g.,

butterﬂy, cube-connected cycles (CCC) [96], and shuﬂ‘le-exchange [105]. These
networks provide either smaller diameter, or shorter average internode distance, or
low node degree. However, their permissible sizes are generally restricted to powers
of two. As such, they have lower expansibility than the networks which we are about

to study.

1.2.2 Incomplete Hypercube and Generalized Fibonacci

Cube

The Incomplete Hypercube proposed by Katseff [68] improves the expansibility of the
hypercube. The Incomplete Hypercube with N nodes (denoted by I N), where N
may be any positive integer, is constructed in the same way as the hypercube. In
other words, nodes are labeled from 0 to N — l, and two nodes are connected by
an edge if and only if their binary representations differ in exactly one bit. It can
be shown that [N is a subgraph of B,, where n = [log N]." Algorithms for routing
and single node broadcasting have been devised by Katseff [68]; Tzeng [109] has also
analyzed some structural properties, such as diameter, internode distance and trafﬁc
density, of the Incomplete Hypercube, which are shown to be comparable with that
of the hypercube. Furthermore, Tzeng et al. [110] have shown that binary trees and
two-dimensional (2D) meshes can be embedded in the Incomplete Hypercube. These
existing results suggest that the Incomplete Hypercube is a. useful network topology.
However, the Incomplete Hypercube suffers from two shortcomings. One is low degree
of fault tolerance under certain situations. For example, a node may have a degree
of one when an Incomplete Hypercube contains exactly 2" + 1 nodes. The other
shortcoming is the lack of recurrence in its structure, which makes the designing of

certain application algorithms in the Incomplete Hypercube much more difﬁcult.

 

‘In this thesis, all logs are to the base 2.

Hsu [61, 62] has proposed the Fibonacci cube based on the Fibonacci numbers
[53, 71]. Recently, he also presented the generalized Fibonacci cube [63] based on
the generalized Fibonacci numbers (see, e.g., [72]). The generalized Fibonacci cube
can be defined by two integral parameters n and I: where n Z I: Z 2. The lc‘"
order Fibonacci number is deﬁned by Fl,” = F31], + F31], + - - - FEE], for n 2 k, and
F3" = F,“ = = Flt], = o,F":1, = 1. Informally, the 1:“ order Fibonacci cube of
dimension n (denoted by F: ) consists of F,["] nodes; the nodes are numbered from 0
to Fl,” - 1 and each node has a unique address, called generalized Fibonacci code, such
that two nodes are connected by an edge if and only if their generalized Fibonacci
codes differ in exactly one bit. The generalized Fibonacci cube bears two important

features: (1) F: can be decomposed into I: subcubes: FL, , FL.“ - - ~ , I‘" k which are of

"—

sizes F331, Fﬂz - - - , Fifi,“ respectively, and (2) Every node has a logarithmic number
of degree (as a function of the total number of nodes). Therefore, the generalized
Fibonacci cube offers structural recurrences and an improved fault tolerance over the

Incomplete Hypercube.

1.2.3 Graph Embedding and Data Communication

When using DM parallel computer systems, we often partition an application into
many tasks and map (assign) the tasks to the processors. The interaction among tasks
located on different processors are achieved through sending and receiving messages.
This inter-processor communication is usually modeled by a graph which will be
referred to as the guest graph. On the other hand, the graph that represents the
network topology of a given parallel system will be referred to as the host graph.
Let G = (Va, Ea) denote a guest graph and H = (V3, 133) a host graph, where Va
and V” denote the node sets of the guest and host graphs respectively, and E0 and EH
edge sets. The embedding of G in H is a function f that maps each node in G into a

node in H. Dilation and expansion are two common metrics used in evaluating graph

embeddings: the dilation of the embedding is ma:c(dis( f (i), f (j )), V(i, j) E Ea), and
the expansion of the embedding is [if/‘3], where [VGI denotes the cardinality of the set
Va, Will the cardinality of V”, and dis(i, j) the distance between two nodes i and j.

A parallel computer should allow the users to write programs based on the algo-
' rithmic structures (guest graphs) rather than the underlying structure of the system
(host graph). Each application usually dictates a certain guest graph when it is
mapped into a parallel computer system. Our study of graph embedding can provide
useful tools for emulating various guest graphs on a given host graph. Furthermore,
the study of graph embedding will provide insight about the relations among different
network topologies.

Advances in the VLSI technology have enabled us to construct parallel computer
systems consisting of hundreds or thousands processors. This trend has contributed
to an ever ﬁner granularity in parallelism. Consequently, the computation per com-
munication is becoming smaller and smaller. With the current technology, it is easier
to build a high speed processor than to construct a high bandwidth interconnection
network. Therefore, efﬁcient data communications are critical for high performance
parallel computers. In this thesis, we will demonstrate the effectiveness of the family
of network topologies by designing tools for graph embedding and data communica-

tion in it.

1.3 Overview

’We now give a brief overview of this thesis.

In Chapter 2, we review the Incomplete Hypercube and some of its properties.
Then we formally deﬁne the generalized Fibonacci cube, characterize its recurrence
relation, and study some of its basic properties such as the number of edges and node

degrees.

In Chapter 3, we present some graph embeddings in the generalized Fibonacci
cube. The Hamiltonian problem is to determine whether a graph contains a spanning
(Hamiltonian) path or cycle. In [52], it is considered as one of the fundamental
problems in graph theory. The linear array and ring are two common structures
which have many applications in parallel processing [2] [10] [75] [97]. We will show
that 1‘: contains a Hamiltonian path, which implies the embedding of linear array.
Then, by decomposing the generalized Fibonacci cube into subcubes and properly
rearranging the Hamiltonian paths in the subcubes, we prove that each member of
the generalized Fibonacci cube contains a Hamiltonian cycle if and only if it has
an even number of nodes; on the other hand, for members which do not contain a
Hamiltonian cycle, the longest cycles contain all but one node. This result implies
that a ring of size Fl,” can be embedded in F]: with a dilation no more than two. The
mesh structure is commonly used in many applications [2] [10] [75] [97]. Therefore,
we will also present the 2D and 3D mesh embeddings. In addition, we show that
the hypercube can be embedded in a generalized Fibonacci cube, even though each
generalized Fibonacci cube is a subgraph of a hypercube.

In Chapter 4, we study the data communication in the generalized Fibonacci cube,
where we devise and analyze efﬁcient algorithms for the following data communication
primitives:

1. Routing: a node sends a message to another node,

2. Single node broadcasting (accumulation): a node sends a message to all the

other nodes (receives messages from all the other nodes),

3. Single node scatter (gather): a node sends different messages to all the other

nodes (receives one message from all the other nodes), and

4. Multinode broadcasting (accumulation): every node sends a message to all the

other nodes (receives a message from all the other nodes).

10

These primitives are studied because broadcasting is used in many algebraic compu-
tations, see, for example [18] [48] [66]; accumulation can be applied to, for instance,
processor synchronization [94]; scatter and gather are commonly used in, for exam—
ple, preﬁx computation and tree contraction [1] [12] [14]. [64] [73] [88]‘[89] [90]. As in
[9, 67], we will analyze all of our data communication algorithms under two different
models. In the one-port model, one assumes that only one port on every node can be
used for sending and receiving data (one-port model) at any moment. On the other
hand, in the all-port model, one assumes that all the communication ports on every
node can be used for sending and receiving data at any time.

In Chapter 5, we consider the Hamiltonian problem and data communication in
the Incomplete Hypercube. By employing the Binary Reﬂected Gray code (see, e.g.,
[25, 101]), we show that, for N Z 4, I N contains a Hamiltonian cycle if and only if
N is even, whereas the longest cycle contains all but one node if and only if N is
odd. Consequently, I N contains a Hamiltonian path if N is even. On the other hand,
when N is odd, we are still able to construct a Hamiltonian path by substituting one
edge in the longest cycle with another one. Therefore, we have optimal embeddings
of linear array and ring in the Incomplete Hypercube. For data communication,
we demonstrate that all of the communication primitives mentioned earlier (single
node broadcasting and accumulation, single node scatter and gather, and multinode
broadcasting and accumulation) can be also efﬁciently implemented in the Incomplete
Hypercube under both the one- and all-port models.

In Chapter 6, we summarize our main results, highlight contributions, and discuss

some directions for future work.

CHAPTER 2

PRELIMINARIES

In this chapter, we present the basic properties of the family of network topologies.
As mentioned earlier, the family of network topologies of our interest consists of both
the Incomplete Hypercube [68] and the generalized Fibonacci cube [63, 80]. A graph
G is represented by (V, E), where V denotes the set of nodes and E the set of edges.
The Hamming distance between two binary strings is the number of bits where these

two strings differ.

2.1 Incomplete Hypercube

The Incomplete Hypercube was ﬁrst proposed by Katseff [68]. An Incomplete Hyper-
cube with N nodes (where N may be any positive integer) is constructed in the same
way as the hypercube, i.e., two nodes are linked if and only if the Hamming distance
between their addresses (i.e., binary codes) is exactly one. Formally, let H(i, j) de-
note the Hamming distance between the binary representations of two integers i and

j. The Incomplete Hypercube can be deﬁned as follows.

Deﬁnition 2.1 [68] Let N 2 1 denote an integer. The Incomplete Hypercube with
N nodes, denoted by IN, is a graph (V, E), where V = {0,1,---,N — 1} and E =
{(isj) I H(i,j)=1,f07'0 S 5,1. S N ‘- 1}

11

12

Figure 2.1 shows the structure of Is, where nodes are labeled by their addresses. The

 

- Figure 2.1. Structure of 16-

following two lemmas are important in order to see the applications of the Incomplete

Hypercube.
Lemma 2.1 IN is a subgraph of IN“ for any N Z 1.
Lemma 2.2 [68] IN is a subgraph of B" where n = [logN].

With Lemma 2.1, we can build the Incomplete Hypercube incrementally. In other
words, for example, to have an I4, we may construct I; ﬁrst, then add one node
and one edge to form 13, and ﬁnally add one more node and edge. On the other
hand, Lemma 2.2 suggests that the Incomplete Hypercube may have fault-tolerant
applications in the hypercube. For example, when any single node fails in Ba, the

remaining network can still be used as an I7.

2.2 Generalized Fibonacci Cube

In this section, we will deﬁne the generalized Fibonacci cubes. For this purpose,
we ﬁrst examine the generalized Fibonacci numbers (see e.g., [72]). The lc‘h order

Fibonacci numbers, denoted by Elf], are deﬁned by the following equation

Fik] = Frill + 1:11:12 'i' ’ ' ' + Fiﬂk, for Tl 2 k,

13

where

n"“=o for cyst—2, and F3951.

In other words, each number in the sequence is the sum of the preceding 1: numbers.
Table 2.1 lists the ﬁrst 10 non-zero generalized Fibonacci numbers of orders 2 to 10.

Notice that we have F 3:], = F ,E’“ for k 2 2; the usual Fibonacci numbers are obtained

Table 2.1. Example of generalized Fibonacci numbers.

 

 

 

 

 

 

 

 

 

 

 

.- FE FE. FE. F... FE. FE FE FE.—
-1 1 1 1 1 1 l 1 1
0 1 1 l 1 l l 1 1
1 2 2 2 2 2 2 2 2
2 3 4 4 4 4 4 4 4
3 5 7 8 8 8 8 8 8
4 8 13 15 m 16 16 16 16
5 13 24 29 31 3_2 32 32 32
6 21 44 56 61 63 64 64 64
7 34 81 108 120 125 127 128 128
8 55 149 208 236 248 253 255 E

 

 

when k = 2. Furthermore, it is not difﬁcult to see that Rm 2 2““ (i.e., F!” is a
power of 2) for 2 S k S i 5 2k — 1. We next introduce the Fibonacci codes which

enable us to deﬁne the generalized Fibonacci cubes.
Deﬁnition 2.2 Assume that n Z I: 2 2. Leti be an integer where 0 S i < Elf]. The

19"“ order Fibonacci code {k-FC) of i, denoted by FC,‘,“‘(i), is a sequence of n — 1:

binary numbers (b.._k_1, ..., b1, b0) where
1. i: 293-1 b,- . Flfjk, and

J

2. bj-bj+1-...-bj+k_1 =0,f01‘ OSjS(n—2k).

14

In the following, we write F C(i) for F C,‘,“](i) if k and n can be explicitly deduced
from context. We adopt the following convention to number a binary string. The
rightmost (least-signiﬁcant) bit of a binary string is referred to as bit 0 and the second
to the rightmost as bit 1, and so on. The following two observations are immediate

from the preceding deﬁnition:
1. bit 1' in the la“ order Fibonacci code (k-FC) carries the weight of FE)“ and
2. No I: or more consecutive 1’s appear in k—FCs.

These properties distinguish the Fibonacci codes from the (base-2) binary codes. In
the generalized Fibonacci number system [17, 72], every nonnegative integer has a
unique representation as a sum of distinct In“ order Fibonacci numbers (Fl-m, for
i 2 Ir) such that no 10 consecutive Fibonacci numbers are used. The k-FC of an
integer I can be obtained by the following greedy approach: ﬁnd the greatest Fj‘k‘
that is less than or equal to I, assign 1 to bit j — k, then proceed recursively on
I — Ff”; ﬁnally, all unassigned bits are set to 0’3. Table 2.2 lists 2-FCs and 3—FCs for

all integers from 0 up to 12.

Table 2.2. Examples of 2-FCs and 3-FCs.

 

FCWG) FCWU) z FCmﬁ) chu)
00000 0000 7 01010 1000
00001 0001 8 10000 1001
00010 0010 9 10001 1010
00100 0011 10 10010 1011
00101 0100 11 10100 1100
01000 0101 12 10101 1101
01001 0110

 

 

 

 

 

 

 

OQU‘rFOOMt—lOon.

 

 

15

The deﬁnition of the generalized Fibonacci cubes is based on the Fibonacci codes

and the Hamming distance.

Deﬁnition 2.3 The let“ order Fibonacci cube of dimension n, denoted by 1‘: (where
n Z I: Z 2), is a graph (V:,E,’f), where V: = {0,1,..,F,‘,“] — 1} and E: = {(i,j) |
H(FC,‘,"1(i),FC,[,"l(j))= 1, 0 g i, j < F34}. Deﬁne r: = (¢,¢) for 0 g i g k — 2, and
I”ti—1 = ({0}’¢)-

Recall that Fit], = F9] for I: 2 2. Deﬁnition 2.3 implies that both FL, and
I"; represent the same graph and they contain exactly one node. Figure 2.2 shows
the structure of the second-order Fibonacci cubes (or Fibonacci cubes as in [61, 62])
of dimensions 2 to 6, where nodes are labeled by 2-FCs. It can be seen that 1‘3,
can be partitioned into two subcubes: one subgraph (induced by the black nodes
in Figure 2.2) is isomorphic to FL, and the other (induced by the white nodes)
is isomorphic to FLT Figure 2.3 shows some of the third-order Fibonacci cubes.
Similarly, P: can be partitioned into three subcubes: the ﬁrst one (the black nodes
in Figure 2.3) is isomorphic to F ‘24, the second (grey nodes) to FL, and the third
(white nodes) to I‘:_3. The following observation is important: F: is a subgraph of
3..-]. for n _>_ k. In the following section, we will prove this observation and other

properties of the generalized Fibonacci cubes.

2.3 A Characterization of Generalized Fibonacci

Cube

It is clear that the deﬁnition of the generalized Fibonacci cubes is analogous to that of
the Boolean cubes. In this section, we study some basic properties of the generalized
Fibonacci cubes and their relations with the Boolean cubes. It is necessary to examine

the properties of lc-FCs at this point. For convenience, we view each k-FC as a string.

010

a
1‘,

 

I

r3 r4 1 so:

 

 

 

r3
oooo oozo 1000 1010
0001 31 noon 1°“
moo 0110
1100
0101 no:
r,

Figure 2.3. Third-order Fibonacci cube of dimensions 3 to 7.

Let a and ,6 denote two strings, and let A represent the null string. The following
notations will be used: the concatenation of two strings or and ﬂ is denoted by a - ﬂ,
or simply aﬂ. Deﬁne A - a = a - A = a. We use 01" to represent the concatenation of
a string a with itself for i times, where i 2 1; deﬁne a0 = A. For a set of strings S,
deﬁnea-S={a-ﬂ|,365}.‘

Let V: denote the set of lc-FCs for integers between 0 and Fl,” — 1, where n > k.

Deﬁne VLI = V}: = {A}? It should be clear that the nodes in F: can be labeled

 

’Notice that I"‘_l and I‘: each contains exactly one node. We use A to address the node in this
case.

17

exactly by the codes in 11:. From Table 2.2, we can observe that V: = {0,1}, V}
= {00,01, 10}, and v: = {000,001,010, 100,101} = {000,001,010} 0 {100,101} = 0-
V} U10. V3. Moreover, V: = {A}, V} = {0,1}, and V53 = {00,01,10,11}, and V:
= {000,001,010,011,100,101,110} = {000, 001,010,011} 11 {100,101} u {110} = 0.
V: U10- V2 U110- V3. This observation leads to the following lemma that describes

an important recurrence of the set V5.

Lemma 2.3 Assume that I: 2 2, and n 2 2k. Let V: denote the set of lc-FCs for
integers between 0 and F,‘,"1 - 1. Then

v}: = 0 - v,';_, u 10 - v,’:_, u ---1"-’0 - v,':_,,+, u1k-10-v,’:_,,.

Proof: By induction on n. D

We next examine the recurrence of the generalized Fibonacci cubes. By deﬁnition,
F: contains Fl,” nodes. Recall that F,[“1 is deﬁned as the sum of the lc preceding gen-
eralized Fibonacci numbers. Lemma 2.4 shows that 1‘: contains lc disjoint subgraphs
that are isomorphic to I"‘_1 , FL” - - - , and Ff,_k respectively; mOreover, each node
in a subcube of dimension j is connected to one node in every subcube of dimension

i, provided j < i.

Lemma 2.4 Assume that k 2 2 and n 2 2k. Deﬁne 9::(2') = 1‘0 - V"_1_,-, and Gf,(i)

the subgraph induced by 9:0) in Pi, where 0 S i < 10. Then 1‘: can be partitioned
into Gf,(0) , Gf,(1) , -- - ,G:(lc — 1) such that

(1) Gf,(i) is isomorphic to FL _,- for 0 S i < lc, and
1

(2) For two nodes :1: and y in Ff, such that a: in Gf,(i), and y in Gf,(j) and 0 S i <
J‘ < k, the edge (:r,y) is in r: if and only if y — a: = F‘k‘

n-l—i'

Proof: Property (1) is an immediate result of Lemma 2.3. It remains to show

property (2). The if part follows from Deﬁnition 2.3. We need to prove the only if

18

part. By deﬁnition, we have g:(i) = 1‘0 VLF,- and g:( j) = 1’0 V:_1_j. By recur-
sively applying Lemma 2.3 on 13:44, we can deduce that 95(5) contains 1‘01""“0
V:_1_,,-. Notice that 95(1) can be rewritten as PIN-“10 VLF,- . Since (any) is an
edge in F 1:, they must differ in the (n — lc — l — i)“ bit. Therefore, the difference

between y and a: is ‘k‘ D

n-l-i'

Example 2.1 Let k = 3 and n = 6. We have 93(0) = {000, 001,010,011}, 93(1) =
{100,101,} and 93(2) = {110}. Figure 2.3 shows that F: can be partitioned into G3(0)
(black nodes), G20) (gray nodes) and Gg(2) (white node), which are, respectively,
isomorphic to F2 , F3 and F3. Furthermore, G20) is connected to 03(0) by the edges
(100, 000) and (101, 001); G§(2) is connected to G2(0) by the edge (110, 010), and
G§(2) is connected to G2(1) by the edge (110, 101).

The relation among different generalized Fibonacci cubes is captured by the fol—

lowing lemma.

Lemma 2.5 Assume that k _>_ 2 denotes an integer.
(1) 1‘: is a subgraph of Ff,“ for n _>_ k, and

(2) I": is a subgraph of F2“ for n > 1:.

Proof: By Deﬁnition 2.3. Omitted. Cl

Lemma 2.5 has the following two implications.

1. We may build a generalized Fibonacci cube incrementally. In other words, for

example, to construct F3, we can ﬁrst build F: then adding nodes and edges to

form 1‘3, F3, and ﬁnally F3.

2. The generalized Fibonacci cube deﬁnes a class of network topologies with a large

alternatives for fault-tolerance. This is because when certain nodes and / or links

19

fail in Ft, we may form a Ff. (where 2 S lc’ S k and lc < n’ S n) from the

remaining nodes and links.

It is known that nodes in BM}. can be addressed by (n — lc)-bit binary codes where
n > It, similarly it is seen in Section 2.2 that nodes in "F: can also be addressed by
(n — lc)-bit Fibonacci codes. Since the set of m-bit Fibonacci codes is a subset of all

m-bit binary codes, we have the following relations between the generalized Fibonacci

cube and the Boolean cube.

Lemma 2.6 Assume that k 2 2 denotes an integer.

(1) 1‘: is isomorphic to 8..-], for all n, where I: S n < 2k.

(2) F: is a proper subgraph of Bn._1, for all n 2 2k.

One way to obtain F: from BM}, is to remove every node (along with their inci-
dent edges) whose binary address is not a lc-FC. Lemma 2.6 also indicates that the
generalized Fibonacci cube may have fault-tolerant applications for the hypercube.
We now closely reexamine Fig. 1.1 that shows the constitution of the family of
network topologies under our study. Recall that F,‘,“] = 2"“‘ for 2 S k S n S 2k — 1.
Therefore, by varying k and n, it can be shown that the hypercube is a special case
(subset) of the generalized Fibonacci cube. Furthermore, Flt] = 2" -— 1 for k 2 2.
Hence, when 2 S k S n S 2k, P: (which contains Fl,” nodes) and I Fitz] are isomorphic,

because both are either a hypercube or a hypercube with one node missing.

2.4 Basic Properties of Generalized Fibonacci

Cube

In this section we study some basic properties of the generalized Fibonacci cube. In

particular, we will count the number of edges and the node degrees. The number of

20

edges is an important metric in studying a network topology because it is a factor
related to the cost of the network. On the other hand, the minimal degree of a
topology usually sets a bound on the connectivities of a graph, which is a prime

factor in studying fault-tolerance.

2.4.1 Number of Edges

In this section, we count the number of edges in the generalized Fibonacci cubes. Let
Elf] denote the number of edges in I“: By Lemma 2.4, 1‘: consists of k subgraphs
Gf,(i), where 0 S i < k. The recurrence equation of Elf] is

EL” = E53. + ESE. + - -- + E52. + Fl“. + 2F.‘.’:‘3 + - ~ + (k — 1)F.E".‘. (2.1.)

where

E,‘k1=0 for OSiSk and Elli-1:1-

To see Eq. (2.1), we note that the ﬁrst 10 terms account for the number of edges in
subgraphs Gf,(i)’s and the remaining k — 1 terms account for the number of edges
among these subgraphs. For example, the subgraph Gf,(0) has Em, edges, and there
are Fit]: edges connecting the subgraphs Gf,( 1) to Gf,(0).

We now solve Eq. (2.1). For convenience, we introduce the generating functions
(see e.g., [53]) for the generalized Fibonacci numbers. Let G‘k](z) denote the generat-
ing function for the In“ order Fibonacci numbers, where k 2 2. It is known (see e. g.,

[72]) that

zlc—l

G"“(z) =

 

l—z—zz----—z" (2'2)

21

We will ﬁrst solve the case when k = 2. In this case, we have
13,131: E531. +£12.12 +FI’J. (2.3)

Let H [21(2) denote the generating function for Effl’s, and deﬁne El" = 0 for i < 0.

By multiplying both sides of Eq. (2.3) with z“ and taking summations, we have

We) 5 23,?le
n=0

= 2 E2112” + 2 E5222" + z F3122"
n=0 n=0 n=0

= zH‘2‘(z) + 22H‘2‘(z) + acme)

= _.1__2_z2gl2l(z)

l—z—z

where
z

G‘2‘(z) =

l—z—z2

as we saw in Eq. (2.2). Further calculation leads to
HMO!) = ZIG‘2‘(Z)12 (2-4)

It can be shown that H [21(2) is a convolution of the function G‘2‘(z) with itself and

followed by a shift. Therefore, we can solve Eq. (2.4) and get

n-l
£3,131 = Z n‘2‘F,‘,’_‘,_, for n 2 1 (2.5)

i=0

We next proceed to solve the case when k = 3. According to Eq. (2.1), we have

Bits] = EEL + E5312 + Eitsls + Fria—lz + 2Fkﬂs

ﬂ

22

Let H [31(2) denote the generating function for EE’I’s, and deﬁne EP‘ = 0 for i < 0.

Follow a similar procedure, we have

1
z—z2

 

H‘3‘(z) = 1 _ _ 23(z"‘G‘:”](z) + 223G‘3‘(z))

where

$2

 

G‘3‘(z) = 1

— z — z2 -— 23
Hence,

H [31(2) = [091(2)]? + 2ZlG‘3‘(«':)12

It is not surprising that the relation between H [31(2) and G‘3](z) is similar to what

we observed in the second order case. TherefOre, we have

n

n—l
E531 = 2E‘3‘F‘3‘ + 2 Z 151911519114 for n 2 1

n—i
i=0 i=0

For the general case, let H [“l(z) denote the generating function for Elfl’s, and deﬁne

Elm = O'for i < 0. Follow a similar procedure, we have the expression for H “1(2) :

H‘k‘(z) = 1 . X

l—z—zz—----z"

 

where
X = (zFGI’=1(z)+ 223Gl’=1(z) + - . - + (k — 2)z“‘1G"“(z) + (k — 1)z“G““(z))
By substituting Eq. (2.2), we have
1 _2_

H"“(z) = -,;,;_—;.,[G"“(2)]2 + ,1.-. [G"“(.-z)]2 + . -- + (k — 2)[GI"1(2)]2 + (k — 1)z[G““(z)]2

where G‘k](z) is the generating function for the lat“ order Fibonacci numbers as ex-

23

pressed in Eq. (2.2). Therefore, it can be shown that

k—l n-2+j
Elf] = :(k— j) Z F,"“F,‘,"_‘,+,_,. for n > k (2.6)
j=l i=0

Let X: denote the number of edges in 3..-}, for n 2 k 2 2. For sufﬁciently large n, it

can be shown that

5,1,9

0.79 S X:

S 1 (2.7)

 

where the lower bound is obtained [62] when k = 2, and the upper bound is achieved

when 1‘: becomes a hypercube.

2.4.2 Node Degrees

We now study the node degrees in the generalized Fibonacci cube. In particular,
we will determine the maximum and minimum node degrees. Recall that nodes in
F]: are addressed by (n — k)-bit k-FCs. Clearly, node 0 has a degree of n — lc, since
every node adjacent to node 0 has exactly one bit with value 1 in it lc-FC. Therefore,
the maximum node degree of 1‘: is n — k. In the remainder of this section we will
determine the minimum node degree.

Two observations are immediate:

1. For Ir 2 2 and lc S n < 21:, by Lemma 2.6, F: is isomorphic to Bn-k. Hence all

nodes have the same degree of n — k.

2. For l: 2 2 and n = 2k, F: is obtained by removing the node with 1" as its

address. Therefore the minimum degree of nodes is k — 1.

The next lemma characterizes the general case.

24

Lemma 2.7 For I: Z 2 and n 2 2]: + 1, let p denote the node in F" whose lr-FC is

n)

formed by repeating the sequence 01““0. Then p has the minimum degree in Pi.

Proof: We distinguish among the following three cases:

Case 1: n=2lc+l (i.e.,n-lc=lc+l)

Clearly, node p has a degree of k - 1. By observation 2 in the preceding paragraph,
the minimum degree of nodes in FL, is also lc — 1. Furthermore, it can be shown
inductively that the minimum node degree of F: is a non-decreasing function of n for
a ﬁxed lc. Therefore node p has the minimum degree in Pi.

‘Case 2: 2k+1<n<3k+2 (i.e., lc+1 <n—k<2(k+l))

Let r = n — 2k — 1. We divide the (n - k)-bit generalized Fibonacci code into two
parts: Part L which includes the leftmost k + 1 bits, and Part R which includes the
rightmost r bits. We claim that the minimum node degree of Ff, is k + r — 2 if r = k,
or I: + r — 1 if otherwise (i.e., r < h). Then, it is not difﬁcult to verify that p is a
node which has the minimum node degree as we just speciﬁed. To see the previous

claim, we further consider the following two possibilities:

(a) Part L is of the form (01k‘10). The minimum node degree determined by Part L
is k— 1, which has been shown in Case 1. The minimum node degree determined
by Part R is r—l if r = k and Part R contains exactly lc—l 1’s, or r if otherwise

(i.e., r < 1:). Thus the minimum node degree is exactly as we claimed.

(b) Part L is not of the form (01k‘10). There is at most 1 bit in both Part L and R

which cannot be ﬂipped from 0 to 1; hence the minimum node degree is k+r— 1.

Comparing the above two, we conclude that the minimum degree is captured by (a),
which is what we claimed.
Case 3: n 2 3k+2 (i.e., n—ch 2(k+1))

This can be proved similarly as we did in Case 2. D

25

Example 2.2 Consider I: = 4. Nodes (011001), (0110011) and (01100110) are re-
spectively a node with the minimum node degree in Pg, , F11 and IE2 , respectively.

The minimum node degree is 4 in the above three cases.

Lemma 2.7 leads to the following result:

Lemma 2.8 Let 6: denote the minimum node degree of Pi. For n _>_ I: + 1, we have

the following recurrence equation:
6:: = 53+, + [C — 1 (2.8)

wherebﬁ=n—lcforkSnS2lc—l and6§k=k—l.

One way to solve Eq. (2.8) is to apply the generating function as we did in Sec-
tion 2.4.1 to count the number of edges. For simplicity, we make use of some obser-
vations obtained in the proof of Lemma 2.7 to solve Eq. (2.8). Recall that the node
with its k-FC formed by repeating the sequence 01“"10 has the minimum degree. Let
m = [(n — k)/(lc + 1)]. By completing Case 3 in the proof of Lemma 2.7, it can be
shown that

k n‘—k—2m—1 ifn=m(k+1)+2k
6": (2.9)

n — k — 2m otherwise

where

m = 1(n — was + 01.

2.5 Summary of Results

We have reviewed the Incomplete Hypercube and deﬁned the generalized Fibonacci
cubes. Table 2.3 compares some parameters between the hypercube and the general-

ized Fibonacci cube.

3H

app]?

9-0

26

Table 2.3. A comparison of the hypercube and generalized Fibonacci cube.

[L__ _ l :8“ l I‘l‘. I]
[I Number of nodes 2"" FE]

[I Number of edges (n — l1:)2"""’l Eq. (2.6)

 

 

 

 

 

 

 

 

 

Max. Degree n — lc n — lc
Min. Degree 11 — 10 Eq. (2.9)
I] Diameter n — k n — kl

 

I See Chapter 4.

For each element of the family, we have shown that I N is a subgraph of B,, where
n = [logN], and 1‘: is also a subgraph of 8..-}. for n 2 k 2 2. Additionally, both
the Incomplete Hypercube and the generalized Fibonacci cube include the hypercube
as a special case. The family of network topologies has the following two areas of

application:

1. to allow an incremental expansion of a hypercube-like parallel computer sys-

tems, and

2. to serve as a fault-tolerant structure for the hypercube systems.

CHAPTER 3

GRAPH EMBEDDING IN
GENERALIZED FIBONACCI
CUBE

As explained in Chapter 1, graph embedding is an important issue in parallel com-
puter systems. In this chapter, we will study embeddings of linear array, ring, mesh,
and hypercube in the generalized Fibonacci cube.

Embeddings in the hypercube have been studied extensively in the past [19] [20]
[21] [26] [39] [57] [59] [60] [101] [112] [113] . In many of these studies the Gray code
[50] has been an important tool. The Gray codes are binary codes representing a
sequence of numbers such that the codes representing two successive numbers are
different in exactly one bit. From the property of the Gray code, it can be seen that a
linear array can be embedded in the hypercube [101]. Furthermore, by employing the
binary reﬂected Gray codes Saad and Schultz [101] have shown that cycles of even
lengths can be embedded in the hypercube.

By choosing ml-bit Gray codes for the :r-direction and mg-bit Gray codes for the
y-direction, it is also shown in [101] that an (m1 +m2)-cube embeds a 2D mesh of size

2'"1 x 2"”. In general, a (2m x 2'"2 x - - - x 2"”) d-dimensional mesh can be embedded

27

28

in B”, provided that n = m1 + mg + - - - + m4 [101]. Clearly, when the side lengths
of a mesh are not powers of two, the Gray code scheme may not result in a minimal—
expansion embedding. Chan and Chin [20] have employed the Gray code combined
with an ad hoc method to achieve minimal-expansion, dilation-2 embeddings of most
2D meshes. By using optimal embeddings of small meshes with certain sizes and
a scheme called “graph decomposition,” Ho and Johnsson [59, 60] have shown that
87% of all 2D meshes and a large portion of 3D meshes can be embedded in the
hypercube with minimum expansion and dilation two. Finally, Chan [19] has settled
the question regarding minimal-expansion, dilation-two embeddings of all 2D meshes
in the hypercube. I

The Hamiltonian problem in the generalized Fibonacci cube has been studied in

[83] where we have constructively proved the following two optimal results:
1. F: contains a Hamiltonian path, and

2. for Fl,” 2 4, F]: contains a Hamiltonian cycle if and only if Fl,” is even, whereas

the longest cycle in F: contains all but one node if and only if F,‘,"] is odd.

These results imply that F 5‘, embeds a linear array of size F,‘,"‘ with expansion-1 and
dilation-1, and F]: embeds a ring of size F,‘,"‘ with expansion-1 and dilation no greater
than two. In this chapter, we extend our work to the embeddings of meshes, which
stem from the embedding of linear arrays and rings. Furthermore, we will also report
the embedding of the hypercube in the generalized Fibonacci cube.

The remainder of this chapter is organized as follows. In Section 3.1, we prove
that each generalized Fibonacci cube contains a Hamiltonian path. By embedding
linear arrays in the subcubes and properly arranging them, we show in Section 3.2
that any cycle of length I can be embedded in Ff" provided that 4 S l S F,‘,“] and l is
even. With the Hamiltonian path, in Section 3.3 we study the embeddings of 2D and

3D meshes. Section 3.4 describes the embedding of the hypercube in the generalized

29

Fibonacci cubes. Finally, we summarize our graph embedding results in Section 3.5.

3.1 Hamiltonian Path

In this section, we will show that there is a Hamiltonian path in every Ft, which serves
as the basis for our study of cycles in the generalized Fibonacci cubes. To motivate,

we ﬁrst study the Hamiltonian paths in the second-order generalized Fibonacci cube.

Example 3.1 The sequence P0 = (01,00,10) and P1 = (0,1) are, respectively,
Hamiltonian paths in 1‘: and I‘g (refer to Fig. 2.2). By Lemma 2.4, the graph F:
contains a copy of F3 and a copy of I‘g. To construct a Hamiltonian path in I‘g,
we attach a preﬁx ‘0’ to each code in P0 and “10” to each in P1, respectively, then
reverse both sequences. The two resulted sequences are (010,000,001) and (101,100).
Since H(001, 101) = 1, these two sequences can be linked to form a Hamiltonian path
in I‘g. Repeating the same procedure, we can obtain a Hamiltonian path in Ff, by

concatenating (0100,0101,0001,0000,0010) and (1010,1000,1001).

By using induction, it can be shown that the recursive procedure described in
Example 3.1 always produces a Hamiltonian path in the second order generalized
Fibonacci cube. The following theorem extends this observation to the generalized

Fibonacci cube.
Theorem 3.1 I‘: contains a Hamiltonian path for n 2 Ir 2 2.

Proof: Here we will prove the case of k = 3 by induction (on n). Cases when
k > 3 can be proved similarly. For convenience, nodes are referred to by their k-FCs.
A path will be represented by the sequence of the nodes in it.

It is easily seen that P0 = (A), P; = (0,1) and P2 = (01, 00, 10, 11) are Hamiltonian
paths in F3, F2, and F2, respectively. We establish our induction basis for n = 6. To

obtain a Hamiltonian path in F2, we take the following procedure:

30

Step 1 Preﬁx a ‘0’ (respectively, “10” and “110”) to each code in P0 (respectively,
P1 and P2). Let the resulted sequences be denoted by 0P0,10P1, and llOPg,

respectively.

Step 2 Reverse each code in 0P0,10P1, and 110P2, respectively. Let the resulted

sequences be denoted by 0P3, 10F; and HOE, respectively.
Step 3 Concatenate the three sequences in the following order: O-PZ, 1071— and 1107);.

Therefore, we obtain the sequence (011, 010, 000, 001, 101, 100, 110) that is a Hamilto-
nian path in F3. We assume that the above procedure always results in a Hamiltonian
path in 1‘: for all n S I, where I Z 6 denote an integer.

Now consider the case when n = I + 1, i.e., 1?“. By Lemma 2.4, 1‘93], can be
decomposed into three subgraphs, G?(0), G3_,(1) and G§_2(2), which are isomorphic
to PE”, 1431,, and F5312 respectively. By induction hypothesis, there is a Hamiltonian
path in F53], [“1311 and F‘alz, respectively. Let P,- (where 0 S j S 2) denote the

‘3‘ - obtained by the previous procedure. Let H,- denote the

Hamiltonian path in I‘I_,

reversed sequence of Pj. It remains to show that the three sequences (i.e., 0H0, 1071
and 110??) obtained in Step 2 of the previous procedure can be concatenated.

Let the ﬁrst (respectively, last) node of a path be referred to as the head (re-
spectively, tail), and let h,- (respectively, t.) denote the head (respectively, tail)
of the Hamiltonian path obtained by the previous procedure in I? (For example,
h3 = t3 = A, h = 0, and t4 = 1.) It is can be shown that the following recurrence
relation is a consequence of the procedure described above: h,, = 0 . t.._1.

Given this, observe that the tail of 0770- (which was the head of UFO) is now labeled
by 0 - h; = 0 - 0t1_1. On the other hand, the head of 107’;- (which was the tail of NH)
is labeled by 10 - t1..1. Since H(00t1_1,10t1_1) = 1, there is an edge between the two
nodes, i.e., the two (Hamiltonian) paths can be linked together. Similarly, the tail of
10? now has the label 10 - h1_1 = 100 - n-2, while the head of 11072 now has the

31

label 110 - t[-2. Hence there is also an edge between the two nodes. Therefore, we
conclude that the previous procedure results in a Hamiltonian path in F5311.
It should be clear that the above argument can be extended to the general case

when I: > 3. _ D

3.2 Cycles

In this section we will study the embedding of cycles in the generalized Fibonacci
cubes. The embedding of cycles is based on the embedding of Hamiltonian paths pre-
sented in the previous section and the recursive properties of the generalized Fibonacci
cubes. Let L: be the Hamiltonian path constructed according to the algorithm de-
scribed in the previous section. To motivate, let us consider embeddings of cycles
in F: in the following example. Recall that L: in F3 is obtained by a concatenation
of two sequences in 0V? and 10123. By Lemma 2.4, F: consists of two copies of F3,
(i.e., 00V: and 1011:) and one copy of F2 (i.e., 0101):). By properly arranging three
Hamiltonian paths in these three subcubes, we can obtain embeddings of cycles in

rg.

Example 3.2 In Example 3.1 we have shown that L3, = (0100, 0101, 0001, 0000,
0010, 1010, 1000, 1001). Now, let Lop = (000100, 000101, 000001, 000000, 000010,
001010, 001000, 001001), L0,] = (010100, 010101, 010001, 010000, 010010), and L1 =
(100100, 100101, 100001, 100000, 100010, 101010, 101000, 101001). It can be seen
that Lop, L0,1 and L1 are Hamiltonian paths in 001162, 0101/52 and 101162, respectively.
Figure 3.1 shows that 1‘; contains a cycle of length 20. Notice that the nodes are
addressed by 6-bit 2-FCs, where the least signiﬁcant 4 bits are shown next to the
nodes and the most signiﬁcant two bits are shown on the far left—hand side. Note

that the ﬁrst I nodes in both Lop and L1 can be used to construct a cycle of length

32

21, where 2 S l S 8. Furthermore, nodes in LOJ are needed to construct a cycle of

length 18 or 20.

MCI.-

0100 0101 0°01 0°00 0°10

   

01 O
Lo.1
zoo
LOAD ” exoe oxen ooo: oooo 0010 ten 1ooo
L1 1° one out see: oooo ooze use 100 zoo:

Figure 3.1. F: contains a cycle of length 20.

We now proceed to study the embedding of cycles in the generalized Fibonacci

cubes. Lemma 3.1 will facilitate our presentation.

Lemma 3.1 Assume that Is 2 2, and n 2 21: + 1. We have

V: = ovii-l U 1(vrli-l\lk-10vrli—k—l)

= 0(vrli—1\1k-10vrli--k 1) U 1(Vn-1\1k_10vrli—k-l) U Olk-lovii—k—i

Proof: By Lemma 2.3 we have

v: = 0V,’:_1 010123;, 011012;, 11 - - - u 1"“201I,’f_,,,1 u 1""‘012’1,c
= 0v: _, 01(01):: _,UIOV,'f_3U mu 1""3012,’f_,,+1 111" 20121: -1.)

ka -1 U1(vr’i-l \ lk-lovr’i-k-l)

= 0V..( —1\1k-10V:— k— 1) U 01"“10V,‘f_,,_ 1 U 1(1’71—1\1k-10vr’:-k—1)

C1
lie
456:

one 1

p l

33

Theorem 3.2 Assume that lc _>_ 2. For all n Z I: + 2, any cycle of length I can be
embedded in I‘: provided that l is even and 4 S l S F,[“].

Proof: An example for k = 2 has been examined in Example 3.2. We will ﬁrst

show the case when k = 3. By Lemma 3.1, we have
v2 = 012,1, u 1023;1 \110v3_,)

Deﬁne So to be the subgraph induced by nodes in OWL1 , and 51 the subgraph induced
by nodes in 1033-1 \ 1101734), i.e., nodes in 101124 U 1101134. Note that So is
isomorphic to FL1 and 51 is isomorphic to a graph obtained by removing a subcube
I‘ll—4 from F24 (refer to Figure 3.2 (a)). By Theorem 3.1, there exists a Hamiltonian
path in So; furthermore, from the construction of the Hamiltonian path in F3_1, we
can observe that there exists a Hamiltonian path in 51. Let L0 be the sequence of
3-FCs formed by attaching a preﬁx ‘0’ to each 3-FC of Lil-1: and L1 be the sequence
of 3-FCs formed by attaching a preﬁx ‘1’ to each of the ﬁrst (FE!l — F334) 3-F Cs of
Lil—1' Note that attaching the preﬁx ‘1’ to each element of the set(V3_1 \ 1101134)
still results in a valid 3—FC. One should be able to see that L0 and L1 are Hamiltonian
paths in So and 51, respectively. Moreover, for all i (where 1 S i S F311 — 51,),
there is an edge connecting the it“ nodes of L0 and L1 (refer to Fig. 3.2 (b)). We next
show that, by properly selecting nodes in the L0 and L1, cycles of even length can be
embedded in F3. We distinguish between the following two cases:

Case 1: 4 _<_ I 5 2(F,£".‘1 — FE, .

The cycle is constructed by the ﬁrst I / 2 nodes in both L0 and L1 together with two
additional links (represented by heavy segments in Fig. 3.2(b)). The two links are:
one edge connecting the heads of L0 and L1; the other edge connecting the two (I / 2)“

nodes in both Hamiltonian paths involved.

Case 2: 2(F,‘;°'_‘, — F,‘,3_‘4) <13 FE].

34

Deﬁne l’ = l - 2(F,‘,3_]1 — 33,). We further break Lo into two parts: [10.0 which is
a Hamiltonian path in 0011i2 U 0101224, and L0,; which is a Hamiltonian path in
01101124 (refer to Fig. 3.2(c)). Notice that 001);, contains 001011,:4 (see Fig. 3.2
(a)), and each node in 00101231,4 has an edge connecting to one node in 01101224.
The cycle of length 1 consists of all nodes in [10,0 and L1 together with the ﬁrst 1’
nodes in Log. Since the inclusion of two adjacent nodes in L03 requires a pair of “up”
and “down” edges (represented by heavy segments in Fig. 3.2 (c)), only cycles of even
length can be embedded in P i

We next discuss the case for arbitrary k where It: > 3. Following the same approach
as in lc = 3, let us define Lop, L1 and LOJ to be the Hamiltonian paths in 0(V,'f_l \
154011344), 0(V“_1\1"’10V,'f_,,_1), and 01“‘10V,',‘_,,.l , respectively. It is not difﬁcult
to see that any cycle of length 23 can be obtained by selecting the ﬁrst 3 nodes from
each of Lop and L1, where 2 S s S F321 — F,‘,k_‘k_1 . To have a cycle of length 23, where
F3111 — FRI“, < s S Fiﬁ/2, we just include the ﬁrst 23 — 2(I*",‘,k_‘l — Fi’ﬂkd) nodes in

the [10,1 . D

It can be easily seen from the previous proof that P 1‘, contains a Hamiltonian
cycle if the length of L04 is even. Notice that L0,] is a Hamiltonian path in a graph
isomorphic to Pf,_,,_1 , which contains exactly I"',‘,"_‘,,_1 nodes. We next examine some
properties of the generalized Fibonacci numbers and the generalized Fibonacci cubes,

and show that Pf, contains a Hamiltonian cycle if and only if Fl,” 2 4 and is even.

Lemma 3.2
(1)Forlc=2 andi22,

E]? is even, and 17:]211, F3314 are odd.

(2) Fork_>_3andi22,

I: I: k I.
Fikiqw Fishy—31 F([k1-1)i-4’ vF([Is]+1).'—k are we": and

35

 

 

 

 

 

 

 

J 3
av", my” 110V”,
J J J
ooV,,_, may” or 1oV,,_,
J J
000V”, oo1oV,” oonovr,
(a)
1'0
- A A A A A A A A A A
0v, ' v v v v v v v v v '
n-l
nevi“; nevi, : = = = = :
LI
(5)
L0,:
3 .
°"°V...4
Lap
3 3 1 D
00VM U 010V”
3 3
10V” U "We: : = e = e =
L!

(c)

Figure 3.2. Embedding of cycles in 1‘3 : (a) a decomposition of PE], (b) Case 1, and
(c) Case 2.

k I:
F([k]+l)i-—11 F([k]+1)'-_2 are Odd.

Proof: By induction on 1'. CI

Note that F311,“, is even if and only if E?“ is even, where n 2 2(k + 1). Hence

by Theorem 3.2, Pf, is Hamiltonian if F,‘,“] 2 4 and it is even. Furthermore, by

Lemma 3.2, k — 1 out of the k + 1 consecutive 10‘" order Fibonacci numbers are even.

36

We next study the Hamiltonian problem for the generalized Fibonacci cubes whose
total numbers of nodes are odd. It is known (see e.g., [44]) that the binary hypercubes
are bipartite. Moreover, a bipartite graph cannot have a cycle of odd length. Hence

we have the following property (see e.g., [101]):
Lemma 3.3 There is no cycle of odd length in the binary hypercubes.

Lemma 2.6 shows that Pf, is a subgraph of 3,...» Therefore P: is also bipartite.

Consequently, we have
Lemma 3.4 There is no cycle of odd length in the generalized Fibonacci cubes.
Theorem 3.2, Lemmas 3.2 and 3.4 lead to the following result:

Theorem 3.3
(1) For k = 2, P3, is Hamiltonian if and only if n = 3i for some integer i _>_' 2;

otherwise, the length of the longest cycle is Ff] — 1.

(2) For lc Z 3, P: is Hamiltonian if n can be expressed as one of the following
forms: (I: + l)i, (I: + l)i — 3, (Ir +1)i— 4, ~-- , (k +1)i— ls: for some integer

i 2 2; otherwise, the length of the longest cycle is F,‘,"‘ — 1.

Corollary 3.1 For k 2 2 and n — k 2 3, a ring of size 131“] can be embedded in P1“,

with dilation one if F,‘,“] is even, and dilation two if otherwise.

Proof: The case for F,‘,“] being even is proved in Theorem 3.3. If F,‘,“] is odd, the
only node that is not included in the longest cycle has at least one adjacent node
which is in the cycle. Hence the embedding of ring can be achieved by paying a

higher price (i.e., dilation is 2). Cl

37

3.3 Mesh

In this section, we will study the embedding of mesh in the generalized Fibonacci cube.
We will begin our discussion with the embedding of 2D mesh, then extend our result
to mesh of higher dimensions. For convenience, all the Hamiltonian paths referred in

this section are obtained by the procedure mentioned in the proof of Theorem 3.1.

3.3.1 2D Mesh

To motivate, we examine an example of embedding in P3.

Example 3.3 In Example 3.1, we saw that (01,00,10) and (010,000,001,101,100) are
Hamiltonian paths in P} and P2, respectively. Figure 3.3 shows that an F5‘2‘ x F?‘
and an FF] x F12] mesh can be embedded in P3, where a 2-FC is divided into a left
(leftmost 4-bit) and a right (rightmost 3-bit) parts. Notice that the sequence of codes
over the y-direction in the F5‘2‘ x F5‘2‘ mesh corresponds exactly to the Hamiltonian
path in P2, whereas the sequence of codes over the x-direction is obtained by sufﬁxing
a ‘0’ to each code of the same Hamiltonian path. On the other hand, the sequences
of codes over the z- and y-direction in the FF] x F12] mesh are obtained by sufﬁxing

“01” and preﬁxing ‘0’ to the Hamiltonian path in P3.

 

 

 

 

 

 

0100 0000 0010 1010 1000 0101 0001 1001
010 010
000 000
001 001

 

 

 

 

 

 

101

 

100

 

 

 

 

 

 

 

Figure 3.3. Embedding of an F 5‘2] x F 5‘21 and an F4‘2‘ X FF] meshes in P3,.

38

Figure 3.4 explains the idea behind the embedding shown in Example 3.3. It can
be seen that there are exactly two possible combinations when dividing an (m +n)-bit

2-FC into an m- and n-bit 2-FCs. Consider a node whose bit n is a 0 (combination A).

. l .. xlol 1x .. 1
s I xlolil [on I

Figure 3.4. Dividing an (m + n)-bit 2-F C into an m- and n-bit 2-FCs.

 

 

 

 

Observe that all the n bits in the right part can vary, which results in all n-bit 2-F Cs;
on the other hand, all but the rightmost bit (which is ﬁxed to 0) in the left part can
vary, which results in all (m — l)-bit 2-F Cs. Recall that any generalized Fibonacci
cube contains a Hamiltonian path. We can select the Hamiltonian path in P3,“. as
the sequence of codes for embedding over the :c-direction, and the Hamiltonian path
in P3,, +1 with a sufﬁx ‘0’ as the sequence of codes for y-direction. Therefore, the nodes
captured in combination A will enable an embedding of a mesh with size Fifi, x F312.
Similarly, the nodes captured in combination B will enable an embedding of a mesh

with size FE] x F311. We next show that this scheme can be generalized to embed

2D meshes in the generalized Fibonacci cubes.
Theorem 3.4

(I) For m _>_ 2 and n 2 1, an+n+2 embeds an FE], x F322 and an F3] x FE],

meshes.

(2) Form 2 3 and n 2 2, Fiﬁ” embeds an F‘S‘H x F933, an F9!” x (F,‘,312+F,‘,3311)

m n m

and an FE] x Fﬂz meshes.

39

(3) For k > 4, m 2 I: and n 2 k —1,P]‘,,+,,+,a embeds an F‘Jihx F‘ﬂhan
I: I: I: k I: I:
Fhik-2X(F1si+lk— MFriik-2+"°+Frii1) 0" Finlk-3X(Fr[1+]-k—1+F[+]-lc—2+H +

F1“), - - - ,and an F,‘,[“ x F [fits-1 meshes.

Proof: The case when I: = 2 has been examined in the preceding paragraph. We
will prove the case when k = 3. Consider dividing an (m +n)-bit 3-FC into two parts,
where the left part consists of the leftmost m bits and the right part the remaining
n bits. Figure 3.5 shows the four possible combinations. It can be shown that the
nodes captured in combination A (respectively, D) enable an embedding of a mesh
with size F ‘ ,3, x F,‘,+3 (respectively, FE," x F32). To see the embedding of the mesh
with size REL, x (Fiﬁ + F311), observe that the nodes captured in combinations B
and C have the same left part and their right parts can be obtained from V2” by
removing all codes in the 1101):. Hence by the procedure presented in the proof of
Theorem 3.1, we are able to embed a linear array of size Fﬂz + F331 in them.

The case when lc Z 4 can be handled similarly. For example, Figure 3.6 shows
the case when k = 4. It can be seen that there are seven possible combinations to
break a 4-FC. Again, nodes captured in combination A enable an'embedding of an
F313 x F214 mesh, nodes captured in combination G an F ‘4‘ x F323 mesh, nodes
captured in combinations B, C and D enable anF ><F(“ F‘4‘3 FE] 2+ Ff,“ ) mesh.
Finally, nodes captured in combinations E and F enable an F2], x (F323 + F312)

mesh. CI

The embedding of meshes in P :20 is displayed in Figure 3.7. It is seen that there
are three separate 2D meshes. This is similar to what we saw in Fig. 3.3 where two
separate meshes are obtained when k = 2. One interesting question is whether these
separate meshes are connected? Figure 3.8 shows that the two meshes embedded
in P3 (refer to Fig. 3.3) are indeed connected and form one piece. Notice that the

code sequence over the :c-direction can be obtained by reversing each 2-FC of the

 

IQ
0

can

The

40

“l *l°| I1‘ I
Bl xloH hlx
cL xl°|1||1l°lx I
I>| Fl°l1|d MK I

Figure 3.5. Dividing an (m + n)-bit 3-F C into an m- and n-bit 3-FCs.

 

 

 

 

 

 

 

 

 

Hamiltonian path in P: (refer to Example 3.1). We next demonstrate that by properly
rearranging the code sequence over the :c-direction, these meshes are connected and
form one piece. It is necessary to deﬁne the Reversed Fibonacci Code (RFC). Given
a k-FC P = (p;p,-_1 - - -p1po), the RFC of P (denoted by RFC’(P)) is (popl - - op,-_1p.-).
Figure 3.9 shows that there are two different ways to label the nodes in P (2,: one is
2-FC and the other is in reversed 2-FC. The following lemma shows that the RFC

can be also used to label nodes in the generalized Fibonacci cube.

Lemma 3.5 Let p (0 S p < F,‘,“]) be a node in P1], and FC(p) =
pn—Ic-lpn-lc-2 ' "P1P0- LCt q = PoFrilﬂl + PlFsi’ﬂz + ' " + pn-k-2Frik—lk+l + pn-k-l Elk—1k-

Then q can be used to relabel the node p if all nodes in Pf, are relabeled by their RFCs.

Proof: The proof consists of two parts. We ﬁrst show that RF C() is a one-to-
one and onto function from V: to 11:. Then we demonstrate that two nodes with
unit Hamming distance under their h-FCs will preserve unit Hamming distance under

their RFCs. The ﬁrst part of our proof consists of three steps:

1. RFC () is a one-to—one function. To see this, assume that RFC () is not a one-to-

one function. Then there exist 0 S p 75 p’ < F,‘,“] such that RF G (p) = RFC (p’ )

Al :11 I: I
3| onM 1on 1
- 4...] 111on |
0L 411111111on |
21 4011111 1on |
.[ 401111111“
«F 411111 1on I

Figure 3.6. Dividing an (m + n)-bit 4-FC into an m- and n-bit 4—FCs.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

This implies that F C (p) = F C (p’ ) Since the k-FCs for both p and p’ are unique,

we conclude that p = p’, which is a contradiction.

2. Given V: as the domain of the function RF C(-), then the range will also be V,’f.
To see this, assume that the binary code Q = RF C(p) is not in V,’f, then Q
must contain lc consecutive 1’s. This implies that F C (p) contains k consecutive

1’s, which is a contradiction.
3. RFC () is an onto function. This follows from 1 and 2.

Now, we prove the second part. Recall that two nodes p and p’ are adjacent in
Pf, if and only if their lc—FCs differ in exactly one bit. In other words, p and p’ are
adjacent if and only if Ip — p’ | = E‘k‘ for some I: S i < 11. Given a pair of adjacent

nodes pand p’, let q and q’ be their second labels, respectively. It can be shown that

42

0110 0100 0000 0010 1010 1000 1100 0101 0001 1001 1101 0011 1011

 

Figure 3.7. Embedding of meshes in Pic-

0010 1010 1000 0000 0100 0101 0001 1001

 

Figure 3.8. Embedding of meshes in P3.

|q — q’l = Fﬂbh, if and only if [p - p’] = F,‘k‘ for some I: S i < n. In other words,

two nodes with unit Hamming distance will preserve unit Hamming distance in their

second labels. ‘3

From Fig. 3.5, it can be seen that the least signiﬁcant bits(s) in the left parts are
ﬁxed to ‘0,’ “01” and “011,” respectively. Lemma 3.5 shows that nodes can be also
labeled by their RFCs. The Hamiltonian path constructed by the recursive procedure

presented in Section 3.1 will include nodes with leading bit(s) ‘0,’ then nodes with

‘43

 

Figure 3.9. Two different labelings of Pg.

“10” and followed by nodes with “110.” Figure 3.10 shows that by ﬁrst obtaining the
Hamiltonian path and relabeling each node with its RFC over the :c-direction, we are
able to connect the three separate meshes into one piece for the embedding in P330.

Clearly, this scheme can be applied to k > 3.

0110 0010 1010 1000 0000 0100 1100 1101 0101 0001 1001 1011 0011

 

 

 

 

 

 

 

 

 

 

 

 

 

011 011
010 010
000 ‘ 000
001 001
101 101
100 ' 100
110 , 110

 

 

 

 

 

 

 

 

 

Figure 3.10. Embedding of meshes in P 110.

44

3.3.2 3D Mesh

The embedding of 3D meshes is a further extension of the results obtained in the
previous section. Recall that the approach taken before is to divide the lc-FC into
two parts. To have an embedding of 3D mesh, we divide the k-FC into three parts:
left, middle and right, which are used for embedding over the 1:, y and 2 directions.
Figure 3.11 shows that there are four possible combinations when dividing an (I +

m + n)—bit 2-FC into an l-, m- and n-bit 2-FCs. Notice that the difference between

I bits m bits n bits

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

D] xloll [olx x 0'1] o]xg

Figure 3.11. Dividing an (1+ m + n)-bit 2—FC into an l-, m- and n-bit 2-FCs.

 

 

 

the left and middle part in combinations A and B is similar to what we saw in the
2D case (refer to Fig. 3.4), except that the rightmost bit in both middle parts are 0’s.
By analogy, it can be shown that nodes captured in combination A and B embed the
following two 3D meshes of sizes: (A) HE], x FE!” x F312, and (B) FP‘ x F3] x F312,
respectively. Similarly, the nodes captured in combinations C and D embed the
following two 3D meshes of sizes: (C) RE], x RE] x Fiﬁ], and (D) Em X Fifi, X F5311,
respectively.

The embeddings in Pf+m+n+3 is pictured in Figure 3.12. There are a total of 16

possible combinations. By analogy, the sizes of the meshes are:

t"

assesses:

r SD 2 i: r ?= e

' FIE]: X F314»: X Fiﬁ-’13,

E311 X Fﬁii X Fair-sis,
F311 X FE] X F33,
Iv'P‘ x FE], x F323,
EﬁxFﬂsXEﬂ.
FE. x FE] x FE.
F511 X Friil-l X F5112,
F,‘3‘ x FE] x FE]; ,
dﬁxfﬂsxﬂi.
ﬂﬂxﬁExﬂﬁa
Flt-:11 X Fiﬁ-1 X F31,
F13] >< F1?! x FE.
FIE-‘2 X FE] X F312,
F1311 X Figl-l X Fills,

Eli-11 X F512 X Full-2,

71

FP‘ x F51. >< F‘i‘ls.

ﬂ

and

respectively.

3.4 Hypercube

We saw that each generalized Fibonacci cube is a subgraph of a hypercube.

45

In

Theorem 3.5, we will show that a generalized Fibonacci cube also embeds a hypercube.

Theorem 3.5 For n 2 k 2 2, P: embeds Bd where d = n — k — m and m

[(71 - Wkl-

46

Proof: Recall that nodes in P: are addressed by (n— k)-bit k-FCs. Table 3.1 shows
some embeddings of hypercubes in Pf, for some 2 S k S 5 and 2 S n S 10. Notice
that a ‘-’ means that such Pf, is undeﬁned; a ‘A’ stands for the graph containing a
single node, which trivially embeds Bo; a ‘x’ means “don’t care.” Finally, the total
number of ‘x’ in an entry determines the dimension of the hypercube that can be
embedded in the P ,’§ Since no 1: consecutive 1’s is allowed in the lc-FCs, it can be
shown that every I: — 1 x’s should be followed by a 0.. Hence we have d = n — lc — m.

C]

Table 3.1. Embedding of Hypercubes.

 

 

 

 

 

 

 

 

ln\kll2 l3 l4 l5 l
2 A A -— -—
3 x A A -—
4 x0 x A A
5 x0x xx x A
6 x0x0 xx0 xx x
7 x0x0x xx0x xxx xx
8 x0x0x0 xx0xx xxx0 xxx
9 x0x0x0x xx0xx0 xxxOx xxxx
10 x0x0x0x0 xx0xx0x xxxOxx xxxxO

 

 

3.5 Summary of Results

We have considered the Hamiltonian problem, mesh and hypercube embeddings in
the generalized Fibonacci cube. For the Hamiltonian problem the following optimal

results have been obtained:

47

1. P: contains a Hamiltonian path. Hence a linear array of size Fl” can be em-

bedded in Pf, with unit dilation.

2. The longest cycle in Pf, (where n — 3 Z k 2 2) contains F,‘,"] nodes if F,‘,"] is
even, and F,‘,"] — 1 nodes if otherwise. Consequently, a ring of size F,["1 can be

embedded with dilation-l if F,‘,"] is even, with dilation-2 if otherwise.

Using the result of Hamiltonian path, we have shown that certain 2D and 3D meshes
can be embedded in the generalized Fibonacci cube with expansion-l and dilation-l.
Moreover, we have also shown how to embed a hypercube in the generalized Fibonacci

cube.

48

lblts mbits nbits

 

 

xI°I Ix *I°| |x

 

 

xl°l1l 1on ‘ onI I:

 

 

 

...H 11on FM 1:

 

 

 

X O[1]1| IOIx XIO] IX

 

 

 

4.] [x 14.44 |.[x

 

 

 

 

 

x1011 .. 40111 M:

 

 

 

 

xI°I1I I1I°Ix x|°I1I |°Ix

 

 

 

“11111 1.1. onH 1on

 

 

 

 

x 0] I1: x]o]1| [110 x

 

 

 

 

 

 

 

*|°|1| «>1: . olll {1|on

 

 

 

 

XLOlll [llolx XIOII 1'0 X

 

 

 

 

 

 

 

 

X Olllll O X XIOII] [IIOIX

 

 

 

 

 

xlol |* Mild Iolx

 

 

 

x|°lll Iolx 441111 1on

 

 

*I°|1I I1I°|x xI°|1I1I F’Ix

 

 

 

 

 

x o]1|1] [o x onIlIlI [0]):

I
J
|
J
I
I
I
I
I
I
I
I
|
I
|
|

 

re 3.12. Dividing an (I + m + n)-bit 3-FC into an l-, m- and n-bit 3-FCs.

CHAPTER 4

DATA COMMUNICATION IN
GENERALIZED” FIBONAC CI
CUBE

Efﬁcient data communications are critical to the performance of the parallel or dis-
tributed systems. Many previously known results have focused on systems that are
based on regular interconnection topologies such as the hypercubes and meshes [58]
[67] [92] [93] [100] [106]. In this chapter, we design and analyze algorithms for data
communications in the generalized Fibonacci cubes.

Algorithms for routing and single node broadcasting messages in hypercubes have
been extensively studied in the past: Deadlock-free and shortest-path routing algo-
rithm is ﬁrst presented by Sullivan and Bashkow [106]. By employing an integer weight
[106] for bookkeeping, they have also presented an optimal single node broadcasting
algorithm. In [100], the communication latency was considered by Saad and Schultz
in designing algorithms for single node broadcasting and other data communication
primitives. By splitting packets and using multiple paths, several optimal and near
optimal broadcasting algorithms have been reported by Ho and Johnsson [58, 67].

Unfortunately, none of these routing and single node broadcasting algorithms applies

49

50

to the generalized Fibonacci cube, even though they are subgraphs of the hypercubes.

Message routing under the faulty hypercube has also drawn attention from re-
searchers. Chen and Shin [23, 24] have proposed a depth-first search scheme for
routing in an injured hypercube; Gordon and Stout [51] have studied random and
sidetracking routing for the hypercube in the presence of faults. However, these ap-
proaches do not guarantee that the paths produced are the shortest possible and
deadlock—free.

Routing and single node broadcasting algorithms for the Incomplete Hypercube
are studied by Katseff [68]. We showed in [81] that the algorithms designed for the
Incomplete Hypercubes can be applied to the (second-order) Fibonacci cubes as well,
which is a little surprising because the latter appear to be less regular than the former.
The extension of the single node broadcasting algorithm to the generalized Fibonacci
cubes has been reported in [80]. In this chapter we further extend our work and

report results on the following communication primitives:
1. Single node broadcasting,
2. Single node accumulation,
3. Single node gather,
4. Single node scatter,
5. Multinode broadcasting, and

6. Multinode accumulation.

Communication in the system is achieved by message passing on links. A message
originated from one processor (node) and destined for another may be routed through
some intermediate processors. A path is represented by an ordered list of nodes
(processors) in which every two consecutive nodes are connected by a link (edge).
The length of a path is the number of links in the path. A link has a link number i

if and only if it connects two nodes whose addresses differ in exactly bit i.

51

The relative address of two nodes is the bitwise Exclusive-OR, denoted by GB,
of their addresses (which are binary codes for nodes in the hypercube or Fibonacci
codes for nodes in the generalized Fibonacci cube). For convenience, we use 8,1, to
denote an n-bit binary string with bit j being 1 and [all other bits being 0, i.e.,
8,1; = e..-1e,,-2 - - - eleo where e,- =1 and e.- = 0 Vi 75 j.

The organization of this chapter is as follows. We present distributed routing algo-
rithm in Section 4.1, and show that the routing algorithm carries two most desirable
properties, i.e., all the paths determined are the shortest possible and deadlock-free.
The distributed shortest—path, deadlock-free single node broadcasting algorithm is
presented in Section 4.2, which is based on a spanning tree (called STF) on the gen-
eralized Fibonacci cube. In Section 4.3, we analyze the single node broadcasting in
two models: We show that our proposed all-port broadcasting algorithm is optimal
in terms of minimizing time steps. By using the most-signiﬁcant-linlc-ﬁrst scheduling
discipline, our one-port broadcasting algorithm is shown to be optimal for many cases.
Results for the single node accumulation are reported in Section 4.4. Algorithm for
the single node scatter/ gather is presented and analyzed in Section 4.5. Section 4.6
reports the results for the multinode broadcasting and accumulation. Finally, we

summarize our results in Section 4.7.

4.1 Routing

The routing problem we studied in this section is to send a message from one node
to another node. Katseff [68] has devised a routing algorithm for the Incomplete
Hypercube. The key idea is to identify an existing link and forward the message via
. it until the message reaches the destination node. We discover that a similar approach
can be applied to the generalized Fibonacci cubes.

The routing algorithm is a procedure that is executed by the source (originating)

52

node and by every intermediate node on the path to the destination. It is initiated by
the source node which computes its relative address with respect to the destination
node, then it sends the message to the next node on the path. This procedure is
repeated until the message reaches its destination. Let (reladd, msg) denote the
information to be sent on links, where reladd is the relative address between the
destination node and an intermediate node while msg denotes the message to be
sent. Notice that reladd is updated on each intermediate node when the message is
being routed toward the destination. Formally, the routing algorithm is described as

follows.

 

Algorithm 1 To send a message from node source to node destination:

if node_id 8 source then
reladd «— source 69 destination

else /* this is an intermediate node */
receive (reladd, msg).

if reladd 75 0 then

begin /* message has reached an intermediate node */
Let i be the bit number of the most significant 1 bit
in the reladd, where link i from this node exists.
Send (reladdEB 8,1,4, msg ) on link i.

end

 

 

 

4.1.1 Shortest path

In [68], Katseff proved that the length of a path determined by this algorithm, when
applied to the Incomplete Hypercube, is equal to the Hamming distance between
any source and destination nodes. We next show that Algorithm 1 can be applied

to the generalized Fibonacci cubes; furthermore, we will show that all routing paths

53

determined are the shortest possible. It is necessary to introduce a few notations
at this point. Let B = (bn_1b,,_g ---b1bo) be an n-bit string, then B[i:j] denotes
the substring (b,b,-..1 - - - bj) of B, where i and j are two integers and n — 1 Z i 2
j Z 0. For example, suppose that B = (b4b3b2b1bo), then B[4z2] = (b4b3b2). The
minimum (respectively, maximum) of two integers i and j is denoted by min(i,j)
(respectively, max( i, 3)) Lemma 4.1 characterizes the interconnection rules for nodes

in the generalized Fibonacci cubes.

Lemma 4.1 Interconnection rules : Let p denote an arbitrary node in P:. Let
FC(p) = p.._k-1p,,_;._2 . . .plpo be the lc-FC of node p, and Q denote FC‘k](p) 69831-1:
Then, link j from node p exists if and only if one of the following two conditions is

satisﬁed:
R1 p, = l, or
R2 p, = 0 and no 1: consecutive 1 ’3 appear in Q[min(n-k-1,j+k-1):max(j—k+1,0)].

Proof: The ﬂipping of bit j in F C (p) from 1 to 0 always results in a valid lc-F C,
because this never creates k consecutive 1’s. Hence we have the rule R1. Similarly,
to obtain a new k-FC by ﬂipping bit j in F C (p) from 0 to 1 is possible if and only if
the new code does not contain k or more consecutive 1’s. Now, the ﬂipping of bit j

in F C (p) from 0 to 1 can possibly create 1:: or more consecutive 1’s only in Q[min(n-

k-1,j+k-I):max(j-k+1,0)]. This is captured by rule R2. C]

We now present an important property of Algorithm 1.

Lemma 4.2 Lets and d be two distinct nodes in P "

n)

and let H(s,d) be the Hamming
distance between these two nodes. Algorithm 1 always ﬁnds a path of length H(s,d)

from s to d.

54

Proof: First we prove that Algorithm 1 always succeeds in ﬁnding a
link at each intermediate node. Let F C(s) = sn_k-ls,,_k_2 - . . also and
F C(d) = "4-14.4-2 ---d1do be the k-FCs of nodes 3 and d respectively. If
H (F C(s),F C(d)) = 1, then by Deﬁnition 2.3 s and d are connected. Without loss
of generality, assume that H(FG(s),FG(d)) Z 2. Let R = rn_k_1r,,_k-2 - --r1ro be
the relative address of nodes 3 and d. We say that R speciﬁes bit i if r,- = 1. Assume
that j is the most signiﬁcant and l is the second most-signiﬁcant bit speciﬁed by R.
We show that either link j or link I exist(s) from node 3. By repeating the same
procedure, we conclude that Algorithm 1 always succeeds in ﬁnding a link at each
intermediate node. To see the existence of link j or I, we distinguish between the
following two cases.
Case 1: le—lc+1.
We further consider the following two subcases: (a) s,- = 1 (d,- = 0). By rule R1 of
Lemma 4.1 linkj exists from s. (b) s,- = 0 (d,- = 1). If s; = 1 (d; = 0), then by rule
R1, link 1 exists from 3. On the other hand, when 31 = 0 (d; = 1) it should be clear
that there is no lc consecutive 1’s in FC(d)[n — k — 1 : max(j — k + 1,0)]; hence by
rule R2, link j exists from 3.
Case 2: l<j-lc+1.
If s,- = l (d,- = 0), then by rule R1, link j exists from 3. On the other hand, if
s, = 0 (d,- = 1), then link j also exists from 3. To see this, Observe that there is no k
consecutive 1’s in FC(d)[n - k - 1 : max(j — k + 1,0)], additionally, j is the only bit
where FC(s)[n — k —l :max(j — k +1,0)] and FC(d)[n - k —1:max(j— k +1,0)]
differ. Hence by rule R2, link j exists from s.

It is clear that, every time the message is forwarded from any intermediate node,
the distance to the destination is also reduced by 1. Hence the length of each routing
path determined by Algorithm 1 is exactly the Hamming distance between the source

and destination nodes, i.e., H(s,d). D

55

Example 4.1 Consider the routing of a message from the source node 01010 to the
destination node 10101 in P3. The Hamming distance between these two nodes is 5.
Figure 4.1 shows the (directed) routing path between the source and the destination
nodes given by Algorithm 1, which has a length of 5. The dotted edges represent the

remaining links in P?,.

 

Figure 4.1. Routing path from 01010 to 10101 in P3.

It is known that given two nodes 3 and d in a hypercube, it is impossible to have a
path between 3 and t with a length less than H (s, d). Since the generalized Fibonacci
cube is a subgraph of the hypercube, the same bound applies. Therefore we have

proved our main result in this section.

Theorem 4.1 Algorithm 1 can be applied to any generalized Fibonacci cube to route
messages between two arbitrary nodes; moreover, the path determined is alway the

shortest possible.
Corollary 4.1 The graph Pf, is connected and its diameter is n — 1:.

Proof: Given any two nodes in P:, Algorithm 1 always succeeds in ﬁnding a path

between them; hence the graph is connected. To see the diameter, we note that the

56

Hamming distance between the two nodes (1010 - . ) and (0101 - - ) is exactly 11 — k,

which is clearly the maximum between any two nodes. 0

4.1.2 Deadlock

An important consideration of any message routing algorithm is to avoid deadlock
when multiple sources and destinations are involved in routing concurrently. We
now show that Algorithm 1 is deadlock-free for the generalized Fibonacci cubes. We
assume that all links are bidirectional and there is a buffer of ﬁxed size at either end
of each link. Lemma 4.3 is taken from [68], which says that if there is a deadlock,

there must be a cycle among the deadlocked nodes.

Lemma 4.3 If a network with a ﬁnite number of nodes is deadlocked, then there
exists a cycle of nodes ao,a1,- - - ,am_1 (where a node may appear more than once)
in which each node a,- is blocked sending a message that has arrived from the link

(a(,-_1)modm,a,-) and is to be sent on the link (a;,a(;+1)mod m).

Theorem 4.2 shows that the condition described in Lemma 4.3 cannot occur when

applying Algorithm 1 to the generalized Fibonacci cubes.

Theorem 4.2 Algorithm 1 is deadlock-free when it is applied to the generalized Fi-

bonacci cubes.

Proof: Observe that each link number must appear for an even number of times
in the cycle. To see this, each node in the cycle differs from the previous one in
exactly 1 bit, and a bit must be ﬂipped an even number of times so as to return to
its original value. Assume that there is a deadlock in P:. Then there is a cycle of
nodes a0, a1, - ~ - , a,,,_1 as described in Lemma 4.3. Let In,- be the link number through

which a,- is intended to send a message to a(,-+1) modm. Let 2 = max{ln,-|0 S i < m}.

In
an

10

Span
Cubes
‘33 001.

for 3

57

Then there exists a node a,- at which In,- = z and In,- represents a l to 0 transition in
Fibonacci codes.

We now consider the message that is blocked at transmission from node a ,-. Clearly,
by our assumption this message came from a(j-1)mod m and is destined for a(,~+1)mod m
or beyond. It should be easy to see that F C(a(,-_1)modm) and FC(a(,-+1) modm) differ
in two bits: 2 and z’, where z 2 z’ and link 2’ connects a(,-_1)mod,,, to a,- and link 2
connects a,- to a(,+1)modm. We further distinguish [between the following two cases.
Case 1: z’ < 2.
In this case, node (a(,-_1),,,odm — F z‘ﬂk) is in Pi, because its bit z is 0 while nodes a,
and a(j-1) mod... have bit 2 being 1. By rule R1 of Lemma 4.1, there is a link between
a(,-_.1) mod", and (“(i-1)mod m - Fz‘ﬂk). Therefore, by Algorithm 1, aj_.1 would send the
message along link 2 (instead of link 2’), which is a contradiction.
Case 2: z’ = 2.
In this case, there are exactly two nodes in the cycle, i.e., a(,--1) modm(= “(141) mod m)

and a5. Therefore 2’ must represent a transition from 0 to 1, since 2 is deﬁned as a 1

to 0 transition. This is also a contradiction, because all links are bidirectional. D

4.2 Single Node Broadcasting

In this section, we study the problem of sending a message from one node to all
the other nodes, i.e., the single node [9] (or one-to-all [67]) broadcast. The familiar
spanning binomial trees(SBTs) [67] have been used for broadcasting in the Boolean
cubes [67, 100, 106]. Figure 4.2 displays an example of a (directed) SBT in B4 rooted
at node 0000, where links are labeled by their link numbers. Broadcasting algorithms

for 3,, based on the SBT can be schematically described as follows (see e.g., [100]).

1. The source node sends the broadcast message along its link n — 1.

58

2. For i = 2,3,~--n do:
Each node (including the source node) that has received the broadcast message

in a previous step sends the message via its link n — i.

 

3 2 0
cm ®1®®
21 0 10 o
1;... ®
M1011 @D

Figure 4.2. An SBT rooted at node 0000.

The above description also demonstrates that a message broadcast from a node
to all the other nodes in B" can be ﬁnished in n steps based on properties of the
SBT. In the following, we view the SBTs as directed trees, where the direction of
a link is deﬁned as the direction in which the message ﬂows. Given a node p, the
link from which p receives its message(s) is called an in-edge of p, similarly, the link
through which p sends its message(s) is called an out-edge of p. It is clear that the
root of an SBT has out-edge(s) only, leaf nodes have in-edges, whereas all the other
nodes have one in-edge and at least one out-edge. We note that different SBTs may
be obtained by using different construction rules. The SBTs we will be referring to
in the remainder of this presentation are obtained according to the following rules:
(1) The root node selects all of its incident links as out-edges, and (2) Let i be a
non-root node and l,- the link number of its in-edge. Node i selects all of its incident

link(s) whose link number is less than I,- as out-edge(s) in constructing the SBT (see

fr

ca

E)

eIIi
the

init

‘11 rc

Sendi

59

Fig. 4.2).

In [106], an integer weight is used to implement the SBT-based broadcasting in
Boolean cubes. The key idea is to sent a weight along with the message to indicate
which link(s) each receiving node should use to forward the message. Katseff [68]
modiﬁed this idea for broadcasting messages in Incomplete Hypercubes. His algorithm
uses an array (called travel) of size [logN] to keep track of all link information
during the broadcast process; it records links that have not been traversed due to
their absence in the previous node(s) so that they can be compensated for in a later
stage once these links exist.

We show that the same strategy can be applied to perform broadcasting on the
generalized Fibonacci cubes. Speciﬁcally, let travel[i] denote the 1"" element of the
array. The source node originates the broadcast by setting all elements in the travel
array to TRUE, and it sends the message plus an updated travel array via each link
existing from the source node. Suppose that a message received at a node t with
travel[i] set to TRUE, then the message will be sent along link i provided it exists
from node t. A

Let (travel, msg) denote the information to be forwarded. The detailed broad-

casting algorithm is shown as follows.

Example 4.2 Consider a broadcast from node 01010 in P t Figure 4.3 shows the
links determined by Algorithm 2 in the broadcast. Each node (represented by an
ellipse) is labeled by its 2-F C. The links shown are those determined by Algorithm 2;
the travel array received at each node is shown above it. Note that the source node

initializes the array to all 1’s (TRUE).

If we further review Figures 4.1 and 4.3, it is not difﬁcult to see that the path used
in routing messages from node 01010 to node 10101 is exactly the same one used in

sending a broadcasting from node 01010 to node 10101. This is no coincidence. We

 

D1
rt

p11

D01
a b
fact
from

with

The(
by .41.

Aha”

60

 

Algorithm 2 Single node broadcasting in Pi:

if node.id - source then
set all bits in travel to TRUE
else It This is not the source node. */
receive (travel, msg ).
if none of the links specified by the travel array exists
0R all elements in the travel array are FALSE,
then stop.
for each link I that exists from this node:
begin
if travel[l] is TRUE then
send (newtravel, msg) on link l with newtravel computed
as follow:
for iE{j|1San—k}:
newtravel[i] (— TRUE iff
(travel[i] is TRUE) AND
(i <1 OR link i from this node does not exist)
end

 

 

 

now show that the broadcasting algorithm (Algorithm 2) routes broadcast messages

through exactly the same path given by the routing algorithm (Algorithm 1). More

precisely, let p be the path determined by Algorithm 1 to route a message from a

node 3 to another node t, and let p’ be the path determined by Algorithm 2 to route

a broadcast message originated from s to t. Then we show that p = p’. Given this

fact, it should be easy to see that Algorithm 2 routes a broadcast message originated

from any node to every other node exactly once and it does so along a shortest path

with length no greater than n - k.

Theorem 4.3 Lets and t be two distinct nodes in Pi. Let p be the path determined

by Algorithm 1 to route a message from s to t. If a message is broadcast from s by

Algorithm 2, then it will reach t along a unique path that coincides with p.

 

 

‘11

P1

DO

eIe;

and

51a.

pig]

61

an!»
101 , 0101
00 00101 10100
am an»
0010 °°°° ooooo
00100 ®
0000 ooooo ooooo

00000

Figure 4.3. Broadcast from node 01010 in Pg.

Proof: Let p’ be a path determined by Algorithm 2 to route the broadcasting mes-
sage from s to t. We prove the equivalence of the two paths p and p’ in two steps: We
ﬁrst show that (Claim 1) the broadcasting algorithm routes the message via the path
determined by the routing algorithm. Then we prove that (Claim 2) the broadcast

messages are not routed along any other path. [3

Claim 1: When applying Algorithm 2, the path p is used to route the broadcasting
messages from s to t.
Proof: Let d be the Hamming distance between 3 and t. Deﬁne p.- to be the if" node
on path p. In other words, p = (po,p1, - - - ,pd-1,pd) where p0 = s and p4 = t. We prove
by induction on i that the message broadcast from 3 based on Algorithm 2 reaches
p,- via the path (po,p1, . - - ,p,-_1), where 0 S i S d; in addition, travel[j] received at
node p;-1 is also set to TRUE for every bit j in which F C (p;_1) and F C' (t) differ.
For i = 0, it should be clear that p0 = s and Algorithm 2 starts out by setting all
elements in the travel array to TRUE.
Assume that the hypothesis is true for i — 1, where i is any arbitrary integer
and 0 < i S d. We will prove that the lemma is true for i. Let 2 be the most
signiﬁcant bit speciﬁed by the relative address of p;-1 and t such that link 2 from

p;_1 exists. (From the proof of Lemma 4.2, we deduce the existence of 2.) By the

62

deﬁnition of path p, Algorithm 1 routes the message from p;..1 to p.- along link 2. By
induction hypothesis, travel [2] received at p.-_1 is TRUE. Hence Algorithm 2 also sends
the broadcast message from p;-1 to p,- via link 2. It remains to show that travel[j]
received at p,- is set to TRUE for every bit j in which F GI"1(p.-) and FGI"1(t) differ.

The following three observations complete our proof:
1. FC(p;_1) Q FC(t) and FC(pg) 69 FC(t) differ in exactly bit 2.

2. For 0 S j < z, the travel[j] received at p,- is the same as that received at pg-“
since Algorithm 2 never changes these bits (which, by induction hypothesis, are

set appropriately when they reach p,-_1).

3. For 2 < j < n — k — 1, the travel[j] received at p,--1 is TRUE if and only if link
j does not exist from any node in the path (p0, p1, - - ~ p54). Obviously, link j
does not exist from p,-_1 either. Otherwise, Algorithm 1 would select link j to

route the message (instead of link 2). Therefore, for z < j < n — k — 1, the

travel[j] received at p,- is also set to TRUE, wherever F C (pg) and F C (t) differ.

D

Claim 2: When applying Algorithm 2, p is the only path to route the broadcasting
messages from s to t.

Proof: Let p’ denote a second path determined by Algorithm 2 to route a message
from s to t such that p’ 75 p. Deﬁne p,- (respectively, pf) to be the i“ node of path
p (respectively, p’). Let j be the smallest integer where p,- 75 p3, i.e., p,- = p: for
0 S i < j and pJ- 75 pg. Let 2 be the bit in which FC(p,-_1) and FC(pj) differ, and
let z’ the bit in which FC(pi-_1) (=FG(p,-_1)) and FC’) differ. That is, FC(p,--1)
differs from F C (t) in both bits 2 and 2’. By assumption p,- 5£ pg, it should be clear
that z 79 z’ and both links 2 and z’ from pj_1 exist. Consider the following two cases:

Case 1: z > z’.

(11

re-
I10
of
the

Sbc

63

According to Algorithm 2, travel[z] received at p;- would be set to FALSE. Therefore
the broadcast message would never reach node t, which is a contradiction to Claim 1.
Case 2: z < 2’.
Algorithm 1 would send the message along link 2’ (instead of link 2 ), which is also
a contradiction.

Hence we conclude that p = p’. D

An important result follows from Theorem 4.3.

Corollary 4.2 The nodes and links traversed by executing Algorithm 2 for broadcast-

ing from a given node in Pi form a spanning tree.

Proof: By Corollary 4.1, Pi is connected with diameter n—k. Furthermore, initially
Algorithm 2 sets all bits in the travel array to TRUE. It can be shown that Algorithm 2
will traverse all nodes in Pi. It remains to show that all edges traversed by Algorithm 2
do not form any cycle.

Let s be the source node to originate the broadcast. By Lemma 4.2 and Theo-
rem 4.3, the broadcast message will reach a node, say pj, in exactly H ( F C (s), F C ( pj))
hops. Hence the only possible situation for any cycle involving p,- is to have two copies
of the broadcast message arrive at p,- from two distinct nodes, say p,-1 and p34 , such
that both nodes have the same distance H (F C (s), F C (p,)) — 1 from the source 3. It

should be easy to see that this is a contraction to Theorem 4.3. D

The spanning tree generated by executing Algorithm 2 will be referred to as the
spanning tree for the generalized Fibonacci cube (STF). We note that two STFs with
different roots are generally not isomorphic. In contrast, all SBTs of a Boolean
cube are isomorphic. In the next section, we will closely examine these two kinds of

spanning trees and show how to transform an SBT to an STF in order to analyze the

of

0P

5313

Th

Can

64

time required by the single node broadcasting algorithm.

4.3 Analysis of Single Node Broadcasting

In this section, we will analyze the time complexity of Algorithm 2. Two communica-
tion models are considered as in [9, 67]. In the one-port [67] (or single link availability
[9]) model, a processor can only send and receive a message on one of its communica-
tion ports at any given time. On the other hand, in the all-port [67] (or multiple link
availability [9]) model a processor can communicate (send and receive) on all of its
ports concurrently. The one-port model is more suitable for networks with a relative
high degree of interconnection from the implementation point of view.

For convenience, we will follow the model used in [9]. In Other words, we will use
the number of time steps to measure the time requirement; all messages are assumed
to be of unit length and it takes one time step to send (forward) a message from a

node to its adjacent node(s).

4.3.1 All-port Model

We ﬁrst consider the all-port communication model in the Boolean cube. Notice that
the farthest node from any given node in B" is at a distance of n (the diameter of
B”). Hence n is a lower bound of any broadcasting algorithm for B... Since the height
of any SBT in B,, is exactly 11, broadcasting algorithms based upon the SBTs are
optimal. Similarly, by Theorems 4.1 and 4.3, Algorithm 2 requires only minimal time

steps to complete any single node broadcasting in the generalized Fibonacci cubes.

Theorem 4.4 Let d, be the maximal distance between an (arbitrary) node 3 and all
the other nodes in Pi. Then, under the all-port communication model Algorithm 2

can optimally complete a broadcast from s in exactly d, time steps.

1b
to}
SP1
deI

her.
be 5.
the

65

Notice that for 1‘: we have [ﬁlial-j S d, S n - lc, and the lower bound [Ml-55:11]

is achieved when broadcasting from node 0.

4.3.2 One-port Model

In the one-port model, only one port is available for transmitting message at any
moment. Hence the scheduling scheme applied to nodes which need to send/ forward
a broadcast message to multiple nodes will make a critical difference in the overall
broadcasting time. With the Boolean cubes, it is known [67] that SBTs combined
with the “Tallest Remaining Subtree First” (TRSF) scheduling discipline results in
an optimal one-port broadcasting algorithm. We now present a scheduling discipline
called Most-Signiﬁcant-Link-First (MSLF) for the generalized Fibonacci cubes. Re-
call that under Algorithm 2 a node p should forward the message through every link
j provided that the corresponding travel [7] received at p is TRUE. With the MSLF
rule, each node ﬁrst sends the message via the speciﬁed most-signiﬁcant link, then via
the speciﬁed second most-signiﬁcant link, and so on. In other words, a node should
follow the left-to-right scanning order in sending messages to multiple links that are
speciﬁed in the received travel array. Our one-port broadcasting algorithm is thus
deﬁned as Algorithm 2 combined with the MSLF scheduling.

Notice that with the Boolean cubes the MSLF reduces to the TRSF scheme,
because there are no missing links. Let I.- be the link number of a link in an SBT; it can
be shown that n—l; is exactly the time step at which the MSLF scheme used to forward
the broadcast messages when the one-port broadcasting algorithm is applied (cf.
Fig 4.2). Consequently, the time required by our one-port broadcasting algorithm is
exactly n for B... We next show that our one—port single node broadcasting algorithm
requires no greater than n — lc time steps for any single node broadcasting in Fﬁ. It
is necessary to examine closely the relation between the STF and the SBT at this

point.

l:

l":
cor
and

link
This

1mm!

66

To simplify our presentation, in the remainder of this section a node in the Boolean
cube or the generalized Fibonacci cube will be referred to by its binary address (which
is either a binary code or a k-FC). For convenience, we introduce a few new notations.
Let SBT..(s) denote the SBT rooted at a node 3 in B". Similarly, let STF:(s) denote
the STF rooted at a node 3 in Ft.

A path between node 3 and node t can be also represented as an ordered list of
link numbers which comprises a sequence of link numbers leading from s to t. In this
case, we will refer to link sequence to emphasize that the path is in terms of link
numbers instead of node labels. The relation between a path p and its corresponding
link sequence q is as follows. Let p = (uo,u1, - --u,-_1,u;) denote a path from node
no to node u,- and q = (l1,l2, - - - l,--1, l,-) the link sequence corresponding to the path.
Then u,- = no 69 ll EB lg - - - 613 l,- for 1 S j S i. As an example, in Fig. 4.2 the path
connecting nodes 0000 and 1111 in the SBT4(0000) is (0000, 1000, 1100, 1110, 1111)
and the corresponding link sequence is (3,2,1,0). Let (l1,l2, - - - ,l,-) be an arbitrary
link sequence in an SBT. It is important to note that l1 > I; > - - - > I.- is always true.
This is because, as deﬁned in Section 4.2, for any internal node of an SBT the link
number(s) of the out-edge(s) is less than that of the in—edge.

We now proceed to analyze the time required by our one-port broadcasting algo-
rithm when it is applied to the generalized Fibonacci cubes. We write V: to represent

the set of k-FCs for all nodes in Pi. Lemma 4.4 establishes a relation between an

SBT and an STF, which will enable us to transform an SBT to an STF.

Lemma 4.4 Let uo be a node in V5. Let (u,,u;+1,u,-+2) be the path in SBTn-k(uo)
connecting node u,- to node 11,-4.2, and (l,-+1, lg“) the corresponding link sequence. As-

sume that 11,-4.2 is in V: and a,“ is not in V5. Then
1. Node u.- must be in V5,

2. Within 8..-,“ link l,-+2 from u,- leads to a node r that must be in V5, and

by

(3;.

C0D

in];

ll) {b

node

67

3. Within 8..-," link lg.“ from r leads to 11,-4.2.

Proof: First we prove that node u,- is in V: by contradiction. Assume that node
u,- is not in Vlf, we distinguish between the following two cases. (Notice that u,- and
u,-+1 differ in exactly bit l,-+1.)
Case 1: There are k 1’s in both u;[n - k —1 :l.-+1 +1] and ug+1[n — k -1 :l,-+1 +1].
This implies that there are k 1’s in ug+2[n — k : I,“ + 1], because u,“ inherits these
bits from 11,-4.1. Therefore node u,“ would not be in Vlf, which contradicts to our
assumption.
Case 2: There are k 1’s in both u;[l,-+1 — 1 : 0] and ug+1[l,-+1 - 1 : 0].
This implies that there are k 1’s in uo[l,-+1 — 1 : 0], because both u.- and u,“ inherit
these bits from no. Therefore node uo would not be in V5, which is also a contradiction.
We now prove that r is in V,'f. Let (b..-k-1, b,,_k_2, - - - , b1“, , bl.+,—1,' - - , bu", ~ - - , b0)
be the binary code of 11,-. We claim that b;.. H = 0 and bl“, = 1. Consequently, node r
must be in Vlf, because r is obtained from u,- (which has been proven to be in Vlf) by
ﬂipping bit I,“ from 1 to 0. To verify the previous claim, we examine all the three
other possibilities:

(1)51”, = 0 and b1

n+2 = 0. Node 11,-4.2 would not be in V,'f, since Ug+2 is obtained from
a,“ (which is not in V: ) by ﬂipping bit I,” from 0 to 1. This is a contradiction.
(2) b1,+1 = 1 and by,“ = 0. Node u,“ would be in Vlf, since 21,-4.1 is obtained from u,-
by ﬂipping bit lg.” from 1 to 0. This is a contradiction.

(3) bl,“ = l and b;

m = 1. Similar to (2), node u,“ would be in V:. This is also a
contradiction.
Finally, the edge (r, 21,-4.2) is in 8..-], (more precisely, I": ), because both nodes are

in V: and they differ in exactly bit lg.” as we just observed. Cl

With Lemma 4.4, we can devise a scheme to transform an SBT to an STF. Let uo be a

node in Vlf. A link (p, q) in SBTn_;.(uo) can be classiﬁed into the following four groups:

(13

an.
bro
lust
are 3

link

We a

68

(1) both p and q are in V:, (2) p is in V:, whereas q is not, (3) p is not in Vlf, whereas
q is, or (4) neither p nor q is in V5. The scheme is as follows. We retain all group 1
links, remove all group 2 links together with the corresponding q’ 3, remove all group 3
links together with the corresponding p’ s, and remover links in group 4 together with
the corresponding p’ s, q’s. Each node q in group 3 needs special attention, since the
corresponding p had been removed. Fortunately, Lemma 4.4 guarantees that there
exists a node in V:, which has a link to q and, hence, can become the new parent of

q. Algorithm 3 describes a method to transform an SBT to an STF.

 

Algorithm 3 To transform an SBT to an STF:
/* Assume that uo is an arbitrary node in V: */

Step 1 Remove every node (along with all incident edges) that is
not in Vlf.

Step 2 For each node u in V:, whose parent has been removed in Step
1, find its new parent by applying Lemma 4.4 and add a new
link between them.

 

 

 

Example 4.3 Figure 4.4 illustrates a construction of STF83(01100) from
SBT5(01100). For comparison, nodes not in V: are signiﬁed by dotted ellipses in
STF83(01100). There are two labels for each link in the STF: one is its link number
and the other (parenthesized) signiﬁes the routing step determined by the one-port
broadcasting algorithm to broadcast a message from the root. To form the STF, for
instance, node 11100 is deleted from the SBT in Step 1, and nodes 10100 and 11000
are respectively connected to nodes 00100 and 01000. Notice that (4,3) and (3,4) are
link sequences from node 01100 to node 10100 in the SBT and the STF respectively.

We now prove the correctness of the above algorithm.

L.

C]?

69

an»
3 e
w «my: ’ «m‘ m «an
a ' a. o a o o
. O O 1 . o o
x o o 0 °
@cuzm‘msrmm «my
«my.

 
  
  

(33333

  
 

 

(: 1 (n

6321361339 «m «m

Figure 4.4. Construction of (b) STF§(01100) from (a) SBT5(01100).

Lemma 4.5 Given an SBTn-k(uo) as its input, Algorithm 3 correctly produces

STF:(uo), provided that no is in V5.

Proof: It should be clear that the graph generated by Algorithm 3 is still a tree.
It remains to show that this tree is STFﬂuo). Consider three nodes 11,-, u,“ 11,-4.2 and
the corresponding link sequence (l.-+1, l.-+2) as speciﬁed in Lemma 4.4. It can be shown
that there exists a path from u; to 11.42 in STFﬂuo) and link l,-+1 is in that path. We
now prove by contradiction that (1.4.2, 1.4.1) is exactly the link sequence in STF/{(110),
which connects u,- to 11.42. Assume that (”+21 l.-+1) is the link sequence in STF:(uo)
connecting u; to v.42, where 1;“ gé lg+2. There are two possible cases: (1) l§+2 > 1.4.2.

We claim that l,“ > ls”. Because if the claim were false, then the link sequence

 

t3

L6

610

STF
and z
for (b
have 1'
U are 0
Case 1
Withom

70

(12,”, 154.1) from u.- would lead to a node other than man. Consequently, by deﬁnition
of SBT, (l;+1,l£+2) is a link sequence from u; in SBTn-k(uo), which also leads to a
node other than ug+g. This is a contradiction. (2) l2” < 1.4.2. In this case, it is clear
that (L41, If“) is a link sequence from u.- in SBTn-k(uo), and this link sequence would

lead to a node other than u,-+g. This is also a contradiction. 0

Observe that links in STFs can be classiﬁed into two sets: set A includes the links
added in Step 2 of Algorithm 3, and set B the remaining links (i.e., the links
that also exist in the corresponding SBT). For example, in STF83(01100) the links
(00100,10100) and (01000,11000) are in set A, and all other links are in set B (refer
to Fig 4.4). Links in set A are resulted from the “missing” nodes which have k 1’s
in their binary addresses. Each link I in set A connects two special nodes: the one
which deﬁnes l as an out-edge is called type—a node and the other (which deﬁnes l
as an in-edge) type-ﬂ node. We next show that each type-a node has exactly one

type-ﬂ node as its child.

Lemma 4.6 Let s be a node in F: and u a type-a node in STF/f(s). Then u has

exactly one type-,6 node as its child in STF:(s).

Proof: Let p be the parent of u and m the link number of the link (p,u) in the
STF. We prove the lemma by contradiction. Assume that u has two type-,6 nodes v
and w as its children in the STF, additionally, i and l (where i > I) are link numbers
for the links (u, v) and (u, w), respectively. Since both links i and l are in set A, we
have i > I > m. From the proof of Lemma 4.4, we deduce that all bits i,l and m of
u are 0’s. We further distinguish among the following three cases:

Case 1: i _>_ l+ 2 and there existsj such that i > j > I and bitj of u is 1.

Without loss of generality, assume that i = j+1, j = l +1 and l = m+1. Figure 4.5 (a)

shows a portion of the SFT, where nodes are speciﬁed by bits i, j,l and m. Since v

As
the

The

71

is in the STF, there is no k consecutive 1’s in v[n - k — 1 : j]. Similarly, there is no
k consecutive 1’s in p[l : 0]. Hence, there exists a node u’ in the STF, whose k-FC
is v[n - k — 1 : j] - p[l : 0]. Consequently, at node p Algorithm 2 would forward the
message to u’, and u’ would become the parent of v. This contradicts our assumption
that u is the parent of v.

Case 2: i Z l+2 and all bits in u[i—1,l+l] are 0’s.

Again, we may assume that i = j+1,j = l+1 and l = m+1. Figure 4.5 (b) illustrates
the situation. Similarly, node 11’ (speciﬁed by 1001) is in the STF, and u’ would be
the parent of v in stead of u, which is also a contradiction to our assumption that u
is the parent of v.

Case 3: i = l + 1.

This is a degenerated situation of Case 2. C]

 

Figure 4.5. Type-a node and its type-ﬂ child.

As a consequence of Lemmas 4.5 and 4.6, the next theorem gives an upper bound on

the time required by our one-port single node broadcasting algorithm.

Theorem 4.5 The one-port broadcasting algorithm requires no greater than n-k time

the
dec
afie

72

steps for any single node broadcasting in Pi.

Proof: Let an be the source node to initiate the broadcast. We consider the
scheduling for links in set A as deﬁned before. Let u,-, u.-+1, u,~+2 and r be four nodes
as specified in Lemma 4.4. Thus links (11;, u,-+1) and (ugil, m“) are not in S TF:(uo);
moreover, link (r, u,-+g) is added in Step 2 of Algorithm 3. It can be shown inductively
that the forwarding of the broadcast message on link (r, 11,-4.2) can be scheduled at
a time no later than that of (u,+1,u,-+2) in the one-port broadcast in 3..-), by using
SBTn—k(uo)- To see this, by Lemma 4.6, (r, ug+2) is the only set A link added to node
r; furthermore, it is the ﬁrst link to be scheduled at r according to the MSLF scheme.
With the removal of (u,, 11.2“) the scheduling of (11,-, r) can be advanced by at least 1
step. Consequently, the scheduling of links in set B will not be delayed by the added

links in set A. El

Remark: Let (po,p1,~ - ,p;) be a path in STFﬂuo). Figure 4.6 shows that there
may be two type-a nodes in such a path, i.e., nodes 001011 and 101001. (In general,
there may be more than two such nodes in a path.) However, one should be able to
deduce that the the scheduling in the second type-a node, i.e., 101001, will not be

affected (delayed) by a type-a ancestor.

DOt

 

Figure 4.6. A portion of STF:(011011).

4.3.3 Optimality of One-port Algorithm

We now examine the optimality of our one-port broadcasting algorithm. Figure 4.7
illustrates an example of broadcasting in I‘; from source node 00000, where links are

labeled by the time steps determined by the one-port algorithm. It is seen that a

‘w w H010 4 w (mo

 

Figure 4.7. One-port broadcasting in I32, from node 00000.

tOtB-l of 5 time steps is needed to complete the broadcasting. However, by redeﬁning

node 00001 as the child of 10001 and properly rearranging the scheduling, Figure 4.8

74

shows that the broadcasting can be completed in 4 time steps, which can be shown

to be optimal.

Figure 4.8. An optimal one-port broadcasting in 1‘; from node 00000.

To study the optimality, we need to determine the lower bound of one-port broad-
casting. Under the one—port model, each node can send messages to exactly one other
node at any given time. Clearly, under the one-port model, the number of nodes that
receivethe broadcast message is at most doubled after each time step. Therefore, it
takes at least [log N] time steps to complete a broadcasting for any network consist-
ing of N nodes. Consequently, a lower bound for any one-port broadcasting in 1‘: is
Dos Flkll-

Recall that Theorem 4.5 shows that n -- k is an upper bound for our one-port
broadcasting in Fﬁ. Let s be a node in Pi, and 3 the node whose binary address is
the binary complement of F C' (s) The following theorem identiﬁes situations where

our one-port algorithm achieves optimality.

Theorem 4.6 For n > k 2 2, the one-port single node broadcasting algorithm is

optimal under the following two conditions:

1. broadcast from any node in I‘" where Fy‘] > 2"""1, or

n)

ti

75
2. broadcast from a node s in I‘: such that the node 3 is also in Pi.

Proof: Condition 1 is true-because the lower bound [log Fiﬁ] is equal to n — k.
Condition 2 follows from the observation that the minimum amount of time to send

a message from node 3 to '5 is H(s,§) = n — k. i D

For a ﬁxed k, there exists an integer no such that F,[,"] S 2"""‘1 for n > no; hence
condition 1 may not always hold. However, since the diameter of F: is n — k, the one-
port algorithm may already be optimal for single node broadcasting in the generalized
Fibonacci cube.

We now further show that other common communication primitives can be efﬁ-
ciently implemented in the generalized Fibonacci cubes. We follow the same assump-
tions that all messages are of unit length and it takes one time step to send (forward)

a message.

4.4 Single Node Accumulation

A dual Operation for single node broadcasting is the single node accumulation [9, 10]
(or reduction [67]) in which a particular node is to receive messages (data) from all the
other nodes. In this operation, the messages received at a node can be “combined”
with the local message before transmission. Here we assume that the “combined”
message still takes one time steps for transmission. This problem arises, for example,
when we want to compute the sum consisting of one term from each node, where
addition can be viewed as “combining” messages. The broadcasting and the accu-
mulation are no different in concept: any broadcasting algorithm can be transformed
into a dual algorithm for accumulation by simply reversing the data paths [9, 10, 100].
Therefore, the times required by the all- and one-port single node accumulation in the

generalized Fibonacci cubes are also captured by Theorems 4.4 and 4.5, respectively.

\———‘—’~\_ .
I

senc
in 5
In
ing) a
rithm
Algorit}

“meal

76

4.5 Single Node Gather and Scatter

Many algorithms require a particular node to collect one personalized message from
every other node; this data transfer operation is referred to as the single node gather
[9, 10, 100] (or all-to—one personalized communication [67]). Notice that in the gather-
ing operation messages may not be “combined” (cf. accumulation). A dual operation
for the single node gather is the single node scatter [9, 10, 100] (or one-to-all per-
sonalized communication [67]) in which a particular node sends different personalized
messages to all the other nodes. Again, gathering and scattering are dual problems:
any algorithm for gathering can be translated into a dual algorithm for scattering,
and vice versa [9, 10, 100]. Algorithm 4 is a simple elegant gathering algorithm for

the hypercube, which was proposed in [100].

 

Algorithm 4 Single Node Gathering in B":

fori=0ton—1do: .
All nodes addressed with *""‘110' send messages accumulated from
the previous steps to nodes *"""0‘+1.

 

 

 

It can be seen that Algorithm 4 consists of n steps: at the i“ step nodes sun-”110"
send 2‘ messages to nodes *““‘10‘+1. Figure 4.9 illustrates an example for gathering
in B4, where node 0 collects messages.

Indeed, the gather (scatter) algorithm is similar to the accumulation (broadcast-
ing) algorithm. Not surprisingly, the communication graph determined by Algo-
rithm 4 (refer to Fig. 4.9) is exactly the SBT. To design and analyze the gathering
algorithm for the generalized Fibonacci cubes, we further examine the STFs. For

convenience, we introduce some new terms regarding the SBTs. The children of a

node
link 1
fmm .
ﬁlPPin
HOde u
and r :
deﬁned
node u i

ﬂipping

77

GED 639 @®
@4596!» @6115 (ED

(momma em
«I!»
(II)
as can an» an em
worm «no 65’
«no
0:) (6)
can
(6)»

Figure 4.9. Single node gathering in B4.

node p in SBTn(s) is represented by Cn(p,s). Let u be a child of node p and j the
link number of the link (p,u). The i“ child of u is obtained by ﬂipping bit j —i
from u, where l S i S j; the 2"“ child of the root is deﬁned as the node obtained by
ﬂipping bit n —i from the root, where 1 S i S n. Node r is the 2"“ right sibling of
node v in SBTn(s) if both u and r are in Cn(p,s) for a node p such that u = p $ 8,3,
and r = p63 8,1,“ for 0 S j S n — 1 and 1 _<_ i S j; the 0‘“ right sibling of node u is
deﬁned to be u itself. In other words, node 1‘ is referred to as the i”l right sibling of
node u if they have the same parent p in SBTn(s) such that u and r are obtained by

ﬂipping respectively bits j and j —i from p. It is worth mentioning that for any node

to tj

Exam
and th.
in the 5
in 5875
mapped
of 53m
‘“ 5W

78

in a given SBT the number of its children equals to the number of its right siblings.
The root of an SBT deﬁnes level 0, the children of the root deﬁne level 1, and so on.

In what follows, we will ﬁrst show that S TF,',‘(s) is isomorphic to a subgraph _of
SBTn-k(3). With this important observation, we will be able to demonstrate that

Algorithm 4 can be adopted to the generalized Fibonacci cubes.

Lemma 4.7 Lets be a node in Ft. The tree STF:(3) is isomorphic to a subgraph
of SBT.._;.(s) for n 2 k 2 2.

Proof: We show how to map type-oz nodes in STF,’,°(s) to SBTn-k(s). Recall that
I‘: is a subgraph of BM,“ Let us compare SBTn-k(s) with STF:(s) from level 0
to n — k. Clearly, level 0 of both trees are equivalent. Let q be a node in level 1 of
S BTn-k(s), which is to be removed in Step 1 of Algorithm 3. Assume that node v is
the i‘“ child of q and v is in Pi. By Lemma 4.4, the 1"" right sibling of q, say u, is in
Pi, and u is the parent of v in STF:(s). By Lemma 4.6, v is the only type-,6 node
associated with u (a type-a node). Therefore, node u in STFﬂs) should be mapped
to the i - 1‘” right sibling of q in S BTn-k(s). Consequently, descendants of u should
be mapped accordingly. The nodes in level 2 through 11 — k can be argued similarly.

CI

Example 4.4 We redraw STF83(01100) in Figure 4.10 (a), where the type-ﬂ nodes
and their descendants are highlighted by dotted rectangles. Since node 11100 is not
in the STF, node 00100 (the ﬁrst right sibling of 11100) should be mapped to 11100
in SBT5(01100). Similarly, node 01000 (the second right sibling of 11100) should be
mapped to 00100 in SBT5(01100). Hence STF83(01100) is isomorphic to the subgraph
of S BT5(01100) shown in Figure 4.10 (b). Notice that all descendants of node 00100
in STF:(01100) are mapped to different nodes in SBT5(01100) except the subtree

To

Lerl
Pﬁnh
SlEps

°P€ra

PTOQ

Show 1

79

rooted at 10100. On the other hand, all descendants of node 01000 in STF33(01100)

are mapped to different nodes in SBT5(01100).

 

1

 

‘13} (EDGE?

(ED‘ED‘

 

 

 

 

 

 

 

(I)
can
a o
3
am am «an»
3 o
1 e
1 0 a. 0 o
l o o o
«m 4!!» am GED
e
«m
(b)

Figure 4.10. (a) STF:(01100) and (b) a subgraph of SBT5(01100).

To adopt Algorithm 4 to the generalized Fibonacci cubes, we need the following

lemma.

Lemma 4.8 Let s be a node in Pi. Assume that algorithm A can be applied to

perform the gather (respectively, scatter) operation based on SBTn-k(s) in X time

steps. Then algorithm A can be applied to perform the gather (respectively, scatter)

operation based on STF:(s) in no more than X time steps.

Proof: By Lemma 4.7, STFﬂs) is isomorphic to a subgraph of SBTn_k(s). To

show that algorithm A can be applied to the STF, we take the following approach:

80

0 Map the STF/f(s) to the subgraph of SBT.._;.(s) that is isomorphic to STFﬂs),

and

0 Based on the SBT, apply algorithm A to the subgraph and take “no-operation”

whenever the node(s)/link(s) that should be involVed is not in the subgraph.

Clearly, it takes no more than X time steps for algorithm A to complete the gather

(scatter) in the STF. 0

Therefore Algorithm 4 can be adopted for gathering in the generalized Fibonacci
cubes. We next analyze the time required by the algorithm. Again, we consider the

one- and all-port models separately.

4.5.1 One-port Model

Recall that the scheduling is a critical issue for minimizing time steps in the one-port
broadcasting. This is also true for the one-port gathering. Notice that schedul-
ing is implicitly deﬁned in Algorithm 4, since at each step an active node is send-
ing/receiving message(s) to/from exactly one other node. Indeed, the scheduling
in Algorithm 4 is the reverse of the MSLF (refer to Fig. 4.9) for the single node
broadcasting. Therefore Algorithm 4 is exactly a one-port gather algorithm for the

hypercube, and its complexity is as follows.

Lemma 4.9 Under the one-port model, Algorithm 4 takes 2" — 1 time steps for any

single node gather in B" for n 2 1, which is optimal.

Proof: The proof of time complexity is given in [100]. The optimality follows from
the observation that the root needs to collect 2" -— 1 data, which takes at least 2" - 1

time steps under the one-port model. Cl

We

The
in F.
4.5.2

Imprc

We ex

Mingle 1

81

By Lemmas 4.7, 4.8 and 4.9, one should be able to see that under the one-port model,

“‘l‘ — 1 time steps.

any single node gather in I“: can be achieved in no more than 2
On the other hand, since the node that initiates the gather operation needs to receive
1‘1“] - 1 messages, a lower bound for any one—port singlenode gather in Ff, is Fl,” - 1.

We now present a scheme for single node scatter, which attains this lower bound.

The scheduling rule is:

The source node sends the message to the farthest node ﬁrst (break ties
arbitrarily) along the links in the STF.

The above scheme will be referred to as the STF Farthest Node First (STF-FNF)
rule. To analyze the STF-FNF scheme, let N.- denote the number of nodes with
(Hamming) distance i from the root (i.e., the number of nodes in level i of the STF).
By the STF-FNF rule, the nodes in level 1 will be scheduled between FF] — N1 and
F5,” — 1, and the messages will reach them in one time step. Similarly, the nodes in
level 2 will be scheduled between Fl,” — N1 — N2 and Fl,” - N1 — l, and the messages
will reach them in two time steps. In general, it can be shown that the STF-FN F
requires Fl,” —1 time steps to complete any scatter in 1‘: as long as N.- _>_ 1. Therefore

we have proved the following theorem.

Theorem 4.7 Under the one-port model, a single node scatter (respectively, gather)

in Ff, can be achieved in exactly 171"] — 1 time steps for n > k 2 2, which is optimal.

4.5.2 All-port Model

Improvements can be obtained by taking advantage of the all-port communications.

We examine the case in the hypercube ﬁrst.

Lemma 4.10 Under the all-port model, Algorithm 4 takes 2"“1 time steps for any

single node gather in 8,, for n 2 1.

me

utili

82

Proof: Since all SBTs rooted at different nodes are isomorphic, we need only
examine the case for SBT..(0). Observe that SBT..(0) can be divided into two SBTs:
T; which consists of the node 10""1 together with all its descendants, and T o the
remainder of SBT..(0). We prove the lemma by induction on n. Clearly, Algorithm 4
takes 1 time step when n = 1. Assume that the lemma is true for all n < I, where
1 denotes an (arbitrary) positive integer. Let us consider the case when n = I. By
induction hypothesis, Algorithm 4 takes 2"2 time steps in both To and T1 and they
can be done in parallel. A (21"2 + 2’") time scheduling is easy to obtain, because
node 10""1 can gather all 2"1 messages located in T1 in 2"2 time steps then passes
all of them to node 0. The key to achieve a 2"1 time scheduling is that node 10’”1
should performs the gathering and passing messages to node 0 concurrently. Indeed

it can be shown that the following scheduling rule will result in 2’ "1 time:

Given a node u with all messages gathered in it, u sends the message

originated from the node with the largest label ﬁrst.

Again, by Lemmas 4.7, 4.8 and 4.10, we have the following result.

Theorem 4.8 Under the all-port model, any single node gather (respectively, scatter)

in F: can be achieved in no more than 2"""1 time steps for n > k 2 2.

Indeed, it can be shown that a tighter upper bound for the single node gather/ scatter
is IT,|, where IT,| is the maximum size (number of nodes) of the subtrees branched
from the root. This is because the root can follow the STF-FNF in sending the
messages on all its incident links concurrently. By Lemma 4.7, IT.| S 2""I“l for Ft.

Remark: A way to improve the all-port single node gathering in B., is to fully

utilize all the n links incident from the root by partitioning all other nodes into

at

for

{t is

‘0 nc

83

equivalent class based on their weights.’ Based on this idea, the “perfectly balanced

spanning trees” deﬁned in [10, 9] have been proposed for all-port single node gathering

in B... They have shown that the gathering can be achieved in (if;

 

1 time steps,
which is optimal, because the root needs to collect 2” .— 1 messages through its logn
links. However, it can be shown that their approach do not apply to the generalized

Fibonacci cubes directly, due to the “missing” nodes and links.

4.6 Multinode Broadcasting and Accumulation

Another useful data communication primitive is the multinode broadcasting [10, 9]
(or all-to-all broadcasting [67]) in which every node sends a data (message) to all
the other nodes. Similarly, the multinode accumulation is a dual operation of the
multinode broadcasting. A naive approach is to broadcast the message from each
node in turn by using the single node broadcasting algorithm. Trivially, the time
required by this approach is Fy‘] - Tb, where T5 is the time required for the single node
broadcasting algorithm.

A simple idea from [100] can make much improvement. Let (p, q) be a link in B...
At a step i node p should send all received messages originating from s to q such
that the Hamming distance between 3 and q is 2'. Since B.. is a connected graph with
diameter n, it can be shown that after n steps (iterations) a message originating from
a node will reach all the other nodes. Formally, a multinode broadcasting algorithm
for B.. is described as follows [100].

Under the all-port model, (1) and (2) in Algorithm 5 can be performed in parallel.

n — 1

It is shown in [100] that at the 2"” step, node p will send exactly messages
i - l

to node q. Therefore by summing i from 1 to n the total time required by Algorithm 5

 

‘The weight of a binary code is deﬁned as the number of 1’s in it.

f

hm

84

 

Algorithm 5 All-port Multinode Broadcasting in B.. :

for i=1 to n do:
for every link (p,q) do:
(1)Send from p to q all received messages which originated
from s such that H(s,q) = i.
(2)8end from q to p all received messages which originated
from s such that H(s,p) = i.

 

 

 

is 2"”1 time steps. By Corollary 4.1, I": is a connected graph with diameter n — k.
Hence it can be shown that Algorithm 5 also applies to the generalized Fibonacci

cubes, and consequently the time required is as follows.

Theorem 4.9 Under the all-port model, the multinode broadcasting (respectively, ac-

cumulation) in F: can be achieved in no more than 2""""1 time steps for n > k 2 2.

We now focus on the one-port model. Let us consider the multinode broadcasting
in a ring topology with size N, denoted by RN; it is known [10, 6,7] that Algorithm 6

gives an optimal result.

 

Algorithm 6 One-port Multinode Broadcasting in RN :

Node j sends its broadcasting message to node (j + 1) mod N.

for i=2 to N—1 do:
Node j sends the broadcasting message received in step i—l
to node (j +1)modN.

 

 

 

Clearly, the time required by Algorithm 6 is exactly N — 1 time steps. A lower
bound for any one-port multinode broadcasting is also N — 1, since each node needs

to receive N — 1 messages. Therefore, we have proved the following lemma.

4.7

scatte'

85

Lemma 4.11 Under the one-port model, the multinode broadcasting (respectively,
accumulation) in a ring of size N can be achieved in N — 1 time steps, which is

optimal.

By Theorem 3.3, the length of the longest cycle in F: is Fl“) if Ff] is even and
F3] — 1 if otherwise. Therefore, by embedding a longest cycle in Ft, we have the

following result.

Theorem 4.10 Under the one-port model, the multinode broadcasting (respectively,
accumulation) in P: (where n — 3 Z k 2 2) can be achieved in Fl,” time steps if Elf]

is even and in 2151"] -— 3 time steps if otherwise.

Proof: The case when Fl,” is even follows from Theorem 3.3 and Lemma 4.11. It
remains to show the case when 151“] is odd. In this case, Ff, contains a cycle of length
Fifi — 1. Let q denote the node that is not in the longest cycle; Figure 4.11 shows a
schematic diagram of the longest cycle. The multinode broadcasting can be divided
into two phases. In the first phase, all nodes in the longest cycle (i.e., all but node
q) execute Algorithm 6 to accomplish a multinode broadcasting among them. In the
second phase, (refer to Fig. 4.11) node q receives all the other Fl,” — 1 messages from
node p and sends its message to s which forwards the message to r then to t from
which the message travels clockwise along the cycle until it reaches p. The ﬁrst phase
takes Elf] — 2 time steps and the second phase takes Fl,” — 1. Therefore, the total

time required is 2F,[,"] — 3 steps. [J

4.7 Summary and Remarks

We have studied the routing, single node broadcasting/accumulation, single node

scatter/ gather, and multinode broadcasting / accumulation data communication prim-

in

te

stu

erat
difie
or n

‘0 a1

86

9.0
r st

Figure 4.11. One-port multinode broadcasting through the longest cycle.

itives in the generalized Fibonacci cube. The SBTs have been widely used to imple-
ment data communication in the hypercube. Indeed, the broadcasting/ accumulation
and scatter/ gather primitives discussed in this chapter can be efﬁciently achieved by
employing the SBTs combined with some appropriate scheduling rules [67] [100]. It
is known that the hypercube is a regular graph; all SBTs with different roots in B..
are isomorphic. This fact simpliﬁes the analysis of the SBT-based data communica-
tion primitives, since only SBT..(0) need to be considered. Here, for the generalized
Fibonacci cube, we have deﬁned the. STFs and presented the STF-based algorithms
for single node broadcasting/ accumulation and single node scatter/ gather. Although
the STFs with different roots in Ff, are not isomorphic in general, we have presented
an algorithm which allows us to transform an SBT to an STF. Moreover, by show-
ing that an STF is isomorphic to a subgraph of an SBT, we have presented a new
technique for studying data communication in the generalized Fibonacci cube which
is irregular in general.

Table 4.1 summarizes the communication primitives and their time complexities
studied in this chapter.

One communication primitive we have not yet addressed is the total exchange op-
eration [9] (or all-to-all personalized communication [67]) in which every node sends
different messages to all the other nodes. This is equivalent to the multinode scatter
or multinode gather, since every node needs to receive and send different messages

to all the other nodes. Clearly, the total exchange can be implemented by executing

87

Table 4.1. Time complexities for communication primitives in Ft.

 

II Communication_ Primitive [ Model I Time I]

 

 

 

 

 

 

 

 

 

 

 

single node broadcasting all-port d:
(accumulation) one-port _ n — k
single node scatter all-port 2"“1’1
(gather) one-port Fl,” - l
multinode broadcasting all-port 2"""l
(accumulation) one-port Fl,“ — It
2541- 3:

 

 

 

 

‘The distance between source node 8 and all the other nodes.
1 If Fl.” is even.
1 If Fl.” is odd.

single node scatter (gather) by all nodes, taking turns. This results in Fl,” . T, com-
plexity, where T. is the time required by the single node scatter (gather). Obviously
improvements can be made. One approach is to invoke single node scatter (gather)
concurrently on all nodes. However, this leads to a queueing delay. The analysis of

this approach remains open.

(1
(11
he

[67
the
(list

2‘ Hi

ll 7;

CHAPTER 5

GRAPH EMBEDDING AND
DATA COMMUNICATION IN
INCOMPLETE HYPERCUBE

In this chapter, we will study the Hamiltonian problem and data communications
in the Incomplete HyperCube. Much research has been undertaken to study the
Incomplete Hypercube. Many useful properties of the Incomplete Hypercube, such
as diameter and trafﬁc density, have been shown [109] to be comparable with that of
the hypercube. In [110], Tzeng et al. have shown that the binary trees and the two-
dimensional meshes can also be embedded in the Incomplete Hypercube. Katseff [68]
has presented algorithms for routing and single node broadcasting. Recently, more
sophisticated single node broadcasting algorithms based on results for the hypercube
[67] have been devised and analyzed by Tien et. al. [107]. These results suggest that
the Incomplete Hypercube is an attractive candidate for interconnecting parallel and
distributed systems.

Note that results reported in [107] [109] [110] are for Incomplete Hypercubes of 2"+
2“ nodes, where 0 S k < n. It can be seen that the “incomplete hypercubes” studied in

[107] [109] [110] represent only a small subset of the general case. For example, there

88

In

[In

con
Hm
sing
for t
ter/g

and,

89

are only 10 different such incomplete hypercubes within the size between 21° and 211
(exclusive); on the other hand, there are a total of 210 different Incomplete Hypercubes
in such a range. In this chapter, we present embedding and data communication
primitives for I N where N can be any positive integer.

As seen in Chapter 3, embeddings of linear array and ring are essential to the
embeddings of other structures such as mesh. Here we will show that the longest cycle
in I N contains N nodes if N is even, and N — 1 nodes if otherwise. Nevertheless,
IN always contains a Hamiltonian path. Consequently, our results on embeddings
of linear array and ring in the Incomplete Hypercube are both optimal in terms of
minimal dilation.

A single node broadcasting algorithm for the Incomplete Hypercube has been pre-
sented in [68], which is essentially the same as that for the generalized Fibonacci cube.
In [79, 82], we have presented optimal algorithms for the single node broadcasting
under both the all- and one-port communication models. The all-port broadcast-
ing algorithm is a result from [68]; the MSLF scheduling scheme was applied again
to attain an optimal one-port broadcasting. Here, we further'devise and analyze
other communication primitives, such as accumulation, gather and scatter, for the
Incomplete Hypercubes.

In this chapter, we write N (Z 1) to denote the total number of nodes in the
Incomplete Hypercube, and n = [log N] .

The remainder of this chapter is organized as follows. Section 5.1 shows that IN
contains a Hamiltonian path and rings can be optimally embedded in the Incomplete
Hypercubes. Routing and single node broadcasting are reviewed in Section 5.2. The
single node broadcasting algorithms are analyzed in Section 5.3. Optimal results
for the single node accumulation are reported in Section 5.4. The single node scat-
ter/ gather is studied in Section 5.5. Section 5.6 considers the multinode broadcasting

and accumulation. Finally, we summarize our results in Section 5.7.

90
5.1 Hamiltonian Problem

In this section we study the Hamiltonian problem in the Incomplete Hypercube. We
will show that IN (N Z 1) contains a Hamiltonian path. Additionally, IN (N 2 4)
contains a Hamiltonian cycle if and only if N is even, whereas the longest cycle in I N
contains all but one node if N is odd.

The embedding of cycles in the Incomplete Hypercubes is based on Gray codes.
There are many different ways to generate Gray codes. Here we adopt the known
Binary Reﬂected Gray Code (BRGC), see e.g.,[25, 101], to achieve our embedding. Let
G... denote the sequence of m-bit BRGCs. It is easily seen that G1 = (0,1). To build
the 2—bit BRGCs, preﬁx a ‘0’ to each code in G1, then reverse G1 and preﬁx a ‘1’ to
each code. In other words, we get G2 = (00, 01, 11, 10). More generally, let G.- denote
the sequence of codes obtained from G. by reversing its order, and 0G. (respectively,
1G.) the sequence of codes obtained from G.- by preﬁxing a ‘0’ (respectively, ‘1’) to
each element of the sequence G.'. The m-bit BRGCs can be generated by the recursion
[101]:

G... =(0G..._1,IG..._1)

Table 5.1 shows the 5-bit BRGCs, where G5(i) is the ith (0 S i S 31) BRGC in the
sequence G5. Let G5(i) denote the (regular) 5-bit binary code for integer i. Table 5.1
also lists the integer j such that C'5( j ) = G5(i) for each i and j.

The following observation is important:

Lemma 5.1 The binary codes C...(i) and C...(i + 1) diﬂ'er in bit 0 if and only ifi is

evenforOSiSZm—l.

Consequently, C...(i) and C...(i + l) are two consecutive BRGCs, although C...(i) may

come before or after G...(i + 1) in the sequence Gm.

91

Table 5.1. 5-bit Binary Reﬂected Gray Codes.

 

 

 

i 05(1) j i 05(1) j i 05(1) j i 05(1) j
0 00000 0 8 01100 12 16 11000 24 24 10100 20
1 00001 1 9 01101 13 17 11001 25 25 10101 21
2 00011 3 10 01111 15 18 11011 '27 26 10111 23
3 00010 2 11 01110 14 19 11010 26 27 10110 22
4 00110 6 12 01010 10 20 11110 30 28 10010 18
5 00111 7 13 01011 11 21 11111 31 29 10011 19
6 00101 5 14 01001 9 22 11101 29 30 10001 17
7 00100 4 15 01000 8 23 11100 28 31 10000 16

 

 

 

 

 

 

 

 

 

 

 

 

 

5.1.1 Cycles

Our embedding of ring is based on the longest cycle in the Incomplete Hypercube.

We ﬁrst consider cycles in the Incomplete Hypercube.

Theorem 5.1 A cycle of length l can be embedded in IN, where l is even and 4 S
l S N.

Proof: 3 Since the result for the (complete) hypercubes is known [101], we assume
that N is not a power of 2. Notice that I N contains a hypercube B.._1 . Hence according
to [101] cycles of length I can be embedded in B.._1 for 4 S l S 2"“. It remains to
show that the theorem is true for 2'"1 < l S N. We distinguish between the following
two cases:

Case 1: 2’”1 +1 S N S 3-2”‘2.

Deﬁne M = 3 - 2”"2 - N. Let L0 = 00G.._2, L1 = 01G.._2 and L2 = 10(G..-2 \ M),
where (G..-g \ M) denotes the sequence of codes resulted from the removal of the
M C.._g(j)’s from G.._2 for 2"“2 — M S j < 2””. Figure 5.1(a) shows a schema,
where by selecting nodes alternatively between Lo and L2, a cycle of length 22 can be
embedded in 122. Indeed, deﬁne two nodes j and j +1 to be a pair, where j is an even

number and 2""1 < j < N. By Lemma 5.1, C..( j ) and C..( j + 1) are two consecutive

92

 

BRGCs. Hence a pair of “up” and “down” edges can be used to include the pair j
and j + 1 in the L2 to form a cycle. Consequently, every node in L; can be included
to form a cycle if it is paired with another node.

Case2: 3-2""+1SNS2"—1. .

DeﬁneM = 2"—N. Let L0 = 00G.._2, L1 = 01G.._2, L; = 10G.._g and L3 = 11(G..-2\
M), where (G...; \ M) is deﬁned the same way as in case 1. Figure 5.1(b) shows that
by selecting nodes alternatively between Lo and L2, and nodes alternatively between

L; and L3, a cycle of length 28 can be embedded in [28. Again, only an even number

 

 

of nodes in L3 can be included in a cycle as we observed in case 1. D
"it“
La
I-o
L1 0‘ m 001 011 010 110 In 101 100
(II)
La
Lo
”'1
L: ‘1 000 001 on one 101 100

 

(b)

Figure 5.1. Embedding of longest cycles in (a) In, and in (b) 130.

We next study cycles of Odd lengths. By applying the same schema described in
the previous proof, Figure 5.2 shows an embedding of a cycle with length 22 in 123.
Note that node 10110 (22) is excluded from the cycle, since node 23 is not in 123.

Recall the fact that there is no cycle of odd length in the hypercube (Lemma 3.3).

 

93

 

000 001 O11 010 110 111 101 100

Figure 5.2. Embedding of a cycle with length 22 in 123.

Furthermore, Lemma 2.2 shows that I N is a subgraph of B... Hence we have
Lemma 5.2 There is no cycle of odd length in the Incomplete Hypercube.
Consequently, we have proved the following results:

Theorem 5.2 For N _>_ 4, I N contains a Hamiltonian cycle if and only if N is even,

whereas the length of the longest cycle in I N is N — 1 if and only if N is odd.

Corollary 5.1 A ring of size N can be embedded in IN with dilation one if N is 7

even, with dilation two if otherwise.

Proof: It is clear when N is even. For N being Odd, the only node that cannot be
included in the longest cycle has at least one adjacent node in the cycle. Hence the

embedding can be achieved by paying a higher price (i.e., dilation is 2). Cl

5.1.2 Hamiltonian Path

We now focus on the embedding of linear array. By Theorem 5.1, the graph I N
contains a Hamiltonian path when N is even. Figure 5.3 shows that by replacing
edge (00110,00111) with edge (10110.00110), 123 contains a Hamiltonian path. More

generally, we will prove the following result:

Theorem 5.3 IN contains a Hamiltonian path.

94

 

0‘ one 001 011 010 110 111 101 100

Figure 5.3. Embedding of a Hamiltonian path in 123.

Proof: It is easy to verify for N S 4. For N Z 5, we distinguish between the
following two cases:

Case 1: N is even.

In this case, I N contains a Hamiltonian cycle according to Theorem 5.2. By removing
exactly one edge from the cycle, we have a Hamiltonian path.

Case 2: N is odd.

Recall that nodes are labeled between 0 and N -— 1. Based on the schema presented
in the proof of Theorem 5.1, we are able to embed a cycle, LC, in I N, which contains
all but node N — 1. Now, consider the two nodes N — 1 — 2’“1 and N — 2"“. Observe
that the number N - 1 - 2"“1 is even and hence N — 2""1 is odd. As a consequence
of Lemma 5.1, the edge (N — 1 — 2"“1,N — 2"”) is in LG. Furthermore, the edge
(N - 1,N — 1 — 2"“) is also in IN. The path formed by the removal of the edge
(N — l — 2””1,N — 2"”) and the addition of the edge (N — 1, N — l — 2"“) to LG

is exactly a Hamiltonian path. CI

5.2 Routing and Single Node Broadcasting

In this section, we review the routing and the single node broadcasting algorithms
proposed for the Incomplete Hypercubes. The routing problem in the Incomplete

Hypercube was ﬁrst studied by Katseff [68]. Indeed, Algorithm 1 for routing in the

 

95

generalized Fibonacci cubes is essentially the same as that proposed in [68]. Figure 5.4
shows the structure of Is and the routing path (with arrow in each link) from node

011 to 100, which is determined by Algorithm 1. Lemma 5.3 is a direct result from

 

100

Figure 5.4. Routing path from node 011 to 100 in 16.

[68], which says that Algorithm 1 always ﬁnds a shortest path between any two nodes.

Lemma 5.3 Lets and d be two distinct nodes in IN, and H(s,d) the Hamming dis-
tance between these two nodes. Algorithm 1 always ﬁnds a shortest path of length
H(s,d) from s to d.

It is also shown in [68] that the routing paths determined by Algorithm 1 are deadlock-
free. Given any two nodes in I N, by Lemma 5.3 there is a path between them. Hence
I N is a connected graph. Furthermore, the Hamming distance between the two nodes
2""1 (with address 100 - - 0 0) and 2'”1 — l (with address 011 - - - 1) in IN is exactly n.

Therefore we have proved the following result.
Lemma 5.4 IN is a connected graph and its diameter is n.

Katseff [68] has also proposed an algorithm for single node broadcasting in the
Incomplete Hypercube, which is essentially the same as Algorithm 2 presented in
Section 4.2. Figure 5.5 illustrates an example of a single node broadcasting from

node 0101 in I“, when Algorithm 2 is applied. Notice that each node is represented

 

96

by an ellipse labeled with its address; only the links determined by Algorithm 2 are
shown in the ﬁgure. The travel array received at each node is shown above it, whereas

the source node initializes the array to all 1’s (TRUE).

 

 

 

Figure 5.5. A broadcasting from node 0101 in In.

It is known [68] that the set of nodes and links traversed by executing Algorithm 2
for any single node broadcasting in I N form a spanning tree, and the spanning tree
will be referred to as STI. We write 5' TI N(r) to denote the STI in I N rooted with node
r. Figure 5.5 also shows the STIu(0101). Observe that two STIs rooted at different
nodes are generally not isomorphic, which is similar to what we saw in the case of
the STFs. The next lemma is a result from [68], which shows that the single node
broadcasting algorithm sends the message through the same path as determined by

the routing algorithm.

Lemma 5.5 Lets and t be two distinct nodes in IN. Let p be the path determined
by Algorithm 1 to route a message from s to t. If a message is broadcast from s by

Algorithm 2, then it will reach t along a unique path that coincides with p.

 

97
5.3 Analysis of Single Node Broadcasting

In this section, we analyze the single node broadcasting. We will follow the assump-
' tions made in Chapter 4, i.e., all messages are of unit length and each takes a unit

time for transmission between two connected nodes.

5.3.1 All-port Model

Theorem 5.4 shows that Algorithm 1 is optimal under the all-port model.

Theorem 5.4 Let d, be the maximal distance between an (arbitrary) node 3 and all
the other nodes in IN. Then, under the all-port communication model, Algorithm 2

can optimally complete a single node broadcast from node 3 in exactly d, time steps.

Proof: It follows from Lemmas 5.3 and 5.5. CI
Notice that d. is n if both nodes 3 and 2" - s — 1 (i.e., 3) exit in IN, and n — 1 if

otherwise.

5.3.2 One-port Model

As we saw in the case of the generalized Fibonacci cubes, the scheduling scheme
makes a critical difference in the overall broadcasting time. Recall that the MSLF
scheme has been proposed for one-port single node. In what follows, we will show
that the same one-port single node broadcasting algorithm presented in Section 4.3.2
is Optimal when applied to the Incomplete Hypercube.

To motivate, we revisit the broadcast from node 0101 in In. Figure 5.6 shows
that the one-port broadcasting requires 4 time steps, where each link is labeled by
the time step at which the link is involved in forwarding the message. Notice that
the distance between the nodes 0101 and 1010 is exactly 4, which is a lower bound

for such a broadcasting. Hence the one-port algorithm is optimal in this case. We

 

 

Figure 5.6. One-port broadcasting from node 0101 in [11.

next show that the one-port broadcasting algorithm takes exactly n time steps for
any single node broadcasting in I N. It is necessary to examine closely the relation
between the STIs and the SBTs at this point.

First we try to apply Algorithm 3 to construct STIs. Figure 5.7 illustrates a

 

 

 

 

 

 

a - o A
001.1 a 0101 3 on. .o
o o
000:. 0010 on 0100 0101 one
1 0 o 0
000 001 0010 zoo

 

Figure 5.7. Construction Of STI30(01110) from SBT5(01110).

construction of S T130(01110) from SBT5(01110). It is seen that the parents of nodes
10110, 11010 and 11100 are switched respectively to nodes 00110, 01010 and 01100,
since node 11110 is not in 130. However, Algorithm 3 fails to give a construction of

STI22(01110), because both nodes 11110 and 10110 are not in In whereas node 10010

99 I

(a child of 10110) is in In (refer to Fig. 5.7). Further effort leads to a construction of
the STI;2(01110) from the SBT5(01110) as illustrated in Figure 5.8. Notice that the

 

(3132223 «In» ' an!» ‘ an;
(3333) (2:139 (325.333) Giﬁiib an» «In-1 «mp am an» @-
o . 0
«m» «m sass cat-2;?» @293» (2:13:95 «an» «as!» am
1 ' O
«m «In cases «mm
0
«as

Figure 5.8. Construction of 5T122(01110) from SBT5(01110).

link sequences (4,3,2) and (3,2,4) connect node 01110 to node 10010 in SBT5(01110)
and STIn(01110), respectively. Similarly, the link sequences (4,3,1) and (3,1,4) con-
nect node 01110 to node 10100 in the SBT and the STI, respectively. This Observation

leads to the following lemma which extends Lemma 4.4.

Lemma 5.6 Let uo be a node in IN. Let (uo, u1,- - - , 11..., 11;, u.+1, - . - , u.+,-_1, u...,-) be
the path in SBT..(uo) connecting node no to u.-+,-, and (l1, l2, - - . , l.-, l.-+1, l.-+2, - - - , l.-+,-)
the corresponding link sequence. Assume that 11.4, is in I N and Ui+j_1 is not in IN.
Then

I. There exists a node u. that is in IN whereas all 21,-4.1, 11,-4.2, - - - ,u.+,_1 are not in

IN:

2. Within B.., the link sequence (l.-+2,l.-+3, - - ~ , l.-+,-) from u.- leads to a node u that

is in IN, and

3. Within B.., link l.-+1 from u leads to u.-+,-.

 

100

Proof: By back tracking from node u.-+,- in SBT..(uo) to the root uo, one should
beable to see that there exists a node u,- that is in I N. This is true because at least
uo is in I N. (Note that u.- may be the same node as uo.)

Next we prove that u is in I N. The following two- Observations are crucial:
1. u.-+,- and u differ in exactly bit l.-+1, and
2. (Lemma 5.8) bit I.“ of u is 0 and bit I.“ of u,+,- is 1

Since u.-+,- is in IN, node u must be in IN. This is because, by observation 2, the
(integer) label of u is less than that of u.-+,-.
Finally, the edge (u, u.+,-) in B.. (or, more precisely, I N) is because both nodes are

in I N and they differ in exactly bit l.-+1 as we just observed. [2]

As in the case of the generalized Fibonacci cubes, links in STIN(s) can also be
classiﬁed into two sets: set A includes links which are not in SBT..(s), and set B the
remaining links. For example, links (00010,10010) and (00110.00010) in S TI22(01110)
are in set. A and B respectively (refer to Fig. 5.8). Links in set A are resulted from
some “missing” nodes (relative to B..). Similarly, a link I in set A connects two special
nodes: the one deﬁning l as an out-edge is called type-a node and the other is type-ﬂ.

The following lemma shows that each type-oz node has exactly one type-,6 node as its

child.

Lemma 5.7 Let u be a type-a node in STIN(s). Then u has exactly one type-ﬂ node
as its child in STIN(s).

We need the following crucial observation before proving Lemma 5.7.

Lemma 5.8 Let (q,r) be a set A link in STIN(s) such that i is its link number and

q is a type-a node. Then bit i ofq is 0.

101

Proof: Note that q must be an internal node in an STI, because no links in set
A could incident from a root node. Assume that p is the parent of q in STI,.(s) and
the link number of (p, q) is j. By deﬁnition i > j. We prove by contradiction that bit
i of q is 0. Suppose that bit i of q is 1, then bit i of p is also 1. Notice that p and
r differ in exactly bits i and j. Deﬁne t = p — 2‘. Clearly, node t is in IN. According
to the MSLF scheme, at node p the one—port algorithm would forward the message
through link i to node t which, however, would forward the message to r. This is a

contradiction. C]

Now, we proceed to prove Lemma 5.7.

Proof: Observe that u must be an internal node in 5 TI N(s). Let p be the parent
of u and k be the link number of link (p, u). We prove the lemma by contradiction.
Suppose u has two children v and w and the link numbers for (u, v) and (u, w) are i
and j, respectively. Without loss of generality, assume that i = 2, j = 1 and k = 0.
By Lemma 5.8, both bits 2 and 1 of u (and hence p) are 0. Consider the following
two cases: (a) bit 0 of u is 1, and (b) bit 0 of u is 0 as diSplayed in Figure 5.9.
Observe that N must be at least 6 and 5 in case a and case b, respectively. By the
one-port algorithm, the broadcasting path from p to v would be (000,100,101) in case
a. Similarly, the broadcasting path from p to w would be (001,011,010) in case b.

Both are contradictions. Hence we have proved the lemma. [3

By Lemmas 5.6 and 5.7, one should be able to design an algorithm similar to Algo-
rithm 3, which allows us to transform an SBT to an STI. Hence we have the time

required by the one-port broadcasting algorithm.

Theorem 5.5 The one-port single node broadcasting algorithm requires at most n

time steps to broadcast one message from any node in I N.

 

102

   

(a) (b)

Figure 5.9. Type-a node and its type-ﬂ child.

Proof: (Sketch) Consider the scheduling in type-a
nodes. Let (uo,u1,- - - ,u.~-1,u.-,u.-+1, - - - ,u.-+j-1,u.-+,-) be a path and u be the node
as speciﬁed in Lemma 5.6, i.e., link (u, u.-+,-) is in set A, and u (respectively, u.-+,-) is a
type-a (respectively, type-ﬂ ) node. Within B.., the broadcasting message will reach
and at step 12. — 1.4,. On the other hand, it can be shown that the broadcast message
will reach node u in no more than n — 1...,- - 1 time steps. By Lemma 5.7, there is
exactly One set A link originated from u. Under the MSLF scheme, u will forward
the message through the set A link ﬁrst, then through set B link(s). Therefore, the
message will reach u.+,- (a type-,6 node) no later than n — 1..., steps, i.e., at a time
no later than that resulted from the one-port broadcasting in B... For nodes that are
not type-a, it can be shown that the scheduling on them will not be delayed by the

type-a nodes. C]

Remark: Let (uo,u1, - - - ,u.) be a path in STIN(uo). Figure 5.10 shows that there
may be two type-a nodes in such a path, i.e., nodes 0001 and 1000. (In general,
there may be more than two type-a nodes in a path.) Since it takes only 1 (which
is 1 time step earlier compared to the scheduling in SBT4(1010)) time step for the

broadcasting message to reach node 0001, one should be able to see the scheduling

103

in node 1000 will not be affected (delayed) by a type-a ancestor. This is similar to

what we saw in the case of the generalized Fibonacci cubes.

 

Figure 5.10. 3711110010).

Corollary 5.2 The one-port broadcasting algorithm is optimal in terms of minimal

time steps.

Proof: It sufﬁces to show that n is a lower bound for any one-port broadcasting
algorithm. To see this, under the one-port model the total number of nodes receiving
broadcast message is at most doubled for every consecutive time step. Hence it takes

at least n steps for all N nodes to receive the broadcast message. C]

We now show that accumulation, gather and scatter operations can also be efﬁ-
ciently implemented in the Incomplete Hypercube. The approaches taken here are

similar to what we did in Chapter 4.

104

5.4 Single Node Accumulation

As we saw in Section 4.4, the accumulation is a dual Operation of the broadcasting.
Therefore, the times required for the all- and one-port single node accumulation in

the Incomplete Hypercube are captured by Theorems 5.4 and 5.5, respectively.

5.5 Single Node Gather and Scatter

Recall that the STF-FNF scheme for the single node scatter in the generalized Fi-
bonacci cubes. Observe that all paths in the STIs are shortest-path, which is a key
factor to the efﬁciency of the STF-FNF. Along the same line, we propose the following

scheduling rule:

The source sends the message to the farthest node ﬁrst (break ties arbi-

trarily) along the links in the STI.

for scatter Operation in the Incomplete Hypercube. The above scheme will be referred

to as STI, Farthest Node First (STI-F NF), and the time required by the STI-FNF is:

Theorem 5.6 Under the one-port model, any single node scatter (respectively,
gather) in IN (where N Z 2) can be achieved in exactly N — 1 time steps, which

is optimal.

Proof: The proof for time requirement is similar to the case of the generalized
Fibonacci cubes. To see the optimality, observe that the root needs to send N - 1

messages. C]

The following two lemmas will facilitate our analysis of the all-port single node

gather / scatter.

Lemma 5.9 Lets be a node in IN. The tree STIN(s) is isomorphic to a subgraph of
SBT..(s).

105

Proof: Recall that in the case of the generalized Fibonacci cubes, Lemmas 4.4
and 4.6 are essential to the proof of Lemma 4.7. Lemmas 5.6 and 5.7 show that the

Incomplete Hypercube possesses similar properties. We omit the detailed proof. D

Figure 5.11 shows that STIgg(01110) is isomorphic to a subgraph of SBT5(01110).
It can be seen that node 00110 in the STI should be mapped to 11110 in the corre-
sponding SBT. Consequently, node 00010 (respectively, 00100 and 00111) should be
mapped to 10110 (respectively, 11010 and 11100) etc.

 

 

 

 

 

 

 

 

 

 

«use
1 a o
a 3 ° 0
0 e
an: ®
(O)
am
e . a o.
m (in! am an»
:I a 1 1 O 0
mm m» m, an:- (553- am»
3 O 1 O O

 

(b)

Figure 5.11. (a) STI22(01110) and (b) a subgraph of SBT5(01110).

Lemma 5.10 Let s be a node in IN. Assume that algorithm A can be applied to
perform gather (respectively, scatter) operation based on SBT..(s) in X time steps.
Then algorithm A can be applied to perform gather (respectively, scatter) operation

based on STIN(s) in no more than X time steps.

106

Proof: Similar to the proof of Lemma 4.8. Omitted. 0

Therefore, by Lemmas 4.10, 5.9 and 5.10, we have the following result for the
all-port model.

Theorem 5.7 Under the all-port model, any single node gather (respectively, scatter)

in I N can be achieved in no more than 2"”1 time steps for N 2 2.

Again, it can be shown that a tighter upper bound is |T.|, where |T,| (S 2"“) is the

maximum size of the subtrees branched from the root.

5.6 Multinode Broadcasting and Accumulation

Recall that Algorithm 5 has been proposed for the multinode broadcast-
ing/accumulation in the generalized Fibonacci cubes. By Lemma 5.4, I N is a con-
nected graph with diameter n. Hence it can be shown that Algorithm 5 also applies

to the Incomplete Hypercube. Consequently, the time required. is:

Theorem 5.8 Under the all-port model, the multinode broadcasting (respectively, ac-

cumulation) in I N can be achieved in no more than 2""1 time steps for N Z 2.

By Theorem 5.1, the length of the longest cycle in I N is N if N is even and N - 1

if otherwise. Again, by embedding a longest cycle in I N, we have the following result.

Theorem 5.9 Under the one-port model, the multinode broadcasting (respectively,
accumulation) in I N (where N Z 4) can be achieved in N — 1 time steps if N is even

and in 2N — 3 time steps if otherwise.

Proof: Similar to the proof Of Theorem 4.10. Omitted. Cl

107

5.7 Summary of Results

We have studied the Hamiltonian problem and some data communication primitives
in the Incomplete Hypercube. For the Hamiltonian problem the following Optimal

results have been obtained:

1. I N contains a Hamiltonian path. Hence a linear array of size N can be embedded

with dilation-1 , and

2. The longest cycle in IN (where N 2 4) contains N nodes if N is even, and
N — 1 nodes if otherwise. Consequently, a ring of size N can be embedded with

dilation-1 if N is even, with dilation-2 if otherwise.

For data communication, we have studied the single node broadcasting/ accumulation,
single node scatter/ gather, and multinode broadcasting/accumulationprimitives in
the IncOmplete Hypercube. We have deﬁned the STIs and presented the STI-based
algorithms for single node broadcasting / accumulation and single node scatter / gather
in the Incomplete Hypercube. The STIs with different roots in I N are not isomorphic
in general. Again, by showing that an STI is isomorphic to a subgraph of an SBT, we
have presented a new technique for studying data communication in the Incomplete
Hypercube.

Table 5.2 summarizes the communication primitives and their time complexities

studied in this chapter.

108 '

Table 5.2. Time complexities for communication primitives in I N.

 

 

IrCommunication Primitive I Model I Time II

 

 

 

 

 

 

 

single node broadcasting all-port d; l
(accumulation) one-port n
single node scatter all-port 2"-l
(gather) one-port N — 1
multinode broadcasting all-port 2"”l
(accumulation) one-port N — 1f

2N -'- 31

 

 

 

 

 

 

‘ The distance between the source node 8 and all the other nodes.
1 If N is even.
1 If N is odd.

CHAPTER 6

CONCLUSION AND FUTURE
WORK

In this chapter, we summarize our main results, highlight our contributions, and

discuss some directions for future work.

6.1 Main Results and Contributions

We have studied a large family of irregular network topologies that consists of the
generalized Fibonacci cube and the Incomplete Hypercube. Each member of the
family is shown to be a subgraph of a hypercube. Additionally, both the generalized
Fibonacci cube and the Incomplete Hypercube include the hypercube as a special
case. The Incomplete Hypercube Offers unlimited sizes; however, it suffers from a low
degree of fault-tolerance under certain situations. On the other hand, the generalized
Fibonacci cube offers a guaranteed degree of fault-tolerance.

To investigate their applications in parallel or distributed systems, we have pre-
sented a collection of efficient graph embedding and data communication algorithms
for this large family of network topologies. In particular, we have Obtained the fol-

lowing graph embedding results for the generalized Fibonacci cube:

109

110

1. 1‘: (where n 2 k 2 2) contains a Hamiltonian path, hence a linear array.

2. If Fl” is even, the longest cycle in Ff,(where n - 3 Z k 2 2) contains all nodes,
which implies that I‘I‘, embeds a ring of size Ff,” with dilation-l. On the other
hand, if 151"] is odd, the longest cycle in 1‘: Contains all but one node, which

implies that F: embeds a ring of size Ff] with dilation-2.

3. We have shown that some 2D and 3D meshes of certain sizes can be embedded

in the generalized Fibonacci cube with expansion-1 and dilation-1.
4. We have shown that a hypercube can be embedded in Fiﬁ.

The Hamiltonian problem for the Incomplete Hypercube has also been studied. We
have shown that I N contains a Hamiltonian path, and the length of the longest cycle
in IN (where N Z 4) is N if N is even, and N - 1 if otherwise.

For data communication in both the generalized Fibonacci cube and the Incom-

plete Hypercube, we have devised efﬁcient algorithms for the following primitives:
1. Single node broadcasting/ accumulation,
2. Single node scatter/ gather, and
3. Multinode broadcasting/ accumulation.

All of our algorithms have been carefully analyzed under both the one— and all-port
models.
The family of network topologies studied in this dissertation has the following two

areas of application.

1. It provides a spectrum of alternative structures for fault-tolerant computing in
the hypercube systems. This is because each member of the family (a generalized

Fibonacci cube or Incomplete Hypercube ) is a subgraph of a hypercube.

111

2. It makes small-sized incremental expansion possible. This is a very desirable
property when constructing parallel or distributed systems that evolve with

time.
We consider the following as our contributions.

1. Most existing results of graph embedding and data communication are only
applicable to a single kind of regular networks. Here we have presented new

results that can be applied to a large family of irregular network topologies.

2. By using a few parameters, (e.g., n and k for the generalized Fibonacci cube,
and N for the Incomplete Hypercube) to characterize members of the family,
we have been able to described the relations between the structural properties
and the algorithmic efﬁciency of a network. Our approach has provided a new

framework for studying networks.

6.2 Future Work

Obviously, there are many important issues that have not been addressed. The fol-

lowing is a list of possible directions for future work.
1. Graph embedding: We have identiﬁed the following outstanding problems:

0 Embedding in hypercube: As we mentioned, this family of network topolo-
gies studied can serve as a fault-tolerant structure for the hypercube. How-
ever, we have not addressed the issue of how to efﬁciently embed a gener-

alized Fibonacci cube or an Incomplete Hypercube in a faulty hypercube.

e Subcube allocation: It has been widely anticipated that future parallel sys-
tems will consist of thousands or even millions nodes. Since a typical user

may not require all the processors, Hayes et al. [55] and Peterson et al. [95]

112

have suggested that the hypercube systems should support multiprogram-
ming to promote system utilization. The. problem of subcube allocation
in the hypercube systems has been studied extensively [3] [22] [25] [28]
[38] [70]. Similarly, the problem of submesh allocation in the 2D mesh
systems has also drawn much attention [76] [77] [85]. Since the generalized
Fibonacci cube can also be decomposed into subcubes, we expect that the
subcube allocation in the generalized Fibonacci cube can be tackled in a

ﬂexible fashion.

0 Optimal embedding of meshes: We have shown that some 2D and 3D
meshes can be embedded in the generalized Fibonacci cube with expansion-
1 and dilation-1. Given a 2D or 3D mesh of certain size, how can we achieve

a minimal expansion and/or minimal dilation embedding?

2. Structural properties: Our work is just a beginning. To have more complete

knowledge of the family of networks, we need to investigate other topologi-
cal properties such as bisection width, node/edge connectivities, independent
paths, and the relation between the generalized Fibonacci cube and Incomplete

Hypercube.

. Application algorithms: A network topology is never useful if it does not
support efﬁcient algorithms. We have presented many tools for efﬁcient graph
embedding and data communication for the family of network topologies. Re-
cently, we have shown that preﬁx computation can be done efﬁciently in the
generalized Fibonacci cube [64]. We would like to investigate along the same

line.

. Optimization: We have studied a spectrum of new topologies, which supports
different performance/structural features. Given a speciﬁcation of desirable

features, which may consist of a list of requirements (e.g., number of nodes,

TR

 

, 113

edges, degree, and performance features), how can we select member(s) from

this family of networks that will satisfy the requirements with a minimal cost?

BIBLIOGRAPHY

BIBLIOGRAPHY

[1] ABRAHAMSON, K., DADOUN, N ., KIRKPATRICK, D. G., AND PRZYTYCKA,
T. A simple parallel tree contraction algorithms. Journal of Algorithms 10
(1989), 287—302.

[2] AKL, S. G. The Design and Analysis of Parallel Algorithms. Prentice Hall,
1989.

[3] AL-DHELAAN, A., AND BOSE, B. A new strategy for processors allocation in
an n-cube multiprocessor. In Proc. IEEE the 8th Annual Int ’l Phoenix Conf.
on Comput. Comm. (1989), pp. 114-118.

[4] ANDERSON, E., AND ET. AL. LAPACK: a portable linear algebra library for
high-performance computers. In Proc. of Supercomputing Conference (1990),
pp. 2-11.

[5] BANERJEE, P. Strategies for reconﬁguring hypercubes under faults. In IEEE
20‘“ Intn’l Conf. on Fault-Tolerant Computing (1990), pp. 210-217.

[6] BARNES, G. H., BROWN, R. M., KATO, M., KUCK, D. J., SLOTNICK,
D. L., AND STOKES, R. A. The Illiac IV computers. IEEE Trans. Computers
027 (1968), 84—87.

[7] BATCHER, K. E. Sorting networks and their applications. In Proc. of the
Spring Joint Computer Conference (1968), AFIPS Press, pp. 307—314.

[8] BECKER, B., AND SIMON, H.-U. How robust is the n-cube. Information and
Computation 77 (1988), 162—178.

[9] BERTSEKAS, D. P., OZVEREN, C., STAMOULIS, G. D., TSENG, P., AND
TSITSIKLIS, J. N. Optimal communication algorithm for hypercubes. Journal

of Parallel and Distributed Computing 11 (1991), 263-275.

[10] BERTSEKAS, D. P., AND TSITSIKLIS, J. N. Parallel and Distributed Compu-
tation. Prentice Hall, 1989.

114

115

[11] BHUYAN,‘ L. N., AND AGRAWAL, D. P. Generalized hypercube and hyperbus
structures for a computer network. IEEE Trans. Computers C-33, 4 (Apr.

1984), 323-333.

[12] BLELLOCH, G. E. Scan as primitive parallel operations. IEEE Trans. Com-
puters 38, 11 (Nov. 1989), 1526-1538.

[13] BRAATEN, M. E. Solution of viscous ﬂuid ﬂows on a distributed memory con-
current computer. In Proc. Intn’l Conf. on Parallel Processing (1988), pp. 243-

250.

[14] BRENT, R. P. The parallel evaluation of general arithmetic expressions. Jour-
nal of the ACM 21, 2 (Apr. 1974), 201—206.

[15] BROWN, JR., .1. L. Zeckendorf’s theorem and some applications. Fibonacci
Quarterly 2, 3 (1964), 163-168.

[16] BRUCK, J ., CYPHER, R., AND SOROKER, D. Running algorithms efﬁciently
on faulty hypercubes. In ACM 2”“ Symposium on Parallel Algorithms and

Architectures (1990), ACM, pp. 37—44.

[17] BRUCKMAN, P. S. The generalized Zeckendorf theorems. Fibonacci Quarterly
27, 4 (1989), 338—347.

[18] CAPPELLO, P. R. Gaussian elimination on a hypercube automaton. Journal
of Parallel and Distributed Computing 4 (1987), 288—308.

[19] CHAN, M. Y. Dilation-2 embeddings Of grid into hypercubes. In Proc. Intn’l
Conf. on Parallel Processing (1988), pp. 295-298.

[20] CHAN, M. Y., AND CHIN, F. Y. On embedding rectangular grids in hyper-
cubes. IEEE Trans. Computers C—37, 10 (Oct. 1988), 1285—1288.

[21] CHAN, M. Y., AND LEE, S.-.I. Distributed fault-tolerant embeddings of rings
in hypercubes. J. of Parallel and Distributed Computing 11 (1991), 63—71.

[22] CHEN, G.-I., AND LAI, T.-H. Scheduling independent jobs on partionable
hypercubes. Journal of Parallel and Distributed Computing 12 (1991), 74—78.

[23] CHEN, M.-S., AND SHIN, K. G. Message routing in an injured hypercube.

In Proc. 3'“ Conf. Hypercube Concurrent Computers and Applications (1988),
SIAM, pp. 312—317.

[24]

[25]

[26]

[27]

[28]

[29]

[30]

[31]

[32]

[33]

116

CHEN, M.-S., AND SHIN, K. G. Depth-ﬁrst search approach for fault-tolerant
routing in hypercube multicomputers. IEEE' Trans. Parallel and Distributed

Systems 1, 2 (1990), 152—159.

CHEN, M.-S., AND SHIN, K. S. Processor allocation in an n-cube multi-
processor using Gray codes. IEEE Trans. Computers C-36, 12 (Dec. 1987),
1396—1407.

CHEN, S.-K., LIANG, C.-T., AND TSAI, W.-T. Loops and multi-dimensional
grids on hypercubes: Mapping and reconﬁguration algorithms. In Proc. Intn’l

Conf. on Parallel Processing (1988), pp. 315-322.

CHRISTON, M. A vectorized 3—D ﬁnite element model for transient simulation
of two-phase heat transport with phase transformation and a moving interface.

In Proc. of Supercomputing Conference (1990), pp. 436-445.

CHUANG, P.-J., AND TZENG, N.-F. Dynamic processor allocation in hyper—

cube computers. In Proc. The 17th Annual Int’l Sympo. on Computer Archi-
tecture (1990), pp. 40—49.

CVETANOVIC, Z. The effects Of problem partitioning, allocation, and granular-
ity on the performance of multiple-processor systems. IEEE Trans. Computers

035, 4 (Apr. 1987), 421-432.

CVETANOVIC, Z., FREEDMAN, E. G., AND NOFSINGER, C. Efﬁcient de-
composition and performance of parallel PDE, FFT, Monte Carlo simulation,

simplex, and sparse solvers. In Proc. of Supercomputing Conference (1990),
pp. 465474.

CYPHER, R., SANZ, J., AND SNYDER, L. Hypercube and shufﬂe-exchange
algorithms for image component labeling. Journal of Algorithms 10, 1 (1989),

140—150.

CYPHER, R., SANZ, J., AND SNYDER, L. Algorithms for image component

labeling on simd mesh-connected computers. IEEE Trans. Computers G39, 2
(1990), 276-281.

CYPHER, R., SANZ, J ., AND SNYDER, L. The hough transform has O(n)

complexity on n times n mesh-connected computers. SIAM J. of Comput. 19,
5 (1990), 805—820.

117

[34] DALLY, W. J., AND SEITZ, C. L. Deadlock-free message routing in multipro-
cessor interconnection networks. IEEE Trans. Computers C-36, 5 (May 1987),
547-553.

[35] DEIWERT, G. S. Computational aerothermodynamics. In Proc. of Supercom-
puting Conference (1989), pp. 51-57.

[36] DEKEL, E., NASSIMI, D., AND SAHNI, S. Parallel matrix and graph algo—
rithms. SIAM J. of Comput. 10, 4 (1981), 657—675.

[37] DUTT, 5., AND HAYES, J. P. Designing fault-tolerant systems using automor-
phisms. Journal of Parallel and Distributed Computing 12 (1991), 249—268.

[38] DUTT, 3., AND HAYES, J. P. Subcube allocation in hypercube computers.
IEEE Trans. Computers C-IO, 3 (Mar. 1991), 341—352.

[39] EI-‘E, K. Embedding mesh of trees in the hypercube. Journal of Parallel and
Distributed Computing 11, 3 (Mar. 1991), 222-230.

[40] EL-AMAWY, A., AND LATIFI, S. Bridged hypercube network. Journal of
Parallel and Distributed Computing 10 (1990), 90—95.

[41] EL-AMAWY, A., AND LATIFI, S. Properties and performance of folded hyper-
cubes. IEEE Trans. Parallel and Distributed Systems 2, 1 (Jan. 1991), 31—42.

[42] ESI-‘AHANIAN, A.-H., NI, L. M., AND SAGAN, B. The twisted n-cube with

application to multiprocessing. IEEE Trans. Computers C40, 1 (Jan. 1991),
88—93.

[43] FISHBURN, J. P., AND FINKEL, R. A. Quotient networks. IEEE Trans.
Computers C-31, 4 (1982), 288—295.

[44] FOLDES, S. A characterization of hypercubes. J. Discrete Math. 17 (1977),
155-159.

[45] FORTUNE, 5., AND WYLLIE, J. Parallelism in random access machines. In
Proc. 10“ Annual ACM Symposium on Theory of Computing (1978), pp. 114—
118.

[46] Fox, G., JOHNSON, M., LYZENGA, G., OTTO, S., SALMON, J., AND
WALKER, D. Solving Problems on Concurrent Processors. Prentice Hall, 1988.

[47] FOX, G. C., OTTO, S. W., AND .HEY, A. J. G. Matrix algorithms on a
hypercube I: Matrix multiplication. Parallel Computing 4 (1987), 17-31.

118

[48] GANNON, D., AND VAN ROSENDALE, J. On the impact of communication
complexity in the design of parallel numerical algorithms. IEEE Trans. Com-
puters 33, 12 (Dec. 1984), 1180-1194.

[49] GHOSH, K., AND FUJIMOTO, R. M. Parallel discrete event simulation using
space-time memory. In Proc. Intn’l Conf. on Parallel Processing (1991), pp. III-
201—III—208.

[50] GILBERT, E. N. Gray codes and paths on the n—cube. Bell System Technical
Journal 37 (1958), 815-826.

[51] GORDON, J. M., AND STOUT, Q. F. Hypercube message routing in the
presence of faults. In Proc. 3'“ Con]. Hypercube Concurrent Computers and

Applications (1988), SIAM, pp. 318—327.

[52] GOULD, R. J. Updating the hamiltonian problem — A survey. J. of Graph
Theory 15, 2 (June 1991), 121—157.

[53] GRAHAM, R. L., KNUTH, D. E., AND PATASHNIK, 0. Concrete Mathematics.
Addison Wisley, 1989.

[54] HASTAD, J ., LEIGHTON, T., AND NEWMAN, M. Reconﬁguring a hypercube
in the presence of faults. In Proc. 19th Anna. ACM Symp. Theory of Computer
Science (1987), ACM, pp. 274-284.

[55] HAYES, J. P., MUDGE, T. N., AND STOUT, Q. P. Architecture of a hy-
percube supercomputer. In Proc. Intn’l Conf. on Parallel Processing (1986),
pp. 653—660.

[56] HILBERS, P. A. J ., KOOPMAN, M. R. J., AND VAN DE SNEPSCHEUT, J.
L. A. The Twisted Cube. Spring-Verlag, 1987, pp. 152-159. Lecture Notes in

Computer Science.

[57] HO, C.-T. Optimal Communication Primitives and Graph Embeddings on
Hypercubes. PhD thesis, Yale University, May 1990.

[58] HO, C.-T., AND JOHNSSON, S. L. Distributed routing algorithms for broad-
casting and personalized communication in hypercubes. In Proc. Intn’l Conf.

on Parallel Processing (1986), pp. 640—648.

[59] HO, C.-T., AND JOHNSSON, S. L. On the embedding of arbitrary meshes in

boolean cubes with expansion two dilation two. In Proc. Intn’l Conf. on Parallel
Processing (1987), pp. 188—191.

119

[60] HO, C.-T., AND JOHNSSON, S. L. Embedding meshes in boolean cubes by
graph decomposition. Journal of Parallel and Distributed Computing 8 (1990),

325-339.

[61] HSU, W.-J. Fibonacci cubes - properties of an asymmetric interconnection
topology. In Proc. Intn’l Conf. on Parallel Processing (1991), pp. I722—723.

[62] HSU, W.-J. Fibonacci cubes-A new interconnection topology. IEEE Trans.
Parallel and Distributed Systems (1992). To appear.

[63] HSU, W.-J., PAGE, C. V., AND LIU, J. Embedding meshes in a large family
of graphs. Parallel Processing Letters (1992). Accepted for publication.

[64] HSU, W. J ., PAGE, C. V., AND LIU, J. Parallel construction of Fibonacci
trees and preﬁx computations on a family of interconnection topologies. In

Proc. CAAP/ESOP (1992).

[65] JAKOB, R., AND HACK, J. J. Parallel MIMD programming for global models

of atmospheric ﬂow. In Proc. of Supercomputing Conference (1989), pp. 106-
112.

[66] JOHNSSON, S. L. Communication efﬁcient basic linear algebra computations
on hypercube architectures. Journal of Parallel and Distributed Computing 4,
2 (1987), 133—172.

[67] JOHNSSON, S. L., AND HO, C.-T. Optimum broadcasting and personalized
communication in hypercubes. IEEE Trans. Computers G38, 9 (Sept. 1989),
1249-1268.

[68] KATSEFF, H. Incomplete hypercube. IEEE Trans. Computers C-37, 5 (May
1988), 604—608.

[69] KERMANI, P., AND KLEINROCK, L. Virtual cut-through: A new computer
communication switching technique. Computer Networks 3, 4 (1979), 267-286.

[70] KIM, J., DAS, C. R., AND LIN, W. A processor allocation scheme for hyper-

cube computers. In Proc. Intn’l Conf. on Parallel Processing (1989), pp. 11231-
11238.

[71] KNUTH, D. E. The Art of Computer Programming - Fundamental Algorithms,
2 ed., vol. 1. Addison-Wesley, 1973.

[721

[73]

[74]

[75]

[75]

[77]

[78]

[79]

[80]

[31]

182]

[83]

120

KNUTH, D. E. The Art of computer Programming — Sorting and Searching,

2 ed., vol. 3. Addison-Wesley, 1973.

LADNER, R. E., AND FISCHER, M. .1. Parallel preﬁx computation. Journal
of the ACM 27, 4 (1980), 831—838.

LANDER, E., AND MESIROV, J. P. Protein sequence comparison on a data
parallel computer. In Proc. Intn’l Conf. on Parallel Processing (1988), pp. 257-
263.

LEIGHTON, T. Introduction to Parallel Algorithms and Architectures: Arrays
Trees Hypercubes. Morgan Kaufmann, 1992.

L1, K., AND CHENG, K. H. Job scheduling in a partitionable mesh using a
two-dimensional buddy system partitioning scheme. IEEE Trans. Parallel and
Distributed Systems 2, 4 (Oct. 1991), 413—422.

LI, K., AND CHENG, K. H. A two-dimensional buddy system for dynamic re-

source allocation in a partitionable mesh connected system. Journal of Parallel

and Distributed Computing 12 (1991), 79—83.

LIU, J., CHIANG, C.-M., AND HUGHES, H. D. Performance analysis ofload-
sharing for hypercube multiprocessor systems. Computer Systems Science and
Engineering (1992). Accepted for Publication.

LIU, J., AND HSU, W.-J. Routing and broadcasting algorithms for a large
family of interconnection topologies. Tech. Rep. CPS-91-05, Dept. of Computer
Science, Michigan State University, Oct. 1991. Submitted for Publication.

LIU, J., AND HSU, W.-J. Distributed algorithms for shortest-path, deadlock-
free routing and broadcasting in a class of interconnection topologies. In Proc.
International Parallel Processing Symposium (1992), pp. 589—596.

LIU, J., AND HSU, W.-J. Distributed algorithms for shortest-path, deadlock-
free routing and broadcast in Fibonacci cubes. In Proc. 11‘” Phoenix Conf. on

Computers and Communications (1992), pp. 23-30.

LIU, J., AND HSU, W.-J. Optimal broadcasting and embedding in incomplete
hypercube. Submitted for publication, May 1992.

LIU, J., HSU, W.—J., AND CHUNG, M. J. Generalized Fibonacci cubes are
mostly hamiltonian. Submitted for publication, May 1992.

121

[84] LIU, J ., AND HUGHES, H. D. Performance studies of hypercube multiprocessor
systems. In Proc. Summer Computer Simulation (1990), Society for Computer
Simulation.

[85] LIU, J ., AND HUGHES, H. D. Dynamic job scheduling in partitionable mesh
connected multicomputer systems. In Proc. Summer Computer Simulation
(1992), Society for Computer Simulation. To appear.

[86] LUBECH, O. M., AND FABER, V. Modeling the performance of hypercubes: A
case study using the particle-in-cell application. Parallel Computing .9 (1988),

37-52.

[87] MA, Y.-W. E., AND TAO, L. Embeddings among toruses and meshes. In
Proc. Intn’l Conf. on Parallel Processing (1987), pp. 178-187.

[88] MILLER, G. L., AND REIF, J. H. Parallel tree contraction and its application.
In 26th Symposium on Foundations of Computer Science (1985), pp. 478-489.

[89] MILLER, G. L., AND REIF, J. H. Parallel tree contraction Part I: fundamen—
tals. In Advances in Computing Research, S. Micali, Ed., vol. 5. JAI Press Inc.,
1989, pp. 47—72.

[90] MILLER, G. L., AND REIF, J. H. Parallel tree contraction Part II: further
applications. SIAM J. of Comput. 20, 6 (Dec. 1991), 1128—1147.

[91] MILLER, R., AND STOUT, Q. Geometric algorithms for digitized pictures on
a mesh-connected computer. IEEE Trans. on PAMI 7, 2 (1985), 216-228.

[92] NASSIMI, D., AND SAHNI, S. An optimal routing algorithm for mesh-connected
parallel computers. Journal of the ACM 27, 1 (Jan. 1980), 6—29.

[93] NASSIMI, D., AND SAHNI, S. Data broadcasting in SIMD computers. IEEE
Trans. Computers 30, 2 (Feb. 1981), 101-107.

[94] PELEG, D., AND ULLMAN, J. D. An Optimal synchronizer for the hypercube.
SIAM J. of Comput. 18, 4 (Aug. 1989), 740—747.

[95] PETERSON, J. C., TUAZON, J. O., LIEBERMAN, D., AND PNIEL, M. The
Mark III hypercube-ensemble concurrent computer. In Proc. Intn’l Conf. on
Parallel Processing (1985), pp. 71-75.

122

[96] PREPARATA, F. P., AND VUILLEMIN, J. The cube-connected cycles: A versa-
tile network for parallel computation. Communications of the ACM 24, 5 (May
1981), 300-309.

[97] QUINN, M. J. Designing Eﬂicient Algorithms for Parallel Computers. McGraw
Hill, 1987. -

[98] RANKA, 8., AND SAHNI, S. Image template matching on MIMD hypercube
multicomputers. In Proc. Intn’l Conf. on Parallel Processing (1988), pp. 92-99.

[99] RANKA, 5., AND SAHNI, S. Hypercube algorithms : with Applications to Image
Processing and Pattern Recognition. Springer-Verlag, 1990.

[100] SAAD, Y., AND SCHULTZ, M. H. Data communication in hypercube. Tech.
Rep. YALEU/DCS/RR-428, Dept. of Computer Science, Yale Univ., 1985.

[101] SAAD, Y., AND SCHULTZ, M. H. Topological properties of the hypercubes.
IEEE Trans. Computers C-37, 7 (July 1988), 867-872.

[102] SCHWABE, E. J. On the computational equivalence of hypercube-derived net—
works. In ACM 2"“ Symposium on Parallel Algorithms and Architectures (1990),
ACM, pp. 388-397.

[103] SEITZ, C. L. The cosmic cube. Communications of the ACM 28, 1 (1985),
22-33.

[104] SHAPIRO, S. L., AND TEUKOLSKY, S. A. Building black holes, gravitational
waves, and relativistic ﬂuid ﬂow: Supercomputer cinema. In Proc. of Super—
computing Conference (1990), pp. 805—814.

[105] STONE, H. S. Parallel processing with the perfect shuffle. IEEE Trans. Com-
puters C-20, 2 (Feb. 1971), 153-161.

[106] SULLIVAN, H., AND BASHKOW, T. R. A large scale homogeneous, full dis-
tributed parallel machine, I. In Proc. 4th Symp. Comput. Architecture (1977),
pp. 105-117.

[107] TIEN, J.—Y., HO, C.—T., AND YANG, W.—P. Broadcasting on incomplete
hypercubes. Tech. Rep. RJ 8130, IBM Corp., 1991.

[108] TUTTE, W. T. Graph Theory. Addison-Wesley Publishing CO., 1984. .

[109] TZENG, N.-F. Structural properties of incomplete hypercube computers. In
Proc. Intn’l Conf. on Distributed Computing Systems (1990), pp. 262-269.

123

[110] TZENG, N.-F., CHEN, H.-L., AND CHUANG, P.-J. Embeddings in incomplete
hypercubes. In Proc. Intn’l Conf. on Parallel Processing (1990), pp. III-335-
111-339.

[111] TZENG, N. -F. ,AND WEI, S. Enhanced hypercubes. IEEE Trans. Computers
040, 3 (Mar. 1991), 284—294.

[112] WAGNER, A. Embedding arbitrary binary trees in a hypercube. Journal of
Parallel and Distributed Computing 7 (1989), 503-520.

[113] WU, A. Y. Embedding of tree networks into hypercubes. Journal of Parallel
and Distributed Computing 2 (1985), 238-249.

[114] YANG, G.-C. Parallelizing spice2 on shared-memory multiprocessors. In Proc.
Intn’l Conf. on Parallel Processing (1991), pp. III—151—III-158.

STATE UNIV LIBRRRIE

llll|l[ll|||2|]lll]|3 ollllle I]

[I] III ll