I" . . - . 1'. 1 ‘\" o I. '1 {1’ .1 ‘ ... ‘." - o 3 w‘ —4 A- .4... #1. -- f I \f’ 0" r. a - :‘W‘f‘ 3".“ p O .L_ mass 3 2‘ :- 131' I I, . Wt" ., { "I, a ' K' _’;. - .- . 1“} ' . ’1' (“ 1 . . .' . . '. . " 1“ - " $5.923?“ ”1:2". "I 11 '1 33";‘2‘63' 3' ' , .. - -2 .1" - . 3 a;fitf"?‘~.htu.:-J~,A 1,; . .' ' I ‘1 "‘71131*-/‘.’>; 7. ' 1 _ .- - .. 5 H ‘ v-V- “fi’fifng- “1'7.” . ‘ . I ‘ . To 'fi II . ' v.14x11 .uhl l1 I .‘u‘u, 9 .q, o ‘. ‘ .., ' 2' 1“,,” ' .r‘ ”1",.1" J 11' '.1 ' '1'." n l u. q. (m. 1‘ '_ . . .1 l ' k,’ ‘ -.r.v,x 7131*“)5'. :23! ‘7‘: I“, 21.0.. n 3" ‘.l" 1.1”. ;: 91.11111; r ,1” I ;: . ‘ '. .1 fififl'111'0.1~1.-.:l._ _ , {x ' ’4"{5:fi'{g' ' 1?} .n‘: . g . _.E A 13.21”: .r‘i1b'h' 111?";1' If”;""- . ." '11,, "E1“. ' ' ‘ (gig)? «yummy. 11"!“ h .1 Elf .'P:Ed‘®]vbm_ "Huff; “$311515 1 . ‘ ' ‘ - ' .0. If . 11o . ' .0. ~00 k~v~1.v\ ‘5’. . I ..’ I "\’ O O‘..- . , v 0 O. “I" I 'Lw. ‘ ‘ O "C |- ’..L :'..‘ '3 1" ,' 4 . . on - . 1.}:1‘L4‘J .‘5’ ‘ . . -' I '3 a .0 . n 11"“... ’vun‘, . 1V, ‘ ' . .‘9 d o I . . , ' )5... '.v;~b’.’ . I' ‘ . ' I 0‘ . :T . V." .rv.’ . a . c 1"- (WEI-{‘2’ ,' '. I U} '31.!“ 3 , I . ... r 1. ; ... ‘ ml .'-- , a - .- 11'" . 1““. "1.3”“. . -. 0,1 . 1' _ . , , ‘ . IO": ""1 ' ,lnY '1’... ‘r ‘ ‘v . .- ’ I ' "o . .- l . i ‘ '7 W . 1 1 1 . g a . . ‘3'- .. 0 -- HI 'I‘ z ., t ' o .. - .r M. O I 1' ' ' ' u o» . . ‘ I 0 .U " : ‘ d . . 0 In]. . ' . 1 I‘ o . -I- , 3 x, ' ‘ . ' .. 4 i. I - O ..l, . .'. f O h . I} k u 3 t . ‘ I ‘ ; , ' It. ' . I - - - - O u ‘ I‘ . _ . I ' .. . 1 _ 1‘ V -- l.; - - .‘1 - ‘ '.. 5-. . ‘.§ 03‘ . ._ r‘: I! 1 IL - I I I' " - ..-’u‘<’.fl’.'. -‘ ' 1‘1”" HI- | "4;- "I D 1 "1‘. T... .--v—.: 1‘ . .flHI'H 'Irfl“‘. Il‘l ’ .. . . 421'? “téfifi .‘e’uo' "3%,: 4' :Q‘L‘GIS; 1"." ru‘fcrltcfl' 5&3“: 1.5414 619:?” #1.. _'1'..I ”‘5 1K - $5 '0. ‘1? E“ L? 11,. {3* RR. ‘44. in la- ‘33:" “9‘8, ‘ rt :1 5" L -“ “‘1'? at” I J. J .3 I v; .‘ “12:”; L5?!) {13"} N I ‘3 ' .fl'r‘ fa gal-F1 ' .114 .5; : ‘ ‘ >‘.— , ’7: ‘ If. F {‘41 ‘ “and“ :’:;,,l- '»’:‘., (F 1-51" Ilill'lllllllllllillfiHills 3 1293 00877 3537 This is to certify that the dissertation entitled GRAPH WEDDING AND DATA COMMUNICATION IN A LARGE FAMILY OF NETWORK TOPOLOGIES presented by Jenshiuh Liu has been accepted towards fulfillment of the requirements for DOCTOR—— degree in W A4,] Major professor Date 7% Zfig /fja MSU is an Affirmative Action/Equal Opportunity Institution 0-12771 LlBRARY ‘ | Michigan State 1 Unlverslty PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or betore one due. DATE DUE DATE DUE DATE DUE MSU to An Affirmative Action/Equal Opportunity Institution emunS-m GRAPH EMBEDDING AND DATA COMMUNICATION IN A LARGE FAMILY OF NETWORK TOPOLOGIES By J enshiuh Liu A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Computer Science August, 1992 .4 W- {5‘53 ABSTRACT GRAPH EMBEDDING AND DATA COMMUNICATION IN A LARGE FAMILY OF NETWORK TOPOLOGIES By J enshiuh Li'u. In a parallel computer or a distributed system the interconnection pattern (net- work topology, interconnection topology) of the communicating modules is one of the key factors that determine the efficiency, reliability, and applicability of the system. MOst previous results in the network design area are about a single kind of inter- connection topology; relatively few have considered fault-tolerance. Here we present new results about graph embedding and data communication for a large family of network topologies, which consists of the generalized Fibonacci cube [63, 80] and the Incomplete Hypercube [68]. This family includes the hypercube as a special case. However, unlike the hypercube, both the generalized Fibonacci cube and the Incom- plete Hypercube are irregular graphs in general. The Incomplete Hypercube offers unrestrictive network size; however, it suffers from a low degree of fault-tolerance under certain situations. On the other hand, the generalized Fibonacci cube offers structural recurrences and also provides a guaranteed degree of fault tolerance. This family of network topologies has two major areas of application: (1) Fault Tolerance — each member network serves as an alternative structure for reconfiguring a hypercube in the presence of faults, and (2) Incremental Expansion - allowing a system to add nodes in small increments, which is an important advantage for systems that evolve with time. To investigate their applications in parallel or distributed systems, we present a collection of provably efficient graph embedding and data communication algorithms for this large family of network topologies. The study of graph embedding is to de- termine whether these networks can flexibly simulate other common networks. The results can be applied to the emulation of a given network on a family of networks; analysis of these results offers insights about the structural relationships between dif- ferent networks. On the other hand, the study of common communication primitives provides efficient tools for developing application algorithms for this family of net- works; it also offers insights about the relations between forms (structural properties) and functions (algorithms). Specifically, we demonstrate how to embed linear array, ring, mesh, and hypercube on the generalized Fibonacci cube; We also show how to embed linear array and ring on the Incomplete Hypercube. In addition, efficient algorithms for single node broadcasting/accumulation, single node scatter/gather, and multinode broadcasting/ accumulation are presented for both the generalized Fi— bonacci cube and the Incomplete Hypercube. All of these algorithms are carefully analyzed under both the all- and one-port models. Copyright © by J enshiuh Liu August, 1992 To my parents and my wife ACKNOWLEDGEMENTS I would like to thank Professor Herman D. Hughes, my thesis co-advisor, for his guidance, intellectual advice and support. Many thanks to Dr. Moon Jung Chung for his suggestions and careful reading the manuscript. I am grateful to my other guidance committee members, Professor Carl V. Page, Professor T. Y. Li and Professor Richard 0. Hill, for their help, encouragement and guidance. I am indebted to my wife Bi-Yun for her support, understanding and encourage- ment during my stay at MSU. Without her love and support, this thesis would not have been possible. I am grateful to my parents for their support, to my sons Sean and Andy for giving me a lot of joy and inspiration during the writing of this thesis. vi TABLE OF CONTENTS LIST OF TABLES ix LIST OF FIGURES x 1 INTRODUCTION 1 1.1 Network Topology ‘ ............................ 2 1.2 Motivation ................................. 4 1.2.1 Hypercube ............................. 5 1.2.2 Incomplete Hypercube and Generalized Fibonacci Cube . . . . 6 1.2.3 Graph Embedding and Data Communication .......... 7 1.3 Overview .................................. 8 2 PRELIMINARIES 11 2.1 Incomplete Hypercube .......................... 11 2.2 Generalized Fibonacci Cube ....................... 12 2.3 A Characterization of Generalized Fibonacci Cube ........... 15 2.4 Basic Properties of Generalized Fibonacci Cube ............ 19 2.4.1 Number of Edges ......................... 20 2.4.2 Node Degrees ........................... 23 2.5 Summary of Results ........................... 25 GRAPH EMBEDDING IN GENERALIZED FIBONACCI CUBE 27 3.1 3.2 3.3 3.4 3.5 Hamiltonian Path ............................. 29 Cycles ................................... 31 Mesh .................................... 37 3.3.1 2D Mesh .............................. 37 3.3.2 3D Mesh .............................. 44 Hypercube ................................. 45 Summary of Results ........................... 46 vii 4 DATA COMMUNICATION IN GENERALIZED FIBONACCI CUBE 49 4.1 Routing .................................. 51 4.1.1 Shortest path ........................... 52 4.1.2 Deadlock .............................. 56 4.2 Single Node Broadcasting ............ i ............ 57 4.3 Analysis of Single Node Broadcasting .................. 64 4.3.1 All-port Model .......................... 64 4.3.2 One-port Model .......................... 65 4.3.3 Optimality of One-port Algorithm ................ 73 4.4 Single Node Accumulation ........................ 75 4.5 Single Node Gather and Scatter ..................... 76 4.5.1 One-port Model .......................... 80 4.5.2 All-port Model .......................... 81 4.6 Multinode Broadcasting and Accumulation ............... 83 4.7 Summary and Remarks .......................... 85 5 GRAPH EMBEDDING AND DATA COMMUNICATION IN IN- COMPLETE HYPERCUBE 88 5.1 Hamiltonian Problem ........................... 90 5.1.1 Cycles ............................... 91 5.1.2 Hamiltonian Path .................. . ....... 93 5.2 I Routing and Single Node Broadcasting ................. 94 5.3 Analysis of Single Node Broadcasting .................. 97 5.3.1 All-port Model .......................... 97 5.3.2 One-port Model .......................... 97 5.4 Single Node Accumulation ........................ 104 5.5 Single Node Gather and Scatter ..................... 104 5.6 Multinode Broadcasting and Accumulation ............... 106 5.7 Summary of Results ........................... 107 6 CONCLUSION AND FUTURE WORK 109 6.1 Main Results and Contributions ..................... 109 6.2 Future Work ................................ 111 BIBLIOGRAPHY 114 viii 2.1 2.2 2.3 3.1 4.1 5.1 5.2 LIST OF TABLES Example of generalized Fibonacci numbers ................ Examples of 2—FCs and 3-FCs ....................... A comparison of the hypercube and generalized Fibonacci cube. . . . Embedding of Hypercubes ......................... Time complexities for communication primitives in I‘fr ........ 5—bit Binary Reflected Gray Codes. ................... Time complexities for communication primitives in I N. ........ ix 13 14 26 46 87 91 108 LIST OF FIGURES 1.1 The composition of the family of network topologies. ......... 2.1 Structure of Is. .............................. 2.2 Second-order Fibonacci cube of dimensions 2 to 6 ............ 2.3 Third-order Fibonacci cube of dimensions 3 to 7. ........... 3.1 F: contains a cycle of length 20 ...................... 3.2 Embedding of cycles in F3; : (a) a decomposition of F2, (b) Case 1, and (c) Case 2 .................................. 3.3 Embedding of an F {E21 x F9] and an FF] x FF] meshes in F3. ..... 3.4 Dividing an (m + n)-bit 2-FC into an m- and n-bit 2-FCs. ...... 3.5 Dividing an (m + n)-bit 3-FC into an m- and n-bit 3-F Cs. ...... 3.6 Dividing an (m + n)-bit 4-FC into an m— and n-bit 4-FCs. ...... 3.7 Embedding of meshes in Pic ........................ 3.8 Embedding of meshes in F3. . . ' ........... I .......... 3.9 Two different labelings of F3. ...................... 3.10 Embedding of meshes in Pic ........................ 3.11 Dividing an (1+ m + n)-bit 2-FC into an I-, m- and n-bit 2-FCs. . . . 3.12 Dividing an (I + m + n)-bit 3-FC into an l-, m- and n-bit 3-FCs. . . . 4.1 Routing path from 01010 to 10101 in [12,. ................ 4.2 An SBT rooted at node 0000. ...................... 4.3 Broadcast from node 01010 in I‘?, ..................... 4.4 Construction of (b) STF:(01100) from (a) SBT5(01100). ....... 4.5 Type-a node and its type-fl child ..................... 4.6 A portion of ST F:(011011). ....................... 4.7 One-port broadcasting in F3, from node 00000. ............. 4.8 An optimal one-port broadcasting in I? from node 00000 ........ 4.9 Single node gathering in B4. ....................... 4.10 (a) STF:(01100) and (b) a subgraph of SBT5(01100) .......... 4.11 One-port multinode broadcasting through the longest cycle. ..... 12 16 16 32 35 37 38 40 41 42 42 43 43 44 48 55 58 61 69 71 73 73 74 77 79 86 5.1 Embedding of longest cycles in (a) I22, and in (b) 130 .......... 92 5.2 Embedding of a cycle with length 22 in 123 ................ 93 5.3 Embedding of a Hamiltonian path in I23 ................. 94 5.4 Routing path from node 011 to 100 in Is ................. 95 5.5 A broadcasting from node 0101 in In. ................. 96 5.6 One-port broadcasting from node 0101 in In. ‘ ............. 98 5.7 Construction of STIao(01110) from SBT5(01110). ........... 98 5.8 Construction of STIgg(01110) from SBT5(01110). ........... 99 5.9 Type-a node and its type-fl child ..................... 102 5.10 STIu(1010). ............................... 103 5.11 (a) 5T122(01110) and (b) a subgraph of SBT5(01110) .......... 105 xi CHAPTER 1 INTRODUCTION High performance computers have become indispensable for many application areas such as engineering design, system modeling and simulation, medical and basic science research [4] [13] [27] [30] [35] [49] [65] [74] [104] [114] . Two fundamental approaches have been employed to improve the speed of computers. The first approach is to increase the speed of the circuitry used in building the computers. The second ap- proach, which is commonly called parallel processing, is to use multiple processors in executing multiple operations concurrently. The speed of light imposes a fundamental limit on the first approach, whereas the number of processors is virtually unlimited. Hence parallel processing appears to be the more promising approach. There are two basic models for studying the parallel computers. In the Parallel Random Access Machine (PRAM) model [45] (also referred to as the Shared-Memory model), multiple processors can randomly read or write any locations of a shared memory, and processors communicate by writing/ reading through the shared mem- ory. On the other hand, in the Distributed-Memory (DM) model, a memory module is associated with each processor, and two processors communicate by explicitly send- ing/ receiving messages. With the current technology, it is sometimes criticized that the PRAM model is unrealistic because for a large system the cost of the intercon- nection network will become prohibitively high [2]. Therefore, the DM model is much more suitable for a large-scale parallel system. Since a DM model based on the com- plete graph may not be cost-effective, networks with relatively sparse interconnections are often used for the DM systems. There are many types of networks, and each one offers different performance. Therefore, it is important, to investigate the relations between the structural characteristics and the performance under common functional criteria. 1.1 Network Topology Graphs are often used to model the network topology (i.e., the pattern of the intercon- nections) of the DM parallel / distributed systems, where a node represents a processor and an edge (link) between two nodes denotes the communication link between the corresponding processors. The following criteria are commonly applied to evaluate network topologies for the DM multiprocessor systems. 1. Degree. The degree of a graph is the maximum number of edges incident on a node. A network of a large degree usually implies more alternative paths between a pair of nodes and hence a larger bandwidth. However, a large degree may also result in a higher hardware cost or difficulty in implementation. 2. Diameter. The distance between a pair of nodes is the smallest number of edges that have to be traversed in order to reach one node from the other. The diameter of a graph G is the maximum distance between any pair of nodes in G. The diameter is often a lower bound on the time required to perform certain calculations on a network. This is because one processor i may need the information residing on another processor j, which has to be transferred from j to i. Therefore, the distance between two nodes determines a lower bound of the communication time. In general, a small diameter is desirable. 3. Bisection width. The bisection width of a graph G with N nodes is the minimum number of edges that have to be removed in order to dissect G into two subgraphs with [IV/2] and [N / 2] nodes respectively. The bisection width is often a critical factor in determining the speed with which a network can perform computations that involve a major exchange of information between two parts of the network. For example, to sort a list of data in a network into certain order, the data items residing in one part of the network may need to be swapped with those residing in the other part. Clearly, under this situation, this operation has to be achieved through the set of links connecting the two halves. Therefore, the bisection width of the network determines a lower bound on the time required for sorting on this network. 4. Expansibility. The expansibility of a network measures the ease to add or delete nodes to/from the network. This consideration is important especially when the network is still evolving or when it is necessary to construct the system incrementally. 5. Fault-tolerance. With a large number of processors/ links in systems, the probability of having a processor or link failure increases significantly. It is desirable to have a network that provides independent paths between all pairs of nodes and hence a higher degree of fault-tolerance. Notice that some of the criteria mentioned above are actually related or even in conflict, hence they cannot be improved without compromising the others. For example, a low degree usually translates into a large diameter, low bisection width, fewer alternative paths and hence low fault-tolerance. The designer, therefore, will have to balance between the cost constraints and the performance requirements of the network. 1.2 Motivation A graph is regular if each node has the same degree [108]. Most network topologies for the DM model are based on regular graphs [2] [46] [75] [97]. Very limited knowledge has been accumulated for irregular network topologies fOr lack of appropriates tools. The driving force behind this work is to further our knowledge on network topologies that display less regularity. Also, instead of proposing a set of unrelated networks, we aim to provide a family of networks, where the interconnection patterns and the performance features can be characterized by a set of parameters. The family of network topologies studied here consists of the generalized Fibonacci cube [63, 80] and the Incomplete Hypercube [68]. Both the Generalized Fibonacci Cube (GFC) and the Incomplete Hypercube (IHC) are irregular topologies in general. However, both of them contain the hypercube (HC) as a subset; moreover, each member of this family (i.e., a generalized Fibonacci cube or an Incomplete Hypercube) is a subgraph of a hypercube. Figure 1.1 shows the constitution of this family (see Chapter 2 for details). The generalized Fibonacci cube stems from the Fibonacci cube (FC) proposed by Hsu [61, 62], which is also a subset of the generalized Fibonacci cube. In this thesis, we will study efficient graph embedding and data communication Figure 1.1. The composition of the family of network topologies. for the members of this family. 1.2.1 Hypercube Numerous network topologies have been proposed and studied. Among them, the trees, meshes, meshes of trees, and the hypercube have drawn the most attentions (see, for example, [2] [46] [75] [97]). Many useful computations, such as matrix operations [2] [75] [97] and image processing [32] [33] [91], can be naturally mapped into trees, meshes, and meshes of trees. However, trees, meshes, and meshes of trees suffer from a low bisection width or a large diameter, which prevents the development of logarithmic-time parallel algorithms for fundamental applications such as sorting. The hypercube (or Boolean cube, binary n-cube, n-cube) of dimension n (denoted by B”) consists of 2" nodes and n2"'1 edges. The nodes are labeled from 0 to 2" — 1 and each node has an n-bit binary code (called address) such that two nodes are connected if and only if the binary codes of their labels difler in exactly one bit. Re- cently, the hypercube has become a popular network topology for parallel processing [55] [103] [106], because it has many appealing properties such as the logarithmic diameter and node degree, high bisection width, and ease to embed other common structures [19] [20] [39] [44] [60] [75] [101] [112] [113]. Intel’s iPSC, NCUBE, Ametek 14 and FPS-T series are a few examples of commercially available parallel computers based on the hypercube topology. Many applications have been successfully devel- oped for the hypercube, e. g., image processing [31] [98] [99], matrix algebra [18] [36] [47], Particle-in-Cell (PIC) [86], and others [46]. Although the hypercube is a powerful and popular topology for parallel computa- tion, different variations have been proposed to improve certain characteristics, such as the diameter and node degree, of the hypercube. The approaches can be generally divided into the following three categories: (1) twisted hypercube [42] [56], (2) en- hanced hypercube [40] [41] [111], and (3) bounded-degree hypercube-derivatives e.g., butterfly, cube-connected cycles (CCC) [96], and shufl‘le-exchange [105]. These networks provide either smaller diameter, or shorter average internode distance, or low node degree. However, their permissible sizes are generally restricted to powers of two. As such, they have lower expansibility than the networks which we are about to study. 1.2.2 Incomplete Hypercube and Generalized Fibonacci Cube The Incomplete Hypercube proposed by Katseff [68] improves the expansibility of the hypercube. The Incomplete Hypercube with N nodes (denoted by I N), where N may be any positive integer, is constructed in the same way as the hypercube. In other words, nodes are labeled from 0 to N — l, and two nodes are connected by an edge if and only if their binary representations differ in exactly one bit. It can be shown that [N is a subgraph of B,, where n = [log N]." Algorithms for routing and single node broadcasting have been devised by Katseff [68]; Tzeng [109] has also analyzed some structural properties, such as diameter, internode distance and traffic density, of the Incomplete Hypercube, which are shown to be comparable with that of the hypercube. Furthermore, Tzeng et al. [110] have shown that binary trees and two-dimensional (2D) meshes can be embedded in the Incomplete Hypercube. These existing results suggest that the Incomplete Hypercube is a. useful network topology. However, the Incomplete Hypercube suffers from two shortcomings. One is low degree of fault tolerance under certain situations. For example, a node may have a degree of one when an Incomplete Hypercube contains exactly 2" + 1 nodes. The other shortcoming is the lack of recurrence in its structure, which makes the designing of certain application algorithms in the Incomplete Hypercube much more difficult. ‘In this thesis, all logs are to the base 2. Hsu [61, 62] has proposed the Fibonacci cube based on the Fibonacci numbers [53, 71]. Recently, he also presented the generalized Fibonacci cube [63] based on the generalized Fibonacci numbers (see, e.g., [72]). The generalized Fibonacci cube can be defined by two integral parameters n and I: where n Z I: Z 2. The lc‘" order Fibonacci number is defined by Fl,” = F31], + F31], + - - - FEE], for n 2 k, and F3" = F,“ = = Flt], = o,F":1, = 1. Informally, the 1:“ order Fibonacci cube of dimension n (denoted by F: ) consists of F,["] nodes; the nodes are numbered from 0 to Fl,” - 1 and each node has a unique address, called generalized Fibonacci code, such that two nodes are connected by an edge if and only if their generalized Fibonacci codes differ in exactly one bit. The generalized Fibonacci cube bears two important features: (1) F: can be decomposed into I: subcubes: FL, , FL.“ - - ~ , I‘" k which are of "— sizes F331, Fflz - - - , Fifi,“ respectively, and (2) Every node has a logarithmic number of degree (as a function of the total number of nodes). Therefore, the generalized Fibonacci cube offers structural recurrences and an improved fault tolerance over the Incomplete Hypercube. 1.2.3 Graph Embedding and Data Communication When using DM parallel computer systems, we often partition an application into many tasks and map (assign) the tasks to the processors. The interaction among tasks located on different processors are achieved through sending and receiving messages. This inter-processor communication is usually modeled by a graph which will be referred to as the guest graph. On the other hand, the graph that represents the network topology of a given parallel system will be referred to as the host graph. Let G = (Va, Ea) denote a guest graph and H = (V3, 133) a host graph, where Va and V” denote the node sets of the guest and host graphs respectively, and E0 and EH edge sets. The embedding of G in H is a function f that maps each node in G into a node in H. Dilation and expansion are two common metrics used in evaluating graph embeddings: the dilation of the embedding is ma:c(dis( f (i), f (j )), V(i, j) E Ea), and the expansion of the embedding is [if/‘3], where [VGI denotes the cardinality of the set Va, Will the cardinality of V”, and dis(i, j) the distance between two nodes i and j. A parallel computer should allow the users to write programs based on the algo- ' rithmic structures (guest graphs) rather than the underlying structure of the system (host graph). Each application usually dictates a certain guest graph when it is mapped into a parallel computer system. Our study of graph embedding can provide useful tools for emulating various guest graphs on a given host graph. Furthermore, the study of graph embedding will provide insight about the relations among different network topologies. Advances in the VLSI technology have enabled us to construct parallel computer systems consisting of hundreds or thousands processors. This trend has contributed to an ever finer granularity in parallelism. Consequently, the computation per com- munication is becoming smaller and smaller. With the current technology, it is easier to build a high speed processor than to construct a high bandwidth interconnection network. Therefore, efficient data communications are critical for high performance parallel computers. In this thesis, we will demonstrate the effectiveness of the family of network topologies by designing tools for graph embedding and data communica- tion in it. 1.3 Overview ’We now give a brief overview of this thesis. In Chapter 2, we review the Incomplete Hypercube and some of its properties. Then we formally define the generalized Fibonacci cube, characterize its recurrence relation, and study some of its basic properties such as the number of edges and node degrees. In Chapter 3, we present some graph embeddings in the generalized Fibonacci cube. The Hamiltonian problem is to determine whether a graph contains a spanning (Hamiltonian) path or cycle. In [52], it is considered as one of the fundamental problems in graph theory. The linear array and ring are two common structures which have many applications in parallel processing [2] [10] [75] [97]. We will show that 1‘: contains a Hamiltonian path, which implies the embedding of linear array. Then, by decomposing the generalized Fibonacci cube into subcubes and properly rearranging the Hamiltonian paths in the subcubes, we prove that each member of the generalized Fibonacci cube contains a Hamiltonian cycle if and only if it has an even number of nodes; on the other hand, for members which do not contain a Hamiltonian cycle, the longest cycles contain all but one node. This result implies that a ring of size Fl,” can be embedded in F]: with a dilation no more than two. The mesh structure is commonly used in many applications [2] [10] [75] [97]. Therefore, we will also present the 2D and 3D mesh embeddings. In addition, we show that the hypercube can be embedded in a generalized Fibonacci cube, even though each generalized Fibonacci cube is a subgraph of a hypercube. In Chapter 4, we study the data communication in the generalized Fibonacci cube, where we devise and analyze efficient algorithms for the following data communication primitives: 1. Routing: a node sends a message to another node, 2. Single node broadcasting (accumulation): a node sends a message to all the other nodes (receives messages from all the other nodes), 3. Single node scatter (gather): a node sends different messages to all the other nodes (receives one message from all the other nodes), and 4. Multinode broadcasting (accumulation): every node sends a message to all the other nodes (receives a message from all the other nodes). 10 These primitives are studied because broadcasting is used in many algebraic compu- tations, see, for example [18] [48] [66]; accumulation can be applied to, for instance, processor synchronization [94]; scatter and gather are commonly used in, for exam— ple, prefix computation and tree contraction [1] [12] [14]. [64] [73] [88]‘[89] [90]. As in [9, 67], we will analyze all of our data communication algorithms under two different models. In the one-port model, one assumes that only one port on every node can be used for sending and receiving data (one-port model) at any moment. On the other hand, in the all-port model, one assumes that all the communication ports on every node can be used for sending and receiving data at any time. In Chapter 5, we consider the Hamiltonian problem and data communication in the Incomplete Hypercube. By employing the Binary Reflected Gray code (see, e.g., [25, 101]), we show that, for N Z 4, I N contains a Hamiltonian cycle if and only if N is even, whereas the longest cycle contains all but one node if and only if N is odd. Consequently, I N contains a Hamiltonian path if N is even. On the other hand, when N is odd, we are still able to construct a Hamiltonian path by substituting one edge in the longest cycle with another one. Therefore, we have optimal embeddings of linear array and ring in the Incomplete Hypercube. For data communication, we demonstrate that all of the communication primitives mentioned earlier (single node broadcasting and accumulation, single node scatter and gather, and multinode broadcasting and accumulation) can be also efficiently implemented in the Incomplete Hypercube under both the one- and all-port models. In Chapter 6, we summarize our main results, highlight contributions, and discuss some directions for future work. CHAPTER 2 PRELIMINARIES In this chapter, we present the basic properties of the family of network topologies. As mentioned earlier, the family of network topologies of our interest consists of both the Incomplete Hypercube [68] and the generalized Fibonacci cube [63, 80]. A graph G is represented by (V, E), where V denotes the set of nodes and E the set of edges. The Hamming distance between two binary strings is the number of bits where these two strings differ. 2.1 Incomplete Hypercube The Incomplete Hypercube was first proposed by Katseff [68]. An Incomplete Hyper- cube with N nodes (where N may be any positive integer) is constructed in the same way as the hypercube, i.e., two nodes are linked if and only if the Hamming distance between their addresses (i.e., binary codes) is exactly one. Formally, let H(i, j) de- note the Hamming distance between the binary representations of two integers i and j. The Incomplete Hypercube can be defined as follows. Definition 2.1 [68] Let N 2 1 denote an integer. The Incomplete Hypercube with N nodes, denoted by IN, is a graph (V, E), where V = {0,1,---,N — 1} and E = {(isj) I H(i,j)=1,f07'0 S 5,1. S N ‘- 1} 11 12 Figure 2.1 shows the structure of Is, where nodes are labeled by their addresses. The - Figure 2.1. Structure of 16- following two lemmas are important in order to see the applications of the Incomplete Hypercube. Lemma 2.1 IN is a subgraph of IN“ for any N Z 1. Lemma 2.2 [68] IN is a subgraph of B" where n = [logN]. With Lemma 2.1, we can build the Incomplete Hypercube incrementally. In other words, for example, to have an I4, we may construct I; first, then add one node and one edge to form 13, and finally add one more node and edge. On the other hand, Lemma 2.2 suggests that the Incomplete Hypercube may have fault-tolerant applications in the hypercube. For example, when any single node fails in Ba, the remaining network can still be used as an I7. 2.2 Generalized Fibonacci Cube In this section, we will define the generalized Fibonacci cubes. For this purpose, we first examine the generalized Fibonacci numbers (see e.g., [72]). The lc‘h order Fibonacci numbers, denoted by Elf], are defined by the following equation Fik] = Frill + 1:11:12 'i' ’ ' ' + Fiflk, for Tl 2 k, 13 where n"“=o for cyst—2, and F3951. In other words, each number in the sequence is the sum of the preceding 1: numbers. Table 2.1 lists the first 10 non-zero generalized Fibonacci numbers of orders 2 to 10. Notice that we have F 3:], = F ,E’“ for k 2 2; the usual Fibonacci numbers are obtained Table 2.1. Example of generalized Fibonacci numbers. .- FE FE. FE. F... FE. FE FE FE.— -1 1 1 1 1 1 l 1 1 0 1 1 l 1 l l 1 1 1 2 2 2 2 2 2 2 2 2 3 4 4 4 4 4 4 4 3 5 7 8 8 8 8 8 8 4 8 13 15 m 16 16 16 16 5 13 24 29 31 3_2 32 32 32 6 21 44 56 61 63 64 64 64 7 34 81 108 120 125 127 128 128 8 55 149 208 236 248 253 255 E when k = 2. Furthermore, it is not difficult to see that Rm 2 2““ (i.e., F!” is a power of 2) for 2 S k S i 5 2k — 1. We next introduce the Fibonacci codes which enable us to define the generalized Fibonacci cubes. Definition 2.2 Assume that n Z I: 2 2. Leti be an integer where 0 S i < Elf]. The 19"“ order Fibonacci code {k-FC) of i, denoted by FC,‘,“‘(i), is a sequence of n — 1: binary numbers (b.._k_1, ..., b1, b0) where 1. i: 293-1 b,- . Flfjk, and J 2. bj-bj+1-...-bj+k_1 =0,f01‘ OSjS(n—2k). 14 In the following, we write F C(i) for F C,‘,“](i) if k and n can be explicitly deduced from context. We adopt the following convention to number a binary string. The rightmost (least-significant) bit of a binary string is referred to as bit 0 and the second to the rightmost as bit 1, and so on. The following two observations are immediate from the preceding definition: 1. bit 1' in the la“ order Fibonacci code (k-FC) carries the weight of FE)“ and 2. No I: or more consecutive 1’s appear in k—FCs. These properties distinguish the Fibonacci codes from the (base-2) binary codes. In the generalized Fibonacci number system [17, 72], every nonnegative integer has a unique representation as a sum of distinct In“ order Fibonacci numbers (Fl-m, for i 2 Ir) such that no 10 consecutive Fibonacci numbers are used. The k-FC of an integer I can be obtained by the following greedy approach: find the greatest Fj‘k‘ that is less than or equal to I, assign 1 to bit j — k, then proceed recursively on I — Ff”; finally, all unassigned bits are set to 0’3. Table 2.2 lists 2-FCs and 3—FCs for all integers from 0 up to 12. Table 2.2. Examples of 2-FCs and 3-FCs. FCWG) FCWU) z FCmfi) chu) 00000 0000 7 01010 1000 00001 0001 8 10000 1001 00010 0010 9 10001 1010 00100 0011 10 10010 1011 00101 0100 11 10100 1100 01000 0101 12 10101 1101 01001 0110 OQU‘rFOOMt—lOon. 15 The definition of the generalized Fibonacci cubes is based on the Fibonacci codes and the Hamming distance. Definition 2.3 The let“ order Fibonacci cube of dimension n, denoted by 1‘: (where n Z I: Z 2), is a graph (V:,E,’f), where V: = {0,1,..,F,‘,“] — 1} and E: = {(i,j) | H(FC,‘,"1(i),FC,[,"l(j))= 1, 0 g i, j < F34}. Define r: = (¢,¢) for 0 g i g k — 2, and I”ti—1 = ({0}’¢)- Recall that Fit], = F9] for I: 2 2. Definition 2.3 implies that both FL, and I"; represent the same graph and they contain exactly one node. Figure 2.2 shows the structure of the second-order Fibonacci cubes (or Fibonacci cubes as in [61, 62]) of dimensions 2 to 6, where nodes are labeled by 2-FCs. It can be seen that 1‘3, can be partitioned into two subcubes: one subgraph (induced by the black nodes in Figure 2.2) is isomorphic to FL, and the other (induced by the white nodes) is isomorphic to FLT Figure 2.3 shows some of the third-order Fibonacci cubes. Similarly, P: can be partitioned into three subcubes: the first one (the black nodes in Figure 2.3) is isomorphic to F ‘24, the second (grey nodes) to FL, and the third (white nodes) to I‘:_3. The following observation is important: F: is a subgraph of 3..-]. for n _>_ k. In the following section, we will prove this observation and other properties of the generalized Fibonacci cubes. 2.3 A Characterization of Generalized Fibonacci Cube It is clear that the definition of the generalized Fibonacci cubes is analogous to that of the Boolean cubes. In this section, we study some basic properties of the generalized Fibonacci cubes and their relations with the Boolean cubes. It is necessary to examine the properties of lc-FCs at this point. For convenience, we view each k-FC as a string. 010 a 1‘, I r3 r4 1 so: r3 oooo oozo 1000 1010 0001 31 noon 1°“ moo 0110 1100 0101 no: r, Figure 2.3. Third-order Fibonacci cube of dimensions 3 to 7. Let a and ,6 denote two strings, and let A represent the null string. The following notations will be used: the concatenation of two strings or and fl is denoted by a - fl, or simply afl. Define A - a = a - A = a. We use 01" to represent the concatenation of a string a with itself for i times, where i 2 1; define a0 = A. For a set of strings S, definea-S={a-fl|,365}.‘ Let V: denote the set of lc-FCs for integers between 0 and Fl,” — 1, where n > k. Define VLI = V}: = {A}? It should be clear that the nodes in F: can be labeled ’Notice that I"‘_l and I‘: each contains exactly one node. We use A to address the node in this case. 17 exactly by the codes in 11:. From Table 2.2, we can observe that V: = {0,1}, V} = {00,01, 10}, and v: = {000,001,010, 100,101} = {000,001,010} 0 {100,101} = 0- V} U10. V3. Moreover, V: = {A}, V} = {0,1}, and V53 = {00,01,10,11}, and V: = {000,001,010,011,100,101,110} = {000, 001,010,011} 11 {100,101} u {110} = 0. V: U10- V2 U110- V3. This observation leads to the following lemma that describes an important recurrence of the set V5. Lemma 2.3 Assume that I: 2 2, and n 2 2k. Let V: denote the set of lc-FCs for integers between 0 and F,‘,"1 - 1. Then v}: = 0 - v,';_, u 10 - v,’:_, u ---1"-’0 - v,':_,,+, u1k-10-v,’:_,,. Proof: By induction on n. D We next examine the recurrence of the generalized Fibonacci cubes. By definition, F: contains Fl,” nodes. Recall that F,[“1 is defined as the sum of the lc preceding gen- eralized Fibonacci numbers. Lemma 2.4 shows that 1‘: contains lc disjoint subgraphs that are isomorphic to I"‘_1 , FL” - - - , and Ff,_k respectively; mOreover, each node in a subcube of dimension j is connected to one node in every subcube of dimension i, provided j < i. Lemma 2.4 Assume that k 2 2 and n 2 2k. Define 9::(2') = 1‘0 - V"_1_,-, and Gf,(i) the subgraph induced by 9:0) in Pi, where 0 S i < 10. Then 1‘: can be partitioned into Gf,(0) , Gf,(1) , -- - ,G:(lc — 1) such that (1) Gf,(i) is isomorphic to FL _,- for 0 S i < lc, and 1 (2) For two nodes :1: and y in Ff, such that a: in Gf,(i), and y in Gf,(j) and 0 S i < J‘ < k, the edge (:r,y) is in r: if and only if y — a: = F‘k‘ n-l—i' Proof: Property (1) is an immediate result of Lemma 2.3. It remains to show property (2). The if part follows from Definition 2.3. We need to prove the only if 18 part. By definition, we have g:(i) = 1‘0 VLF,- and g:( j) = 1’0 V:_1_j. By recur- sively applying Lemma 2.3 on 13:44, we can deduce that 95(5) contains 1‘01""“0 V:_1_,,-. Notice that 95(1) can be rewritten as PIN-“10 VLF,- . Since (any) is an edge in F 1:, they must differ in the (n — lc — l — i)“ bit. Therefore, the difference between y and a: is ‘k‘ D n-l-i' Example 2.1 Let k = 3 and n = 6. We have 93(0) = {000, 001,010,011}, 93(1) = {100,101,} and 93(2) = {110}. Figure 2.3 shows that F: can be partitioned into G3(0) (black nodes), G20) (gray nodes) and Gg(2) (white node), which are, respectively, isomorphic to F2 , F3 and F3. Furthermore, G20) is connected to 03(0) by the edges (100, 000) and (101, 001); G§(2) is connected to G2(0) by the edge (110, 010), and G§(2) is connected to G2(1) by the edge (110, 101). The relation among different generalized Fibonacci cubes is captured by the fol— lowing lemma. Lemma 2.5 Assume that k _>_ 2 denotes an integer. (1) 1‘: is a subgraph of Ff,“ for n _>_ k, and (2) I": is a subgraph of F2“ for n > 1:. Proof: By Definition 2.3. Omitted. Cl Lemma 2.5 has the following two implications. 1. We may build a generalized Fibonacci cube incrementally. In other words, for example, to construct F3, we can first build F: then adding nodes and edges to form 1‘3, F3, and finally F3. 2. The generalized Fibonacci cube defines a class of network topologies with a large alternatives for fault-tolerance. This is because when certain nodes and / or links 19 fail in Ft, we may form a Ff. (where 2 S lc’ S k and lc < n’ S n) from the remaining nodes and links. It is known that nodes in BM}. can be addressed by (n — lc)-bit binary codes where n > It, similarly it is seen in Section 2.2 that nodes in "F: can also be addressed by (n — lc)-bit Fibonacci codes. Since the set of m-bit Fibonacci codes is a subset of all m-bit binary codes, we have the following relations between the generalized Fibonacci cube and the Boolean cube. Lemma 2.6 Assume that k 2 2 denotes an integer. (1) 1‘: is isomorphic to 8..-], for all n, where I: S n < 2k. (2) F: is a proper subgraph of Bn._1, for all n 2 2k. One way to obtain F: from BM}, is to remove every node (along with their inci- dent edges) whose binary address is not a lc-FC. Lemma 2.6 also indicates that the generalized Fibonacci cube may have fault-tolerant applications for the hypercube. We now closely reexamine Fig. 1.1 that shows the constitution of the family of network topologies under our study. Recall that F,‘,“] = 2"“‘ for 2 S k S n S 2k — 1. Therefore, by varying k and n, it can be shown that the hypercube is a special case (subset) of the generalized Fibonacci cube. Furthermore, Flt] = 2" -— 1 for k 2 2. Hence, when 2 S k S n S 2k, P: (which contains Fl,” nodes) and I Fitz] are isomorphic, because both are either a hypercube or a hypercube with one node missing. 2.4 Basic Properties of Generalized Fibonacci Cube In this section we study some basic properties of the generalized Fibonacci cube. In particular, we will count the number of edges and the node degrees. The number of 20 edges is an important metric in studying a network topology because it is a factor related to the cost of the network. On the other hand, the minimal degree of a topology usually sets a bound on the connectivities of a graph, which is a prime factor in studying fault-tolerance. 2.4.1 Number of Edges In this section, we count the number of edges in the generalized Fibonacci cubes. Let Elf] denote the number of edges in I“: By Lemma 2.4, 1‘: consists of k subgraphs Gf,(i), where 0 S i < k. The recurrence equation of Elf] is EL” = E53. + ESE. + - -- + E52. + Fl“. + 2F.‘.’:‘3 + - ~ + (k — 1)F.E".‘. (2.1.) where E,‘k1=0 for OSiSk and Elli-1:1- To see Eq. (2.1), we note that the first 10 terms account for the number of edges in subgraphs Gf,(i)’s and the remaining k — 1 terms account for the number of edges among these subgraphs. For example, the subgraph Gf,(0) has Em, edges, and there are Fit]: edges connecting the subgraphs Gf,( 1) to Gf,(0). We now solve Eq. (2.1). For convenience, we introduce the generating functions (see e.g., [53]) for the generalized Fibonacci numbers. Let G‘k](z) denote the generat- ing function for the In“ order Fibonacci numbers, where k 2 2. It is known (see e. g., [72]) that zlc—l G"“(z) = l—z—zz----—z" (2'2) 21 We will first solve the case when k = 2. In this case, we have 13,131: E531. +£12.12 +FI’J. (2.3) Let H [21(2) denote the generating function for Effl’s, and define El" = 0 for i < 0. By multiplying both sides of Eq. (2.3) with z“ and taking summations, we have We) 5 23,?le n=0 = 2 E2112” + 2 E5222" + z F3122" n=0 n=0 n=0 = zH‘2‘(z) + 22H‘2‘(z) + acme) = _.1__2_z2gl2l(z) l—z—z where z G‘2‘(z) = l—z—z2 as we saw in Eq. (2.2). Further calculation leads to HMO!) = ZIG‘2‘(Z)12 (2-4) It can be shown that H [21(2) is a convolution of the function G‘2‘(z) with itself and followed by a shift. Therefore, we can solve Eq. (2.4) and get n-l £3,131 = Z n‘2‘F,‘,’_‘,_, for n 2 1 (2.5) i=0 We next proceed to solve the case when k = 3. According to Eq. (2.1), we have Bits] = EEL + E5312 + Eitsls + Fria—lz + 2Fkfls fl 22 Let H [31(2) denote the generating function for EE’I’s, and define EP‘ = 0 for i < 0. Follow a similar procedure, we have 1 z—z2 H‘3‘(z) = 1 _ _ 23(z"‘G‘:”](z) + 223G‘3‘(z)) where $2 G‘3‘(z) = 1 — z — z2 -— 23 Hence, H [31(2) = [091(2)]? + 2ZlG‘3‘(«':)12 It is not surprising that the relation between H [31(2) and G‘3](z) is similar to what we observed in the second order case. TherefOre, we have n n—l E531 = 2E‘3‘F‘3‘ + 2 Z 151911519114 for n 2 1 n—i i=0 i=0 For the general case, let H [“l(z) denote the generating function for Elfl’s, and define Elm = O'for i < 0. Follow a similar procedure, we have the expression for H “1(2) : H‘k‘(z) = 1 . X l—z—zz—----z" where X = (zFGI’=1(z)+ 223Gl’=1(z) + - . - + (k — 2)z“‘1G"“(z) + (k — 1)z“G““(z)) By substituting Eq. (2.2), we have 1 _2_ H"“(z) = -,;,;_—;.,[G"“(2)]2 + ,1.-. [G"“(.-z)]2 + . -- + (k — 2)[GI"1(2)]2 + (k — 1)z[G““(z)]2 where G‘k](z) is the generating function for the lat“ order Fibonacci numbers as ex- 23 pressed in Eq. (2.2). Therefore, it can be shown that k—l n-2+j Elf] = :(k— j) Z F,"“F,‘,"_‘,+,_,. for n > k (2.6) j=l i=0 Let X: denote the number of edges in 3..-}, for n 2 k 2 2. For sufficiently large n, it can be shown that 5,1,9 0.79 S X: S 1 (2.7) where the lower bound is obtained [62] when k = 2, and the upper bound is achieved when 1‘: becomes a hypercube. 2.4.2 Node Degrees We now study the node degrees in the generalized Fibonacci cube. In particular, we will determine the maximum and minimum node degrees. Recall that nodes in F]: are addressed by (n — k)-bit k-FCs. Clearly, node 0 has a degree of n — lc, since every node adjacent to node 0 has exactly one bit with value 1 in it lc-FC. Therefore, the maximum node degree of 1‘: is n — k. In the remainder of this section we will determine the minimum node degree. Two observations are immediate: 1. For Ir 2 2 and lc S n < 21:, by Lemma 2.6, F: is isomorphic to Bn-k. Hence all nodes have the same degree of n — k. 2. For l: 2 2 and n = 2k, F: is obtained by removing the node with 1" as its address. Therefore the minimum degree of nodes is k — 1. The next lemma characterizes the general case. 24 Lemma 2.7 For I: Z 2 and n 2 2]: + 1, let p denote the node in F" whose lr-FC is n) formed by repeating the sequence 01““0. Then p has the minimum degree in Pi. Proof: We distinguish among the following three cases: Case 1: n=2lc+l (i.e.,n-lc=lc+l) Clearly, node p has a degree of k - 1. By observation 2 in the preceding paragraph, the minimum degree of nodes in FL, is also lc — 1. Furthermore, it can be shown inductively that the minimum node degree of F: is a non-decreasing function of n for a fixed lc. Therefore node p has the minimum degree in Pi. ‘Case 2: 2k+1_ I: + 1, we have the following recurrence equation: 6:: = 53+, + [C — 1 (2.8) wherebfi=n—lcforkSnS2lc—l and6§k=k—l. One way to solve Eq. (2.8) is to apply the generating function as we did in Sec- tion 2.4.1 to count the number of edges. For simplicity, we make use of some obser- vations obtained in the proof of Lemma 2.7 to solve Eq. (2.8). Recall that the node with its k-FC formed by repeating the sequence 01“"10 has the minimum degree. Let m = [(n — k)/(lc + 1)]. By completing Case 3 in the proof of Lemma 2.7, it can be shown that k n‘—k—2m—1 ifn=m(k+1)+2k 6": (2.9) n — k — 2m otherwise where m = 1(n — was + 01. 2.5 Summary of Results We have reviewed the Incomplete Hypercube and defined the generalized Fibonacci cubes. Table 2.3 compares some parameters between the hypercube and the general- ized Fibonacci cube. 3H app]? 9-0 26 Table 2.3. A comparison of the hypercube and generalized Fibonacci cube. [L__ _ l :8“ l I‘l‘. I] [I Number of nodes 2"" FE] [I Number of edges (n — l1:)2"""’l Eq. (2.6) Max. Degree n — lc n — lc Min. Degree 11 — 10 Eq. (2.9) I] Diameter n — k n — kl I See Chapter 4. For each element of the family, we have shown that I N is a subgraph of B,, where n = [logN], and 1‘: is also a subgraph of 8..-}. for n 2 k 2 2. Additionally, both the Incomplete Hypercube and the generalized Fibonacci cube include the hypercube as a special case. The family of network topologies has the following two areas of application: 1. to allow an incremental expansion of a hypercube-like parallel computer sys- tems, and 2. to serve as a fault-tolerant structure for the hypercube systems. CHAPTER 3 GRAPH EMBEDDING IN GENERALIZED FIBONACCI CUBE As explained in Chapter 1, graph embedding is an important issue in parallel com- puter systems. In this chapter, we will study embeddings of linear array, ring, mesh, and hypercube in the generalized Fibonacci cube. Embeddings in the hypercube have been studied extensively in the past [19] [20] [21] [26] [39] [57] [59] [60] [101] [112] [113] . In many of these studies the Gray code [50] has been an important tool. The Gray codes are binary codes representing a sequence of numbers such that the codes representing two successive numbers are different in exactly one bit. From the property of the Gray code, it can be seen that a linear array can be embedded in the hypercube [101]. Furthermore, by employing the binary reflected Gray codes Saad and Schultz [101] have shown that cycles of even lengths can be embedded in the hypercube. By choosing ml-bit Gray codes for the :r-direction and mg-bit Gray codes for the y-direction, it is also shown in [101] that an (m1 +m2)-cube embeds a 2D mesh of size 2'"1 x 2"”. In general, a (2m x 2'"2 x - - - x 2"”) d-dimensional mesh can be embedded 27 28 in B”, provided that n = m1 + mg + - - - + m4 [101]. Clearly, when the side lengths of a mesh are not powers of two, the Gray code scheme may not result in a minimal— expansion embedding. Chan and Chin [20] have employed the Gray code combined with an ad hoc method to achieve minimal-expansion, dilation-2 embeddings of most 2D meshes. By using optimal embeddings of small meshes with certain sizes and a scheme called “graph decomposition,” Ho and Johnsson [59, 60] have shown that 87% of all 2D meshes and a large portion of 3D meshes can be embedded in the hypercube with minimum expansion and dilation two. Finally, Chan [19] has settled the question regarding minimal-expansion, dilation-two embeddings of all 2D meshes in the hypercube. I The Hamiltonian problem in the generalized Fibonacci cube has been studied in [83] where we have constructively proved the following two optimal results: 1. F: contains a Hamiltonian path, and 2. for Fl,” 2 4, F]: contains a Hamiltonian cycle if and only if Fl,” is even, whereas the longest cycle in F: contains all but one node if and only if F,‘,"] is odd. These results imply that F 5‘, embeds a linear array of size F,‘,"‘ with expansion-1 and dilation-1, and F]: embeds a ring of size F,‘,"‘ with expansion-1 and dilation no greater than two. In this chapter, we extend our work to the embeddings of meshes, which stem from the embedding of linear arrays and rings. Furthermore, we will also report the embedding of the hypercube in the generalized Fibonacci cube. The remainder of this chapter is organized as follows. In Section 3.1, we prove that each generalized Fibonacci cube contains a Hamiltonian path. By embedding linear arrays in the subcubes and properly arranging them, we show in Section 3.2 that any cycle of length I can be embedded in Ff" provided that 4 S l S F,‘,“] and l is even. With the Hamiltonian path, in Section 3.3 we study the embeddings of 2D and 3D meshes. Section 3.4 describes the embedding of the hypercube in the generalized 29 Fibonacci cubes. Finally, we summarize our graph embedding results in Section 3.5. 3.1 Hamiltonian Path In this section, we will show that there is a Hamiltonian path in every Ft, which serves as the basis for our study of cycles in the generalized Fibonacci cubes. To motivate, we first study the Hamiltonian paths in the second-order generalized Fibonacci cube. Example 3.1 The sequence P0 = (01,00,10) and P1 = (0,1) are, respectively, Hamiltonian paths in 1‘: and I‘g (refer to Fig. 2.2). By Lemma 2.4, the graph F: contains a copy of F3 and a copy of I‘g. To construct a Hamiltonian path in I‘g, we attach a prefix ‘0’ to each code in P0 and “10” to each in P1, respectively, then reverse both sequences. The two resulted sequences are (010,000,001) and (101,100). Since H(001, 101) = 1, these two sequences can be linked to form a Hamiltonian path in I‘g. Repeating the same procedure, we can obtain a Hamiltonian path in Ff, by concatenating (0100,0101,0001,0000,0010) and (1010,1000,1001). By using induction, it can be shown that the recursive procedure described in Example 3.1 always produces a Hamiltonian path in the second order generalized Fibonacci cube. The following theorem extends this observation to the generalized Fibonacci cube. Theorem 3.1 I‘: contains a Hamiltonian path for n 2 Ir 2 2. Proof: Here we will prove the case of k = 3 by induction (on n). Cases when k > 3 can be proved similarly. For convenience, nodes are referred to by their k-FCs. A path will be represented by the sequence of the nodes in it. It is easily seen that P0 = (A), P; = (0,1) and P2 = (01, 00, 10, 11) are Hamiltonian paths in F3, F2, and F2, respectively. We establish our induction basis for n = 6. To obtain a Hamiltonian path in F2, we take the following procedure: 30 Step 1 Prefix a ‘0’ (respectively, “10” and “110”) to each code in P0 (respectively, P1 and P2). Let the resulted sequences be denoted by 0P0,10P1, and llOPg, respectively. Step 2 Reverse each code in 0P0,10P1, and 110P2, respectively. Let the resulted sequences be denoted by 0P3, 10F; and HOE, respectively. Step 3 Concatenate the three sequences in the following order: O-PZ, 1071— and 1107);. Therefore, we obtain the sequence (011, 010, 000, 001, 101, 100, 110) that is a Hamilto- nian path in F3. We assume that the above procedure always results in a Hamiltonian path in 1‘: for all n S I, where I Z 6 denote an integer. Now consider the case when n = I + 1, i.e., 1?“. By Lemma 2.4, 1‘93], can be decomposed into three subgraphs, G?(0), G3_,(1) and G§_2(2), which are isomorphic to PE”, 1431,, and F5312 respectively. By induction hypothesis, there is a Hamiltonian path in F53], [“1311 and F‘alz, respectively. Let P,- (where 0 S j S 2) denote the ‘3‘ - obtained by the previous procedure. Let H,- denote the Hamiltonian path in I‘I_, reversed sequence of Pj. It remains to show that the three sequences (i.e., 0H0, 1071 and 110??) obtained in Step 2 of the previous procedure can be concatenated. Let the first (respectively, last) node of a path be referred to as the head (re- spectively, tail), and let h,- (respectively, t.) denote the head (respectively, tail) of the Hamiltonian path obtained by the previous procedure in I? (For example, h3 = t3 = A, h = 0, and t4 = 1.) It is can be shown that the following recurrence relation is a consequence of the procedure described above: h,, = 0 . t.._1. Given this, observe that the tail of 0770- (which was the head of UFO) is now labeled by 0 - h; = 0 - 0t1_1. On the other hand, the head of 107’;- (which was the tail of NH) is labeled by 10 - t1..1. Since H(00t1_1,10t1_1) = 1, there is an edge between the two nodes, i.e., the two (Hamiltonian) paths can be linked together. Similarly, the tail of 10? now has the label 10 - h1_1 = 100 - n-2, while the head of 11072 now has the 31 label 110 - t[-2. Hence there is also an edge between the two nodes. Therefore, we conclude that the previous procedure results in a Hamiltonian path in F5311. It should be clear that the above argument can be extended to the general case when I: > 3. _ D 3.2 Cycles In this section we will study the embedding of cycles in the generalized Fibonacci cubes. The embedding of cycles is based on the embedding of Hamiltonian paths pre- sented in the previous section and the recursive properties of the generalized Fibonacci cubes. Let L: be the Hamiltonian path constructed according to the algorithm de- scribed in the previous section. To motivate, let us consider embeddings of cycles in F: in the following example. Recall that L: in F3 is obtained by a concatenation of two sequences in 0V? and 10123. By Lemma 2.4, F: consists of two copies of F3, (i.e., 00V: and 1011:) and one copy of F2 (i.e., 0101):). By properly arranging three Hamiltonian paths in these three subcubes, we can obtain embeddings of cycles in rg. Example 3.2 In Example 3.1 we have shown that L3, = (0100, 0101, 0001, 0000, 0010, 1010, 1000, 1001). Now, let Lop = (000100, 000101, 000001, 000000, 000010, 001010, 001000, 001001), L0,] = (010100, 010101, 010001, 010000, 010010), and L1 = (100100, 100101, 100001, 100000, 100010, 101010, 101000, 101001). It can be seen that Lop, L0,1 and L1 are Hamiltonian paths in 001162, 0101/52 and 101162, respectively. Figure 3.1 shows that 1‘; contains a cycle of length 20. Notice that the nodes are addressed by 6-bit 2-FCs, where the least significant 4 bits are shown next to the nodes and the most significant two bits are shown on the far left—hand side. Note that the first I nodes in both Lop and L1 can be used to construct a cycle of length 32 21, where 2 S l S 8. Furthermore, nodes in LOJ are needed to construct a cycle of length 18 or 20. MCI.- 0100 0101 0°01 0°00 0°10 01 O Lo.1 zoo LOAD ” exoe oxen ooo: oooo 0010 ten 1ooo L1 1° one out see: oooo ooze use 100 zoo: Figure 3.1. F: contains a cycle of length 20. We now proceed to study the embedding of cycles in the generalized Fibonacci cubes. Lemma 3.1 will facilitate our presentation. Lemma 3.1 Assume that Is 2 2, and n 2 21: + 1. We have V: = ovii-l U 1(vrli-l\lk-10vrli—k—l) = 0(vrli—1\1k-10vrli--k 1) U 1(Vn-1\1k_10vrli—k-l) U Olk-lovii—k—i Proof: By Lemma 2.3 we have v: = 0V,’:_1 010123;, 011012;, 11 - - - u 1"“201I,’f_,,,1 u 1""‘012’1,c = 0v: _, 01(01):: _,UIOV,'f_3U mu 1""3012,’f_,,+1 111" 20121: -1.) ka -1 U1(vr’i-l \ lk-lovr’i-k-l) = 0V..( —1\1k-10V:— k— 1) U 01"“10V,‘f_,,_ 1 U 1(1’71—1\1k-10vr’:-k—1) C1 lie 456: one 1 p l 33 Theorem 3.2 Assume that lc _>_ 2. For all n Z I: + 2, any cycle of length I can be embedded in I‘: provided that l is even and 4 S l S F,[“]. Proof: An example for k = 2 has been examined in Example 3.2. We will first show the case when k = 3. By Lemma 3.1, we have v2 = 012,1, u 1023;1 \110v3_,) Define So to be the subgraph induced by nodes in OWL1 , and 51 the subgraph induced by nodes in 1033-1 \ 1101734), i.e., nodes in 101124 U 1101134. Note that So is isomorphic to FL1 and 51 is isomorphic to a graph obtained by removing a subcube I‘ll—4 from F24 (refer to Figure 3.2 (a)). By Theorem 3.1, there exists a Hamiltonian path in So; furthermore, from the construction of the Hamiltonian path in F3_1, we can observe that there exists a Hamiltonian path in 51. Let L0 be the sequence of 3-FCs formed by attaching a prefix ‘0’ to each 3-FC of Lil-1: and L1 be the sequence of 3-FCs formed by attaching a prefix ‘1’ to each of the first (FE!l — F334) 3-F Cs of Lil—1' Note that attaching the prefix ‘1’ to each element of the set(V3_1 \ 1101134) still results in a valid 3—FC. One should be able to see that L0 and L1 are Hamiltonian paths in So and 51, respectively. Moreover, for all i (where 1 S i S F311 — 51,), there is an edge connecting the it“ nodes of L0 and L1 (refer to Fig. 3.2 (b)). We next show that, by properly selecting nodes in the L0 and L1, cycles of even length can be embedded in F3. We distinguish between the following two cases: Case 1: 4 _<_ I 5 2(F,£".‘1 — FE, . The cycle is constructed by the first I / 2 nodes in both L0 and L1 together with two additional links (represented by heavy segments in Fig. 3.2(b)). The two links are: one edge connecting the heads of L0 and L1; the other edge connecting the two (I / 2)“ nodes in both Hamiltonian paths involved. Case 2: 2(F,‘;°'_‘, — F,‘,3_‘4) <13 FE]. 34 Define l’ = l - 2(F,‘,3_]1 — 33,). We further break Lo into two parts: [10.0 which is a Hamiltonian path in 0011i2 U 0101224, and L0,; which is a Hamiltonian path in 01101124 (refer to Fig. 3.2(c)). Notice that 001);, contains 001011,:4 (see Fig. 3.2 (a)), and each node in 00101231,4 has an edge connecting to one node in 01101224. The cycle of length 1 consists of all nodes in [10,0 and L1 together with the first 1’ nodes in Log. Since the inclusion of two adjacent nodes in L03 requires a pair of “up” and “down” edges (represented by heavy segments in Fig. 3.2 (c)), only cycles of even length can be embedded in P i We next discuss the case for arbitrary k where It: > 3. Following the same approach as in lc = 3, let us define Lop, L1 and LOJ to be the Hamiltonian paths in 0(V,'f_l \ 154011344), 0(V“_1\1"’10V,'f_,,_1), and 01“‘10V,',‘_,,.l , respectively. It is not difficult to see that any cycle of length 23 can be obtained by selecting the first 3 nodes from each of Lop and L1, where 2 S s S F321 — F,‘,k_‘k_1 . To have a cycle of length 23, where F3111 — FRI“, < s S Fifi/2, we just include the first 23 — 2(I*",‘,k_‘l — Fi’flkd) nodes in the [10,1 . D It can be easily seen from the previous proof that P 1‘, contains a Hamiltonian cycle if the length of L04 is even. Notice that L0,] is a Hamiltonian path in a graph isomorphic to Pf,_,,_1 , which contains exactly I"',‘,"_‘,,_1 nodes. We next examine some properties of the generalized Fibonacci numbers and the generalized Fibonacci cubes, and show that Pf, contains a Hamiltonian cycle if and only if Fl,” 2 4 and is even. Lemma 3.2 (1)Forlc=2 andi22, E]? is even, and 17:]211, F3314 are odd. (2) Fork_>_3andi22, I: I: k I. Fikiqw Fishy—31 F([k1-1)i-4’ vF([Is]+1).'—k are we": and 35 J 3 av", my” 110V”, J J J ooV,,_, may” or 1oV,,_, J J 000V”, oo1oV,” oonovr, (a) 1'0 - A A A A A A A A A A 0v, ' v v v v v v v v v ' n-l nevi“; nevi, : = = = = : LI (5) L0,: 3 . °"°V...4 Lap 3 3 1 D 00VM U 010V” 3 3 10V” U "We: : = e = e = L! (c) Figure 3.2. Embedding of cycles in 1‘3 : (a) a decomposition of PE], (b) Case 1, and (c) Case 2. k I: F([k]+l)i-—11 F([k]+1)'-_2 are Odd. Proof: By induction on 1'. CI Note that F311,“, is even if and only if E?“ is even, where n 2 2(k + 1). Hence by Theorem 3.2, Pf, is Hamiltonian if F,‘,“] 2 4 and it is even. Furthermore, by Lemma 3.2, k — 1 out of the k + 1 consecutive 10‘" order Fibonacci numbers are even. 36 We next study the Hamiltonian problem for the generalized Fibonacci cubes whose total numbers of nodes are odd. It is known (see e.g., [44]) that the binary hypercubes are bipartite. Moreover, a bipartite graph cannot have a cycle of odd length. Hence we have the following property (see e.g., [101]): Lemma 3.3 There is no cycle of odd length in the binary hypercubes. Lemma 2.6 shows that Pf, is a subgraph of 3,...» Therefore P: is also bipartite. Consequently, we have Lemma 3.4 There is no cycle of odd length in the generalized Fibonacci cubes. Theorem 3.2, Lemmas 3.2 and 3.4 lead to the following result: Theorem 3.3 (1) For k = 2, P3, is Hamiltonian if and only if n = 3i for some integer i _>_' 2; otherwise, the length of the longest cycle is Ff] — 1. (2) For lc Z 3, P: is Hamiltonian if n can be expressed as one of the following forms: (I: + l)i, (I: + l)i — 3, (Ir +1)i— 4, ~-- , (k +1)i— ls: for some integer i 2 2; otherwise, the length of the longest cycle is F,‘,"‘ — 1. Corollary 3.1 For k 2 2 and n — k 2 3, a ring of size 131“] can be embedded in P1“, with dilation one if F,‘,“] is even, and dilation two if otherwise. Proof: The case for F,‘,“] being even is proved in Theorem 3.3. If F,‘,“] is odd, the only node that is not included in the longest cycle has at least one adjacent node which is in the cycle. Hence the embedding of ring can be achieved by paying a higher price (i.e., dilation is 2). Cl 37 3.3 Mesh In this section, we will study the embedding of mesh in the generalized Fibonacci cube. We will begin our discussion with the embedding of 2D mesh, then extend our result to mesh of higher dimensions. For convenience, all the Hamiltonian paths referred in this section are obtained by the procedure mentioned in the proof of Theorem 3.1. 3.3.1 2D Mesh To motivate, we examine an example of embedding in P3. Example 3.3 In Example 3.1, we saw that (01,00,10) and (010,000,001,101,100) are Hamiltonian paths in P} and P2, respectively. Figure 3.3 shows that an F5‘2‘ x F?‘ and an FF] x F12] mesh can be embedded in P3, where a 2-FC is divided into a left (leftmost 4-bit) and a right (rightmost 3-bit) parts. Notice that the sequence of codes over the y-direction in the F5‘2‘ x F5‘2‘ mesh corresponds exactly to the Hamiltonian path in P2, whereas the sequence of codes over the x-direction is obtained by suffixing a ‘0’ to each code of the same Hamiltonian path. On the other hand, the sequences of codes over the z- and y-direction in the FF] x F12] mesh are obtained by suffixing “01” and prefixing ‘0’ to the Hamiltonian path in P3. 0100 0000 0010 1010 1000 0101 0001 1001 010 010 000 000 001 001 101 100 Figure 3.3. Embedding of an F 5‘2] x F 5‘21 and an F4‘2‘ X FF] meshes in P3,. 38 Figure 3.4 explains the idea behind the embedding shown in Example 3.3. It can be seen that there are exactly two possible combinations when dividing an (m +n)-bit 2-FC into an m- and n-bit 2-FCs. Consider a node whose bit n is a 0 (combination A). . l .. xlol 1x .. 1 s I xlolil [on I Figure 3.4. Dividing an (m + n)-bit 2-F C into an m- and n-bit 2-FCs. Observe that all the n bits in the right part can vary, which results in all n-bit 2-F Cs; on the other hand, all but the rightmost bit (which is fixed to 0) in the left part can vary, which results in all (m — l)-bit 2-F Cs. Recall that any generalized Fibonacci cube contains a Hamiltonian path. We can select the Hamiltonian path in P3,“. as the sequence of codes for embedding over the :c-direction, and the Hamiltonian path in P3,, +1 with a suffix ‘0’ as the sequence of codes for y-direction. Therefore, the nodes captured in combination A will enable an embedding of a mesh with size Fifi, x F312. Similarly, the nodes captured in combination B will enable an embedding of a mesh with size FE] x F311. We next show that this scheme can be generalized to embed 2D meshes in the generalized Fibonacci cubes. Theorem 3.4 (I) For m _>_ 2 and n 2 1, an+n+2 embeds an FE], x F322 and an F3] x FE], meshes. (2) Form 2 3 and n 2 2, Fifi” embeds an F‘S‘H x F933, an F9!” x (F,‘,312+F,‘,3311) m n m and an FE] x Fflz meshes. 39 (3) For k > 4, m 2 I: and n 2 k —1,P]‘,,+,,+,a embeds an F‘Jihx F‘flhan I: I: I: k I: I: Fhik-2X(F1si+lk— MFriik-2+"°+Frii1) 0" Finlk-3X(Fr[1+]-k—1+F[+]-lc—2+H + F1“), - - - ,and an F,‘,[“ x F [fits-1 meshes. Proof: The case when I: = 2 has been examined in the preceding paragraph. We will prove the case when k = 3. Consider dividing an (m +n)-bit 3-FC into two parts, where the left part consists of the leftmost m bits and the right part the remaining n bits. Figure 3.5 shows the four possible combinations. It can be shown that the nodes captured in combination A (respectively, D) enable an embedding of a mesh with size F ‘ ,3, x F,‘,+3 (respectively, FE," x F32). To see the embedding of the mesh with size REL, x (Fifi + F311), observe that the nodes captured in combinations B and C have the same left part and their right parts can be obtained from V2” by removing all codes in the 1101):. Hence by the procedure presented in the proof of Theorem 3.1, we are able to embed a linear array of size Fflz + F331 in them. The case when lc Z 4 can be handled similarly. For example, Figure 3.6 shows the case when k = 4. It can be seen that there are seven possible combinations to break a 4-FC. Again, nodes captured in combination A enable an'embedding of an F313 x F214 mesh, nodes captured in combination G an F ‘4‘ x F323 mesh, nodes captured in combinations B, C and D enable anF >| Fl°l1|d MK I Figure 3.5. Dividing an (m + n)-bit 3-F C into an m- and n-bit 3-FCs. Hamiltonian path in P: (refer to Example 3.1). We next demonstrate that by properly rearranging the code sequence over the :c-direction, these meshes are connected and form one piece. It is necessary to define the Reversed Fibonacci Code (RFC). Given a k-FC P = (p;p,-_1 - - -p1po), the RFC of P (denoted by RFC’(P)) is (popl - - op,-_1p.-). Figure 3.9 shows that there are two different ways to label the nodes in P (2,: one is 2-FC and the other is in reversed 2-FC. The following lemma shows that the RFC can be also used to label nodes in the generalized Fibonacci cube. Lemma 3.5 Let p (0 S p < F,‘,“]) be a node in P1], and FC(p) = pn—Ic-lpn-lc-2 ' "P1P0- LCt q = PoFrilfll + PlFsi’flz + ' " + pn-k-2Frik—lk+l + pn-k-l Elk—1k- Then q can be used to relabel the node p if all nodes in Pf, are relabeled by their RFCs. Proof: The proof consists of two parts. We first show that RF C() is a one-to- one and onto function from V: to 11:. Then we demonstrate that two nodes with unit Hamming distance under their h-FCs will preserve unit Hamming distance under their RFCs. The first part of our proof consists of three steps: 1. RFC () is a one-to—one function. To see this, assume that RFC () is not a one-to- one function. Then there exist 0 S p 75 p’ < F,‘,“] such that RF G (p) = RFC (p’ ) Al :11 I: I 3| onM 1on 1 - 4...] 111on | 0L 411111111on | 21 4011111 1on | .[ 401111111“ «F 411111 1on I Figure 3.6. Dividing an (m + n)-bit 4-FC into an m- and n-bit 4—FCs. This implies that F C (p) = F C (p’ ) Since the k-FCs for both p and p’ are unique, we conclude that p = p’, which is a contradiction. 2. Given V: as the domain of the function RF C(-), then the range will also be V,’f. To see this, assume that the binary code Q = RF C(p) is not in V,’f, then Q must contain lc consecutive 1’s. This implies that F C (p) contains k consecutive 1’s, which is a contradiction. 3. RFC () is an onto function. This follows from 1 and 2. Now, we prove the second part. Recall that two nodes p and p’ are adjacent in Pf, if and only if their lc—FCs differ in exactly one bit. In other words, p and p’ are adjacent if and only if Ip — p’ | = E‘k‘ for some I: S i < 11. Given a pair of adjacent nodes pand p’, let q and q’ be their second labels, respectively. It can be shown that 42 0110 0100 0000 0010 1010 1000 1100 0101 0001 1001 1101 0011 1011 Figure 3.7. Embedding of meshes in Pic- 0010 1010 1000 0000 0100 0101 0001 1001 Figure 3.8. Embedding of meshes in P3. |q — q’l = Fflbh, if and only if [p - p’] = F,‘k‘ for some I: S i < n. In other words, two nodes with unit Hamming distance will preserve unit Hamming distance in their second labels. ‘3 From Fig. 3.5, it can be seen that the least significant bits(s) in the left parts are fixed to ‘0,’ “01” and “011,” respectively. Lemma 3.5 shows that nodes can be also labeled by their RFCs. The Hamiltonian path constructed by the recursive procedure presented in Section 3.1 will include nodes with leading bit(s) ‘0,’ then nodes with ‘43 Figure 3.9. Two different labelings of Pg. “10” and followed by nodes with “110.” Figure 3.10 shows that by first obtaining the Hamiltonian path and relabeling each node with its RFC over the :c-direction, we are able to connect the three separate meshes into one piece for the embedding in P330. Clearly, this scheme can be applied to k > 3. 0110 0010 1010 1000 0000 0100 1100 1101 0101 0001 1001 1011 0011 011 011 010 010 000 ‘ 000 001 001 101 101 100 ' 100 110 , 110 Figure 3.10. Embedding of meshes in P 110. 44 3.3.2 3D Mesh The embedding of 3D meshes is a further extension of the results obtained in the previous section. Recall that the approach taken before is to divide the lc-FC into two parts. To have an embedding of 3D mesh, we divide the k-FC into three parts: left, middle and right, which are used for embedding over the 1:, y and 2 directions. Figure 3.11 shows that there are four possible combinations when dividing an (I + m + n)—bit 2-FC into an l-, m- and n-bit 2-FCs. Notice that the difference between I bits m bits n bits D] xloll [olx x 0'1] o]xg Figure 3.11. Dividing an (1+ m + n)-bit 2—FC into an l-, m- and n-bit 2-FCs. the left and middle part in combinations A and B is similar to what we saw in the 2D case (refer to Fig. 3.4), except that the rightmost bit in both middle parts are 0’s. By analogy, it can be shown that nodes captured in combination A and B embed the following two 3D meshes of sizes: (A) HE], x FE!” x F312, and (B) FP‘ x F3] x F312, respectively. Similarly, the nodes captured in combinations C and D embed the following two 3D meshes of sizes: (C) RE], x RE] x Fifi], and (D) Em X Fifi, X F5311, respectively. The embeddings in Pf+m+n+3 is pictured in Figure 3.12. There are a total of 16 possible combinations. By analogy, the sizes of the meshes are: t" assesses: r SD 2 i: r ?= e ' FIE]: X F314»: X Fifi-’13, E311 X Ffiii X Fair-sis, F311 X FE] X F33, Iv'P‘ x FE], x F323, EfixFflsXEfl. FE. x FE] x FE. F511 X Friil-l X F5112, F,‘3‘ x FE] x FE]; , dfixfflsxfli. flflxfiExflfia Flt-:11 X Fifi-1 X F31, F13] >< F1?! x FE. FIE-‘2 X FE] X F312, F1311 X Figl-l X Fills, Eli-11 X F512 X Full-2, 71 FP‘ x F51. >< F‘i‘ls. fl and respectively. 3.4 Hypercube We saw that each generalized Fibonacci cube is a subgraph of a hypercube. 45 In Theorem 3.5, we will show that a generalized Fibonacci cube also embeds a hypercube. Theorem 3.5 For n 2 k 2 2, P: embeds Bd where d = n — k — m and m [(71 - Wkl- 46 Proof: Recall that nodes in P: are addressed by (n— k)-bit k-FCs. Table 3.1 shows some embeddings of hypercubes in Pf, for some 2 S k S 5 and 2 S n S 10. Notice that a ‘-’ means that such Pf, is undefined; a ‘A’ stands for the graph containing a single node, which trivially embeds Bo; a ‘x’ means “don’t care.” Finally, the total number of ‘x’ in an entry determines the dimension of the hypercube that can be embedded in the P ,’§ Since no 1: consecutive 1’s is allowed in the lc-FCs, it can be shown that every I: — 1 x’s should be followed by a 0.. Hence we have d = n — lc — m. C] Table 3.1. Embedding of Hypercubes. ln\kll2 l3 l4 l5 l 2 A A -— -— 3 x A A -— 4 x0 x A A 5 x0x xx x A 6 x0x0 xx0 xx x 7 x0x0x xx0x xxx xx 8 x0x0x0 xx0xx xxx0 xxx 9 x0x0x0x xx0xx0 xxxOx xxxx 10 x0x0x0x0 xx0xx0x xxxOxx xxxxO 3.5 Summary of Results We have considered the Hamiltonian problem, mesh and hypercube embeddings in the generalized Fibonacci cube. For the Hamiltonian problem the following optimal results have been obtained: 47 1. P: contains a Hamiltonian path. Hence a linear array of size Fl” can be em- bedded in Pf, with unit dilation. 2. The longest cycle in Pf, (where n — 3 Z k 2 2) contains F,‘,"] nodes if F,‘,"] is even, and F,‘,"] — 1 nodes if otherwise. Consequently, a ring of size F,["1 can be embedded with dilation-l if F,‘,"] is even, with dilation-2 if otherwise. Using the result of Hamiltonian path, we have shown that certain 2D and 3D meshes can be embedded in the generalized Fibonacci cube with expansion-l and dilation-l. Moreover, we have also shown how to embed a hypercube in the generalized Fibonacci cube. 48 lblts mbits nbits xI°I Ix *I°| |x xl°l1l 1on ‘ onI I: ...H 11on FM 1: X O[1]1| IOIx XIO] IX 4.] [x 14.44 |.[x x1011 .. 40111 M: xI°I1I I1I°Ix x|°I1I |°Ix “11111 1.1. onH 1on x 0] I1: x]o]1| [110 x *|°|1| «>1: . olll {1|on XLOlll [llolx XIOII 1'0 X X Olllll O X XIOII] [IIOIX xlol |* Mild Iolx x|°lll Iolx 441111 1on *I°|1I I1I°|x xI°|1I1I F’Ix x o]1|1] [o x onIlIlI [0]): I J | J I I I I I I I I | I | | re 3.12. Dividing an (I + m + n)-bit 3-FC into an l-, m- and n-bit 3-FCs. CHAPTER 4 DATA COMMUNICATION IN GENERALIZED” FIBONAC CI CUBE Efficient data communications are critical to the performance of the parallel or dis- tributed systems. Many previously known results have focused on systems that are based on regular interconnection topologies such as the hypercubes and meshes [58] [67] [92] [93] [100] [106]. In this chapter, we design and analyze algorithms for data communications in the generalized Fibonacci cubes. Algorithms for routing and single node broadcasting messages in hypercubes have been extensively studied in the past: Deadlock-free and shortest-path routing algo- rithm is first presented by Sullivan and Bashkow [106]. By employing an integer weight [106] for bookkeeping, they have also presented an optimal single node broadcasting algorithm. In [100], the communication latency was considered by Saad and Schultz in designing algorithms for single node broadcasting and other data communication primitives. By splitting packets and using multiple paths, several optimal and near optimal broadcasting algorithms have been reported by Ho and Johnsson [58, 67]. Unfortunately, none of these routing and single node broadcasting algorithms applies 49 50 to the generalized Fibonacci cube, even though they are subgraphs of the hypercubes. Message routing under the faulty hypercube has also drawn attention from re- searchers. Chen and Shin [23, 24] have proposed a depth-first search scheme for routing in an injured hypercube; Gordon and Stout [51] have studied random and sidetracking routing for the hypercube in the presence of faults. However, these ap- proaches do not guarantee that the paths produced are the shortest possible and deadlock—free. Routing and single node broadcasting algorithms for the Incomplete Hypercube are studied by Katseff [68]. We showed in [81] that the algorithms designed for the Incomplete Hypercubes can be applied to the (second-order) Fibonacci cubes as well, which is a little surprising because the latter appear to be less regular than the former. The extension of the single node broadcasting algorithm to the generalized Fibonacci cubes has been reported in [80]. In this chapter we further extend our work and report results on the following communication primitives: 1. Single node broadcasting, 2. Single node accumulation, 3. Single node gather, 4. Single node scatter, 5. Multinode broadcasting, and 6. Multinode accumulation. Communication in the system is achieved by message passing on links. A message originated from one processor (node) and destined for another may be routed through some intermediate processors. A path is represented by an ordered list of nodes (processors) in which every two consecutive nodes are connected by a link (edge). The length of a path is the number of links in the path. A link has a link number i if and only if it connects two nodes whose addresses differ in exactly bit i. 51 The relative address of two nodes is the bitwise Exclusive-OR, denoted by GB, of their addresses (which are binary codes for nodes in the hypercube or Fibonacci codes for nodes in the generalized Fibonacci cube). For convenience, we use 8,1, to denote an n-bit binary string with bit j being 1 and [all other bits being 0, i.e., 8,1; = e..-1e,,-2 - - - eleo where e,- =1 and e.- = 0 Vi 75 j. The organization of this chapter is as follows. We present distributed routing algo- rithm in Section 4.1, and show that the routing algorithm carries two most desirable properties, i.e., all the paths determined are the shortest possible and deadlock-free. The distributed shortest—path, deadlock-free single node broadcasting algorithm is presented in Section 4.2, which is based on a spanning tree (called STF) on the gen- eralized Fibonacci cube. In Section 4.3, we analyze the single node broadcasting in two models: We show that our proposed all-port broadcasting algorithm is optimal in terms of minimizing time steps. By using the most-significant-linlc-first scheduling discipline, our one-port broadcasting algorithm is shown to be optimal for many cases. Results for the single node accumulation are reported in Section 4.4. Algorithm for the single node scatter/ gather is presented and analyzed in Section 4.5. Section 4.6 reports the results for the multinode broadcasting and accumulation. Finally, we summarize our results in Section 4.7. 4.1 Routing The routing problem we studied in this section is to send a message from one node to another node. Katseff [68] has devised a routing algorithm for the Incomplete Hypercube. The key idea is to identify an existing link and forward the message via . it until the message reaches the destination node. We discover that a similar approach can be applied to the generalized Fibonacci cubes. The routing algorithm is a procedure that is executed by the source (originating) 52 node and by every intermediate node on the path to the destination. It is initiated by the source node which computes its relative address with respect to the destination node, then it sends the message to the next node on the path. This procedure is repeated until the message reaches its destination. Let (reladd, msg) denote the information to be sent on links, where reladd is the relative address between the destination node and an intermediate node while msg denotes the message to be sent. Notice that reladd is updated on each intermediate node when the message is being routed toward the destination. Formally, the routing algorithm is described as follows. Algorithm 1 To send a message from node source to node destination: if node_id 8 source then reladd «— source 69 destination else /* this is an intermediate node */ receive (reladd, msg). if reladd 75 0 then begin /* message has reached an intermediate node */ Let i be the bit number of the most significant 1 bit in the reladd, where link i from this node exists. Send (reladdEB 8,1,4, msg ) on link i. end 4.1.1 Shortest path In [68], Katseff proved that the length of a path determined by this algorithm, when applied to the Incomplete Hypercube, is equal to the Hamming distance between any source and destination nodes. We next show that Algorithm 1 can be applied to the generalized Fibonacci cubes; furthermore, we will show that all routing paths 53 determined are the shortest possible. It is necessary to introduce a few notations at this point. Let B = (bn_1b,,_g ---b1bo) be an n-bit string, then B[i:j] denotes the substring (b,b,-..1 - - - bj) of B, where i and j are two integers and n — 1 Z i 2 j Z 0. For example, suppose that B = (b4b3b2b1bo), then B[4z2] = (b4b3b2). The minimum (respectively, maximum) of two integers i and j is denoted by min(i,j) (respectively, max( i, 3)) Lemma 4.1 characterizes the interconnection rules for nodes in the generalized Fibonacci cubes. Lemma 4.1 Interconnection rules : Let p denote an arbitrary node in P:. Let FC(p) = p.._k-1p,,_;._2 . . .plpo be the lc-FC of node p, and Q denote FC‘k](p) 69831-1: Then, link j from node p exists if and only if one of the following two conditions is satisfied: R1 p, = l, or R2 p, = 0 and no 1: consecutive 1 ’3 appear in Q[min(n-k-1,j+k-1):max(j—k+1,0)]. Proof: The flipping of bit j in F C (p) from 1 to 0 always results in a valid lc-F C, because this never creates k consecutive 1’s. Hence we have the rule R1. Similarly, to obtain a new k-FC by flipping bit j in F C (p) from 0 to 1 is possible if and only if the new code does not contain k or more consecutive 1’s. Now, the flipping of bit j in F C (p) from 0 to 1 can possibly create 1:: or more consecutive 1’s only in Q[min(n- k-1,j+k-I):max(j-k+1,0)]. This is captured by rule R2. C] We now present an important property of Algorithm 1. Lemma 4.2 Lets and d be two distinct nodes in P " n) and let H(s,d) be the Hamming distance between these two nodes. Algorithm 1 always finds a path of length H(s,d) from s to d. 54 Proof: First we prove that Algorithm 1 always succeeds in finding a link at each intermediate node. Let F C(s) = sn_k-ls,,_k_2 - . . also and F C(d) = "4-14.4-2 ---d1do be the k-FCs of nodes 3 and d respectively. If H (F C(s),F C(d)) = 1, then by Definition 2.3 s and d are connected. Without loss of generality, assume that H(FG(s),FG(d)) Z 2. Let R = rn_k_1r,,_k-2 - --r1ro be the relative address of nodes 3 and d. We say that R specifies bit i if r,- = 1. Assume that j is the most significant and l is the second most-significant bit specified by R. We show that either link j or link I exist(s) from node 3. By repeating the same procedure, we conclude that Algorithm 1 always succeeds in finding a link at each intermediate node. To see the existence of link j or I, we distinguish between the following two cases. Case 1: le—lc+1. We further consider the following two subcases: (a) s,- = 1 (d,- = 0). By rule R1 of Lemma 4.1 linkj exists from s. (b) s,- = 0 (d,- = 1). If s; = 1 (d; = 0), then by rule R1, link 1 exists from 3. On the other hand, when 31 = 0 (d; = 1) it should be clear that there is no lc consecutive 1’s in FC(d)[n — k — 1 : max(j — k + 1,0)]; hence by rule R2, link j exists from 3. Case 2: l z’. (11 re- I10 of the Sbc 63 According to Algorithm 2, travel[z] received at p;- would be set to FALSE. Therefore the broadcast message would never reach node t, which is a contradiction to Claim 1. Case 2: z < 2’. Algorithm 1 would send the message along link 2’ (instead of link 2 ), which is also a contradiction. Hence we conclude that p = p’. D An important result follows from Theorem 4.3. Corollary 4.2 The nodes and links traversed by executing Algorithm 2 for broadcast- ing from a given node in Pi form a spanning tree. Proof: By Corollary 4.1, Pi is connected with diameter n—k. Furthermore, initially Algorithm 2 sets all bits in the travel array to TRUE. It can be shown that Algorithm 2 will traverse all nodes in Pi. It remains to show that all edges traversed by Algorithm 2 do not form any cycle. Let s be the source node to originate the broadcast. By Lemma 4.2 and Theo- rem 4.3, the broadcast message will reach a node, say pj, in exactly H ( F C (s), F C ( pj)) hops. Hence the only possible situation for any cycle involving p,- is to have two copies of the broadcast message arrive at p,- from two distinct nodes, say p,-1 and p34 , such that both nodes have the same distance H (F C (s), F C (p,)) — 1 from the source 3. It should be easy to see that this is a contraction to Theorem 4.3. D The spanning tree generated by executing Algorithm 2 will be referred to as the spanning tree for the generalized Fibonacci cube (STF). We note that two STFs with different roots are generally not isomorphic. In contrast, all SBTs of a Boolean cube are isomorphic. In the next section, we will closely examine these two kinds of spanning trees and show how to transform an SBT to an STF in order to analyze the of 0P 5313 Th Can 64 time required by the single node broadcasting algorithm. 4.3 Analysis of Single Node Broadcasting In this section, we will analyze the time complexity of Algorithm 2. Two communica- tion models are considered as in [9, 67]. In the one-port [67] (or single link availability [9]) model, a processor can only send and receive a message on one of its communica- tion ports at any given time. On the other hand, in the all-port [67] (or multiple link availability [9]) model a processor can communicate (send and receive) on all of its ports concurrently. The one-port model is more suitable for networks with a relative high degree of interconnection from the implementation point of view. For convenience, we will follow the model used in [9]. In Other words, we will use the number of time steps to measure the time requirement; all messages are assumed to be of unit length and it takes one time step to send (forward) a message from a node to its adjacent node(s). 4.3.1 All-port Model We first consider the all-port communication model in the Boolean cube. Notice that the farthest node from any given node in B" is at a distance of n (the diameter of B”). Hence n is a lower bound of any broadcasting algorithm for B... Since the height of any SBT in B,, is exactly 11, broadcasting algorithms based upon the SBTs are optimal. Similarly, by Theorems 4.1 and 4.3, Algorithm 2 requires only minimal time steps to complete any single node broadcasting in the generalized Fibonacci cubes. Theorem 4.4 Let d, be the maximal distance between an (arbitrary) node 3 and all the other nodes in Pi. Then, under the all-port communication model Algorithm 2 can optimally complete a broadcast from s in exactly d, time steps. 1b to} SP1 deI her. be 5. the 65 Notice that for 1‘: we have [filial-j S d, S n - lc, and the lower bound [Ml-55:11] is achieved when broadcasting from node 0. 4.3.2 One-port Model In the one-port model, only one port is available for transmitting message at any moment. Hence the scheduling scheme applied to nodes which need to send/ forward a broadcast message to multiple nodes will make a critical difference in the overall broadcasting time. With the Boolean cubes, it is known [67] that SBTs combined with the “Tallest Remaining Subtree First” (TRSF) scheduling discipline results in an optimal one-port broadcasting algorithm. We now present a scheduling discipline called Most-Significant-Link-First (MSLF) for the generalized Fibonacci cubes. Re- call that under Algorithm 2 a node p should forward the message through every link j provided that the corresponding travel [7] received at p is TRUE. With the MSLF rule, each node first sends the message via the specified most-significant link, then via the specified second most-significant link, and so on. In other words, a node should follow the left-to-right scanning order in sending messages to multiple links that are specified in the received travel array. Our one-port broadcasting algorithm is thus defined as Algorithm 2 combined with the MSLF scheduling. Notice that with the Boolean cubes the MSLF reduces to the TRSF scheme, because there are no missing links. Let I.- be the link number of a link in an SBT; it can be shown that n—l; is exactly the time step at which the MSLF scheme used to forward the broadcast messages when the one-port broadcasting algorithm is applied (cf. Fig 4.2). Consequently, the time required by our one-port broadcasting algorithm is exactly n for B... We next show that our one—port single node broadcasting algorithm requires no greater than n — lc time steps for any single node broadcasting in Ffi. It is necessary to examine closely the relation between the STF and the SBT at this point. l: l": cor and link This 1mm! 66 To simplify our presentation, in the remainder of this section a node in the Boolean cube or the generalized Fibonacci cube will be referred to by its binary address (which is either a binary code or a k-FC). For convenience, we introduce a few new notations. Let SBT..(s) denote the SBT rooted at a node 3 in B". Similarly, let STF:(s) denote the STF rooted at a node 3 in Ft. A path between node 3 and node t can be also represented as an ordered list of link numbers which comprises a sequence of link numbers leading from s to t. In this case, we will refer to link sequence to emphasize that the path is in terms of link numbers instead of node labels. The relation between a path p and its corresponding link sequence q is as follows. Let p = (uo,u1, - --u,-_1,u;) denote a path from node no to node u,- and q = (l1,l2, - - - l,--1, l,-) the link sequence corresponding to the path. Then u,- = no 69 ll EB lg - - - 613 l,- for 1 S j S i. As an example, in Fig. 4.2 the path connecting nodes 0000 and 1111 in the SBT4(0000) is (0000, 1000, 1100, 1110, 1111) and the corresponding link sequence is (3,2,1,0). Let (l1,l2, - - - ,l,-) be an arbitrary link sequence in an SBT. It is important to note that l1 > I; > - - - > I.- is always true. This is because, as defined in Section 4.2, for any internal node of an SBT the link number(s) of the out-edge(s) is less than that of the in—edge. We now proceed to analyze the time required by our one-port broadcasting algo- rithm when it is applied to the generalized Fibonacci cubes. We write V: to represent the set of k-FCs for all nodes in Pi. Lemma 4.4 establishes a relation between an SBT and an STF, which will enable us to transform an SBT to an STF. Lemma 4.4 Let uo be a node in V5. Let (u,,u;+1,u,-+2) be the path in SBTn-k(uo) connecting node u,- to node 11,-4.2, and (l,-+1, lg“) the corresponding link sequence. As- sume that 11,-4.2 is in V: and a,“ is not in V5. Then 1. Node u.- must be in V5, 2. Within 8..-,“ link l,-+2 from u,- leads to a node r that must be in V5, and by (3;. C0D in]; ll) {b node 67 3. Within 8..-," link lg.“ from r leads to 11,-4.2. Proof: First we prove that node u,- is in V: by contradiction. Assume that node u,- is not in Vlf, we distinguish between the following two cases. (Notice that u,- and u,-+1 differ in exactly bit l,-+1.) Case 1: There are k 1’s in both u;[n - k —1 :l.-+1 +1] and ug+1[n — k -1 :l,-+1 +1]. This implies that there are k 1’s in ug+2[n — k : I,“ + 1], because u,“ inherits these bits from 11,-4.1. Therefore node u,“ would not be in Vlf, which contradicts to our assumption. Case 2: There are k 1’s in both u;[l,-+1 — 1 : 0] and ug+1[l,-+1 - 1 : 0]. This implies that there are k 1’s in uo[l,-+1 — 1 : 0], because both u.- and u,“ inherit these bits from no. Therefore node uo would not be in V5, which is also a contradiction. We now prove that r is in V,'f. Let (b..-k-1, b,,_k_2, - - - , b1“, , bl.+,—1,' - - , bu", ~ - - , b0) be the binary code of 11,-. We claim that b;.. H = 0 and bl“, = 1. Consequently, node r must be in Vlf, because r is obtained from u,- (which has been proven to be in Vlf) by flipping bit I,“ from 1 to 0. To verify the previous claim, we examine all the three other possibilities: (1)51”, = 0 and b1 n+2 = 0. Node 11,-4.2 would not be in V,'f, since Ug+2 is obtained from a,“ (which is not in V: ) by flipping bit I,” from 0 to 1. This is a contradiction. (2) b1,+1 = 1 and by,“ = 0. Node u,“ would be in Vlf, since 21,-4.1 is obtained from u,- by flipping bit lg.” from 1 to 0. This is a contradiction. (3) bl,“ = l and b; m = 1. Similar to (2), node u,“ would be in V:. This is also a contradiction. Finally, the edge (r, 21,-4.2) is in 8..-], (more precisely, I": ), because both nodes are in V: and they differ in exactly bit lg.” as we just observed. Cl With Lemma 4.4, we can devise a scheme to transform an SBT to an STF. Let uo be a node in Vlf. A link (p, q) in SBTn_;.(uo) can be classified into the following four groups: (13 an. bro lust are 3 link We a 68 (1) both p and q are in V:, (2) p is in V:, whereas q is not, (3) p is not in Vlf, whereas q is, or (4) neither p nor q is in V5. The scheme is as follows. We retain all group 1 links, remove all group 2 links together with the corresponding q’ 3, remove all group 3 links together with the corresponding p’ s, and remover links in group 4 together with the corresponding p’ s, q’s. Each node q in group 3 needs special attention, since the corresponding p had been removed. Fortunately, Lemma 4.4 guarantees that there exists a node in V:, which has a link to q and, hence, can become the new parent of q. Algorithm 3 describes a method to transform an SBT to an STF. Algorithm 3 To transform an SBT to an STF: /* Assume that uo is an arbitrary node in V: */ Step 1 Remove every node (along with all incident edges) that is not in Vlf. Step 2 For each node u in V:, whose parent has been removed in Step 1, find its new parent by applying Lemma 4.4 and add a new link between them. Example 4.3 Figure 4.4 illustrates a construction of STF83(01100) from SBT5(01100). For comparison, nodes not in V: are signified by dotted ellipses in STF83(01100). There are two labels for each link in the STF: one is its link number and the other (parenthesized) signifies the routing step determined by the one-port broadcasting algorithm to broadcast a message from the root. To form the STF, for instance, node 11100 is deleted from the SBT in Step 1, and nodes 10100 and 11000 are respectively connected to nodes 00100 and 01000. Notice that (4,3) and (3,4) are link sequences from node 01100 to node 10100 in the SBT and the STF respectively. We now prove the correctness of the above algorithm. L. C]? 69 an» 3 e w «my: ’ «m‘ m «an a ' a. o a o o . O O 1 . o o x o o 0 ° @cuzm‘msrmm «my «my. (33333 (: 1 (n 6321361339 «m «m Figure 4.4. Construction of (b) STF§(01100) from (a) SBT5(01100). Lemma 4.5 Given an SBTn-k(uo) as its input, Algorithm 3 correctly produces STF:(uo), provided that no is in V5. Proof: It should be clear that the graph generated by Algorithm 3 is still a tree. It remains to show that this tree is STFfluo). Consider three nodes 11,-, u,“ 11,-4.2 and the corresponding link sequence (l.-+1, l.-+2) as specified in Lemma 4.4. It can be shown that there exists a path from u; to 11.42 in STFfluo) and link l,-+1 is in that path. We now prove by contradiction that (1.4.2, 1.4.1) is exactly the link sequence in STF/{(110), which connects u,- to 11.42. Assume that (”+21 l.-+1) is the link sequence in STF:(uo) connecting u; to v.42, where 1;“ gé lg+2. There are two possible cases: (1) l§+2 > 1.4.2. We claim that l,“ > ls”. Because if the claim were false, then the link sequence t3 L6 610 STF and z for (b have 1' U are 0 Case 1 Withom 70 (12,”, 154.1) from u.- would lead to a node other than man. Consequently, by definition of SBT, (l;+1,l£+2) is a link sequence from u; in SBTn-k(uo), which also leads to a node other than ug+g. This is a contradiction. (2) l2” < 1.4.2. In this case, it is clear that (L41, If“) is a link sequence from u.- in SBTn-k(uo), and this link sequence would lead to a node other than u,-+g. This is also a contradiction. 0 Observe that links in STFs can be classified into two sets: set A includes the links added in Step 2 of Algorithm 3, and set B the remaining links (i.e., the links that also exist in the corresponding SBT). For example, in STF83(01100) the links (00100,10100) and (01000,11000) are in set A, and all other links are in set B (refer to Fig 4.4). Links in set A are resulted from the “missing” nodes which have k 1’s in their binary addresses. Each link I in set A connects two special nodes: the one which defines l as an out-edge is called type—a node and the other (which defines l as an in-edge) type-fl node. We next show that each type-a node has exactly one type-fl node as its child. Lemma 4.6 Let s be a node in F: and u a type-a node in STF/f(s). Then u has exactly one type-,6 node as its child in STF:(s). Proof: Let p be the parent of u and m the link number of the link (p,u) in the STF. We prove the lemma by contradiction. Assume that u has two type-,6 nodes v and w as its children in the STF, additionally, i and l (where i > I) are link numbers for the links (u, v) and (u, w), respectively. Since both links i and l are in set A, we have i > I > m. From the proof of Lemma 4.4, we deduce that all bits i,l and m of u are 0’s. We further distinguish among the following three cases: Case 1: i _>_ l+ 2 and there existsj such that i > j > I and bitj of u is 1. Without loss of generality, assume that i = j+1, j = l +1 and l = m+1. Figure 4.5 (a) shows a portion of the SFT, where nodes are specified by bits i, j,l and m. Since v As the The 71 is in the STF, there is no k consecutive 1’s in v[n - k — 1 : j]. Similarly, there is no k consecutive 1’s in p[l : 0]. Hence, there exists a node u’ in the STF, whose k-FC is v[n - k — 1 : j] - p[l : 0]. Consequently, at node p Algorithm 2 would forward the message to u’, and u’ would become the parent of v. This contradicts our assumption that u is the parent of v. Case 2: i Z l+2 and all bits in u[i—1,l+l] are 0’s. Again, we may assume that i = j+1,j = l+1 and l = m+1. Figure 4.5 (b) illustrates the situation. Similarly, node 11’ (specified by 1001) is in the STF, and u’ would be the parent of v in stead of u, which is also a contradiction to our assumption that u is the parent of v. Case 3: i = l + 1. This is a degenerated situation of Case 2. C] Figure 4.5. Type-a node and its type-fl child. As a consequence of Lemmas 4.5 and 4.6, the next theorem gives an upper bound on the time required by our one-port single node broadcasting algorithm. Theorem 4.5 The one-port broadcasting algorithm requires no greater than n-k time the dec afie 72 steps for any single node broadcasting in Pi. Proof: Let an be the source node to initiate the broadcast. We consider the scheduling for links in set A as defined before. Let u,-, u.-+1, u,~+2 and r be four nodes as specified in Lemma 4.4. Thus links (11;, u,-+1) and (ugil, m“) are not in S TF:(uo); moreover, link (r, u,-+g) is added in Step 2 of Algorithm 3. It can be shown inductively that the forwarding of the broadcast message on link (r, 11,-4.2) can be scheduled at a time no later than that of (u,+1,u,-+2) in the one-port broadcast in 3..-), by using SBTn—k(uo)- To see this, by Lemma 4.6, (r, ug+2) is the only set A link added to node r; furthermore, it is the first link to be scheduled at r according to the MSLF scheme. With the removal of (u,, 11.2“) the scheduling of (11,-, r) can be advanced by at least 1 step. Consequently, the scheduling of links in set B will not be delayed by the added links in set A. El Remark: Let (po,p1,~ - ,p;) be a path in STFfluo). Figure 4.6 shows that there may be two type-a nodes in such a path, i.e., nodes 001011 and 101001. (In general, there may be more than two such nodes in a path.) However, one should be able to deduce that the the scheduling in the second type-a node, i.e., 101001, will not be affected (delayed) by a type-a ancestor. DOt Figure 4.6. A portion of STF:(011011). 4.3.3 Optimality of One-port Algorithm We now examine the optimality of our one-port broadcasting algorithm. Figure 4.7 illustrates an example of broadcasting in I‘; from source node 00000, where links are labeled by the time steps determined by the one-port algorithm. It is seen that a ‘w w H010 4 w (mo Figure 4.7. One-port broadcasting in I32, from node 00000. tOtB-l of 5 time steps is needed to complete the broadcasting. However, by redefining node 00001 as the child of 10001 and properly rearranging the scheduling, Figure 4.8 74 shows that the broadcasting can be completed in 4 time steps, which can be shown to be optimal. Figure 4.8. An optimal one-port broadcasting in 1‘; from node 00000. To study the optimality, we need to determine the lower bound of one-port broad- casting. Under the one—port model, each node can send messages to exactly one other node at any given time. Clearly, under the one-port model, the number of nodes that receivethe broadcast message is at most doubled after each time step. Therefore, it takes at least [log N] time steps to complete a broadcasting for any network consist- ing of N nodes. Consequently, a lower bound for any one-port broadcasting in 1‘: is Dos Flkll- Recall that Theorem 4.5 shows that n -- k is an upper bound for our one-port broadcasting in Ffi. Let s be a node in Pi, and 3 the node whose binary address is the binary complement of F C' (s) The following theorem identifies situations where our one-port algorithm achieves optimality. Theorem 4.6 For n > k 2 2, the one-port single node broadcasting algorithm is optimal under the following two conditions: 1. broadcast from any node in I‘" where Fy‘] > 2"""1, or n) ti 75 2. broadcast from a node s in I‘: such that the node 3 is also in Pi. Proof: Condition 1 is true-because the lower bound [log Fifi] is equal to n — k. Condition 2 follows from the observation that the minimum amount of time to send a message from node 3 to '5 is H(s,§) = n — k. i D For a fixed k, there exists an integer no such that F,[,"] S 2"""‘1 for n > no; hence condition 1 may not always hold. However, since the diameter of F: is n — k, the one- port algorithm may already be optimal for single node broadcasting in the generalized Fibonacci cube. We now further show that other common communication primitives can be effi- ciently implemented in the generalized Fibonacci cubes. We follow the same assump- tions that all messages are of unit length and it takes one time step to send (forward) a message. 4.4 Single Node Accumulation A dual Operation for single node broadcasting is the single node accumulation [9, 10] (or reduction [67]) in which a particular node is to receive messages (data) from all the other nodes. In this operation, the messages received at a node can be “combined” with the local message before transmission. Here we assume that the “combined” message still takes one time steps for transmission. This problem arises, for example, when we want to compute the sum consisting of one term from each node, where addition can be viewed as “combining” messages. The broadcasting and the accu- mulation are no different in concept: any broadcasting algorithm can be transformed into a dual algorithm for accumulation by simply reversing the data paths [9, 10, 100]. Therefore, the times required by the all- and one-port single node accumulation in the generalized Fibonacci cubes are also captured by Theorems 4.4 and 4.5, respectively. \———‘—’~\_ . I senc in 5 In ing) a rithm Algorit} “meal 76 4.5 Single Node Gather and Scatter Many algorithms require a particular node to collect one personalized message from every other node; this data transfer operation is referred to as the single node gather [9, 10, 100] (or all-to—one personalized communication [67]). Notice that in the gather- ing operation messages may not be “combined” (cf. accumulation). A dual operation for the single node gather is the single node scatter [9, 10, 100] (or one-to-all per- sonalized communication [67]) in which a particular node sends different personalized messages to all the other nodes. Again, gathering and scattering are dual problems: any algorithm for gathering can be translated into a dual algorithm for scattering, and vice versa [9, 10, 100]. Algorithm 4 is a simple elegant gathering algorithm for the hypercube, which was proposed in [100]. Algorithm 4 Single Node Gathering in B": fori=0ton—1do: . All nodes addressed with *""‘110' send messages accumulated from the previous steps to nodes *"""0‘+1. It can be seen that Algorithm 4 consists of n steps: at the i“ step nodes sun-”110" send 2‘ messages to nodes *““‘10‘+1. Figure 4.9 illustrates an example for gathering in B4, where node 0 collects messages. Indeed, the gather (scatter) algorithm is similar to the accumulation (broadcast- ing) algorithm. Not surprisingly, the communication graph determined by Algo- rithm 4 (refer to Fig. 4.9) is exactly the SBT. To design and analyze the gathering algorithm for the generalized Fibonacci cubes, we further examine the STFs. For convenience, we introduce some new terms regarding the SBTs. The children of a node link 1 fmm . filPPin HOde u and r : defined node u i flipping 77 GED 639 @® @4596!» @6115 (ED (momma em «I!» (II) as can an» an em worm «no 65’ «no 0:) (6) can (6)» Figure 4.9. Single node gathering in B4. node p in SBTn(s) is represented by Cn(p,s). Let u be a child of node p and j the link number of the link (p,u). The i“ child of u is obtained by flipping bit j —i from u, where l S i S j; the 2"“ child of the root is defined as the node obtained by flipping bit n —i from the root, where 1 S i S n. Node r is the 2"“ right sibling of node v in SBTn(s) if both u and r are in Cn(p,s) for a node p such that u = p $ 8,3, and r = p63 8,1,“ for 0 S j S n — 1 and 1 _<_ i S j; the 0‘“ right sibling of node u is defined to be u itself. In other words, node 1‘ is referred to as the i”l right sibling of node u if they have the same parent p in SBTn(s) such that u and r are obtained by flipping respectively bits j and j —i from p. It is worth mentioning that for any node to tj Exam and th. in the 5 in 5875 mapped of 53m ‘“ 5W 78 in a given SBT the number of its children equals to the number of its right siblings. The root of an SBT defines level 0, the children of the root define level 1, and so on. In what follows, we will first show that S TF,',‘(s) is isomorphic to a subgraph _of SBTn-k(3). With this important observation, we will be able to demonstrate that Algorithm 4 can be adopted to the generalized Fibonacci cubes. Lemma 4.7 Lets be a node in Ft. The tree STF:(3) is isomorphic to a subgraph of SBT.._;.(s) for n 2 k 2 2. Proof: We show how to map type-oz nodes in STF,’,°(s) to SBTn-k(s). Recall that I‘: is a subgraph of BM,“ Let us compare SBTn-k(s) with STF:(s) from level 0 to n — k. Clearly, level 0 of both trees are equivalent. Let q be a node in level 1 of S BTn-k(s), which is to be removed in Step 1 of Algorithm 3. Assume that node v is the i‘“ child of q and v is in Pi. By Lemma 4.4, the 1"" right sibling of q, say u, is in Pi, and u is the parent of v in STF:(s). By Lemma 4.6, v is the only type-,6 node associated with u (a type-a node). Therefore, node u in STFfls) should be mapped to the i - 1‘” right sibling of q in S BTn-k(s). Consequently, descendants of u should be mapped accordingly. The nodes in level 2 through 11 — k can be argued similarly. CI Example 4.4 We redraw STF83(01100) in Figure 4.10 (a), where the type-fl nodes and their descendants are highlighted by dotted rectangles. Since node 11100 is not in the STF, node 00100 (the first right sibling of 11100) should be mapped to 11100 in SBT5(01100). Similarly, node 01000 (the second right sibling of 11100) should be mapped to 00100 in SBT5(01100). Hence STF83(01100) is isomorphic to the subgraph of S BT5(01100) shown in Figure 4.10 (b). Notice that all descendants of node 00100 in STF:(01100) are mapped to different nodes in SBT5(01100) except the subtree To Lerl Pfinh SlEps °P€ra PTOQ Show 1 79 rooted at 10100. On the other hand, all descendants of node 01000 in STF33(01100) are mapped to different nodes in SBT5(01100). 1 ‘13} (EDGE? (ED‘ED‘ (I) can a o 3 am am «an» 3 o 1 e 1 0 a. 0 o l o o o «m 4!!» am GED e «m (b) Figure 4.10. (a) STF:(01100) and (b) a subgraph of SBT5(01100). To adopt Algorithm 4 to the generalized Fibonacci cubes, we need the following lemma. Lemma 4.8 Let s be a node in Pi. Assume that algorithm A can be applied to perform the gather (respectively, scatter) operation based on SBTn-k(s) in X time steps. Then algorithm A can be applied to perform the gather (respectively, scatter) operation based on STF:(s) in no more than X time steps. Proof: By Lemma 4.7, STFfls) is isomorphic to a subgraph of SBTn_k(s). To show that algorithm A can be applied to the STF, we take the following approach: 80 0 Map the STF/f(s) to the subgraph of SBT.._;.(s) that is isomorphic to STFfls), and 0 Based on the SBT, apply algorithm A to the subgraph and take “no-operation” whenever the node(s)/link(s) that should be involVed is not in the subgraph. Clearly, it takes no more than X time steps for algorithm A to complete the gather (scatter) in the STF. 0 Therefore Algorithm 4 can be adopted for gathering in the generalized Fibonacci cubes. We next analyze the time required by the algorithm. Again, we consider the one- and all-port models separately. 4.5.1 One-port Model Recall that the scheduling is a critical issue for minimizing time steps in the one-port broadcasting. This is also true for the one-port gathering. Notice that schedul- ing is implicitly defined in Algorithm 4, since at each step an active node is send- ing/receiving message(s) to/from exactly one other node. Indeed, the scheduling in Algorithm 4 is the reverse of the MSLF (refer to Fig. 4.9) for the single node broadcasting. Therefore Algorithm 4 is exactly a one-port gather algorithm for the hypercube, and its complexity is as follows. Lemma 4.9 Under the one-port model, Algorithm 4 takes 2" — 1 time steps for any single node gather in B" for n 2 1, which is optimal. Proof: The proof of time complexity is given in [100]. The optimality follows from the observation that the root needs to collect 2" -— 1 data, which takes at least 2" - 1 time steps under the one-port model. Cl We The in F. 4.5.2 Imprc We ex Mingle 1 81 By Lemmas 4.7, 4.8 and 4.9, one should be able to see that under the one-port model, “‘l‘ — 1 time steps. any single node gather in I“: can be achieved in no more than 2 On the other hand, since the node that initiates the gather operation needs to receive 1‘1“] - 1 messages, a lower bound for any one—port singlenode gather in Ff, is Fl,” - 1. We now present a scheme for single node scatter, which attains this lower bound. The scheduling rule is: The source node sends the message to the farthest node first (break ties arbitrarily) along the links in the STF. The above scheme will be referred to as the STF Farthest Node First (STF-FNF) rule. To analyze the STF-FNF scheme, let N.- denote the number of nodes with (Hamming) distance i from the root (i.e., the number of nodes in level i of the STF). By the STF-FNF rule, the nodes in level 1 will be scheduled between FF] — N1 and F5,” — 1, and the messages will reach them in one time step. Similarly, the nodes in level 2 will be scheduled between Fl,” — N1 — N2 and Fl,” - N1 — l, and the messages will reach them in two time steps. In general, it can be shown that the STF-FN F requires Fl,” —1 time steps to complete any scatter in 1‘: as long as N.- _>_ 1. Therefore we have proved the following theorem. Theorem 4.7 Under the one-port model, a single node scatter (respectively, gather) in Ff, can be achieved in exactly 171"] — 1 time steps for n > k 2 2, which is optimal. 4.5.2 All-port Model Improvements can be obtained by taking advantage of the all-port communications. We examine the case in the hypercube first. Lemma 4.10 Under the all-port model, Algorithm 4 takes 2"“1 time steps for any single node gather in 8,, for n 2 1. me utili 82 Proof: Since all SBTs rooted at different nodes are isomorphic, we need only examine the case for SBT..(0). Observe that SBT..(0) can be divided into two SBTs: T; which consists of the node 10""1 together with all its descendants, and T o the remainder of SBT..(0). We prove the lemma by induction on n. Clearly, Algorithm 4 takes 1 time step when n = 1. Assume that the lemma is true for all n < I, where 1 denotes an (arbitrary) positive integer. Let us consider the case when n = I. By induction hypothesis, Algorithm 4 takes 2"2 time steps in both To and T1 and they can be done in parallel. A (21"2 + 2’") time scheduling is easy to obtain, because node 10""1 can gather all 2"1 messages located in T1 in 2"2 time steps then passes all of them to node 0. The key to achieve a 2"1 time scheduling is that node 10’”1 should performs the gathering and passing messages to node 0 concurrently. Indeed it can be shown that the following scheduling rule will result in 2’ "1 time: Given a node u with all messages gathered in it, u sends the message originated from the node with the largest label first. Again, by Lemmas 4.7, 4.8 and 4.10, we have the following result. Theorem 4.8 Under the all-port model, any single node gather (respectively, scatter) in F: can be achieved in no more than 2"""1 time steps for n > k 2 2. Indeed, it can be shown that a tighter upper bound for the single node gather/ scatter is IT,|, where IT,| is the maximum size (number of nodes) of the subtrees branched from the root. This is because the root can follow the STF-FNF in sending the messages on all its incident links concurrently. By Lemma 4.7, IT.| S 2""I“l for Ft. Remark: A way to improve the all-port single node gathering in B., is to fully utilize all the n links incident from the root by partitioning all other nodes into at for {t is ‘0 nc 83 equivalent class based on their weights.’ Based on this idea, the “perfectly balanced spanning trees” defined in [10, 9] have been proposed for all-port single node gathering in B... They have shown that the gathering can be achieved in (if; 1 time steps, which is optimal, because the root needs to collect 2” .— 1 messages through its logn links. However, it can be shown that their approach do not apply to the generalized Fibonacci cubes directly, due to the “missing” nodes and links. 4.6 Multinode Broadcasting and Accumulation Another useful data communication primitive is the multinode broadcasting [10, 9] (or all-to-all broadcasting [67]) in which every node sends a data (message) to all the other nodes. Similarly, the multinode accumulation is a dual operation of the multinode broadcasting. A naive approach is to broadcast the message from each node in turn by using the single node broadcasting algorithm. Trivially, the time required by this approach is Fy‘] - Tb, where T5 is the time required for the single node broadcasting algorithm. A simple idea from [100] can make much improvement. Let (p, q) be a link in B... At a step i node p should send all received messages originating from s to q such that the Hamming distance between 3 and q is 2'. Since B.. is a connected graph with diameter n, it can be shown that after n steps (iterations) a message originating from a node will reach all the other nodes. Formally, a multinode broadcasting algorithm for B.. is described as follows [100]. Under the all-port model, (1) and (2) in Algorithm 5 can be performed in parallel. n — 1 It is shown in [100] that at the 2"” step, node p will send exactly messages i - l to node q. Therefore by summing i from 1 to n the total time required by Algorithm 5 ‘The weight of a binary code is defined as the number of 1’s in it. f hm 84 Algorithm 5 All-port Multinode Broadcasting in B.. : for i=1 to n do: for every link (p,q) do: (1)Send from p to q all received messages which originated from s such that H(s,q) = i. (2)8end from q to p all received messages which originated from s such that H(s,p) = i. is 2"”1 time steps. By Corollary 4.1, I": is a connected graph with diameter n — k. Hence it can be shown that Algorithm 5 also applies to the generalized Fibonacci cubes, and consequently the time required is as follows. Theorem 4.9 Under the all-port model, the multinode broadcasting (respectively, ac- cumulation) in F: can be achieved in no more than 2""""1 time steps for n > k 2 2. We now focus on the one-port model. Let us consider the multinode broadcasting in a ring topology with size N, denoted by RN; it is known [10, 6,7] that Algorithm 6 gives an optimal result. Algorithm 6 One-port Multinode Broadcasting in RN : Node j sends its broadcasting message to node (j + 1) mod N. for i=2 to N—1 do: Node j sends the broadcasting message received in step i—l to node (j +1)modN. Clearly, the time required by Algorithm 6 is exactly N — 1 time steps. A lower bound for any one-port multinode broadcasting is also N — 1, since each node needs to receive N — 1 messages. Therefore, we have proved the following lemma. 4.7 scatte' 85 Lemma 4.11 Under the one-port model, the multinode broadcasting (respectively, accumulation) in a ring of size N can be achieved in N — 1 time steps, which is optimal. By Theorem 3.3, the length of the longest cycle in F: is Fl“) if Ff] is even and F3] — 1 if otherwise. Therefore, by embedding a longest cycle in Ft, we have the following result. Theorem 4.10 Under the one-port model, the multinode broadcasting (respectively, accumulation) in P: (where n — 3 Z k 2 2) can be achieved in Fl,” time steps if Elf] is even and in 2151"] -— 3 time steps if otherwise. Proof: The case when Fl,” is even follows from Theorem 3.3 and Lemma 4.11. It remains to show the case when 151“] is odd. In this case, Ff, contains a cycle of length Fifi — 1. Let q denote the node that is not in the longest cycle; Figure 4.11 shows a schematic diagram of the longest cycle. The multinode broadcasting can be divided into two phases. In the first phase, all nodes in the longest cycle (i.e., all but node q) execute Algorithm 6 to accomplish a multinode broadcasting among them. In the second phase, (refer to Fig. 4.11) node q receives all the other Fl,” — 1 messages from node p and sends its message to s which forwards the message to r then to t from which the message travels clockwise along the cycle until it reaches p. The first phase takes Elf] — 2 time steps and the second phase takes Fl,” — 1. Therefore, the total time required is 2F,[,"] — 3 steps. [J 4.7 Summary and Remarks We have studied the routing, single node broadcasting/accumulation, single node scatter/ gather, and multinode broadcasting / accumulation data communication prim- in te stu erat difie or n ‘0 a1 86 9.0 r st Figure 4.11. One-port multinode broadcasting through the longest cycle. itives in the generalized Fibonacci cube. The SBTs have been widely used to imple- ment data communication in the hypercube. Indeed, the broadcasting/ accumulation and scatter/ gather primitives discussed in this chapter can be efficiently achieved by employing the SBTs combined with some appropriate scheduling rules [67] [100]. It is known that the hypercube is a regular graph; all SBTs with different roots in B.. are isomorphic. This fact simplifies the analysis of the SBT-based data communica- tion primitives, since only SBT..(0) need to be considered. Here, for the generalized Fibonacci cube, we have defined the. STFs and presented the STF-based algorithms for single node broadcasting/ accumulation and single node scatter/ gather. Although the STFs with different roots in Ff, are not isomorphic in general, we have presented an algorithm which allows us to transform an SBT to an STF. Moreover, by show- ing that an STF is isomorphic to a subgraph of an SBT, we have presented a new technique for studying data communication in the generalized Fibonacci cube which is irregular in general. Table 4.1 summarizes the communication primitives and their time complexities studied in this chapter. One communication primitive we have not yet addressed is the total exchange op- eration [9] (or all-to-all personalized communication [67]) in which every node sends different messages to all the other nodes. This is equivalent to the multinode scatter or multinode gather, since every node needs to receive and send different messages to all the other nodes. Clearly, the total exchange can be implemented by executing 87 Table 4.1. Time complexities for communication primitives in Ft. II Communication_ Primitive [ Model I Time I] single node broadcasting all-port d: (accumulation) one-port _ n — k single node scatter all-port 2"“1’1 (gather) one-port Fl,” - l multinode broadcasting all-port 2"""l (accumulation) one-port Fl,“ — It 2541- 3: ‘The distance between source node 8 and all the other nodes. 1 If Fl.” is even. 1 If Fl.” is odd. single node scatter (gather) by all nodes, taking turns. This results in Fl,” . T, com- plexity, where T. is the time required by the single node scatter (gather). Obviously improvements can be made. One approach is to invoke single node scatter (gather) concurrently on all nodes. However, this leads to a queueing delay. The analysis of this approach remains open. (1 (11 he [67 the (list 2‘ Hi ll 7; CHAPTER 5 GRAPH EMBEDDING AND DATA COMMUNICATION IN INCOMPLETE HYPERCUBE In this chapter, we will study the Hamiltonian problem and data communications in the Incomplete HyperCube. Much research has been undertaken to study the Incomplete Hypercube. Many useful properties of the Incomplete Hypercube, such as diameter and traffic density, have been shown [109] to be comparable with that of the hypercube. In [110], Tzeng et al. have shown that the binary trees and the two- dimensional meshes can also be embedded in the Incomplete Hypercube. Katseff [68] has presented algorithms for routing and single node broadcasting. Recently, more sophisticated single node broadcasting algorithms based on results for the hypercube [67] have been devised and analyzed by Tien et. al. [107]. These results suggest that the Incomplete Hypercube is an attractive candidate for interconnecting parallel and distributed systems. Note that results reported in [107] [109] [110] are for Incomplete Hypercubes of 2"+ 2“ nodes, where 0 S k < n. It can be seen that the “incomplete hypercubes” studied in [107] [109] [110] represent only a small subset of the general case. For example, there 88 In [In con Hm sing for t ter/g and, 89 are only 10 different such incomplete hypercubes within the size between 21° and 211 (exclusive); on the other hand, there are a total of 210 different Incomplete Hypercubes in such a range. In this chapter, we present embedding and data communication primitives for I N where N can be any positive integer. As seen in Chapter 3, embeddings of linear array and ring are essential to the embeddings of other structures such as mesh. Here we will show that the longest cycle in I N contains N nodes if N is even, and N — 1 nodes if otherwise. Nevertheless, IN always contains a Hamiltonian path. Consequently, our results on embeddings of linear array and ring in the Incomplete Hypercube are both optimal in terms of minimal dilation. A single node broadcasting algorithm for the Incomplete Hypercube has been pre- sented in [68], which is essentially the same as that for the generalized Fibonacci cube. In [79, 82], we have presented optimal algorithms for the single node broadcasting under both the all- and one-port communication models. The all-port broadcast- ing algorithm is a result from [68]; the MSLF scheduling scheme was applied again to attain an optimal one-port broadcasting. Here, we further'devise and analyze other communication primitives, such as accumulation, gather and scatter, for the Incomplete Hypercubes. In this chapter, we write N (Z 1) to denote the total number of nodes in the Incomplete Hypercube, and n = [log N] . The remainder of this chapter is organized as follows. Section 5.1 shows that IN contains a Hamiltonian path and rings can be optimally embedded in the Incomplete Hypercubes. Routing and single node broadcasting are reviewed in Section 5.2. The single node broadcasting algorithms are analyzed in Section 5.3. Optimal results for the single node accumulation are reported in Section 5.4. The single node scat- ter/ gather is studied in Section 5.5. Section 5.6 considers the multinode broadcasting and accumulation. Finally, we summarize our results in Section 5.7. 90 5.1 Hamiltonian Problem In this section we study the Hamiltonian problem in the Incomplete Hypercube. We will show that IN (N Z 1) contains a Hamiltonian path. Additionally, IN (N 2 4) contains a Hamiltonian cycle if and only if N is even, whereas the longest cycle in I N contains all but one node if N is odd. The embedding of cycles in the Incomplete Hypercubes is based on Gray codes. There are many different ways to generate Gray codes. Here we adopt the known Binary Reflected Gray Code (BRGC), see e.g.,[25, 101], to achieve our embedding. Let G... denote the sequence of m-bit BRGCs. It is easily seen that G1 = (0,1). To build the 2—bit BRGCs, prefix a ‘0’ to each code in G1, then reverse G1 and prefix a ‘1’ to each code. In other words, we get G2 = (00, 01, 11, 10). More generally, let G.- denote the sequence of codes obtained from G. by reversing its order, and 0G. (respectively, 1G.) the sequence of codes obtained from G.- by prefixing a ‘0’ (respectively, ‘1’) to each element of the sequence G.'. The m-bit BRGCs can be generated by the recursion [101]: G... =(0G..._1,IG..._1) Table 5.1 shows the 5-bit BRGCs, where G5(i) is the ith (0 S i S 31) BRGC in the sequence G5. Let G5(i) denote the (regular) 5-bit binary code for integer i. Table 5.1 also lists the integer j such that C'5( j ) = G5(i) for each i and j. The following observation is important: Lemma 5.1 The binary codes C...(i) and C...(i + 1) difl'er in bit 0 if and only ifi is evenforOSiSZm—l. Consequently, C...(i) and C...(i + l) are two consecutive BRGCs, although C...(i) may come before or after G...(i + 1) in the sequence Gm. 91 Table 5.1. 5-bit Binary Reflected Gray Codes. i 05(1) j i 05(1) j i 05(1) j i 05(1) j 0 00000 0 8 01100 12 16 11000 24 24 10100 20 1 00001 1 9 01101 13 17 11001 25 25 10101 21 2 00011 3 10 01111 15 18 11011 '27 26 10111 23 3 00010 2 11 01110 14 19 11010 26 27 10110 22 4 00110 6 12 01010 10 20 11110 30 28 10010 18 5 00111 7 13 01011 11 21 11111 31 29 10011 19 6 00101 5 14 01001 9 22 11101 29 30 10001 17 7 00100 4 15 01000 8 23 11100 28 31 10000 16 5.1.1 Cycles Our embedding of ring is based on the longest cycle in the Incomplete Hypercube. We first consider cycles in the Incomplete Hypercube. Theorem 5.1 A cycle of length l can be embedded in IN, where l is even and 4 S l S N. Proof: 3 Since the result for the (complete) hypercubes is known [101], we assume that N is not a power of 2. Notice that I N contains a hypercube B.._1 . Hence according to [101] cycles of length I can be embedded in B.._1 for 4 S l S 2"“. It remains to show that the theorem is true for 2'"1 < l S N. We distinguish between the following two cases: Case 1: 2’”1 +1 S N S 3-2”‘2. Define M = 3 - 2”"2 - N. Let L0 = 00G.._2, L1 = 01G.._2 and L2 = 10(G..-2 \ M), where (G..-g \ M) denotes the sequence of codes resulted from the removal of the M C.._g(j)’s from G.._2 for 2"“2 — M S j < 2””. Figure 5.1(a) shows a schema, where by selecting nodes alternatively between Lo and L2, a cycle of length 22 can be embedded in 122. Indeed, define two nodes j and j +1 to be a pair, where j is an even number and 2""1 < j < N. By Lemma 5.1, C..( j ) and C..( j + 1) are two consecutive 92 BRGCs. Hence a pair of “up” and “down” edges can be used to include the pair j and j + 1 in the L2 to form a cycle. Consequently, every node in L; can be included to form a cycle if it is paired with another node. Case2: 3-2""+1SNS2"—1. . DefineM = 2"—N. Let L0 = 00G.._2, L1 = 01G.._2, L; = 10G.._g and L3 = 11(G..-2\ M), where (G...; \ M) is defined the same way as in case 1. Figure 5.1(b) shows that by selecting nodes alternatively between Lo and L2, and nodes alternatively between L; and L3, a cycle of length 28 can be embedded in [28. Again, only an even number of nodes in L3 can be included in a cycle as we observed in case 1. D "it“ La I-o L1 0‘ m 001 011 010 110 In 101 100 (II) La Lo ”'1 L: ‘1 000 001 on one 101 100 (b) Figure 5.1. Embedding of longest cycles in (a) In, and in (b) 130. We next study cycles of Odd lengths. By applying the same schema described in the previous proof, Figure 5.2 shows an embedding of a cycle with length 22 in 123. Note that node 10110 (22) is excluded from the cycle, since node 23 is not in 123. Recall the fact that there is no cycle of odd length in the hypercube (Lemma 3.3). 93 000 001 O11 010 110 111 101 100 Figure 5.2. Embedding of a cycle with length 22 in 123. Furthermore, Lemma 2.2 shows that I N is a subgraph of B... Hence we have Lemma 5.2 There is no cycle of odd length in the Incomplete Hypercube. Consequently, we have proved the following results: Theorem 5.2 For N _>_ 4, I N contains a Hamiltonian cycle if and only if N is even, whereas the length of the longest cycle in I N is N — 1 if and only if N is odd. Corollary 5.1 A ring of size N can be embedded in IN with dilation one if N is 7 even, with dilation two if otherwise. Proof: It is clear when N is even. For N being Odd, the only node that cannot be included in the longest cycle has at least one adjacent node in the cycle. Hence the embedding can be achieved by paying a higher price (i.e., dilation is 2). Cl 5.1.2 Hamiltonian Path We now focus on the embedding of linear array. By Theorem 5.1, the graph I N contains a Hamiltonian path when N is even. Figure 5.3 shows that by replacing edge (00110,00111) with edge (10110.00110), 123 contains a Hamiltonian path. More generally, we will prove the following result: Theorem 5.3 IN contains a Hamiltonian path. 94 0‘ one 001 011 010 110 111 101 100 Figure 5.3. Embedding of a Hamiltonian path in 123. Proof: It is easy to verify for N S 4. For N Z 5, we distinguish between the following two cases: Case 1: N is even. In this case, I N contains a Hamiltonian cycle according to Theorem 5.2. By removing exactly one edge from the cycle, we have a Hamiltonian path. Case 2: N is odd. Recall that nodes are labeled between 0 and N -— 1. Based on the schema presented in the proof of Theorem 5.1, we are able to embed a cycle, LC, in I N, which contains all but node N — 1. Now, consider the two nodes N — 1 — 2’“1 and N — 2"“. Observe that the number N - 1 - 2"“1 is even and hence N — 2""1 is odd. As a consequence of Lemma 5.1, the edge (N — 1 — 2"“1,N — 2"”) is in LG. Furthermore, the edge (N - 1,N — 1 — 2"“) is also in IN. The path formed by the removal of the edge (N — l — 2””1,N — 2"”) and the addition of the edge (N — 1, N — l — 2"“) to LG is exactly a Hamiltonian path. CI 5.2 Routing and Single Node Broadcasting In this section, we review the routing and the single node broadcasting algorithms proposed for the Incomplete Hypercubes. The routing problem in the Incomplete Hypercube was first studied by Katseff [68]. Indeed, Algorithm 1 for routing in the 95 generalized Fibonacci cubes is essentially the same as that proposed in [68]. Figure 5.4 shows the structure of Is and the routing path (with arrow in each link) from node 011 to 100, which is determined by Algorithm 1. Lemma 5.3 is a direct result from 100 Figure 5.4. Routing path from node 011 to 100 in 16. [68], which says that Algorithm 1 always finds a shortest path between any two nodes. Lemma 5.3 Lets and d be two distinct nodes in IN, and H(s,d) the Hamming dis- tance between these two nodes. Algorithm 1 always finds a shortest path of length H(s,d) from s to d. It is also shown in [68] that the routing paths determined by Algorithm 1 are deadlock- free. Given any two nodes in I N, by Lemma 5.3 there is a path between them. Hence I N is a connected graph. Furthermore, the Hamming distance between the two nodes 2""1 (with address 100 - - 0 0) and 2'”1 — l (with address 011 - - - 1) in IN is exactly n. Therefore we have proved the following result. Lemma 5.4 IN is a connected graph and its diameter is n. Katseff [68] has also proposed an algorithm for single node broadcasting in the Incomplete Hypercube, which is essentially the same as Algorithm 2 presented in Section 4.2. Figure 5.5 illustrates an example of a single node broadcasting from node 0101 in I“, when Algorithm 2 is applied. Notice that each node is represented 96 by an ellipse labeled with its address; only the links determined by Algorithm 2 are shown in the figure. The travel array received at each node is shown above it, whereas the source node initializes the array to all 1’s (TRUE). Figure 5.5. A broadcasting from node 0101 in In. It is known [68] that the set of nodes and links traversed by executing Algorithm 2 for any single node broadcasting in I N form a spanning tree, and the spanning tree will be referred to as STI. We write 5' TI N(r) to denote the STI in I N rooted with node r. Figure 5.5 also shows the STIu(0101). Observe that two STIs rooted at different nodes are generally not isomorphic, which is similar to what we saw in the case of the STFs. The next lemma is a result from [68], which shows that the single node broadcasting algorithm sends the message through the same path as determined by the routing algorithm. Lemma 5.5 Lets and t be two distinct nodes in IN. Let p be the path determined by Algorithm 1 to route a message from s to t. If a message is broadcast from s by Algorithm 2, then it will reach t along a unique path that coincides with p. 97 5.3 Analysis of Single Node Broadcasting In this section, we analyze the single node broadcasting. We will follow the assump- ' tions made in Chapter 4, i.e., all messages are of unit length and each takes a unit time for transmission between two connected nodes. 5.3.1 All-port Model Theorem 5.4 shows that Algorithm 1 is optimal under the all-port model. Theorem 5.4 Let d, be the maximal distance between an (arbitrary) node 3 and all the other nodes in IN. Then, under the all-port communication model, Algorithm 2 can optimally complete a single node broadcast from node 3 in exactly d, time steps. Proof: It follows from Lemmas 5.3 and 5.5. CI Notice that d. is n if both nodes 3 and 2" - s — 1 (i.e., 3) exit in IN, and n — 1 if otherwise. 5.3.2 One-port Model As we saw in the case of the generalized Fibonacci cubes, the scheduling scheme makes a critical difference in the overall broadcasting time. Recall that the MSLF scheme has been proposed for one-port single node. In what follows, we will show that the same one-port single node broadcasting algorithm presented in Section 4.3.2 is Optimal when applied to the Incomplete Hypercube. To motivate, we revisit the broadcast from node 0101 in In. Figure 5.6 shows that the one-port broadcasting requires 4 time steps, where each link is labeled by the time step at which the link is involved in forwarding the message. Notice that the distance between the nodes 0101 and 1010 is exactly 4, which is a lower bound for such a broadcasting. Hence the one-port algorithm is optimal in this case. We Figure 5.6. One-port broadcasting from node 0101 in [11. next show that the one-port broadcasting algorithm takes exactly n time steps for any single node broadcasting in I N. It is necessary to examine closely the relation between the STIs and the SBTs at this point. First we try to apply Algorithm 3 to construct STIs. Figure 5.7 illustrates a a - o A 001.1 a 0101 3 on. .o o o 000:. 0010 on 0100 0101 one 1 0 o 0 000 001 0010 zoo Figure 5.7. Construction Of STI30(01110) from SBT5(01110). construction of S T130(01110) from SBT5(01110). It is seen that the parents of nodes 10110, 11010 and 11100 are switched respectively to nodes 00110, 01010 and 01100, since node 11110 is not in 130. However, Algorithm 3 fails to give a construction of STI22(01110), because both nodes 11110 and 10110 are not in In whereas node 10010 99 I (a child of 10110) is in In (refer to Fig. 5.7). Further effort leads to a construction of the STI;2(01110) from the SBT5(01110) as illustrated in Figure 5.8. Notice that the (3132223 «In» ' an!» ‘ an; (3333) (2:139 (325.333) Gifiiib an» «In-1 «mp am an» @- o . 0 «m» «m sass cat-2;?» @293» (2:13:95 «an» «as!» am 1 ' O «m «In cases «mm 0 «as Figure 5.8. Construction of 5T122(01110) from SBT5(01110). link sequences (4,3,2) and (3,2,4) connect node 01110 to node 10010 in SBT5(01110) and STIn(01110), respectively. Similarly, the link sequences (4,3,1) and (3,1,4) con- nect node 01110 to node 10100 in the SBT and the STI, respectively. This Observation leads to the following lemma which extends Lemma 4.4. Lemma 5.6 Let uo be a node in IN. Let (uo, u1,- - - , 11..., 11;, u.+1, - . - , u.+,-_1, u...,-) be the path in SBT..(uo) connecting node no to u.-+,-, and (l1, l2, - - . , l.-, l.-+1, l.-+2, - - - , l.-+,-) the corresponding link sequence. Assume that 11.4, is in I N and Ui+j_1 is not in IN. Then I. There exists a node u. that is in IN whereas all 21,-4.1, 11,-4.2, - - - ,u.+,_1 are not in IN: 2. Within B.., the link sequence (l.-+2,l.-+3, - - ~ , l.-+,-) from u.- leads to a node u that is in IN, and 3. Within B.., link l.-+1 from u leads to u.-+,-. 100 Proof: By back tracking from node u.-+,- in SBT..(uo) to the root uo, one should beable to see that there exists a node u,- that is in I N. This is true because at least uo is in I N. (Note that u.- may be the same node as uo.) Next we prove that u is in I N. The following two- Observations are crucial: 1. u.-+,- and u differ in exactly bit l.-+1, and 2. (Lemma 5.8) bit I.“ of u is 0 and bit I.“ of u,+,- is 1 Since u.-+,- is in IN, node u must be in IN. This is because, by observation 2, the (integer) label of u is less than that of u.-+,-. Finally, the edge (u, u.+,-) in B.. (or, more precisely, I N) is because both nodes are in I N and they differ in exactly bit l.-+1 as we just observed. [2] As in the case of the generalized Fibonacci cubes, links in STIN(s) can also be classified into two sets: set A includes links which are not in SBT..(s), and set B the remaining links. For example, links (00010,10010) and (00110.00010) in S TI22(01110) are in set. A and B respectively (refer to Fig. 5.8). Links in set A are resulted from some “missing” nodes (relative to B..). Similarly, a link I in set A connects two special nodes: the one defining l as an out-edge is called type-a node and the other is type-fl. The following lemma shows that each type-oz node has exactly one type-,6 node as its child. Lemma 5.7 Let u be a type-a node in STIN(s). Then u has exactly one type-fl node as its child in STIN(s). We need the following crucial observation before proving Lemma 5.7. Lemma 5.8 Let (q,r) be a set A link in STIN(s) such that i is its link number and q is a type-a node. Then bit i ofq is 0. 101 Proof: Note that q must be an internal node in an STI, because no links in set A could incident from a root node. Assume that p is the parent of q in STI,.(s) and the link number of (p, q) is j. By definition i > j. We prove by contradiction that bit i of q is 0. Suppose that bit i of q is 1, then bit i of p is also 1. Notice that p and r differ in exactly bits i and j. Define t = p — 2‘. Clearly, node t is in IN. According to the MSLF scheme, at node p the one—port algorithm would forward the message through link i to node t which, however, would forward the message to r. This is a contradiction. C] Now, we proceed to prove Lemma 5.7. Proof: Observe that u must be an internal node in 5 TI N(s). Let p be the parent of u and k be the link number of link (p, u). We prove the lemma by contradiction. Suppose u has two children v and w and the link numbers for (u, v) and (u, w) are i and j, respectively. Without loss of generality, assume that i = 2, j = 1 and k = 0. By Lemma 5.8, both bits 2 and 1 of u (and hence p) are 0. Consider the following two cases: (a) bit 0 of u is 1, and (b) bit 0 of u is 0 as diSplayed in Figure 5.9. Observe that N must be at least 6 and 5 in case a and case b, respectively. By the one-port algorithm, the broadcasting path from p to v would be (000,100,101) in case a. Similarly, the broadcasting path from p to w would be (001,011,010) in case b. Both are contradictions. Hence we have proved the lemma. [3 By Lemmas 5.6 and 5.7, one should be able to design an algorithm similar to Algo- rithm 3, which allows us to transform an SBT to an STI. Hence we have the time required by the one-port broadcasting algorithm. Theorem 5.5 The one-port single node broadcasting algorithm requires at most n time steps to broadcast one message from any node in I N. 102 (a) (b) Figure 5.9. Type-a node and its type-fl child. Proof: (Sketch) Consider the scheduling in type-a nodes. Let (uo,u1,- - - ,u.~-1,u.-,u.-+1, - - - ,u.-+j-1,u.-+,-) be a path and u be the node as specified in Lemma 5.6, i.e., link (u, u.-+,-) is in set A, and u (respectively, u.-+,-) is a type-a (respectively, type-fl ) node. Within B.., the broadcasting message will reach and at step 12. — 1.4,. On the other hand, it can be shown that the broadcast message will reach node u in no more than n — 1...,- - 1 time steps. By Lemma 5.7, there is exactly One set A link originated from u. Under the MSLF scheme, u will forward the message through the set A link first, then through set B link(s). Therefore, the message will reach u.+,- (a type-,6 node) no later than n — 1..., steps, i.e., at a time no later than that resulted from the one-port broadcasting in B... For nodes that are not type-a, it can be shown that the scheduling on them will not be delayed by the type-a nodes. C] Remark: Let (uo,u1, - - - ,u.) be a path in STIN(uo). Figure 5.10 shows that there may be two type-a nodes in such a path, i.e., nodes 0001 and 1000. (In general, there may be more than two type-a nodes in a path.) Since it takes only 1 (which is 1 time step earlier compared to the scheduling in SBT4(1010)) time step for the broadcasting message to reach node 0001, one should be able to see the scheduling 103 in node 1000 will not be affected (delayed) by a type-a ancestor. This is similar to what we saw in the case of the generalized Fibonacci cubes. Figure 5.10. 3711110010). Corollary 5.2 The one-port broadcasting algorithm is optimal in terms of minimal time steps. Proof: It suffices to show that n is a lower bound for any one-port broadcasting algorithm. To see this, under the one-port model the total number of nodes receiving broadcast message is at most doubled for every consecutive time step. Hence it takes at least n steps for all N nodes to receive the broadcast message. C] We now show that accumulation, gather and scatter operations can also be effi- ciently implemented in the Incomplete Hypercube. The approaches taken here are similar to what we did in Chapter 4. 104 5.4 Single Node Accumulation As we saw in Section 4.4, the accumulation is a dual Operation of the broadcasting. Therefore, the times required for the all- and one-port single node accumulation in the Incomplete Hypercube are captured by Theorems 5.4 and 5.5, respectively. 5.5 Single Node Gather and Scatter Recall that the STF-FNF scheme for the single node scatter in the generalized Fi- bonacci cubes. Observe that all paths in the STIs are shortest-path, which is a key factor to the efficiency of the STF-FNF. Along the same line, we propose the following scheduling rule: The source sends the message to the farthest node first (break ties arbi- trarily) along the links in the STI. for scatter Operation in the Incomplete Hypercube. The above scheme will be referred to as STI, Farthest Node First (STI-F NF), and the time required by the STI-FNF is: Theorem 5.6 Under the one-port model, any single node scatter (respectively, gather) in IN (where N Z 2) can be achieved in exactly N — 1 time steps, which is optimal. Proof: The proof for time requirement is similar to the case of the generalized Fibonacci cubes. To see the optimality, observe that the root needs to send N - 1 messages. C] The following two lemmas will facilitate our analysis of the all-port single node gather / scatter. Lemma 5.9 Lets be a node in IN. The tree STIN(s) is isomorphic to a subgraph of SBT..(s). 105 Proof: Recall that in the case of the generalized Fibonacci cubes, Lemmas 4.4 and 4.6 are essential to the proof of Lemma 4.7. Lemmas 5.6 and 5.7 show that the Incomplete Hypercube possesses similar properties. We omit the detailed proof. D Figure 5.11 shows that STIgg(01110) is isomorphic to a subgraph of SBT5(01110). It can be seen that node 00110 in the STI should be mapped to 11110 in the corre- sponding SBT. Consequently, node 00010 (respectively, 00100 and 00111) should be mapped to 10110 (respectively, 11010 and 11100) etc. «use 1 a o a 3 ° 0 0 e an: ® (O) am e . a o. m (in! am an» :I a 1 1 O 0 mm m» m, an:- (553- am» 3 O 1 O O (b) Figure 5.11. (a) STI22(01110) and (b) a subgraph of SBT5(01110). Lemma 5.10 Let s be a node in IN. Assume that algorithm A can be applied to perform gather (respectively, scatter) operation based on SBT..(s) in X time steps. Then algorithm A can be applied to perform gather (respectively, scatter) operation based on STIN(s) in no more than X time steps. 106 Proof: Similar to the proof of Lemma 4.8. Omitted. 0 Therefore, by Lemmas 4.10, 5.9 and 5.10, we have the following result for the all-port model. Theorem 5.7 Under the all-port model, any single node gather (respectively, scatter) in I N can be achieved in no more than 2"”1 time steps for N 2 2. Again, it can be shown that a tighter upper bound is |T.|, where |T,| (S 2"“) is the maximum size of the subtrees branched from the root. 5.6 Multinode Broadcasting and Accumulation Recall that Algorithm 5 has been proposed for the multinode broadcast- ing/accumulation in the generalized Fibonacci cubes. By Lemma 5.4, I N is a con- nected graph with diameter n. Hence it can be shown that Algorithm 5 also applies to the Incomplete Hypercube. Consequently, the time required. is: Theorem 5.8 Under the all-port model, the multinode broadcasting (respectively, ac- cumulation) in I N can be achieved in no more than 2""1 time steps for N Z 2. By Theorem 5.1, the length of the longest cycle in I N is N if N is even and N - 1 if otherwise. Again, by embedding a longest cycle in I N, we have the following result. Theorem 5.9 Under the one-port model, the multinode broadcasting (respectively, accumulation) in I N (where N Z 4) can be achieved in N — 1 time steps if N is even and in 2N — 3 time steps if otherwise. Proof: Similar to the proof Of Theorem 4.10. Omitted. Cl 107 5.7 Summary of Results We have studied the Hamiltonian problem and some data communication primitives in the Incomplete Hypercube. For the Hamiltonian problem the following Optimal results have been obtained: 1. I N contains a Hamiltonian path. Hence a linear array of size N can be embedded with dilation-1 , and 2. The longest cycle in IN (where N 2 4) contains N nodes if N is even, and N — 1 nodes if otherwise. Consequently, a ring of size N can be embedded with dilation-1 if N is even, with dilation-2 if otherwise. For data communication, we have studied the single node broadcasting/ accumulation, single node scatter/ gather, and multinode broadcasting/accumulationprimitives in the IncOmplete Hypercube. We have defined the STIs and presented the STI-based algorithms for single node broadcasting / accumulation and single node scatter / gather in the Incomplete Hypercube. The STIs with different roots in I N are not isomorphic in general. Again, by showing that an STI is isomorphic to a subgraph of an SBT, we have presented a new technique for studying data communication in the Incomplete Hypercube. Table 5.2 summarizes the communication primitives and their time complexities studied in this chapter. 108 ' Table 5.2. Time complexities for communication primitives in I N. IrCommunication Primitive I Model I Time II single node broadcasting all-port d; l (accumulation) one-port n single node scatter all-port 2"-l (gather) one-port N — 1 multinode broadcasting all-port 2"”l (accumulation) one-port N — 1f 2N -'- 31 ‘ The distance between the source node 8 and all the other nodes. 1 If N is even. 1 If N is odd. CHAPTER 6 CONCLUSION AND FUTURE WORK In this chapter, we summarize our main results, highlight our contributions, and discuss some directions for future work. 6.1 Main Results and Contributions We have studied a large family of irregular network topologies that consists of the generalized Fibonacci cube and the Incomplete Hypercube. Each member of the family is shown to be a subgraph of a hypercube. Additionally, both the generalized Fibonacci cube and the Incomplete Hypercube include the hypercube as a special case. The Incomplete Hypercube Offers unlimited sizes; however, it suffers from a low degree of fault-tolerance under certain situations. On the other hand, the generalized Fibonacci cube offers a guaranteed degree of fault-tolerance. To investigate their applications in parallel or distributed systems, we have pre- sented a collection of efficient graph embedding and data communication algorithms for this large family of network topologies. In particular, we have Obtained the fol- lowing graph embedding results for the generalized Fibonacci cube: 109 110 1. 1‘: (where n 2 k 2 2) contains a Hamiltonian path, hence a linear array. 2. If Fl” is even, the longest cycle in Ff,(where n - 3 Z k 2 2) contains all nodes, which implies that I‘I‘, embeds a ring of size Ff,” with dilation-l. On the other hand, if 151"] is odd, the longest cycle in 1‘: Contains all but one node, which implies that F: embeds a ring of size Ff] with dilation-2. 3. We have shown that some 2D and 3D meshes of certain sizes can be embedded in the generalized Fibonacci cube with expansion-1 and dilation-1. 4. We have shown that a hypercube can be embedded in Fifi. The Hamiltonian problem for the Incomplete Hypercube has also been studied. We have shown that I N contains a Hamiltonian path, and the length of the longest cycle in IN (where N Z 4) is N if N is even, and N - 1 if otherwise. For data communication in both the generalized Fibonacci cube and the Incom- plete Hypercube, we have devised efficient algorithms for the following primitives: 1. Single node broadcasting/ accumulation, 2. Single node scatter/ gather, and 3. Multinode broadcasting/ accumulation. All of our algorithms have been carefully analyzed under both the one— and all-port models. The family of network topologies studied in this dissertation has the following two areas of application. 1. It provides a spectrum of alternative structures for fault-tolerant computing in the hypercube systems. This is because each member of the family (a generalized Fibonacci cube or Incomplete Hypercube ) is a subgraph of a hypercube. 111 2. It makes small-sized incremental expansion possible. This is a very desirable property when constructing parallel or distributed systems that evolve with time. We consider the following as our contributions. 1. Most existing results of graph embedding and data communication are only applicable to a single kind of regular networks. Here we have presented new results that can be applied to a large family of irregular network topologies. 2. By using a few parameters, (e.g., n and k for the generalized Fibonacci cube, and N for the Incomplete Hypercube) to characterize members of the family, we have been able to described the relations between the structural properties and the algorithmic efficiency of a network. Our approach has provided a new framework for studying networks. 6.2 Future Work Obviously, there are many important issues that have not been addressed. The fol- lowing is a list of possible directions for future work. 1. Graph embedding: We have identified the following outstanding problems: 0 Embedding in hypercube: As we mentioned, this family of network topolo- gies studied can serve as a fault-tolerant structure for the hypercube. How- ever, we have not addressed the issue of how to efficiently embed a gener- alized Fibonacci cube or an Incomplete Hypercube in a faulty hypercube. e Subcube allocation: It has been widely anticipated that future parallel sys- tems will consist of thousands or even millions nodes. Since a typical user may not require all the processors, Hayes et al. [55] and Peterson et al. [95] 112 have suggested that the hypercube systems should support multiprogram- ming to promote system utilization. The. problem of subcube allocation in the hypercube systems has been studied extensively [3] [22] [25] [28] [38] [70]. Similarly, the problem of submesh allocation in the 2D mesh systems has also drawn much attention [76] [77] [85]. Since the generalized Fibonacci cube can also be decomposed into subcubes, we expect that the subcube allocation in the generalized Fibonacci cube can be tackled in a flexible fashion. 0 Optimal embedding of meshes: We have shown that some 2D and 3D meshes can be embedded in the generalized Fibonacci cube with expansion- 1 and dilation-1. Given a 2D or 3D mesh of certain size, how can we achieve a minimal expansion and/or minimal dilation embedding? 2. Structural properties: Our work is just a beginning. To have more complete knowledge of the family of networks, we need to investigate other topologi- cal properties such as bisection width, node/edge connectivities, independent paths, and the relation between the generalized Fibonacci cube and Incomplete Hypercube. . Application algorithms: A network topology is never useful if it does not support efficient algorithms. We have presented many tools for efficient graph embedding and data communication for the family of network topologies. Re- cently, we have shown that prefix computation can be done efficiently in the generalized Fibonacci cube [64]. We would like to investigate along the same line. . Optimization: We have studied a spectrum of new topologies, which supports different performance/structural features. Given a specification of desirable features, which may consist of a list of requirements (e.g., number of nodes, TR , 113 edges, degree, and performance features), how can we select member(s) from this family of networks that will satisfy the requirements with a minimal cost? BIBLIOGRAPHY BIBLIOGRAPHY [1] ABRAHAMSON, K., DADOUN, N ., KIRKPATRICK, D. G., AND PRZYTYCKA, T. A simple parallel tree contraction algorithms. Journal of Algorithms 10 (1989), 287—302. [2] AKL, S. G. The Design and Analysis of Parallel Algorithms. Prentice Hall, 1989. [3] AL-DHELAAN, A., AND BOSE, B. A new strategy for processors allocation in an n-cube multiprocessor. In Proc. IEEE the 8th Annual Int ’l Phoenix Conf. on Comput. Comm. (1989), pp. 114-118. [4] ANDERSON, E., AND ET. AL. LAPACK: a portable linear algebra library for high-performance computers. In Proc. of Supercomputing Conference (1990), pp. 2-11. [5] BANERJEE, P. Strategies for reconfiguring hypercubes under faults. In IEEE 20‘“ Intn’l Conf. on Fault-Tolerant Computing (1990), pp. 210-217. [6] BARNES, G. H., BROWN, R. M., KATO, M., KUCK, D. J., SLOTNICK, D. L., AND STOKES, R. A. The Illiac IV computers. IEEE Trans. Computers 027 (1968), 84—87. [7] BATCHER, K. E. Sorting networks and their applications. In Proc. of the Spring Joint Computer Conference (1968), AFIPS Press, pp. 307—314. [8] BECKER, B., AND SIMON, H.-U. How robust is the n-cube. Information and Computation 77 (1988), 162—178. [9] BERTSEKAS, D. P., OZVEREN, C., STAMOULIS, G. D., TSENG, P., AND TSITSIKLIS, J. N. Optimal communication algorithm for hypercubes. Journal of Parallel and Distributed Computing 11 (1991), 263-275. [10] BERTSEKAS, D. P., AND TSITSIKLIS, J. N. Parallel and Distributed Compu- tation. Prentice Hall, 1989. 114 115 [11] BHUYAN,‘ L. N., AND AGRAWAL, D. P. Generalized hypercube and hyperbus structures for a computer network. IEEE Trans. Computers C-33, 4 (Apr. 1984), 323-333. [12] BLELLOCH, G. E. Scan as primitive parallel operations. IEEE Trans. Com- puters 38, 11 (Nov. 1989), 1526-1538. [13] BRAATEN, M. E. Solution of viscous fluid flows on a distributed memory con- current computer. In Proc. Intn’l Conf. on Parallel Processing (1988), pp. 243- 250. [14] BRENT, R. P. The parallel evaluation of general arithmetic expressions. Jour- nal of the ACM 21, 2 (Apr. 1974), 201—206. [15] BROWN, JR., .1. L. Zeckendorf’s theorem and some applications. Fibonacci Quarterly 2, 3 (1964), 163-168. [16] BRUCK, J ., CYPHER, R., AND SOROKER, D. Running algorithms efficiently on faulty hypercubes. In ACM 2”“ Symposium on Parallel Algorithms and Architectures (1990), ACM, pp. 37—44. [17] BRUCKMAN, P. S. The generalized Zeckendorf theorems. Fibonacci Quarterly 27, 4 (1989), 338—347. [18] CAPPELLO, P. R. Gaussian elimination on a hypercube automaton. Journal of Parallel and Distributed Computing 4 (1987), 288—308. [19] CHAN, M. Y. Dilation-2 embeddings Of grid into hypercubes. In Proc. Intn’l Conf. on Parallel Processing (1988), pp. 295-298. [20] CHAN, M. Y., AND CHIN, F. Y. On embedding rectangular grids in hyper- cubes. IEEE Trans. Computers C—37, 10 (Oct. 1988), 1285—1288. [21] CHAN, M. Y., AND LEE, S.-.I. Distributed fault-tolerant embeddings of rings in hypercubes. J. of Parallel and Distributed Computing 11 (1991), 63—71. [22] CHEN, G.-I., AND LAI, T.-H. Scheduling independent jobs on partionable hypercubes. Journal of Parallel and Distributed Computing 12 (1991), 74—78. [23] CHEN, M.-S., AND SHIN, K. G. Message routing in an injured hypercube. In Proc. 3'“ Conf. Hypercube Concurrent Computers and Applications (1988), SIAM, pp. 312—317. [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] 116 CHEN, M.-S., AND SHIN, K. G. Depth-first search approach for fault-tolerant routing in hypercube multicomputers. IEEE' Trans. Parallel and Distributed Systems 1, 2 (1990), 152—159. CHEN, M.-S., AND SHIN, K. S. Processor allocation in an n-cube multi- processor using Gray codes. IEEE Trans. Computers C-36, 12 (Dec. 1987), 1396—1407. CHEN, S.-K., LIANG, C.-T., AND TSAI, W.-T. Loops and multi-dimensional grids on hypercubes: Mapping and reconfiguration algorithms. In Proc. Intn’l Conf. on Parallel Processing (1988), pp. 315-322. CHRISTON, M. A vectorized 3—D finite element model for transient simulation of two-phase heat transport with phase transformation and a moving interface. In Proc. of Supercomputing Conference (1990), pp. 436-445. CHUANG, P.-J., AND TZENG, N.-F. Dynamic processor allocation in hyper— cube computers. In Proc. The 17th Annual Int’l Sympo. on Computer Archi- tecture (1990), pp. 40—49. CVETANOVIC, Z. The effects Of problem partitioning, allocation, and granular- ity on the performance of multiple-processor systems. IEEE Trans. Computers 035, 4 (Apr. 1987), 421-432. CVETANOVIC, Z., FREEDMAN, E. G., AND NOFSINGER, C. Efficient de- composition and performance of parallel PDE, FFT, Monte Carlo simulation, simplex, and sparse solvers. In Proc. of Supercomputing Conference (1990), pp. 465474. CYPHER, R., SANZ, J., AND SNYDER, L. Hypercube and shuffle-exchange algorithms for image component labeling. Journal of Algorithms 10, 1 (1989), 140—150. CYPHER, R., SANZ, J., AND SNYDER, L. Algorithms for image component labeling on simd mesh-connected computers. IEEE Trans. Computers G39, 2 (1990), 276-281. CYPHER, R., SANZ, J ., AND SNYDER, L. The hough transform has O(n) complexity on n times n mesh-connected computers. SIAM J. of Comput. 19, 5 (1990), 805—820. 117 [34] DALLY, W. J., AND SEITZ, C. L. Deadlock-free message routing in multipro- cessor interconnection networks. IEEE Trans. Computers C-36, 5 (May 1987), 547-553. [35] DEIWERT, G. S. Computational aerothermodynamics. In Proc. of Supercom- puting Conference (1989), pp. 51-57. [36] DEKEL, E., NASSIMI, D., AND SAHNI, S. Parallel matrix and graph algo— rithms. SIAM J. of Comput. 10, 4 (1981), 657—675. [37] DUTT, 5., AND HAYES, J. P. Designing fault-tolerant systems using automor- phisms. Journal of Parallel and Distributed Computing 12 (1991), 249—268. [38] DUTT, 3., AND HAYES, J. P. Subcube allocation in hypercube computers. IEEE Trans. Computers C-IO, 3 (Mar. 1991), 341—352. [39] EI-‘E, K. Embedding mesh of trees in the hypercube. Journal of Parallel and Distributed Computing 11, 3 (Mar. 1991), 222-230. [40] EL-AMAWY, A., AND LATIFI, S. Bridged hypercube network. Journal of Parallel and Distributed Computing 10 (1990), 90—95. [41] EL-AMAWY, A., AND LATIFI, S. Properties and performance of folded hyper- cubes. IEEE Trans. Parallel and Distributed Systems 2, 1 (Jan. 1991), 31—42. [42] ESI-‘AHANIAN, A.-H., NI, L. M., AND SAGAN, B. The twisted n-cube with application to multiprocessing. IEEE Trans. Computers C40, 1 (Jan. 1991), 88—93. [43] FISHBURN, J. P., AND FINKEL, R. A. Quotient networks. IEEE Trans. Computers C-31, 4 (1982), 288—295. [44] FOLDES, S. A characterization of hypercubes. J. Discrete Math. 17 (1977), 155-159. [45] FORTUNE, 5., AND WYLLIE, J. Parallelism in random access machines. In Proc. 10“ Annual ACM Symposium on Theory of Computing (1978), pp. 114— 118. [46] Fox, G., JOHNSON, M., LYZENGA, G., OTTO, S., SALMON, J., AND WALKER, D. Solving Problems on Concurrent Processors. Prentice Hall, 1988. [47] FOX, G. C., OTTO, S. W., AND .HEY, A. J. G. Matrix algorithms on a hypercube I: Matrix multiplication. Parallel Computing 4 (1987), 17-31. 118 [48] GANNON, D., AND VAN ROSENDALE, J. On the impact of communication complexity in the design of parallel numerical algorithms. IEEE Trans. Com- puters 33, 12 (Dec. 1984), 1180-1194. [49] GHOSH, K., AND FUJIMOTO, R. M. Parallel discrete event simulation using space-time memory. In Proc. Intn’l Conf. on Parallel Processing (1991), pp. III- 201—III—208. [50] GILBERT, E. N. Gray codes and paths on the n—cube. Bell System Technical Journal 37 (1958), 815-826. [51] GORDON, J. M., AND STOUT, Q. F. Hypercube message routing in the presence of faults. In Proc. 3'“ Con]. Hypercube Concurrent Computers and Applications (1988), SIAM, pp. 318—327. [52] GOULD, R. J. Updating the hamiltonian problem — A survey. J. of Graph Theory 15, 2 (June 1991), 121—157. [53] GRAHAM, R. L., KNUTH, D. E., AND PATASHNIK, 0. Concrete Mathematics. Addison Wisley, 1989. [54] HASTAD, J ., LEIGHTON, T., AND NEWMAN, M. Reconfiguring a hypercube in the presence of faults. In Proc. 19th Anna. ACM Symp. Theory of Computer Science (1987), ACM, pp. 274-284. [55] HAYES, J. P., MUDGE, T. N., AND STOUT, Q. P. Architecture of a hy- percube supercomputer. In Proc. Intn’l Conf. on Parallel Processing (1986), pp. 653—660. [56] HILBERS, P. A. J ., KOOPMAN, M. R. J., AND VAN DE SNEPSCHEUT, J. L. A. The Twisted Cube. Spring-Verlag, 1987, pp. 152-159. Lecture Notes in Computer Science. [57] HO, C.-T. Optimal Communication Primitives and Graph Embeddings on Hypercubes. PhD thesis, Yale University, May 1990. [58] HO, C.-T., AND JOHNSSON, S. L. Distributed routing algorithms for broad- casting and personalized communication in hypercubes. In Proc. Intn’l Conf. on Parallel Processing (1986), pp. 640—648. [59] HO, C.-T., AND JOHNSSON, S. L. On the embedding of arbitrary meshes in boolean cubes with expansion two dilation two. In Proc. Intn’l Conf. on Parallel Processing (1987), pp. 188—191. 119 [60] HO, C.-T., AND JOHNSSON, S. L. Embedding meshes in boolean cubes by graph decomposition. Journal of Parallel and Distributed Computing 8 (1990), 325-339. [61] HSU, W.-J. Fibonacci cubes - properties of an asymmetric interconnection topology. In Proc. Intn’l Conf. on Parallel Processing (1991), pp. I722—723. [62] HSU, W.-J. Fibonacci cubes-A new interconnection topology. IEEE Trans. Parallel and Distributed Systems (1992). To appear. [63] HSU, W.-J., PAGE, C. V., AND LIU, J. Embedding meshes in a large family of graphs. Parallel Processing Letters (1992). Accepted for publication. [64] HSU, W. J ., PAGE, C. V., AND LIU, J. Parallel construction of Fibonacci trees and prefix computations on a family of interconnection topologies. In Proc. CAAP/ESOP (1992). [65] JAKOB, R., AND HACK, J. J. Parallel MIMD programming for global models of atmospheric flow. In Proc. of Supercomputing Conference (1989), pp. 106- 112. [66] JOHNSSON, S. L. Communication efficient basic linear algebra computations on hypercube architectures. Journal of Parallel and Distributed Computing 4, 2 (1987), 133—172. [67] JOHNSSON, S. L., AND HO, C.-T. Optimum broadcasting and personalized communication in hypercubes. IEEE Trans. Computers G38, 9 (Sept. 1989), 1249-1268. [68] KATSEFF, H. Incomplete hypercube. IEEE Trans. Computers C-37, 5 (May 1988), 604—608. [69] KERMANI, P., AND KLEINROCK, L. Virtual cut-through: A new computer communication switching technique. Computer Networks 3, 4 (1979), 267-286. [70] KIM, J., DAS, C. R., AND LIN, W. A processor allocation scheme for hyper- cube computers. In Proc. Intn’l Conf. on Parallel Processing (1989), pp. 11231- 11238. [71] KNUTH, D. E. The Art of Computer Programming - Fundamental Algorithms, 2 ed., vol. 1. Addison-Wesley, 1973. [721 [73] [74] [75] [75] [77] [78] [79] [80] [31] 182] [83] 120 KNUTH, D. E. The Art of computer Programming — Sorting and Searching, 2 ed., vol. 3. Addison-Wesley, 1973. LADNER, R. E., AND FISCHER, M. .1. Parallel prefix computation. Journal of the ACM 27, 4 (1980), 831—838. LANDER, E., AND MESIROV, J. P. Protein sequence comparison on a data parallel computer. In Proc. Intn’l Conf. on Parallel Processing (1988), pp. 257- 263. LEIGHTON, T. Introduction to Parallel Algorithms and Architectures: Arrays Trees Hypercubes. Morgan Kaufmann, 1992. L1, K., AND CHENG, K. H. Job scheduling in a partitionable mesh using a two-dimensional buddy system partitioning scheme. IEEE Trans. Parallel and Distributed Systems 2, 4 (Oct. 1991), 413—422. LI, K., AND CHENG, K. H. A two-dimensional buddy system for dynamic re- source allocation in a partitionable mesh connected system. Journal of Parallel and Distributed Computing 12 (1991), 79—83. LIU, J., CHIANG, C.-M., AND HUGHES, H. D. Performance analysis ofload- sharing for hypercube multiprocessor systems. Computer Systems Science and Engineering (1992). Accepted for Publication. LIU, J., AND HSU, W.-J. Routing and broadcasting algorithms for a large family of interconnection topologies. Tech. Rep. CPS-91-05, Dept. of Computer Science, Michigan State University, Oct. 1991. Submitted for Publication. LIU, J., AND HSU, W.-J. Distributed algorithms for shortest-path, deadlock- free routing and broadcasting in a class of interconnection topologies. In Proc. International Parallel Processing Symposium (1992), pp. 589—596. LIU, J., AND HSU, W.-J. Distributed algorithms for shortest-path, deadlock- free routing and broadcast in Fibonacci cubes. In Proc. 11‘” Phoenix Conf. on Computers and Communications (1992), pp. 23-30. LIU, J., AND HSU, W.-J. Optimal broadcasting and embedding in incomplete hypercube. Submitted for publication, May 1992. LIU, J., HSU, W.—J., AND CHUNG, M. J. Generalized Fibonacci cubes are mostly hamiltonian. Submitted for publication, May 1992. 121 [84] LIU, J ., AND HUGHES, H. D. Performance studies of hypercube multiprocessor systems. In Proc. Summer Computer Simulation (1990), Society for Computer Simulation. [85] LIU, J ., AND HUGHES, H. D. Dynamic job scheduling in partitionable mesh connected multicomputer systems. In Proc. Summer Computer Simulation (1992), Society for Computer Simulation. To appear. [86] LUBECH, O. M., AND FABER, V. Modeling the performance of hypercubes: A case study using the particle-in-cell application. Parallel Computing .9 (1988), 37-52. [87] MA, Y.-W. E., AND TAO, L. Embeddings among toruses and meshes. In Proc. Intn’l Conf. on Parallel Processing (1987), pp. 178-187. [88] MILLER, G. L., AND REIF, J. H. Parallel tree contraction and its application. In 26th Symposium on Foundations of Computer Science (1985), pp. 478-489. [89] MILLER, G. L., AND REIF, J. H. Parallel tree contraction Part I: fundamen— tals. In Advances in Computing Research, S. Micali, Ed., vol. 5. JAI Press Inc., 1989, pp. 47—72. [90] MILLER, G. L., AND REIF, J. H. Parallel tree contraction Part II: further applications. SIAM J. of Comput. 20, 6 (Dec. 1991), 1128—1147. [91] MILLER, R., AND STOUT, Q. Geometric algorithms for digitized pictures on a mesh-connected computer. IEEE Trans. on PAMI 7, 2 (1985), 216-228. [92] NASSIMI, D., AND SAHNI, S. An optimal routing algorithm for mesh-connected parallel computers. Journal of the ACM 27, 1 (Jan. 1980), 6—29. [93] NASSIMI, D., AND SAHNI, S. Data broadcasting in SIMD computers. IEEE Trans. Computers 30, 2 (Feb. 1981), 101-107. [94] PELEG, D., AND ULLMAN, J. D. An Optimal synchronizer for the hypercube. SIAM J. of Comput. 18, 4 (Aug. 1989), 740—747. [95] PETERSON, J. C., TUAZON, J. O., LIEBERMAN, D., AND PNIEL, M. The Mark III hypercube-ensemble concurrent computer. In Proc. Intn’l Conf. on Parallel Processing (1985), pp. 71-75. 122 [96] PREPARATA, F. P., AND VUILLEMIN, J. The cube-connected cycles: A versa- tile network for parallel computation. Communications of the ACM 24, 5 (May 1981), 300-309. [97] QUINN, M. J. Designing Eflicient Algorithms for Parallel Computers. McGraw Hill, 1987. - [98] RANKA, 8., AND SAHNI, S. Image template matching on MIMD hypercube multicomputers. In Proc. Intn’l Conf. on Parallel Processing (1988), pp. 92-99. [99] RANKA, 5., AND SAHNI, S. Hypercube algorithms : with Applications to Image Processing and Pattern Recognition. Springer-Verlag, 1990. [100] SAAD, Y., AND SCHULTZ, M. H. Data communication in hypercube. Tech. Rep. YALEU/DCS/RR-428, Dept. of Computer Science, Yale Univ., 1985. [101] SAAD, Y., AND SCHULTZ, M. H. Topological properties of the hypercubes. IEEE Trans. Computers C-37, 7 (July 1988), 867-872. [102] SCHWABE, E. J. On the computational equivalence of hypercube-derived net— works. In ACM 2"“ Symposium on Parallel Algorithms and Architectures (1990), ACM, pp. 388-397. [103] SEITZ, C. L. The cosmic cube. Communications of the ACM 28, 1 (1985), 22-33. [104] SHAPIRO, S. L., AND TEUKOLSKY, S. A. Building black holes, gravitational waves, and relativistic fluid flow: Supercomputer cinema. In Proc. of Super— computing Conference (1990), pp. 805—814. [105] STONE, H. S. Parallel processing with the perfect shuffle. IEEE Trans. Com- puters C-20, 2 (Feb. 1971), 153-161. [106] SULLIVAN, H., AND BASHKOW, T. R. A large scale homogeneous, full dis- tributed parallel machine, I. In Proc. 4th Symp. Comput. Architecture (1977), pp. 105-117. [107] TIEN, J.—Y., HO, C.—T., AND YANG, W.—P. Broadcasting on incomplete hypercubes. Tech. Rep. RJ 8130, IBM Corp., 1991. [108] TUTTE, W. T. Graph Theory. Addison-Wesley Publishing CO., 1984. . [109] TZENG, N.-F. Structural properties of incomplete hypercube computers. In Proc. Intn’l Conf. on Distributed Computing Systems (1990), pp. 262-269. 123 [110] TZENG, N.-F., CHEN, H.-L., AND CHUANG, P.-J. Embeddings in incomplete hypercubes. In Proc. Intn’l Conf. on Parallel Processing (1990), pp. III-335- 111-339. [111] TZENG, N. -F. ,AND WEI, S. Enhanced hypercubes. IEEE Trans. Computers 040, 3 (Mar. 1991), 284—294. [112] WAGNER, A. Embedding arbitrary binary trees in a hypercube. Journal of Parallel and Distributed Computing 7 (1989), 503-520. [113] WU, A. Y. Embedding of tree networks into hypercubes. Journal of Parallel and Distributed Computing 2 (1985), 238-249. [114] YANG, G.-C. Parallelizing spice2 on shared-memory multiprocessors. In Proc. Intn’l Conf. on Parallel Processing (1991), pp. III—151—III-158. STATE UNIV LIBRRRIE llll|l[ll|||2|]lll]|3 ollllle I] [I] III ll