LIBRARY
Michigan State University

This is to certify that the dissertation entitled "On Some Topological Issues in Message-Passing Multiprocessor Architectures," presented by Guy Warren Zimmerman, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Computer Science.

MSU is an Affirmative Action/Equal Opportunity Institution.

ON SOME TOPOLOGICAL ISSUES IN MESSAGE-PASSING MULTIPROCESSOR ARCHITECTURES

By

Guy Warren Zimmerman

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science

1990

ABSTRACT

ON SOME TOPOLOGICAL ISSUES IN MESSAGE-PASSING MULTIPROCESSOR ARCHITECTURES

By

Guy Warren Zimmerman

Advances in computer technology have made it possible to construct machines with very large numbers of processors. Among such machines are the distributed-memory machines, termed Multicomputers, which have the potential to deliver very high performance for a large class of applications. A major component in the design of a Multicomputer system is its underlying interconnection topology (usually modeled by a graph), since it has a significant influence on virtually every other aspect of the system, such as communication capabilities and reliability.

This dissertation considers some aspects of communication and reliability in a Multicomputer system which are related to the underlying topology of the system. In particular, a type of communication known as broadcasting is studied. The roles of trees in completing broadcasting, and a near-optimal broadcasting algorithm for a specific Multicomputer system known as the De Bruijn networks, are explained. Other partial results, such as the distribution of the broadcast times of trees of different orders, are given.
The reliability, or fault-tolerance, issues discussed here are of a topological nature. Specifically, a new approach to system-level fault-tolerance in Multicomputers is presented. In this approach a bound is placed on the maximum number of connections allowed at each processor, and the number and configuration of redundant components is then determined to achieve a specified level of fault-tolerance. The approach is applied to the design of fault-tolerant ring topologies. Designs are presented which can tolerate up to three processor failures.

To my wife, Janet

ACKNOWLEDGMENTS

I would like to thank my advisor, Abdol-Hossein Esfahanian, for all of his assistance during my Ph.D. studies, for his well chosen words of encouragement, and for his confidence in me. I would also like to thank Dr. Lionel Ni, Dr. Anthony Wojcik, and Dr. Bruce Sagan for their service on my committee and for their helpful comments and suggestions regarding the preparation of this dissertation. I would like to thank my wife, Janet Ergo Zimmerman, without whose support and sacrifice I could not have completed my Doctoral work. I would like to thank my parents, Jack and Jane Zimmerman, for instilling in me the importance of learning and for giving me the opportunity to pursue a college education. Finally, I wish to thank Jack and Carol Ergo and my grandparents, Mable and Harry Raymond, for their years of support and encouragement.

TABLE OF CONTENTS

List of Tables
List of Figures
1. Introduction
1.1 Multiprocessors and Multicomputers
1.2 Motivation and Problem Statement
1.3 Thesis Organization
2. Graphs and GMP
2.1 Graph Definitions, Notation, and Terminology
2.2 GMP - A software package for graph manipulation
2.2.1 An overview of GMP
2.2.2 Utilization of GMP
3. Broadcasting in Multicomputers
3.1 Background
3.2 Broadcasting in General Graphs
3.3 Broadcasting in Trees
3.3.1 Minimum broadcast trees
3.3.2 Characteristics of general minimum broadcast trees
3.3.3 Broadcast times for general trees
3.4 Broadcasting in Binary DeBruijn Graphs
3.4.1 Binary DeBruijn graphs
3.4.2 A distributed broadcast algorithm for BDG(n)
4. Fault-Tolerant Loop Architectures
4.1 Background
4.2 Chordal Rings as k-ft Cycle Topologies
4.2.1 Chordal rings
4.2.2 Fault tolerance of CR(N,w)
4.2.3 Comparison with previous results
5. Conclusion
5.1 Conclusions and Summary of Research Contributions
5.2 Suggestions for Future Study
Bibliography

LIST OF TABLES

Table 3-1. The values of M(t, Δ)
Table 3-2. Distribution of broadcast times for all trees T, 4 ≤ |T| ≤ 28
Table 3-2 (Cont'd)

LIST OF FIGURES

Figure 2-1. Figure 2-2. Figure 2-3. Figure 2-4. Figure 2-5. Figure 2-6. Figure 3-1. Figure 3-2. Figure 3-3. Figure 3-4. Figure 3-5. Figure 3-6. Figure 3-7. Figure 3-8. Figure 4-1. Figure 4-2. Figure 4-3. Figure 4-4. Figure 4-5. Figure 4-6. Figure 4-7. Figure 4-8. Figure 4-9.
Figure 4-10. F = {0,1,2,2i−1,2i,2i+1}, N−2i ≡ 2 mod 4
Figure 4-11.
Figure 4-12. F = {0,1,2,N−16}
Figure 4-13.
Figure 4-14. F = {0,1,2,N−8}
Figure 4-15. F = {0,1,2,N−4}
Figure 4-16. F = {0,3,6,7,8}
Figure 4-17. F = {0,1,2,3,9}
Figure 4-18.
Figure 4-19. F = {0,3,12,13,14}
Figure 4-20. F = {0,3,10,2i−1,2i,2i+1}, N−2i ≡ 2 mod 4
Figure 4-21.
Figure 4-22. F = {0,4,2i−1,2i,2i+1}, N−2i ≡ 2 mod 4
Figure 4-23.
Figure 2-1. The three principal GMP windows
Figure 2-2. The Std.
Graphs menu The Miscellaneous menu - The Algorithms menu The Find Cycles algorithm is invoked on a Chordal Ring The Goodies Menu Two example broadcasts in a graph Two examples of MBTs Three rooted MBTs and their common underlying free tree ................... The unique trees Mam“), for I: =- 2,3,4 MBT (2‘)is constructed from two c0pies of MBT(2’) An MBT not having MBT(23) as a subgraph The graph 806(4) - An example broadcast in BBC (4) using Algorithm 1 The Chordal Ring CR(20.3) A 16-cycle in CR (20.3) - [0,1] A Hamiltonian cycle in CR(20,3) An 18-cycle in CR (203) - {0.3} An induction-by-two illustration An induction-by-four illustration An induction-by-six, version A illustration An induction-by-six, version B illustration . An advance-by-four illustration 1. F I: {0,12,2i-l,2i.2i+1] , N-2r' I0 mod 4 3. F :3 {0.1.2,N-12} 8. F a: {0,3,4,10,ll} 1. F = [0,3,10,2i—l,2i.2i+l},N-2i IOmod 4 - 3. F = [0,4,2i-l,2i,2i+l],N-2¢° n0mod4 - ‘wvvvwv- 16 17 18 19 21 25 30 3 l 32 34 36 48 57 59 61 61 65 67 68 69 7 1 71 72 72 73 75 76 77 78 79 80 80 81 Figure 4-24. Figure 4-25. Figure 4-26. Figure 4-27. Figure 428. Figure 4-29. Figure 4-30. Figure 4-31. Figure 4-32. Figure 4-33. Figure 4-34. Figure 4-35. Figure 4-36. Figrn'e 437. Figure 4-38. Figure 4.39 Figure 4-40 Figure 4-41. Figure 4-42. Figure 4-43. Figure 4-44. Figure 4-45. Figure 446. Figure 4-47. Figure 4—48. Figure 4-49. Figure 4-50 F ={0,4.N-8.N-7] . - - F a [094N49N‘5] F I: [0,4,N-4,N-3} - F = {05.10.11} - F = [0.5.12] F 3 {0.5.13} F a: [0,5,14,15,16] F = [0,5,2i-1,2i,2i+1},N-2i 32mm! 4 F a: {0,5,2i-l,2i,2i+l},N-2i 10mm! 4 F s [0.6.7,12,13] F s {0,63,13,14} F a (0.6.7.8,15] F '10.6.7.N-7l F a [0,6,7,N—8} F :- {0.6,7.N-9} - . F s [0.6.7.41' ,4i +1] . F = [0.6.7,4i 4241' +3} F = {0.6.7,N-10] - F :3 [0,73,15,16] F a {0.7.8.N-9} F = [0.7.8.N—9N-10] F a [0,7,8,N—ll] F = {0.7.8.2j +1} F = [0.7.8.20] F = [0.7.8.N-2j} F = {0.2} +1.1] F ={02j21t} viii 82 83 85 86 87 88 89 89 91 92 93 94 95 96 98 100 101 101 102 103 104 105 106 Chapter 1 Introduction Advances in computer technology have made it possible to construct machines with very large numbers of processors. Such machines have the potential to deliver very high performance for a large class of applications. In fact, the development of multiple- processor computers has made it possible to sharply reduce the amount of time required to solve large scale problems and as a consequence, to greatly increase the scale of prob- lems which can be solved in a reasonable amount of time. These problems are most often of a scientific nature, are very CPU intensive, and typically require large amounts of memory. Some examples of such problems include: aerodynamic simulations, weather forecasting, petroleum exploration, image processing, and artificial intelligence [HwBr84]. 1.1. Multiprocessors and Multicomputers At the highest level, the major components of the architecture of a multiprocessor system are the processors, memory unit(s) and the interconnection network through which the processors communicate with each other and access the memory unit(s). 2 Interconnection networks can be classified in two categories: static networks and dynamic networks [Feng81]. The difierence is that the physical connections may change in a dynamic network, while they always remain fixed in a static network. Examples of static networks include the ring, tree, 2-D mesh and hypercube. Dynamic networks include single-stage and multistage structures. 
Single-stage is also called recirculating network, which is composed of a stage of switch boxes cascaded to a link connection pat- tern. A typical multistage interconnection network in a multiprocessor system with N processors consists of logkN stages with N /k k X]: switch boxes at each stage, where 2 S k S 8. A multistage network can usually connect an arbitrary input to an arbitrary output. Examples of multistage networks include Omega (shufile-exchange) and Delta networks. Dynamic networks ofl’er greater flexibility at the cost of additional hardware and the overhead of managing the interconnection network [Agra83, BhAg83]. Over the years, two broad categories of multiple-processor computer systems have emerged, based largely on the degree of autonomy of the processors and the way the memory is organized and controlled. The first of these classes of systems is referred to as SIMD, an acronym for Single Instruction Multiple Data. SIMD machines are character- ized by the fact that at any given instant, every enabled processor in the system is execut- ing the same instruction as all the other processors, although each may be processing difl’erent data. For many applications with inherent massive data parallelism, SIMD machines can provide high performance at a reasonable cost. Several successful SIMD machines have been developed, including MPP, ICL’s DAP, and the Connection Machine [Flan77, Batc80, I-Iill85]. One of the best features of these machines is their case of programming. This feature is also the principal drawback in that the ease of pro- gramming is made possible by the limited range of applications which can be efi‘ectively executed. The other major category of multiple-processor machines is the MIMD class; MIMD being an acronym for Multiple Instruction Multiple Data. As the name indicates, at any given time each processor in the system may be executing a difierent instruction, and is typically processing difi‘erent data. These machines are a more general class of multiple-processor computer systems, since they may simulate the operation of SIMD machines. MIMD machines are further categorized by the way the memory is organized and managed. A shared-memory machine has a single global memory which is usually equally accessible to all its processors. Since having a single memory can lead to memory contention problems, the memory is often physically organized into modules and the processors access the modules through a communication network, most fre- quently multistage. Even so, the possibility for contention for access to specific modules can still be a problem. Processors communicate with one another by accessing pre- arranged locations in the shared-memory. Examples of shared memory machines include the BBN Butterfly, the NYU Ultracomputer, the IBM RP3 , the Sequent Balance/Symmetry, the Encore Multimax and the Cray X-MP/Y-MP [Gott83, Lars84, Crow85, PfisSS, Sten88, RCCI‘90]. The chief advantage of shared memory machines is that they allow the programmer the impression of a single address space. The major disadvantages are that the machines do not scale well and contention for memory access can lead to so-called "hot—spots" which can degrade performance. The second type of MIMD machine is referred to as a distributed-memory machine. Here each processor in the system has its own local memory (which it controls) and the processors are connected by a communication network (most frequently static). The pro- grammer does not have the advantages of a single global address space. 
Processors com- municate by passing messages through the communication network, and a message may have to go through several intermediate processors before reaching its destination. For this reason, these systems have also been referred to as message-passing or point-to-point systems. A number of research and development projects have been undertaken to con- struct such machines [Seit85, I-IMSC86, SAFM88]. Among the existing systems, the most dominant one is the hypercube which is commercially available in different variations including the Mark III, the iPSC-l, and the Ncube [PTLP85, Gui-1886, GrRe86]. In contrast to shared-memory machines, message passing architectures gen- erally scale better and are more robust. However, they are more diflicult to program efiectively; and since communication is usually slower, they are better suited to problems where the communication/computation ratio is relatively small. This dissertation con- cerns several problems related to this type of machine, which we will refer to as a Multi- computer (MC). 1.2. Motivation and Problem Statement There are many aspects to the design of an MC. As in the development of every computer there is the broad hardware/ software dichotomy; and within each of these two areas, one may consider many difl‘erent levels of design. Here we will be concerned with system-level issues. System-level means the consideration of the MC in a macroscopic fashion. In terms of software, the issues are the development of operating systems, com- pilers, programming tools, etc., all of which must address the MC as a complete entity. In the hardware area, system-level means viewing the MC as a collection of processor/memory-pairs connected by communication links. These two areas are highly interdependent, and a successful design must consider both areas as they relate to each other and to the design goals. In terms of hardware, at the system-level the selection of the topology of the inter- connection network is fundamental since it has a significant influence on virtually every other aspect of the MC including communication capabilities and system reliability. In a certain sense, the topology is the MC. Communication characteristics such as message delay and routing are directly related to the underlying topology [FaKr83]. Many research articles have focused on the relationship between the topology and one or more of these characteristics. As a result, many topologies for MCs have been proposed including cube-connected cycles, binary tree, 2D-mesh, binary DeBruijn networks, and k-ary n-cube [Prad81, PrVu81, HsYZS7, HwGh87, Dun88, ChCD88, Dall90]. This dissertation will examine how the topology of an MC influences its communication capa- bilities and reliability. Since MCs rely on passing messages, the performance of an MC is significantly tied to the types of communication that can be accomplished, and how quickly and efliciently they can be done. Experience has shown that several communication paradigms are use- ful in developing algorithms for MCs. These paradigms include: one-to-one where a pro- cessor wishes to send a message to another specific processor; one-to-all where a proces- sor wishes to send a message to all the other processors in the system; and finally one-to- many, which is a generalization of the previous two. These classes have also been referred to as unicast, broadcast and multicast, respectively. 
Some of the issues in implementing these communication paradigms are: the number of messages that a pro- cessor may simultaneously send or receive, the type of routing strategy (e.g., static or dynamic), and the type of switching strategy (e.g. packet switched. circuit switched, wormhole, virtual cut/through) [KeKl‘l9, DaSe87]. The performance of each of these communication schemes is afiected by the topology. In this dissertation we consider the problem of broadcasting in MCs and we assume that each processor may send or receive only one message at a time. The large number of components (such as processors and memories) and the overall complexity of such systems serve to exacerbate reliability problems, and much researeh has been done in studying reliability issues in MCs [KuRe80, PrRe82]. Several types of reliability have been distinguished based principally on the nature of the applications the MC is to execute. A highly available system may have frequent failures, but the failures are such that the system can be restored to proper functionality in a short time; these sys- tems are "up" most of the time. The Electronic Switching System (ESS) used by AT&T is designed for high availability [Prad86]. An ultra reliable system is one which has very few failures over a specified time interval. This type of reliability is demanded by applications where failures can have catastrophic results. Examples of such applications include operation of aircraft and medical monitoring equipment. One approach to developing reliable systems is to use ultra-reliable components in the construction. However, this can be an expensive proposition and does not always adequately address the problem. Another alternative is the so-called fault avoidance approach in which the system is designed to avoid faults from occurring. The approach we will consider in this dissertation is referred to as fault-tolerance. As the name implies, the idea is to design systems in which faults may be tolerated, and a system will be said to be fault-tolerant if it can remain functional in the presence of failures. At the system level, two types of failures are considered: the failure of a set of processors, and the failure of a set of communication links. What constitutes a functional system depends largely on the type of applications that the system will execute. Two basic functionality criterion have received much attention. According to one of these, an MC is considered to be functional as long as there is a non-faulty communi- cation path between all pairs of non-faulty processors. In graph theoretic terms, the underlying topology is connected. This type of functionality is applicable to the so- called coarse grain application, in which the algorithms are relatively insensitive to topology. Typically, each processor is made responsible for executing some subset of the tasks required to solve some (large) problem; in the event of a failure, the tasks are assigned to other processors. This is the so called fault-tolerance through degraded per- formance approach [ArLe81, Esfa88, RaGA85]. The second functionality criterion considers an MC to be functional only when a desired topology is contained in the system. In the past few years, much work has been done in developing eflicient, high performance parallel algorithms for various MCs. For many of these algorithms, the existence of certain topologies is a significant factor in delivering the desired performance. 
For such applications, the system should be able to provide a specific topology throughout the execution of the algorithm. This criterion was first formulated by Hayes, and subsequently a number of fault-tolerant (in the sense of the second criterion) topologies have been proposed [Haye76]. The basic approach in this case is to employ redundant components (links or processors) throughout the system. In the event of a failure, the system could then be reconfigured to exclude the faulty com- ponent by using one of the redundant ones. In this dissertation, fault tolerance will refer to the second functionality criterion, and we will consider only processor failures. Broadcasting and fault-tolerance have much in common in that they are both inti- mately tied to the topology of the MC. In addition, broadcasting can be an integral part of the reconfiguration process for a fault-tolerant system. Also, a broadcasting scheme for a fault-tolerant system must be robust enough to accommodate faulty system com- ponents. A number of papers have addressed problems common to these two areas [Bien88, LeHa88, Lie588]. 1.3. Thesis Organization As indicated, this dissertation will focus on two problems both of which are related to the topology of the interconnection network. Such a topology is conveniently modeled as a graph where the nodes of the graph represent the processors and the edges represent the communication links. Many problems in Computer Science and Engineering can be so formulated. However, it is often the case that such models are only practical for "small" instances of problems, since even modestly sized graphs can become incredibly complicated and unwieldy to work with. In addition, many of the operations one wishes to perform on such models are of a trial and error genre, and such operations become increasingly labor-intensive for all but the smallest graphs. To address this problem, we have developed a software package for graph manipulation which has been instrumental in obtaining some of the results in this dissertation and other research as well. This pack- age is described in Chapter 2, where we also present graph theoretic definitions and ter- minology which will be used throughout the dissertation. Broadcasting in MCs is discussed in Chapter 3. In particular, we examine the role that trees play in the broadcasting process. A partial characterization of a class of graphs called minimum broadcast trees is given. Finally, we present a distributed algorithm for doing broadcasting in networks based on Binary DeBruijn graphs. Fault-tolerance in MCs is the subject of Chapter 4. In this chapter we introduce a new approach to system-level fault-tolerance. The approach is applied to the case when the topology of the MC is a ring. For this case, a class of graphs called Chordal Rings is shown to be an Optimal solution. In the final chapter we present our conclusions, summarize our research contribu- tions and suggest some directions for future study. Chapter 2 Graphs and GMP We will use graphs to model the topology of an MC. The vertices of the graph represent processors and the edges of the graph represent the communication links. With this model in mind, we will use the terms vertex, node, and processor interchangeably and similarly for the terms edge and link. In this chapter, we present our mph definitions, notation and terminology which are common to both of the two problem areas that we consider in subsequent chapters. 
Definitions and terms which are specific to a problem area will be introduced in the context in which they are needed. We also describe a software graph manipulation package which we have developed and which has been useful in our research.

2.1. Graph Definitions, Notation, and Terminology

In this section we present our graph theoretic definitions and notation. Graph theoretic terms not defined here can be found in [Hara72]. Let G(V,E) be a finite graph without loops or multiple edges, with the vertex set V = V(G) and the edge set E = E(G). The order (size) of G(V,E) is equal to the cardinality of the vertex set (edge set), and is denoted by |V(G)| (|E(G)|) or simply |V| (|E|) when the context is clear. If an edge e = (u,v) ∈ E, then vertices u and v are said to be adjacent, and the edge e is said to be incident to these vertices. For a vertex v ∈ V, I(G:v) represents the set of all edges incident to v in G. Two vertices connected by an edge will be referred to as neighbors. The degree of a vertex v is d(v) = |I(G:v)|. The degree sequence of G is a non-decreasing list of the degrees of all the vertices in G. The minimum and maximum degrees of a graph G are δ(G) = min{d(v) | v ∈ V} and Δ(G) = max{d(v) | v ∈ V}, respectively. A graph G is r-regular if for all v ∈ V, d(v) = r. A graph is a cubic graph if it is 3-regular.

A graph H which has all of its vertices and edges in G is a subgraph of G, denoted H ⊆ G. If G is isomorphic to a subgraph of H, we say G may be embedded in H. If H is a subgraph of G, then G is a supergraph of H. A spanning subgraph of G is a subgraph containing all the vertices in G. For a set F ⊆ V(G), the notation G − F represents the subgraph of G obtained by removing from G all the vertices in F along with their incident edges.

A path P is a sequence of distinct vertices v0, v1, ..., vn−1, vn where (vi, vi+1) ∈ E(G), 0 ≤ i ≤ n−1. The length of the path is the number of edges in the path. A graph is connected if every pair of vertices is joined by a path. A cycle is a sequence of vertices v0, v1, ..., vn−1, vn where (vi, vi+1) ∈ E(G), 0 ≤ i ≤ n−1, n ≥ 3, v0 = vn, and all the other vertices are distinct. The cycle of order N, also called an N-cycle, will be denoted CN. When convenient, a cycle will be specified by listing, in order, the vertices in the cycle. For example, x0, x1, x2, x3, x0 defines a 4-cycle. A Hamiltonian graph contains a cycle passing through all its vertices. The distance between two vertices u and v, dist(u,v), is defined as the length of a shortest path joining these vertices. The diameter of a graph G, D(G), is the largest value of dist(u,v) in G.

A connected acyclic graph is also called a tree. The vertices of degree one in a tree T(V,E) are called the leaves of T. A rooted tree is one in which a specific node is labeled as the root. When no such designation is made, the tree is said to be free. The level of a node in a rooted tree is the length of the unique path from the root to the node. The height of the tree is the number of levels, or the length of the longest path from the root to any node in the tree. A spanning subgraph which is also a tree is called a spanning tree. G is a bipartite graph if V(G) can be partitioned into two disjoint subsets V1 and V2 such that every edge of G joins a vertex u ∈ V1 with a vertex v ∈ V2. In such a case G may be denoted G(V1,V2,E).

2.2. GMP - A Software Package for Graph Manipulation

In this section we give a brief description of the software package for graph manipulation that we developed.
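Before turning to the package itself, the definitions of Section 2.1 can be made concrete with a small amount of code. The following C sketch is a minimal illustration only (it is not the representation used inside GMP, whose internals are not shown here): it stores a graph of at most 100 vertices as an adjacency matrix, computes vertex degrees, and obtains all-pairs distances by breadth-first search, from which the diameter D(G) follows.

```c
#include <stdio.h>

#define MAXV 100            /* GMP also limits graphs to 100 vertices */
#define INF  (-1)           /* marks "no path" */

static int n;                /* order |V| */
static int adj[MAXV][MAXV];  /* adjacency matrix: adj[u][v] = 1 iff (u,v) in E */

/* degree d(v) = number of edges incident to v */
int degree(int v) {
    int d = 0;
    for (int u = 0; u < n; u++)
        d += adj[v][u];
    return d;
}

/* dist[u] = length of a shortest path from s to u (INF if unreachable) */
void bfs(int s, int dist[]) {
    int queue[MAXV], head = 0, tail = 0;
    for (int u = 0; u < n; u++) dist[u] = INF;
    dist[s] = 0;
    queue[tail++] = s;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < n; v++)
            if (adj[u][v] && dist[v] == INF) {
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
    }
}

/* diameter D(G) = largest dist(u,v) over all pairs; assumes G is connected */
int diameter(void) {
    int dist[MAXV], d = 0;
    for (int s = 0; s < n; s++) {
        bfs(s, dist);
        for (int u = 0; u < n; u++)
            if (dist[u] > d) d = dist[u];
    }
    return d;
}

int main(void) {
    /* the 4-cycle x0, x1, x2, x3, x0 from Section 2.1 */
    n = 4;
    int edges[4][2] = { {0,1}, {1,2}, {2,3}, {3,0} };
    for (int i = 0; i < 4; i++) {
        adj[edges[i][0]][edges[i][1]] = 1;
        adj[edges[i][1]][edges[i][0]] = 1;
    }
    printf("d(x0) = %d, D(G) = %d\n", degree(0), diameter()); /* prints 2, 2 */
    return 0;
}
```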
A more detailed description, including a user’s manual, pro- mmmers manual, as well as the source code can be found in [ZiEs88]. We will describe the functionality of the package, its utilization to date and some suggestions for improve- ment. Graph theory has long become recognized as one of the more useful research tools in Computer Science and Engineering. Extensive use of mph theory has been made in areas such as topological design of networks, complexity theory, and design and analysis of algorithms. This dissertation considers two such problems. In the course of our work in mph theory, we had increasingly noticed the need for a software package which would enable the user to interactively manipulate graphs, test conjectures, etc. No such package was then available, partially due to the unavailability of afl'ordable mphics workstations. In the summer of 1986, when such workstations became accessible to us, we undertook a project to develop such a software package. We dubbed it GMP, an acronym for Graph Manipulation Package. The package was written in C and currently runs in the SunView environment on SUN workstations. The first version of GMP became available in the winter of 1987. It is an interactive program which allows users to visually manipulate mphs with up to 100 vertices. There 12 are essentially two major facets of the program. First, GMP allows the user to construct and modify mphs on the screen. This construction can be done manually using a mouse. Further, a menu of standard mphs is available, or the user can create a graph of-line and then lead it into GMP. Second, GMP allows the user to investigate the interaction of mph algorithms and mphs. A number of classic algorithms are built into GMP, and the program has been designed to allow users to easily incorporate new algorithms into the system. Facilities exist for storage and retrieval of mphs, as well as for obtaining publi- cation quality hardcopies of created mphs. In developing GMP, our initial intent was to produce an exploratory tool rather than a results-oriented, computational promm ala LINPACK, etc. We wanted something which would allow us to easily test conjectures and investigate ideas. This orientation had important consequences on the overall design. One such consequence was the deci- sion to limit the maximum size of mphs to 100 vertices. While the specific number was somewhat arbitrary, it seemed that any number much higher would have decreased the user’s ability to get a good understanding of what was going on. Also, our experience was that any size monitor gets crowded with more than 100 vertices. Our design goal also affected various aspects of the implementation. For example, data structures were selected for their simplicity and universality, rather than economy of memory, etc. Further, in adding an algorithm to the package, we generally chose a version which was easier to implement, rather than one with the best performance. This was in keeping with the above philosophy, and it was thought that for ”small" mphs, the decreased perfor- mance was probably minimal. 2.2.1. An overview of GMP The GMP user interface consists of three main windows: the message window, the control window, and the canvas window (Figure 2—1). Virtually all user input to GMP is done via the mouse, although the keyboard is required for some functions. The rummage l3 window is the long horizontal window at the top of the screen. One purpose of this win- dow is to provide feedback and instructions to users. 
This window also contains five pulldown menus which are accessed by clicking the RIGHT mouse button over the menu title (Figures 2-2 - 2-5). Each of these menus is described in more detail later. The long vertical window on the right is the control window. From this window the user can ini- tiate file system commands to store and retrieve mphs and select which manipulation command is to be active when the cursor is in the canvas window. The large square win- dow is the canvas window. This is the window in which the mph is drawn and manipu- lated. Most user feedback occurs through this window. As stated earlier, GMP has two major overlapping functions: 1) creating and mani- pulating mphs, and 2) invoking algorithms which act on the created mphs. Graphs can be created in several ways. Some standard mph definitions are available in the Std Graphs pulldown menu in the message panel. Selecting one of these options will create a mph of the specified type using the default number of vertices. In Figure 2-2, the Std. Graphs menu is shown with the Binary Tree option highlighted. The mph which is gen- erated by this command is shown in the canvas window. Note also the default parame- ters window, in which the parameter number of vertices is set at 20. This indicates the order of the mph that will be created by the Std. Graphs Menu. Graphs can be created manually using one of the manipulation commands selected from the control window. For example, select Add Vertex, move the cursor into the canvas and click where you would like a new vertex to be placed. In addition, you can type an adjacency matrix into a file, add an appropriate GMP header and then lead this mph into GMP. The vertices and edges of mphs created using one of these three methods all have default values for their parameters: edge weight, node weight, edge type, node type, etc. The default values for each may be set by the user, and the user may also manually set the value of any parameter for a particular vertex and/or edge. 14 The Miscellaneous menu (see Figure 2-3), as the name implies, contains a number of commands for various purposes. Among these are: displaying/hiding mph labels, manipulating the weights associated with the mph vertices/edges, and altering the lay- out of the mph. Figure 2-3 shows the main miscellaneous menu along with the Weight Options submenu. The Show Edge Weights option is highlighted and the results of this command are seen in the canvas: all edges in the mph shown have weight equal to 1.0. Users invoke algorithms on created mphs via the Algorithms menu (see Figure 2- 4). The Shortest Path algorithm is highlighted and the results of the command are displayed in the canvas. In this example, a shortest path from a source vertex (black— ened) to all the other vertices in the mph is shown by the ”double" edges. In addition, the distance from the source to each vertex is displayed next to the vertex. The figure also shows an example of another mph type available in the Std. Graphs menu: 2D- Mesh. A second example of the results of an algorithm is seen in Figure 2-5. The Find cycles algorithm was invoked and the user requested a search for a cycle of order 26. Such a cycle was found and displayed in the canvas using double edges. As mentioned, the package was designed to allow users to easily add new algo- rithms. A programmer’s manual is available containing prommming guidelines, exam- ples and a description of user-accessible utility procedures. 
User implemented algorithms are accessed through the User Algs menu. This menu is user-definable and may contain an unlimited number of user algorithms. Finally, the Goodies menu (Figure 2-6) allows the user to perform a variety of operations which alter the visual appearance of the graph.

2.2.2. Utilization of GMP

Since its inception, GMP has been used in conjunction with graph theory related courses in both the Mathematics and Computer Science departments at Michigan State University. In particular, in the Computer Science course entitled Analysis of Graph Algorithms, students have used the package in a discovery mode to investigate basic concepts in graph theory. The interactive and visual features have been extremely motivating in this context. The package also provides a framework from which one can develop and test new graph algorithms, as well as to study existing ones. Students have written and/or coded algorithms as projects for this class.

In addition to its instructional use, GMP has also been a valuable research tool. Briefly, it automates many of the mundane operations that a graph theorist would normally do by hand. This automation allows many conjectures to be easily tested and refined, a task which is often too onerous to be done manually. The package has been instrumental in obtaining results reflected in recent publications. In particular, GMP has expedited the formulation of a new approach to system-level fault-tolerance and communication paradigms for multicomputers [EsZi89, ZiEs90]. In addition, some of the proofs in Chapter 4 of this dissertation would not have been possible without it. It also served as a tool for producing all the graph related graphics in this dissertation and other recent publications.

The package has also been made available to researchers at other institutions including Western Michigan University, Grand Valley State University, George Washington University, Georgia Institute of Technology, Rutgers University, Boston University, University of Iowa, and Université Paris-Sud. The responses have all been positive. To date, GMP is unique in its category of software.

Figure 2-1. The three principal GMP windows.
Figure 2-2. The Std. Graphs menu.
Figure 2-3. The Miscellaneous menu.
Figure 2-4. The Algorithms menu.
Figure 2-5. The Find Cycles algorithm is invoked on a Chordal Ring.
Figure 2-6. The Goodies Menu.

Chapter 3

Broadcasting in Multicomputers

One type of information exchange that often arises in MCs and communication networks in general (e.g., computer networks, spy networks) is referred to as broadcasting. Here, one member of the network, called the originator, wishes to send a message to all the other members in the network, as rapidly as possible. The underlying topology of the network plays a significant role in how broadcasting should be done. In some networks, by default, every message transmitted is received by all members of the network, whereas in some others, a transmitted message is received by no more than one other member of the network. These networks are respectively referred to as broadcast networks and point-to-point networks [Tane88]. Here we are concerned with the problem of broadcasting in MultiComputer networks as defined in Chapter 1. However, since our focus is at the topological level, our discussion is also applicable to general point-to-point networks, and we hereafter refer simply to networks.

There are many applications of broadcasting in such networks. In general terms, broadcasting is necessary whenever some information must be relayed to all the other members of the network. Some specific examples include: synchronizing processors in a distributed network, reconfiguring processors to achieve a desired logical interconnection, diagnosing the state of a distributed system, and communicating results from parallel algorithm computations [Prad85].
We will examine the role that trees play in broadcasting and we present a disu'ibuted broadcasting algorithm for a particular class of networks based on Binary DeBruijn Graphs. 3.1. Background We first consider the broadcasting process in general. In this process, one node called the originator wishes to send a message to all the other nodes in the network. The originator begins the process by making a "call" to another node in the graph, informing it of the message. Subsequently, the informed nodes call their uninformed neighbors and the process continues until all nodes in the network are informed. It has been suggested [FHMP79] that each node be involved in at most one call during each time interval, although calls between distinct pairs of nodes may take place concurrently. We assume that all calls take the same amount of time; the time interval during which a call takes place will be referred to as a time step or simply step. Thus during the first step of the broadcasting process, the originator calls another node in the graph. At the end of the first step, two nodes are informed. Since there is no benefit gained from a node receiving the same message twice, we specify that each node (except the originator) be the receiver of exactly one call during the broadcast. 24 Several types of broadcasting have been distinguished in the literature [Farl80]. In local broadcasting a node may only call one of its neighbors. In line broadcasting a node may call any other node to which it is connected by a path; however, the edges used to make the call must not be used by any other call during that step. Path broadcasting is similar to line broadcasting, except that it is further required that no node is used in more than one call in any time step. The difierence between line and path broadcasting is that in the former, a node may allow several calls to pass through during any step, whereas in the latter each node may participate in at most one call per step. In both cases, however, only the starting node in a broadcast path need be informed; all the intermediate nodes in the path can act as a switch and relay the message without being aware of its contents. This assumption may not be very realistic; if a node is to relay a message it might as well examine its contents since, in this context, the same message will be sent to it eventually. Here, we will consider only local broadcasting and unless stated otherwise, we will use the terms broadcasting and local broadcasting interchangeably. The broadcast process is illustrated in Figure 3-1. The top graph represents the topology of a communication network. Broadcasts originating from nodes e and f are shown in the middle and bottom graphs respectively. The edges used in the broadcast are highlighted with arrows to indicate the sender/receiver and are labeled with the time step in which they are used. Unused edges are shown as dashed lines. Note that the broadcast was completed in five time steps in the middle graph; the lower graph used only four. It is clearly desirable to complete broadcasting in a minimum number of time steps. For a general graph with N nodes, the minimum amount of time to complete a broadcast is at least [log N 1 (all logarithms are in base 2). This result follows from the fact that the number of informed nodes can at most double after each step of the process and may be proved formally by a simple inductive argument [FHMP79]. 
Clearly, the "location" of the originator within the graph affects the amount of time needed; an originator which is "in the middle" of the graph may require fewer time steps than one on the "periphery".

Figure 3-1. Two example broadcasts in a graph.

For example, in Figure 3-1, broadcasting from node a will require more time than broadcasting from node e. In addition, the sequence in which an informed node informs its uninformed neighbors affects the number of time steps for a particular broadcast. In the middle graph of Figure 3-1, e sends messages to nodes b, a, and f, in that order. However, it is not difficult to see that the number of steps can be reduced to four by using the order f, b, a. These considerations motivate the following definitions.

Definition 3.1. The minimum time (in number of time steps) required to complete broadcasting in a graph G, when node v is the originator, is denoted bt(G:v). The minimum is taken over all possible orderings of calls.

Definition 3.2. Let bt(G) be the best possible broadcast time for a particular graph G. We have bt(G) = min{bt(G:v) | v ∈ V}. Note that bt(G) is not necessarily equal to ⌈log |V(G)|⌉.

Definition 3.3. Let BT(G) be the worst possible broadcast time for a particular graph G. We have BT(G) = max{bt(G:v) | v ∈ V}.

Definition 3.4. The broadcast center of a graph G, BC(G) = {v ∈ V | bt(G:v) = bt(G)}, is the set of all nodes which, when chosen as the originator, can achieve the best possible broadcast time for G. This set is clearly nonempty and may have more than one element.

Definition 3.5. A calling schedule for a broadcast is a set of statements of the form "node i calls node j during time step t". The calling schedule is a legal calling schedule if it satisfies the criteria [Farl79]:
i) During a given time step, each node may participate in at most one call.
ii) For each call scheduled for step t, the sender is either the originator or was called during a step s, s < t.

Consider the set SMBT(N) of minimum broadcast trees of order N, that is, trees T of order N for which broadcasting can be completed in ⌈log N⌉ time steps, and in particular the case N = 2^k (k > 0). These trees contain the maximum number of nodes that can be informed in k time steps. The trees in this class (also known as binomial trees) are, in a sense, completely defined with respect to many different properties, some of which are given below. We will utilize these properties in the characterization theorems for general MBTs in the next section.

Property 1. For each value of k > 0, the set SMBT(2^k) has exactly one element. That is, up to isomorphism, there is a unique minimum broadcast tree with 2^k nodes, and we will denote it by MBT(2^k). Figure 3-4 illustrates some examples of trees from this class. The labels indicate the time step in which each node becomes informed.

Figure 3-4. The unique trees MBT(2^k), for k = 2, 3, 4.

Property 2. MBT(2^k) can be constructed by combining two copies of MBT(2^(k-1)); the tree can be decomposed into two symmetric halves. The construction is depicted in Figure 3-5. Let u be an originator in one copy of MBT(2^(k-1)) and v be an originator in a second copy of MBT(2^(k-1)). A single edge is added between the originators, yielding a tree with exactly 2^k nodes. To see that this is also an MBT, choose u (or v, arbitrarily) to be the originator for the new tree. Initially, u calls v, using 1 time unit. Since each of u and v is the originator in their respective subtrees, the remaining nodes may be informed in k−1 steps. It is clear that all trees MBT(2^k) can be so constructed.

Property 3. The degree sequence of MBT(2^k) is completely determined and can be computed recursively.
In particular, there are two nodes of degree k: one of which must be the originator of the broadcast, and the other must be the first node to become informed in the broadcast process. And, for each d, 1 ≤ d ≤ k−1, there are 2^(k−d) nodes of degree d.

Property 4. There is exactly one calling schedule for MBT(2^k), once the originator is fixed.

Property 5. The number of nodes at any level l in MBT(2^k) is given by the binomial coefficient C(k, l), hence the name binomial tree.

Property 6. The number of levels in MBT(2^k) is exactly k.

Property 7. The number of leaves in MBT(2^k) is exactly 2^(k−1).

3.3.2. Characteristics of general minimum broadcast trees

We now present some results for general MBTs in the form of several theorems. These theorems represent some necessary (and sufficient) conditions for a tree to be a minimum broadcast tree. These results will serve as a first step in characterizing minimum broadcast graphs.

Theorem 3-1: A tree T(V,E) of order N is a member of SMBT(N) if and only if T is a subgraph of MBT(2^k), where 2^(k−1) < N ≤ 2^k.

Figure 3-5. MBT(2^4) is constructed from two copies of MBT(2^3).

Proof: (⇒) By definition, there is a legal calling schedule C which will complete the broadcast in T in k = ⌈log N⌉ time steps. Begin the broadcasting. Let step τ be the first step during which some node has no neighbors left to inform, and let S(τ) = {v ∈ V | v and all of its neighbors were informed before step τ}. For each node v ∈ S(τ), add a new node u_v to V and add a new edge to E connecting u_v and v. Inform all the newly created nodes. Continue this process during each subsequent step, through time step k. It is clear that in each step we inform as many nodes as possible (by adding new nodes where necessary, it is assured that the number of informed nodes must double) and that all nodes are informed in k steps. But MBT(2^k) is the only tree with these properties. Thus we have constructed MBT(2^k) by adding nodes and edges to T, and hence the tree T is a subgraph of MBT(2^k).

(⇐) Suppose T ⊆ MBT(2^k). By Property 4, there is a unique calling schedule C' for MBT(2^k). Now remove all elements of C' which refer to a node not in T. We thus obtain a legal calling schedule C, C ⊆ C', for T which completes the broadcast in k time steps. Thus T ∈ SMBT(N). ∎

Observe that the above proof implies that we can embed T in MBT(2^k) in such a way that the originator of T is embedded onto the originator of MBT(2^k). It should be noted that it is not necessarily the case that MBT(2^(k−1)) is a subgraph of an MBT on N vertices, where 2^(k−1) < N ≤ 2^k. This is illustrated in Figure 3-6, where an MBT with 9 nodes (right) is constructed by inserting a node (labeled c) into the tree MBT(2^3) (left). It is clear from the figure that the original tree (left) is not a subgraph of the new one, illustrating that an MBT need not have MBT(2^k), k > 2, as a subgraph.

Figure 3-6. An MBT not having MBT(2^3) as a subgraph.

Theorem 3-2: Let T(V,E) ∈ SMBT(N). Then Δ(T) ≤ ⌈log N⌉.

Proof: Let v ∈ V be a node whose degree d(v) > ⌈log N⌉. Suppose v is an originator of T. Since a tree is acyclic, v must inform each of its adjacent nodes, and this takes more than ⌈log N⌉ steps, contradicting the definition of T. Now suppose v is not an originator. It takes at least 1 time step to inform v, using one of the edges incident to v; v then has d(v)−1 uninformed neighbors. Since the only path to these nodes is through v, v must send the message to each of them in turn. This requires d(v)−1 calls.
Thus the time to complete the broadcast is at least d(v) > ⌈log N⌉, again violating minimality. Thus no such v exists. ∎

Note that the converse of the above theorem is not necessarily true; it is easy to construct trees with small maximum degree which are not MBTs. For example, consider a path P of order 2^k, k > 2. To be an MBT, it would have to be possible to complete a broadcast in P in t ≤ k time steps. It is clear that this cannot be done, the reason being that each vertex (other than the originator) may only inform one other vertex, and there are more vertices in the tree than can be informed in the required number of time steps.

An interesting related question is: how many nodes can be informed in a tree with maximum degree Δ after t time steps? In Lemma 3-1, a recurrence relation is established which relates the maximum number of nodes in a tree T which can be informed during time step τ to the maximum degree of the tree, Δ.

Lemma 3-1: Let T be a tree with maximum degree Δ and MAX(τ,Δ) be the maximum number of nodes in T which can be informed during time step τ of a local broadcast in T. Then the following recurrence relation holds:

MAX(τ,Δ) = 2^(τ−1),                                               1 ≤ τ ≤ Δ
MAX(τ,Δ) = MAX(τ−1,Δ) + MAX(τ−2,Δ) + ··· + MAX(τ−Δ+1,Δ),          τ > Δ

Proof: The correctness of the recurrence relation may be seen as follows. Consider an infinite rooted tree in which all nodes have degree Δ. With the root of the tree being the originator, we wish to know how many nodes of this tree may be informed during time step τ. During each step τ ≤ Δ, each informed node has unused edges which may be used to inform an uninformed node. Thus we generate MBT(2^τ), and consequently 2^(τ−1) nodes are informed during time step τ. Now, consider any τ > Δ. Every node which is first informed at time step τ−Δ+1 has exactly Δ−1 edges by which it can inform other nodes. Since it may inform only one node per time step, it will inform one node during time steps τ−Δ+2, τ−Δ+3, ..., τ. Thus every node which becomes informed at step τ−Δ+1 will inform a node during time step τ. Similarly, every node which becomes informed at time step τ−Δ+k will inform a node during steps τ−Δ+k+1, τ−Δ+k+2, ..., τ. This yields the stated recurrence relation. ∎

The relation of Lemma 3-1 can be used to determine the maximum number of nodes which can be informed in a tree T with maximum degree Δ, in t time steps. Let M(t,Δ) represent this number. Counting the originator, we have

M(t,Δ) = 1 + Σ_{i=1}^{t} MAX(i,Δ)     (3.1)

Table 3-1 shows the values of M(t,Δ) for Δ = 2, 3, 4, 5 and t = 1, ..., 20. The entries in the second through fifth columns of the table are the values of M(t,Δ). The last column gives the lower bound on the number of nodes which must be informed in order for a tree to be an MBT.

Table 3-1. The values of M(t,Δ).

  t   Δ=2    Δ=3     Δ=4      Δ=5     Minimum
  0     1      1       1        1         0
  1     2      2       2        2         1
  2     4      4       4        4         2
  3     6      8       8        8         4
  4     8     14      16       16         8
  5    10     24      30       32        16
  6    12     40      56       62        32
  7    14     66     104      120        64
  8    16    108     192      232       128
  9    18    176     354      448       256
 10    20    286     652      864       512
 11    22    464    1200     1666      1024
 12    24    752    2208     3212      2048
 13    26   1218    4062     6192      4096
 14    28   1972    7472    11936      8192
 15    30   3192   13744    23008     16384
 16    32   5166   25280    44350     32768
 17    34   8360   46498    85488     65536
 18    36  13528   85524   164784    131072
 19    38  21890  157304   317632    262144
 20    40  35420  289328   612256    524288

For example, with t = 5, 17 or more nodes must be informed for the tree to be an MBT. If 16 or fewer are informed, then by definition, this cannot represent an MBT. The table indicates, in particular, that the largest MBT with maximum degree 2 has 6 nodes.
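The entries of Table 3-1 can be regenerated mechanically from Lemma 3-1 and equation (3.1). The short C program below is only a sketch of that computation (it is not the program used to produce the dissertation's tables); the extra 1 in M(t,Δ) accounts for the originator.

```c
#include <stdio.h>

/* MAX(tau, delta) from Lemma 3-1: nodes newly informed during step tau
   in a tree of maximum degree delta.  Assumes tau <= 63. */
long max_informed(int tau, int delta) {
    long MAX[64];
    for (int s = 1; s <= tau; s++) {
        if (s <= delta) {
            MAX[s] = 1L << (s - 1);   /* 2^(s-1) while unused edges remain */
        } else {
            MAX[s] = 0;               /* sum of the previous delta-1 terms */
            for (int j = s - delta + 1; j <= s - 1; j++)
                MAX[s] += MAX[j];
        }
    }
    return MAX[tau];
}

/* M(t, delta) from equation (3.1): total nodes informed after t steps,
   including the originator. */
long total_informed(int t, int delta) {
    long m = 1;
    for (int i = 1; i <= t; i++)
        m += max_informed(i, delta);
    return m;
}

int main(void) {
    /* Reproduces the t = 7 row of Table 3-1: 14, 66, 104, 120. */
    for (int delta = 2; delta <= 5; delta++)
        printf("M(7,%d) = %ld\n", delta, total_informed(7, delta));
    return 0;
}
```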
Theorem 3-3: Let T(V,E) ∈ SMBT(N) and let λ be the number of levels in T. Then ⌈log N⌉/2 ≤ λ ≤ ⌈log N⌉.

Proof: The right-hand inequality can be seen by recalling that, by Theorem 3-1, T is a subgraph of MBT(2^k), where k = ⌈log N⌉. By Property 6, the number of levels in MBT(2^k) is k = ⌈log N⌉. We prove the left-hand inequality by contradiction. Suppose λ < ⌈k/2⌉. First, note that T cannot have fewer than 2^(k−1)+1 nodes (otherwise it would not belong to SMBT(N)). By Property 5, the number of nodes in MBT(2^k) has a binomial distribution over the levels of the tree. Embed T onto MBT(2^k) and then remove all nodes in levels ⌈k/2⌉ and greater. We now consider two cases.

Case 1: k is odd. The distribution of nodes is symmetric with respect to the levels in the tree. Here we have removed exactly half of the nodes from MBT(2^k), so that the number of remaining nodes is 2^(k−1).

Case 2: k is even. The distribution of nodes is asymmetric. Here we have removed the middle level as well as the levels below it, and the number of nodes remaining is less than 2^(k−1).

In both cases we must have removed nodes from T in the pruning process, since the pruned tree has fewer nodes than T is required to have. This yields the contradiction. ∎

Theorem 3-4: Let T(V,E) ∈ SMBT(N) and let ρ be the number of leaves in T. Then ρ ≤ 2^(⌈log N⌉−1).

Proof: Let k = ⌈log N⌉ and embed T within MBT(2^k). Recall Property 7. Now prune MBT(2^k) by removing leaves, one at a time, to yield T. Clearly, the pruning process cannot increase the number of leaves. ∎

3.3.3. Broadcast times for general trees

We have seen that for an arbitrary tree T, ⌈log N⌉ ≤ bt(T) ≤ N−1, and by applying some of the above results we can further refine these bounds. Even so, this represents a large range of broadcast times. We examined all possible trees on N nodes (1 ≤ N ≤ 10) and noticed that in a large percentage of these, broadcasting can be done in the optimal or near-optimal number of steps. This motivated us to investigate the distribution of broadcast times over all the trees of a given order.

In [WROM86], an algorithm is presented which generates all free trees of a given order. Combining this algorithm with the algorithm to compute bt(T) from [SlCH81], we wrote a computer program to calculate the distribution mentioned above for all orders N, 4 ≤ N ≤ 28. The correctness of the program was verified by carefully checking that each algorithm was accurately translated into the C programming language and by comparing the results of the program for trees up to order 10 with previously known results. The results are given in Table 3-2, from which we make the following observations.

Table 3-2. Distribution of broadcast times for all trees T, 4 ≤ |T| ≤ 28.
Olderof Tree 1 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 l 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 l 2 4 2 1 0 0 0 0 O 0 0 0 0 0 0 4 O l l 7 17 28 42 46 45 29 16 4 l 0 0 0 5 0 0 l 1 3 14 52 147 370 788 1543 2727 4516 6867 9758 12715 6 0 0 0 l 1 3 7 30 105 3% 1293 3935 1M0 28407 69110 159211 7 0 0 0 0 1 l 3 7 19 63 229 848 3134 “”51 36354 114449 8 0 0 0 0 0 l l 3 7 19 47 149 494 1840 6974 26142 9 0 0 0 0 0 0 1 l 3 7 19 47 127 359 1136 3977 10 O 0 0 0 0 0 0 1 1 3 7 19 47 127 330 926 11 0 0 0 0 0 0 0 0 1 1 3 7 19 47 127 330 12 0 0 O 0 0 0 0 0 0 1 l 3 7 19 47 127 13 0 0 0 0 0 0 0 0 0 0 1 l 3 7 19 47 14 0 0 0 0 0 0 0 0 0 0 O l l 3 7 19 15 0 0 0 0 0 0 0 0 0 0 0 0 1 1 3 7 l6 0 0 0 0 0 0 0 O 0 0 0 0 0 l 1 3 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 41 Table 3-2 (cont’d). Order of Tree 1 20 21 22 23 24 25 26 27 28 1 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 5 15334 16814 16818 15022 11966 8249 4900 2390 954 6 350268 739433 1505525 2963910 5657301 1048m29 18865609 33m4506 5613583 7 343884 989511 2741135 7336115 1%39628 48059182 118313484 284672624 670891043 8 94433 328241 1097455 3542358 1 1075203 33654560 99685018 288543207 817941510 9 14990 57687 219887 816259 2936970 10237157 34637275 1 14047524 366427678 10 2732 8971 32196 13918 478692 1853958 7(354-86 26016218 93624639 11 889 2424 6938 21293 71723 262261 101 1898 39853” 15655171 12 330 889 2378 6506 18162 53365 168262 578628 2146253 13 127 330 889 2378 6450 17577 48756 139014 417062 14 47 127 330 889 2378 6450 17510 47986 132470 15 19 47 127 330 889 2378 6450 17510 47907 16 7 19 47 127 330 889 2378 6450 17510 17 3 7 19 47 127 330 889 2378 6450 18 1 3 7 19 47 127 330 889 2378 19 1 l 3 7 19 47 127 330 889 20 0 l 1 3 7 19 47 127 330 21 0 0 1 l 3 7 19 47 127 22 0 0 0 1 l 3 7 19 47 23 0 0 0 0 1 1 3 7 19 24 0 0 0 0 0 1 1 3 7 25 0 0 0 0 0 0 1 1 3 26 0 0 0 0 0 0 0 l 1 27 0 0 0 0 0 0 0 0 1 42 Observations from Table 3-2. 1 The distribution is heavily skewed towards the lower values of 1:. That is, in the vast majority of trees of a given order , broadcasting can be accomplished in near the theoretical minimum time. 2 A pattern is evident in the "tails" of each column. Most obvious is the diagonal of 1’s at the end of each column. Moreover, moving from left to right, the entries in each tail become fixed after a certain point. For each column after the "10" column, the last 4 entries are always 7,3,1,l. In fact the number of "fixed entries" increases by one for each even tree order. 3.4. Broadcasting in Binary DeBruijn Graphs Binary De Bruijn graphs (BDG) are graphs whose interconnections are determined by a simple coding scheme based on a numbering of the vertices in the graph. Networks modeled after binary De Bruijn graphs will be referred to as binary De Bruijn networks (BDNs). It has been shown that many network topologies can be embedded within BDNs. The ring, one dimensional array, complete binary tree and shuflie exchange net- works are some examples of such networks. Each of these networks is particularly well suited for solving certain classes of problems [SaPr89]. For example, the shuflle exchange network admits efiicient computation of FFT; a one dimensional array network is effective in solving problems in the pipeline class, among which is matrix-vector multi- plication. 
A single BDN can be configured to act as any of a number of special-purpose networks and thus can serve as a general-purpose network for the efficient solution of a diverse set of problems. These facts make BDNs an attractive choice for the architecture of an MC.

We have noted some of the issues related to broadcasting in general graphs. For an arbitrary graph, determining an optimal broadcast schedule is NP-hard. We have given a lower bound for bt(G) and noted that the value of bt(G) is related to the maximum degree of G. In this section, we consider the problem of broadcasting in BDNs. We first give a precise definition of binary De Bruijn graphs and some of their properties. Then we present a distributed algorithm for broadcasting in BDNs which requires 2·log2(N) − 1 time steps, where N is the number of nodes in the network.

3.4.1. Binary De Bruijn graphs

The binary De Bruijn graph, BDG(n), has 2^n vertices and can be constructed using Procedure 3.1. It is not difficult to see that each vertex (b_{n−1}, b_{n−2}, ..., b_0) in BDG(n) is adjacent to vertices (b_{n−2}, b_{n−3}, ..., b_0, a) and (a, b_{n−1}, b_{n−2}, ..., b_1), where a ∈ {0,1}. This makes BDG(n) a regular graph with the degree of each vertex equal to four.

Procedure 3.1. Construct BDG(n).
1. Label the 2^n vertices using distinct binary n-tuples (b_{n−1}, b_{n−2}, ..., b_0).
2. Connect (via an edge) each vertex (b_{n−1}, b_{n−2}, ..., b_0) to vertices (b_{n−2}, b_{n−3}, ..., b_0, 0) and (b_{n−2}, b_{n−3}, ..., b_0, 1).

The fact that Δ(BDG(n)) = 4 (the number of connections at each processing element is fixed) and that the diameter of BDG(n) is equal to n means that the diameter of the network increases only logarithmically with the number of vertices. This makes BDG(n) particularly attractive as an interconnection network. Additionally, many useful topologies are contained in BDG(n), including a 2^n-vertex linear array, a 2^n-vertex ring, a (2^n − 1)-vertex complete binary tree, a (3·2^(n−2) − 2)-vertex tree machine, and a 2^n-vertex one-step shuffle-exchange graph [SaPr89]. As mentioned in Section 3.4, each of these topologies is useful for solving various classes of problems. The fact that De Bruijn graphs can accommodate these topologies makes them an especially powerful topology.

The construction given in Procedure 3.1 implies that the two vertices with n-tuple labels (0,0,...,0) and (1,1,...,1) have a loop edge. Further, the two vertices with labels (0,1,0,1,...) and (1,0,1,0,...) (i.e., alternating 0's and 1's) have two edges between them. In a practical situation, the loop edges would not be used and the doubled edge would be implemented as a single edge. Figure 3-7 shows BDG(4) with the above modifications. In what follows, BDG(n) will refer to this modified graph.

Figure 3-7. The graph BDG(4).
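Procedure 3.1, together with the modification just described, is easy to realize directly. The sketch below is ours, not part of the dissertation; bdg and eccentricity are illustrative names and Python is used only for convenience. It builds the modified BDG(n), with self-loops dropped and the doubled edge collapsed, and checks the order, the degree bound, connectivity, and the logarithmic distance bound from the all-zero vertex.

# Illustrative sketch (not from the dissertation): Procedure 3.1 with the practical modifications.
from collections import deque

def bdg(n):
    """Modified BDG(n): each vertex v is joined to its left shifts with an appended 0 or 1."""
    mask = 2 ** n - 1
    adj = {v: set() for v in range(2 ** n)}
    for v in range(2 ** n):
        for a in (0, 1):
            u = ((v << 1) & mask) | a          # drop b_{n-1}, shift left, append a
            if u != v:                         # skip the loop edges at 00...0 and 11...1
                adj[v].add(u)
                adj[u].add(v)                  # the right-shift adjacency is the same undirected edge
    return adj

def eccentricity(adj, s):
    """Longest shortest-path distance from s, and the number of vertices reached."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return max(dist.values()), len(dist)

n = 4
G = bdg(n)
assert len(G) == 2 ** n
assert max(len(nbrs) for nbrs in G.values()) <= 4      # at most four connections per vertex
ecc, reached = eccentricity(G, 0)
assert reached == 2 ** n                               # the graph is connected
assert ecc == n      # vertex 00...0 reaches every vertex within n hops, and 11...1 needs exactly n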
3.4.2. A distributed broadcast algorithm for BDG(n)

In this section we present an algorithm which generates a calling schedule for broadcasting in BDG(n). Observe that bt(BDG(n)) ≥ n; however, in general bt(BDG(n)) > n. This is due to the fact that Δ(BDG(n)) is a constant. The formula (3.1) from Section 3.3.2 can be used to rigorously prove that BDG(n) is not a minimum broadcast graph in general. Our broadcasting algorithm for BDG(n) generates a calling schedule that requires exactly 2n−1 time steps to complete broadcasting, irrespective of the choice of the originator. Also, the control of the broadcast process described by the algorithm is distributed, in the sense that each vertex can determine to whom it should transmit the message. It is only required that the label of the originator of the broadcast accompany the message.

Central to our broadcast algorithm is an integer-valued function BTS(A), called the broadcast time step function, defined on binary m-tuples A. Let A = (a_{m−1}, a_{m−2}, ..., a_0) and define K(a_i, a_j), 0 ≤ i, j ≤ m−1, as follows:

K(a_i, a_j) = 2 if a_i ⊕ a_j = 0, and 1 otherwise,

where ⊕ is the binary exclusive-OR operation. BTS is used to determine the time step at which a vertex becomes informed. Now, if m ≤ 1 then BTS(A) = 0; otherwise BTS(A) is defined as:

BTS(A) = Σ_{i=0}^{m−2} K(a_i, a_{i+1}).

Example 3.1. Let A = (1,0,1,0,0,1,0). Then BTS(A) = 1+1+2+1+1+1 = 7.

Note that the value of BTS(A) is maximum if and only if a_{m−1} = a_{m−2} = ··· = a_0, and the maximum value is 2(m−1).

Intuitively, our broadcasting algorithm proceeds as follows. Each vertex will inform at most two of its adjacent vertices. In particular, for a vertex B = (b_{n−1}, b_{n−2}, ..., b_0), let LSNE(B) and LSCE(B) be two of its adjacent vertices whose labels, respectively, are LSNE(B) = (b_{n−2}, b_{n−3}, ..., b_0, b_0) and LSCE(B) = (b_{n−2}, b_{n−3}, ..., b_0, b̄_0), where b̄_0 is the bit complement of b_0. (LSNE and LSCE stand for Left Shift with Normal Extension and Left Shift with Complement Extension, respectively.) Vertex B will attempt to inform vertices LSCE(B) and LSNE(B), in that order, provided that these vertices have not already been informed. Observe that this scheme implies that a vertex can potentially be informed by two other vertices, since for any vertex A = (a_{n−1}, a_{n−2}, ..., a_0) there are exactly two other distinct vertices X and Y such that either LSNE(X) = LSNE(Y) = A or LSCE(X) = LSCE(Y) = A, namely X = (0, a_{n−1}, a_{n−2}, ..., a_1) and Y = (1, a_{n−1}, a_{n−2}, ..., a_1). The function BTS will be used to check whether a vertex is already informed or not.

We proceed by stating our broadcast algorithm, Algorithm 1, formally. An example of its execution is given, and then we prove a theorem which implies the correctness of the algorithm. It is assumed that the broadcast originator is vertex S = (s_{n−1}, s_{n−2}, ..., s_0) and that this information accompanies the broadcast message. Each vertex B in BDG(n), once informed of the broadcast message, will perform Algorithm 1.

Algorithm 1.
1: Find the largest i such that B = (b_{n−1}, b_{n−2}, ..., b_0) = (s_{i−1}, s_{i−2}, ..., s_0, b_{n−i−1}, b_{n−i−2}, ..., b_0).
2: /* check LSCE neighbor */
   IF LSCE(B) = S goto 4.
   ELSE find the largest j such that LSCE(B) = (s_{j−1}, s_{j−2}, ..., s_0, b_{n−j−2}, b_{n−j−3}, ..., b_0, b̄_0).
3: IF BTS(s_0, b_{n−j−2}, b_{n−j−3}, ..., b_0, b̄_0) > BTS(s_0, b_{n−i−1}, b_{n−i−2}, ..., b_0) THEN inform LSCE(B).
4: /* check LSNE neighbor */
   IF LSNE(B) = S STOP.
   ELSE find the largest k such that LSNE(B) = (s_{k−1}, s_{k−2}, ..., s_0, b_{n−k−2}, b_{n−k−3}, ..., b_0, b_0).
5: IF BTS(s_0, b_{n−k−2}, b_{n−k−3}, ..., b_0, b_0) > BTS(s_0, b_{n−i−1}, b_{n−i−2}, ..., b_0) THEN inform LSNE(B). STOP.

Example 3.2. Let us trace the execution of Algorithm 1 for vertex B = (1,0,0,1) when the network is BDG(4) and the originator of the broadcast is vertex S = (0,0,1,0). In this case, LSNE(B) is the vertex (0,0,1,1) and LSCE(B) is the vertex (0,0,1,0). The value of i found in step 1 is 2. Since LSCE(1,0,0,1) = (0,0,1,0) = S, execution continues from step 4. In step 4, the value of k is found to be 1. Thus, in step 5 we have BTS(s_0, b_1, b_0, b_0) = BTS(0,0,1,1) = 2+1+2 and BTS(s_0, b_1, b_0) = BTS(0,0,1) = 1+2, and therefore, since BTS(s_0, b_1, b_0, b_0) > BTS(s_0, b_1, b_0), LSNE(B) will be informed.
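Algorithm 1 and the function BTS can be exercised directly. The sketch below is ours, not part of the dissertation; Python and the helper names bts, lsne, lsce, sched_time, plan and broadcast are illustrative. Labels are n-character bit strings, sched_time(v, s) is the quantity BTS(s_0, b_{n−i−1}, ..., b_0) used in steps 3 and 5, and broadcast simulates the resulting calling schedule, reproducing Example 3.2 and the 2n−1 bound claimed by Theorem 3-5 below.

# Illustrative sketch (not from the dissertation): the BTS function and Algorithm 1 on BDG(n).
def bts(bits):
    """BTS: 2 for every pair of equal adjacent bits, 1 for every unequal pair."""
    return sum(2 if a == b else 1 for a, b in zip(bits, bits[1:]))

def lsne(b):                                    # Left Shift with Normal Extension
    return b[1:] + b[-1]

def lsce(b):                                    # Left Shift with Complement Extension
    return b[1:] + ('1' if b[-1] == '0' else '0')

def sched_time(v, s):
    """BTS(s_0, b_{n-i-1}, ..., b_0), with i the largest prefix of v matching the low-order bits of s."""
    n = len(s)
    i = max(i for i in range(n + 1) if v[:i] == s[n - i:])
    return bts(s[-1] + v[i:])

def plan(b, s):
    """Steps 2-5 of Algorithm 1: the neighbours b will call, LSCE first, then LSNE."""
    return [nb for nb in (lsce(b), lsne(b))
            if nb != s and sched_time(nb, s) > sched_time(b, s)]

def broadcast(n, s):
    """Simulate the calling schedule; each informed vertex makes one call per step."""
    informed, pending, step = {s: 0}, {s: plan(s, s)}, 0
    while any(pending.values()):
        step += 1
        for v in [w for w, todo in pending.items() if todo]:
            u = pending[v].pop(0)
            if u not in informed:               # the BTS test is intended to make this always true
                informed[u] = step
                pending[u] = plan(u, s)
    return informed

n, s = 4, '0010'                                # BDG(4) with the originator of Example 3.2
times = broadcast(n, s)
assert len(times) == 2 ** n and max(times.values()) <= 2 * n - 1
assert all(times[v] <= sched_time(v, s) for v in times)     # never later than T(v) of Theorem 3-5
assert times['1001'] < times['0011']            # vertex 9 is informed before it calls vertex 3
for v in range(2 ** n):                         # the 2n-1 bound holds for every originator
    t = broadcast(n, format(v, '04b'))
    assert len(t) == 2 ** n and max(t.values()) <= 2 * n - 1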
The broadcast process in BDG(4) is illustrated in Figure 3-8. The arrows indicate the direction of the broadcast and highlight the spanning tree that is induced by the broadcasting. Note that, as in Example 3.2, node 9 informs node 3 and not node 2, since node 2 is the originator.

Figure 3-8. An example broadcast in BDG(4) using Algorithm 1.

Theorem 3-5: Let vertex S = (s_{n−1}, s_{n−2}, ..., s_0) be the originator of a broadcast in BDG(n). Also, let A = (a_{n−1}, a_{n−2}, ..., a_0) be an arbitrary vertex in BDG(n) and find the largest i such that A = (a_{n−1}, a_{n−2}, ..., a_0) = (s_{i−1}, s_{i−2}, ..., s_0, a_{n−i−1}, a_{n−i−2}, ..., a_0). Then, using Algorithm 1, vertex A is informed at the end of time step

T(A) = BTS(s_0, a_{n−i−1}, a_{n−i−2}, ..., a_0).

Moreover, at the end of time step 2n−1 all vertices are informed.

Proof: We prove the theorem by induction on the number of time steps. Clearly, only the originator S has T(S) = BTS(s_0) = 0. Now suppose the theorem is true for all vertices B with T(B) ≤ k, and let A be a vertex such that T(A) = k+1. Consider the vertex X = (s_i, s_{i−1}, ..., s_0, a_{n−i−1}, a_{n−i−2}, ..., a_1), where i is defined as in the statement of the theorem. Observe that vertex X is adjacent to vertex A in BDG(n), since we have either LSNE(X) = A or LSCE(X) = A. Also,

T(A) = BTS(s_0, a_{n−i−1}, a_{n−i−2}, ..., a_0)
     = BTS(s_0, a_{n−i−1}, a_{n−i−2}, ..., a_1) + BTS(a_1, a_0)
     = T(X) + BTS(a_1, a_0).

Since BTS(a_1, a_0) ≥ 1, we have T(X) ≤ k. By the induction hypothesis, vertex X was informed by the end of time step T(X). We now consider the following two cases:

Case 1: a_1 ⊕ a_0 = 1. This means that BTS(a_1, a_0) = 1 and thus we have T(X) = k. Also, in this case LSCE(X) = A, and therefore, by step 3 of Algorithm 1, vertex A will be informed at the end of time step k+1.

Case 2: a_1 ⊕ a_0 = 0. This implies BTS(a_1, a_0) = 2 and thus we have T(X) = k−1. In this case LSNE(X) = A, and therefore, by step 5 of Algorithm 1, vertex A will be informed at the end of time step k+1.

In both cases, vertex A is informed at the end of time step T(A). Note that when A = (s̄_0, s̄_0, ..., s̄_0), i.e., every bit of A is the complement of s_0, T(A) is maximum. In this case T(A) = 2n−1. Therefore, at the end of time step 2n−1 all vertices are informed. ∎

The above theorem and the inequalities in step 3 and step 5 of Algorithm 1 ensure that each vertex receives the broadcast message exactly once. Thus Algorithm 1 satisfies the criteria of local broadcasting.

Chapter 4

Fault-Tolerant Loop Architectures

The large number of components (processors, memory units, etc.) in an MC, coupled with the demands placed on them by the types of applications for which they are intended, makes reliability a very important concern. We noted in Chapter 1 that one method of improving reliability is to design a system in such a way that it will be able to tolerate component failures and still remain operational. For an MC, fault-tolerance at the system level (also referred to as the topological level) implies that the types of faults to be tolerated are processor and/or link failures, and an MC (by an MC we will often mean its topology) is said to be fault-tolerant if it can remain functional in the presence of such failures. It is, however, the topological requirements of the application that essentially determine when an MC is considered to be functional. Two basic functionality criteria have received much attention.
According to one of these criteria, an MC is considered functional as long as there is a nonfaulty communication path between each pair of nonfaulty processors [ArLe81, PrRe82, RaGA85, Boes86, Esfa88]. In other words, the underlying topology of the MC should remain connected in the presence of certain failures. Both processor and link failures have been considered in the literature. This fault-tolerance model is particularly applicable to large-scale MCs that are to execute concurrent algorithms whose performance is mostly insensitive to infrequent changes in the topology of the system. Such MCs generally permit graceful degradation.

The second functionality criterion considers an MC functional as long as a desired topology is contained in the system. In the past few years much work has been done in developing parallel algorithms and the best MCs for their execution [Quin87]. For these algorithms, the existence of certain topologies is a significant factor in delivering the desired performance. Thus, for such applications, the system should be able to provide a specific topology throughout the execution of the algorithm. The existing work on such fault-tolerant topologies has mainly considered processor failures. The basic approach to achieving fault-tolerance when this functionality criterion is used is to employ system-wide redundancy. Two topologies that have received much attention are the ring and the tree [Haye76, KwTo81, RaAE84, LoFu87, DuHa88].

Here, fault-tolerance will refer to the second functionality criterion, and we will consider only processor failures. We assume that all failures are permanent and that some diagnosis mechanism is available to detect and isolate the faulty processors. Further, it is assumed that in the event of processor failures, some mechanism exists that allows the system to be "reconfigured" (using redundant processors) to maintain the desired topology. This also implies that, as a part of this process, the tasks assigned to a faulty processor can be successfully shifted to a fault-free processor. Neither the diagnosis nor the reconfiguration problems will be addressed here [YaHa86].

In designing such a fault-tolerant topology, the number of faults to be tolerated is specified a priori. Given this value, say k, a topology is determined such that the desired sub-topology can be realized given any collection of k faults. In previous work, the primary design criterion has been to employ the minimum number of redundant processors, in deference to other parameters such as the number of communication links. The overriding rationale for this choice has largely been economic: processors are usually the most expensive component. However, using the absolute minimum number of spare processors has typically forced these designs to incorporate a very large number of redundant communication links; in graph-theoretic terms, each vertex has very "high" degree. From a practical perspective this presents a problem, in that processors have only a finite (usually quite small) number of communication ports. For large-scale systems these designs can only be used to achieve low levels of fault tolerance, since for high levels the number of ports needed would exceed the number available. Additionally, in terms of VLSI and WSI technology, one of the primary design concerns is overcoming pin-limitation and layout problems; designs which require large numbers of links only exacerbate these problems [ChCD88, Dall88].
In this chapter, we present an alternate primary design criterion and employ it in the design of fault-tolerant loop topologies. The basic idea is to place a bound on the maximum number of connections at each processor and then determine the number and configuration of redundant components required to achieve a specified level of fault-tolerance. Loops are a common interconnection topology and are important in their own right. Moreover, since many other topologies have loops "embedded" within them, results relating to the construction of "low"-degree fault-tolerant loops can give insight into the feasibility of such constructions for other topologies.

4.1. Background

In this section we introduce some additional terminology and definitions and give a formal statement of the problem we will consider. As we have done throughout this dissertation, we model the topology of an MC by a graph G(V,E). In particular, a loop interconnection network connecting N processors is modeled by the cycle C_N. The failure of a processor is modeled by the removal of the corresponding vertex and its incident edges in the graph. As previously mentioned, in this model an MC is functional as long as a desired structure is contained within the system. This criterion was first formulated by Hayes and can be stated formally as follows [Haye76]. Let G(V,E) represent the topology of an MC and D(V,E) be a desired structure. Then G is k-fault-tolerant (k-ft) with respect to D if, for any set of faulty vertices F ⊆ V(G) with |F| = k, the graph D ⊆ G − F. We also say that G is a k-ft D graph. Thus, "G is a 1-ft N-cycle" means that C_N ⊆ G − {v} for each v ∈ V(G). It is clear from this definition that if G is a k-ft D graph then |V(G)| ≥ |V(D)| + k. For a given graph D and specified integers k and m, the set of all k-ft D graphs G with |V(G)| = |V(D)| + m will be denoted by Γ_k[D, m].

Using the above definition, several classes of problems can be formulated by imposing certain requirements on the graph G. The problem originally proposed by Hayes can be stated as: Given a graph D and a positive integer k, construct an "optimal" graph G which is k-ft with respect to D. The optimality criteria considered in [Haye76] are:
(a) |V(G)| = |V(D)| + k;
(b) |E(G)| is as small as possible subject to (a).

In [Haye76] three candidates for the graph D are considered, namely cycles, trees, and simple paths. In particular, a construction is given for G when D is a cycle of even order. In this case, G is a (k+2)-regular graph. A similar construction is also given for the case of D being a cycle of odd order. The work of Hayes was followed by the results in [KwTo81, RaAE84, LoFu87, DuHa88]. In these papers the authors have adhered to the same optimality criterion but imposed additional restrictions on the set F. For example, in [RaAE84] the case of D being a binary tree is considered with the additional restriction that failures only occur at different levels of the tree.

The criterion (a) alone dictates that the number of spares be exactly equal to the desired level of fault tolerance. When no additional restriction is placed on the set F, this criterion implies δ(G) ≥ δ(D) + k. Thus the degree of each vertex in G is at least proportional to the number of faults to be tolerated. Since current technology limits the number of connections at each processing node (and it seems this constraint will persist for some time to come [ChCD88, Dall88]), the above optimality criteria may not be viable in certain situations.
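Hayes' definition can be checked exhaustively for small cases, which is also how the chordal-ring designs developed in Section 4.2 can be validated. The sketch below is ours, not part of the dissertation; is_k_ft, has_cycle_of_length and chordal_ring are illustrative names, Python is used only for convenience, and the test graph CR(m,w) anticipates Procedure 4.1 of Section 4.2.1. The two assertions preview Theorem 4-2: with two spares, CR(8,3) tolerates any single processor failure with respect to C_6, but not every pair of failures.

# Illustrative sketch (not from the dissertation): brute-force check of the k-ft definition.
from itertools import combinations

def chordal_ring(m, w):
    """CR(m, w) as in Procedure 4.1 below: an m-cycle plus the chords (2i, (2i + w) mod m)."""
    adj = {v: set() for v in range(m)}
    def add(a, b):
        adj[a].add(b); adj[b].add(a)
    for v in range(m):
        add(v, (v + 1) % m)
    for i in range(m // 2):
        add(2 * i, (2 * i + w) % m)
    return adj

def has_cycle_of_length(adj, L, removed=frozenset()):
    """Backtracking search for a cycle on exactly L of the surviving vertices."""
    def extend(path, used):
        if len(path) == L:
            return path[0] in adj[path[-1]]
        return any(extend(path + [u], used | {u})
                   for u in adj[path[-1]] if u not in used and u not in removed)
    return any(extend([v], {v}) for v in adj if v not in removed)

def is_k_ft(adj, N, k):
    """Hayes' definition: C_N is a subgraph of G - F for every k-subset F of V(G)."""
    return all(has_cycle_of_length(adj, N, frozenset(F)) for F in combinations(adj, k))

G = chordal_ring(8, 3)
assert is_k_ft(G, 6, 1)          # CR(8, 3) is a 1-ft C_6 graph (two spares, cf. Theorem 4-2)
assert not is_k_ft(G, 6, 2)      # but it is not 2-ft: e.g. F = {1, 3} leaves vertex 2 "trapped"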
To address the above concern, we propose the following modified version of the problem. Given a desired graph D and positive integers k and r, construct a graph G such that:
(a) D is a subgraph of G − F for any set F ⊆ V(G) with |F| = k;
(b) G is r-regular; and
(c) no graph satisfying (a) and (b) above has fewer vertices than G.

It is clear that this problem may not have a solution for certain choices of D, k, and r. A case in point: when r = 2 and D = C_N, there is no solution G for any k. Thus 3 is the smallest value of r for which a solution may exist in the case D = C_N. In the sequel we will consider the modified problem for the case r = 3 and D = C_N, N even. We will show that no solutions using exactly k spares exist; thus we must incorporate "extra" redundancy into the design. In fact, we show that the minimum number of spares required is bounded below by 2k, that is, |V(G)| ≥ |V(D)| + 2k, and we will exhibit a class of graphs, called chordal rings, for which this lower bound is achievable for k = 1, 2, 3. Further, this result will be shown to be "optimal" in the sense that higher levels of fault tolerance cannot be achieved for this class of graphs using the minimum number of spares.

4.2. Chordal Rings as k-ft Cycle Topologies

In this section we define the chordal ring class of graphs, and we examine the extent to which they can serve as fault-tolerant graphs for C_N. The results are presented in the form of several theorems, organized as follows. Theorem 4-1 establishes a lower bound on the number of spares required by the use of cubic (3-regular) graphs. Theorems 4-2 and 4-3 establish that certain members of the chordal ring class are 1-ft and 2-ft for C_N. Two lemmas are given to establish Theorem 4-4, which provides a necessary condition for any graph G to be a member of the class Γ_k[C_N, 2k]. This theorem is used in proving Theorem 4-5, which establishes that 3-ft is the highest level of fault-tolerance that can be achieved using chordal rings. Finally, in Theorem 4-6, we prove that this level of fault tolerance can be achieved by the chordal rings CR(M, 7). We begin with some additional definitions that we make use of in the proofs.

Definition 4.1. A fault-set F is a set of faulty vertices.

Definition 4.2. Given a graph G and a desired graph D, a vertex v is unusable with respect to a given fault set F if no subgraph of G − F isomorphic to D contains v.

Example 4.1. Let G be 3-regular and let the desired graph D be a cycle. Let {u_1, u_2, u_3} ⊆ V(G) and {(u_1,u_2), (u_2,u_3)} ⊆ E(G). Then the fault set {u_1, u_3} makes u_2 unusable, because u_2 has only one edge connected to a non-faulty vertex in G, implying that u_2 cannot be a vertex in any cycle. It is "trapped" between two faulty vertices. Note that by this definition any faulty vertex is unusable. A vertex is usable if it is not unusable.

Definition 4.3. Given a graph G, a fault set F, and a desired graph D, the set of all unusable vertices induced by F will be denoted U(F), or simply U.

Clearly, a necessary condition for a graph to be k-ft for C_N is that there be N usable vertices for every fault set F of order k. In other words, the maximum cardinality of an unusable set may not exceed the total number of spare vertices. As we have noted, the tradeoff in using cubic graphs for k-ft cycles is that we must include "extra" spare vertices in the design. Theorem 4-1 establishes the lower bound (cited earlier) on the amount of redundancy required.

Theorem 4-1: Let G ∈ Γ_k[C_N, m] be 3-regular and connected. Then m ≥ 2k−1.

Proof: By definition,
there is an N-cycle in G. Label the vertices of this cycle v_0, v_1, ..., v_{N−1} and consider the fault set F = {v_1, v_3, ..., v_{2k−1}}. As in Example 4.1, the failure of the vertices v_{2i−1} and v_{2i+1}, 1 ≤ i ≤ k−1, "traps" the vertex v_{2i}, making it unusable. Thus all the vertices "between" such pairs of vertices in F are unusable. This gives {v_1, v_2, ..., v_{2k−1}} ⊆ U, and hence |U| ≥ 2k−1. ∎

Since we are considering designs for even N, Theorem 4-1 and the fact that there exists no cubic graph of odd order imply that the minimum number of spares is at least 2k. Hereafter we restrict our attention to cubic graphs having this minimum number of spares, i.e., those in the class Γ_k[C_N, 2k]. In addition, we will consider only the cases where k < N/2. This seems a reasonable requirement for large-scale systems and eliminates many special cases in the proofs.

4.2.1. Chordal rings

A chordal ring of degree 3, hereafter referred to simply as a chordal ring, is a 3-regular Hamiltonian graph. Clearly, chordal rings are of even order. Following [ArLe81], we define a specific class of chordal rings, CR(N,w), which can be constructed using Procedure 4.1. The parameter w is called the chord length, which is required to be odd and at least 3. Note that CR(N,w) is bipartite, with v_i and v_{i+1} being in different partitions. Without loss of generality, we will assume that w ≤ N/2. The chordal ring CR(20,3) is shown in Figure 4-1. (For the sake of readability, we have labeled the graphs in the figures using integer labels; i.e., vertex v_i is labeled simply i.)

Procedure 4.1. Construct CR(M,w).
1. Construct a cycle of order M, labeling the vertices 0, 1, ..., M−1.
2. For each i, 0 ≤ i < (M−1)/2, add the edge (2i, j), where j = (2i + w) mod M.

Figure 4-1. The chordal ring CR(20,3).

4.2.2. Fault tolerance of CR(N,w)

As we noted earlier, two solutions for 1-ft cycles were given in [Haye76]. The solution for even cycles has Δ = 4, while the solution for odd cycles is cubic. We have noticed that Hayes' solution for 1-ft C_{2j+1} (using one spare) can also serve as a 1-ft C_{2j} (using two spares). In fact, Hayes' design is the chordal ring CR(2j+2, 3). Thus such chordal rings belong to Γ_1[C_{2j}, 2]. The following theorem establishes a stronger result for 1-ft cycles.

Theorem 4-2: For any even N ≥ 4, CR(N+2, w) ∈ Γ_1[C_N, 2], where w is as defined above.

Proof: Let N be given and let the vertex set of CR(N+2, w) be V = {v_0, v_1, ..., v_{N+1}}. Without loss of generality, let F = {v_0}. Then it is not difficult to see that the N-cycle v_1, v_2, v_3, ..., v_{N+1−w}, v_{N+1}, v_N, ..., v_{N+3−w}, v_1 is contained in CR(N+2, w) − F. ∎

For the 2-ft case, we have proved [ZiEs89] that no cubic graph exists for N = 4, and we thus consider N ≥ 6. The next theorem establishes that for such N, the chordal rings CR(N+4, 3) are 2-ft for C_N.

Theorem 4-3: For any even N ≥ 6, CR(N+4, 3) ∈ Γ_2[C_N, 4].

Proof: Let N be given and let the vertex set of CR(N+4, 3) be V = {v_0, v_1, ..., v_{N+3}}. To establish the result, we must show that for any fault set F = {x, y} ⊆ V, C_N ⊆ CR(N+4, 3) − F. Since the design uses 4 spare vertices, for any fault set F as above, two additional vertices will not be used in the resulting cycle. The two faulty vertices and the two additional ones will be referred to as the inactive vertices for the fault set F. Without loss of generality, we let x = v_0 and consider the possibilities for y. There are 3 cases.

Case 1: y = v_1. Consider the two paths: v_3, v_4, v_7, v_8,
° ' ' ’v4i-1’v4i’ ° ’ ' ’vN-l’vN’vN-o-l V2’V5’Volv9’ ' ° ' ’v4i-2’v4i +1, ' ' ' ’vN-3’vN-2’VN +1 ° These two paths have only the vertex v” +1 in common. A cycle of order N can be formed by concatenating the two paths and including the edge between v2 and v3. Note that the vertices vN +2 and vN +3 are inactive in this cycle. An example of such a cycle is given in Figure 4-2 for CR(20.3). The edges in the 16-cycle are shown as solid lines; dashed lines represent unused edges. In the remaining two cases we will exploit the Hamiltonian cycle shown in Figure 4—3 to establish the result. This cycle is made up of paths of the form v25 “,vz‘. ,v2i +3; that is, an edge fiom the "peripheral" cycle followed by a chord in the "opposite" direction. It is clear that such a cycle always exists in CR(M.3). The salient feature of this Hamil- tonian cycle is that we may remove any two "endpoints" of a chord and using an edge Figure 4-2. A 16-cycle in CR (20.3) - {0,1}. previously not used. obtain a new cycle without altering any other edge or vertex in the original cycle. The order of the new cycle is obviously two less than that of the initial one. For example. we may remove the vertices v0 and v3 in Figure 43 and employ the edge (v 1.v 2) to obtain the cycle shown in Figure 44. Case 2. y = vj . j¢1,3 In this case, an N -cycle can be obtained by removing the chords (i.e.. their endpoints) corresponding to the faulty vertices as discussed above. Case 3. y = v3 . Here. the two faulty vertices are the endpoints of a chord. To obtain a cycle of the correct size. we choose the endpoints of any other chord in the mph (with the exception of the pair V]. V” _3) and remove them to obtain a cycle of the desired Size as in case 2. Thus for all combinations of 2 failures we can obtain a cycle of order N. I 60 We have Shown that CR (N +2k .w) e I‘,‘[CN ,2k] for k =l.2 and w =3. We will next Show that the 3-ft is the best that can be achieved from chordal rings using the minimum number of spare vertices. To do this we need the following results. Lemma 4-1 : Let G be a 3-regular and connected mph. Further. let G e I‘kICN.2k]. Then the girth ofG.g(G)2k+1. Proof. By contradiction. Suppose G has a j-cycle C j. j S k. Consider the set A = [v lvd Cj .v is adjacent to some vertex u e Cj}. Call A the adjacent set of Cj and let I A | = m. Note that m Sj. since each vertex in Cj has one incident edge not in the cycle with which to connect to another vertex in G . Suppose all the vertices in A are faulty. The vertices in Cj are "trapped" by the fault set A and thus (A U Cl.) C U. Since G is k-ft for CN and m S k. there is an N -cycle in G which does not include any vertices in A UCj. Let wo,w1. . . . .wN_1,w0 be such a cycle. Since G is connected, there is a path between some vertex vo eA to a vertex “’0 in the N -cycle which does not include any other vertex in A. Let this path be P = v 0.ul,u 2. . . . , P,w0. Note that the path P may be of length 1. Now extend P to P ’ by concatenating with it the path formed by the vertices of the N-cycle: P’ = vo,ul,u2. . . . . vaoth . . . .wN_1. So. we have established that there is a path beginning at v0. of length at least N . which does not con- tain any vertices of A u Cj other than v0. Let us relabel P’, denoting the first 2k-2m +2 vertices as vo.v1.v2. . . . ,vz,‘_2m +1. Now consrder the fault set: F =A U [v1.v3, . . . 'v2k-7JI4-l} - {v0} which contains | F | =m + (k-m+l)-l = k nodes. Thismakesthe set [A UCJ. U {v1.v2, . . . 
’v2k-2m+1}] CU and thus I U| 2m + j + 2k—2m+122k+1,which contradicts the definition ofG. I Lemma 4-2: Let G be as in Lemma4.l, then g(G) ¢k+1. Proof. By contradiction. Suppose G has a k + l-cycle C , and let A be the adjacent set of this cycle as in Lemma 4.1. If IA |Sk. then the proof ofLemma 4.1 also holds here. 61 Figure 4-4. An l8-cycle in CR (20,3) - {0.3} 62 and (C uF)cU. This implies | U | 2k+1+k =2k+1 which contradicts the definition of G . I Lemmas 4.1 and 4.2 give us the following theorem. Theorem 4-4: Let G be a 3-regular and connected mph. Further. let G e I‘,‘[CN.2k]. Theng(G)2k-I-2. I We are now ready to state the following theorem which indicates that 3-ft is the highest level of fault-tolerance that can be attained with Chordal Rings using the minimum number of spare vertices. Theorem 4-5: CR (N +2k ,w) e I“ [CN ,2k] for any k24. Proof. For any values of N and w as defined above, the mph CR (N .w) contains the cycle induced by the edge set: {(vo.v1).(v1.vz).(v2.vw,2).(vw,2.vw,1).(vw,1.vw).(vw #0)} . This implies that g (CR (M .w ))S6 and therefore from Theorem 4 the conclusion holds for k25. For k =4. consider the fault set F = {VO’VZ’V4’VN+2k—w+2}' This makes the set F’ = FVW1’v3’VN+2k—w+l’vN+2k-w«I-31 C U‘ Since five of these vertices all belong to one set of a bipartition of CR (N +21: .10 ). the largest cycle that can exist in CR (N +2k.w) —F’ is C” +2.40. This gives the desired result for k =4. I The final theorem of this chapter. establishes that the chordal rings CR (N +6.7) are 3-ft for C”. N 2 20. These mphs have chord length equal to seven. Note that since g (CR (M .3)) = 4. we may conclude from theorem 4-4 that CR (N +6.3) 4 1‘3[C~,6] for any N. In addition. for the chordal rings CR (N +6.5). we have found counterexamples which Show that the mphs are not member of 1‘3[CN ,6] for arbitrarily large values of N. 63 Theorem 4-6: CR (N .7) e I‘3[CM .6]. where M =N —6 and N 2 26. Proof. We need to show that for any fault set F = {xy .2 }, CM : CR (N .7)—F. We proceed by specifying fault-sets F , and demonstrating that an M -cycle is present in the presence of each such fault-set. For some fault sets, a Specific cycle will be given explicitly. In the remaining cases. the existence of an M ~cycle in a specific mph or set of mphs along with an inductive argument will demonstrate that the fault-set can be tolerated for every CR (N .7). where N is as in the statrnent of the theorem. The proof employs four types of inductive arguments. which are illustrated in Figures 4-5, 4-6. 4-7. 4-8. A fifth type of argument (non-inductive) shows how a special class of faults can be handled. Each of these arguments is described below. In our approach, each fault set is characterized in terms of the relative distances between its member vertices; the induc- tive argument is always done in such a way as to preserve these relative distances, and hence the fault-set itself. . Figure 4-5 illustrates the most straightforward inductive variation. In the figure, all the faults are contained "within" one chord. The M -cycle is apparent. It is clear that we may add a pair of vertices to this mph. (anywhere except between vertices 26 and 5) extending the cycle by two. and not altering the fault set. We thus obtain a chordal ring of order N +2, which has an M +2 cycle for the given fault-set. The key characteristic here is that there be an edge in the M ~cycle of the form (2i +1.2i +2) such that there is no chord in the M -cycle which begins at the vertices 2i -4. 2i -2, 2i. Intuitively. 
no edge of the M -cycle "jumps" past the vertices 2i +1. 2i +2. It is clear that in such a circumstance. we may add any even number of vertices between vertices 2i +1 and 2i +2, which will establish the pattern for all chordal rings. We will refer to this as the induct-by-two (1B2) pattern. The second inductive variation is Show in Figure 4-6. The key characteristic is that there be an edge in the M -cycle of the form (2i ,2i +7) (i.e. a chord) and that the edges (2i +2.2i +9) and (2i -2,2i +5) are not used in the M -cycle. Intuitively. imagine drawing Figure 4-5. An induction-by-two illustration. a straight line from the center of the circle, between two vertices such that the only edge of the M -cycle intersected is a chord. Observe that such a cycle can be extended by adding four vertices between vertices 2i +3 and 2i +4, as shown in the bottom mph. In the example Shown, 2i = 10 and the four new vertices are labeled "N" for "new". To make use of this pattern, we will demonstrate the desired cycle in mphs of orders N and N +2. and then use this property to induct by four to yield the conclusion. We will refer to this as the induct-by-four (1B4) pattern. Figure 4-7 illustrates the first of two "induct-by-Six" patterns. The key feature in this first pattern is that the M -cycle "doubles-back" on itself. Intuitively. a line can be drawn which does not intersect any edge of the M -cycle. Note that "doubling back" implies the existence of two "corners": vertices 5 and 6 are corners in the upper mph. Observe that any such cycle can be extended to a cycle with 6 additional vertices as shown in the lower mph. The extension can be accomplished by removing the peri- pheral edge (6.7). adding 6 new vertices between vertices 5 and 6 and reforming the 65 Figure 4-6. An induction-by-four illustration. 66 cycle as shown. Note that the extension can be done at either corner. Since this process adds six vertices. to use it inductively it will be necessary to establish that a cycle exists for a given fault-set in mphs with orders N ,N +2,N +4. We will refer to this as the induct-by-six-A (IB6A) pattern, Figure 4-7. The fourth variation will be referred to as the induct-by-six-B (IB6B) pattern, illus- trated in Figure 4-8. The characteristic here is that the M -cycle uses three consecutive chords (2i .2i +7). (2i +2.2i +9) and (2i +4.2i +13) and does not use the edge (2i +5,2i+6). The cycle is extended by adding six vertices between 2i +5 . 2i+6 as shown (2i = 8 in the example). Again. this situation may be intuitively characterized as being able to draw a line from the center of the circle between a pair of vertices (2 j +1.2j) which intersect exactly three edges of the cycle. all of which are chords. The final type of argument we use is illustrated in Figure 4—9. The idea is to identify a cycle pattern for a specific fault-set and observe that the fault set can be "pushed" for- ward and the cycle reformed. In the top mph, the fault is F = {0.3.4.5}. The cycle uses the edge (1,2) and then "skips over" vertices 3.4.5 using a chord. The cycle then traverses groups of four vertices connected by peripheral edges. with each group con- nected to the next by a chord. Observe in the middle mph that the fault-set has been advanced forward by four vertices. i.e. F = {0.7.8.9}. and the cycle reformed. Essen- tially. one of the groups of four vertices has been displaced (backwards) by the advanced fault set. This process can be continued until all the groups of four have been displaced, as seen in the bottom mph. 
Note that typically. as in the figure. the cycles we exhibit have specific patterns before and after the fault-set being advanced. The number of ver- tices needed by these patterns (before and after) varies between fault-classes. but typi— cally meets a certain congruence relation and range criteria. For example. the fault class above can be described as follows. F = {0.2i—l,2i,2i+1}. where 4S2i SN—l4 and N -2i 5 2 mod 4. This argument will sometimes be used in conjunction with the four inductive arguments in establishing the result for certain fault-sets. We will refer to this 67 Figure 4-7. An induction-by—six, version A illustration. 68 Figure 4-8. An induction-by-six. version B illustration. 69 Figure 4-9. An advance-by-four illustration. 70 as the advance-by-four (AB4) pattern. Since the mphs we are considering are bipartite. we may assume without loss of generality that x = 0 and then consider the possible values for y and z . Note that when y is odd. by symmetry we need only consider values of z in the range 2y S 2 S (N +y —l)/2. Also. when y is even. we need only consider values of z where 2y S 2 S N -2y -1. We proceed by considering several cases based on choices for y . For each case. we provide a series of specific fault sets, numbered for convenience. CASE 1. y e {1.2}. Fault Set: 1 F = {0.1,2,3.4,N-1]. Here. all the faults are contained "inside" a chord, which can be used to "short circuit" the faulty vertices. See Figure 4-5. This also covers the faultF = {0.1,2,N-3,N-2,N—1]. Fault Set: 2 F = {0.1,2.2i-1,2i,2i-1} , where N-2i I 2 mod 4 and 6 S 2i SN—6. See Figure 4-10. We may apply AB4. The requirement N —2i 5 2mod 4 simply guarantees that the number of vertices after the fault is a multiple of four so that the group of four pattern will work. Fault Set: 3 F = [0,1,2,2i-l.2i,2i-1] , where N-2i s. 0 mod 4 and 6 SN-2i SN—20. See Figure 4-11. We may apply AB4. Here the path after the fault-set requires 18 ver- tices. The inequality aboves ensures that there are enough vertices for this pattern. The above fault-sets cover all the odd vertices and all even vertices except N -l6,N -12,N-8,N —4, which are considered next. Fault Set: 4 F = {0,1,2,N-16]. See Figure 442. Apply IB2 to the edge (3,4). Fault Set: 5 F = {0.1.2,N—12]. See Figure 4-13. Apply IB2 to the edge (3.4). Fault Set: 6 F = {0.1.2,N-8}. See Figure 4.14. Note that the cycle doubles back at vertices 5 and 6. We may apply IB6A to either corner. It is important to note that we are not altering the fault pattern, since in this case we may consider the faults to be 71 Figure 4-11. F = {0.1.2,2i-1.2i,2i+1} , N—2i 50 mod 4 72 Figure 4-12. F = {0.1.2,N-16}. Figure 4-13.F = {0.1.2,N-12}. Figure 4-14. F = {0.1.2,N-8}. 74 "contained" between N —8 and 2 (in a clockwise sense from N —8) and altering the graph outside of this containment preserves the relative distance between faults. Fault Set: 7 F = {0.1.2,N—4}. See Figure 4-15. Apply IB6A to either corner. We have covered all possible fault-sets of the form {0.1.2.2 }. completing this case. CASE 2. y = 3, 6S2 SN/2+l. Fault Set: 8 F = {03.6.7.8}. See Figure 4-16. Apply IB6A to either corner. Fault Set: 9 F = {0.12.3.9}. See Figure 417. Apply IB6A to vertex N -1. Fault Set: 10 F = {0.3.4.10.11}. See Figure 4-18. Apply IB6A to vertex 12. Fault Set: 11 F = {0,3,12,13,14}. See Figure 4-19. Apply IB2 to the edge (15,16). Fault Set: 12 F = {0.3.10.2i-l.2i ,2i +1 }. Where 14 S 2i S N-10 and N—2i E 2 mod 4. See Figure 4-20. Apply AB4. Fault Set: 13 = {0.3.10.2i-1,2i.2i+1}. 
Where 14 S2i SN—16 and N-2i a 0mod 4. See Figure 4-21. Apply AB4. This completes case 2. CASE 3. y = 4. 8 S2 SN-3. Fault Set: 14 F = {0,4,2i-1.2i.2i+l}. Where 8 S 2i SN-10 and N-2i a 2 mod 4. See Figure 4-22. Apply AB4. Fault Set: 15 F = {0.4,2i—l.2i.2i+l}. Where 8 S2i SN-12 and N-2i50mod4. See Figure 4-23. Apply AB4. 15. F = {0.1.2,N—4}. Figure 4- .8}. 7 9 {0.3.6 Figure 4-16. F {0,1,2,3.9]. 17. F Figure 4- ,10,11}. = {0,3,4 Figure 4-18. F 79 Figure 4-19. F = {0,3,12,13,14}. Figure 4-20. F = {0.3,10.2i-l,2i,2i+1}.N—2i =2mod 4. Figure 4-21. F = {0.3,10.2i-1,2i,2i+1],N—2i EOmod 4. 81 Figure 4-22. F = {O,4,2i-1,2i,2i+l].N-2i 52mm! 4. Figure 4-23. F = {0.4,2i-1,2i,2i+1}.N—2i 50mm! 4. 82 The previous sets cover all possible values of z in the range 8 S 2 S N —8. The next three fault-sets complete this case. Fault Set: 16 F = [0,4,N-8,N—7}. See Figtue 4-24. We may apply IB2 to the edge (5.6). Fault Set: 17 F = {0,4,N —6,N—5}. See Figure 4-25. Apply IB6A to either corner. Fault Set: 18 F = {0.4,N-4,N-3}. See Figlue 4-26. Apply IB6A to either corner. CASE 4. y = 5.10 S z SN/2+2. Fault Set: 19 F = {0,5,10,11}. See Figure 4.27. Apply 1B4 between vertices N -1 and 0. Figure 4-24. F = {0.4,N-8,N—7}. Figme 4-25. F = {0,4,N—6,N—5}. 84 -3}. [0.4.51 -4,N Figure 4-26. F 85 {0,5,10,11}. Figure 4-27. F 86 Fault Set: 20 F = {0.5.12}. See Figure 4-28. Apply IB2 to the edge (13.14). Fault Set: 21 F = {0.5.13}. See Figure 4-29. Apply IB6B between vertices 15 and 16. Fault Set: 22 F = {0,5,14,15,16}. See Figure 4-30. Apply IB2 to the edge (17.18). The previous fault-sets are sufiicient (for this case) for N =26 and N =28. The following two complete this case forN 2 30 Fault Set: 23 F = {0.5,2i-l,2i.2i+l}. where 14S2i SN-10 and N-2i §2mod 4. See Figure 4-31. Apply AB4. Fault Set: 24 F = {0.5,2i-l,2i,2i+1}. where 14S2i SN-l6 and N-2i IOMOd 4. See Figure 4-32. Apply AB4. This complete the case {0,5,2 }. Figure 4-28. F = {0.5.12}. Figure 4-29. F = {0.5.13}. Figure 4-30. F = {0,5,14,15,16}. CASE 5. y e {6,7}.12Sz SN—6. Fault Set: 25 F = {0,6,7.12,13}. See Figure 4-33. Apply IB6B between N -l and 0. Fault Set: 26 F = {0,6,7.13.l4}. See Figure 4-34. Apply IB6B between N -1 and 0. Fault Set: 27 F = {0.6.7.8,15}. See Figure 4-35. Apply IB6B between 23 and 24. Fault Set: 28 F = {0,6,7.N-7}. See Figure 4-36. Apply I82 to the edge (17,18). 89 Figure 4-32. F = {0.5.2i—1,2i,2i+1].N-2i aOmod 4. Figure 4.33. F = {0.6.7,12.13}. = {0,6,7.13.l4}. Figure 4-34. F Figure 4-35. F = {0,678.15}. Figure 4-36. F = {0.6.7,N—7}. Fault Set: 29 F = {0.6.7,N-8}. See Figure 4-37. Apply IB6B between vertices l7 and 18. Fault Set: 30 F = {0.6,7.8,N-9]. See Figure 4-38. Apply IB2 to the edge (9,10). Fault Set: 31 F = {0,6,7.4i .4i +1 }. See Figure 4-39. The range of 4i is qualified as fol- lows. 16 S 4i SN—12. where N—4i E 0 mod 6 (top), 16 S 4i SN—l4, where N—4i £2mod 6 (middle), and 16S4i SN-16, where N-4i =4mod 6 (bottom). To see this result, first fix 4i = 16. Then we may apply IB6B between vertices N —l and 0. This verifies this choice of 4i. Now note that we also apply 1B4 between 11 and 12. Figure 4-37. F = {0.6.7,N-8}. 95 Figure 4-38. F = {0.6.7,N—9}. Fault Set: 32 F = {0.6.7.4i +2.4i +3}. See Figure 4-40. The range of 4i is qualified as follows. 16 S 4i S N-12, where N—4i = 0 mod4 (top). 16 S 4i S N—14, where N —4i 5 2 mod 4 (bottom). To see this result, first fix 4i = 16. Then we may apply IB4 between vertices l9 and 20. This verifies this choice of 4i . Now note that we also apply IB4 between 7 and 8. 
The above fault-sets cover all possible values of 2 except for z = N —10, which fol- lows. Fault Set: 33 F = {0,6,7.8,N-10}. See Figure 4-41. Apply IB4 between 7 and 8. This completes the case {0.6.7.2}. CASE 6. y e{7.8},16 S 2 SN—8. Fault Set: 34 F = {0.7.8.15.l6]. See Figure 4.42, Apply IB6B between N -1 and 0. Fault Set: 35 F = {0.7.8,N-9}. See Figure 4-43. Apply IB2 to the edge (13,14). Figure 449.1i = {0.6,7.4i,4i +1 }. 97 Figure 4-40. F = {0.6,7.4i+2.4i+3}. 98 Figure 4-41. F = {0.6.7,N-10}. {0.7.8.15,l6]. Figure 4-42. F 100 Figure 4-43. F = {0.7,8,N—9}. We have thus far shown that any fault-set in which two vertices are at a distance of at most eight can be tolerated. It is not diflicult to see that for N = 26. two faults must meet this requirement. and thus the proof is complete for Fault Set: 36 F = {0.7,8,N-9,N—10}. See Figure 4-44. Apply IB4 to vertices 9 and 10. For reasons similar to the above. the previous fault-set completes the proof for N = 28. Fault Set: 37 F = {0.7.8,N-11}. See Figtue 4-45. Apply IB2 to 9 and 10. For reasons similar to the above. the previous fault-set completes the proof for N = 30. Fault Set: 38 F = {0.7.8.2j+l}. where 17 S2j+l SN-ll. See Figure 4-46. We may apply IB2 to the edge (9.10) and also IB6B to the vertices N —l and 0. allowing us to cover all odd vertices in the specified range. Fault Set: 39 F = {0.7.8,20}.N 232 . See Figure 4-47. Apply IB4 to 25 and 26. Fault Set: 40 F = {0.7,8,N-2j }, 10 S 2j SN -22. See Figure 4-48. Apply IB2 to the edge (23.24). 101 Figure 4-45. F = {0.7.8,N—11}. Figure 4-46. F = {0.7.8,2j+1}. 103 Figure 4.47, F = {0.7.8.20}. Figure 4-48. F = {0.7.8.N—2j }. This completes the case of {0.7.8.2} and we now consider our final two cases. CASE 7. y = 2i+l.2i+129 Fault Set: 41 F = {0.2i+l.2i+1+z} , where 9S2i+1