I F II J" i 1.? I. .. v u o I a ‘ . . . .. w. .I... I... . .. . 1.. . . 34.3.... 3.. .3. -- - v .vn. IMMIIIAIIIIAWMV‘I I.U...Im.«....!.. fr 3 . II..I . II I ‘5 ‘ t I. t - '0 I. ‘I v c ‘0‘ inanid. hIoIQII I II. III. IIII'IWIIIIIIIHIHJIIII . . I. (whoa. II.1.4.uorIH.flI»d. IIIJIIIII... .nuI «WINNhH :3 9.4%.. An... {.\.IEI. .xuang IsIWLthu ..:n..II - I “It: I‘IIIIIIO VI‘I‘I.II¢OI~}' IIIIIQIIIII.II ILIXIVIHQHPIHIIII .. WI I. I .. X I I II it“ ‘ Itv‘tfln‘mIoIIN‘a‘IV I IIIIHI TIMIJIUIIIHIJw‘l-‘DIIIIIRIWQ‘ .I.IIIIl.I V I I In . I I .I. I up In . MI I. ,. ..I. I v. . u. 0.. I I- a I I J I A I I P. . II? - I o J! I .I I I I I: II". .I ”I I I I u” FHSIII III ,II I, II n O: I 'D.fIII (I III: II c U I. . -\II I. In“ In. "WIIII "fl! I ‘g’ WIS. : '1 II '4 I I . 1 I I I I II I III!‘ . I. 211‘. III-.IIII. .IIiII.IIIII.II\I. 1'» I. I. u .\. IIIIIMIIIII II IIIIUIIIIIIJWIU IIIIIOUIHIIIJIIIkIhJIIMIIIIUm ~n..u:lo§....rsa.uu~lx\ub.hflfiu EItW, nosu . . .L . . «I III II IIII‘WIIII‘IQ‘Q I I? \III III In I. I III .xIHIdchIIIIHIIdI . v ., I I .I. III-IIIHH.III«NIIIHMIIIIIII1\InaIv finflfihIIfiIIIWIIIldtuIIr JIWLmIIII‘HIIIMIlI. IIIIIIJIII HIBHrIIdIIL...IInIIIu..VI. I. I u IIHI II WU III \ II I I If III IIIWIIIIIIhuIIUI “III; Ih‘IIIIII I‘I‘IIIDHDIIIIULIIIWIII . I. I. I.I \‘IWIII I III «IIVHIIIIIIIIVII III II “HILIIITVIIHHIIHIIIII “I394“. I . IIIIW. Input. I.IIMI.thuIIIII...I "Han: . . . IIII .I HI IIIDHVIl-IIIII. IEII‘EHI I I II .I IIIIIIIIIIUVIIII “null-IIIIIVIIV.“ I IIIILIn I‘ll-III ' t OIII\‘I\ I‘QIIVIIIIIIUIQI‘II.‘ .\..l‘clInIv.IvrI\v . . I ‘K‘bx‘fiu‘? I! I\l.\.|.|- I‘IWI: Ififh‘nrnllvul L111... IIVIIIIIIIIA. IIIII IIdr .HIIUIIIHIMRIIIII:I.\ .I, I I I v .I II. I I.» IIJ. I —I .1 I I " 5’ n m" In"; .II v" I'I l II'I‘iI‘I. IlI' "Il'li. .. I I \I II 1 . I I I. Io. I . ,II II I I IHIII II II \IlIIIIIIII-IIII. IIIHWH IIIIIIIUI III. Id‘u‘. Iu.II«Idl.I,uNIH....n.Ivvmmtlnr p. I .I III I VIIIIIIII‘II‘INIINBIN‘ .IhllelfivI; In‘, QMH‘ II,..III..\LI...1 III I‘IW‘..‘III-..II.V.~I .v. I A . all . I I . . \I IIIII IIIII III-‘34.“! IIIIY‘I‘IIILIIII III} I IIIInIIfI. IQIIWIIOIL . .‘II......u I . I II ”III - I. III III. IIIIlflI.IIIIIIIIIHI..\IrIIIIu“u.. III III .IflIfifmfiIIIKUPJMHIIII I I .I II I :I .. vi..- J. -I I1Ir . I. I 0‘4 H . ~ I | ~ AI. .I III'III'I .r y ' I .I ... x In I II I “H. . I . 13'." I .5 I Ia! "van“ I I. IAI‘I ‘ I' Il‘,‘ I .‘II' I 7”" ,‘I '11 I ‘1. I '.|.‘ I' ': uni“, ; ~:‘ - IW‘ . q I: um III . ’I H" ‘I, I ‘R 1‘I III II It _I a- H II In: . fin. I - I .I LI! 0'! III II III! I I . I I I III I UVIIIIII IIIIHVIHIIII hI v u. I I \I IIIIIHuI ,I‘IIIII III III..- I. I . c . I . I I I It‘ll-OI . . II I - I III .qullllI I‘l‘llflIlIHm‘I‘ln‘IIl‘l V IIII‘I‘IIHI‘I IIIVII If: l)IIIvI.II .3; 7 II I 4“. .oJIIlitHI-LI IR II II. II'I'IIII‘ ‘III I \ck'l . III‘II II .I\ III .. I..I~ IIII‘ III'IDIIIIIVIII‘D IIII- ‘0 N-I‘IO‘I llh III III III‘II‘IV‘H‘IIIHI‘I‘IIIII‘III II IuINIII III): «I SI‘HIII‘HvI‘IIIIII “I“..IJHHIQIIIHIKIVVIWHII! .oI-IHIt It: (IINIIIII‘II: Z IIIII)‘: .\II . “- . I u‘% . . , :III .I I II II t I I I.\III‘II . III I I ‘ . . .- IIIIII‘IIIII II I \I . {VIII .1; ,IIIIIL, II I. III I hionIIIIIIII.IIIIIIIII\I.I\II\\I.IIIII II . IIUIIIIIIIIIIIIIVHIIIHNIqu . III IdI-“II.vII.oIIIhsIIIII“vIIIIHI .. \I‘IH II; I \IIIMXOIHIIIIII \hIIlIIRA. IIIIII LI AII§II\.I III NII‘I III, I IIIIIINMIIIHI$HIIIIIIHIIII§IIHJhIIfiIi VIIIIIIIIP-IIDIIINHw EIIH... IIIIII-IIDO Ibatfih‘ I..I.III\I I III .\It. I..I.I..DII. I‘Ill‘tI.I!..I.IIII IIIVI L1. ll . . I {l I 1‘ l I I ‘ f'VGufi‘Tfl I ‘ ' " V, II“? 'III II I” II ...I. 
This is to certify that the dissertation entitled THE MODELING AND ANALYSIS OF PARALLEL ALGORITHM DESIGN presented by Gerald William Cichanowski has been accepted towards fulfillment of the requirements for the Ph.D. degree in Computer Science.

Copyright by
GERALD WILLIAM CICHANOWSKI
1983

THE MODELING AND ANALYSIS OF PARALLEL ALGORITHM DESIGN

By

Gerald William Cichanowski

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science

1983

ABSTRACT

THE MODELING AND ANALYSIS OF PARALLEL ALGORITHM DESIGN

By

Gerald William Cichanowski

Parallel algorithms contain information which describes process synchronization and intercommunication. This information is only trivially present in sequential algorithms. The impact of this added information on the design process for parallel algorithms for multiprocessor systems is studied.

In this research, the metrics of Halstead's software science are extended to describe the information content of parallel algorithms. The extended metrics are used to show that parallel algorithms may have higher information contents than their sequential counterparts. Part of this increase is due to the sequential component supporting the activities of the parallel component. We call this support subsidization.

Given that parallel algorithms may contain additional information over that for their sequential counterparts, this research examines the effects of this added information on the parallel algorithm design process. To do this, a simulation model of the algorithm design process was developed. The model is based on the information content of an algorithm, and the information which is used to make decisions during the design process.

Two representative algorithms were chosen for experiments based on the simulation model: a parallel root finding algorithm and a parallel bin packing algorithm. Both algorithms have a single localized parallel segment, but they have other structural differences. In the experiments, it was seen that the type of information which was used by design decisions affected the number of steps which were required to design the two algorithms. It also affected the rate at which information was added to an algorithm while it was being designed. Also, the performance of a given type of information differed when used for each of the two algorithms. These results are then used to suggest ways in which parallel algorithms can be effectively developed.

ACKNOWLEDGEMENTS

I would like to thank the members of my committee, Drs.
Carl Page, Herman Hughes, Jacob Plotkin, and Robert Reynolds, for their assistance in creating this work and for directing my studies. I am especially grateful to my advisor, Dr. Reynolds, for his many hours of consultation and assistance. He had a major influence on the creation of this dissertation.

TABLE OF CONTENTS

List of Tables .......... vii
List of Figures .......... ix
1.0 Introduction .......... 1
1.1 The Algorithm Design Process .......... 4
1.2 Organization of the Dissertation .......... 8
2.0 Metrics for the Description of Algorithm Information Content .......... 12
2.1 Metrics for Sequential Algorithms .......... 13
2.1.1 An Example of Halstead's Metrics for a Sequential Algorithm .......... 14
2.1.2 Potential Volume .......... 17
2.2 Metrics for Parallel Algorithms .......... 19
2.2.1 Comparison of a Sequential and a Parallel Root Finding Algorithm .......... 23
2.2.2 Component Metrics and Potential Conceptual Volume .......... 29
2.3 Summary .......... 31
3.0 The Graph Model of Parallel Algorithm Structure .......... 33
3.1 The Graph Language .......... 34
3.2 Relation of Graph Language to the Metrics .......... 38
3.3 Metrics for Partial Designs .......... 39
3.4 Summary .......... 45
4.0 Algorithm Transformations and Subsidization .......... 46
4.1 Algorithm Transformations .......... 47
4.2 Subsidization .......... 49
4.3 Summary .......... 53
5.0 A Parallel Bin Packing Algorithm .......... 55
5.1 The Bin Packing Problem .......... 57
5.2 The Best-Fit With Replacement Rule .......... 60
5.3 Implementation of the BFR Rule as a Parallel Algorithm .......... 63
5.4 The MBFR Algorithm Simulation .......... 66
5.5 The Computational Complexity of the MBFR Algorithm .......... 75
5.6 Summary .......... 78
6.0 The Algorithm Design Model .......... 80
6.1 A Design Step .......... 81
6.2 The Placement Process .......... 85
6.3 The Elaboration Process .......... 93
6.4 The Procedural Model .......... 99
6.5 The Simulation Model of Algorithm Design .......... 102
6.5.1 The Model's Environment .......... 103
6.5.2 Design Methodologies in the Simulation Model .......... 106
6.5.3 The Design Process .......... 108
6.6 An Illustration of the Design Step Process .......... 112
6.7 Summary .......... 118
7.0 The Experiments .......... 120
7.1 The Sample Algorithms .......... 121
7.2 Design Methodologies Used in the Experiments .......... 126
7.3 The Experiments .......... 128
7.4 Structural Differences Between Algorithms ROOTF and MBFR .......... 134
7.5 Effects of the Algorithm's Structure on the Elaboration Process .......... 136
7.6 Relationship Between the Placement Process and the Rate Information is Added to an Algorithm's Design .......... 138
7.7 Correspondence Between the Model's Results and Human Designers .......... 142
7.8 Summary .......... 149
8.0 Conclusions .......... 151
Appendix A The Bin Packing Algorithm .......... 156
Bibliography .......... 160

LIST OF TABLES

2.1 Operator counts for the polynomial program .......... 16
2.2 Operand counts for the polynomial program .......... 16
2.3 Operator counts for the bisection algorithm .......... 26
2.4 Operand counts for the bisection algorithm .......... 26
2.5 Operator counts for the multiprocessor algorithm .......... 27
2.6 Operand counts for the multiprocessor algorithm .......... 28
3.1 Element counts and conceptual component volumes for the algorithm in Figure 3.2 .......... 40
3.2 Element counts for the algorithm of Figure 3.3 .......... 44
3.3 Element counts for the algorithm of Figure 3.4 .......... 44
4.1 Component lengths and vocabularies for the two root finding algorithms .......... 49
4.2 Usage indices for the two root finding algorithms .......... 51
5.1 Packings for each simulation list length for each of the 30 random permutations .......... 68
5.2 A summary of the performance of the MBFR algorithm for each list length (L) .......... 69
6.1 Element counts and conceptual volume for step one of the abstract model illustration .......... 117
6.2 Element counts and conceptual volume for step two of the abstract model illustration .......... 117
7.1 The design methodology probability mixes used in the parallel algorithm design experiments .......... 127
7.2 The average number of design steps which were required for each design mix, for algorithms MBFR and ROOTF .......... 129
7.3 The average information (bits of conceptual volume) added to the algorithm by a design step, for each design mix .......... 131
7.4 Structural differences between algorithms ROOTF and MBFR .......... 135

LIST OF FIGURES

1.1 Kung's matching criteria for hardware and algorithms .......... 5
3.1 The structures of the graph language .......... 35
3.2 A sample graph language algorithm .......... 37
3.3 A partial algorithm design .......... 42
3.4 The algorithm of Figure 3.3 after the next design step .......... 43
4.1 Transformations for converting a sequential algorithm into a parallel algorithm .......... 48
4.2 The placement of subsidization for the four algorithm transformation types .......... 52
5.1 A comparison of the BF, BFR, BFD, and optimal packing for the example list .......... 62
5.2 Plot of the average packing as a ratio of the optimal packing for the MBFR simulation .......... 70
5.3 Plot of the worst case packing as a ratio of the optimal packing for the MBFR simulation .......... 71
5.4 Plot of the maximum number of excess bins as a ratio to the optimal packing for the MBFR simulation .......... 72
5.5 Worst case, average, and optimal packings with the maximum and average number of processors, required by the algorithm, for the MBFR simulation .......... 73
6.1 Abstract model of the algorithm design process .......... 100
6.2 Simulation model's representation of the design environment .......... 104
6.3 Simulation model's representation of a design methodology .......... 107
6.4 An overview of the designer component of the simulation model .......... 113
6.5 Procedure for choosing selection criteria and next chunk .......... 113
6.6 An overview of the elaboration procedure for the simulation model .......... 114
6.7 A sample algorithm to illustrate the flow of the algorithm design model .......... 115
7.1 Graph language representation of the ROOTF algorithm .......... 122
7.2 Graph language representation of the MBFR algorithm .......... 123
7.3 The details of the PARFOR block for the MBFR algorithm of Figure 7.2 .......... 124
7.4 Average design time, in minutes, for algorithms ROOTF and MBFR as a function of the design mix .......... 133
7.5 The producer and consumer processes in full detail .......... 146
7.6 The producer and consumer processes with the synchronization and process intercommunication details removed .......... 147
7.7 The producer and consumer processes with all parallel details removed .......... 148

CHAPTER 1

Introduction

During the past decade, many advances have been made in the realm of computer architecture. While many of these advances have dealt with traditional Von Neumann architectures, alternate models of computer systems have begun to appear [1-2]. These systems include pipeline processors, array processors, multiprocessors, and data flow computers [2]. Although these non-traditional models of computer systems differ from each other, the common thread which unites them is that they each potentially support some concurrency within an algorithm. Since these non-traditional architectures support parallelism, if they are to be effectively used, algorithms must be developed to exploit this capability. This research will examine the software design process for parallel algorithms.

Two approaches have been taken for the development of software for such non-traditional architectures. The first approach is to start with a sequential algorithm and then to translate it into a form which is suitable for execution on a parallel computer. Although these transformations are made to the algorithm's code, since the code is a model of the algorithm, the transformations are, in essence, made to the algorithm. These translations may be made by hand, or by an automated means, such as a compiler. Kuck [3] discusses this approach. He describes procedures for the automatic transformation of programs written in sequential languages to a form suitable for vector processors. These transformations include techniques such as tree height reductions, vectorizing DO loops, simultaneous evaluation of alternate conditional paths, etc. This approach is promising primarily for achieving speedups to existing sequential software.

The approach of translating sequential algorithms, described above, can also take advantage of the extensive research which has been done in the area of sequential software design. Since the algorithm is first created as a sequential algorithm and then translated into parallel form, standard sequential design techniques can be used to create the sequential algorithm. Then a simple transformation can be used to inject parallelism into the design. Starting in the late 60's with Dijkstra's famous letter to the editor [4], through much of the 70's, many successful sequential design methodologies have been identified and formalized
[5-10]. These methodologies include several hierarchical design methods: top-down programming, bottom-up programming, stepwise refinement of program abstraction, and modular programming. Many tools and strategies have been developed to take advantage of these methodologies: structured programming, HIPOs, Nassi-Shneiderman charts, etc. [5].

The second approach for the design of parallel algorithms is to design the algorithms from scratch as parallel. This has been done by basing the design on languages which have been specifically designed or extended for parallel machines. Languages such as VAL [11] for data flow computers, and Concurrent Pascal [12] for multiprocessors, are examples of such languages. The strength of using such languages is that a programming language is essentially a model of the computer which is being programmed. As such, any program developed in that language should be directly supported by the computer in question, and should not require extensive transformations to take advantage of the machine's capabilities. The weakness of this approach is that the strong foundation of design methodologies which have been developed for sequential algorithms may not necessarily be applicable to software for non-Von Neumann architectures. In light of these potential differences in design techniques for different architectures, we will now examine the algorithm design process.

1.1 The Algorithm Design Process

The design process for algorithms is essentially an information gathering process. To successfully match an algorithm to hardware, Kung [13] has suggested three pieces of required information or matching criteria: process granularity (size), process control, and communication between processes (Figure 1.1). These criteria correspond to information which a designer must have to successfully develop an algorithm. The ability to support these criteria differs significantly between sequential and parallel architectures. In a sequential algorithm, process synchronization and interprocess communication can be ignored, since there is only a single process. On a parallel machine, potentially several processes must be controlled. Thus, synchronization and communication between them becomes necessary. Differences also exist between algorithms for different types of parallel architectures. For example, an algorithm which is designed for a multiprocessor will differ from its counterpart for a data flow processor. On a data flow processor, granularity is small, an individual operation. On a multiprocessor, granularity is normally larger, and may differ between the processes which make up the algorithm.

Figure 1.1 Kung's matching criteria for hardware and algorithms. (The figure is a taxonomy tree with three branches: concurrency control, which may be centralized, distributed, or via shared data; synchronous or asynchronous; with simple or complex local control. Communication geometry, which may be regular, such as tree, array, shuffle, crossbar, and one- or two-dimensional square, hexagonal, or triangular layouts, or irregular. Module granularity, which may involve small constants, or be small or large.)

Another way of viewing the differences in information requirements between sequential and parallel algorithms is through the process of correctness proofs. Owicki and Gries [14] discuss correctness proofs for parallel algorithms.
In their discussions, they show that a need exists to go beyond partial correctness and termination, the standard properties required for sequential algorithms, to demonstrate correctness. Partial correctness is shown by demonstrating that a program's output set is generated from its input set, using assertions based on the program's variables and rules describing the effects of the program's statements. Owicki and Gries show that, additionally, non-interference between concurrent processes and the absence of deadlock must be proven. These additional properties correspond directly to the additional information that is required to design parallel algorithms.

Therefore, we can see that the information requirements for designing algorithms differ when the algorithm is designed for a sequential or parallel architecture. Indeed, the information requirements vary between different types of parallel architectures. Therefore, it is important to identify what types of parallel architectures will be considered. This research will restrict itself to a subclass of parallel architectures. The parameters which describe the architectures of interest are asynchronous process control, local process intercommunication, and varying process size. This describes multiprocessors when the granularity is large, and data flow computers when granularity is small.

The purpose of this research is to examine the design process for parallel algorithms, in light of these additional information requirements. This will be done to find out if a different design approach is required for parallel algorithms. To do this, the algorithm design process will be modeled in terms of the information content of an algorithm. An algorithm is constructed by making a sequence of design decisions. Each decision causes objects to be added, deleted, or modified in an algorithm. Thus, each decision has an impact on the information content of an algorithm. Normally, a programmer has a set of rules which govern what information may be used by a design decision, and what portions of an algorithm may be altered as a result of these decisions. These rules represent a design methodology. The result of any design decision, with the exception of the last decision, is a partially completed algorithm. Thus, the sequence of design decisions leads to a sequence of partial algorithm designs. By examining the information content of each of the partial designs, we can observe the effects of each design decision on the information in the current partial design.

1.2 Organization of the Dissertation

The first section suggests that parallel algorithms may contain additional information regarding process synchronization and intercommunication and thus may have higher information contents than their sequential counterparts. Is the amount of this additional information enough to suggest the need for modifications to traditional design strategies? In chapter two, an attempt is made to address this question by developing a set of metrics which will be used to show that indeed parallel algorithms may have a higher information content than their sequential counterparts. This will be done by examining a sequential root finding algorithm and a parallel version which was derived from it. The metrics of chapter two can be used to quantify the information content of sequential and parallel algorithms. They were derived from those suggested by Halstead's software science [16]. The metrics as envisioned by Halstead are normally used to describe completed algorithms.
In chapter three, it will be shown that our extensions to Halstead's metrics are also useful for describing the information content of partial designs. Thus, we will be able to quantify the effects of a design decision by comparing the information contents of succeeding partial designs. A convenient means of representing algorithms and their partial designs will also be presented in chapter three. A graph language based on the abstraction of an algorithm's information content will be used to demonstrate how design decisions change the information content of an algorithm.

Given that parallel algorithms may contain a significant amount of additional information, how is this added information distributed throughout an algorithm? There are two possible extremes for the placement of this added information.

1. The added information may be localized in one small section of an algorithm.

2. The added information may be distributed throughout an algorithm.

It is hypothesized that the concentration and placement of the added information may affect the approach one needs to take when designing a parallel algorithm. Chapter four examines the placement of this additional information by examining a sequential and a parallel version of a root finding algorithm. There it will be shown that for the parallel root finding algorithm, the added information is concentrated in a small, local portion of the algorithm.

The parallel root finding algorithm is an example of an algorithm whose parallel information is localized. Also, the parallel component of the algorithm is small. Chapter five will present a second parallel algorithm which also has localized parallel information. However, the parallel component of this algorithm is much larger. The algorithm solves the bin packing problem and is designed for multiprocessor systems. Thus, we will have examples of two algorithms with localized parallel information. The root finding algorithm has a small parallel component, and the bin packing algorithm has a large parallel segment. The two algorithms will be used to investigate the type of information which should be used to design varying sized algorithms with localized parallel components.

For algorithms, such as our two sample algorithms, with localized parallel information, how does the size of the parallel component affect the type of information which should be used by a design decision? In order to answer this question, we need to model design decisions in terms of the information which they use. In chapter six, a model of the algorithm design process will be presented. The model will focus on the use of spatial, temporal, and functional information for the design of parallel algorithms with localized parallel components. It will examine how the structure of the algorithm affects the kind of information which is needed to efficiently design it. The model will simulate the design of an algorithm by reconstructing a graph language representation of it. The order in which objects are added to a design will be controlled by exploiting relationships which are based on the use of different types of information. By monitoring the number of steps it takes to complete the algorithm, and the information content of each of the partial designs, we will be able to observe the effects of the use of spatial, temporal, and functional information in the algorithm design process. An implementation of the model, in Pascal, for the Cyber 750 will also be discussed.
Chapter seven will describe a set of experiments which were conducted to examine the effects of using methodologies which differ in terms of their use of spatial, temporal, and functional information on algorithms with local parallel components. The use of this information will be examined relative to the granularity of an algorithm's parallel component. The root finding and bin packing algorithms will be the subjects of the experiments.

Finally, chapter eight will summarize the findings of this research. It will discuss the correspondence between the type of information used during the design process, and the size and placement of the parallel component of the algorithm. Suggestions will be offered for the development of parallel algorithms based upon this correspondence.

CHAPTER 2

Metrics For The Description of Algorithm Information Content

In order to effectively model and compare design strategies, it is necessary to have objective criteria regarding the information content of an algorithm, and to have a means of monitoring the growth of information of an algorithm as it moves from its conception through the design process. Several sets of metrics have been proposed for this purpose [15]. The metrics proposed by Halstead [16] seem to dominate and have led to the field of "Software Science". Halstead uses the term software science to describe a set of metrics which are used to describe and compare algorithms. The metrics are based on counts of the elements which compose the algorithm. The metrics represent the number of bits which are required to uniquely encode an algorithm. As such, they represent the information content of an algorithm. The metrics as presented by Halstead have been used to describe sequential algorithms. In [17] they were extended to allow their use with parallel algorithms. A brief description of Halstead's metrics will be followed by their extensions into the parallel realm.

2.1 Metrics For Sequential Algorithms

Halstead has proposed the following basic metrics for sequential algorithms.

n1 = number of unique or distinct operators in an algorithm's implementation.

n2 = number of unique or distinct operands in an algorithm's implementation.

N1 = total count of the appearance of all operators in an implementation.

N2 = total count of the appearance of all operands in an implementation.

F1,j = number of occurrences of the jth most frequently used operator, where j = 1, 2, ..., n1.

F2,j = number of occurrences of the jth most frequently used operand, where j = 1, 2, ..., n2.

Also it is evident that:

    N1 = Σ(j=1..n1) F1,j    and    N2 = Σ(j=1..n2) F2,j

As can be seen from the descriptions above, the metrics are based on simple element counts of a particular implementation of an algorithm. It should be clear that implementations of an algorithm in different languages will potentially yield different values for these metrics, showing that the information content of an algorithm is dependent on the language in which it is implemented.

2.1.1 An Example of Halstead's Metrics for a Sequential Algorithm

As an example of these metrics, an algorithm from [18] which evaluates a polynomial

    p(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0

will be examined.

    P := A[0];
    POWERX := 1;
    FOR I := 1 TO N DO
    BEGIN
      POWERX := X * POWERX;
      P := P + A[I] * POWERX
    END;

The operator and operand counts for the algorithm are given in Tables 2.1 and 2.2.
Note that the counts are based on a static description of the code, and that they ignore the number of references made to a particular piece of code, unlike more traditional measures such as an algorithm's asymptotic behavior. Using the basic metrics, more sophisticated measures can be formed. The vocabulary of an algorithm is the sum of the number of unique operators, n1, and unique operands, n2. For the example algorithm, the vocabulary, n, is:

    n = n1 + n2 = 7 + 8 = 15

The implementation length, N, of an algorithm is based on the number of occurrences of the elements in the algorithm's vocabulary. Thus:

    N = N1 + N2 = 16 + 16 = 32

Table 2.1 Operator counts for the polynomial program.

    Operator         rank (j)   F1,j
    :=                   1        5
    ;                    2        4
    *                    3        2
    []                   4        2
    +                    5        1
    BEGIN...END          6        1
    FOR...TO...DO        7        1
                     n1 = 7   N1 = 16

Table 2.2 Operand counts for the polynomial program.

    Operand          rank (j)   F2,j
    POWERX               1        4
    P                    2        3
    I                    3        2
    A                    4        2
    1                    5        2
    X                    6        1
    N                    7        1
    0                    8        1
                     n2 = 8   N2 = 16

The length of an algorithm and its vocabulary are often combined to form the volume metric. Volume represents the number of bits which are required to encode an algorithm. Thus, volume requires a component to uniquely represent each member of the vocabulary, log2(n). Plus, it needs to include the number of occurrences (length) of those elements. Therefore, volume is expressed as:

    V = N log2(n)

For the example algorithm this yields:

    V = 32 log2(15) = 125.02 bits.

2.1.2 Potential Volume

As mentioned earlier, the language in which an algorithm is implemented has an effect on the metrics. Because of this, a basis is needed to compare different implementations of a given algorithm. Potential volume, V*, is the metric used for this purpose. This represents the minimal information needed regarding an algorithm. This is comparable to calling a predefined procedure. The information which is needed is the input and output operands. The operators become the assignment operator to the output operands, plus the function which is the purpose of the procedure. Thus, if we let n1* and n2* represent the minimal vocabulary and N1* and N2* represent the minimal length, the minimal or potential volume becomes:

    V* = (N1* + N2*) log2(n1* + n2*)

For the polynomial algorithm this is:

    V* = (2 + 4) log2(2 + 4) = 15.5 bits.

We assume three input operands (A, N, and X) and one output operand (P). The operators are the polynomial evaluation function and the assignment operator for the result operand. Knowing that the most succinct form of our algorithm is 15.5 bits, it can now be compared to the actual implementation volume of 125.02 bits. The resulting metric is called the implementation level, L.

    L = V* / V = 15.5 / 125.02 = 0.124

Note that the closer L is to one, the fewer details regarding the algorithm are required. If L equals one, the algorithm must be present as a component of the language which is being used. As L becomes smaller, the implementation volume must increase. Since implementation volume represents information, one would suspect such an algorithm requires more design steps, with a resultant increase in the design time and the number of expected design errors. L can also be used to compare two implementations of an algorithm. The version which has the smaller value for L can be assumed to be more complex, and to require more design effort. This concludes the discussion of metrics for sequential algorithms. In [16] other uses for these metrics are given. It is shown there that the metrics are useful for predicting development and maintenance time and effort.
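For concreteness, the calculations above can be replayed mechanically once the element counts are in hand. The following short sketch is illustrative only (it is not part of the original work, and a real tool would also need a language-specific tokenizer to produce the counts); it reproduces the metrics for the polynomial program from the counts in Tables 2.1 and 2.2.

    from math import log2

    # Counts for the polynomial program, copied from Tables 2.1 and 2.2.
    operators = {":=": 5, ";": 4, "*": 2, "[]": 2, "+": 1,
                 "BEGIN...END": 1, "FOR...TO...DO": 1}
    operands = {"POWERX": 4, "P": 3, "I": 2, "A": 2, "1": 2,
                "X": 1, "N": 1, "0": 1}

    n1, n2 = len(operators), len(operands)             # unique elements
    N1, N2 = sum(operators.values()), sum(operands.values())
    n, N = n1 + n2, N1 + N2                            # vocabulary, length
    V = N * log2(n)                                    # volume: 125.02 bits
    V_star = (2 + 4) * log2(2 + 4)                     # potential volume: 15.51 bits
    L = V_star / V                                     # implementation level: 0.124
    print(n, N, round(V, 2), round(V_star, 2), round(L, 3))

Running the sketch prints 15, 32, 125.02, 15.51, and 0.124, matching the values derived above.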
The next section will describe extensions to these metrics for use with parallel algorithms. 2.2 Metrics For Parallel Algprithms The metrics described in [17] for parallel algorithms are direct extensions of their sequential counterparts. As such, they are also based on element counts. However, there is a difference in how the elements are to be counted. In chapter one, it was argued that the design of parallel algorithms require information above their sequential counterparts. This information includes knowledge of process intercommunication, insuring mutual exclusion where required, and so forth. This new information was not required in designing sequential algorithms. As a result of this, for our purposes in analyzing an algorithm's information content, we will view an algorithm as being composed of two components: a sequential component, and a parallel component. Thus, the element counts become: n1 p . a count of those unique operators that represent I the activities of parallel processes, their synchronization, and access to shared data. 20 n1 5 - a count of those unqiue operators that represent I the basic sequential activities performed by an algorithm and are not invoked be a parallel operator. r12,p . a count of those unqiue operands used by parallel operators including synchronization and resource sharing operators. This includes all operands associated with the invocation of a parallel operator. "2,s - a count of those unqiue operands that are used in the algorithm's basic sequential computational activities, excluding those invoked by a parallel operator. When dealing with these elements, the designer must be cognizant of the domain in which the elements are found (sequential or parallel), since they are treated seperately in each domain. Thus, the simple element counts will be superceded by counts of operator-context and operand-context pairs. Thus, we now include conceptual differences between sequential elements and possible parallel counterparts. Therefore, if an element is found in both a sequential and a parallel portion of the algorithm, for purposes of the vocabulary, it is counted twice. It is counted as belonging to the sequential component, and also as belonging to the 21 parallel component. The vocabulary is not inflated, because now the vocabulary measures unique operator/operand-context pairs. If we return to the polynomial algorithm of the last section, we notice that it is Composed of only a sequential component. Thus, we can easily see that the sequential metrics of the first section are a special case of the parallel metrics, with n1 and n Op 20p Since sequential algorithms can be described in terms of the both equal to zero. new metrics, the metrics will be useful in comparing a parallel algorithm to its sequential counterparts, as well as for comparing two parallel algorithms. As before, the total number of occurences of element-context pairs yield the metrics shown below. N1 p - the total count of parallel operators and those sequential operators invoked by them. N1 s . the total count of those sequential operators not invoked by a parallel operator. N2'p . total number of occurences for operands used by parallel operators. This includes operands involved in sequential actions invoked by a parallel operator. 22 used in frequently N2 3 a total number of occurences of operands sequential activities not invoked by parallel operators. F31 p . number of occurences of the jth most used operator in this class. F31 s . 
F1,s,j = number of occurrences of the jth most frequently used sequential operator.

We can define F2,s,j and F2,p,j for operands in the same fashion. It follows that:

    N1,s = Σ(j=1..n1,s) F1,s,j    and    N1,p = Σ(j=1..n1,p) F1,p,j

N2,s and N2,p can be similarly defined. The total numbers of operators and operands will be designated N1 and N2 respectively, where:

    N1 = N1,s + N1,p    and    N2 = N2,s + N2,p

Also, the numbers of unique operators and operands are given as n1 and n2. They are defined as:

    n1 = n1,s + n1,p
    n2 = n2,s + n2,p

The sum of the number of unique operators and operands in each category for an algorithm is the vocabulary of the parallel algorithm. We again designate it as n, where:

    n = n1 + n2

Likewise, the length, N, of a parallel algorithm equals:

    N = N1 + N2

2.2.1 Comparison of a Sequential and a Parallel Root Finding Algorithm

To illustrate the use of the new metrics in comparing two algorithms, a sequential and a parallel version of a root finding algorithm will be presented and analyzed using the new metrics. The sequential algorithm, taken from [19], is shown below. The algorithm assumes that the function, which is to be evaluated, is monotonically increasing and has a root in the interval [LOW, HIGH].

    real procedure bisect(LOW, HIGH, E)
    begin
      real Z, X, LOW, HIGH, E, Y;
      while (HIGH - LOW) > 2 * E do
      begin
        X := (LOW + HIGH) / 2;
        Y := F(X);
        if Y < 0 then HIGH := X else LOW := X
      end;
      Z := (LOW + HIGH) / 2
    end;

The element-context pair counts are shown in Tables 2.3 and 2.4. Since it is a sequential algorithm, all parallel metrics equal zero, and the metrics correspond to the standard sequential versions. Next, a parallel version of the algorithm is given. The parallel component is that block which composes the PARFOR statement and the definitions of the shared variables.

    real procedure rootf(LOW, HIGH, N, E)
    begin
      integer shared N, J;
      integer I;
      real shared DELTA, LOW, Y[1..N];
      real E, Z, HIGH;
      while ABS(HIGH - LOW) > E do
      begin
        DELTA := ABS(HIGH - LOW) / (N + 1);
        parfor J := 1 until N do
        begin
          Y[J] := F(LOW + (J * DELTA))
        end;
        LOW := LOW + DELTA;
        I := 1;
        while 0 > Y[I] and I <= N do
        begin
          LOW := LOW + DELTA;
          I := I + 1
        end;
        HIGH := LOW + DELTA
      end;
      Z := (LOW + HIGH) / 2
    end;
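The parfor block maps naturally onto threads. The sketch below is illustrative only; it is not the dissertation's implementation, and it restates the ROOTF strategy in Python with one thread per interior sample point standing in for the parfor construct. The final scan is written so that the root stays bracketed, assuming F is monotonically increasing with F(low) < 0 < F(high). (CPython threads illustrate the structure here rather than delivering real speedup.)

    from threading import Thread

    def rootf(F, low, high, n, eps):
        """Parallel interval narrowing; assumes F increasing, F(low) < 0 < F(high)."""
        while abs(high - low) > eps:
            delta = abs(high - low) / (n + 1)
            y = [0.0] * (n + 1)                 # shared result array, slots 1..n

            def worker(j):                      # body of the parfor block
                y[j] = F(low + j * delta)

            threads = [Thread(target=worker, args=(j,)) for j in range(1, n + 1)]
            for t in threads: t.start()
            for t in threads: t.join()

            # Sequential collection phase: advance past the sample points
            # where F is still negative; the root lies in the next gap.
            i = 1
            while i <= n and y[i] < 0:
                low += delta
                i += 1
            high = low + delta
        return (low + high) / 2

    # Example: the cube root of 2, with 4 worker threads per pass.
    print(rootf(lambda x: x**3 - 2, 0.0, 2.0, 4, 1e-6))   # about 1.2599

Note that the collection scan after the parfor is purely sequential; chapter four returns to this point under the name subsidization.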
Tables 2.5 and 2.6 give the element-context counts for algorithm ROOTF. Using the counts given in the tables, it can be seen that the vocabularies of the respective algorithms are:

    n = n1,p + n2,p + n1,s + n2,s
    BISECT:  0 +  0 + 13 +  8 = 21
    ROOTF:  10 +  6 + 15 + 11 = 42

The lengths of the two algorithms are:

    N = N1,p + N2,p + N1,s + N2,s
    BISECT:  0 +  0 + 32 + 29 = 61
    ROOTF:  16 + 15 + 50 + 39 = 120

Thus, we can immediately observe that the parallel algorithm has twice the vocabulary and length of its sequential counterpart.

Table 2.3 Operator counts for the bisection algorithm.

    Operator                  rank (j)   F1,s,j
    begin...end, (...)            1        9
    ;                             2        6
    :=                            3        5
    +                             4        2
    /                             5        2
    -                             6        1
    *                             7        1
    >                             8        1
    while...do                    9        1
    if...then...else             10        1
    F                            11        1
    real                         12        1
    real procedure bisect        13        1
                             n1,s = 13  N1,s = 32

Table 2.4 Operand counts for the bisection algorithm.

    Operand          rank (j)   F2,s,j
    LOW                  1        6
    HIGH                 2        6
    X                    3        5
    Y                    4        3
    E                    5        3
    2                    6        3
    Z                    7        2
    0                    8        1
                     n2,s = 8  N2,s = 29

Table 2.5 Operator counts for the multiprocessor algorithm.

    Parallel Operator         rank (j)   F1,p,j
    begin...end, (...)            1        3
    ;                             2        3
    []                            3        2
    :=                            4        2
    +                             5        1
    *                             6        1
    parfor...until...do           7        1
    F                             8        1
    real shared                   9        1
    integer shared               10        1
                             n1,p = 10  N1,p = 16

    Sequential Operator       rank (j)   F1,s,j
    :=                            1       11
    begin...end, (...)            2        9
    ;                             3        8
    +                             4        6
    while...do                    5        2
    >                             6        2
    -                             7        2
    ABS                           8        2
    /                             9        2
    []                           10        1
    <=                           11        1
    integer                      12        1
    real                         13        1
    real procedure rootf         14        1
    and                          15        1
                             n1,s = 15  N1,s = 50

Table 2.6 Operand counts for the multiprocessor algorithm.

    Parallel Operand   rank (j)   F2,p,j
    J                      1        4
    N                      2        3
    LOW                    3        2
    DELTA                  4        2
    Y                      5        2
    1                      6        2
                       n2,p = 6  N2,p = 15

    Sequential Operand rank (j)   F2,s,j
    LOW                    1        9
    I                      2        6
    HIGH                   3        6
    DELTA                  4        4
    1                      5        3
    E                      6        3
    N                      7        3
    Z                      8        2
    Y                      9        1
    2                     10        1
    0                     11        1
                      n2,s = 11  N2,s = 39

As with the vocabulary and length metrics, there is a new counterpart to the volume metric. Now volume refers to the number of bits required to encode the conceptual discriminations which an algorithm's designer must make. This new metric, which is called the conceptual volume, V, is:

    V = (Ns + Np) log2(ns + np)

where Ns = N1,s + N2,s and ns = n1,s + n2,s, and similarly for the parallel component. As with the other metrics, the sequential metrics presented earlier are a special case, with Np and np equal to zero. For our example root finding algorithms, V is shown below.

    BISECT:  V = (61 + 0) log2(21 + 0) = 267.93 bits
    ROOTF:   V = (89 + 31) log2(26 + 16) = 647.07 bits

2.2.2 Component Metrics and Potential Conceptual Volume

At times, it will be useful to examine the parallel or sequential portion of an algorithm only. The resulting metrics are known as component metrics. Thus, the component conceptual volume for the parallel portion of an algorithm is calculated as below.

    Vp = (N1,p + N2,p) log2(n1,p + n2,p)

Thus, for ROOTF:

    Vp = (16 + 15) log2(10 + 6) = 124 bits
    Vs = (50 + 39) log2(15 + 11) = 418.34 bits

This can be used to see that the bulk of the information resides in the sequential portion of the ROOTF algorithm.

Next, the concept of potential volume will be examined in the parallel context. The reader should recall that the potential volume represents the minimal expression of an algorithm, equivalent to calling a predefined procedure. Thus, one would not expect there to be a difference in the potential volume whether the underlying procedure is sequential or parallel, since all that is seen is the procedure call. A procedure call would be the same for procedures which are either sequential or parallel. In either case, we assume the parameters passed and two operators (the function and assignment operators) belong in the sequential domain. Finally, the potential conceptual volume, V*, can be expressed as:

    V* = (N*s + N*p) log2(n*s + n*p) = (N*s) log2(n*s)

Which for both BISECT and ROOTF is:

    V* = (2 + 4) log2(2 + 4) = 15.51 bits.

We assume Z, LOW, HIGH, and E are the parameters, and, as before, there are two operators.
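In code form, the new metric is a one-line generalization of V. The sketch below is illustrative only (it is not from [17]); it takes the totals of Tables 2.3 through 2.6 as input and reproduces the values computed above.

    from math import log2

    def conceptual_volume(N_s, n_s, N_p=0, n_p=0):
        """V = (Ns + Np) log2(ns + np); the sequential V is the case Np = np = 0."""
        return (N_s + N_p) * log2(n_s + n_p)

    print(conceptual_volume(61, 21))           # BISECT: 267.93 bits
    print(conceptual_volume(89, 26, 31, 16))   # ROOTF:  647.07 bits
    print(conceptual_volume(31, 16))           # ROOTF parallel component:   124.00 bits
    print(conceptual_volume(89, 26))           # ROOTF sequential component: 418.34 bits

The same function serves for component volumes, since a component volume is simply the conceptual volume of one context taken alone.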
2.3 Summary

This chapter began by reviewing Halstead's metrics, which describe the information content of sequential algorithms. Extensions were introduced to allow the metrics to describe the information content of parallel algorithms as well. Two algorithms, BISECT and ROOTF, were used to show that the extended metrics can be used to compare sequential and parallel algorithms. By comparing the two algorithms, it can be seen that the information content of the parallel algorithm is greater than that of its sequential counterpart. Thus the prediction, which was made in chapter one, that parallel algorithms may need to include additional information to handle process synchronization, process intercommunication, etc., is supported. Although we have only examined a single algorithm, one of our basic questions is answered. That is, a parallel algorithm may have an increased information content over its sequential counterparts. If an increase in information is seen for an algorithm as simple as the root finder, one would suspect that more complex algorithms would show an even larger increase in their information contents. The remainder of this research will examine whether this increase in an algorithm's information content has an impact on the design process for such algorithms.

CHAPTER 3

The Graph Model of Parallel Algorithm Structure

In chapter two, metrics were presented for describing parallel algorithms. These metrics were based on counts of the elements of the language in which the algorithm was expressed. Thus, the metrics represent the information content of a particular implementation of an algorithm. As designed, the metrics include counts of non-active elements, i.e., BEGIN/END pairs, semicolons, etc. In seeking to abstract out only those algorithm features which actively contribute to and describe the information needed to design an algorithm, a graphical model of parallel algorithm structure was developed. In [20] Reynolds and Chang describe the PAL (Parallel Algorithm Laboratory) system. PAL is a graph language which is used to describe parallel algorithms. PAL allows both completed algorithms and algorithms still in the design process to be represented. In its original intent, it was to describe algorithms for VLSI array based architectures, but it has proved useful for describing algorithms designed for other parallel architectures as well. A description of PAL, as extended for this research, follows in the next section.

3.1 The Graph Language

The graph language is based on a functional representation of an algorithm. It consists of data objects and structures. Data objects are given a label (name) and flow where needed in the completed graph in a data-flow fashion. Data objects are represented, in the graph, by closed rectangles (Figure 3.1). Structures represent manipulation of data objects and the control of an algorithm. The simplest structure is the function, Figure 3.1a. It consists of operands (data objects) and an operator, producing a result, which optionally may be given a label or used directly as an operand when the functions are nested, Figure 3.1b. Functions may be nested to any level. Parallelism between functions is indicated by stacking the functions vertically, as shown in Figure 3.1c. Control in the graph proceeds from left to right.

Figure 3.1 The structures of the graph language.

Control structures in the graph language consist of closed rectangle-like structures, as shown in Figures 3.1d and 3.1e. The backward C block represents a logical condition. In Figure 3.1d the condition operates as a pretest (DO WHILE), and in Figure 3.1e, it operates as a post test (REPEAT UNTIL). Any structure may be nested inside of a control block, including other control blocks. In Figure 3.1f, another use for the condition block is shown. In this example, it serves as an IF THEN like structure. In Figure 3.1g, it is shown representing an IF THEN ELSE like structure. The use of the conditional block is not restricted to the function structures; any structure may be governed by such a conditional. An additional control construct remains, the PARFOR structure. This structure allows an arbitrary number of copies of a structure to be placed in parallel. It is shown in Figure 3.1h. N copies of function f1 will proceed in parallel when this structure is encountered. Figure 3.2 shows how the structures can be combined to describe a complete algorithm.
Several features of the graph language can be seen in Figure 3.2. First, the placement and amount of parallelism in the algorithm is easily observed. Second, the graph language forces the designer to construct his algorithms in a modular form. If one attempted to construct a complex algorithm in a non-modular fashion using this notation, it would be visually confusing. For purposes of this research, global and local data are represented the same. However, the graph language could be easily extended to allow discriminations between them.

Figure 3.2 A sample graph language algorithm.

3.2 Relation of Graph Language to the Metrics

The structures of the graph language correspond to design decisions which must be made when designing an algorithm. As such, each element is assumed to be an abstraction of a piece of information which describes the algorithm. Thus, there is a correspondence between the graph elements and the metrics of chapter two, which describe an algorithm's information content. When a designer creates an algorithm, he uses many pieces of information, and introduces much of that information into the algorithm. The first piece of information which is required is the realization of the need for a structure in the algorithm. This is represented in the graph language by the structure's label. Second, the need for a structure is determined because its results are needed. Finally, the designer must be concerned with those elements which make up the body of the structure: the input operands, the operator(s), and the conditionals, if they exist. This procedure is quantified, for purposes of the metrics, by counting the structure's label, results, input operands, operator, and conditionals. The counts are combined, as in chapter two, to form the conceptual volume and other metrics.

Additionally, the graph language directly supports the notion of context. There are two distinct components to any graph: those structures and operands which are strictly sequential, and those which are in parallel portions of the algorithm. Table 3.1 shows the element counts for the algorithm in Figure 3.2. It also shows how the counts are combined to form the conceptual and component volumes for the algorithm.

3.3 Metrics for Partial Designs

The metrics of chapter two are normally used to describe completed algorithms. In analyzing design methodologies, it is important to be able to determine the effects of each design decision or step. The graph language and the metrics presented in chapter two allow this to be easily done. Since the metrics represent the information content of an algorithm, and partially completed algorithms partially represent the description of their completed designs, the metrics can be used to describe the information content of the partial design at each stage of the algorithm design process. By comparing the information content at each stage of an algorithm's development, the growth in the information content of the algorithm during the design process can be monitored.

Table 3.1 Element counts and conceptual component volumes for the algorithm in Figure 3.2.

    Sequential Elements:
    Operand  rank (j)  F2,s,j      Operator  rank (j)  F1,s,j
    X            1       4         PROG          1       1
    Y            2       2         C1            2       1
    OUT          3       2         f1            3       1
    Z            4       1         f4            4       1
    1            5       1         >             5       1
    W            6       1         +             6       1
    V            7       1         *             7       1
            n2,s = 7  N2,s = 12             n1,s = 7  N1,s = 7

    Parallel Elements:
    Operand  rank (j)  F2,p,j      Operator  rank (j)  F1,p,j
    Y            1       2         +             1       2
    X            2       1         f2            2       1
    Z            3       1         f3            3       1
    W            4       1
    V            5       1
            n2,p = 5  N2,p = 6              n1,p = 3  N1,p = 4

    Vs = (12 + 7) log2(7 + 7) = 72.3 bits
    Vp = ( 6 + 4) log2(5 + 3) = 30.0 bits
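Because the graph elements are an abstraction of the information content, counts such as those in Table 3.1 can be generated mechanically from a machine representation of a graph. The following sketch is illustrative only; the nested-dictionary node encoding is hypothetical and is not the PAL system's own representation.

    from collections import Counter
    from math import log2

    def count_elements(node, parallel=False, ops=None, opnds=None):
        """Tally (element, context) pairs for a graph-language design tree."""
        if ops is None:
            ops, opnds = Counter(), Counter()
        if node["kind"] == "operand":
            opnds[(node["name"], parallel)] += 1
        else:                         # function, condition block, or parfor
            ops[(node["name"], parallel)] += 1
            # Only PARFOR is treated as entering a parallel context here;
            # stacked parallel functions would need the same tagging.
            inner = parallel or node["kind"] == "parfor"
            for child in node.get("children", []):
                count_elements(child, inner, ops, opnds)
        return ops, opnds

    def component_volume(ops, opnds, parallel):
        """Component conceptual volume (bits) for one context."""
        N = sum(c for (_, p), c in ops.items() if p == parallel)
        N += sum(c for (_, p), c in opnds.items() if p == parallel)
        n = sum(1 for (_, p) in ops if p == parallel)
        n += sum(1 for (_, p) in opnds if p == parallel)
        return N * log2(n) if n else 0.0

Applied to an encoding of the graph in Figure 3.2, a walk of this kind yields the counts of Table 3.1, and component_volume reproduces Vs = 72.3 bits and Vp = 30.0 bits.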
It is recognized that 'PROG” has three inputs, one output, and is to be constructued as a D0 WHILE like block. At this stage of the design process, the structure has an information content of 15.5 bits (Table 3.2). Each additional element added to ”PROG”, will change that information content. By monitoring the metrics, we can observe the change in the information content, as the design process proceeds. The values obtained for the metrics at each stage of the design process are called pgrtial metrics, PV represents partial volume, and PL represents partial length. The difference between the partial metrics at succeeding design steps are called ggggg metrics, i.e., volume added or length added. » vadded ' PVn ’ Pvn‘l lz’added ' PLn - PLn-l Thus, the resulting partial volume and length for the algorithm in Figure 3.4 is 25.2 bits. The volume added from the design step which led from Figure 3.3 to Figure 3.4 is 9.7 bits (Table 3.3). Although the term 'added' implies that the volume increases with each design step, this may not always be the case. The designer may backtrack in his design, or realize that what originally appeared to be distinct objects are the 42 Sea .swwmoc anpfluowam Hmfippmg < m.m onzmfim Ad 1|IH.I|I.J _n,zH_. dZH mzb 43 GQSK 4&5 on m one a man aka 6 as: m.m answaa L hopmm 93 MTG on p swam .Qopm am2H_ 7.2...” :21 Table 3.2 Element counts for the algorithm of 44 Figure 3 e 3 0 rank i rank i Sequential Elements (1) F2,s (1) 1,8 IN1 1 1 FROG 1 1 IN2 2 1 Cl 2 I 1N3 3 1 OUT 4 1 4 4 2 2 is = (4+2) log2 (4+2) = 15.5 Table 3.3 Element counts for the algorithm of Figure 3.4. Sequential Elements (1) F218 (1) 1.§ IN1 1 2 PROG 1 1 IN3 2 2 c1 2 1 1N2 3 1 <. 3 1 OUT 4 1 4 6 3 3 = (6+3) log2 (4+3) = 25.2 _ 2 1 - ,added ‘ Es ‘ Es ’ 9'7 45 same. Either of these situations could cause the design step to reduce the partial conceptual volume for an algorithm. Thus vadded and Ladded may have negative values. 3.4 Summary This chapter introduced a graph language which will be used to represent algorithms in a model of the parallel algorithm design process. The graph elements are based on an abstraction of an algorithm's information content, and thus have a strong correspondence to the metrics discussed in chapter two. Halstead's metrics are normally used to describe completed algorithms. This chapter has shown that our extensions to them are also useful for describing the information content of algorithms during the design process. Thus, we can now observe the effects of a design decision on the information content of an algorithm. CHAPTER 4 Algorithm Transformations and Subsidization In chapter two, two versions of a root ,finding algorithm were introduced. There it was shown that the parallel version contained significant additional information, above that found in the sequential algorithm. This chapter will examine the placement of that additional information. Parallel algorithms, such as the root finder, may be developed using several approaches [3]. One approach is to start with a sequential algorithm and make a set of simple transformations to it. Such transformations are often based on isolating and replacing a small sequential segment of the algorithm with a parallel segment. When a replacement of this nature is made, how does this impact the structural complexity of the algorithm, and are there interaction effects between the parallel and sequential components? The 46 47 nature of these interactions will be interpreted in terms of algorithm transformations. 
4.1 Algorithm Transformations Algorithm transformations can be categorized into four types (Figure 4.1), based upon changes in an algorithm's input and output sets. A type 1 transformation is the simplest. A single parallel code segment replaces a sequential segment. The parallel segment receives, as input, a single data object and produces, as output, a single data object. A type 2 transformation differs, in that it receives multiple input objects, but only produces a single output object. This is indicative of log sum reduction transformations. Many of the automatic transformations suggested by Ruck [3] are of this type. The parallel root finding algorithm is an example of a type 3 transformation. ‘A single data object is input, the search interval, and multiple data objects are output, the function values at the interval's midpoints. Finally, a type 4 transformation involves multiple inputs as well as multiple outputs. This type of transformation would be expected with many matrix problems, such as calculating determinents. 48 sequential segment Qarallel j C I J —v—4 1 Original Algorithm. Type 1. l.) l W C D C D C- D 1 Hi 4&— Type 2. Type 3. Type 4. Figure 4.1 Transformations for converting a sequential algorithm to a parallel algorithm. 49 4.2 Subsidization If we return to the root finding algorithms of chapter two, we can observe some features of these transformations. Recall that in creating the parallel root finding algorithm, a small sequential segment was replaced by a parallel segment. In chapter two, it was observed that the parallel algorithm had a higher information content than the sequential algorithm. We will now look at the two algorithms in an attempt to determine what portions of the parallel algorithm were responsible for this increase in conceptual volume. Table 4.1 shows the component lengths and vocabularies of the two algorithms. Table 4.1 Component lengths and vocabularies for the two root finding algorithms. ns np Ns Np N n sequential, BISECT 21 O 61 5 61 21 parallel, ROOTF 26 16 89 31 120 42 By examining Table 4.1, it can be seen that the length, N, of the parallel algorithm is approximately double that of the sequential algorithm. As expected, much of this increase is due to the inclusion of the parallel comnponent. 50 ND accounts for half of the increase in length. Since a small sequential segment of the sequential root finding algorithm was replaced, one would expect that the length of the sequential component of parallel algorithm would have decreased from that of the sequential algorithm. However, this did not happen. Not only was there no reduction in NS, it actually increased by almost 50%. In chapter two, the conceptual component volumes were calculated for the two algorithms. For. ROOTF 2p was 124 bits and 2s was 418.34 bits. Recall that the conceptual volume of BISECT was 267.93 bits and that the conceptual volume of ROOTF was 647.07 bits. While component volumes are not additive, they demonstrate that the bulk of the increase in volume was due to an increase in the sequential component of the algorithm. Thus, the increased information in the ROOTF algorithm was caused by the addition of the parallel component to the algorithm, plus by the an increase in the sequential component as well._ Another observation can be made by viewing the usage index, the ratio of Ni and ni for the ith component (Table 4.2). 51 Table 4.2 Usage indices for the two root finding algorithms. 
          N_s/n_s   N_p/n_p   N/n
  BISECT    2.90       *      2.90
  ROOTF     3.42      1.94    2.86

For both algorithms, the overall usage indices, N/n, are about the same. However, while the overall indices are about the same, for algorithm ROOTF the sequential usage index, N_s/n_s, has increased and is larger than the parallel usage index, N_p/n_p. The reason for this can be seen by examining Figure 4.1. The single output data object of BISECT has been replaced by multiple outputs from ROOTF. The final sequential segment of ROOTF must collect and analyze these multiple outputs. This results in an increase in size for the sequential component. We say that the sequential component is subsidizing the parallel component.

Subsidization potentially occurs in three of the transformation types presented earlier. It is present in the sequential code, and it serves to support the parallel portion of the algorithm by either preparing input for the parallel segment or by collecting output from the parallel segment. However, the position of subsidization will vary depending on the type of transformation which took place (Figure 4.2).

[Figure 4.2 The placement of subsidization for the four algorithm transformation types.]

4.3 Summary

One method of transforming a sequential algorithm into a parallel one is to replace a sequential segment of the algorithm with a parallel segment. When this is done, we have shown that information may be added to the algorithm. Contrary to expectation, our examination of the two root finding algorithms has shown that the information content of the algorithm may increase not only because a parallel segment is added to the algorithm, but also because the sequential component of the algorithm increases in size as well. The sequential component increases in size because an interface may be needed between the original sequential statements and the parallel component. It was seen here that when a single sequential segment was replaced by a single local parallel segment, the sequential code which provides the interface may be localized as well. The support provided by this interface was called subsidization. The possible presence of subsidization in a parallel algorithm suggests that a designer must pay particular attention to the interfaces between the sequential and parallel components.

Further research is needed to examine algorithms with multiple parallel components. It is anticipated that subsidization effects will be seen there as well. However, the form they take may differ. Multiple parallel segments may not need sequential subsidization segments between them, or a single subsidization segment may serve several parallel components. Whatever form subsidization takes for algorithms with multiple parallel components, one suspects its presence will affect the design process for such algorithms.

CHAPTER 5

A Parallel Bin Packing Algorithm

In the introduction, we indicated that we would conduct experiments on the design process for parallel algorithms. Parallel algorithms with a single localized parallel component will be used in the experiments. The ROOTF algorithm, which has been discussed in Chapters two and four, will serve as an example of such an algorithm with a small localized parallel component. This chapter will examine the other algorithm which we will use in our experiments. Like the ROOTF algorithm, it has a single localized parallel component. However, the size of the parallel component is larger.
Thus, we will have two algorithms for our experiments, one with a small parallel component and one with a large parallel component.

The second algorithm solves a version of the bin packing problem. Like the ROOTF algorithm, it is designed for use on a multiprocessor system. The bin packing problem is a combinatorial problem. Combinatorial algorithms by their nature are computationally intensive. Exact solutions to them are often computationally intractable. Therefore, heuristic algorithms are generally used when confronted with such problems. One of the limiting factors in designing heuristic algorithms is the amount of computation they involve. Therefore, combinatorial problems are good candidates for solution by parallel algorithms. Bin packing is an example of such a problem.

Bin packing in the one dimensional case is an assignment problem [21-27]. Objects of various sizes are to be placed into a set of bins of limited capacity, such that the minimum number of bins is used. There are two versions of the bin packing problem. The first is the on-line version. In on-line bin packing, the objects must be placed into the bins one at a time, as they are presented to the algorithm. This is opposed to off-line bin packing, where all items are presented before the algorithm attempts to pack the items. The off-line version allows the algorithm to condition the data before it processes it. The most common form of conditioning used is to sort the data. It has been shown that the worst case performance of off-line bin packing algorithms is significantly better than that of on-line ones [21].

The remainder of this chapter will present and discuss a parallel on-line bin packing algorithm. Its packing performance will be compared with that of sequential algorithms, both on-line and off-line.

5.1 The Bin Packing Problem

The bin packing problem consists of placing objects of particular sizes into a set of standard size bins. In the one dimensional case, the objects are characterized by one parameter. Objects can be of size between 0 and 1. For a given problem, the algorithm will be presented with a list of items, L, of length n, with 0 < a_i ≤ 1 for each i, 1 ≤ i ≤ n, where a_i represents the ith element of list L. The same value may occur in more than one position in the list. The bins into which the items will be packed are specified by a set of indices, 1, ..., m, where B_1 is called the first bin and B_m represents the last nonempty bin used. For any bin B_j, the sum of the a_i packed into it cannot exceed 1. The current sum of the a_i stored in bin B_j is called the content of bin j, and is denoted c(B_j). The amount of unused space in a bin is called the gap, and is given as 1 - c(B_j).

Several sequential heuristic algorithms have been proposed for on-line bin packing [22-25]. These include the first-fit, refined first-fit, and best-fit approaches. The best-fit is of special interest to us here. In the best-fit (BF) approach, a new item is assigned to a bin B_j such that the gap in bin B_j will be minimum over all currently nonempty bins. It has been shown that the worst case packing generated by the best-fit method is

  1.7 L* - 2 ≤ BF(L) ≤ 1.7 L* + 2,

where L* represents the number of bins in the optimal packing for list L [26]. Yao has shown that for any on-line algorithm the worst case performance can be no better than (3/2) L* [21]. Off-line algorithms, however, perform better, since the list of items presented to the algorithm can be conditioned. Typically, they are sorted into decreasing sequence.
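Before turning to the off-line variants, the best-fit rule can be made concrete. The Pascal sketch below is our own illustration (the names, bounds, and trial list are ours, not the dissertation's); since BF never looks inside a bin, only each bin's content is kept.

  program BestFitSketch;

  const
    MaxBins = 1000;

  var
    content: array[1..MaxBins] of real;   { c(Bj) for each nonempty bin }
    m: integer;                           { index of the last nonempty bin }

  { Pack one item by the best-fit rule: among the bins that can still
    hold a, choose the one whose gap after the addition is minimum,
    breaking ties toward the smallest index; open a new bin otherwise. }
  procedure PackBF(a: real);
  var
    j, best: integer;
  begin
    best := 0;
    for j := 1 to m do
      if (content[j] + a <= 1.0) and
         ((best = 0) or (content[j] > content[best])) then
        best := j;
    if best = 0 then
    begin
      m := m + 1;
      content[m] := 0.0;
      best := m
    end;
    content[best] := content[best] + a
  end;

  begin
    m := 0;
    PackBF(0.6); PackBF(0.3); PackBF(0.2); PackBF(0.5); PackBF(0.4);
    writeln('bins used: ', m)   { 3 bins for this illustrative list }
  end.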
The best-fit decreasing rule, BFD, consists of sorting the list of items into decreasing order and then applying the best-fit rule. Johnson [26] has shown that the BFD algorithm has a worst case performance of

  (11/9) L* ≤ BFD(L) ≤ (11/9) L* + 4.

At this point, we will examine the best-fit rule to see why its performance is improved when the list is presented in decreasing order. In order to do this, it is useful to think of the best-fit rule as partitioning the set of nonempty bins into two classes. The first class consists of those bins whose gaps are not smaller than the size of the item to be added. This group, G1, represents the domain of the traditional best-fit rule. From this group, the bin which will have the smallest gap once the new item is added is chosen. The second group, G2, consists of those bins whose gap is too small to receive the new item. These bins are not candidates to receive the new item when using the best-fit rule. Thus, as group G2 grows large, the number of choices available to the best-fit rule grows small.

Now it is easy to see why the best-fit decreasing algorithm works better. The larger the a_i, the fewer the bins from which the algorithm can choose the best fit. Since the larger items are at the beginning of the list, at which time the maximum cardinality of the set G1 is small, the impact that an item will have on the size of G1 is small. On the other hand, if a large item is placed in the middle of the list, it will mean that many of the partially filled bins will not have enough room to hold it. The best-fit algorithm will, therefore, have fewer places to store the item than would otherwise be the case if a_i were smaller. The BFD algorithm is successful because the presorting allows more bins to be examined as potential candidates to receive a new item than if the list of items were unsorted. This suggests that if the performance of a best-fit rule is to be increased, a way to expand the number of bins from which a choice is made must be found.

5.2 The Best-Fit With Replacement Rule

In order to increase the packing performance of the best-fit rule, the search space (the number of bins which are considered) will be expanded. The following rule causes both sets G1 and G2 to be searched when an item, a_i, is presented for packing.

1. For those bins in G1, find the bin, B_j, whose gap will be minimum with the addition of a_i. If no B_j in G1 satisfies this condition, return the null set.

2. For those bins in G2 with a gap > 0, find a bin, B_j, whose gap will be minimum when one object stored within it is replaced by a_i. If no B_j in G2 satisfies this condition, then return the null set.

3. Select from the two sets generated above the non-null set that contains the bin with the smallest projected gap. In case of ties, select the bin with the smallest index. If it is from G1, then add a_i to the contents of the bin. If it is from G2, replace the item in the bin with a_i so that the gap is minimized. When both candidate sets are null, place a_i in the next unused bin.

4. If an item has been replaced by a_i in Step 3, repeat Steps 1 through 3 until no new replacement occurs, and then process a_{i+1}.

This new rule is called best-fit with replacement (BFR). The replacement option expands the set of bins which are considered to all bins, except those which are empty or whose gaps equal 0. To show how this rule works, the following list will be used: six occurrences of .129, followed by six occurrences of .352, followed by six occurrences of .519.
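Steps 1 through 4 translate directly into code. The Pascal sketch below is our own sequential rendering of the rule (the bounds and names are ours); each bin keeps its stored objects so that step 2 can project the gap left by a replacement. Tie-breaking details and the choice of which object a G2 bin offers are left open by the rule, so the exact packing the sketch produces may differ from the one the dissertation reports.

  program BFRSketch;

  const
    MaxBins  = 100;
    MaxItems = 50;

  var
    cont: array[1..MaxBins] of real;               { c(Bj) }
    item: array[1..MaxBins, 1..MaxItems] of real;  { objects stored in Bj }
    cnt:  array[1..MaxBins] of integer;
    m, i: integer;                                 { m = last nonempty bin }

  { One application of the BFR rule to item a, including step 4: a
    displaced object is immediately repacked by the same three steps. }
  procedure PackBFR(a: real);
  var
    j, k, bin, idx: integer;
    g, bestGap, displaced: real;
  begin
    while a > 0.0 do
    begin
      bin := 0; idx := 0; bestGap := 2.0;      { 2.0 stands for "null set" }
      for j := 1 to m do
        if cont[j] + a <= 1.0 then
        begin                                  { step 1: candidate from G1 }
          g := 1.0 - (cont[j] + a);
          if g < bestGap then
          begin bestGap := g; bin := j; idx := 0 end
        end
        else
          for k := 1 to cnt[j] do              { step 2: candidate from G2 }
            if (a > item[j, k]) and (cont[j] - item[j, k] + a <= 1.0) then
            begin
              g := 1.0 - (cont[j] - item[j, k] + a);
              if g < bestGap then
              begin bestGap := g; bin := j; idx := k end
            end;
      if bin = 0 then
      begin                                    { both sets null: new bin }
        m := m + 1;
        bin := m
      end;
      if idx = 0 then
      begin                                    { step 3, from G1: add a }
        cnt[bin] := cnt[bin] + 1;
        item[bin, cnt[bin]] := a;
        cont[bin] := cont[bin] + a;
        a := 0.0                               { nothing displaced: done }
      end
      else
      begin                                    { step 3, from G2: replace }
        displaced := item[bin, idx];
        item[bin, idx] := a;
        cont[bin] := cont[bin] - displaced + a;
        a := displaced                         { step 4: repack the object }
      end
    end
  end;

  begin
    m := 0;   { cont and cnt start at zero as globals in most Pascals }
    for i := 1 to 6 do PackBFR(0.129);
    for i := 1 to 6 do PackBFR(0.352);
    for i := 1 to 6 do PackBFR(0.519);
    writeln('bins used: ', m)
  end.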
In Figure 5.1, the BFR packing of this list is compared to the optimal packing and to that derived from the BF rule. It takes 10 bins using the BF rule, but only 8 bins with the BFR approach. The optimal packing uses 6 bins. The difference between the optimal packing and the BFR packing is 2 bins, or 1/3 of the optimal packing. The BF rule uses 4 extra bins, equal to 2/3 of the optimal packing, or double the extra bins.

A. BF packing (10 bins):
   Bin 1:     .129 .129 .129 .129 .129 .129
   Bins 2-4:  .352 .352 (each)
   Bins 5-10: .519 (each)

B. BFR packing (8 bins):
   Bin 1:     .352 .129 [the remaining .129 entries are illegible in the source]
   Bin 2:     .352 .352
   Bins 3-5:  .519 .352 (each)
   Bins 6-7:  .519 .129 (each)
   Bin 8:     .519

C. Optimal and BFD packing (6 bins):
   Each bin:  .519 .352 .129

Figure 5.1 A comparison of the BF, BFR, BFD, and optimal packings for the example list.

The list which was used in Figure 5.1 presents a worst case situation for the BF rule [26], because the physical ordering precludes the placement of a small, medium, and large item in each bin. Such a placement is necessary to obtain the optimal packing. On the other hand, the BFR rule, through the use of replacement, is able to make enough reorderings to significantly improve the performance. Later, it will be shown that the BFR rule will repeat its good performance when presented with longer lists.

5.3 Implementation of the BFR Rule as a Parallel Algorithm

Now that the BFR rule is established, an algorithm which uses the rule will be developed. Such an algorithm could be either sequential or parallel. Note that the improved performance is gained by increasing the number of bins which are considered each time an item is added. This results in a direct increase in the number of computations which will be required. Although a sequential algorithm may be sufficient for small lists, for larger lists the number of computations would be prohibitively large. Thus, a parallel implementation for our algorithm is desirable.

The next question to ask, then, is: what aspects of the problem can be done in parallel? The bulk of the new computations are involved in evaluating the capacity of each bin to hold a new item. Therefore, this seems to be the appropriate place to consider parallel activities. Since each bin may have a different number of items in it, and the bins are divided into two groups, G1 and G2, asynchronous parallel processing may be appropriate. Additionally, the item to be packed must be distributed to the processors, and the results compared to select the proper bin. This suggests a master-slave relationship between the processors. These requirements are well supported by a multiprocessor system. Therefore, a parallel algorithm for the BFR rule on a multiprocessor system will now be presented. The algorithm will be referred to as the MBFR (multiprocessor best-fit with replacement) algorithm.

In the parallel version presented here, a process is associated with each active bin. A bin is active if it is neither empty nor has a gap of zero when an item, a_i, is presented on-line to be packed. The item is given to a master process, which broadcasts the item's value to each active process. For each active bin, the associated process computes the new gap that the bin will have if the item is added to it. If the bin has room for the item, it returns a value to the master equal to the new gap.
If, on the other hand, the item is larger than the current gap, the process scans the list of items in the bin. The items are assumed to be in decreasing order. The scan proceeds from largest to smallest until an item, x_j, is found such that a_i is greater than x_j and a_i is less than or equal to the current gap plus x_j, or until the size of x_j plus the gap is less than a_i. In the former instance, a value is returned equal to the projected gap if x_j were to be replaced by the new item, along with the x_j to be replaced. Otherwise, a null value is sent, representing that no replacement is possible.

The master then collects the projected gaps, or bids, from each of the active processes. It then selects the bin with the smallest projected non-null gap to receive the new value. If the bin is from the G1 set, a signal is sent to the associated process to insert the new item into the ordered list and to reduce the gap for the bin by the value of the item. When the bin is from the G2 set, the associated process is told to remove the item to be replaced and to insert the new item. The gap is also updated. Next, the master broadcasts the replaced item to all the active bins, and the cycle continues until no replacement is necessary. When this occurs, the master reads the next item from the input list. The algorithm will always terminate, because a replaced value can only be smaller than the item to be added. Thus, if a replaced value also causes a replacement when it is re-added, eventually it will be too small to replace other items. A version of the algorithm is given in Appendix A. Simulations of the algorithm have been run on various length lists. The results of these simulations will be discussed in the next section.
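The heart of each slave process is the bid computation just described. The Pascal sketch below is our own reconstruction of that computation, not the version in Appendix A; the type and parameter names are ours. It returns the projected gap (or a null bid) together with the index of the object to be replaced.

  program BidSketch;

  const
    MaxItems = 50;
    NullBid  = 2.0;    { larger than any real gap: stands for a null bid }

  type
    ItemList = array[1..MaxItems] of real;

  { One slave's bid for item a, given its bin's current gap and the n
    stored items x[1..n] in decreasing order.  repl returns the index
    of the object to be replaced, or 0 when the item simply fits (or
    when the bid is null). }
  procedure Bid(a, gap: real; n: integer; var x: ItemList;
                var png: real; var repl: integer);
  var
    j: integer;
    done: boolean;
  begin
    repl := 0;
    if a <= gap then
      png := gap - a               { the item fits: bid the new gap }
    else
    begin
      png := NullBid;
      j := 1;
      done := false;
      while (j <= n) and not done do
      begin
        if gap + x[j] < a then
          done := true             { no smaller object can admit a }
        else if a > x[j] then
        begin                      { replacing x[j] admits a }
          png := gap + x[j] - a;
          repl := j;
          done := true
        end;
        j := j + 1
      end
    end
  end;

  var
    xs: ItemList;
    png: real;
    repl: integer;
  begin
    xs[1] := 0.519; xs[2] := 0.352;   { a bin with content .871, gap .129 }
    Bid(0.48, 0.129, 2, xs, png, repl);
    { prints bid = 0.001, replace item 2 }
    writeln('bid = ', png : 0 : 3, ', replace item ', repl)
  end.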
5.4 The MBFR Algorithm Simulation

A simulation model of the MBFR algorithm was prepared to investigate the algorithm's performance. The model, in sequential Pascal, simulates the parallel algorithm and assumes an arbitrary number of processors. Three hundred trials were run using the simulation model. Thirty runs were made for each of ten list lengths, ranging from 18 to 180 items. We now discuss the structure of these lists.

Graham [27] analyzed the performance of the first-fit and best-fit bin packing algorithms. He suggests a class of lists for which these algorithms perform poorly. The example presented previously uses a list belonging to this class. The lists are defined as follows. Let n be divisible by 18, and let ε satisfy 0 < ε ≤ 1/84. Define a list L = (a_1, a_2, ..., a_n) by

  a_i = 1/6 - 2ε   for 1 ≤ i ≤ n/3,
  a_i = 1/3 + ε    for n/3 < i ≤ 2n/3,
  a_i = 1/2 + ε    for 2n/3 < i ≤ n.

[Table 5.1 Packings for each simulation list length, for each of the random permutations of the list; * denotes the optimal packing for each list length. The table entries are illegible in the source.]

[Table 5.2 A summary of the performance of the MBFR algorithm for each list length, including the optimal, average, and worst case packings. The table entries are illegible in the source.]

[Figure 5.2 Plot of the average packing as a ratio of the optimal packing (average MBFR/L*, for list lengths 18 through 180) for the MBFR simulation.]

[Figure 5.3 Plot of the worst case packing as a ratio of the optimal packing for the MBFR simulation.]

[Figure 5.4 Plot of the maximum number of excess bins as a ratio to the optimal packing, (MBFR(WC) - L*)/L*, for the MBFR simulation.]

[Figure 5.5 Worst case, average, and optimal packings with the maximum and average number of processors required by the algorithm, for the MBFR simulation.]

The average packing is considerably better than the worst case; furthermore, by examining Table 5.1, it can be observed that the optimal packing is obtained 42% of the time. Table 5.2 and Figure 5.4 also show the deviation of the worst case from the optimal as a ratio of the optimal. It is interesting to note that the algorithm always performs better than 1/3 and levels off at better than 1/4 for the longer lists. It should be noted that these results were obtained with the algorithm only having to perform a few reverberations. For example, on the average, only 2.66 items in a list of 18 items caused a reverberation. The maximum number of cascading reverberations for any of the lists was only two. Thus, selecting from G2 only a limited number of times still improves the results significantly over those for the best-fit algorithm.

Finally, the results show the number of processors which are required. Remember that the simulation assumed that one processor would be assigned to each active bin. In the worst case, Figure 5.5 shows that the number of processors seems to be bounded by the optimal packing, not the worst case, as might be expected. The number of processors also appears to be significantly less than the average packing as well. If we look at the average number of processors used, we see that the average is considerably less than the optimal, average, or worst case packings. This shows promise for implementing the algorithm on a multiprocessor system with a limited number of processors. In such a case, each processor would be assigned several active bins. But as can be seen from Figure 5.5, even for large lists, each processor would still only have to handle a limited number of bins.
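For reference, lists of this class are easy to generate. The Pascal routine below is our own sketch: ε = 0.019 reproduces the .129/.352/.519 items of the earlier example, and the closing shuffle stands in for the random orderings used in the trials; both of these choices are ours, not the dissertation's.

  program MakeList;

  const
    n   = 18;      { any multiple of 18 in the simulated range 18..180 }
    eps = 0.019;   { gives items .129, .352, and .519 as in Figure 5.1 }

  var
    a: array[1..n] of real;
    i, j: integer;
    t: real;

  begin
    for i := 1 to n div 3 do
    begin
      a[i]             := 1/6 - 2*eps;   { small items }
      a[n div 3 + i]   := 1/3 + eps;     { medium items }
      a[2*n div 3 + i] := 1/2 + eps      { large items }
    end;
    { each (small, medium, large) triple sums to exactly 1, so the
      optimal packing always uses n/3 bins }
    Randomize;
    for i := n downto 2 do               { Fisher-Yates shuffle }
    begin
      j := 1 + Random(i);
      t := a[i]; a[i] := a[j]; a[j] := t
    end;
    for i := 1 to n do
      writeln(a[i] : 6 : 3)
  end.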
5.5 The Computational Complexity of the MBFR Algorithm

This section will discuss the computational complexity of the MBFR algorithm. The multiprocessor algorithm is based on the best-fit with replacement rule. This rule is an extension of the best-fit rule, which was discussed earlier. Therefore, the computational complexity of the best-fit algorithm will be discussed first, in order to give a basis of comparison for the MBFR algorithm.

In order to determine the computational complexity of the BF algorithm, we will focus on the computations which test a bin to see if a new item is less than or equal to the current gap. Since this testing process is based on the list length and dominates the computations, the computational complexity will be expressed in terms of the total bins tested for packing a list of n items. A worst case scenario for the BF algorithm is to attempt to pack a list of items whose sizes are all the same, greater than .5, and less than 1.0. When such a list is packed, only one item can be placed into a bin, and no bins can ever be completely filled. Thus, when an item a_i is presented to be packed, i-1 bins must be tested to see if the item will fit. Thus, for a list of n items, Σ_{i=1}^{n} (i-1) bins are tested. This results in a computational complexity of O(n²).

If the same type of list is presented to the MBFR algorithm, the number of bins which are tested will be the same, Σ_{i=1}^{n} (i-1). Although the i-1 bins are tested simultaneously, the master process must still compare i-1 bids each time an item, a_i, is presented to be packed. Note that for the MBFR algorithm, the time which is required to test the individual bins is bounded by a constant, since the bins are tested simultaneously, each by a separate processor. Therefore, the computations that dominate are those of the master process, which tests each of the bids generated by the slave processors. Thus, for the MBFR algorithm, the computational complexity is based on the number of bids that the master process must test to pack a list of n items.

The type of list which elicits a worst case performance for the BF algorithm, however, does not produce a worst case performance for the MBFR algorithm. The MBFR algorithm will show its worst performance if two conditions are met. The first condition is that each time an item is added, it must cause an item to be replaced. Also, each item which is replaced must, when it is repacked, cause a replacement as well, unless it is the smallest item in the list. The second condition is that only one item may fit in a bin, but the item may not fill the bin. This condition causes i-1 bins to be tested if item a_i is presented to be packed. An example of such a list is the following: .6, .7, .8, and .9. That is, all items are greater than .5 and less than 1.0; no items have the same value; and the list is in ascending order. If the first two items are packed, bin 1 will contain .7 and bin 2 will contain .6. When .8 is presented to be packed, 2 bids are generated. .8 will be placed into the first bin and will displace item .7. Item .7 is then repacked. Again, 2 bids are generated. This time .7 is placed into bin 2, and it displaces .6, which must then be repacked. Thus, if item a_i is presented to be packed, i-1 bids are generated, and the item is packed by displacing the item in the first bin. That item is then repacked, and it will displace the item in the second bin. This process continues until all items have been shifted one bin. This results in the master process having to consider Σ_{i=1}^{n} (i-1)·i bids. Thus, the MBFR algorithm has a computational complexity of O(n³).
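Both counts close into exact polynomial forms. The identities below are a routine check (ours, not from the source) that confirms the two bounds:

\[
\sum_{i=1}^{n}(i-1) \;=\; \frac{n(n-1)}{2} \;=\; O(n^{2}),
\qquad
\sum_{i=1}^{n}(i-1)\,i \;=\; \frac{n(n^{2}-1)}{3} \;=\; O(n^{3}).
\]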
Therefore, we can see that our simulation suggests that the MBFR algorithm has a better average performance than the BF algorithm. However, in the worst case, it does not perform as well. This is not a major problem for the MBFR algorithm, however, since the worst case for the MBFR algorithm only arises for a list of items with very restrictive constraints. Also note that, since the list must be ordered for the worst case to occur, the packing of such a list would be best handled by an off-line algorithm.

5.6 Summary

This chapter presented a second sample algorithm to use in our experiments: a multiprocessor bin packing algorithm. Like the ROOTF algorithm introduced earlier, it has a single localized parallel component. However, the ROOTF algorithm has a small parallel component, while the parallel component of the MBFR algorithm is much larger. Thus, we have examples of two algorithms with single, localized parallel components. One has a small parallel component and one has a large parallel component.

A simulation was used to show that the on-line MBFR algorithm has a packing performance comparable to the off-line best-fit decreasing (BFD) algorithm. Although further analysis of the MBFR algorithm would be interesting, for our purposes it is not needed, since we are primarily interested in the structural properties of the algorithm, not its behavior. The ROOTF and MBFR algorithms will be used in Chapter seven to examine the role of spatial, temporal, and functional information in the design process for these two algorithms. The next chapter will present a model of the algorithm design process. A simulation based on that model will be used in Chapter seven to conduct the experiments in parallel algorithm design.

CHAPTER 6

The Algorithm Design Model

The previous chapters showed that parallel algorithms may contain additional complexity beyond that found in their sequential counterparts. Some of that complexity resided in the parallel component, and some was the result of additions to the sequential component. The size of this added complexity may impact the success of making different types of design decisions. The question which is addressed by this research is: in algorithms with a single parallel component, how does the granularity of the parallel component impact the performance of design methodologies? In this chapter, we will develop a model of the algorithm design process which will allow us to study this question.

The design process will be modeled using the graph language representation of an algorithm introduced in Chapter three. Section 6.1 will describe the basic design unit in our model, the design step. A design step, in our model, consists of two phases: a placement process and an elaboration process. Section 6.2 will discuss the placement process, and Section 6.3 will describe the elaboration process. A procedural description of the model will be presented in Section 6.4. Finally, the chapter will conclude by describing a simulation model based on the abstract model of Sections 6.1 through 6.4. The simulation model will be used in Chapter seven to conduct experiments in parallel algorithm design.

6.1 A Design Step

In Chapter one, we characterized the design process as producing a sequence of partial designs. In Chapter three, an example of two successive partial designs (Figures 3.3 and 3.4) was used to show that we could use our software metrics to quantify the change in the information content from one partial design to the next. The change in the conceptual volume between the two partial designs was a result of making modifications to the structure of the first partial design.
Two basic types of decisions that underlie these modifications can be observed. One process determines the positions in the current partial design where modifications are to take place, i.e., where to make additions, deletions, or alterations. The other process determines what modifications will be made at these positions. This process includes decisions about the addition, deletion, or alteration of portions of the current partial design. It also determines the size and functionality of the modules which contain those portions of the current partial design that are to be modified.

Each of the above processes has been extensively discussed in the literature [4-10,28,32]. For example, the placement of a set of modifications in a partial design dominates many discussions of design methodologies. These decisions are considered so important that design strategies have often been named after this selection process, i.e., top-down and bottom-up design [5]. Some of these methodologies are based on a hierarchical design approach [34]. That is, the problem is described at several levels of abstraction. At the first level, the algorithm is described in general terms, and at succeeding levels, an increasing amount of detail is added. Top-down approaches suggest that the designer create an algorithm in a hierarchical fashion. At one level a main function is created, and at succeeding levels, subfunctions which fill in the details of the algorithm are added [34]. Top-down methodologies dictate the levels, or placement, of the functions in the algorithm which the designer may deal with during a particular design step. Bottom-up design likewise describes an algorithm in several levels of abstraction. However, bottom-up design suggests that the designer start at the lowest levels of detail and group low-level actions together into successively more generalized functions [32]. Although these two methodologies differ, they both aid in the placement of the next set of modifications to the current partial design. In this research, we will call this process the placement process.

The second part of a design step, the process of deciding what modifications are made as part of a design step, has also received much attention [5,28,32]. Much of this discussion has centered on dividing an algorithm into manageable size units (modules). Some of these discussions involve module independence, interfaces between modules (coupling), module cohesion, how modules should cooperate, as well as data hiding. For example, in discussions on cohesion, Wirth [8,32] suggests that certain instruction sequences which perform simple basic actions should be grouped into primitive modules or "action clusters". Others [4-10] offer suggestions such as limiting module descriptions to one physical printer page, or collecting in a module a set of related primitive functions, e.g., basic string handling functions. All of these and similar suggestions serve to decompose the problem, and therefore the design, into components. In this research, the process of performing this decomposition and deciding what objects should be added, deleted, or altered as a result of a design step will be called the elaboration process.

Recall that the development of an algorithm can be represented by a sequence of partial designs. The placement and elaboration processes are used to transform one partial design into the next. The set of decisions and resultant actions which lead from one partial design to the next, as a result of the placement and elaboration processes, is called a design step. A design methodology is a set of rules which are used to aid the designer in making the placement and elaboration decisions. Thus, a design methodology can be represented as a set of guidelines which a designer uses in order to make the decisions which lead from one partial design to the next.

For convenience, this research will limit the placement decision for a design step to one location in a partial design. This restriction will not limit the generality of our model. A design step which would have allowed independent modifications to several locations can still be modeled as a sequence of design steps, one per location, for each location where a modification was to be made. The next section will discuss how our model of the algorithm design process represents the placement process. It will be followed, in Section 6.3, by a similar discussion of the elaboration process.

6.2 The Placement Process

The placement process is the set of decisions which determine where to make the next modification to the current partial design. There are several ways in which this component can be selected. These criteria are based on selecting a component from the set of algorithm components which are not in the current partial design [5-10,28,32].

1. Pick a component which is physically adjacent to a particular component in the current partial design, i.e., in the same module.

2. Pick a component which is concurrent to a component in the current partial design.

3. Pick a component which provides an input to a component in the current partial design.

4. Pick a component which receives an output from a component in the current partial design.

5. Apply criteria 1-4, but only with regard to the most recently developed component, i.e., the relationship must connect the new component to the component which was dealt with by the last design step.

6. Pick a component which is at the same level of abstraction as the last component, i.e., if the last design step added part of the main procedure, the next component should be part of the main procedure as well.

7. Pick a component which is at an adjacent level of abstraction to the last component, i.e., add a subprocedure to the procedure which was dealt with by the last design step.

8. Pick a component with a similar functionality to the last component, i.e., if the last component did a particular string handling function, the next component should deal with string handling as well.

These criteria are just a few of those used by software designers. However, they serve to illustrate the types of information a designer uses to make the placement decision. The placement decision can be described relative to the current partial design. That is, the decision somehow exploits a relationship between the new component and those components already in the current partial design. These placement relationships may be based on spatial relationships (criteria 1, 6, and 7) or on informational relationships (criteria 2-4 and 8). In other words, these relationships connect components either because they are physically close in an algorithm's representation, or because they share common functions and/or data items.

The temporal order in which components are added to a design can be used to describe placement relationships. For example, placement relationships may connect the next component to the component which was added or modified by the last design step. However, placement relationships may also connect the next component to a component which was dealt with by the second to last or some other previous design step. Also, some of the relationships used by a placement decision may be based on the last design step, while other relationships, which are based on other previous design steps, may be used by the same placement decision. Thus, we can see that the information which is used by a design step may have a temporal order to it.

In order to model the placement process, we must identify a set of placement selection criteria to use. Based on the information which is available in our graph language model, the following relationships can be easily identified. They are based on both spatial and informational relationships between the structures of the graph language. These relationships are the following:

1. Components which are concurrent to each other.

2. Components which are in the same module.

3. Components which provide inputs to other components.

4. Components which receive outputs from other components.

The above criteria will be used to select the placement of the next component, based on a relationship with the current partial design. Some of the relationships are based on the physical positions of the objects in the graph language representation of an algorithm (spatial locality). Others are based on a common functionality or the direct communication of data between the graph language structures of an algorithm (informational locality).

These relationships also have a temporal order to them. If we use the above relationships to connect a new component to the component which was dealt with by the last design step, we say that it is a first order placement relationship. We call this a first order relationship, since the new component is related to a component which was treated by a decision which is one design step away. If we use the relationships to connect the next component to the component which was added or modified by the second to last design step, we say the placement relationship is second order. A relationship based on a component added or modified by a specific previous design step will be called an nth order placement relationship. It is possible that a design step uses multiple placement relationships. If this is the case, we will assign to the decision the temporal order of the highest order placement relationship which is used by that design step.

In order to model placement relationships, we will partition them into two classes. Relationships based on the above four criteria, which are of the first order, will comprise the first class. Relationships which are second or higher order form the second class. Since the relationships in the first class are based on spatial or informational localities, and are temporally local, i.e., are based on the last design step, we will call a placement based on these relationships a localized placement. Placements which are based on the relationships in the second class will be called non-localized placements. Non-localized placements are those based upon second or higher order temporal relationships. To model this broad spectrum of possibilities would be quite complex. Therefore, non-localized placements will be modeled probabilistically. This will allow us to represent all possible criteria for placements, without having to include the many combinations of relationships that some of these criteria would require. Thus, if the decision is made to use a non-localized placement, one of the components not in the current partial design will be randomly selected to be treated by the next design step.

Localized placements in our model are dependent on the last component which was dealt with by the previous design step. They will be modeled by one of the four locality relationships described above. Later, in Section 6.3, it will be seen that our design model allows for the possibility of errors during the design process. If an error occurs, control structures in the last component may not be complete. To allow these control structures to be completed, a localized placement may also choose the last component, if the previous design step did not complete it. The next design step then has the possibility of completing it.

The process of selecting the next component consists of looking at the last component which was added or modified in the current partial design. Then it is determined whether any of the components which are not in the current partial design are connected to the last component, based on one of these localized relationships. If so, one of those components is selected. There may be several possible candidates for selection, and modeling the decisions to choose between them would be complex. Therefore, the selection from the list of possible candidates will also be probabilistic. Thus, a localized placement probabilistically chooses the next component from a constrained subset of the remaining components, and a non-localized placement probabilistically chooses from the set of all remaining components.

Recall from the first part of Section 6.2 that several placement criteria were introduced. Those criteria are either modeled directly by the localized placements or indirectly by non-localized placements. Since we can combine the localized placements in any order, or include any component using a non-localized placement, placement criteria of first or any higher order can be represented by our model.

We will now discuss the process by which the decision to use a localized or non-localized placement is made. Many factors influence the progression a designer uses in creating an algorithm [7-9]. These include:

1. The designer's perspective of the problem.

2. Similarities between the problem and those encountered previously by the designer.

3. Available mathematical models of the problem.
93 C Probability of using a concurrent placement relationship. M Probability of using a module relationship. Ps ' Probability of using the same component. We call these probabilities the design methodology probabilities. By representing the placement decision as a set of probabilities, we will be able to examine how the relative frequency of use for each of the placement criteria affects the rate at which information is added to an algorithm's design. This will be done by examining the change in conceptual volume between successive partial designs. By varying the design methodology probabilities, different design methodologies can be examined. The next section will discuss how a component is processed, once its placement has been determined. 6.3 The Elaboration Process Previously in this chapter, we have been using the term component to refer to those algorithm graph elements which are added or modified in the current partial design as a 94 result of a design step. We will now consider a more complete description of those graph elements. In order to design an algorithm, the problem which is to be solved is often decomposed into managable size units. We will call these units design modules. In the literature [5,28,32], several criteria have been suggested for decomposing the design of an algorithm into design modules. 1. Divide the problem into major functions. Let each function be represented by a design module. 2. Design modules should be independent. The meaning of a design module should be independent of the context of the algorithm. 3. Design modules should be self contained, i.e., one should be able to replace a design module in an algorithm by a different one with the same function, and the algorithm should still be correct. 4. The size of a design module should be limited. Its representation may be limited to a number of printer lines. 5. Group primitive actions which perform a basic function together into a design module. 95 Most of these criteria are based on a functional subdivision of an algorithm, and thus are highly dependent on a designer's perspective of the problem. Two designers given a particular problem, will often arrive at different modular decompositions. The modular decomposition of an algorithm can also be hierarchical [5,28]. Thus, a designer may decompose an algorithm into a set of high level modules, which are then further decomposed into a set of lower leveled modules. Again, the choice of how to select these modules is variable, and different designers will arrive at different modular decompositions. This variability presents a problem for us in our attempt to model the elaboration process. Therefore, the modular decomposition of the algorithms used by the model will be prespecified. We will model the design process for an algorithm through the process of recreating a graph language representation of it. We will assume in our model, that algorithms are designed in a modular fashion. Recall, that the graph language is based on a functional representation of an algorithm. The objects in a control structure, or which are in control structures that are nested in the first structure, represent a functional unit. If we assume that the algorithm was created in a modular fashion, it is reasonable to assume that these functional units could be designed as a unit. Thus, these functional units can 96 represent a possible decomposition of the algorithm into design modules. These design modules are not necessarily unique. 
One designer may group one combination of functional units together into a design module, and another designer may use a different Vgrouping, even though the underlying functional units are the same. Thus, in order to control the experiments we will prespecify the design modules for an algorithm. These modules will represent the module placement relationship and will remain fixed for all experiments using a given algorithm. Design modules represent a high level decomposition of a problem. However, the information which is represented by a module may be too much to consider during one design step. Therefore, it is often necessary to further divide a design module into units which can be treated by a single design step. The portion of an algorithm which can be dealt with at any moment is limited. Halstead [16] and others [29] suggest that humans are limited in the number of mental discriminations per second which they can make. They suggest that the design units programmers deal with are not individual elements, but groups of objects. These groups have been called chunks [30], and represent those discriminations which can be made as a unit. In this research, a chunk is assumed to correspond to such a group of objects, but it also includes the design decisions which 97 relate those objects together. If an attempt is made to deal with too much information in one chunk, errors often result. Additionally, sometimes a designer makes an error, even when a chunk deals with limited information. Therefore, our model of the algorithm design process will consider the possibility of an error each time an object is considered for inclusion in a chunk. In our model, 3 522g; represents Egg objects Epigg_ gpg design gpgp. A chunk is what the previous sections in Chapter six have refered to as a component. Chunks are the result of decomposing a design module into units which can be treated by a single design step. Although, our model will prespecify the modular decomposition of an algorithm, chunks will be determined at the time the algorithm is designed. Chunks are determined as follows. First, a chunk must be contained within a design module. A control structure is used to identify a chunk. Objects which are in that control structure, or which are in control structures nested in the first structure are candidates for a chunk. Note, these objects also represent a functional division of the algorithm. A chunk differs from a design module, in that the size of a chunk is tightly controlled. A probabilistic limit will be placed on the number of objects which may form a chunk. Thus, the functional decomposition, which can be seen in the graph language representation of an 98 algorithm is used in two ways. First, at a high level we will group these functional units into design modules and feed a description of this grouping into our model. Second, the functional units will allow the model to form basic units which are small enough to be treated by a design step, yet represent related acitivities. These second units are called chunks. The basic action of our model is a design step. It involves the determination of the starting point for a chunk by the placement process, and then the subsequent elaboration of that chunk. This process is being modeled by recreating a graph language representation of an algorithm. Therefore, we will place one further restriction on the elaboration process. That is, we will restrict the elaboration process to adding graph language elements to a chunk. 
Since the target algrithm is predetermined, the only way we can insure that the proper algorithm will be designed is to prespecify the elements in the algorithm and eliminate backtracking in the design process. Thus, the elaboration process consists of identifying the objects in a chunk and adding them to the design, subject to the limitations of the designer making an error. As mentioned earlier, each time an object is considered for inclusion in a chunk, the possibility of an error will be considered. This is done for two reasons. First, it is a means to limit the size of a chunk. Second, all designers make errors. This allows 99 the impact of these errors to be represented in our model. 6.4 The Procedural Model The previous sections described an informal model of the algorithm design process and its relationship to current software design theory. We will now specify our model of a design step in procedural terms. Figure 6.1 gives a high level outline of our algorithm design model. The model consists of two parts: an environment and a designer. The environment represents the information which is available to the designer. It includes the methodology probabilities, the graph language representation of the algorithm, the localized relationships between the graph language components, and statistics about the algorithm which is being designed. Most of this information is provided to the model and is predetermined. The designer part of the model represents the process of algorithm design. The placement and elaboration activities are represented by this portion of the model. We will now examine the designer process. The designer process starts in box a of Figure 6.1. Using the methodology probabilities, it decides whether to make the next placement by using a localized or a non-localized decision. If a non-localized decision is selected, the model enters box d, which randomly selects any chunk which is not in the current partial design. If a 100 Environment Methodology Context sensitive Probability Relationships Distribution select . localized (localized) or non-lcc D for next 681“” chunk a A i ‘ choose (non-localized) next chunk Algorithm Resultant Components Algorithm & Statistics Figure 6.1 Abstract model of the algorithm design process. 101 localized decision was selected in box a, the model enters box b. Here, a particular localized relationship is chosen, based on the methodology probabilities. Then the model enters box c where a chunk, which is connected to the last elaborated chunk, is randomly chosen, based on the relationship which was selected in box b. Then from either box c or d, the model enters box e. In box e, the objects which form the chunk are determined. These objects may be input operands, results, operators, conditionals, or other control structures which are nested in the structure that starts the current chunk. The chunk elaboration process consists of selecting these objects from the algorithm's representation in the environment and placing the objects into the current partial design. As' the objects are selected, statistics are kept which indicate how many objects have been referenced and how many unique items are in the current partial design. These statistics allow the conceptual volume to be calculated at the end of each design step. When objects are identified as possible candidates for inclusion in a chunk, the possibility that the designer is making an error is considered. 
This is modeled by counting the number of objects in the chunk and probabilistically limiting the size of the chunk. This accomadates the notion discussed by Halstead [16] and others [29], that human designers are limited in the number of mental 102 discriminations that they can make as a unit. If an error is indicated, the object is not added to the cunk, and the elaboration of the chunk is complete. The model then reenters box a and continues in this fashion until all objects have been added to the design. This completes our discussion of the abstract model of the algorithm design process. We have modeled this process as producing a sequence of partial designs. A design step represents. those actions which are used to move from one partial design to its successor. The actions, which make up a design step, were described by two processes: a placement process and an elaboration process. The next section will describe a simulation, which was developed based on this abstract model. 6.5 The Simulation Model of Algorithm Design The previous sections described the abstract model of the parallel algorithm design process. In this section, a simulation based on the abstract model will be presented. The simulation model was implemented in Pascal on the Cyber 750. The simulation model consists of two parts. An environment and a designe . The environment represents the information which is used by the model during the design process. It includes a modular description of the target 103 algorithm, the methodology probabilities, the placement relationships, and the counts which are used to generate the software metrics. The designer represents the process of algorithm design. The designer process embodies the logic that uses the information in the environment to make the placement and elaboration decisions. 6.5.1 The ModelLs Environment The environment of the model was implemented as a series of tables and pointers, see Figure 6.2. The graph language algorithm is represented by tables STRUCT, OPERAND, and AFUNCT. Table STRUCT describes the structure and state of elaboration for the algorithm. Each graph control structure has an entry in table STRUCT. There are pointers from STRUCT for each graph element to tables OPERAND and FUNCT. A pointer to table FUNCT identifies the control structure's operator, and pointers to table OPERAND identify the control structure's operands, these are the control structure's inputs and results. Additionally, table STRUCT has entries for each graph control structure which indicate the control structure's context, sequential or parallel. There are also flags to indicate if each of the control structure's elements have been added to the design. Tables OPERAND and FUNCT each have one entry for every operand/operator that are used in the algorithm. Each entry STRUCT \~ CONCUR 104 OPERAND FUNCT NOTDONE l INPUTS \ \ OUTPUTS MODULE .Figure 6.2 Simulation model's representation of the design environment. 105 consists of two flags. The flags correspond to the context, sequential or parallel, under which references to the operands/operators occur. The flags are set after the first reference to the objects in a particular context are made. This is done in order to achieve a count of unique algorithm elements for the conceptual volume metric. The first order locality relationships are represented by tables CONCUR, INPUTS, OUTPUTS, and MODULE. 'CONCUR represents the relationship between graph control structures that are concurrent to each other. 
INPUTS and OUTPUTS represent the relationships of control structures that provide inputs or receive outputs from other control structures. Finally, MODULE indicates which graph control structures form modules of the algorithm. Each graph control structure which participates in one of the relationships with other graph control structures, has entries in the appropriate tables. An entry consists of a pointer to table STRUCT which identifies a control structure. Additionally, it includes a pointer to table STRUCT for each control structure that is related to the first entry. Table NOTDONB is .used to allow a non-localized placement to choose the next graph control structure. It contains entries for all graph control structures which have not been fully elaborated. An entry consists of a pointer to table STRUCT. 106 The status of the current design is represented by several variables in the simulation model. P08 is a pointer to table STRUCT for the graph control structure of current interest. STEPS indicates how many chunks have been processed. Finally, UNIQUE and TOTELEMENTS hold the element counts which are required for generating the conceptual volume metric. 6.5.2 Design Methodolggies-In The Simulation Model A design methodology is represented by a set of frequency distributions. These distributions indicate the relative frequency of use for each of the placement criteria. For convience, these distributions are represented in the model as a cumulative probability distribution, Figure 6.3. The entries are used to select a decision type for choosing the next chunk. The criteria are based on the control structure that identifies'the last designed chunk (POS). The basic criteria and their associated codes follow: S The same control structure should start the next chunk. 107 M a» ................... C . ................... R d----_-------- o . I . s . : e 0-0 1.0 Figure 6.238imulation model's representation of a design methodology. 108 1,0 The next chunk should provide an input to, or receive an output from the last chunk. C The next chunk should be concurrent to the last chunk. M The next chunk should be in the same module as the last chunk. R randomly select the next chunk (a non-localized placement). In the simulation model, table POSITION represents this probability distribution. The table has one entry for each of the accented points in the plot of Figure 6.3. Different design methodologies are modeled by changing the shape of the curve. 6.5.3 The Design Process This section will discuss the simulation's implementation of the placement and elaboration processes. First, the model is initialized. This is done by reading a graph language based description of the algorithm into table STRUCT. The locality relationship tables CONCUR, INPUTS, 109 OUTPUTS, and MODULE are provided a description of the first order relationships between the components of the graph language algorithm. From these descriptions, tables NOTDONE, OPERAND, and FUNCT are also initialized. Table POSITION is initialized by reading a cumulative probability distribution. This distribution represents the relative frequency for each of the placement criteria. Finally, since the model is probabilistic, a pseudo-random number generator is required. The seed for the generator is initialized by reading the system clock. This gives a different pseudo-random number sequence for each trial. The simulation's implementation of the designer process will now be discussed. First, the placement of the next chunk is determined. 
The simulation's implementation of the designer process will now be discussed. First, the placement of the next chunk is determined. This is done for all chunks except the first one. The first chunk always starts with the outermost graph control structure. If we refer to Figure 6.1, we can trace the placement process. The placement process is represented by boxes a through e in Figure 6.1. In box a, a pseudo-random number is generated. This number is used to select the type of decision from the cumulative probability distribution which is in table POSITION. If a non-localized placement is selected, box d is entered. In box d, a chunk is randomly selected from the set of all graph control structures which are not in the current partial design (table NOTDONE). If a localized placement had been selected in box a, the model would have entered box b, instead of box d. In box b, a first order placement relationship is selected. This is done by generating a pseudo-random number and using it to access table POSITION. Once this is done, box c is entered. In box c, the next chunk is selected by examining the table for the relationship that was selected in box b. If more than one graph control structure is related to the last chunk, one of them is randomly selected. If no structures are related to the last chunk using the appropriate relationship, a different placement decision is tried by returning to box a.

Once the starting control structure for the next chunk has been selected in either box c or d, box e is entered. Box e represents the elaboration process. The objects which constitute a chunk are identified from tables STRUCT, OPERAND, and FUNCT. If an object is to be included in a chunk, the entry for the object in table STRUCT is marked. These objects may be results, operators, input operands, conditionals, or nested control structures. Objects are added to a chunk in a predetermined order. First, the need for a chunk is determined by its results. Therefore, the result operands are the first objects which are added to the chunk. After the results are identified, the actions which yield those results are considered. Therefore, the operators are added next. Input operands and conditionals are lower level details; therefore, they are added last. Although all designers may not use this ordering for adding objects to a chunk, varying this order would not significantly impact the results from our model.

As objects are identified from the tables, the possibility of an error is considered before an object may be added to the design by marking the tables in the environment. This is done by procedure ERROR. This procedure generates a pseudo-random number and tests it against an error distribution, e^(-x/10), where x is the number of objects in the current chunk. This distribution allows us to limit the number of items in a chunk, to match the notion discussed by Halstead and others that the number of discriminations which a designer can make as a unit is limited. The distribution causes the probability of an error to increase exponentially as objects are added to a chunk. It bounds the number of objects at 21. If the pseudo-random number exceeds the value of the distribution, an error is assumed and the current chunk is complete, without including the new object. As table STRUCT is marked to indicate the object is now in the current partial design, counts of the objects which are referenced and the number of unique objects are accumulated. These counts are used to compute the software metrics.
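The chunk-closing test performed by procedure ERROR can be sketched as below. The form e^(-x/10) follows the description above; the hard bound of 21 objects mentioned in the text would require an additional explicit cutoff, which is omitted here as an open detail.

    program ErrorSketch;
    { compare a pseudo-random draw against e^(-x/10), where x is the
      number of objects already in the current chunk; a draw above the
      distribution value indicates an error and closes the chunk }
    var
      x : integer;
    begin
      Randomize;
      x := 0;
      while Random <= exp(-x / 10.0) do   { draw within the distribution: accept }
        x := x + 1;                       { the object joins the chunk }
      writeln('error indicated; chunk closed with ', x, ' objects');
    end.

Because e^0 = 1, the first object is always accepted, and the acceptance probability then decays exponentially, so long chunks become increasingly unlikely.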
This process of identifying the objects from the graph language description, and marking that description to indicate that the objects are in the current partial design, continues until either an error occurs or no eligible objects are left to include in the current chunk. Eligible items are those in the control structure which starts the chunk, or those that are in control structures which are nested in the first control structure. Recall that these objects represent a functional decomposition of the graph language algorithm.

Finally, the elaboration process finishes in box e by updating the tables in the environment to reflect the new partial design. This updating consists of removing entries from the relationship tables for control structures which have all their objects completed. The software metrics are then computed for the current partial design, and the process continues by reentering box a. Figures 6.4 through 6.6 summarize the designer process. The next section will illustrate the elaboration process by tracing a sample algorithm through two design steps.

    repeat
        if not first chunk then
            choose-next-chunk
        else
            outer structure is next chunk;
        elaborate next chunk;
        update metrics;
        update environment;
        output metrics;
    until (all structures are elaborated);

Figure 6.4 An overview of the designer component of the simulation model.

    procedure choose-next-chunk;
    begin
        POS := null;
        repeat
            generate random number;
            find match to random number in table POSITION;
            case match of
                S: next chunk is current structure;
                R: select randomly from table NOTDONE;
                I: select randomly from table INPUTS;
                O: select randomly from table OUTPUTS;
                C: select randomly from table CONCUR;
                M: select randomly from table MODULE;
            end case;
        until (POS <> null);
    end;

Figure 6.5 Procedure for choosing selection criteria and next chunk.

    procedure elaborate;
    begin
        ERROR := false;
        for I := 1 to 5 do
            while STRUCT[POS] is not done and ERROR is false do
                case I of
                    1: mark-label;
                    2: mark-results;
                    3: mark-operator;
                    4: mark-input-operands;
                    5: mark-conditional;
                end case;
    end;

Figure 6.6 An overview of the elaboration procedure for the simulation model.

6.6 An Illustration of the Design Step Process

To illustrate the flow of the design process, the algorithm of Figure 6.7 will be used. We will trace through two design steps.

[Figure 6.7: A sample algorithm to illustrate the flow of the algorithm design model.]

The starting point is the outermost graph control structure, PROG. First its label is marked, then the result OUT is marked. At this point the element counts for the metrics show two references and two unique items. The implied operator DO WHILE is then marked. Then the input operands, which in this case are graph control structures, are marked. Assume that f1, f2, and f3 are probabilistically selected and that no errors occur when they are marked, but that an error is probabilistically indicated when we try to mark f4. Since an error occurs, f4 is not marked and processing of the first chunk is complete. Thus at the end of design step one, the element counts and conceptual volume will be those found in Table 6.1.
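The conceptual volume figures in Tables 6.1 and 6.2 follow directly from the element counts. A minimal sketch of the computation, with the counts from the tables supplied by hand:

    program VolumeSketch;
    { conceptual volume V = N log2 n, where N counts element references
      and n counts unique object-context pairs; the counts below are
      those of Tables 6.1 and 6.2 }

    function Log2(v : real) : real;
    begin
      Log2 := ln(v) / ln(2.0);
    end;

    function Volume(refs, uniq : integer) : real;
    begin
      Volume := refs * Log2(uniq);
    end;

    begin
      { step one: 4 sequential + 2 parallel references and unique pairs }
      writeln('step one: V = ', Volume(4 + 2, 4 + 2):0:1, ' bits');   { 15.5 }
      { step two: 4 sequential + 6 parallel }
      writeln('step two: V = ', Volume(4 + 6, 4 + 6):0:1, ' bits');   { 33.2 }
    end.

In this small trace every element happens to be referenced exactly once, so N and n coincide; in general N grows with every reference while n grows only on the first reference to an object in a given context.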
The second elaboration is then started. Assume that a non-localized placement was selected based on the methodology probabilities. Also assume that, from the non-completed structures, structure f2 was randomly chosen. Since f2's label has already been marked, it is not remarked. The result W is marked, then the operator +, and finally the input operands x and y are marked. Assume this time that no errors are generated. Since all of the objects in control structure f2 have been added to the current partial design, this completes step two. Table 6.2 shows the updated metrics. Now assume a localized placement is selected. Then the determination is made to use the concurrent relationship. Since f2 started the last chunk, structure f3 will start the next chunk. Processing continues in this fashion until the entire algorithm has been completed.

Table 6.1 Element counts and conceptual volume for step one of the abstract model illustration.

    rank i   Sequential Element   F(i,s)      rank i   Parallel Element   F(i,p)
    1        PROG                 1           1        f2                 1
    2        OUT                  1           2        f3                 1
    3        DO...WHILE           1
    4        f1                   1

    Sequential: 4 unique elements, 4 references.  Parallel: 2 unique elements, 2 references.

    V = (4+2) log2 (4+2) = 15.5 bits.

Table 6.2 Element counts and conceptual volume for step two of the abstract model illustration.

    rank i   Sequential Element   F(i,s)      rank i   Parallel Element   F(i,p)
    1        PROG                 1           1        f2                 1
    2        OUT                  1           2        f3                 1
    3        DO...WHILE           1           3        W                  1
    4        f1                   1           4        +                  1
                                              5        x                  1
                                              6        y                  1

    Sequential: 4 unique elements, 4 references.  Parallel: 6 unique elements, 6 references.

    V = (4+6) log2 (4+6) = 33.2 bits.

6.7 Summary

This chapter has presented a discussion of the algorithm design process. In order to organize and integrate the many suggestions offered in the literature, we have divided the design process into two phases: a placement process and an elaboration process. We have called the process of determining the positions in an algorithm at which to make modifications the placement process. We saw that many design methodologies, e.g., top-down or bottom-up design, focus on this process. We also examined the process of modularizing an algorithm and determining what objects should compose a module. We called this process the elaboration process.

This chapter then presented a model of the algorithm design process which was based on the placement and elaboration processes. An implementation of the model as a simulation was presented. This simulation model will allow us to examine the impact of using localized and non-localized placements in the design process. Since we have divided placement decisions into two classes, those which use first order relationships and those which use higher order relationships, we will be limited in our results to comparing only first order decisions to the others as a group. While comparisons between the use of second and third order relationships may be interesting, such comparisons are beyond the scope of this model. In the next chapter, the simulation model will be used to examine the impact of making localized and non-localized placements on algorithms with different size parallel components.

CHAPTER 7
The Experiments

This chapter will use the simulation model of the algorithm design process which was developed in Chapter six. Experiments will be conducted to study the impact of different placement decisions on the design of parallel algorithms with different size parallel components. First, some properties of the two sample algorithms, MBFR and ROOTF, will be discussed. Section 7.2 will then describe the different design methodologies which were used in the experiments. The methodologies are described in terms of their use of localized and non-localized placements. Section 7.3 presents a preliminary discussion of the experimental results. These results suggest that the structure of an algorithm impacts a design methodology's performance.
In order to examine the role of an algorithm's structure, Section 7.4 describes the differences in the structures of algorithms ROOTF and MBFR which can be observed using the software metrics. This is followed in Section 7.5 by a discussion of how these structural differences impact the elaboration process. Section 7.6 presents a similar discussion for the placement process. This is followed in Section 7.7 by a discussion of how these results relate to human designers.

7.1 The Sample Algorithms

The ROOTF algorithm from Chapter two and the MBFR algorithm from Chapter five were used in the experiments. Graph language representations of the algorithms are shown in Figures 7.1 through 7.3. Both algorithms have single local parallel components. The parallel component of the ROOTF algorithm is small, while the parallel component of the MBFR algorithm is much larger. Thus, the algorithms can be used to examine the effects of the amount of parallelism present in an algorithm on the design process for algorithms in which the parallelism is localized.

[Figure 7.1: Graph language representation of the ROOTF algorithm.]

[Figure 7.2: Graph language representation of the MBFR algorithm (this graph is expanded in Figure 7.3).]

[Figure 7.3: The expanded graph for the MBFR algorithm of Figure 7.2.]

Algorithm ROOTF has a structure which is predominantly sequential. Recall from Chapter two that it was created by making a simple transformation to the sequential algorithm BISECT. Although it contains a small parallel segment, the programmer can still visualize the basic control structure as sequential. In fact, the primary parallel construct, the PARFOR statement, can be replaced by the sequential FOR statement, and the algorithm will still be correct; a sketch of this appears at the end of this section. Therefore, the performance of the different design methodologies on algorithm ROOTF should relate to their performance on sequential algorithms as well as on algorithms with small localized parallel components.

The MBFR algorithm, on the other hand, has a larger parallel component. The conceptual volume of its parallel component is 652.3 bits, versus 53.7 bits for algorithm ROOTF. It contains a more complex interface between the sequential and parallel components. Unlike the root finding algorithm, the number of computations made by the parallel processes will not be the same. It will require more time to determine the bid for a bin with a large number of items than for a bin which is almost empty. This requires the designer to be more aware of synchronization between the processes. Likewise, the designer needs to deal with more communication between the concurrent processes and the master process. ROOTF only requires passing the value of the function at the subinterval's midpoint, while MBFR must pass a bid and the value of an item to replace, or else communicate to the master that those values are null. Therefore, the performance of the design methodologies on algorithm MBFR will be representative of their performances on other, more complex, parallel algorithms.
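The following is a plausible reconstruction of ROOTF's localized parallel segment, not the dissertation's actual code, which is given only in the graph language. It assumes the k-section scheme implied above: the interval is cut into NPROC subintervals, the function is evaluated at the cut points, and the master keeps the first subinterval containing a sign change. Because the evaluations are independent, the dissertation's PARFOR can be written as the sequential FOR used here without affecting correctness.

    program RootfSketch;
    const
      NPROC = 4;
      EPS   = 1e-6;
    var
      a, b, h : real;
      fv      : array[0..NPROC] of real;
      i       : integer;

    function f(x : real) : real;
    begin
      f := x * x - 2.0;                   { sample function: root at sqrt(2) }
    end;

    begin
      a := 0.0;
      b := 2.0;                           { assume f(a) and f(b) differ in sign }
      while (b - a) > EPS do
      begin
        h := (b - a) / NPROC;
        for i := 0 to NPROC do            { PARFOR in the parallel version }
          fv[i] := f(a + i * h);          { each process returns one value }
        i := 1;
        while fv[i - 1] * fv[i] > 0.0 do  { master scans for the sign change }
          i := i + 1;
        b := a + i * h;                   { keep that subinterval }
        a := a + (i - 1) * h
      end;
      writeln('root near ', (a + b) / 2.0 : 0 : 6)
    end.

Note that each process contributes a single function value, which is exactly the small sequential-to-parallel interface described above.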
7.2 Design Methodologies Used in the Experiments

The design of each algorithm was simulated using five classes of design methodologies. Each design methodology was represented by a probability mix. The mixes are shown in Table 7.1. For each mix, the probability of a localized placement was evenly divided among the four first order relationships. The set of mixes was designed to examine the effects of using only localized placements, only non-localized placements, or various combinations of both types of placements.

Although the use of 100% high order placements can be modeled directly, the nature of the model developed here precluded the use of only first order localized placements. This is because not all possible localized relationships are identifiable in the graph language. All algorithm structures are related to all other structures in an algorithm, but only a subset of these possible relationships are considered by the model. These missing relationships present a difficulty for the model. It is possible that they are the only relationships for the last elaborated control structure. If this is the case, the model would not be able to select another structure using a first order relationship. This would cause the model to deadlock. Thus, the use of predominantly localized placements was modeled as 95% localized placements and 5% non-localized placements.

Table 7.1 The design methodology probability mixes used in the parallel algorithm design experiments.

    Mix   non-localized percentage   localized percentage
    1     100                        0
    2     75                         25
    3     50                         50
    4     25                         75
    5     5                          95

In Figures 7.1 and 7.2, the high level modular decompositions which were used for the first order module placement relationships can be seen. The graph control structures within a dashed box form one possible high level modular decomposition. These modular decompositions were used for all of the experiments. The design mixes in Table 7.1 represent the use of varying combinations of localized and non-localized placements. By modeling design methodologies using these mixes, we will be able to see whether there is a correspondence between the distribution and granularity of the parallel information in an algorithm, and the type of information which should be used to design such algorithms.

7.3 The Experiments

In the experiments, the designs of the root finding and bin packing algorithms were simulated. Since the model is stochastic, 20 trials were run for each algorithm and design mix. Table 7.2 gives the average number of design steps for each set of trials.

Table 7.2 The average number of design steps which were required for each design mix, for algorithms MBFR and ROOTF.

    Mix   Average Number of Design Steps
          ROOTF      MBFR
    1     49.0       70.2
    2     51.3       67.3
    3     50.5       70.0
    4     50.8       72.3
    5     48.0       72.2

In Table 7.2, we can see several things. First, for each design mix, on the average, it took more steps to design algorithm MBFR than algorithm ROOTF. If we look at just algorithm ROOTF, we can see that each design mix, on the average, took a different number of design steps. The same observation can be made between the design mixes for algorithm MBFR. These observations suggest two things.
First, the design mixes caused different amounts of information to be added to each of the two algorithms. Second, for a given algorithm, different design mixes cause information to be added at different rates. Thus, we can see that not only are there differences in the performance of a design mix between the two algorithms, there are also differences in the performances of the five design mixes for a given algorithm.

Table 7.3 gives the average amount of information added to the ROOTF and MBFR algorithms as the result of a design step, for each design mix. Also shown are the differences in the averages for a design mix between the two algorithms.

Table 7.3 The average information (bits of conceptual volume) added to the algorithm by a design step for each design mix.

    Mix   Average Information Added   Difference Between the Averages
          ROOTF      MBFR             for Algorithms MBFR and ROOTF
    1     12.26      13.30            1.04
    2     11.71      13.87            2.16
    4     11.83      12.91            1.08
    5     12.52      12.93            0.41

In Table 7.3, we can see that an average design step always added more information to algorithm MBFR than to algorithm ROOTF. But, by examining the differences in the amount of information added between the two algorithms, we can see that the amount of these differences is not constant. If we look at the differences for mixes 2 through 5, we can see that they decrease from 2.16 bits to 0.41 bits. If we look at just algorithm ROOTF, we can see that as we move from design mix 2 to mix 5, the averages increase from 11.71 bits to 12.52 bits. But, if we look at the same averages for algorithm MBFR, the averages decrease from 13.87 bits to 12.93 bits. Thus, we can see that for our two algorithms, the design mixes behaved in different ways. A given mix caused information to be added to algorithm ROOTF at a different rate than to algorithm MBFR. Also, for a given algorithm, the different mixes caused information to be added at different rates.

We can see further evidence of these differences by examining Figure 7.4. In Figure 7.4, a temporal estimate of five minutes was associated with a design step. Recall that a design step represents processing a chunk. Therefore, it represents a non-trivial set of decisions. Five minutes was selected as a rough estimate of the average time it would take to make those decisions. Using this estimate, the average design time for each algorithm and design mix was plotted. Design mix one was omitted from the plots, because it represents the use of only non-localized placements. Since some first order relationships would normally be used in the design process, and mixes 2 through 5 represent the increasing use of first order placements, they are of more use to us here.

[Figure 7.4: Average design time, in minutes, for algorithms ROOTF and MBFR as a function of the design mix. The MBFR curve spans roughly 335 to 365 minutes; the ROOTF curve spans roughly 240 to 260 minutes.]

We can see immediately from Figure 7.4 that the MBFR algorithm, on the average, took longer to design than the ROOTF algorithm. This is to be expected, since it is larger. But we can also see that while mix 2 took the least time to design algorithm MBFR, it took the most time to design algorithm ROOTF. Also, mix 5 took the least time for algorithm ROOTF, but the most time for algorithm MBFR. Therefore, we can see that the design methodology which took the least time differed between the two algorithms.
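As a check on the plot, the design times follow directly from Table 7.2 under the assumed five minutes per step; for example:

    mix 2, MBFR:   67.3 steps x 5 minutes = 336.5 minutes
    mix 5, MBFR:   72.2 steps x 5 minutes = 361.0 minutes
    mix 2, ROOTF:  51.3 steps x 5 minutes = 256.5 minutes
    mix 5, ROOTF:  48.0 steps x 5 minutes = 240.0 minutes

The absolute values depend entirely on the five minute estimate, but the orderings among the mixes do not.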
The results from Tables 7.2 and 7.3, and Figure 7.4, suggest two major observations. First, the choice of a design methodology affects the rate at which information is added to an algorithm during its design, for the two sample algorithms. Second, since the performance of the design methodologies differed between the two algorithms, the structure of an algorithm impacts the performance of a design methodology. We will examine how the two phases of a design step are impacted by an algorithm's structure in Sections 7.5 and 7.6. First, however, we will look at the differences in the structures of algorithms ROOTF and MBFR.

7.4 Structural Differences Between Algorithms ROOTF and MBFR

In Table 7.4, several differences in the structures of algorithms ROOTF and MBFR are identified. The first obvious difference between the algorithms is their size. Algorithm MBFR is about 55% larger than algorithm ROOTF. MBFR has a conceptual volume of 934 bits, while the conceptual volume of algorithm ROOTF is 601 bits. Likewise, the sizes of the parallel and sequential components differ.

Table 7.4 Structural differences between algorithms ROOTF and MBFR.

                                                           ROOTF   MBFR
    Conceptual Volume (V, in bits)                         601     934
    Parallel Component Conceptual Volume (Vp)              53      652
    Sequential Component Conceptual Volume (Vs)            480     160
    Ratio of Parallel Component Conceptual Volume
      to Conceptual Volume (Vp / V)                        .09     .70
    Ratio of Sequential Component Conceptual Volume
      to Conceptual Volume (Vs / V)                        .91     .30

The parallel component of algorithm MBFR has a size of 652 bits, versus 53 bits for algorithm ROOTF. The sequential component of algorithm MBFR has a size of 160 bits, versus 480 bits for algorithm ROOTF. In algorithm ROOTF, the sequential component dominates the conceptual volume, while in algorithm MBFR, the parallel component dominates. For algorithm ROOTF, 9% of the volume is the result of the parallel component and 91% is the result of the sequential component. On the other hand, in algorithm MBFR, the distribution is 70% parallel and 30% sequential. Thus, our metrics have allowed us to see several differences in the informational structure of algorithms ROOTF and MBFR. In the next sections, we will see how these structural differences impact the design process. First, we will examine how they impact the elaboration process in Section 7.5. Then, we will examine their impact on the placement process in Section 7.6.

7.5 Effects of the Algorithm's Structure on the Elaboration Process

Since algorithm MBFR is larger than algorithm ROOTF, we might expect that there are more unique objects in it. Since conceptual volume is based on the number of unique objects as well as on references to them, this might cause the addition of an object to algorithm MBFR to have a larger informational impact. Recall that V = N log2 n. Since algorithm MBFR has 73 unique object-context pairs, and algorithm ROOTF has only 51, this explains some of the difference in the average information added by a design step between the two algorithms. But it does not explain the opposite trends in the average information added for each design mix, i.e., why design mix 5 adds more information than mix 2 for algorithm ROOTF, whereas mix 2 adds more information than mix 5 for algorithm MBFR. Therefore, we must look for other reasons why the information added by a design step varies between the two algorithms.

Since algorithm MBFR is larger than algorithm ROOTF, it has more graph control structures. And, as can be seen in Figures 7.1 through 7.3, its graph control structures are nested to more levels than those in algorithm ROOTF.
Thus, the size of a chunk may potentially be larger in algorithm MBFR than in algorithm ROOTF. Recall that a chunk must be contained in a functional unit. A functional unit is denoted by a graph control structure and those graph control structures which are nested in it. Since more items may be in a chunk in algorithm MBFR, more information may be added to the algorithm, on the average, by a design step. The number of object-context pairs in the algorithms and the sizes of the chunks seem to explain why the amount of information added by a design step varies between the two algorithms. The next section will examine why the five design mixes performed differently for a given algorithm.

7.6 Relationship Between the Placement Process and the Rate Information is Added to an Algorithm's Design

The results from the previous section suggest reasons why the design mixes added information to the two algorithms at different rates. We will now examine why, for a given algorithm, the five design mixes performed differently. Since the impact of the elaboration process is the same for all of our design mixes, we will examine the impact of an algorithm's structure on the placement process.

The placement process is used to select the next structure to add to an algorithm's design. Depending on the relationship which is used by the placement process, and the structure of the algorithm, the new structure may cause a context shift. Recall, the context of a structure represents whether it is in a sequential or a parallel component of an algorithm. The context under which an object is referenced affects the volume metric. Recall that V = N log2 n, where N is the number of objects referenced and n represents the number of unique object-context pairs. Thus, a reference to an object in a new context has a larger impact on the volume than a reference to the object in the old context. Therefore, a placement decision which causes a context change may result in information, as measured by the volume metric, being added to an algorithm at a higher rate.

If an algorithm is designed using placement decisions which cause context changes late in the design of an algorithm, the result will be a large jump in the information growth rate when the context change occurs. Since N will be larger late in the design of an algorithm, a small change in n will have a large impact. If context changes occur early, the initial growth rate will be higher, but the effects of the context change will not be as significant. Since N will be small early in the design process, a change in n will not have as large an impact on the volume as it would later. Therefore, while the total amount of information added to an algorithm is the same, how early or late in an algorithm's design the context changes occur, and the number of context changes, will affect the rate at which this information is added.

In the model, the choice of a placement decision may affect when a context change can take place. Whether a placement decision causes a context change or not may differ between two algorithms. We will now look at the placement decision types which were used in our model, to see how this might happen. Since the concurrent relationship was not used for our two sample algorithms, it will not be considered here.
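The size of this effect can be illustrated with V = N log2 n, using invented counts rather than those of either sample algorithm. Suppose a partial design has N = 20 references and n = 10 unique object-context pairs, and a newly referenced object raises n to 11 while raising N to 21:

    before:  V = 20 log2 10 = 66.4 bits
    after:   V = 21 log2 11 = 72.6 bits        (a jump of 6.2 bits)

If the same context change instead occurs when N = 100 and n = 10:

    before:  V = 100 log2 10 = 332.2 bits
    after:   V = 101 log2 11 = 349.4 bits      (a jump of 17.2 bits)

The same one-unit change in n thus adds information at a much higher rate late in the design, since every one of the N accumulated references is reweighted by the larger log2 n.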
The input and output relationships connect structures based on the use of the same data. However, this data might be used in both the sequential and parallel components of an algorithm. In algorithm ROOTF, 60% of the structures which have input and output relationships are connected to a structure in a different context. In algorithm MBFR, only 35% of the structures which have input and output relationships are related to structures in a different context. A higher percentage of these structures are related to structures in a different context in algorithm ROOTF. Thus, input and output placement decisions may cause a context change more often in algorithm ROOTF than in algorithm MBFR. Therefore, a design mix which places an emphasis on the use of the input and output relationships may have different impacts on the two sample algorithms.

The module relationship is used to make a placement based on a high level modular decomposition of an algorithm. The decompositions which were used in the experiments only relate structures which are in the same context. Therefore, the use of a module placement decision has the same impact on both algorithms.

The use of a non-localized placement may cause a context change in algorithm ROOTF with a different probability than in algorithm MBFR. In algorithm ROOTF, 9% of the algorithm represents the parallel component and 91% of the algorithm represents the sequential component. In algorithm MBFR, the distribution is 70% parallel and 30% sequential. Therefore, when a structure is randomly selected using a non-localized placement, there is a higher probability of a context change in algorithm MBFR than in algorithm ROOTF. Thus, a design mix which emphasizes non-localized placements may cause a context change more often in algorithm MBFR, and thus add information to it at a higher rate.

It appears that the input, output, and non-localized placements may be responsible for controlling context changes in our experiments. Since input and output are localized placements, mix 5 represents the highest probability of their use. Mix 5 also represents the smallest use of non-localized placements. As Table 7.3 shows, when algorithm ROOTF was designed using mix 5, as expected, it had the highest average information growth rate. This was expected, since mix 5 allows for the highest use of the input and output placement relationships and the smallest use of non-localized placements, and thus allows the most context changes for algorithm ROOTF. Mix 2 represents the use of localized placements only 25% of the time. When mix 2 was used to design algorithm ROOTF, the information growth rate was the lowest. This was also expected, since a non-localized placement for algorithm ROOTF has a low probability of a context change.

Earlier, when we discussed the input and output placements, we saw that they were less likely to cause a context change for algorithm MBFR than for algorithm ROOTF. But non-localized placements had a higher probability of causing a context change for algorithm MBFR. Therefore, we might expect different performances for the two algorithms from the design mixes, based on the impact of the input, output, and non-localized placements. Indeed this happened. For algorithm MBFR, mix 2 caused the highest information growth rate, and mix 5 caused the smallest. Thus, we can see that the structure of an algorithm impacts the way in which a design methodology adds information to the algorithm as the result of a design step. One way this occurs is by promoting or preventing context changes when structures are selected for elaboration.
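The effect of the non-localized placements can be given a rough figure. If a fraction p of an algorithm's structures lie in the parallel context, and both the last structure and the next structure are treated as uniform random selections (an idealization; the model only randomizes the next one), the chance that the two lie in different contexts is

    P(context change) = 2p(1 - p).

Taking the volume ratios of Table 7.4 as a proxy for p gives 2(.09)(.91) = .16 for ROOTF and 2(.70)(.30) = .42 for MBFR, consistent with non-localized placements causing context changes considerably more often in algorithm MBFR.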
The next section will consider how these context changes are related to the activities of human designers.

7.7 Correspondence Between the Model's Results and Human Designers

We will now attempt to explain the model's behavior in terms of real designers. Since humans are limited in the amount of information which they can deal with at a particular moment [16], it would be useful if a design methodology could help control the amount of information which must be used to make a design decision. Some parts of a parallel algorithm can be designed independent of the other parts. Therefore, less information is needed to design these independent parts than if there were dependencies between them. However, process synchronization and intercommunication often require the designer to consider several processes when making a design decision about these activities. Some of these activities may be in a parallel portion of the algorithm, while others may be in a sequential portion of the algorithm. Thus, conceptual differences between the objects in the parallel and sequential components may impact the amount of information which the designer must deal with. These conceptual differences correspond to a context change in the design model.

Thus, we can see that if these aspects of a parallel algorithm are designed early, their informational impact on the algorithm will be less than if they are considered at the end of the design process. Since an algorithm only has a small information content at the beginning of the design process, a complex decision will be easier to make then than if the decision is made later in the design process. This is because the information content of the algorithm is higher later in the design process. Although a context change still causes an increase in the rate at which information is added to an algorithm early in the design, that increase will be less than if the context change occurs later in the design process.

Others have recognized this in their discussions on parallel algorithm design [32-34]. Although they have not explicitly considered conceptual differences between objects in sequential and parallel components of an algorithm, the increased informational complexity of considering objects in both contexts at the same time underlies some of their suggestions. For example, Brinch Hansen in [32] discusses development procedures for concurrent algorithms. He divides an algorithm into three components: monitors, classes, and the process. Monitors represent synchronization activities; classes represent shared data items for process intercommunication; and the process represents the remaining sequential-like activities of an algorithm. He suggests designing the monitor and class components first. The remainder of the algorithm can then be designed almost as a sequential algorithm. In other words, he suggests making decisions which involve both parallel and sequential activities early, when the size of the algorithm is small and thus the amount of information which must be considered from the previous design steps is limited. The process is basically sequential in nature, and only deals with information in a single context. Thus, the effective information burden on the designer is less than that of the monitor and class components. The lack of context changes when designing the process component of the algorithm causes less information to be added to the algorithm when the process component is designed.
To further demonstrate the efficacy of this approach, an example will be presented. In [33], Coffman and Denning present an example of two cooperating sequential processes, the producer/consumer model. Figure 7.5 shows an adaptation of their model in complete detail. The Producer generates an object; the object is put into a common buffer; and it is then retrieved by the Consumer and disposed of. N represents the number of buffers; array elements B[0..N-1] are the buffers; and S1 and S2 are semaphores. Assume that statements S and W are indivisible. This is necessary for the semaphores to function correctly.

    begin
      cobegin
        Producer: produce(w);
               W: S1 := S1 - 1;
               L: if S1 < 0 then goto L;
                  B[in] := w;
                  in := (in + 1) mod N;
               S: S2 := S2 + 1;
                  goto Producer;
        Consumer:
               W: S2 := S2 - 1;
               L: if S2 < 0 then goto L;
                  w := B[out];
                  out := (out + 1) mod N;
               S: S1 := S1 + 1;
                  consume(w);
                  goto Consumer;
      coend
    end.

Figure 7.5 The producer and consumer processes in full detail.

Note from Figure 7.5 that if the designer tries to consider the processes as a whole, not only must the action of the individual processes be considered, but the intricate details of the semaphores and intercommunication must be considered as well. Figure 7.6 shows how the design of the producer and consumer processes can be simplified by first addressing process synchronization and intercommunication. Now the manipulation of the semaphores can be designed independent of the processes which use them. Likewise, the details of managing the buffers for process intercommunication can be removed from the design of the producer and consumer processes.

    Input(w):  B[in] := w; in := (in + 1) mod N.
    Output(w): w := B[out]; out := (out + 1) mod N.
    Wait(S):   S := S - 1; L: if S < 0 then goto L.
    Send(S):   S := S + 1.

    begin
      cobegin
        Producer: produce(w); Wait(S1); Input(w); Send(S2); goto Producer;
        Consumer: Wait(S2); Output(w); Send(S1); consume(w); goto Consumer;
      coend
    end.

Figure 7.6 The producer and consumer processes with the details of synchronization and process intercommunication removed.

    Xmit(w):    Wait(S1); Input(w); Send(S2).
    Receive(w): Wait(S2); Output(w); Send(S1).

    begin
      cobegin
        Producer: produce(w); Xmit(w); goto Producer;
        Consumer: Receive(w); consume(w); goto Consumer;
      coend
    end.

Figure 7.7 The producer and consumer processes with all parallel details removed.

By focusing on intercommunication and process synchronization first, we are forced to define a large number of object-context pairs early in the design process. This is consistent with the approach taken in the model of making n large at the beginning of the design process. Although the information growth rate will be initially higher, it will not be as large later in the design process. The designer must still deal with semaphores in the code of Figure 7.6; they are only dealt with at a higher level. By combining both synchronization and process intercommunication, and treating their design first, the design of the producer and consumer processes becomes almost trivial, as seen in Figure 7.7. Since the parallel activities have been designed in a separate module, these complex decisions have been made without the designer having to consider the information which represents the rest of the producer and consumer processes.

7.8 Summary

Our experiments with the ROOTF and MBFR algorithms have shown that the choice of a design methodology may impact the time it takes to design an algorithm.
Also, the structure of an algorithm may impact the performance of a design methodology. Our metrics have allowed us to see that a design methodology which has a high probability of causing a context change late in an algorithm's design will cause information to be added to the algorithm at a higher rate than if the context changes occur early in the design.

When designing a parallel algorithm, it is sometimes necessary to deal with objects that occur in both the sequential and parallel contexts during a single design step. This may occur when designing the process synchronization and intercommunication activities. A decision which involves objects in both contexts is more complex late in an algorithm's design than if it occurs early. Therefore, it may be easier for a designer to consider these aspects of a parallel algorithm early in the design process. Those parts of the algorithm which are strictly sequential or strictly parallel can easily be handled later in the design process. Since they only deal with a single context, the informational impact of designing them is the same late in the design process as it would be if they were designed early.

It was also seen in this chapter that these results correspond to suggestions that have been made by others for designing parallel algorithms. They suggest a similar order in designing the different components of a parallel algorithm. These suggestions were based on intuition. We now have some quantitative backing for them.

CHAPTER 8
Conclusions

This dissertation began with the premise that parallel algorithms contain structural and semantic information regarding process synchronization and intercommunication which is not present in sequential algorithms. The impact of this additional information on the design process for parallel algorithms was examined.

The first questions this dissertation addressed are as follows. Can this additional information be quantified? How much more information does a parallel algorithm contain? And, where in an algorithm are the effects of this added information seen? Software metrics were presented which described the information content of parallel algorithms. The metrics are based on those suggested by Halstead, but were extended to consider conceptual differences between those algorithm elements which are found in sequential versus parallel portions of an algorithm. By examining a sequential root finding algorithm and a parallel version which was derived from it, we saw that the parallel root finding algorithm does indeed contain additional information. It was also seen that this additional information is not necessarily spread uniformly throughout the algorithm, but may be concentrated in a small portion of the algorithm. In fact, the bulk of the added information may not necessarily be found in the parallel portion. It may appear in the sequential component. This is a result of the need for the sequential component to support the parallel portion of the algorithm. This support has been given the name subsidization.

Given that parallel algorithms may contain additional structural and semantic information, the next question which was addressed was: should this added information affect the way in which we design parallel algorithms? In order to examine this question, a model of the algorithm design process was created. The model was based on the information content of an algorithm, and the information which is used during the design process.
The design process was modeled as a series of design steps. A design step consists of a placement process and an elaboration process. The placement process determines the positions in an algorithm at which to make a modification. The elaboration process makes those modifications.

Placement decisions were divided into two sets. The first set is those decisions which are based on relationships to the objects which were added to an algorithm by the last design step. The second set represents those decisions which are based on any relationships except those which involve objects added to an algorithm by the last design step. Placements which are made based on the decisions in the first set were called localized placements. Placements which are made based on the decisions in the second set were called non-localized placements. Design methodologies were modeled by using various combinations of localized and non-localized placements.

The designs of a parallel root finding and a parallel bin packing algorithm were simulated. It was seen from the results of these simulations that the design methodologies performed differently for the two algorithms. Also, for a given algorithm, the performance of different design methodologies varied. It was suggested that the structures of the algorithms were responsible for these different performances. We saw that the elaboration process was impacted by the number of unique object-context pairs in an algorithm, and it was also impacted by the levels to which the structures in the algorithms were nested.

The placement process was impacted by placement relationships which allow a context change to occur between design steps.
Further experiments using our model would offer some insights into this. In conclusion, this research has demonstrated that there may be informational differences between parallel and sequential algorthms. These differences are primarily concerned with structural and semantic information which must be considered in designing parallel algorithms, but not in designing sequential ones. These differences in information have an impact on the design process for parallel algorithms. It is anticipated, that as we come to a better understanding of these underlying differences between sequential and parallel algorthms, that we will be able to arrive at design techniques which will simplify the design of parallel algorithms. This will also aid in designing languages and operating systems for parallel architectures. The combination of better design techniques and a better design environment, should facilitate the efficient use of parallel architectures. APPENDI X APPENDI X A The Bin Packing Algorithm A pseudo-code description of the MBFR algorithm is given below. The algorithm repeats for each item in the input list. The algorithm assumes all nonempty bins make a bid for each new item, but in practice, only nonempty bins which are also not full would do so. This was done to simplify the algorithm. Null bids by active processors for an item are coded as l for the bid, and 0 for the replaced item. The master collects the bids and selects the minimum bid which is less than 1. If the new value is greater than 0 for the selected bin, the item is removed from the bin. Next the new item is placed into the selected bin, and the gap is updated. The algorithm then repeats for the removed item. 156 157 If there were no bids less than 1, the master inserts the new item into the next empty bin and calculates the appropriate gap. If items are replaced, notice that they get progressively smaller. An item cannot displace one which is larger than itself. Therefore, the replace/insert cycle will always terminate. a parallel alg is being constructed, 1458 Algorithm nsrxaturncx nuxsius:-1: GAF[1):-l: READ(VALUB) WHILE NOT END 0? LIST DO 8861! HHILE(VALUE)§ 0 DO PARFOR Iz-I TO NUHBINS DO IF VALUEO is true. If no item exists it returns to a 0* ELSE BIDlI]:-NEHVALUE[I]+CAP[I]-VALUE END; SELECT:-MIN(BID); *MIN chooses the bin with the minimum of all bids, and returns its index* 159 IF BIDISELECTlO TEEN BEGIN REHOVE(NEUVALUE[SELECT],SELECT *removes NEWVALUE from the selected bins list* INSERT(VALUE,SELECT) *inserts VALUE into selgptgd n UPDATE 02(VALUE,NEUVALUE[SELECT],GAPISELECT],SELECT) *updates GAP accordingly* VALUE:-NEWVALUE[SELECT] *new VALUE to pack on next pass‘ END ELSE BEGIN UPDATE 61(VALUE,GAP[SELECT],SELECT) *update 6 bin by sub- tracting iALUE from VALUE:-O the GAP for the selected 4 END bin ELSE BEGIN NUBINS:'NUBINS+1 *or start a new bin by setting CAP of s g . ‘ _ - s 0* ores saw mmwuumsmawt); .\mme “N", m 1 ”Wk VALUE:-O END END ‘01 pack exchange loop* READ(VALUE) END ‘0! list pruccss loop* BI BLI OGRAPHY [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] BIBLIOGRAPHY D. J. Ruck, Structure of Computers and Computations, Wiley, New York, 1978. J. L. Baer, Computer S stems Architecture, Computer Science Press, Inc., 1980. D. J. Ruck, "Measurements of Parallelism in Ordinary FORTRAN Programs", Computer, January, 1974, pps. 37-46. E. W. Dijkstra, 'GOTO Statement Considered . Harmful", CACM, Vol. 11, No. 3, March, 1968, pps. 147-148 a P. Freeman and A. 
[5] P. Freeman and A. Wasserman (editors), Tutorial on Software Design Techniques, IEEE Computer Society, 1980.

[6] P. J. Brown, "Programming and Documenting Software Projects", Computing Surveys, Vol. 6, No. 4, December 1974, pp. 213-220.

[7] J. M. Yohe, "An Overview of Programming Practices", Computing Surveys, Vol. 6, No. 4, December 1974.

[8] N. Wirth, "On the Composition of Well-Structured Programs", Computing Surveys, Vol. 6, No. 4, December 1974.

[9] D. E. Knuth, "Structured Programming with GOTO Statements", Computing Surveys, Vol. 6, No. 4, December 1974, pp. 261-301.

[10] B. W. Kernighan and P. J. Plauger, "Programming Style: Examples and Counterexamples", Computing Surveys, Vol. 6, No. 4, December 1974, pp. 303-319.

[11] W. B. Ackerman, "Data Flow Languages", Proceedings NCC, Vol. 48, 1979, pp. 1087-1095.

[12] P. Brinch Hansen, "The Programming Language Concurrent Pascal", IEEE Transactions on Software Engineering, Vol. SE-1, No. 2, June 1975, pp. 199-207.

[13] H. T. Kung, "The Structure of Parallel Algorithms", in Advances in Computers, Academic Press, New York, 1980.

[14] S. Owicki and D. Gries, "Proving Parallel Programs Correct", in Structured Multiprogramming, Springer-Verlag, 1978.

[15] W. Harrison, K. Magel, R. Kluczny, and A. DeKock, "Applying Software Complexity Metrics to Program Maintenance", Computer, Vol. 15, No. 9, September 1982, pp. 65-79.

[16] M. Halstead, Elements of Software Science, North-Holland, New York, 1977.

[17] R. G. Reynolds and G. Cichanowski, "Metrics for the Comparison of Parallel Algorithms and Their Design Methodologies", Symposium on Empirical Foundations of Information and Software Science, Atlanta, Georgia, November 1982.

[18] S. Baase, Computer Algorithms, Addison-Wesley, Reading, Massachusetts, 1979.

[19] C. Gear, Applications and Algorithms in Computer Science, Module A, SRA Press.

[20] R. G. Reynolds and T. L. Chang, "The PAL System - A Parallel Algorithm Design System for VLSI Based Array Architectures", Tenth IMACS Conference, Montreal, August 1982.

[21] A. Chi-Chih Yao, "New Algorithms for Bin Packing", J. Assoc. Comput. Mach., Vol. 27, No. 2, pp. 207-227.

[22] L. T. Kou and G. Markowsky, "Multidimensional Bin Packing Algorithms", IBM J. Res. and Dev., Vol. 21, No. 5, pp. 443-448.

[23] E. G. Coffman, M. R. Garey, and D. S. Johnson, "An Application of Bin Packing to Multiprocessor Scheduling", SIAM J. Comput., Vol. 7, No. 1, pp. 1-17.

[24] S. E. Shapiro, "Performance of Heuristic Bin Packing Algorithms with Segments of Random Length", Inf. and Control, Vol. 35, No. 2, pp. 146-158.

[25] E. G. Coffman, Jr., R. So, M. Hofri, and A. C. Yao, "A Stochastic Model of Bin Packing", Inf. and Control, Vol. 44, No. 2, pp. 105-115.

[26] D. S. Johnson, "Near Optimal Bin Packing Algorithms", MIT Report MAC TR-109, June 1973.

[27] R. Graham, "Bin Packing", in Computer and Job-Shop Scheduling Theory, E. G. Coffman, editor, John Wiley and Sons, New York, 1976, pp. 208-225.

[28] W. Turski, Computer Programming Methodology, Heyden, Philadelphia, 1978.

[29] A. Perlis, F. Sayward, and M. Shaw, Software Metrics: An Analysis and Evaluation, MIT Press, Cambridge, MA, 1981.

[30] J. S. Davis, "Chunks: A Basis for Complexity Measurement", Symposium on the Empirical Foundations of Information and Software Science, Atlanta, Georgia, November 1982.

[31] N. Wirth, "Stepwise Program Development", in Systematic Programming: An Introduction, Prentice-Hall, Englewood Cliffs, New Jersey, 1973, pp. 125-154.
[32] P. Brinch Hansen, The Architecture of Concurrent Programs, Prentice-Hall, Englewood Cliffs, New Jersey, 1977.

[33] E. G. Coffman and P. J. Denning, Operating Systems Theory, Prentice-Hall, Englewood Cliffs, New Jersey, 1973.

[34] V. Weinberg, "Strategies to Develop Designs", in Structured Analysis, Prentice-Hall, Englewood Cliffs, New Jersey, pp. 169-203.