A COMPUTER SYSTEM LOAD MODEL BASED ON CLUSTER ANALYSIS

A Dissertation for the Degree of Ph.D.
Michigan State University
Geoffrey William Gates
1973

ABSTRACT

A COMPUTER SYSTEM LOAD MODEL BASED ON CLUSTER ANALYSIS

By Geoffrey William Gates

The evaluation and comparison of computer systems is an increasingly important endeavor. Such evaluation of the computer as a tool must consider the programs, or load, which the computer is expected to perform. The lack of a quantitative characterization of such a load has been a hindrance to the general evaluation of computer systems.

In this dissertation a probabilistic model is suggested which provides a characterization of the load on a computer system. The model represents the behavior of the load, defined as the utilization of the computer system resources as a function of time. The model assumes the load is made up of some number of job steps. These job steps are classified according to behavior using cluster analysis so that a separate submodel may be constructed for each distinct type of behavior.

The amount of detail required of any particular instance of the model depends upon the application of the model. A hierarchy of instances of the model is described which is determined by the resources included in a particular instance of the model, the times at which the behavior may be sampled, and the goodness of fit to the real load. This hierarchy affords guidance in improving any particular instance of the model.

The techniques described are applied to construct a simple model of the load on the Michigan State University CDC 6500. Despite the simplicity of this instance and the relatively small amount of data used in computing the model parameters, the resource requirements predicted by the model are comparable to the requirements observed for an equivalent number of real jobs.

A COMPUTER SYSTEM LOAD MODEL BASED ON CLUSTER ANALYSIS

By Geoffrey William Gates

A DISSERTATION
Submitted to Michigan State University
in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Computer Science
1973

TO MY PARENTS

ACKNOWLEDGMENTS

I am grateful to Dr. Lewis H. Greenberg, my thesis adviser, for his help in selecting the problem and guiding my research. I am also grateful to Dr. Carl V. Page, the chairman of my guidance committee, for his encouragement and direction given during the course of writing this dissertation. My thanks also go to Dr. Harry Eick, Dr. Julian Kateley and Dr. Morteza Rahimi for serving on my guidance committee and for reviewing this work. My special thanks go to Dr. Bernhard Weinberg and Dr. Richard Dubes for their assistance. Finally, there is really no way I can adequately express to my wife Ginny the appreciation I feel for her patience, understanding and encouragement.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter 1. BACKGROUND AND LITERATURE REVIEW
    Introduction
    The Environment
    Literature Review
    Criteria for a Model
    Uses of a Load Model
    Organization of the Dissertation

Chapter 2. MODEL FRAMEWORK
    Introduction
    Definitions
    The Job Model
    The Sequencer
    An Example of a Sequencer
    Job Step Submodels
    An Example of a JSSM
    Summary and Conclusions

Chapter 3. COMPUTATION OF MODEL PARAMETERS
    Introduction
    Resource Selection
    Measurement and Sampling
    Determination of the Types of Behavior
    JSSM Structure
    Transition Probabilities
    Representation of JSSM Distribution Functions
    Summary

Chapter 4. A HIERARCHY OF INSTANCES OF THE JOB MODEL
    Introduction
    Formal Basis for a Hierarchy
    Approximation at the Job Step Level
    Modeling the Occurrence of Wait Conditions
    Detailed Resource Utilization
    A Very High Level of Detail
    Summary

Chapter 5. AN INSTANCE OF THE LOAD MODEL
    Introduction
    The Load to be Modeled
    Identification of the Types of Behavior
    Interpretation of the Types of Behavior
    Estimation and Interpretation of the Transition Function
    Specification of the JSSM's
    Implementation of this Instance of the Model
    Comments on the Model Implementation
    Improvement to the Model Instance
    Summary

Chapter 6. SUMMARY AND FURTHER RESEARCH
    Summary of the Dissertation
    Conclusions
    Further Research

APPENDIX
BIBLIOGRAPHY

LIST OF TABLES

Table 1.  Relative frequencies of instruction types in percentages for five standard instruction mixes.
Table 2.  Relative frequencies of instruction types for the Gibson III instruction mix.
Table 3.  Instruction frequencies measured on a CDC 6400 for seven computation bound programs.
Table 4.  Equilibrium probabilities for a nine state Markov model of the EXEC 8 operating system under four different loads.
Table 5.  Sample sequence frequencies illustrating the effects of using a Markov process for the sequencer.
Table 6.  Matrix of transition probabilities computed from the sequences in Table 5.
Table 7.  Sequences assigned non-zero probabilities by the transition matrix in Table 6.
Table 8.  Job steps provided by the operating system on the IBM 1130.
Table 9.  Summary of clustering experiments showing the number of clusters obtained for several values of the minimum number of job steps acceptable per cluster at a threshold parameter value of 0.0.
Table 10. Cluster point coordinates from 800 job steps run from 12 noon to 2:00 p.m. with a threshold parameter of 0.05.
Table 11. Identification of each cluster by command or command type occurring most frequently in the cluster.
Table 12. Distribution of user written program executions among the different clusters.
Table 13. Matrix of transition probabilities for the sequencer determined from 800 job steps run from 12 noon to 2:00 p.m. on a class day.
Table 14. Summary of the results obtained from the implementation of the CDC 6500 load model.

LIST OF FIGURES

Figure 1.  Structure of the total computer system showing the possible access paths to resources available to the user.
Figure 2.  State transition structure to produce the sequence of job steps "FTN.LGO." with the correct frequencies.
Figure 3.  The sequencer for the IBM 1130 with a simple operating system used in the example.
Figure 4.  Condition function relating cards compiled to lines printed showing the effect of printing header and trailer information.
Figure 5.  Hypothetical distribution of measured job step resource utilization in a two-dimensional space.
Figure 6.  Hypothetical distribution of measured job step resource utilization as in Figure 5 with decision surfaces shown.
Figure 7.  Illustration of the effect of the shape parameter, c, on the form of the Weibull distribution.
Figure 8.  Illustration of three different approximations to a real job, x1(t), x2(t) and x3(t), and the corresponding consistency sets.
Figure 9.  Distribution of central processor seconds for job steps of cluster one.
Figure 10. Distribution of the number of system requests issued by job steps of cluster one.
Figure 11. Distribution of the number of record units transferred by job steps of cluster one.
Figure 12. Distribution of the number of words of memory required by job steps of cluster one.

Chapter 1
Background and Literature Review

Introduction

The processing done by a computer system can conveniently be considered to be of one of two types:
1. Processing that results directly from a command issued by a user.
2. Processing that results from some action of the operating system.
The load on a computer system consists only of processing of the first type, i.e., processing done at the specific request of a user.

While much effort has been expended on the instrumentation, measurement and modeling of computer system performance, the lack of a quantitative characterization of the load has been a hindrance. In this dissertation such a model is proposed and some of its properties are studied.

The Environment

For the purposes of this dissertation the load, i.e., all user specified actions, and the computer system performing these actions are considered to be a closed system, the total computer system. All stimuli presented to the computer system come from the load. The computer system comprises all resources available to perform the computations specified by the load. Any overhead required by the computer system, e.g., memory management or accounting, is part of the computer system and not part of the load.

The resources used in the performance of the load are ultimately embodied in the hardware of the computer which performs the processing and are available to the user at several different levels of complexity. Each level corresponds to an access path to the hardware. The relationship between the user's tasks and the various access paths to the hardware is illustrated in Figure 1.
At the simplest level is direct access to the hardware, appearing generally in the execution of the user's machine language instructions by the central processor. Ideally, all processing could be performed in this manner; however, certain operations have the potential of adversely affecting other users of the system. For example, if a user's program is using a tape drive, no other user's program should be able to access the same drive. These sensitive operations are performed for the user by the operating system, thus protecting other users. These operations are accessible by means of the system request, typically made by the performance of a special instruction such as an RA+1 request on the CDC 6500 or an SVC instruction on the IBM 370.

Other operations are used so frequently that efficiency is of great importance, such as file copying, or are so complex, such as language translation, that it would be impractical for each user to supply them. These operations also offer a method to use the computer's resources and are called utility programs. Although it is not common usage to refer to compilers and assemblers as utility programs, the term is applied in this manner for the purposes of this dissertation.

[Figure 1. Structure of the total computer system showing the possible access paths to resources available to the user: user tasks reach the hardware directly through machine language instructions, through system requests to the operating system, or through utility programs.]

Both system requests and utility programs are part of the system software, the difference being that utility programs are accessed by means of the operating system control language (OSCL) whereas the system requests are invoked from programs written directly by the user or from utility programs.

Figure 1 then illustrates the environment in which the load, or a model of the load, must function. The load is external to the computer system and makes demands at several different levels to use the resources provided by the computer system. These demands can go directly to the hardware, e.g., add the contents of two registers; the demands can be in the form of a system request, e.g., open a file; or the demands may consist of using a utility program, e.g., compile a 500 statement COBOL program. Demands at all three levels comprise the load, and so any model of the load must account for interactions between the user and the computer at all three levels.

Literature Review

No results on a general model of a load appear in the literature, although such a model is implicit in all the work published on simulation models of computer systems. In [Chevy 1966, Fine 1966, Gonter 1966, Oden 1972] very simple models of the load are used, characterizing the entire load by several distribution functions. In [Nielsen 1966] a different approach is taken: a pseudo-programming language is used for the specification of the various programs at the resource use level. In addition to being tedious to use, the method lacks generality as the primitives of the language are chosen for a particular machine operating in a special environment.

Another source of implicit load models is the literature on instruction timings, benchmark programs and synthetic programs. Instruction timing is a technique frequently used in the evaluation of a computer system independent of any load information.
In order to simplify the comparison of two or more different computer systems it is convenient to obtain a single figure of merit for each system. Two parameters frequently used for comparison are the add time and the memory cycle time of the computer. However, for computers with especially large or versatile instruction sets this measure is an oversimplification. In order to consider the entire instruction set in computing a measure of performance, it is necessary to decide how many times each particular instruction is used. Obviously, the relative frequencies of the instructions depend on the load, and so any set of frequencies is a simple model for the load. Tables 1 and 2, taken from [Adams 1969], give the relative frequencies of each instruction type used in several standard mixes. Each mix is supposed to represent a particular type of program, e.g., the Knight Scientific Mix is to represent the frequencies of instructions used in a typical scientific or computation bound program.

Table 1. Relative frequencies of instruction types in percentages for five standard instruction mixes.

                        Arbuckle  Gibson  Knight      Knight      General
                        Mix       Mix     Scientific  Commercial  Commercial
                                          Mix         Mix         Mix
% Fixed + & -              0.0       6      10          25           1
% Fixed x                  0.0       3       6           1           0
% Fixed /                  0.0       1       2           0           0
% Floating + & -           9.5       0      10           0           0
% Floating x               5.6       0       0           0           0
% Floating /               2.0       0       0           0           0
% Load & store            28.5      25       0           0           4
% Indexing                22.5       0       0           0          41
% Conditional branch      13.2      20       0           0          28
% Compare                  0.0      24       0           0          26
% Branch on character      0.0      10       0           0           0
% Edit                     0.0       4       0           0           0
% Other                   18.7       7      72          74           0
                         100.0     100     100         100         100

Table 2. Relative frequencies of instruction types for the Gibson III instruction mix.

Instruction                                        Percentage Occurrence
Move one word from memory to accumulator
    Fixed point                                             5
    Floating point                                          5
Store accumulator                                           5
Move 500 contiguous words memory to memory                  1
Move 500 random words memory to memory                      1
Conditional branch
    Result zero                                             5
    Result negative                                         5
Compare two words and set indicator
    Fixed point                                             2
    Floating point                                          2
Compare two digits and set indicator                        1
Unconditional branch                                        8
Fixed point multiply, all in memory                         4
Fixed point divide, all in memory                           1
Fixed point add, all in memory                              5
Fixed point subtract, all in memory                         5
Shift one character (six bits)                              3
Logical AND or OR                                           1
Indexing                                                   11
Indexing and indirect addressing                            3
Indirect addressing                                        13
Floating point add, all in memory                           4
Floating point subtract, all in memory                      4
Floating point multiply, all in memory                      4
Floating point divide, all in memory                        2
                                                          100

The frequencies given in Tables 1 and 2 can be interpreted as the parameters of a load model, raising the question of how they are computed.
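A mix of this kind turns per-instruction timings into a single figure of merit by a frequency-weighted average. A minimal sketch follows, using the Knight Scientific Mix from Table 1; the instruction timings are hypothetical values chosen for illustration, not data from any machine discussed here:

```python
# Figure of merit = frequency-weighted mean instruction time.
mix = {                      # Knight Scientific Mix, Table 1 (percent)
    "fixed + & -": 10, "fixed x": 6, "fixed /": 2,
    "floating + & -": 10, "other": 72,
}
times_us = {                 # assumed execution times in microseconds
    "fixed + & -": 1.0, "fixed x": 3.5, "fixed /": 7.0,
    "floating + & -": 2.0, "other": 1.5,
}
mean_time = sum(pct / 100.0 * times_us[op] for op, pct in mix.items())
print(f"mean instruction time: {mean_time:.2f} microseconds")
```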
The frequencies in Tables 1 and 2 were not computed; they represent the best guesses of people familiar with the several types of programs. [Miller 1971] describes an experiment in which the frequencies were actually measured for programs being executed. The frequencies, given in Table 3, may also be interpreted as parameters for a load model.

Table 3. Instruction frequencies measured on a CDC 6400 for seven computation bound programs. (Several entries were illegible in the source and are left blank.)

                      ATFUN  FIREBALL  RADEBR  SACSRADAR  PALLOC   ROGEN  RADSIM
% Test or branch      11.01    10.28     4.05    10.28      6.50   29.12   26.28
% Register moves       0.06     2.68    21.11    16.56      0.30    0.276   0.007
% Floating + & -       3.88     0.078    0.18     0.837     0.13    0.046   0.08
% Floating x & /       2.14     0.805                               0.038   0.093
% Integer operations  55.10    50.19    49.90    28.92     31.82   42.76   49.43
% Other               27.81                                        27.76   24.11
                     100.00   100.00   100.00   100.00    100.00  100.00  100.00

The difference between the instruction frequencies for the guessed standard instruction mixes and the measured frequencies should be noted. This difference is more striking when it is known that the programmers of ATFUN and FIREBALL considered the programs to make heavy use of the computer's floating arithmetic capability. In both cases all floating point operations combined account for less than 10% of the entire program. This discrepancy illustrates the importance of determining model parameters from actual measurements.

Whether the parameters are guessed or measured, instruction mixes are inadequate load models because they represent interaction on only one level, hardware. The logical extension of the instruction mix to all three levels of interaction is the benchmark program, defined in [Adams 1969] as:
1. A routine used to determine the speed performance of a computer.
2. A routine to be run on several different computer configurations to obtain comparative throughput performance figures regarding the abilities of the various configurations to handle the specific application.
3. A precisely defined problem that is coded and timed for a number of computers in order to measure performance in a meaningful and directly comparable manner. The benchmark problem may be one of the user's own specific applications or may be representative of a class of typical computer applications.

Although it is not stated explicitly, the choice of a particular benchmark program constitutes specifying a model of a load, since most authors agree that it is of the greatest importance for the benchmark program to accurately match the load which will be run on the system being evaluated [Buckley 1969, Hillegass 1966, Joslin 1968, Joslin 1966, Lucas 1971, Wood 1971]. One exception to this opinion appears in [Bucholz 1969], in which a program is suggested to provide an absolute measure of performance. But even here, the benchmark program provides a model of a load; Bucholz makes no attempt to match any characteristic of the benchmark load to a real load.

A benchmark can be one of three types: natural, artificial or synthetic. A natural benchmark is a program selected from an operational load to be translated and run on the computer to be evaluated. In [Ward 1969] a very successful application of a natural benchmark is described. The overriding factor which contributed to this success was the fact that the load consisted of only one application and so, in effect, the entire load was transferred to the various machines being evaluated. Thus the model implied by the benchmark was an exact duplicate of the load being modeled. In more common situations the load being modeled is not homogeneous and so several benchmark programs must be selected.

One major drawback of natural benchmark programs is the frequent need to recode the programs and transfer any required data sets to the computer to be evaluated. Artificial benchmarks, programs which accomplish a well specified action such as sorting or matrix inversion, may be programmed in a machine independent manner, thus allowing easy transfer from one computer to another. Perhaps the most frequently used artificial benchmarks are those outlined in Standard EDP Reports by the Auerbach Corporation:
1. General file processing.
2. Random access file processing.
3. Sorting.
4. Matrix inversion.
5. General math problems, e.g., polynomial evaluation.
Whatever artificial benchmark is selected, it is implicitly a model of the load as a whole. For example, if a sort program is used as a benchmark, then the assumption is being made that the demands on the computer system made by this program are similar to the demands made by the entire load.

If a very versatile parameterized benchmark is desired, it is often necessary to resort to a program which accomplishes no real purpose, i.e., a synthetic program. Synthetic benchmarks are programs which are constructed to exhibit a particular type of behavior. The distinction between artificial and synthetic benchmarks is in the level at which the desired behavior is specified. With an artificial benchmark, the assumption is, for example, that sorting is typical of the load and so a sort program is used. With a synthetic benchmark the assumption is that a program typically performs x input or output operations and does y seconds of processing before each such operation, where x and y are parameters to be determined. A program is then written which will exhibit this type of behavior. Synthetic benchmarks have the advantage of providing flexible behavior [Thompson 1969] based on the parameters, although the determination of the parameter values is difficult.

As with instruction mixes it is important to base the benchmark program, and therefore the implied load model, on actual measurements. A comparison of a load with supposedly representative benchmarks is presented in [Feeley 1971]. A nine state Markov model of the EXEC 8 operating system for the Univac 1108 was used to describe the demands a job places on the computer system both at the hardware level and at the system request level. The transition probabilities between these states were determined from measurements made on the running system, and in turn the equilibrium probabilities were computed. This was done for a real-time load, arising from the use of the computer to monitor and control laboratory experiments, and also for a standard batch processing load. In addition, benchmark programs, supposed to be similar to the two types of loads, were measured. The equilibrium probabilities obtained are given in Table 4.

Table 4. Equilibrium probabilities for a nine state Markov model of the EXEC 8 operating system under four different loads.

                               Real Time            Standard
State                       Load   Benchmark     Load   Benchmark
User program                .019     .12          .09     .17
Interrupt handler           .001     .002         .001    .0012
I/O handler                 .012     .04          .02     .056
Dispatcher                  .95      .70          .78     .59
User activity and output    .009     .11          .085    .16
Arbitrary device handler    .008     .01          .01     .01
COME module                 .001     .005         .004    .006
Dynamic allocator           0.0      .01          .007    .0048
Drum handler                0.0      .003         .003    .002
                           1.000    1.000        1.000   1.000

The entries may be interpreted as the percentage of time a program would spend in each state or operating system module. Again the difference between the guessed load, i.e., the benchmarks, and the measured load points up the importance of basing model parameters on measured data.

In addition to the work already cited, information on models of computer systems and, indirectly, on models of a load may be found in [Calingaert 1967, Joslin 1968, Mittwede 1970, SDC 1968]. A reasonably complete annotated bibliography on modeling and other performance evaluation techniques may be found in [Miller 1972].

Criteria for a Model

Previous experience with models of a load suggests several desirable criteria for any new model to possess. The first of these criteria is that the results be useful. The model must present information in sufficient detail to answer the desired questions while at the same time omitting unneeded detail. Selecting the proper level of detail is important because this aids in understanding the results and keeps the model economical to use. For example, the model of the EXEC 8 operating system [Feeley 1971] can be used as an aid in deciding which operating system modules, if made more efficient, would afford the largest overall improvement.
The same information would be available in a trace of the instructions actually executed. However, such a trace contains so much unneeded information and would be so bulky that it is doubtful the trace would help in making any decisions.

Very closely allied to the need for usefulness is the second criterion, that the model should preserve the structure of the modeled system. In the case of a load model, since the load interacts with the system at three levels, i.e., hardware, system request and utility program, the model should also provide for interaction at all three levels. Furthermore, since the load is made up of a number of independent programs submitted by different users, the model should preserve this structure also, since a considerable amount of overhead is devoted to scheduling these independent jobs and to keeping them independent. This preservation of structure allows information gained from the model to be applied directly to the modeled system.

The third criterion is that the parameters of the model should be defined in terms of directly measurable parameters of the real load. For example, rather than having a parameter specify the number of iterations of a loop of arbitrary instructions, it should specify the amount of processor time to be used. If this criterion is met the model will provide direction in performing the measurements needed to determine the parameters, and the connection between the structure of the model and the structure of the load will be strengthened.

The fourth criterion is again related to the usefulness of the model. Since it is the load which is being modeled and since the building blocks of the load are relatively invariant from one computer to another, it must be possible to implement the model in a machine independent manner.

The final criterion is that the model should provide an acceptable degree of accuracy. Since the degree of accuracy required of the model depends upon its particular application, the model must be structured in such a way as to allow easy verification of the degree of accuracy achieved.

Uses of a Load Model

The total computer system is defined previously as consisting of a load and a computer system. These two components are usually represented by a set of real production programs and the hardware and software of an actual computer. However, this representation may take place at different levels. For example, the computer system may be replaced by a model of a computer system, called an emulator, while the load is still represented by production programs. Alternatively, a model of the programs, a benchmark program, may be run on a real computer system.

A model of a load may be combined in several different ways with a representation of a computer system. The most obvious way is to use the actual hardware and software of the computer system and implement the model in such a way as to interact the same as the real programs, i.e., as a synthetic benchmark program.
This program would consist of a main procedure which would decide what demands for resources to make on the computer system, and a set of procedures which would actually use the resources. For example, one procedure would consist of a set of instructions for which the execution time is known. The choice of this set of instructions could be based on one of the instruction mixes, taking into account the possible effects of one instruction on its neighbors. For example, on the CDC 6600 there are multiple arithmetic processing units. Consider a series of three multiply instructions. The first one is initiated using one of two multipliers. Assuming independent operands, the second one may be initiated before the first terminates since there is still an available multiplier. However, the third must wait for a multiplier to become available. The time to perform a multiplication therefore depends on the neighboring instructions. The main procedure of the synthetic benchmark would call this timed procedure with a parameter indicating how much processor time to use, from which the procedure would compute the number of times to repeat the set of instructions. The main procedure of the benchmark, embodying the load model, could be written in a machine independent manner allowing easy transfer of the model from one computer to another.

Such a benchmark program, based on a model of a real load, would allow for corrections to data gathered by software measuring techniques. Since the program collecting the data is part of the load on the computer system, it is possible for the model to account for its behavior, biasing the model parameters in the main procedure of the benchmark to compensate for the demands on the computer system which will be added by the software measurement. Thus, using a benchmark based on a load model affords the ability to correct for the effect of the measuring procedure and also allows the measurements to be based on a known, reproducible load.

This second point is especially important in the process of tuning the computer system to obtain improved performance. It is desirable to use a load similar to the real load the computer system will be subjected to, while at the same time, it is necessary that the load be reproducible. Both of these features are offered by a benchmark based on a load model.
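A sketch of this organization in a modern language may make it concrete. The main procedure draws demands and calls procedures that actually consume the resources; the loop body, the scratch file name and the demand list are all illustrative assumptions, not the program described in Chapter 5:

```python
import time

def burn_cpu(seconds):
    """Timed procedure: repeat a fixed set of operations until the
    requested amount of processor time has been consumed."""
    start = time.process_time()
    x = 1.0
    while time.process_time() - start < seconds:
        x = x * 1.000001 + 0.5   # stands in for a calibrated instruction set
    return x

def main_procedure(demands):
    """Main procedure: decide what demands to make on the computer system
    and invoke the procedures which actually use the resources."""
    for cpu_seconds, records in demands:
        burn_cpu(cpu_seconds)
        with open("scratch.dat", "a") as f:      # hypothetical scratch file
            for _ in range(records):
                f.write("x" * 80 + "\n")         # one 80-character record

main_procedure([(0.5, 10), (1.0, 2)])   # 0.5 s CPU then 10 records, etc.
```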
If, for example, the computer system is being designed for radar signal processing, the parameters for the load model could be computed from measurements taken on an existing computer operating under an actual load. The load model with these parameters would serve as a driver for a model of the computer system allowing the behavior of the computer system to be studied under a realistic load. Organization 2: the Dissertation In Chapter 2 the form of a load model is specified and each of its components is described. Chapter 3 contains a description of the 18 techniques used in computing the model parameters. The concept of a hierarchy of instances of this model, based on the degree of resolution provided by the model, is described in Chapter 4. Examples of several different degrees of resolution are presented with a discussion of some possible uses of the corresponding instances of the model. Chapter 5 then describes the application of these techniques to determine the para- meters of a simple instance of the load model. This instance of the model is implemented in such a way as to generate parameters for a syn- thetic benchmark program. Finally, Chapter 6 summarizes the results of the dissertation and suggests several topics for future investigation. Chapter 2 Model Framework Introduction In this chapter the behavior and structure of a job are defined. Based on these definitions a framework for constructing a general model of a job is introduced. The properties of this model are discussed in light of the criteria outlined in the previous chapter. Definitions The load on a computer system has been previously defined as the collection of all tasks to be performed at a user's specific request which utilize resources of the computer system. It is convenient for modeling purposes to consider this load as a collection of independent tasks, called jobs. This assumption of independence among jobs rules out simulating certain situations which can arise, ng;_one job creates a file, a subsequent job uses the file. If it is the computation of the job, i:£L_the actual results obtained from the job, that is being modeled, then it may be necessary to introduce some mechanism for expressing dependency. If it is the behavior of the job, 212;.the time pattern of the computer resource utilization, that is being mod- eled, then independence is a valid assumption. The resources of a computer system are the hardware and software components to which a user has access. For example, a data channel, a disk drive, a compiler or a sort program are all resources. Implicit 19 20 in the nature of any resource is an access method and a unit of measure. The access method controls how a user may make use of a resource, 2:8; a compiler can be executed, a disk track can be written to or read from. The unit of measure is used to specify how much of a resource is used and is not necessarily unique. For example, the utilization of a data channel can be measured either in terms of the number of bits of infor- mation transferred or the length of time the channel was reserved. The gwng£_of a resource is an entity which is capable of manip- ulating the resource. Each resource may have several owners, some of which may be actively using the resource at any time. A dynamic owner of a resource is a job which owns the resource. However, some types of resources remain owned even when no job is currently using them. The static owner of a resource is the owner when there is no dynamic owner. 
Notice that a static owner cannot be a job but is either the system or some user who is responsible for the resource. For example, most systems provide a user with the facility to save a file for an extended period of time, e.g., permanent files on the CDC 6500. The user who created the file, and presumably controls its deletion, is the static owner. Any job which is accessing the file is a dynamic owner.

Resources which are limited as to the number of concurrent dynamic owners they may possess are called limited resources. The central processor is such a limited resource and may possess only a single dynamic owner at a time. A 2 x 4 tape controller is another such resource, which is limited to no more than two concurrent dynamic owners. Limited resources merit special attention in modeling since they represent potential bottlenecks to any scheduling algorithm.

Based on the above definitions, the behavior of a job is defined as the utilization, as a function of time, which the job makes of those resources for which it is a dynamic owner. In other words the behavior is not the computation performed by the computer for a particular job, but the demands that the computation makes on the computer system's resources.

While it is important for a model to preserve the behavior of the modeled system, it is also important that the structure be preserved. In [Ziegler 1972] structure is defined in terms of the states of the system and the transitions between states. This microscopic view is useful in demonstrating equivalences between various models and the original system and can be applied to a model of a job. In particular, the state variables represent the amount of each resource utilized, and the transitions represent changes in these amounts as the job progresses. At a macroscopic level, structure can be defined relative to the functional modules of the system. At this level the structure of a job is readily apparent, i.e., an initialization step, some number of intermediate steps expressible in an operating system control language (OSCL), and a termination step. This view of the structure of a job simplifies relating events occurring in the model to events occurring in the modeled job and vice versa. The following definitions formalize this view of the structure of a job.

A job step is the smallest system initiated activity directly expressible in the OSCL. Any command in the OSCL which can be represented as a series of other commands is not a job step. For example, on the CDC 6500 the statement

    FTN, I=COMPILE, R=2, OPT=2.

specifies a single job step, a FORTRAN compilation. On the IBM 1130 the three statements

    // FOR
    *ONE WORD INTEGERS
    *LIST ALL

specify one job step, again a FORTRAN compilation. The statement "*ONE WORD INTEGERS" is not a job step but is simply a parameter specification for the compiler and would cause no action if it were to appear alone. On the Burroughs B6500 computer the statement

    ?COMPILE

commands the operating system to compile the following program and, if there are no errors, load and execute the program. Since there is a command which expresses each of these actions independently, "?COMPILE" is not a job step but specifies three job steps.

A job step is said to be in execution when it is a dynamic owner of a central processing unit. The term compilation is used when the job step in execution corresponds to a compiler, and the term loading is used when the job step in execution corresponds to a loader program.
A job is a sequentially ordered set of job steps starting with a sign-in or initialization job step and ending with a sign-out job step. Each job step exhibits a certain type of behavior, and the sequence of these behaviors is the behavior of the job.

The assumption of a sequential ordering is not met by all OSCL's, as many possess a facility for specifying a string of job steps to be performed independently of the parent job. By considering each independent string of job steps as a new job introduced into the load, it is only necessary to consider the question of independence of different jobs as mentioned above.

The Job Model

The job model proposed in this section models the behavior of a typical job in the following sense. N real jobs are run on a computer and their utilization of resources is measured and used to compute the parameters of the model. N simulated jobs are produced from the model and run in some manner, either by a simulation model of the computer or by the actual computer. Then the total utilization of each resource simulated should approach the total utilization of each resource measured as N increases. It is not intended that any one simulated job will have an exact counterpart in the set of real jobs. In fact, since the job model includes no information about the scheduling algorithms, it is impossible to predict anything about the period of real time over which these resources would be used.

Just as the load on the computer system is considered to be the collection of jobs run on the system, the load model is a collection of copies of the job model. The load model must represent two characteristics of the load: the behavior of the individual jobs and the number of such jobs comprising the load. It is the intent of this dissertation to study the properties of the job model which specifies the behavior of the individual jobs. No attempt is made to specify the number of copies of the job model required for a particular load model.

The organization of the job model is based directly on the structure of a job, as defined in the previous section. That is, one portion of the model, the sequencer, supplies a sequence of job steps to be performed. For each job step there is a job step submodel (JSSM) which models the behavior of that one job step. The behavior of the job is the sequence of behaviors of the individual job steps. This modular construction has several desirable qualities, among which are the correspondence between the structure of the model and the structure of a real job and the possibility of modeling different job steps with differing degrees of detail. At the same time this structure increases the difficulty of identifying the underlying state and transition structure as in [Ziegler 1972].

The Sequencer

The problem of modeling the sequence of job steps can be expressed in terms of a discrete stochastic process [Feller 1968]. Let N be the number of distinct JSSM's, numbered from 1 to N. The state of the sequencer at step t is then a random variable Xt which assumes a value from 1 to N with a certain probability. The sequence of job steps which constitutes a job is

    ( X0, X1, X2, ..., Xf )

where X0 and Xf must be the sign-in and sign-out JSSM's respectively. At each step Xi, the joint probability P( Xi, Xi-1, ..., X0 ) is known. It is an aim of this dissertation to determine the degree of complexity required for the joint probability distribution.

A Markov process [Bharucha-Reid 1960, Parzen 1962, Feeley 1971] is a special stochastic process satisfying the condition that if

    ( X0, X1, X2, ..., Xn-1, Xn )
A Markov process [Bharucha-Reid 1960, Parzen 1962, Feeley 1971] is a special stochastic process satisfying the condition that if ( x0 , x1 , x2 , . . . , xn_1 , xn ) 25 is a sequence of discrete valued random variables, the joint distribution is defined in such a way that the conditional probability of Xn = x on the hypotheses X X = x1, ..., Xn-l = xn_1 is identical Wlth the 0=x0’ 1 conditional probability of Xn = x on the single hypothesis Xn-l = x In the context of the job model, the sequencer would be a Markov n-l' process if the present job step depended only on the previous job step. This property leads to simplicity both in computing the model parameters and in implementing a simulation of the resulting model since all prob- abilities are determined based on sequences of length two. Another property which a stochastic process may possess is sta- tionarity, $12; the joint distribution function is not a function of time. Again, in the context of the job model, this would require that a job have the same pattern of behavior, as expressed in the joint probabilities, at any time of day or any day of the week. If the model is a stationary Markov process then it is necessary to estimate only single numbers, the transition probabilities, from the measured data. If the model is to be non-stationary, then time functions must be es— timated. The effect of using a stationary model is obvious, yi§L_the model will be unable to express long term variations in the composition of the load. However, the effect of using a Markov process is not as obvious. The result of using such a first order model is that it is possible for the model to assign a relatively high probability to a sequence which never occurred in the measured data and perhaps can never occur. As an example, consider five events, [ A , B , C , D , E ], occur- ring in sequences with the frequencies shown in Table 5. The transition Table 5. 26 Sample sequence frequencies illustrating the effects of using a Markov process for the sequencer. SE UENCE A C E E . . . A B E E . . . A C D E E . . A B C E E . . FRE UENCY 9 90 91 810 sequences in Table 5. Table 6. A B C D E Table 7. transition matrix in Table 6. SE UENCE A C D E E A B C D E A B E E . A C E E . A B C E E A PROBABILITY .010 .081 .090 .090 .729 PROBABILITY .009 .090 .091 .810 Matrix of transition probabilities computed from the D E 0 0 0 .1 .1 .9 0 1 0 1 Sequences assigned non-zero probabilities by the 27 probabilities in the matrix in Table 6 are computed by dividing the total number of times the event corresponding to the row occurs into the number of times this event is followed by the event corresponding to the column. For instance, event B occurs a total of 900 times in sequences A B E E . . . and A B C E E . . .. Of these 900 occurrences, 90 are followed by event E and the remaining 810 are followed by event C. Thus, the entry in row B column C is 810/900 or .9 and the entry in row B column E is 90/900 or .1. Table 7 shows all sequences of events assigned a non-zero probability by the transition matrix along with the corresponding probability. A comparison of Tables 5 and 7 shows clearly that while every measured sequence is assigned a prob- ability, the sequence A B C D E E . . ., which is not one of the orig- inal sequences, is also assigned a non-zero probability. Furthermore, the relative magnitude of the probabilities is not consistent. Related to this problem is the inadequacy of a first order model to handle the situation of frequently occurring groups of job steps. 
Related to this problem is the inadequacy of a first order model to handle the situation of frequently occurring groups of job steps. For example, on the CDC 6500, the sequence "FTN.LGO.", to compile and execute a FORTRAN program, appears very frequently. However, both "FTN." and "LGO." can also appear in other sequences. Local corrections can be made to the sequencer to make use of higher order probabilities by relabeling "FTN." as "FTN1" and "FTN2", where "FTN1" can occur in any context and "FTN2" can appear only in the context "LGO.". Figure 2 illustrates the transition structure this introduces.

[Figure 2. State transition structure to produce the sequence of job steps "FTN.LGO." with the correct frequencies.]

The results in Chapter 5 demonstrate that the limitations of a first order Markov model as the model of a job do not seriously affect the usefulness of the model.

Based on the assumption of a stationary, first order Markov process, the sequencer model may be represented as a six-tuple:

    ( S, s0, sf, T, J, A )

where

    S is a finite set of states,
    s0 ∈ S is a unique initial state corresponding to the sign-on procedure,
    sf ∈ S is a unique final state corresponding to the sign-off procedure,
    T: S × S → [0,1] is a transition function which assigns a transition probability to each pair of states,
    J is a finite set of job step submodels (JSSM), and
    A: S → J is a function assigning one JSSM to each state.

Insuring that the sequencer is a Markov process requires

    Σ T(s,t) = 1 for every s ∈ S, the sum taken over all t ∈ S.

Furthermore, to satisfy an intuitive requirement for a job,

    T(t, s0) = 0 for every t ∈ S,

since a job can only be initiated once, and

    T(sf, sf) = 1,

since the final state corresponds to sign-off.

An Example of a Sequencer

The IBM 1130 has a rudimentary operating system which provides four distinct job steps to the user, as shown in Table 8.

Table 8. Job steps provided by the operating system on the IBM 1130.

NAME    FUNCTION
JOB     sign-on and initialization
FOR     FORTRAN compilation
ASM     assembly language compilation
XEQ     execution of previously compiled program

In addition to these there is an implicit sign-off job step. The compilers are such that each subprogram must be preceded by a separate compilation directive. A typical job, therefore, will consist of a number of compilation operations, either FORTRAN or assembly language, possibly followed by an execution. The set of states is { s0, s1, s2, s3, sf }, with s0 being the initial state and sf the final state. The assignment of JSSM's to states by A is:

    A(s0) = JOB
    A(s1) = FOR
    A(s2) = ASM
    A(s3) = XEQ
    A(sf) = sign-off.

Figure 3 shows the states and transitions pictorially.

[Figure 3. The sequencer for the IBM 1130 with a simple operating system used in the example.]
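A sequencer of this form is easy to simulate. The sketch below generates job step sequences for the IBM 1130 example; the transition probabilities are illustrative assumptions, since no measured values are given for this machine:

```python
import random

# States of the IBM 1130 sequencer; OFF is the implicit sign-off state.
# Each row of T sums to 1, T(t, JOB) = 0 for all t, and OFF is absorbing.
T = {
    "JOB": {"FOR": 0.6, "ASM": 0.3, "XEQ": 0.1},
    "FOR": {"FOR": 0.3, "ASM": 0.1, "XEQ": 0.5, "OFF": 0.1},
    "ASM": {"FOR": 0.2, "ASM": 0.3, "XEQ": 0.4, "OFF": 0.1},
    "XEQ": {"OFF": 1.0},
    "OFF": {"OFF": 1.0},
}

def generate_job():
    """One job: a sequence of job steps from sign-on to sign-off."""
    state, steps = "JOB", ["JOB"]
    while state != "OFF":
        targets, weights = zip(*T[state].items())
        state = random.choices(targets, weights=weights)[0]
        steps.append(state)
    return steps

print(generate_job())   # e.g. ['JOB', 'FOR', 'FOR', 'XEQ', 'OFF']
```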
The individual resources in this set are enumerated Rij e E1, 1 S j 5 [E1]. Rij is used to denote the resource and the range of values its utilization.may take on interchangeably. is a set of probability distribution functions which determine the utilization of resources uses. The functions 1 F.. e D. where 1 s j s [0.] are defined as 1] 1 1 in E1 the JSSM M Fijzwij -*[O,l] where'Wij E E1. In particular, each resource may appear as an argument of only one distrib- ution function. is a set of condition functions which deter- mine the relative rates of resource utiliza- tion. The conditions are enumerated as C.. e L. where 1 S j S lL.I and are defined as 13 1 1 —-o Cijzvij Rik. J where Vij E E1. The use and form of the con- dition functions is described in detail sub- sequently. This definition provides two methods for specifying the amount of 34 a resource to be utilized by the job step. For some resources, rfj, the total amount of resource Rij utilized is determined by a distrib- ution function. If no condition function is given for the resource, the rate of utilization is uniform, as is assumed for the implicit models discussed in the last chapter. In other words, if r¥j units of the resource are to be used in t units of time, then r§jlt units of the resource are used in each unit of time. If, however, the rate at which R. is to be used is to depend upon some other resource, R. dition function of the form rij(t) = f(rik(t)) ° rgj / f(r§k) is used where r..(t) is the utilization of resource R up to time t 13 11 j(t) might depend on several other resources and f would be a multivariate function. As an example in the simulation. In the general case ri consider a job step which copies the contents of a tape to a disk. The resources are R11, measured by the number of buffer units transferred from tape to memory and R12, measured by the number of buffer units transferred from memory to disk. The total amount of each resource utilized is a random variable determined by the appropriate distribution functions. However, the rate at which one resource is used must be directly proportional to the rate of the other and is expressed by the condition function = . 3‘: it r12(t) r11(t) r12 / ril' This form for the condition function is based on the assumption that the speed of the tape drive is the limiting factor and that the rate of utilization of Ri is specified in terms of some other resource. 1 35 The second situation which can occur is that the total amount of resource Rij is not a random variable but is simply a direct function of the amount Of some other resource, R In this case a condition ik' function rij(t) = f> is used. For example, if a compiler is considered to be a resource R11, the utilization Of this resource, ril(t)’ is expressed in state- ments compiled. The rate, k, at which this resource can be used is expressed in statements per second. Then the processor time required, r12(t), is uniquely determined by a condition function of the form ri2(t) = ril(t) / k. An Example gf'g_JSSM The sequencer example presented above used five JSSM's of which one was a FORTRAN compiler, FOR. This JSSM is expanded here as: FOR = ( E1 , D1 , L1 ) E1 = {R11 (central processor) , R12 (disk drive) , R13 (card reader) , R14 (line printer) , R15 (elapsed time) } D1 = {F11(R12’ R13: R14) 1 L1 = {r11(t) = min(r13(t) / k1 , r15(t)) , r12(t) = r13(t) r12 / r13 ’ r13(t) = min(r’i‘3 , r15(t) ° k2) , r14(t) = g(r13(t)) . 
ria / g(ri3) }’ E is the set Of resources which will determine the behavior of 36 the FOR job step. The distribution functions determine the joint dis- tribution Of the number of cards to compile, the number of disk accesses required and the number of lines printed as a result. The first con- dition function Specifies that processor time is used as determined by the rate Of compilation, k eXpressed in statements per second, and 1. the Speed at which statements are presented to be compiled. However, the processor time can be used no faster than the elapsed time. The second condition function is similar to that for the tape to disk copy given previously. The third condition function Specifies that no more than the total number Of cards may be read and these may not be read any faster than the card reader Operates, Specified in cards per second by the rate k The final condition function relates the amount 2. of printing to the number of cards read. Since the compiler prints a header on the listing and compilation statistics at the end, the re- lationship given by g is not linear but has the form approximated by the curve in Figure 4. Summary and Conclusions In this chapter the terms structure and behavior of a job are defined and a model for a job is introduced based on the definitions. This model allows expression of some of the interdependencies between the various resources available in a computer system. The structure of the model preserves that of a real job and so allows for comparison of the model behavior with the behavior Of the job being simulated. For example, if the model has one JSSM which exhibits behavior which is inefficient on the particular computer, it is a simple matter to identify which real job step the JSSM corresponds to. This job step can then be modified to produce more suitable behavior. The parameters 37 1007. '1' % lines printed 0% %_ 0% % cards compiled 100% Figure 4. Condition function relating cards compiled to lines printed showing the effect Of printing header and trailer information. 37 1007. ‘4’ 1 lines printed 07. .1— 0% % cards compiled 100% Figure 4. Condition function relating cards compiled to lines printed showing the effect Of printing header and trailer information. 38 of the model are expressed directly in terms of the resources which determine the behavior while the use of a first order stationary Markov process for the sequencer provides for simple computation of the para- meters. Finally, the modular construction of the model allows added detail in any portion without requiring any additional detail through- out the entire model. Chapter 3 Computation of Model Parameters Introduction In the previous chapter the load on a computer system was con- sidered to be a set of independent jobs, with a first order model prOposed for each such job. The construction of such a model is a complex task requiring the precise Specification of structure within the framework determined by considerations discussed in Chapter 2. The computation Of the parameters required for such a model is dis- cussed in detail in this chapter. The procedure for Specifying the job model can be divided into six steps: 1. Specification of '1, the set of resources of interest. 2. Measurement of a set of jobs to determine the utiliza- tion of these resources. 3. Determination of the set of JSSM's required. 4. Specification of the structure of the required JSSM'S. 5. Computation of the transition probabilities for the sequencer. 6. 
These steps are discussed in detail in the following sections.

Resource Selection

The resources which a computer system makes available to the users are of two types: those embodied directly in the hardware and those embodied in the software. Examples of the hardware resources are obvious, e.g., tape drives, card readers, or core memory. Since most hardware resources are limited resources, and thus potential bottlenecks, careful consideration should be given before deciding to exclude one from R. Such a decision may be valid if it is known that it is very unlikely that the demand for the resource to be omitted would ever exceed the limit. For example, assume a computer with 100,000 words of memory can have at most four partitions. In addition, assume it is known that 99% of all jobs to be run require 22,000 words or less. In this case, the small error introduced by omitting memory as a resource may be an acceptable trade for the simpler model.

As will be remembered from the definition of a total computer system, a job has access to the computer software at two levels. At the first level are utility programs, such as compilers. A compiler may be considered a resource for which the utilization is measured by the number of statements compiled. In this case, the JSSM using the resource would consist of a single distribution function specifying a particular number of statements. The software resources are well defined at this level, and the decision to include or omit any one can be based on the usage which it is expected to receive. If the usage is small enough, it might be worthwhile, for the sake of conciseness, to lump the resource together with other infrequently used resources into a miscellaneous resource.

It is at the level of the system request that software resources are of use in modeling user written programs. The detail presented in the model of the FORTRAN compiler in Chapter 2 is typical of what might be expected in a model of a general user's program. Here again, which requests to consider resources and which to omit may be decided based upon the usage each is expected to receive. Any requests which are not specifically included can be lumped into a miscellaneous resource.

Several lists of potential resources have appeared in the literature to aid in evaluating systems with respect to which features are available. These lists are by no means complete but may be found in [ANSI 1971, Comtre 1970, Ferrari 1972, Hunt 1971, Joslin 1965, Mittwede 1970, Rosin 1965, Walter 1967].

Measurement and Sampling

Since the model parameters are to be determined to simulate a real job, the utilization of the resources decided upon must be measured in some way. Depending on the degree of detail decided upon for the set of resources, measurements must be made at a corresponding level. A reasonable level of detail is commonly furnished by the operating system log which is routinely provided by large scale operating systems. Information generally available from this log is given at the job step level and consists of such items as processor time, memory utilization, disk file utilization, unit record utilization, and number of tape mounts. Experiments have been conducted by [Hunt 1971, Rosin 1965, Walter 1967] to determine the characteristics of the load on a computer based on this information. Data available from system logs is generally useless in analyzing what occurs within a job step.
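Where the log does suffice, the first processing step is to reduce it to one resource utilization vector per job step. A sketch follows, assuming a simple comma-separated log; the column names are hypothetical and do not reflect the actual CDC 6500 log format:

```python
import csv

def read_job_steps(path):
    """One resource-utilization record per job step from a CSV log."""
    steps = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            steps.append({
                "job": row["job_id"],
                "step": row["step_name"],
                "cpu_sec": float(row["cpu_seconds"]),
                "memory": int(row["memory_words"]),
                "requests": int(row["system_requests"]),
                "records": int(row["record_units"]),
            })
    return steps

# steps = read_job_steps("system_log.csv")   # hypothetical log file
```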
For information from within job steps, software measurement techniques may be employed [Bemer 1968], consisting of inserting diagnostic code at various points in the operating system to record the desired data. This technique is very flexible since it may access any information available to the operating system and yet can be selective, producing output only under certain well specified conditions. Two disadvantages of this technique are that the presence of the diagnostic code influences the behavior of the system in some manner which is unmeasurable except by more complicated techniques, and modifications to a production operating system are of a sensitive nature due to the possibility of introducing unstable behavior.

Hardware monitoring devices [Apple 1965, Miller 1971] are available which can supply information as detailed as the number of times a particular instruction is executed. Since these monitoring devices are completely external to the computer system, they are not in competition for resources with the jobs being measured. This technique suffers from not being able to identify the monitored information with the actual job step being performed. This is especially important in a multiprogramming computer system since portions of different job steps may be interleaved.

The measurement procedure carried out to obtain data from which the model parameters may be computed is a statistical experiment. Hence, an underlying distribution is assumed for the job steps from which a sample is selected. Using this sample the parameters of the underlying distribution are estimated. The actual sampling is done at two different times, one when the data is collected and the other when it is analyzed.

In the first instance, data collection sampling, the motive is to minimize the effects of measurement on the system being measured. If the measurement consists of the system log, no sampling is required since the overhead involved in producing the log is part of the everyday environment. With software or hardware monitoring techniques it is sometimes difficult to assess just what the effect is. The influence of software monitoring can be analyzed in terms of the length of the measuring procedure and the number of times it is called, but deciding the precise effect of this extra load in a multiprogramming system with multiple jobs competing for resources is so difficult that a hardware monitor may be required. With a hardware monitor it is possible that the connection can cause timing problems [Miller 1971] which might well go unnoticed while biasing the measurements.

If the data collection technique is to sample the load, the sampling unit must be an entire job. This is necessary so that the transition probabilities may be computed for the sequencer. The sampling scheme must be constructed in such a way that the jobs selected accurately reflect the make up of the total population of jobs. Since almost any computer system load varies over time, proportional sampling can be used to insure correspondence between the sample and the real load [Cochran 1963, Sukhatme 1954]. Proportional sampling requires that if n jobs are to be selected from a sample space containing N jobs and Ni jobs occur in some time interval ti, then ni = Ni x n / N jobs must be selected from the interval ti. In deciding what time period, ti, to consider for sampling, consideration must be given to the size of the samples, ni, which would result.
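The allocation rule ni = Ni x n / N is easily illustrated; the interval counts below are hypothetical:

```python
# Proportional sampling: select ni = Ni * n / N jobs from each interval ti.
interval_counts = {"9-10am": 120, "10-11am": 300, "11-12am": 180}  # hypothetical Ni
N = sum(interval_counts.values())
n = 60  # total number of jobs to be sampled

allocation = {t: round(Ni * n / N) for t, Ni in interval_counts.items()}
print(allocation)  # {'9-10am': 12, '10-11am': 30, '11-12am': 18}
```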
It must be remembered that parameters are being estimated for the entire population of size N, not for the individual subpopulations of size Ni. Thus, it is reasonable to pick ti small enough so that ni is one or two. In this way, small cycles are not as likely to be overlooked.

If it is desirable to keep ni large enough to estimate the parameters for each subpopulation, sampling theory indicates that relatively small values of ni would give acceptable accuracy [Afifi 1972]. For example, assume x is a normally distributed random variable with standard deviation 20. Then for a sample of size 15 with sample mean xm, it can be said with 95% probability of being true that the population mean lies in the interval [xm - 10, xm + 10]. By increasing the sample size to 60 the interval is narrowed to [xm - 5, xm + 5].

Sampling is also a consideration in analyzing the data. While certain analysis techniques are sequential and make a single pass through the data, others require the entire data set for iterated calculations. In this case, memory constraints require the selection of a subset of the data for analysis. To avoid placing undue emphasis on local conditions it is desirable to select the subset so that it represents the widest possible range in time. For example, in a university computer center environment, student jobs tend to be short and are most likely to be submitted at a time corresponding to the end of a class period. In contrast, research jobs will generally be somewhat more complex and will be spread more uniformly throughout the day. If a sample of jobs is selected immediately before the end of a class, it would be biased towards research jobs. If the sample is taken shortly after the end of a class, it would be biased towards student jobs. By selecting the desired number of jobs over a span of time, local biases are smoothed out.

Determination of the Types of Behavior

As was pointed out in the last chapter, each JSSM models the behavior of a job step with the behavior characterized by the utilization of the selected resources. Thus, a separate JSSM is required for each type of behavior. For example, considering the case of a FORTRAN compiler in a university environment, many instances of this job step compile a small number of statements (student programs) while the remaining instances involve compiling substantially more statements (research programs). Furthermore, behavior of the first type (student) probably appears in contexts different from those in which behavior of the second type appears.

If each job step is considered to be a point in a multidimensional sample space where the coordinates correspond to the various resource utilizations, then the various types of behavior can be identified by a cluster analysis technique [Ball 1965, Edwards 1965, Rao 1971, Ruspini 1969]. Such cluster analysis techniques detect regions of high point concentrations, called clusters, in the sample space. All job steps appearing in a single cluster made similar utilization of the various resources, and so exhibited similar behavior. Therefore, each cluster represents a particular JSSM. Figure 5 illustrates the result of plotting the possible utilization of two resources for a collection of job steps from the IBM 1130. While the data displayed here has been constructed to demonstrate the clustering concept, real data will not always be as well separated, thus requiring sophisticated techniques for determining the clusters.
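As a purely illustrative stand-in for such a technique (a simple leader-style pass, not the algorithm used in this dissertation), the following Python sketch grows a set of cluster points and assigns each job step to the nearest one within a threshold. The data and threshold are hypothetical, and real resource axes would first need scaling to comparable units:

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def leader_cluster(points, threshold):
    """Assign each point to the nearest existing cluster point;
    start a new cluster when none is within the threshold."""
    cluster_points, assignment = [], []
    for x in points:
        dists = [euclidean(x, p) for p in cluster_points]
        if dists and min(dists) <= threshold:
            assignment.append(dists.index(min(dists)))
        else:
            cluster_points.append(x)
            assignment.append(len(cluster_points) - 1)
    return cluster_points, assignment

# Job steps as (processor seconds, lines printed) points; hypothetical data.
steps = [(0.5, 40), (0.6, 45), (9.0, 800), (8.5, 760), (0.4, 35)]
print(leader_cluster(steps, threshold=100.0))
```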
The goal of any clustering algorithm is to establish decision surfaces in the sample space. These decision surfaces partition the space into disjoint regions, each region corresponding to a single cluster. Figure 6 illustrates the decision surfaces for the example of Figure 5. A new data point is identified with a cluster, and thus a type of behavior, according to which region it is contained in. As can be seen in Figure 6, these decision surfaces can be very complex, making them difficult to represent in a concise and efficient manner.

[Scatter plot of job steps in a two-resource space; legend: J = JOB, X = XEQ, F = FOR, S = SIGN-OFF, A = ASM]
Figure 5. Hypothetical distribution of measured job step resource utilization in a two-dimension space.

[The same scatter plot with decision surfaces drawn; legend as in Figure 5]
Figure 6. Hypothetical distribution of measured job step resource utilization as in Figure 5 with decision surfaces shown.

One method of simplification used in several clustering algorithms [Ball 1965, Rao 1971, Ruspini 1969] is to represent each cluster by a single cluster point. These cluster points, one for each cluster, are used to specify the decision surfaces in the following manner. Let x be a new data point to be identified with a cluster and let d(·,·) be a distance metric defined on the sample space. Then if the cluster points are {p1, p2, ..., pm}, the point x is associated with cluster i if d(x, pi) ≤ d(x, pj) for all j = 1, ..., m.

[Graph of cumulative distribution functions for several values of the shape parameter c > 0]
Figure 7. Illustration of the effect of the shape parameter, c, on the form of the Weibull distribution.

The parameter a, the scale parameter, influences the Weibull distribution much as the standard deviation influences a Gaussian distribution. The parameter b, the location parameter, is analogous to the mean for a Gaussian distribution. The extra flexibility is afforded by c, the shape parameter. Figure 7 illustrates the effect of the shape parameter on the cumulative distribution function. For c = 0, f(x) becomes an impulse function, i.e., highly skewed. As c increases, f(x) becomes more and more symmetric, approaching a Gaussian distribution. Procedures for estimating a, b and c are given in [Dubes 1970]. Unfortunately, the properties of these estimates are not well known although simulation experiments have verified their usefulness.
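The three-parameter family discussed above can be sketched with the common cumulative form F(x) = 1 - exp(-((x - b) / a)^c) for x ≥ b, where a is the scale, b the location and c the shape. This is one standard way of writing such a distribution and may differ in detail from the form used in the text; the parameter values below are hypothetical:

```python
import math

def weibull_cdf(x, a, b, c):
    """Three-parameter Weibull: a = scale, b = location, c = shape."""
    if x < b:
        return 0.0
    return 1.0 - math.exp(-((x - b) / a) ** c)

# Small c gives a highly skewed distribution; larger c gives a more
# symmetric, Gaussian-like one.
for c in (0.5, 1.0, 3.5):
    print(c, [round(weibull_cdf(x, a=10.0, b=2.0, c=c), 3)
              for x in (2, 7, 12, 22)])
```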
Summary

The computational procedures presented in this chapter are all well defined and understood techniques. The basis for the determination of the model parameters is thus firmly grounded in statistical theory. The structure of the job model gives firm direction in the performance of the statistical experiment by identifying those variables (resources) which must be measured. And yet, in the specification of the condition functions, knowledge of the system not directly available from the measurements may be included.

Chapter 4

A Hierarchy of Instances of the Job Model

Introduction

In Chapter 2 a general framework for the construction of models of the load on a computer system based on a model for a job is presented. Particular instances of this framework may be used to model a load at varying levels of detail. In this chapter a partial ordering on such instances of the job model is described. For several instances of the model the data requirements for parameter computation and the uses of the model instance are discussed.

Formal Basis for a Hierarchy

The degree of detail provided by a particular instance of the job model may be measured in several different ways. One such measure of detail is the number of resources for which the utilization is modeled. While the number and type of resources to be included in any instance of the model depends on the computer on which the modeled load is to be run and on the goal of the modeling, there are three general criteria for including or excluding a particular resource. First, any resource which is a limited resource for the computer on which the job model is to be run, and therefore a potential scheduling problem, should be modeled. Second, no resource utilization which is expressible as a function of one or more other resources should be included. Finally, any resource which is not significant in determining the types of behavior exhibited by the load should be excluded. Techniques for identifying those resources which meet the second and third criteria are discussed in the previous chapter.

The application of these three criteria must be tempered by two further considerations. First, those resources included in the model must be consistent with the general level of detail. For example, even though the adder of the CDC 6600 is a limited resource, it should not be included as a resource to be modeled if the desired level of detail in measuring the use of the processor is satisfied by the number of processor seconds used. Second, a resource should not be excluded if it is essential to satisfy the reason for modeling. For example, if a load is being modeled to study the effect of memory utilization on a computer, then memory must be included as a resource.

A second measure of the detail of an instance of the load model is the form of the distribution functions specifying the utilization of the various resources. That is, if the utilization of each resource is statistically independent of the utilization of all other resources, then all distribution functions are univariate. If certain resources are statistically dependent on other resources, joint distribution functions are required. Standard statistical tests are available to identify the independent and jointly distributed resources.

The final measure of detail of an instance of the job model is the form of the condition functions which serve to specify the relative rates at which the resources modeled by each JSSM are utilized. These condition functions allow the modeled resource utilizations, as functions of time, to accurately model the measured resource utilizations.

By formalizing the idea of the level of detail provided by an instance of the model, a partial ordering, ≤R, of such instances of the model can be derived. This partial ordering is defined such that, if M1 and M2 are two instances of the model for a particular load, and M1 ≤R M2, M2 provides a better approximation to the measured resource utilizations than does M1. The relation M1 ≤R M2 is read "M2 refines M1" and is based on the following definitions.

Let R = {R1, R2, ..., Rn} be the set of all resources of the computer system and let R^1 = {Ri1, Ri2, ..., Rim} ⊆ R and R^2 ⊆ R be similarly defined subsets of these resources. For each resource, Rk, the utilization is denoted by rk(t) and is expressed in appropriate units. The job elapsed time, T, is the ordered aggregate of all time intervals during which the job uses any resource. The job elapsed time is measured from the beginning of the sign-on job step to the completion of the sign-off job step. A job is a vector valued function of time defined to be

J(t) = r1(t) x r2(t) x ... x rn(t).

The restriction of a job to a resource set R^1 is also a vector valued function of time defined to be

J(t)|R^1 = ri1(t) x ri2(t) x ... x rim(t).
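These definitions have a direct computational reading: a job is a vector of per-resource utilization functions, and restriction simply selects components. A minimal sketch with hypothetical resource names and piecewise utilizations:

```python
# A job as a vector valued function of time: one utilization function
# per resource. Resource names and shapes are hypothetical.
job = {
    "cpu_sec":   lambda t: min(t, 5.0),                      # caps at 5 seconds
    "disk_recs": lambda t: 40.0 if t > 5.0 else 8.0 * t,     # records transferred
    "lines":     lambda t: 100.0 if t > 5.0 else 0.0,        # printed at step end
}

def restrict(job, resource_set):
    """Restriction J(t)|R^1: keep only the components named in resource_set."""
    return {r: f for r, f in job.items() if r in resource_set}

j_r1 = restrict(job, {"cpu_sec", "lines"})
print({r: f(6.0) for r, f in j_r1.items()})  # {'cpu_sec': 5.0, 'lines': 100.0}
```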
The load on a computer system, L, is the set of all jobs run on the computer system and the restriction of the load to R^1, L|R^1, is the set of restrictions to R^1 of all jobs run on the computer system.

For a fixed value of time, τ ∈ T, the values of J(τ) for all jobs in the load can be described by an n-dimension histogram. This histogram represents the measured distribution of all resource utilizations which any instance of the model must approximate at time τ.

An instance of the job model, M, over the resource set R modeling load L is a vector valued stationary stochastic process [Papoulis 1965], x(t,z), where t ∈ T and z is an outcome of an experiment, namely the selection of appropriate total resource utilizations according to the distribution functions of the JSSM's. This stochastic process may be interpreted in two ways. For a fixed outcome, z,

x(t,z) = r'1(t) x r'2(t) x ... x r'n(t)

where r'i(t) is the approximation to the utilization of resource Ri embodied in the condition functions. For a fixed value of time, τ ∈ T, x(τ,z) is a random variable with a distribution of values over all possible outcomes, z, of the experiment.

When comparing two instances of the job model to determine whether one is a refinement of the other, the basis for comparison is how well each model approximates the real jobs. While it is possible for a model to approximate the measured jobs equally well for all values in the interval T, such detail is difficult to obtain and frequently not needed. As will be illustrated by subsequent examples, all that is required to study many aspects of a computer system is that the model of the job approximate the real jobs closely only at particular times. The set of all such times at which the model must closely approximate the measured jobs is called the consistency set of the model. Furthermore, the times which make up the consistency set are the only times at which the modeled utilizations can be evaluated. The values of the modeled utilizations at all other times is immaterial.

Figure 8 illustrates the utilization of one resource made by a real job and three different models approximating this utilization. Each model has a corresponding consistency set. Intuitively, x2(t) is a refinement of x1(t) if x2(t) is identical to the resource utilization made by the real jobs at least everywhere that x1(t) is, i.e., c1 ⊆ c2. Referring again to Figure 8, x2(t) is a refinement of x1(t) as is x3(t). However, x2(t) and x3(t) are not comparable since c2 ⊄ c3 and c3 ⊄ c2.

The actual definition of a consistency set is complicated by the fact that, for each value in T, the real jobs and the instance of the model represent not just a single value for the utilization of each resource, rather they represent a distribution of possible values for the utilization of the resources. The requirement to compare two distributions is accounted for in the following definition of a consistency set. Let M be an instance of the job model for a load L over resource set R^1 which specifies the stochastic process x(t,z). Then c ⊆ T is a consistency set for M with error bound e if, for every t' ∈ c, the distribution of x(t',z) over all outcomes, z, approximates the histogram specified by J(t')|R^1 for all J ∈ L with a mean squared error no greater than e. In other words, the modeled behavior must approximate the real behavior to some specified degree at the points defining the consistency set.

[Plots of the real utilization r(t) and the approximations x1(t), x2(t) and x3(t) against job elapsed time t0 through t5, with consistency sets c1 = {t0,t2,t3,t5}, c2 = {t0,t1,t2,t3,t5} and c3 indicated]
Figure 8. Illustration of three different approximations to a real job, x1(t), x2(t) and x3(t), and the corresponding consistency sets.
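The consistency-set test can be paraphrased in code: at a single time t' in c, compare a binned relative-frequency histogram of modeled utilizations with the measured one. The sketch below is a schematic reading of the definition, not the dissertation's procedure; the bin layout and samples are hypothetical:

```python
def mse_between_histograms(model_samples, real_samples, bins, lo, hi):
    """Mean squared error between two binned relative-frequency histograms."""
    width = (hi - lo) / bins
    def hist(samples):
        counts = [0] * bins
        for s in samples:
            i = min(int((s - lo) / width), bins - 1)
            counts[i] += 1
        return [c / len(samples) for c in counts]
    h_model, h_real = hist(model_samples), hist(real_samples)
    return sum((m - r) ** 2 for m, r in zip(h_model, h_real)) / bins

def is_consistent_at(model_samples, real_samples, e, bins=10, lo=0.0, hi=10.0):
    """True when the model's distribution at t' is within the error bound e."""
    return mse_between_histograms(model_samples, real_samples, bins, lo, hi) <= e

# Hypothetical processor-second utilizations at one time t' in c.
print(is_consistent_at([1.0, 2.1, 2.2, 4.0], [1.1, 2.0, 2.4, 3.9], e=0.05))
```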
For example, if an instance of the model models all resource utilizations at the end of each job step with an error of 5% or less, then e = .05 and c = {t | a job step ends at time t}.

If M1 is an instance of the model over R^1 with consistency set c1 and error bound e1, and similarly for M2, R^2, c2 and e2, then M1 is a time refinement of M2, written M2 ≤T M1, if and only if c2 ⊆ c1, e1 ≤ e2 and R^1 = R^2. M1 is a refinement of M2, written M2 ≤R M1, if and only if R^2 ⊆ R^1, e1 ≤ e2 and c2 ⊆ c1. Furthermore, the relations ≤R and ≤T define partial orderings on the set of all instances of the job model for a particular load.

To illustrate the definition of refinement, consider two instances of the model, M1 and M2. Let M1 model processor time and memory utilization at the end of each job with no more than 5% error. Let M2 model processor time, memory utilization and data channel utilization at the end of each job step with no more than 4% error. Then M1 ≤R M2 since M2 models more resources with a smaller error and a larger consistency set.

When actually constructing an instance of the job model, the degree of detail required will be determined by the use which is to be made of the model. Once the degree of detail is determined, a suitable resource set, consistency set and error bound can be chosen. If the results obtained with this instance of the model are not suitable, then there are three possible ways to refine the model, viz., enlarge the resource set, enlarge the consistency set or lower the error bound. In the remainder of this chapter examples of three levels of detail are presented. Each such example is a refinement of the preceding example.

Approximation at the Job Step Level

A reasonable degree of detail, based on the model framework established in Chapter 2, is to require consistency between the model of the load and the real load at the end of each job step. To obtain this degree of detail, each JSSM must include probability distribution functions specifying the total utilization of each resource. For resources such as a central processor, for which the utilization is measured in seconds, a condition function is required which relates this utilization to real elapsed time. For example, if R1 is a central processor with utilization, r1(t), measured in seconds, then the rate of utilization is specified as

r1(t) = { t - t0   for t - t0 ≤ r1*
        { r1*      for t - t0 > r1*

where t is the real elapsed time, t0 is the real elapsed time at the start of the job step and r1* is the total utilization of R1 specified by the appropriate distribution function. It should be noted that this condition function makes no allowances for the influence of a scheduling algorithm on the rate at which the resource is utilized since such a scheduling algorithm is part of the computer system and not part of the load.
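This condition function transcribes directly, with the obvious clamp at zero before the step begins; the starting time and total utilization below are hypothetical:

```python
def cpu_condition(t, t0, r1_star):
    """Processor seconds accrue at the rate of elapsed time until the
    total utilization r1* drawn from the JSSM's distribution function
    has been consumed."""
    return min(max(t - t0, 0.0), r1_star)

# Hypothetical: the step starts at t0 = 2 s and uses r1* = 5 processor seconds.
print([cpu_condition(t, 2.0, 5.0) for t in (1.0, 4.0, 7.0, 10.0)])
# [0.0, 2.0, 5.0, 5.0]
```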
A variety of different forms of condition functions can be used to achieve the desired utilization for those resources not measured in units of time and which therefore need not be synchronized to elapsed time. For example, let R2 be a disk drive for which the utilization, r2(t), is measured in record units transferred with r2*, the total utilization, specified by an appropriate distribution function. Then any one of the following forms of the condition function would be satisfactory:

r2(t) = { 0     for t ≤ t0
        { r2*   for t > t0,

r2(t) = { 0     for r1(t) < r1*
        { r2*   for r1(t) = r1*,

or

r2(t) = { 0                  for t - t0 < 0
        { r2* r1(t) / r1*    for 0 ≤ r1(t) ≤ r1*.

Of course, if additional information is available concerning the pattern of utilization of such a resource, one form of the condition function might be preferred over the others although all the above forms of the function satisfy the required consistency set equally well. Regardless of the form of the condition functions used, the data required from the load being modeled is a record specifying for each job step executed the utilization of each resource.
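These three choices place all transfers at the start of the step, all at the end, or spread them in proportion to processor progress. A sketch with hypothetical totals; note that all three agree at the end of the job step, which is all this consistency set requires:

```python
def disk_at_start(t, t0, r2_star):
    """All record transfers occur as soon as the step begins."""
    return r2_star if t > t0 else 0.0

def disk_at_end(r1_t, r1_star, r2_star):
    """All record transfers occur when the processor total is reached."""
    return r2_star if r1_t >= r1_star else 0.0

def disk_proportional(r1_t, r1_star, r2_star):
    """Transfers accumulate in proportion to processor utilization."""
    return r2_star * r1_t / r1_star

# Hypothetical totals: 5 processor seconds, 40 records transferred.
r1_star, r2_star = 5.0, 40.0
print(disk_at_start(9.0, 2.0, r2_star),
      disk_at_end(5.0, r1_star, r2_star),
      disk_proportional(5.0, r1_star, r2_star))   # 40.0 40.0 40.0
```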
An instance of the job model at this level of detail may be implemented in any of the ways described in Chapter 1, viz., as a synthetic benchmark program, part of a simulation model or part of an analytical model. Such an implementation could be used to study the effect of changes to the scheduling algorithm or the effect of changing the hardware configuration on the total throughput of the system. Similarly, such an implementation of an instance of the model could be used to study the effect of scheduling disk accesses on the average elapsed time a job requires to be processed. However, there is insufficient detail to allow any meaningful conclusions to be reached on problems such as the effect of input buffer size on the time spent waiting for input. Such problems require the availability of information concerning the relative rates of resource utilization within each job step. While this information is specified by a JSSM using any of the above specified forms of the condition functions, the consistency set requires only that the resource utilizations be consistent with the real load at the end of each job step.

Modeling the Occurrence of Wait Conditions

One design criterion embodied in any multiprogramming computer system is that a job which cannot use a resource for some reason should relinquish ownership of that resource, allowing other jobs access. In particular, this criterion applies when a job step is waiting for some external condition to occur before the job step may continue processing. Examples of such wait conditions are the need for a tape to be mounted, a requirement for more memory for the job step or the availability of a data channel for input or output. Two properties of these wait conditions are of importance in modeling a load. First, the occurrence of a wait condition depends only on the structure of the one job step in which it occurs. Second, the length of time to satisfy a wait condition generally depends on the number and type of jobs running concurrently and on the scheduling algorithms used by the computer system.

The occurrence of each wait condition represents a necessary interaction between the job step and the computer causing the job step to relinquish ownership of all resources possible. When the wait condition has been removed, the job step is eligible to continue execution once it has regained ownership of all necessary resources. The operating system includes scheduling algorithms to decide which of the job steps eligible for execution should be given the needed resources and allowed to continue. For example, if a job step must wait for a tape to be mounted, it will release the central processor for some other job. When the tape is mounted, the job step must wait for the scheduling algorithm to restore ownership of the central processor before it can resume execution. The effect of these scheduling algorithms can most readily be observed in the elapsed time required for completion of a job step. For an interactive computer system this elapsed time is called the response time and is an important measure of the performance of any such system.

In order for an instance of the load model to be useful in evaluating alternative scheduling algorithms, the instance of the model must provide sufficient detail to model the occurrence of wait conditions. Thus, one possible refinement of the previous example is to expand the consistency set to include the occurrence of every wait condition and the end of each job step. To achieve this degree of detail, a record of the load being modeled is required which specifies for each wait condition the reason for the condition and the utilization of each resource.

The condition functions required to achieve consistency with a real load at every wait condition are more complex than those required to achieve consistency only at the end of each job step. The form and parameters of such condition functions depend on the particular wait condition and can best be described by several examples. Consider a job step generating output. So that the high computation rate can be matched effectively to a lower output rate, an output buffer is used, i.e., every write operation in the program simply moves the data to be written to an area in memory designated as a buffer. Only when there is no more room in this buffer is the data actually transferred to the output device. Once the buffer is filled, the job step must wait for the data in the buffer to be transferred before it can resume computation. The two resources to be interrelated by the condition function are the processor, R1, and the output unit, R3. The utilization of the processor, r1(t), is measured in seconds and the utilization of the output unit, r3(t), is measured by the number of times the buffer has been output. Then the condition function has the form

[0 r1(t)