‘ b
.N‘EE‘"
I.I r - .
7‘ .I

I) I ’3'“, .I'I-xv, 1, .1.
1‘4. €11,515. - . . ‘2‘. m 1' 1 1,: _ 11 . ‘W-Smu 2’.
. .’. V 1 L 1. §:.. . Au? "",‘4‘-‘
“‘6 .4 ”“1“" ‘ 1. \tI ’\ 'V' 5 . .5. 1 ' l 1 1" ‘J’A‘éI-l
I ‘ ‘ A
I. .1

11M. 6-"
1;". *t" i": 'gﬁ’i‘ﬁ
$3 TI ‘3'; .r1uurIL'.)-'P%\liﬂ.£1.:;¢

\‘V‘. “W”
' V h\:“1‘:."
KMA

\

:I'-.. .Q I 'I L I
If. ' 1 > n 1
"I ‘Sy‘vé gx'ggéqvvs.
:3 :I'W 1; $3».
1113' S, 1‘

3 I
,. '
3“I- l' '

It} :Lg'ak, "l' 1
‘-I""‘-' I1
‘ II '1‘“.
“HINT ' "

“14"..." _ 53$:

0“": ct" (1'11" ,‘\ _.

I’II:v
\

I‘M-"x:

‘1‘

1“

T“ 1\I

I“ '1’JE.“"‘U'..L "NF".

5".

w
1.
4a-.

. milk? .' 51%.... '-

3

VJ, ' .1' n ‘I'l'v't
I I
l
I"! 1‘

1'” l

'1; t' 31;.

mm

1 1‘" ”L I” . .
'11: 12-15. .:..#\ I . .;‘."\ , . 1 . a . 1 -. a. dz‘r; My.
I t Q .I‘ tw‘ l1 . 0“ I II’ II‘ - . w ‘ '1 1.9“}- , \Ip A.
1 1 1 ‘. III Iv -. -< - . -. »
'1'! I ‘.I':..".' 7 m l"; n- I“ $i'11", '1 1 ’1‘? -. ":1 "“ .. - ﬂ ‘1? :1? “111“”l “h!
4“” 1' I 1 ,. ' I‘ '. I . ‘ . b.2191 '.. "I ‘1‘ U . I..I"'
I ’ ‘ ‘, 4, ' ' ’
W
I"

l
1

III A'.‘ Q“ '
f;|.nl-r I- .
*2 '1.
"dul: I I
~Iy‘; H“
-‘I'I um“;
I “1"-
.' 1:1. 1.
"in

 

 

 

This is to certify that the

thesis entitled

A MODEL OF PROGRAM PERFORMANCE
IN MULTIPROGRAMMING SYSTEMS

presented by

Chester Terrance Trahan

has been accepted towards fulﬁllment
of the requirements for

Ph.D degree in Management Science

 

 

Mame/t

Major professor

Date ”/5/ 77
/ ,

0-7639

© 1978

CHESTE R TE RRAN CE TRAHAN

ALL RI GHTS RESERVED

A HODEL 0P PROGRAH PERPORHANCB

IN HULTIPROGRLHUING SYSTEMS

BY

Chester Terrance Trahan

A DISSERTATION
Submitted to
Bichigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Departlent of Management

1978

(1‘

ABSTRACT

A HODEL OF PROGRAH PERPORHANCE
IN HULTIPROGRAHHING SYSTEES

BY

Chester Terrance Trahan

Good planning information is a continuing necessity
for Data Processing management. Along with the need for
good performance information in planning to meet demand for
computer services, there is a need for individual program

performance prediction in computer job scheduling.

Some of the information necessary for planning is
collected based on performance measurement (usually by
system software). This must be augmented by benchmarks,
simulation or other techniques to predict system performance

under a specific set of circumstances.

The use of an analytic model has been proposed as the
most flexible method of predicting computer performance for
medium scale, priority-interrupt driven Operating systems.
To this end an analytic model was developed with cyclic
queuing submodels for the Central Processing Unit and the
Input/Output units and a deterministic "independent
reference" submodel for program paging behavior. The model
was programmed in APL [1] and Newton's method and Aitken's
pelt; §qgggg [2] algorithm were used to achieve convergence

of the model to equilibrium solutions.

Chester Terrance Trahan

A set of parameterization runs were made on a
System/370 nodel 1u8 and their measurements were then used
to estimate the system parameters: I/O overhead, paging
overhead, Page-in/Page-out ratios, and page "weights" for
the paging submodel. A "synthetic" program was written so
that computer usage and paging behavior could be controlled
internally and real storage, Input/Output behavior and
priority could be controlled externally to the program.
Several variations of the "synthetic" program were measured

and their characteristics estimated.

Following parameter estimation, the estimated values
were used in the APL model to predict the performance of the
experimental programs in an experiment designed as a
2X2X2X2X2 half-replicate factorial. The actual experiments
were then conducted and the experimental results compared to
the predictions. The results of the experiment showed good
prediction in the area of working set sizes and elapsed

times and aggregate I/O rates.

The results showed a sizable error in page rates,
channel utilization, overhead and CPU utilization. The
nature of these results was attributed to the inadequacy of
the independent reference model in representing paging
behavior. These results reinforced Belady and Kuehner's
conclusions about the unsuitability of independent
references [3] as a model for program paging. Hodifications

were then made to the model and the predictions of the

Chester Terrance Trahan

revised paging model showed good agreement with the
experimental data. The revised paging model as well as
several previously developed models showed good agreement
with paging behavior for programs executing with constrained
memory. However, all of the models examined showed a poor

fit under conditions of loose memory constraints.

BIBLIOGRAPHY

Iverson, K.B. A Programming Language, New York, John
Riley 8 Sons, Inc., 1962.

Isaacson, 2., and Keller, 8.8. Analys;_ g; ggmgrica;
methods, New York: John Wiley 8 Sons, Inc., 1966.

Belady, L.A., and Kuehner, C.J. "Dynamic Space-Sharing
in Computer Systems", Communications 9; the Agg,

To Thelma

iii

ACKNOWLEDGHENTS

I would like to thank the chairman of my guidance
committee, and present chairman of the Management Depart-
ment, Professor Phillip Carter for his encouragement. I
also want to thank the members of my dissertation committee,
Co-chairmen Professors Richard Henshaw and Herman Hughes,
and Professor Gerald Park. I also owe a debt of gratitude
to my manager, Paul Beukema, for his help in getting permis-
sion for me to perform my experimental work at IBh.

East Lansing,hichigan C.T.T.
1978

iv

TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . .

LIST OF FIGURES.

LIST OF SYMBOLS. . . . . . . . . . . . . .

INTRODUCTION . . . . . . . . . . . . . . . .

I.

Performance Planning . . . . .

Scheduling . . . . . . . . . .
Priorities and Interactions.
Batch versus Timesharing . . .

BACKGROUND AND PREVIOUS WORK . . . . .

CPU Models . . . . . . . . . .
Central Server Network Model .
Simple Cyclic Queuing Model. .
Simple Flow Model. . . . . . . .

Cyclic Queuing with Paging and Overhead

Product of Stages Model. . . . . .
Multiple Resource Allocation Model
"Straightfoward" Queuing Model .
Eclectic Model . . . . . . . . .
Summary of Queuing Models. . . .
Synthetic Workload Benchmark .
Models of Paging Behavior. . . . .
WOrking Set Model. . . . . .
The Lifetime Function. . . . .
Markov Models. . . . . .
A "Half-Life" Model. . .
The Simple Linear Model.
The Paging Index . . . . .
The Page Survival Index. . . . . .
I/O Models . . . . . . . . . . . .
Disk Response Model. . . . . . . .
Disk and Drum Scheduling Models. .
The Disk Seek Model. . . . . . .
Revised Disk Response Model.

. viii

H
\l

O O O O O I O O
N
on

II. MODEL DEVELOPMENT AND ANALYSIS. . . . . . . . . . . .35

Batch Model Requirements . . . . . . . . . . . . .35
CPU Submodel . . . . . . . . . . . . . . . . . . .38
CPU Completion Time. . . . . . . . . . . . . . .39
CPU Waiting Time . . . . . . . . . . . . . . . .ua
CPU Utilization and Elapsed Time . . . . . . . .46
Paging Submodel. . . . . . . . . . . .'. . . . . .47
Page Exception Rate. . . . . . . . . . . . . . .48
Memory Allocaton . . . . . . . . . . . . . . . .u9
"Near-Optimal" Memory Allocation . . . . . . . .53
1/0 Submodel . ... . . . . . . . . . . . . . . . .Su
I/O Overview . . . . . . . . . . . . . . . . . .sa
I/O Submodel Development . . . . . . . . . . . .55
Disk I/O Model . . . . . . . . . . . . . . . . .57
Direct Access Disk I/O . . . . . . . . . . . .62
Indexed Disk I/C . . . . . . . . . . . . . . .68
Sequential Access Disk I/O . . . . . . . . . .73

1/0 for Paging . . . . . . . . . . . . . . . .75
Submodel Integration . . . . . . . . . . . . . . .76
Model Convergence. . . . . . . . . . . . . . . . .82

III. MODEL VALIDATION AND EXPERIMENTAL DESIGN . . . . . .89

Experimental Plan. . . . . . . . . . . . . . . . .89
Programs for Measurement and Control . . . . . .91
Experimental Variables . . . . . . . . . . . . .92
Instruments and Measures . . . . . . . . . . . .93
Experimental Design. . . . . . . . .,. . . . . .98

Design for Parameter Estimation. . . . . . . .98
Experimental Design for Validation. . . . . 101

vi

IV. EXPERIMENTAL RESULTS AND ANALYSIS . . .

Parameter Estimation . . .
Normal I/O overhead. . .
Paging I/O overhead. . .
Paging Ratio Estimation.
Page Allocation Heights.
Miscellaneous Parameters . . .

CPU Rate Estimation. . . . . . .
Virtual Storage Estimation . . .

Storage and Paging Index Estimation.

Experimental Manipulations . . . .
CPU Variance Manipulation. . .
I/O Variance Manipulation. . .

Model Predictions. . . . . . . .

Model Validation . . . . . . . .
Results of Analysis of Variance
Confidence Interval Analysis .

Sources of Error . . . . . . . .
CPU Submodel . . . . . . . . . .
Potential Sources of Error-Paging.

Post-Experimental Paging Analysis. .
Revised Paging Submodel. . . . . .
Comparison with Earlier Models . .

V. CONCLUSIONS AND RECOMMENDATIONS. . . . .

Research Conclusions . . . . . . . .
Recommendations for Further Research

GLOSSARY OF TERMS. . . . . . . . . . . . . .
APPENDIX A O O O C C O O O O O O O O O O O O

SELECTED BIBLIOGRAPHY. . . . . . . . . . . .

vii

112

112
112
113
115
117
120
121
122
122
127
129
130
130
131
135
137
138
138
139
141
141
142

143

143
145

148
156

171

1.
2.
3.
4.
S.
6.
7.
8.
9.
10.
11.
12.
A1.
A2.
A3.
A4.
A5.
A6.
A7.
A8.
A9.
A10.
A11.
A12.

LIST OF TABLES

I/O Overhead Regression Analysis. .
Paging Overhead Regression Analysis
Page I/O Ratio Analysis . . . . . .
System-wide Paging Ratio Analysis
Estimated Working Set Heights . .
Working Set Weight Analysis . . .
Paging Index Calculations . . . .
Experimental Program Estimates. .
Model Predictions . . . . . . . .
Experimental Runs - Measured. . .
Experimental Runs - Calculated. .
ANOVA Table of Prediction Errors.
Non-paging Runs Repl I - Measured
Non-paging Runs Repl I - Calculated
Non-paging Runs Repl II - Measured.
Non-paging Runs Repl II - Calculated.
Paging Runs Repl I - Measured . . .
Paging Runs Repl I - Calculated . .
Paging Runs Repl II - Measured. . .

Paging Runs Repl II - Calculated.
Page Allocation Runs ~ Measured .
Page Allocation Runs - Calculated .
Program Estimation Runs - Measured.
Program Estimation Runs - Calculated.

viii

113
115
118
119
120
121
124
128
131
133
134
136
156
15 8
160
161
162
163
164
165
166
167
168
169

10.

LIST OF FIGURES

Central Server Model . . . .
Rational Laplace Transforms.
Elapsed Time "Cycle"(ET) . .
Disk Access Model Diagram. .
Indexed Access Timing Diagram.
Random Access Timing Diagram . . .
Experimental Design for Estimation
Experimental Design for Validation
Levels for Independent Variables .
Multiprogramming Interaction Model

ix

.13
.16
.40
.59
.60
.61
100
103
108
132

9(V)

f(.)

w(j)

A(j)

ALP

ARAT(i)

ARAT(i,j)

ARAT(i,j,k)

3(1)

BETA(j)

BETA

BS(i,k)

BSP

C(j)

LIST OF SYMBOLS

The average process time between page faults.

A function giving the difference between the
average instantaneous I/O access rates on two
consecutive iterations of the model.

A real number between zero and one. A probabili-
ty.

The average working set size for program j
(pages).

,The paging activity index for program j

(references/instruction).

CPU overhead for a non-paging I/O cycle (sec-
onds).

Total access rate for all programs to device 1
(accesses/second).

The access rate for program j to device 1
(accesses/second).

The access rate for program j to file k on
device 1 (accesses/second).

The length of a CPU service interval before an
I/O or paging operation is generated (seconds).

The ratio of page-reads to page-writes for
program j.

The system ratio of page-reads to page-writes.

The average block size (physical record size)
for file k of device i (bytes).

The block size of a page in the virtual system
under discussion (bytes).

The completion time for program j (seconds).

CA

CER (j)

CI(i,k)

CK (j)

CL(i,k)

COH(j)

CP(j)

CPPU)

CSU)

CS

CTIME(j)

CYL(i,k)

CYT (j)

DU)

DC

DEL

DI (ivjlk)

DI(i,k)

The coefficient of variation for the distribu-
tion of I/O requests for disk.

The instantaneous page exception rate for
program j (exceptions/second of process time).

The cylinder location of the mid-point of the
index area for indexed file k on device 1.

A factor used in calculating the paging response
time for program j.

The number of cylinders in the index portion of
file k on device 1.

The CPU utilization for overhead due to program

j.

The CPU utilization due to program j not includ-
ing overhead (COH(j)).

The total CPU utilization due to program j
including both CP(j) and COH(j).

The coefficient of variation for the distribu-
tion of completion time for program j.

The coefficient of variation for the distribu-
tion of disk service time.

The total CPU time attributed to program j by
operating system job accounting routines. Total
problem-state time (seconds).

The size of file k on device 1 (cylinders).

The ratio of the apparent machine instruction
rate to the nominal machine instruction rate for
program j.

The amount of CPU processing time preoempted by
the channel in performing an average I/O opera—
tion for program j.

The amount of CPU time "stolen" by the channel
for each byte of data transferred (seconds).

The average CPU overhead for a paging cycle
(seconds).

The probability of an access by program j to
file k on device i.

The probability of an access to file k on device

xi

DS

ETU)

ETA(j)

EXP(N)

FAC(j)

FN

H(j)

10(1)

KS

LAN(j)

LC(i,k)

L(i)

i by any program during a specified interval.

The amount of CPU time "stolen" by the channel
to initiate an I/O operation (seconds).

The elapsed time per cycle for program j includ-
ing initial wait H(j), completion time C(j), and
I/O wait time IC(j) (seconds).

The average I/o wait time during a paging cycle,
ETA(j) = (1 + Beta(j))*PHI(j) (seconds).

The paging expansion factor. The tendency of
overall system paging rate to increase with an
increase in multiprogramming set due to page
frame contention.

The expected fraction of completion time which a
program must wait if completion of it's I/O
finds the CPU busy servicing program j. If
Cs(j) is the coefficient of variation for the
completion time for program j, FAC(j) =

(1+C5(j))/2-

The average system real-time paging rate. The
sum of the individual program real-time paging
rates (pages/second).

A factor used in calculating the induced comple-
tion time due to the completion of I/O for a
higher priority program while a lower priority
program is in the interrupted state.

The average response or wait time for any type
of I/O operation by program j including paging
(seconds).

The average CPU instruction execution rate
(instructionS/second).

The slope of the expression for seek time. The
average seek time per cylinder
(seconds/cylinder).

The average instantaneous I/O rate for program
j. This means that while program j waits for
completion of paging or I/O the operation
completes with rate LAM(j) (completions/second).

The mid-point of the data area of file k of
device i (cylinder).

The weighting factor for program j in a biased
memory allocation scheme.

xii

M(j)

M0(3)

N(i,j,k)

N(i,k)

N(j)

OHT(N)

PG(Ii)

PHI(1)

91(3)

PDC”

93(1)

RC(i,k)

RCU(ivjnk)

RCU

RCU(j)

The "critical memory". The smallest value of
program j's working set such that program j's
paging rate is less than .5 pages per second.

The instantaneous CPU access rate for program j
in the absence of paging. MU(j) =
CTIHE(j)/(N(j)+1)-

The average number of non-paging I/O operations
to file k of device i issued by program j during
it's execution.

The average number of non-paging I/O operations
issued by all programs in the multiprogramming
mix to file k of device 1 during some specified
interval.

The total number of non-paging I/O operations
generated by program j during it's execution.

The total number of programs in the multipro-
grammming mix.

The total system CPU overhead with N programs in
the multiprogramming set.

The total number of cycles in program j's
execution which terminate in paging exceptions.
The number of pages read for program j.

The average time it takes for program j to read
or write a page from virtual memory (seconds).

The average real-time page exception rate for
program j. PI(j) = CP(j)*CER(j)
(exceptions/second of real time).

The total number of page-write operations on
behalf of program j during it's execution.

The the sum of the page-read and page-write
rates for program j. The total paging rate for
program j.

The number of data records per cylinder for
indexed file k cn device 1.

The average channel utilization caused by
program j accessing file k on device i.

The average total channel utilization for some
specified channel during a specified interval.

The average total channel utilization that is

xiii

RD

5(1)

5K(iujoki

SK(i,k)

5K2(i:j:k)

S1(i,j,k)

52(iojlk)

T(j)

TAU (j)

TRD(i:j:K)

TS(ic1ok)

TS1(i,j,k)

apparent to program j. The expected channel
utilization on the condition that program j is
not using the channel.

The period of rotation of a disk device. The
length of time required for the platters or
disks to complete one revolution (seconds).

The total amount of pagable memory. The total
amount of real memory minus fixed memory (pag-
es).

The total amount of virtual memory required by
program j for execution--program j's address
space (pages).

The average seek distance experienced by program
j when accessing file k of device i (cylinders).

The average seek distance for any program in the
multiprogramming mix when accessing file k on
device i (cylinders).

The second moment of the seek distance for
program j when accessing file k on device i
(cylinders**2).

The average seek distance for program j when
accessing the index area of indexed file k on
device i (cylinders).

The average seek distance for program j when
accessing the data portion of indexed file k on
device i (cylinders).

The amount of time required to move a disk
access mechanism exactly one cylinder (seconds).-

The aggregate I/O rate for all but program j.
This symbol is also used to represent a delay
term in calculating initial wait.

The average amount of CPU usage due to program j
per cycle, including the overhead for paging
and/or normal I/O (seconds).

The average total response time for program j
accessing file k on device i, including wait
time (seconds).

The average seek time for program j when access-
ing file k on device 1 (seconds).

The average seek time for program j when access-

xiv

T52(i:jok)

TSC(i,k)

TSC1(i,j,k)

TSC2(i,j,k)

TSD(iojok)

TSD1(i,j,k)

T"C(iojok)

THE (1, j, k)

V(j)

X1(1)

ing the index area of indexed file k on device 1
(seconds).

The average seek time for program j when access-
ing the data area of indexed file k on device i
(seconds).

The average channel service time for any program
accessing file k on device 1 (seconds).

The average channel service time for program j's
access to the index area of indexed file k on
device i (seconds).

The average channel service time for program j's
access to the data area of indexed file k on
device i (seconds).

The average device service time for program j
when accessing file k on device 1 directly or
sequentially (seconds).

The average device service time for program j
when accessing the index area of indexed file k
on device 1 (seconds).

The average device service time for program j
when accessing the data area of indexed file k
on device i (seconds).

The average channel wait time due to blocking by
other programs experienced by program j when
accessing file k on device i (seconds).

The average channel wait time due to the channel
being busy when the record to be read or written
passes under the read/write heads. This is the

wait time experienced by program j when access-

ing file k on device 1 (seconds).

The initial wait time experienced by program j
between the completion of an I/o or paging
operation and program j's next access to the CPU
(seconds).

The average I/o response time experienced by

program j for non-paging I/O to all of it's
files (seconds).

XV

INTRODUCTION

Data Processing management has a continuing need for
planning information. Long and short range planning is
necessary so that the data processing needs of the organiza-
tion can be met without undue disruption of service and

unnecessary expense.

Lead times for equipment delivery schedules require
that equipment be ordered several months to years before the
equipment is actually needed. To assess the impact on
customer service and installation stability, the data
processing management team must be able to estimate the

effect of changes in software or hardware.

Software changes can be to application software or
systems software. Application software is the collection of
programs and procedures in an installation which are written

to support the busine.§ fgpgtions of the organization.

 

Programs which do payroll, schedule manufacturing opera-
tions, and maintain inventory accounts are examples of
application programs. Programs which generally support the
functions of the data processing organization are called

system software. Program compilers, sort routines, and even

the collection of programs which control the elements of the
computer system-~the operating system--are all examples of
system software. Hardware changes may be made to the
central processor (CPU), or to the peripheral or
input/output (I/O) devices such as disk drives, tape drives,
printers, drums, card equipment and teleprocessing termi~
nals. The change may be either replacement, acceleration of
operating speed, or augmentation of the capacity of the

device(s) or unit(s) under consideration.

Part of the information needed by data processing
management is provided by their data processing system in
the form of job accounting data collected during system
operation, while the systems requirements come from the
business plans developed by corporate planners. The remain-
der of the information required by data processing manage-
ment to fulfill their responsibilities is derived from
industry publications, vender literature and proposals,

staff research and intuition.

Information in the third category tends to be associ'
ated with performance, capacity and the capability of
computer systems. Typical questions that one hears in this
category might be: will the addition of that new disk drive
improve throughput by 10% or are other changes required?
how many programs should be running concurrently to optimize
throughput? will the installation of the new CPU be

sufficient to handle a projected 15% increase in workload

l:h'
r

during the next twelve months? will the proposed system
meet the performance specifications? how will the schedule
be affected if the third shift update is moved to the first

shift?

The answers to such questions are not only of interest
to data processing management and systems analysts but they
are of key importance to vendors of hardware and software.
Vhile system collected job accounting data can supply the
basis for an analysis which may yield the answers to the

above questions, it cannot provide answers.

The means of deriving answers to the above questions
range from intuitive guesses to in-depth analysis and
simulation. In some cases decisions are made to increase
computer capacity without in-depth analysis because of a
lack of analytical skills. In other instances benchmarks
are performed as a means of alleviating this situation. A
benchmark is the execution of an actual set of programs
(currently implemented on an installed or "base" system) on
the proposed or "target" system. This approach is usually
impractical for the following reaSons: (1) the amount of
time available on a target system is sometimes limited, (2)
the data files from the system being modeled cannot be
removed because they are needed for continuing processing,
(3) and the number of disk packs available for extended
periods of time is often inadequate for a complete bench-

mark. Even if the foregoing problems can be surmounted, the

logistics of duplication and transportation of card decks,

disk packs and tapes can create additional problems.

When one considers the conflicts involved in schedul-
ing computer Operators for the benchmark as well as continu-
ing production work, it is easy to see why the approach
generally taken is the preparation of a ”representative"
sample of job streams which are supposed to embody the
characteristics of the entire system and can be executed in
a limited time-~usually a few hours-~with limited data
files. This approach is still quite expensive and is
usually not used except in the case where very large compu-
ter systems are under consideration. The limited benchmark
approach still requires a great deal of preparation involv-

ing both computer time and analysts' time.

Another expensive approach to performance planning is
the use of simulation. This approach has several varia-
tions. In one variation, systems analysts build a model of
the computer system from a basic computer language such as
FORTRAN or PL/I. Another variation requires the analyst to
build the model using a generalized simulation language such
as GPSS, GASP, or SIMSCRIPT. The systems analyst sometimes
chooses to use a basic system model supplied by a software
vendor or consulting firm. In this case, the I/O configura-
tion, memory size, and CPU speed of the target system are
supplied as parameters to the basic system model. The major

processes of each program are then modelled as events,i.e.

CPU access, I/O access, and "interruption" or pre-emption of
the access to the CPU of a less important program (low
priority) by a more important program (high priority). This
method, while more practical than the first two approaches,
requires purchase of the vendor's software or his consulting

and educational services and can still be quite expensive.

In the real world, simulation has been used extensive-
1y to model large teleprocessing networks. Such networks
are not typical of most data processing organizations. Even
if the expense of the simulation approach were not prohibi-
tive, many users of small and medium sized computers do not
have employees with the skills necessary to do this kind of
study, nor do they think that the simulation packages

available are economically justified.

Sgheduligg

 

There is an urgent need need for easier or more convenient
performance prediction. Another important area for which a
predictive tool is required is computer scheduling. Compu-
ter scheduling, as used here, has to do with the sequences
and combinations of programs which are determined ggggggglly
to the computer, and is to be differentiated from schedules
which are determined by the computer's operating system,
which will be referred to as "task scheduling" or
"dispatching". Scheduling, in the context used in this

thesis, is normally a clerical function. The scheduler

tries to arrange the sequence of executions so that system
resources are balanced and deadlines are met [23]. To do
this, he must consider program memory size, peripheral
requirements, job dependencies, data availability and other
complex factors. The scheduler is really interested in

predicting the actual job run time unde; g specifig se; 9;

 

gircumstanceg, but normally uses the average job run time

 

that has been calculated over several executions of the job

being scheduled.

Presently, most batch job computer scheduling is done
using the average elapsed time for each job in the schedule
[5,21], even in cases where the scheduling function is
computer assisted. That is to say each program or sequence
of logically related programs (job) is considered to execute
for a fixed amount of time on a given computer system,
regardless of the charactersitics of all the other programs
in the system at the same time. The inadequacy of the
deterministic type of scheduling can be observed from the
non-deterministic nature of interactions between the jobs in
a computer system at any given time and the effects of

priorities.

Consider two programs which are assigned priorities or
levels of importance such that the lower priority program
will be forced to relinquish the CPU whenever the higher
priority program is ready to access the CPU (usually at the
completion of I/O activity). The lower priority program
will have to wait for the higher priority program to relin-
quish the CPU and will only have potential access to the CPU
during the times that the higher priority program waits for
I/O operations. It is easy enough to visualize that if the
low priority program always has to wait for the high priori-
ty program, and the high priority program never has to wait
for the low priority program, the respective behaviors of
the two programs will surely be affected if their roles are

reversed.

Extending the example to the case of several programs
will exaggerate the effects of program interactions in the
processor. The actual run time for a particular execution
of a given job or program will deviate from the calculated
average elapsed time unless it is run with precisely the
same set of other programs and identical assignment of

priorities as when the average was calculated.

Interactions within the processor are further compli-
cated by contention for channels and I/O devices. Returning
to the two program example, suppose each program accesses a

unique disk device and has no other I/O. Further suppose

that each program has a nontrivial disk utilization, 20% for
'example. If the data which these two programs access are
consolidated on one disk drive, it is inconceiveable that
the execution or run times of each of the two programs would
be unaffected unless other (compensating) changes were also

made.

There is another type of variation in job durations in
a typical business oriented data processing system-~the
variation due to transaction volumes in transaction driven
applications. For such applications the elapsed job times
are proportional to the number of transactions processed if
gghgr phigqg 35g gqgal. Since techniques such as trend
analysis and exponential smoothing may be used for the
prediction of transaction volumes, the focus of this
research will be on the prediction of the more complex type
of variation in job execution duration-~the variation due to

interactions among several programs executing in a computing

system.

gapgh ygrsus Tipesharinq

Computer systems may be classified into two groups
according to whether or not the system is pgipagily dedicat-
ed to batch multiprogramming or teleprocessing (time-shar~
ing) multiprogramming. Within both groups, the dispatching
scheme or the method which is used to allocate the resources
of the computer system to competing programs or "tasks", may

be a priority system, a time-sharing system or a combination

of the two. The priority dispatching scheme has the primary
objective of optimizing the throughput (number of units of
work completed per unit time) of the mpg; importggt jghg g;
paskg. The time-sharing scheme, on the other hand, has the
primary objective of optimizing (minimizing) the "response"
time or the time it takes for each interaction of a terminal
with the CPU. The combination scheme of priority dispatch-
ing within time-sharing is an attempt to have it both ways.
Although either of the above dispatching strategies may be
found in batch multiprogramming systems or in teleprccessing
systems, the priority dispatching scheme tends to be associ-
ated with batch multiprogramming and time-shared dispatching
tends to be associated with teleprocessing systems because
this is consistent with the primary performance objectives
of these respective systems. For this reason teleprocessing
under the control of primarily batch multiprogramming
systems may use priority dispatching and batch-oriented
processing under the control of primarily time-sharing or
"interactive" systems will be dispatched using the
time-sharing discipline. Because most of the work on
time-sharing systems is not under the control of the data
processing installation but is initiated by the terminal
user, the prediction of system loads is a statistical
problem. It is useless to talk about scheduling a
time-sharing system in the sense that "scheduling" is used
in this thesis, therefore this research emphasizes primarily

batch-oriented multiprogramming systems.

10

This investigation sets forth the development of an
analytic model which is readily usable by computer system
analysts, computer schedulers, and hardware and software
vendors to predict gross computer system performance charac-
teristics (CPU, channel, and device utilizations and
throughput) as well as elapsed times and CPU, channel and
device utilizations by pggqrgg g; jgb. A discussion of the
background relevant to computer system performance predic-
tion and related models is presented in chapter 1, and the
model used in this research is developed in chapter 2.
Chapter 3 consists of an explanation of the parameter
estimation and validation procedures and the experimental
design. A presentation and discussion of the experimental
results is given in chapter 4. The research results are
discussed and recommendations for future research are

presented in chapter 5.

I. BACKGROUND AND PREVIOUS WORK

92

1::
I!
O
a;
(D
H

W)

 

ggptral e ve Network Mode

1H

Most of the models developed to predict the perfor-
mance of multiprogramming systems are closed, cyclic queuing
network models (so-called "central server" models) based on
the early work of J.R. Jackson [32] and later extensions by
Gordon and Newell [28]. These researchers determined the
conditions under which closed form solutions to network
queuing models were known to exist. The types of networks
which met these conditions were called "separable" and the
solutions were said to be in "product-form". The
product-form solution states that the equilibrium state
probability for the network is the product of the equilibri-
um state probabilities of the component service centers in

the network.

The central server model is based on the assumption
that the execution of programs in a multiprogramming system
consists of alternate periods during which each program is
either receiving or waiting for CPU service and periods
during which each program is either receiving or waiting for

I/O device service. Another general assumption in the

11

12

central server model is that at the completion of CPU
service, each program requests service from the i-th server
(I/O device) with probability P(i). A schematic diagram of

the central server model is given in Figure 1.
§immls Prelim assgiss Amie;

One of the better known computer performance models is
a model developed by D.P. Gaver, Jr. Gaver assumes an
identical probability distribution for the CPU demand of
each job and an identical exponentially distributed response
time for each I/O device in the system [26]. Parameters in
Gaver's model are the number of homogeneous jobs and their
CPU service time and the number of homogeneous I/O units and
their I/O service time. This model is a specific implemen-
tation of the central server model which has two stations,
the CPU and the parallel server I/O station. The I/O
devices in the Gaver model are treated as a pool from which
a request for I/O may be serviced by any device which is
idle. An arbitrary CPU service time may be modeled by
either an Erlang, Hypoexponential or Hyperexponential
distribution. Paging behavior is not explicitely modeled
but may be considered to be included in the overall I/O

rate.

 

13

W1

W2 .
I u ——)‘

“'1.

Central Server

Peripheral Servers
(Data Channels)

 

 

 

Figure 1. Central Server Model

 

14

§AEB.§ Flow Ages;

 

Fenichel and Grossman's Plow model [20] does not use
probability distributions and does not account for direct
program I/O but rather assumes a fixed relationship between
average compute time and paging. For this model the only
I/O considered is paging. The paging response is computed
from a response table. Simulation of the operation of the
paging device is used to develop a table of response times
under different paging rates. The Flow model makes no
assumptions about the form of the probability distribution

of paging service time.

gyglig Queuigg with Paging g_g Overhead

 

Lewis and Schedler's Cylic Queuing model [34] returns to
some of the central server assumptions and accounts for I/O
and idealized paging behavior using exponential distribu-
tions. This model assumes that program execution intervals
between page exceptions (requests for I/O) are identically
and independently distributed exponential variables. In
this way the dependence of paging rates on memory size is
avoided. Like other central server models, this model
considers the behavior of all programs in the system to be
statistically identical. This is equivalent to partitioning
the computer's main memory into equal sized segments and
having the page replacement algorithm operate lggglly (each

program would only steal pages from itself). If this were

15

not the case, independent execution intervals would not
hold. This model differs from most of the central server
models in that it explicitely includes the CPU overhead for

task dispatching and paging I/O.

ggod ct 9; Stage Model

 

Using some results by Cox on probability distributions
with rational Laplace transforms [15], Basket and Gomez [4],
and Muntz [15] extended the class of known closed queuing
networks with product form solutions to include certain
servers with general service time distributions by approxi-
mating the distribution with a combination of exponential
stages (see Figure 2). Using the method of stages, closed
form solutions are known to exist for: (1) exponential
servers with firstvcome-first-served (PIES) service discip-
line, (2) general servers with processor-shared (PS) discip-
line, (3) general servers with
last-comevfirst-served-preemptive (LCFS) service discipline,
and (4) infinite servers (IS) with general service distribu-

tions.

16

 

 

 

 

 

 

 

Figure 2. Rational Laplace Transforms

17

13
1::
1H
In
1H
Io
Io
1w
m
o
a
n
m
33'
L:

gearish node;

A Multiple Resource Allocation computer model has been
developed by Boyd [7]. This model handles the resource
requirements of the jobs in the multiprogramming mix in a
similar fashion to most central server models. However, it
goes much further in the sense that, given the level of
multiprogramming, the probabalistic aspects of job selection
for execution from the job queue are developed in great
detail. The selection criteria is based on the permanent
(execution) resource requirements of the jobs that are
waiting to be added to the multiprogramming mix. Although
this model is referred to as a batch multiprogramming model
by the author, in reality it behaves very much like a
time-sharing model. In fact, the dispatching strategy for
this model is a time-sharing strategy. As a batch multipro-
gramming model, this model is representative of an installa-
tion where there are a very large number of small jobs, with
few if any data preparation constraints, and no precedence
constraints (by the assumption of independence). Futhermore
there can be little if any external control of the job
schedule since job selection and execution are entirely

determined by statistical distributions.

18

Another recent computer performance model has been
developed by Boyse and Warn [8], and is a time-sharing
cyclic queuing model. The CPU intervals in this model may
be either constant or exponential and the only I/O modeled
is paging. The CPU modeled has parallel paths to the I/O
devices (drums) and, because the number of concurrently
executing programs is small (3), the I/O response time may
be treated as independent of the number of pending I/O
requests. The assumptions in this model agree very closely
with the features of the system for which it was developed,
a dedicated graphics terminal system, and was found to have

a very satisfactory fit.

Eclectic Model

A model by Brandwajn [9] has incorporated several
recent developments into the central server model. Brand-
wajn includes paging in his model and uses Belady's "life-
time function" [6] to determine the effects of memory
allocation on paging rates. Brandwajn also uses the "prin-
ciple of decomposition" [14] to simplify the calculation of
the equilibrium state probabilities of his model. The
principle of decomposition states that if the elements of a
subnetwork of the overall system have rates of state transi-
tions much higher than the remainder of the system, this
portion of the overall network will reach equilibrium

sooner. This means that the subsystem composed of the CPU

19

and paging device may be separately analyzed under the
assumption that the rate of paging is much higher than the
rate of direct I/O. The total system is then modeled as a
two-server system where one server is the disk I/O device

and the other is the composite CPU-paging server.

Sggg_gy f Queuing Mgdels

 

All of the above models are limited or inadequate for
the purposes of this research because they all use global
parameters to determine system behavior but say nothing
about individual programs that may be executing at a given
time. Futhermore they are really processor models that do
not give a very realistic treatment to the input/output
effects on the system. The assumption of homogeneity among
I/O devices and channels is a serious weakness of these
models for configuration and scheduling studies. The
homogeneity assumption is tantamount to saying that differ-
ent types of I/O devices aren't really very different
therefore they can be treated identically. It is known that
disk drives don't behave like tape drives, printers, or card
I/O equipment. Since the latter devices are dedicated to a
particular program at any point in time, the variation in
their response times is usually due to the interference
caused by devices used by other programs on the same chan-
nel. On the other hand, disks are normally shared among
programs, thereby causing variation in response due to the

potentially random positioning of the disk access arm before

20

I/O can take place as well as queuing time for both the
device and channel. The variation in response introduced by
shared disks is on the order of several times larger than
the time required for data transfer. This research consid-
ers another approach which overcomes many of the objections

to the central server models.

Syn het'g Work oa Benchmagk

 

The Synthetic Workload method [42] of Sreenivasan and
Kleinman gives good results but requires the solution of two
or more simultaneous, non-linear equations in six unknowns
for each job (program) being modeled. Implementation of the
Synthetic Workload proceeds as follows. Let x1 represent
CPU seconds and x2 the number of EXCP's (approximately equal
to the number of I/O operations). Dividing each of these
dimensions into L parts over the range of the x1 and 12
values for the actual workload, the percentage of observa-
tions in each cell of the total number of jobs will be the

joint probability density of the real workload.

The synthetic workload may consist of a smaller

collection of programs with ghg §_gg jgigp probability

 

funcgigg. If P(i,j) is the probability of the (i,j)-th cell
and MTOT' the total number of programs in the synthetic
workload, then the number of programs in the (i,j)-th cell

of the synthetic workload is given by

21

N'(i,j) = P(i,j) * NTOT' i,j = 1,2,...,L. (1.1.1)

Then, if X1(i) corresponds to the mid-point of-the ivth
partition of the X1 dimension, the constraint on total CPU

time for the synthetic workload is expressed as

sum[ X1(i)*N'(i,j) : i,j = 1,2,..,L ] = T. (1.1.2)

The joint probability distribution of the real workload is
duplicated by NTOT' executions of the same program, a
synthetic program [10]. The synthetic program used by
Sreenivasan and Kleinman simulates a file update process.
Its execution characteristics may be manipulated by varying
a set of six parameters supplied to the program by Job

Control Language (JCL).

P1, P2, P3, P4, P5 and P6, the parameters of the
synthetic program, correspond to the number of master
records created, the number of detail records created, the
number of executions of a "compute kernal" [33] per match of
the master and detail files, the number of times the file
update is repeated, I/O buffer blocksize, and record size
respectively. The functional dependence of x1 and X2 on the

six parameters can be expressed as

X1=K1+K2*P4+K3*(P1+P2)+K4*P4*(P1+P2)+K5*P2*P3*P4 (1.1.3)

22

X2=2*P4+(2*P4+1)*([P1*P6/P5]+[P2*P6/P5]). (1.1.4)

The constants K1, K2, 33, K4 and K5 may be estimated
by regression experiments, but since there are more indepenv
dent variables than equations, there is no unique solution
to 1.1.2, 1.1.3 and 1.1.4. A solution is therefore achieved
by choosing integral values for P1, P2, P5 and P6 and
iterating on the values of P4 and P3 until the calculated
and the "given" values for x1 and x2 agree. This must be
done for each of the UTOT' programs in the synthetic work-

load.

A shortcoming of the synthetic workload is that it
requires the actual setup and execution of the synthetic
programs under the operating system being modeled. A
further complication is the requirement that the target
system be available for execution of the synthetic workload.
Although the version of the synthetic workload model
discussed here can be extended to account for paging and I/O
response time, such an extension will increase the computa-
tional complexity many-fold. Compared to the representative
sampling approach, the synthetic workload does not require
as much planning and preparation since only one program is
involved and it generates its own data. For benchmarks in
which the primary objective is comparison among alternative

CPU's, the synthetic benchmark is superior since pglgtive

 

performance is the basis for decision. For benchmarks in

23

which gyghgh alternatives are being considered, the specifi-
cations, allocation and distribution cf data for the I/O
subsystem becomes much more critical. In this case, exten-
sion of the synthetic model and more planning becomes
necessary, thus negating some of the advantages of the

synthetic workload model.

Both the representative sample benchmark and the
synthetic benchmark share the disadvantage of being unsuita-
ble for scheduling applications; the former because predic-
tion involves hghhhhhy hhhhihg the job streams, and the
latter because the synthetic jobs run hpproximatghy hhg ghm

length 2: time a

211.2 £...eal ﬁshe-

f Easing ﬁsheries

Most models of paging behavior were developed as an
aid in evaluating paging algorithms for operating systems.
Denning's "working set" model [18,19] is defined in terms of
the collection of pages of a program which have been refer-
enced during the process interval [t-TAU,t], where t is an
instant in time and TAU is an interval of time. Denning
defines the working set size to be the number of distinct
pages in the working set. He proposed the use of TAU by the
operating system software (and hardware) as a parameter to

control page residency.

24

Denning showed the working set size to be an increas-
ing function of the parameter TAU and the page fault rate
(the rate at which a program tries to access pages which are
not present in main memory) to be equal to the negative
slope of the working set size. The working set size func-
tion depends on some knowledge of the underlying probability

distributions of the memory reference patterns.

The working set concept (as defined by Denning) has
proven useful in the analysis of page management algorithms
but does not serve well as a predictor of paging behavior
for systems which do not use the working set parameter TAU

to control page residency.
Thg ifetihg Punct oh

A function proposed by Belady and Kuehner, the "life-
time function" [6], is based on a model of independent
references to the pages in a program. The independent
reference model assumes that each memory reference is an
independent Bernoulli trial relative to each page of the
program, where the probability of a reference to page i is
given by q(i). Belady and Kuehner's independent reference
model is a special case where q(i) = 1/S for a program with
5 pages. For this program, w pages in main memory results
in a probability of (S-w)/S that a page reference will

result in a page fault.

The lifetime function is defined to be the expected

25

number of consecutive references before a page will be
referenced which is not in main memory (thereby causing a
page fault). Using the geometric distribution, the lifetime

function is the found to have the following form:
e(w) = w/(S-w). (1.2.1)

Belady and Kuehner then proceed to approximate the indepen-

dent reference model by
e(w) = A*w**k, 1.5<k<2.5 (1.2.2)

on the basis that real programs do not obey the independent

reference model.

In this extension to the model, k is a function of the
program's page reference behavior and A is a function of
both the particular processor in which the program is
executing and the program reference patterns. The factor A
may be further decomposed into the product of K, the average
instruction execution rate of the processor, and A', the

average page reference per instruction.

26

Oden and Shedler developed a model of paging behavior
which is based on a Semi-Markov process [38]. They define
equivalence classes for the states of their model to reduce
the state space and present a solution for the steady state
probability distribution of the reduced state space. They
assume that the transition probabilities for each of the
reduced states is known and that the page frame (page of
real memory) to be "stolen" for a paging operation is
selected randomly. They also assume that the N programs in
the multiprogramming set are dispatched First-In-Pirst-Out

(FIFO) and that they are statistically identical.

These last few assumptions make this model unsuitable
for use in modeling a priority dispatched operating system
with a specific workload. The assumptions make this parti-
cular model more useful for a time'sharing system, and for
investigating paging behavior with regard to the determina-

tion of the optimal value for N, the multiprogramming level.

The page fault rate, steady state page residency and
page fault probability have been modeled as a Markov process
by Franklin and Gupta [24]. Using page transition diagrams,
FIFO and Least-Recently—Used (LRU) paging algorithms, they
developed a memory state transition matrix which could be

used to determine paging statistics.

While the foregoing approach is an interesting tool in

27

the examination of paging behavior and page replacement and
selection algorithms, it is defined in terms of a single
program's behavior and requires exact knowledge of the
transition probabilities for each page of a program. Direct
measurement of the variables necessary to compute the
transition probabilities of each program in an operating
system would impose too high a cost in system overhead. Of
course, it is possible for hardware to be designed to
achieve this function, but the economics of such hardware is
extremely doubtful. What can be easily measured by hardware
or software is occupancy, the amount of time spent executing
the instructions of a given page. Such an approach might be
used if one is willing to assume that the page occupancy
probability is a sufficient proxy for the actual transition
probability (this is equivalent to making the page transi-
tion probabilities for each page identical). Knowledge of
the occupancy probabilities does not yield enough equations
to estimate transition probabilities for each page of a
program. A final objection to this model is that the
computational complexity and system overhead associated with
it makes it impractical for all but very small, i.e. peda-

gogic examples.

28

A paging model that is based on fitting a lifetime
function to the shape hypothesized by Belady and Kuehner was
proposed by Chamberlain, Fuller and Liu [12]. This function

has the following form:

e(w) = 2*5/(1+(C/V)**2)o (1-2-3)

where w is the number of pages a program has in real memory
and B and C are its paging parameters. 8 is defined as
one-half the largest possible lifetime and C is the number
of resident pages which provides the process with a lifetime

of B.

The Simmls Lines; aodel

 

A simple linear model of demand paging performance
proposed by Salter [40] is based on the assumption that the
mean number of consecutive page references before a page
fault (exception) occurs is linearly proportional to the
size of main memory (Saltzer refers to this as "headway"
rather than lifetime). Saltzer's graphs for associative
memory headway show a significant deviation from linear at
small memory sizes and his graph of paging in the MUITICS
system has measurements at only two memory sizes, hardly
enough to determine curvature. He quotes three sources of
published measurements which report the lifetime (or

headway) increasing faster than linear.

29

Saltzer's model is stated in terms of the SEEAES h_;h
hghghy of the computing system and he asserts that the model
does not represent the behavior of a single program and does
not even predict headway at extreme values where paging
approaches zero. He then goes on to present examples of
individual program paging prediction by assuming an identi-
cal distribution of main memory pages to all processes in

the system.

The Resins Ines;

Another paging model which predicts system-wide paging
statictics is Bard's Paging Index model [3]. The Paging
Index (PI) is an emperically derived model that uses esti-
mates of working set sizes to predict system paging rates.
The working set size w is estimated externally to the model.
An estimate of working set size w, is made by software
monitors for each logged-on user of IBM's VM/370 time-shar-
ing system. Given a pageable main memory size of M, and the
average number of logged—on users N, the storage saturation

factor S is given by

s = N*U/M. (1.2.4)

The paging rate is then estimated by

30

P = 1*(s**2)/4 05 s 52

= I*(S-1), 2

IA
U)

(1.2.5)

where the single parameter I characterizes the entire P

versus 5 curve.

Bard's measurements of several systems supports the
apparent deviation from the linear model in some of Salter's
diagrams of paging measurements. Like Salter's model, this
model is not very useful for predicting the behavior of a
specific program although it has proven useful in predicting
average behavior for a composite or "typical" program in a
time-sharing environment under the assumption that the
probability distributions of the transactions (programs)
arriving from the terminals in the system are statistically

identical.

Thg P gg §2£ViVél Ind x

.- m..."

Bard's Page Survival Index [1,2] is typical of a class
of paging models that is based on dynamically computed
statistics. The Page Survival Index (PSI) is a measure of
the operating system without being selected for replacement.
Since a program is most likely to lose pages while it is in
an interrupted state, Bard considers the PSI to be a very
good representation of an individual program's paging

behavior.

31

The definition of PSI requires the measurement of
several dynamic paging statistics for its estimation. The
PSI is used by Bard as a response variable in the control of
the page management system and scheduler by feedback. This
model does not lend itself to prediction since it requires

the continuous calculation of dynamic variables.

Of all the paging models presented, most do not serve
well as models of paging behavior for performance prediction
since they were developed for use in gghhrolling the paging

process with feedback [19,1,2].

£49 models

 

As most models of I/O behavior were developed as a
part of multiprograming models, there are few general models
of I/O behavior. Of the I/O models that do exist, most are
simple exponential models of non-specific I/O devices. The
closest thing to I/O models are models for the investigation
of disk scheduling policies and the effects of various disk

organizations.

Disk Assesses Amie;

Seaman, Lind and Wilson analyzed disk I/O as an
integral part of overall teleprocessing system design [41].
They assumed Poisson arrivals for all disk requests and they
also assumed equal traffic to each disk module in the

configuration. For the disk service times they assume

32

identical but arbitrary probability distributions. They do
make some suggestions as to modifications to their model in

order to account for unequal traffic to the disk modules.

Dist 2.9 Pram Schedu ins £296 s

Denning developed models of both disk and drum file
systems to study the effects of different scheduling poli-
cies on the response times of direct access devices [18].
Among the scheduling policies Denning investigated were: (1)
Shortest- Seek-Time-First (SSTF), (2) Shortest-Access—Time-
First (SATF) for drums, (3) First-Come-First-Served (FIFO),
and (4) SCAN, which involves sweeping the disk access arm
back and forth across the surface of the disk, stopping at
any cylinder for which there are requests. He concluded
that SATF was the optimal policy for drum scheduling, and
that the SCAN policy was superior to FCFS, which is in turn
superior to SSTF. Many of these ideas have been incorporat-

ed in today's operating systems.

Teory studied the same scheduling policies as Denning,
but he added variations to the SCAM policy [43]. The
Circular Scan (C-SCAN) policy involves serving disk requests
only while the acccess mechanism moves in one direction
(usually from the outer cylinders toward the inner cylin-
ders). The N-step Scan (N-SCAN) allows requests to be
serviced while the disk arm moves in both directions but all
requests which arrive while the arm is sweeping in any

direction is batched for service during the return sweep.

33

Assuming uniform I/O request distributions, Teory found the
C-SCAN policy to be superior at I/O rates greater than 40
requests per second and the SCAN policy to be superior at
rates less than 40 requests per second. The N-SCAM was

found to be worse than SCAN or C-SCAM at all rates of I/O.

The Dish eggs AL“;

Waters [44] derived formulas for average seek distance
and average seek time for both sequentially and randomly
accessed disk files (under the assumption of uniform distri-
bution of accesses). He also derived formulas for computing
the average seek distance and time for files that do not
have uniformly distributed random accesses. Waters demons-
trated that the average seek distance between two files is
the difference between their mid-points and that the file
access time is minimized by placing the highest activity
records of a randomly accessed file at the center of the

file.

Wilhelm elaborated the model developed by Seaman, et
al. in a general disk performance model in which he assumes
neither a uniform distribution of workload over all disk
modules (spindles) nor a uniform distribution of accesses
over any disk module. Like most other models of I/O behav-
ior, this one assumes that the requests for disk I/O are

generated by a Poisson process. Wilhelm places no

34

restrictions on the service time distributions other than

the requirement that their Laplace transforms exist.

Of the models of disk I/O mentioned above, the Seaman,
et al. and Wilhelm models are perhaps the most useful for
the purposes of this research. Denning's model and Teorey's
model are of use in understanding disk scheduling but are
less helpful since the operating system used in this
research has not implemented SCAN disk scheduling. Although
Waters' article was written for the file designer and has a
practical orientation, some of his techniques were applied
in the elaboration of the I/O submodel presented in this

thesis.

II. MODEL DEVELOPMENT AND ANALYSIS

‘3’

£35.. .stI Reggirsmsmts

For the purposes of this research, an analytical model
is required which, in some way, accounts for priorities,
system overhead and paging as well as a normal configuration
of I/O devices. The analytical model is required because of
the necessity of having a performance predictor which can be
used for external scheduling. Simulation or synthetic
benchmark techniques are too time consuming for this purpose
although they are very well suited to performance analysis
in connection with a major equipment acquisition. Another
reason why there is a real need for the type of model
described in the remainder of this section, is that the
flexibility of data processing management should not be
limited by the necessity of authorizing a major study in
order to be able to answer relatively simple "what if"

questions from tap management.

The model should be usable by typical data processing
systems personnel without special training and should use
§1§£.£ gaptured 9323 to generate the program related charac-
teristics used in prediction. In addition, computer Opera-

tions management and systems programmers can benefit from a

35

36

straightforward tool for studying the effects of scheduling
changes and operating system parameter changes in their
efforts to run a "near optimum" operation. A necessary
requirement for scheduling is that the model predict the
performance of individual programs operating in a multipro-
gramming mix. Furthermore, the predictor should operate
with a set of program characteristics which are very nearly
iAZQEAAAE under different operating systems or hardware

configurations.

Such characteristics are known to exist within a
family of computers such as the IBM System/370 line using
either the Disk Operating System for Virtual Storage
(DOS/VS), or the Operating System for Virtual Storage I
(OS/V51). This is true because these operating systems are
enough alike that differences in execution characteristics
of a program compiled under these two systems would be
mainly reflected in CPU overhead, since the structure and
dispatching scheme of these two systems are very nearly the
same. In other words, the differences between these two_
systems can be reflected by differences in the parameter

values of the basic model.

Since the purpose of this model is the prediction of
the performance of an existing set of programs on a computer
system which may be different from the system on which the
programs are currently executing, or the prediction of the

performance of each program in a collection of programs

37

under a specific set of conditions (scheduling), the
required characteristics can be estimated from the data that
job accounting routines normally collect in most operating
systems. The data collected for each execution of a program
includes: (1) the number of I/O operations per execution,

(2) the number of I/O operations by device, (3) average
block sizes of data transferred by device or file identifi-
cation, (4) path length or the approximate number of
instructions processed per program execution (see M. Reis-
er:39), (5) job elapsed time, (6) system wait time, (7)
number of page-in and page-out operations, and average
working set size. The data captured by the job accounting
routines may then be used to estimate the mean CPU service
time, the mean I/O service time by device, and the probabil-
ity of I/O to each file or device following a CPU service.
These statistics may be estimated for each program which is

executed on the base system.

Program statistics as defined above can be used with
system parameters such as configuration, hardware Speeds,
memory size and instruction execution rates to determine the
performance and duration of each job in a given mix.

Program parameters which are hgh invariant with respect to
the configuration, hardware speed or multiprogramming mix
may be expressed as functions of these factors. For exam-
ple, the paging rates for programs operating in virtual
systems depends upon the speed of the CPU, speed of the

paging device(s), and the memory reference patterns of all

38
the programs in the system [18,27].

The model consists of three submodels: (1) a CPU
submodel, (2) a Paging submodel, and (3) an I/O submodel.
The CPU and I/O submodels are cylic queuing models
[25,35,46] and the paging submodel is a theoretical model
based on independent references and observed program paging

behavior.

can Sshmsésl

The CPU submodel used in this research is based on a priori-
ty interrupt driven dispatching scheme. One problem with
modeling this type of system is caused by the fact that
priorities are accounted for. Another difficulty is that a
finite number of §ources in a queuing system is more diffi-
cult to model than an infinite source system. The classical
"machine repairman" model fails because the behavior of each
program in the system cannot be considered to be statisti-
ically identical. This is true because of the requirement of
predicting individual program behavior for a priority

dispatching operating system.

Because of the intractibility of finite source models,
infinite source models were used to approximate the finite
source models. A program's execution interval or elapsed
time is divided into alternating periods of CPU and I/O

activity. Situations which violate this condition (such as

39

double buffering or overlapping CPU and I/O activity for the
same program) were excluded from the analysis. The CPU
phase may be thought of as being broken up into two distinct
subintervals; the mean "initial wait" interval during which
the program is waiting for the CPU following I/O, and the
mean "completion" interval which is total elapsed time from
the instant at which the program gains access to the CPU
until it relinquishes the CPU for an I49 gpghhhigh g;
ppgghhh ESEAABAEAQB [25]. These two intervals are designat-
ed W and C respectively. The mean interval during which the
program is waiting for I/O completion is designated as IO,
the inverse of the instantaneous access rate LAM, which is

assumed to be fixed in the CPU submodel. The mean CPU

"cycle", ET is the sum of W, C and I0 (see Figure 3).

SEE OEPAQIIQP TIPS

The completion time C, consists of the CPU quantum
attributable to the program being considered B, plus the
system overhead involved in task switching, initiating and
terminating I/O for this program. The completion time also
includes all the time which the program in question spends
waiting for the CPU after it has been phg;ghphgg hy h AASAEE

Brissifx 2229232-

We designate the priority level as well as the identi-
ty of each program by the index j, where lower values of j
represent higher priorities. The values for j are, of

course, limited to positive integers.

40

 

 

 

 

 

Initial Completion I/O Response Time
Wait Time
W C IO
Cycle Time
ET

Figure 3. Elapsed Time "Cycle" (ET)

 

41

The average CPU time quantum may be estimated from the
quotient of program state time divided by 1 plus the number
of I/O operations issued by a given program. This quantity
is represented by 1/HU(j) for program j. MU is the instan-
taneous CPU access rate in ghg absen e o paging, and B(j)
is defined as the mean execution interval before any I/O
(including paging) therefore B(j) is less than or equal to

1/HU (j) .

The total CPU quantum attributable to task j for each
cycle is represented by TAU(j), where TAU is composed of
B(j) plus an expression for the CPU overhead due to task
switching, initiating and terminating I/O operations. It is

assumed that TAU(j) is exponentially distributed.

In deriving the CPU submodel program 1 will be
considered first. The completion time for program 1 is
unaffected by any other task since this task has the highest
priority. The exception where processing by a higher
priority task is interrupted to handle the completion of 1/0
by a lower priority task is ignored. The completion time

for task 1 is given by

C(1)=TAU(1). (2.2.1)

Next, program 2 is considered. Its service time will
be TAU(2), but this may be spread over a longer interval if

program 2 is precempted by program 1. While program 2 is

42

using the CPU, program 1 must be waiting for I/O completion.
From the definition of the I/O completion time, I/O comple-

tions occur at the mean rate LAH(1).

0n the average LAH(1)*TAU(2) I/o operations for
program 1 occur during one "completion" time for program 2,
assuming that the I/O completion time is exponential.
Service time for program 1 will then cause a delay of
LAH(1)*TAU(2)*C(1) in the service of program 2. The comple-

tion time for program 2 will be given by

C(2)=TAU(2)+LAU(1)*TAU(2)*C(1)
=TAU(2)*(1+LAH(1)*C(1)) (2.2.2)

=TAU(2)*(LAH(1)+1/TAU(1))*C(1).

At this point a new element is introduced. It is now
possible for program 2's I/O to complete while program 3 is
interrupted by program 1. The maximum number of I/O comple-
tions by program 2 during this interval is the minimum of
LAH(2)*LAH(1)*TAU(3)*C(1) and LAH(1)*TAU(3) since there can
only be one I/o completion by program 2 during a completion
time for program 1. This quantity is represented by

H(2)*TAU(3). We then have for program 3

#3

C(3)=TAU(3)+LAH(2)*TAU(3)*C(2)
+LAH(1)*TAU(3)*C(1)+TAU(3)*H(3)*C(2) (2.2.3)

=TAU(3)*(LAH(2)+1/TAU(2)+H(2))*C(2).

Proceeding by induction and using similar logic, it can be

shown that

C(j)=TAU(j)*(LAU(j'1)+1/TAU(j'1)+H(j‘1))*C(j‘1)
H(j)=ﬂin[LAU(j)*(C(j)/T30(j)'1)thj)1
D(j)=D(j’1)*LAH(j'1)+H(3’1) (2-2-“)

for H(1)=D(1)=0 and j=2,3,...

Using the definition of completion time C, and
assuming exponential distributions for CPU service time TAU
and I/O response time IO, we approximate the expected value
for the completion time squared to compute the coefficient
of variation of completion time and thereby the following
approximation for PAC which will be needed to compute the

waiting time W.

FAC(j)=FAC(j-1)*[1+(2*(LAH(j-1)/TAU(j-1)
+H(j'1)*(LAH(j-1)-1/TAU(3'1))) (2-2-5)
+H(j-1)**2)*(C(j-1)*TAU(j)/C(j))**2]

with PAC(1)=1.

44

ggg Waiti 3 Time

Derivation of the initial wait, W is not so easy since
we do not have a concept similar to completion time to work
with in this finite source queuing process. We begin by
considering the two states, CPU busy and CPU idle with
respect to program j. Again we need not be concerned about
programs with index greater than j since they can be
pre-empted by program j. With j=1 we immediately have

W(1)=0 since there are no higher priority programs.

Next, we consider W(2). Here we assume that CPP(1),
the CPU utilization due to program 1, has been previously
computed. Since the only busy time that can affect program

2 is caused by program 1, we have

3(2) =CPP(1)*FAC(1)*C(1). (2.2.6)

FAC(1) is a scale factor which represents the portion of
C(1) which is the expected wait time for program 2 when it
finds the CPU being used by program 1. PAC is related to
the coefficient of variation (Cs**2) for completion time by

the expression PAC(j)=(1+Cs(j)**2)/2.

Deriving the expression for the wait time for program
3, we must account for the expected wait time due to program
1, program 2, ggg the wait time incurred due to the comple-
tion of program 2's I/O while waiting for program 1 to

relinquish the CPU. We designate the latter quantity by

Q5

T(2)*C(2)/LAH(2)*ET(2), where T(2) = CPP(1) *
min[LAH(2)*C(1),1]. The rationale for this expression is
that if program 3 completes during an execution interval for
program 1 and program 2 is in the I/O stage, then a maximum
of one I/O completion by program 2 can take place and
LAH(2)*C(1) will take place if this number is less than one.
Assuming that the cycle time for program 2 (ET(2)) has been
computed, the probability that program 2 is in the I/O stage

is approximately 1/LAH(2)*ET(2). Thus we have

W(3)=CPP(2)*FAC(2)*C(2)+CPP(1)*FAC(1)*C(1)
+T(2)*C(2)/LAH(2)*ET(2) (2.2.7)

=H (2) 1’ (CPP (2) *FAC (2) +T (2) /I.AH (2) *ET (2) ) *C (2)

For w(u) we have

W(4)=CPP(3)*FAC(3)*C(3)+CPP(2)*FAC(2)*C(2)
+CPP(1)*PAC(1)*C(1)
+T(2)*C(2)/LAH(2)*ET(2) (2.2.8)
+T(3)*C(3)/LAH(3)*ET(3)

where T(3)=CPP(1) * min[LAn(3)*C(1),1] + CPP(2) *
min[LAu(3)*C(2),1]. By applying 2.2.6 and 2.2.7 and gather-

ing terms we have

146

W(Q)=CPP(3)*FAC(3)*C(3)+W(3)
+T(3)*C(3)/LAH(3)*ET(3) (2.2.9)

=H (3) + (CPP (3) *FAC (3) +T (3) /LAM (3) *ET (3) ) *C (3) ) .

Applying an inductive argument, it can be shown that the
following expressions may be used to compute the mean

initial wait interval:

V (j) =9 0") 1' (CF? (1") *FACU‘U
+T(j"1)/LBH(J"1)*ET (3‘1))*C (3'1) (2-2-10)
T(j)=sum[CPP(l)*min[LAH(j)*C(l),1]:l=1,..j-1]

W(1)=T(1)=0 for j=2,3,...

922 2:iliz.si22 29g Eleeseé 2;.2

With W(j) and C(j) determined and IC(j) considered
fixed in the CPU submodel we have but to determine the mean
elapsed time per cycle ET(j), and the CPU utilization

CPP(j). We have the following definitions:

ET(3)=CU)*W(3)*IOUM (2-2-11)
CPP(3)=TAU(j)/ETU) : (2-2-12)
CP(3)=B (3)/ET (1i): (2-2-13)

and

coa(j)=CPP(j)-cp(j). (2.2.14)

Q7

CP(j) and COH(j) represent program j's mean "problem state"
utilization and "system state" (overhead) utilization

respectively.

2.29,.1119 Submeésl

The paging model was developed by fitting a curve to
observed paging behavior of programs in virtual systems, but
turns out to be identical in structure to Belady and
Kuehner's independent reference model. It may be observed
that the paging rate of a program in a virtual system is
inversely proportional to the amount of memory allocated and
directly proportional to the difference between the size of
the program and the amount of memory allocated (provided the
difference is not negative). Furthermore, the instantaneous
paging rate increases unboundedly as the amount of real
memory allocated to the program approaches zero, and
decreases to zero as the amount of memory allocated
approaches the actual size of the program. Representating
the amount of memory required by the program to execute
without paging by S, and the mean amount of memory allocated
by w, the paging rate is then proportional to the maximum of

zero and (S-w)/w.

“8

Aside from the effect of a program's own working set
size on its paging rate, the working sets of all programs in
the system taken together have an effect on each program.
This effect is similar to Bard's Paging Index [3]. The
"critical memory" H(j) is defined as the size of program j's
working set below which the program will begin to do
non-trivial paging (above .5 pages per second) provided CPU
cycles are available. Defining constants of proportionality
K as a function of CPU instruction execution speed, A as a
function of the number of memory references per instruction,
and R as the total pageable memory, the instantaneous page

exception rate is estimated by

CER(j)=max[O,K*A(j)*(S(j)-w(j))*EXP(N)/w(j)], (2-3-1)

EXP(N)=[sum[H(l):1=1,,N]/R]**q.
The real-time page exception rate PI is estimated as
PI(j)=CER(j)*CP(j)o (2-3-2)
where CP(j) is the portion of CPU utilization attributable
to program j not including system overhead. The assumption

is that system overhead processing is performed using fixed,

non-pageable memory.

49

mo Allssst.

The problem now is to allocate the pageable memory R,
to the N programs considered to be in the multiprogramming
mix. 0f the many possible ways to achieve this partition-
ing, the most reasonable appears to be to partition the
memory in a way that minimizes the total system paging rate.

The problem may be stated as follows:

minimize PN=sum[PI(j) : j=1,2,..N] (2.3.3)
subject to sum[w(j) : j=1,2,..N] < R (2.3.ﬂ)
and w(j) S S(j) for j=1,2,..N. (2.3.5)

Except for not being differentiable at the constraints
w(j)=S(j), this problem satisfies the Kuhn-Tucker condi-
tions. Applying the knowledge which was developed about the

form of the expression for PI(j), a solution may be derived.

Ignoring the individual program size constraints

(2.3.5) for the moment, and applying the KuhneTucker theo-

rem, we have the following minimization conditions:

DPN/GVU) =‘K*A (3) *CP (3) *5 (3]) *EXP (N) l" (3) “2o (2. 3-6)

DPN/dw(j)=DPN/dw(i) for i,j=1,2,..N, (2.3.7)

where DFN/dw is understood to represent partial

SO

differentiation. These conditions reduce to the following

solutions
SUu=sum[(A(i)*CP(i)*S(i))**.5: i=1,2,...N], (2.3.8)

w(j)=R*(A(j)*CP(j)*S(j))**.5/SUH
for j=1,2,..N, (2.3.9)

"(i)=R*V(j)*(A(i)*CP(i)*5(i)/A(3)*CP(3)*5(3))**-5

for i'j=1'2'..No (203010)

If all the constraints w(j)<S(j) are satisfied, we
have the optimal solution since any deviation from this
solution will cause an increase in PR. If we suppose
otherwise, a rearrangement of the memory balance between any
two programs would result in DPN/dw(j)<DPN/dw(i) and the
only change that would produce a decrease in PR would be to
subtract memory from program i and add it to program j.

However, this would only yield the initial solution.

If the memory constraints are active for some of the
programs, i.e. w(i)=S(i), no reallocation that adds memory
to such a program could cause a decrease in PR since they
are already at the minimum page rate (zero). However
removing memory from a program that does not have the memory
constraint active will result in an increase in the paging

rate.

On the other hand, if the initial solution to the

51

system results in the situation w(j)>S(j), for all programs
j, we are finished and we simply set w(j)=S(j) since the
composite constraint is not binding, i.e. sum(w(j) :

j=1,2,..Ni is less than available pageable memory R.

The fourth case is the most interesting (and probably
the most common in practice). In this case at least one of
the programs' memory constraint is violated and at least one

is not binding. Let Q=[j : 1SjSN,w(j)ZS(j)] and let H:[j

1SjSN,w(j)<S(j)]. For every j in Q we set w'(j)=S(j),
compute R1=sum[w(j)-S(j) : j in Q] and reallocate
R1+sum[w(j) : j in H] to all the programs whose indexes

belong to B.

Let us re-examine the set [w(j) : j in H]. We see

that minimization of the paging rates over the set [w'(j)
j in 8] subject to sum[w'(j) : j in H]=sum[w(j) : j in H]
yields the same set of solutions as minimization over all g

of the program working sets.

We have the following solution

w'(1)=sun(w(i):i in H]*(A(j)*cp(j)*S(j))**.5/sua

for j in 8, (2.3.11)

where SHH=sum[(A(i)*CP(i)*C(i))**.5 : i in H]. Applying the

equilibrium condition

52

W'(j)*(A(i)*CP(i)*5(i))**-5='(i)*(A(j)*CP(j)*S(3))**-5

we have

w'(j)=sum[w(j)*(A(i)*CP(i)*S(i))**.5: i in Hj/sna

=w(j) j in H. (2.3.12)

It is clear that, if we add the amount R1 to sum[w(j)
: j in H], all of the w'(j) for j in B will increase and
thereby decrease the value of the partial derivative DFN/dw.

Thus we have w'(j)>w(j) for all i in H, implying that

DFN("(j))/d'(j)(DFN("(3))/d'(j)=DFN(V(i))/dﬂ(i) (2-3-13)

DFN (11' (j) ) /dW (j) (DPN (W (i) )/dW (i) (DPN (S (i) )/dW (1) (2.3.14)

for j in H and i in Q.

This reassignment cannot produce a situation where any
reallocation from the set [w'(i) : i in Q] to the set
[w'(j): j in H] can result in a reduction in PR. We can
drop the former set from further consideration and proceed
as we did initially. Applying the algorithm to the set
[w'(j): j in B], we must have either all constraints bind-
ing, no constraints binding, or a combination in which case
the foregoing logic is again applied. We eventually arrive
at a stage where either Q or H is empty and the algorithm is

terminated.

53

Because the paging mechanisms used by most existing
operating systems are not optimal we may modify this proce-
dure to reflect more realistic operating system behavior.
For example, the paging algorithm may be biased toward the
higher priority programs in the type of operating system for
which this analysis is intended. We may wish to define a
scale factor, L(j)=(1/m)**j*a, where m is greater than or
equal to 1 and a is greater than or equal to zero. The

Kuhn-Tucker conditions then lead to the following:

A(j)"'CP(Ii)"‘5U)"‘1-(Zi)"W(i)"""2'-'A(i)"‘CP(i)'"S(i)”"1-(1)"“'(j)"""2

for i,j=1,2,..N. (2.3.15)

An example of a biased paging model is one that yields
the constant ratio w(j+1)=p*w(j) provided that p is greater
than zero and less than or equal to 1, and that
A(j+1)*CP(j+1)*S(j+1)=A(j)*CP(j)*S(j). This leads to the

parameter L(j)=p**(j-1) and the solution is
H1)=(1"P)*R/(1‘P**N)u (2-3-16)

W(j)=V(1)*P**(j’1)- (2-3-17)

In parameterizing the paging submodel L(j) is one of
the values that we could estimate. The actual I/O estimates
for the paging model will be calculated by the I/O submodel

which will now be developed.

I49 JEEViQY.

The I/O model used in this research is the composition
of a number of simpler models of assumed independent compo-
nents. The I/O response time of the composite model will be
computed as a sum of random variables with uniform, exponen-
tial and Erlang distributions. This model will be elaborat-
ed primarily as a disk I/O model since disk activity is the
predominant form of I/O activity in most modern operating
systems and disk I/O is also that aspect of the system which
causes the most difficulty in modeling. Disk I/O is so
prevalent because disks are used for paging devices, data
staging devices (temporary storage), and permanent storage
devices. The Unit Record (card and print) activity is
generally "spooled" on disk. This means that card input and
output as well as print operations are actually disk opera-
tions gggigq use; r ram ggecutiop, with the card input to
disk taking place prior to program initiation and card and
print output taking place after program termination. The
Unit Record (U/R) devices are driven by special system

programs called "spoolers" or "readers" and "writers".

55

The operation of the I/O submodel is related to the
CPU and Paging submodels as follows: The I/O wait time in
the CPU submodel is actually the aggggqg wait time over all
the forms of I/O in which the program engages, igglggigq
pgqigq. The I/o access rates generated by the CPU submodel
are gggpggiggg of the rates for each program's files. The
device access rates are the aggregate of the access rates by
program. A program's access rate to a given device is the
product of the program's total number of accesses to that

device times the program's composite access rate divided by

the program's total number of I/O accesses.

Let the number of accesses to file k on device 1 by
program j be given by N(i,j,k). Then the total number of

I/O operations by program j is

N‘j)=8um "(i,j'k’: i=1'2'ooND;k=1'2'ooNF]

j=102r00N: (2.“.1)

where ND is the total number of devices and RF is the number
of files on device 1. Since the number of accesses by
program j to device 1 is N(i,j), the access rate to device 1

by program j is then

56

ARAT(i,j)=N(i.j)/N(j)*BT(j)

i=1'200ND and j=1'2'ooNo (2.“.2,

The total access rate to device 1 is then

ARAT(i)=sum[ARAT(i,j): j=1,2,..N]. (2.“.3)

To simplify the computations, the arrival rate of I/O
requests at the I/O device queues is assumed to have a
Poisson distribution. Even if it was assumed that both the
CPU and the I/O devices were exponential servers, there
would not be Poisson arrivals in general because the CPU

uses a priority-resume service discipline [32].

For the case of card or print I/O we will assume a
constant service time. The U/R service time will depend on
the number of cards or lines transferred (to disk) per I/O
operation and the speed of the I/o devices. The only
variation in the response time for these devices will be
that due to channel contention. A program which issues an
I/O operation to a U/R device will only have to wait on the
channel. Because these devices are gggiggtgg, a program
accessing them will never find them busy since we have

excluded the possibility of buffered gpggations.

Another type of dedicated device is magnetic tape. In
this case the mean service time depends on the mean block-
size of the data transferred and on the tape transfer speeds

of the magnetic tape devices. In addition to channel delay,

57

we will also experience 993539; 33;; delay. The latter form
of delay is included with channel delay or ignored as
negligible since the configurations to which this research
applies rarely have more than one tape control unit per

channel.

The channels to which U/R devices, tapes and disks are
attached are almost always unique so that these components
of the I/O subsystem may be treated separately in computing

the I/O response.

The most complex portion of the I/O subsystem is the
part which deals with magnetic disks. It is assumed that
the size, location and disk identification for each logical
file accessed by the programs in the system are known

(either from job accounting data or otherwise).
Qisk I49 node;

Three basic patterns of disk access are considered:
(1) uniformly distributed random access over the cylinders
of a file, (2) uniformly distributed ipggggg access over the
cylinders of a file (requiring prior access to an index
before accessing the data record), and (3) sequential
access. A disk access mode diagram appears in Figure a.
This is only a partial diagram of the variations on the
three major modes of disk access. In what follows, fixed
record (block) sizes, "verify" option for all writes, a

uniform distribution for rotational delay, and a uniform

58

distribution of accesses for random access to both random

and indexed files is assumed.

It should be noted that some of the equations devel-

oped in this section are herdware or ipplementation depen-

 

dent. In particular the access methods are IBM implementa-
tions and the disks are IBu 3330 disks. These hardware

dependencies will be pointed out where applicable.

A timing diagram for indexed access appears in Figure
5, and one for random access in Figure 6. The diagram for
sequential access difffers from the latter only by the
absence of a significant seek time component. In what
follows, the occurance of equations with indices i,j and k
can be assumed to imply that the equations will hold for the

entire range of each index unless stated otherwise.

59

 

 

Sequential Indexed Direct

Access Access Access
Write Read Read Write Read Write
No Core. No Core. No

Verify Verify Index Index Rewrite Add Verify Verify

 

 

No No
Verify Verify Verify Verify

Figure 4. Disk Access Mode Diagram

60

Cylinder Index Read

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 5.

Indexed Access Timing Diagram

wait Seek wait Wait Wait Search &
for Cyl. for for for Transfer
) Device Index Channel Record RPS Index
TWD TSl TWC RD/2 TWR TSCl
Track Index and Data Read
Wait Seek Wait Wait Wait Search &
for Track for for for Transfer
Device Index Channel Record RPS Index
TWD T82 TWC RD/2 TWR TSCl
wait Wait Wait SearCh &
for for for Transfer
Channel Record RPS Data
TWC RD/2 TWR TSC2

 

 

61

Direct and Sequential Read

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

wait Seek Wait wait Wait Read
for Cyl. for for for Key 8
Device Channel Record RPS Data
TWD TSK TWC RD/2 TWR TSC
Direct and Sequential write
Wait Seek Wait Wait Wait write
for Cyl. for for for Key &
Device Channel Record RPS Data
TWD TSK TWC RD/2 TWR TSC
wait Wait Read
for for Key &
Record Channel Data
RD/2 TWR TSC
Figure 6. Random Access Timing Diagram

 

 

62

Direct Access Disk I/O

In computing the disk file access time it will be
assumed that the disk hardware uses the £95g§ieeel peeigi.e
eeeeieg (RPS) technology. This means that the channel will
be allowed to disconnect from the disk device during both

seek egg search operations. A geometric distribution is

assumed for the delay due to RPS.

Let BS(i,k) be the average block size for file k on
device i, which is accessed by program j. Let CYL(i,k) be
the size of the file and LC(i,k) its mid-point. We define
the channel service time, TSC, due to access to file k on

device i as

TSC(i,k)=2*RD/128+BS(i,k)/TR, (2.u.a)

where RD is the revolution time, TR is the disk transfer
rate and the term 2*RD/128 is the time necessary to prepare
the Rotational Position Sensing (RPS) device for data
transfer. This term causes any expression in which it
appears to be hardware dependent. To apply these expres-
sions to disk devices other than 185 3330's, this term

should be modified. The I/C access rates are expressed by

ARAT(i,j,k)=N(i,j,k)/N(j)*ET(j). (2.14.5)

Then the channel utilization by device, file and program is

63

RCU(i,j,k)=ARAT(i,j,k)*TSC(i,k) (2.“.6)

and the total channel utilization RC0, is given by

RCU=sum[RCU(i,j,k): i=1,.ND;j=1,..N;k=1,..NF]. (2.0.7)

The channel wait time due to RPS is

TWR=RCU*RD/(1-RCU). (2.u.8)

To estimate the channel utilization as "seen" by program j,

we have

RCU(j)=RCU-sum[RCU(i,j,k): i=1,..ND;k=1,..NF], (2.u.9)

and therefore we have an expected channel wait time for RPS

of

TWR(j)=RCU(j)*RD/(1-RCU(j)). (2.u.10)

This is true because at the time that program j requests

I/O, no other operation by program j can be in progress.

Next, the average seek time by program, file and by
device is computed. First, the probability of access to

each file on each device DI(i,k), is derived as follows:

6Q

ARAT(i,k)=sum[ARAT(i,j,k): j=1...N]. (2.u.11)

ARAT(i)=sum[ARAT(i,j,k): j=1,..N;k=1,..NF], (2.“.12)
and

DI(i,k)=ARLT(i,k)/ARAT(1). (2.4.13)

Disk files are considered to be organized by "cylin-
der", where a cylinder is the collection of all disk records
which can be read or written at one physical positioning of
the access mechanism or "heads". A movement of the heads
from one location to another is called a "seek". Since we
are accessing file k with a uniform distribution, the
expected value (in cylinders) of the seek distance SK(i,k),

is given by

E(SK(i,k))=sum[DI(i,l)*lLC(i,k)-LC(i,1)I: l¢k]

+DI(i,k)*((CYL(i,k)**2-1)/3*cyl(i,k)). (2.a.1u)

This expression is based on the fact that, on the average,
the seek distance between two files on the same disk will be
given by the distance between their mid-points [an]. For a
seek from some point within the §é-§ file the average
distance will be given by the second term in the above

expression. Similiary we may estimate the second moment for

the seek time E(SK2(i,k)) as

65

E(SK2(i,k))=sum DI(i,l)*(LC(i,k)-LC(i,l))*2 : lsk]
+sum[DI(i,l)*((CYL(i,k)**2+CIL(i,l)**2)/12 : l¢k]
+DI(i,k)*(CYL(i,k)**2-1)/6. (2.“.15)

Using this expression and the previously derived mean seek
distance, the variance of seek distance may be calculated
using the well-known formula, V(SK(i,k)) = E(SZ(i,k)) -

E(SK(i,k))**2.

In calculating the seek time for the model, it will be
assumed that the seek time is a linear function of the
distance moved. This assumption is reasonable for most
seeks greater than a few cylinders in distance. We define

the seek time function as

TS=T*U(SK)+KS*SK (2.u.16)

where SK is the number of cylinders seeked, K5 is the slope,
T is a constant, and U(SK) = 1 for SR greater than or equal
to 1 and U(SK) = 0 for SK equal to zero. Then the seek time

for an access to random file k on device 1 is

E(TS(i,k))=(CYL(i,k)-DI(i,k))*T/CYL(i,k)
+KS*(sum[DI(i,l)*lLC(i,l)-IC(i,k)I: 1¢k1 (2.4.17)

+KS*DI(i,k)*((CYL(i,k)**2)-1)/CYL(i,k)).

Using Water's method [an], and the previous assumptions

about the programs in the multiprogramming set and their

66

disk accesses, we calculate the variance of the seek times

to random file k on device 1 as

V(TS(i,k))=(1-DI(i,k)/CYL(i,k))*DI(i,k)*(I**2)/CYL(i,k)
+2*T*KS*DI(i,k)*E(SK(i,k))/CYL(i,k) (2.“.18)

+V(SK(i,k))*KS**2.

The wait time due to channel blocking caused by other

I/O processes is given by Wilhelm's model [HS] as

TWC(i,j,k)=sum[ARAT(m,l,k)*TSC(m,k)**2:l#j;m#i]. (2.4.19)

The device service time for a random read Operation will be

given by

TSD(i,j,k)=TS(i,k)+RD/2+THR(j)+TSC(i,k)+TWC(i,j,k),

(2.u.20)

and service time for a random write operation will be

TSD(i,j,k)=TS(i,k)+RD+2*(TWR(j)+TSC(i,k))+TWC(i,j,k).

(2.“.21)

The utilization of file (i,k) due to program j will be

RDU(i,j,k)=ARAT(i,j,k)*TSD(i,j,k). (2.u.22)

The respective variances for the read and write operations

67

are

V(TSD(i,j,k))=V(TS(i,k)+(Rt**2)/2
+RCU(j)*(RD**2)/(1-RCU(j))**2 (2.u.23)

+sum[ARAT(m,l,k)*TSC(m,k)**3:l#j:m¢i]/3+TWC**2,

and

V (T513011 2).“) )‘V (T5 (1.1!) ) + (R13**2)/3
+u*RCU(j)*(RD**2)/(1-RCU(j))**2 (2.u.2u)

+sum[ARAT(m,l,k)*TSC(m,k)**3:l¢j;m#i]/3+TWC**2.

From the previous computation of the variance for disk
service time, we may now compute the coefficient of varia-

tion for the disk service time as

CS(i,j,k)=V(TSD(i,j,k))/TSD(i,j,k)**2. (2.0.25)

We finally get the disk response time by use of the queuing
formula TRD(i,j,k) = TSD(i,j,k)*(1 + 330*(oa**2 + os**2)/2 *
(1 - RDU(i,j)) where RHO = 1 + 2 * (RDU(i,j) - 1)/(ca**2 +
1), and CA and C5 are the coefficients of variation of the
arrival and service distributions. The term RDU(i,j) is the
sum of the utilization of device i by all programs other
than program j. A particular application of this formula,

for Poisson arrivals (CA=1) is

68

TRD(i,j,k)=TSD(i,j,k)*(1+RDU(ioj)*

(1+CS**2)/2*(1-RDU(i,j)). (2.“.26)

To get the total response time, we must add the wait time

while the disk is available and the channel is busy,

TWD(i,j,k)=RCU(j)*(1-RDU(i,j))*TWC(i,j,k)/2. (2.u.27)

We now look at a slightly more complex model, a model of

access for indexed data.

Indexed Disk I/O

For the indexed access method we will assume that the
index is located on the same device as the data portion of
the file and that a seek to this index is required. There
are several cases of indexed access: (1) initial load or
sequential retrieval, which can be treated using the sequen-
tial method to be given later, (2) indexed read-only, and
(3) indexed read-write. We will limit the analysis of the
indexed method to random reads without the index in memory
(core index). Analysis of the other cases may be achieved

by simple extensions of the methods used here.

We will first calculate the seek time. For a random
read there are actually two seeks to be calculated; (1) the
seek from the starting location of the access mechanism to
the cylinder index, and (2) the seek from the cylinder index

to the indexed file's data area. The expected values of

69

these seeks are equal to the seek times from mid-point of
the file where the access arm is initially located to the
mid-point of the index, and from there to the mid-point of
the data area. Designating these seeks as S1 and S2 respec-

tively, we have

E(S1(koi))=

sum[DI(i,1)*lCI(i,k)-LC(i,1)l:1=1,..NF], (2.4.28)

E(SZ(k,i))=|CI(i,k)-LC(i,k)I, (2.“.29)

where CI(i,k) is the mid-point of the index for file k.
Having computed the expected value of the seek distance, we
may compute the expected seek times for the index and the
data portions of the file as P(TS) = T + KS*E(SK) since the
access mechanism will always have to move at least one
cylinder. This is true because the index will always be
read first and the access mechanism will never be left in

the index area.

Since we actually have two I/O operations to transfer
a single data record, we will compute service times for the
two operations separately. First we have the channel busy

time for searching and transferring the index

TSC1(i,k)=RD/2+2*RD/128+10/2*TR, (2.u.30)

and the channel busy time for searching and transferring the

70

data record is

TSC2(i,k)=RD+2*RD/128+BS(i,k)/2*TR. (2.u.31)

Assuming independence for each segment of the timing

diagram, we calculate the channel utilization for indexed

reads as
RCU(i,j,k)=ARAT(i,j,k)*(TSC1(i,k)+TSC2(i,k))/2, (2.4.32)

We then have the delay due to RPS

TWR(j)=RCU(j)*RD/(1-RCU(j)), (2.4.3u)

and we calculate the channel wait time as before by

Wilhelm's method:

TWC(i,j,k)=sum ARAT(m,l,k)*TSC(m,k):m#i;l¢j]. (2.u.35)

The service time for the indexed access will be

TSD1(i,j,k)=TSC1(i,k)+TWR(j)+TWC(i,j,k)

+RD/2+TS(E(S1(i,k))). (2.0.36)

The device service time for the data access will be

71

TSD2(i,j,k)=TSC2(i,k)+2*(TWR(j)+TWC(i,j,k))

+RD/2+TS(E(52(i,k))), (2.u.37)

and the rate of access to device 1 for all programs other

than program j is

ARAT(i,j)=sum[ARAT(i,l,k):lsj;k=1,..NF]. (2.“.38)

The device utilization which program j finds at device

1 will be

RDU(i,j)=sum[ARAT(i,l)*(TSD1(1,1)

+TSD2(i,l))/2 : 1:3], (2.u.39)

and the wait time for program j then becomes

TWD(j)=RCU(j)*(1-RDU(j))*TWC(j)/2. (2.u.u0)

To complete the elaboration of the model we must find
the variance of the service times. It can be shown that the

variance of the distance for the initial seek, S1, is

V(S1(i,k))=sum[DI(i,1)*(CL(i,k)**2+CYL(i,l)**2)/12:1=1,..NP]
+sum[DI(i,l)*(1-DI(i,l))*(CI(i,k)-LC(i,l))**2:l=1,..NF]
-2*sum[ DI (i, 1)*|CI(i, 1!) -LC (1,1) I*

sum[DI(i,m)*|CI(i,k)-LC(i,m)):m=1,..1-1]:l=1,..NF],

(2.u.u1)

72

where CL is the number of cylinders in the index portion of

file k on device i. The seek time is then

V(T(S1(i,k)))=T**2+V(S1(i,k))*KS**2. (2.u.a2)

The variance of the second seek is then calculated as

V(52(i,k))=(CL(i,k)**2+CYL(i,k)**2)/12, (2.u.u3)

and the seek time for the second seek is

V(T(S1(i,k)))=T**2+V(52(i,k))*KS**2 (2.“.04)

The variance for the channel service time is then (RD**2)/12
and 2*(RD**2)/12 for the index read and the data read

respectively. Since the variance of the rotational delay is
(RD**2)/12 we have the following expression for the variance

of the device service time for the read of the index:

V (T5131 (it jlk) ) =V (T (31 (iljlk’)) +V (THC (j),

+V(TWR(j))+(RD**2)/12+V(TSC1(i,j,k)), (2.u.a5)

where W(TWC) and W(TWR) are calculated exactly as in the
case of the random access method. Similarly, we have the
variance of the disk service time for the data read opera-

tion as

73

v(TSD2(iojvk))=v(T(S1(iljvk)))+(RD**2)/12

+2*(V(TWC(3))+V(TWR(3)))*V(TSC2(i'j'k))- (Z-Q-“5)

At this point we form the coefficient of variation exactly
as before and use this in the general queuing formula to get
the expected response times TRD1(i,j,k) and TRD2(i,j,k) for

the indexed access method.

The extension of these methods to other forms of I/O
for indexed files is straightforward. For example, to extend
the previous analysis to the case where the index is in core
we simply calculate the seek time component (52) exactly as
we would for a randomly accessed file. Of course the use of
a core-index implies something about the memory requirements
and also the CPU usage of the program. The use of a
core-index makes the program a gifgegege pgeggeg as far as

the CPU and Paging submodels are concerned.

Sequential Access Disk I/O

For a sequentially accessed disk file, we calculate
the timing diagram in a similar manner to calculation for
the direct access case. The difference is in the calcula-
tion of the seek time. First we determine the expected

value of the seek distance

74

B(SK(i,k))=sum[DI(i,l)*|LC(i,k)-LC(i,l)l:l¢k]

+DI (i'k)/RCI (2.“.“7,

where RC is the number of records to be read from each
cyclinder of file k. The variance of the seek distance we

calculate as

V(SK(i,k))=(1-DI(i,k)/RC)*DI(i,k)/RC
+sum[DI(I,l)*((CYL(i,l**2+CYL(i,l)**2)/12
+(1-DI(i,k))*(CYL(i,k)-CYL(i,l))**2
-2*DI(i,k)*ILC(i,k)-LC(i,l)I : lik] (2.".48)
‘2*(Sum[DI(i,1)*ILC(i,k)-LC(i,1)l*

sum[DI(i,m)*|LC(i,k)-LC(i,m)l:m=1,.l-1,m¢k]:l#k].

We then have.the expressions for the expected seek time,

E(TS(i,k))=(1-DI(i,k)+DI(i,k)/RC)*T (2.u.u9)

+K*(sum[DI(i,l)*lLC(i,l)-LC(i,k)| lik]+DI(i,k)/RC),

and the variance of the seek time,

V(TS(i,k))=2*K*T*DI(i,k)*E(S(i,k))/CYL(i,k)+V(SK(i,k))*K**2

+ (1-DI (i,k)/CYL (i,k) ) *DI (i,k) * (T**2)/CYL (i,k) . (2.4. 50)

With the expected value and variance of the seek time we
calculate the coefficient of variation, the disk response

time, channel waiting time, and the I/O response time as we

75

did in the direct access case.

From the previously derived expressions for access
rate by file, device and program, we can calculate the

average I/O response time for program j by

XI(j)=sum[N(i,j,k)*TRD(i,j,k)/N(j):i=1,..ND;k=1,..NF].

(2.4.51)

I/O for Paging

The I/O response times for paging operations are
computed as if the paging file were comprised of distinct
subfiles for each program in the multiprogramming set. The
response for each program is computed as the I/O time for
random access to its paging subfile with the added condition
that the paging I/O is priority scheduled using
Head-of-the-Line (HOL) policy. It should be noted that the
foregoing comments are specific to the implementation of the

particular operating system used in this research.

Using PHI(j) as the average time to read or write a
page on behalf of program j, TSD(l) as the mean service time
for all I/O to the paging data set, and TSD(l,j,j) as the
mean service time for access by program j to its portion of

the paging file we have the following:

76

PHI (1)) =TSD (1: j,j) +3130 (1:2)) *TSD (1) *CK (j) ,
CK(j)=CK(j-1)+ARAT(l,j-1,j-1)*TSD(1,j-1,j-1), (2.u.52)

CK(1)=(1+CS(1)**2)/2 j=1,..,N,

where ARAT and RDU have the same meanings as defined previ-
ously. All other values used in determining the response
time for paging is computed exactly the same as for a random

access data set.

This concludes the detailed discussion of the I/O
submodel. This discussion was not intended to be exhaustive
but does point out the kind of I/O models that can be used
to give a more realistic treatment to I/O than is usually

found in computer performance models.

§g§model Integ at'eg

 

We complete the model by integrating the Paging, CPU, and
I/O submodels. This is accomplished by calculating the
statistics for the CPU and Paging submodels, and then using
these statistics as input to the I/O submodel. The 1/0
statistics are then used as input to the CPU and Paging
submodels. This process is repeated until the model state

variables converge to an equilibrium solution.

Since the model developed herein is comprised of a

system of non-linear equations, the geggle Falei, or the

77

algorithm are used to accelerate convergence of the system.
The ideas that enable the various parts of the model, the
submodels, to fit together will now be examined in some

detail.

The idea of a CPU cycle is extended to include program
CPU execution intervals which terminate in paging operations
as well as those terminating in normal (non-paging) I/O
operations. Representing the average page read or write
time as PHI(j) for program j, and defining the system ratio
of page reads to page writes as BETA, we have the expression

for the average page wait time during a paging cycle

ETA(j)=(1+BETL)*PHI. (2.5.1)

With N(j) the number of non-paging I/O operations
during the execution of program j, and CTINE(j) the total
problem state time during the execution of program j, we

have

NU(j)=(N(j)+1)/CTINE(j). (2-5-2)

The approximate number of page fault cycles is given by

PG(j)=N(j)*(1/B(j)*UU(j)'1)- (2-5-3)

We then calculate the proportion of all I/O delays due to

paging as

78

PG (3)/(P30) +N(j))=N(j) *(1/HU(3)*B (3)‘1)/
N(j)*(1/’HU(j)*B(3)‘1)*N(j) (2-5-‘0

=1’HUU)*E(3) ,

where B(j) is the average CPU execution time per cycle for
program j and PG(j) is the total number of cycles of program
j's execution which terminate in page exceptions. Similar-
ly, the proportion of cycles due to non-paging I/O opera-

tions is

N(j)/(135(3)+N(j))=HU(j)*B(j)- (2.5-5)

Using the instantaneous page exception rate CER(j)
defined previously, the number of paging cycles PG(j) is
given by the product of the total number of cycles times the

rate of page exception generation per cycle

PG (:1) = (N (j) +PG (j) ) *CER (j) *8 (j) .
or equivalently,

GER (31:9301/30) * (N (3) +PGU) 1:1/301‘5001 . (2.5.6)

This implies that the CPU time per cycle B(j) is

B(j)=1/ (30 (31*CERUH (2-5-7)

and B(j) can be determined. Now that we have an estimate

for the average paging response time ETA(j) and the normal

79

I/O response XI(j), we can calculate the average overall I/O

response as

10(3)=NU(31*B(3)*XI(3)*(1'UU(3)*B(3))*ETl(j)- (2-5-8)

Turning now to a re-examination of the CPU overhead
due to I/O and page management, let the normal I/O process-
ing cycle overhead be given by ALP(j) and the average paging
cycle overhead by DBL(j). The average CPU time consumed per

cycle for program j will be

TAU(3)=B(3)*HU(3)*B(3)*BLP(3)

+(1'HU(3)*B(3))*DEL(31- (2-5-9)

For a given program j (remember this also means priority
level j), there is only one variable on the right hand side

of the above expression, the variable B(j).

In taking the output of the CPU and Paging submodels
as input to the I/O submodel, the device/file access rates
must be disaggregated so the appropriate rates are reflected
for non-paging and paging I/O (the paging I/O response is
calculated by the I/O submodel just like any other I/O).

The normal I/O rates are computed as NU(j)*B(j)/ET(j) and
the paging rates are computed as (1-HU(j)*B(j))*(1 +
BETA)/ET(j). After computing these I/O components separate-

ly, they are recombined as shown previously.

One other aspect of CPU service time to be introduced

80

at this point has to do with the elongation of service time
due to CPU cycles being "stolen" to accomplish I/O. This
effect can become significant at high I/O rates because the
amount of CPU time used by the channel is proportional to
the amount of data transferred and the number of I/O opera-

tions started.

This effect will be quantified by defining an "expan-
sion factor" CYT, as the ratio of available CPU time with
some level of I/O to the maximum available CPU time without
I/O. This factor is expected to be different for different
programs in the multiprogramming mix. Defining the amount
of time stolen by the channel for an I/O operation by

program j as D(j), we have

D(j)=DS+[(1-B*HU)*BS(j)+B*NU*BSP]*DC, (2.5.10)

where DS is the amount of time used by the channel to
initiate and terminate an I/O, BS(j) is the average I/O
block size transferred for program j, BSP is the size of a
page in the system under discussion, and DC is the amount of
CPU time used by the channel per byte transferred. For the
present research DS = 76 microseconds, DC = 0.15 microse-
conds, BS = 568 bytes and BSP = 2048 bytes. These factors

are all hardware dependent.

Then the amount of CPU time stolen each second for I/O

on behalf of program j is given by D(j)/ET(j) so the

81

fraction of CPU cycles available for instruction execution
is 1-D(j)/ET(j). A program of higher priority than program
j must be in the I/O stage if program j's instruction are
executing, so the fraction of CPU time available due to the
higher priority program (k) is 1-LAN(k)*D(k). Combining
this with the results for a lower priority program, it can

be shown that

CYT(j)=prod[1-LAH(k)*D(k):k=1,..j-1] (2.5.11)

*prod[1-D(k)/ET(k):k=j+1,..,N].

CYT may be computed in sequential fashion in the CPU submo-

del by

CYT(3)=CYT(j‘1)*(1‘LA5U‘1)*DU‘1))/(1'D(j)/ET(3))o
CYT(1)=prod[1~D(k)/ET(k):k=2,..,N], (2.5.12)

for j=2,3,..,N.

The expected length of a CPU execution interval, whether it
is B, TAU, ALP or DEL may be elongated simply by dividing by

the corresponding CYT.

This completes the submodel integration. At this
point all of the essential elements of the model have been
put together. The conditions for convergence to an equili-
brium and the extent to which these conditions are met by

the present model will now be examined.

.edel £2mx.rgense

If we represent a cycle of iterations of the CPU,
Paging, and I/O submodels by the real valued functions f(.),

defined over Euclidean N-space, we have

f(j)=LAN(j)-LAH'(j) for j=1,..N, (2.6.1)

where LAN(j)=1/IO(j) and LAU'(j)=1/IO'(j) are the outputs of
the I/O submodel (and consequently input to the CPU and
Paging submodel) on successive iterations. LAN and LAN' are
interpreted as instantaneous I/O response rates. The

equilibrium condition requires that

f(j)=0 for j=1,..N. (2.6.2)

Since the model is a system of non-linear equations,
and the iterative method of alternating between CPU-Paging
submodel and I/O submodel calculations produces an oscillat-
ing series, we have employed a variation of Newton's Nethod,
the Reggie geiei [30], to accelerate convergence. The

algorithm is based on the general form

9(3)=X(j)’f(j)/f'(j) j=1u-N. (2-5-3)

where x(j)=LAu'(j) is a scalar and f'(.) is the derivative
of f(.) with respect to x(j). Representing the vector forms

for f, g, y, and x by the upper cases F, G, Y, and x we

83

have the vector condition for convergence of the algorithm:

IIG(X)‘G(Y)IISH*IIX‘YII. (2-5-“)
for l)X-Y||<RHO
where (|.|| is a norm, M is a scalar less than one, and the

vectors X and Y are in a sphere of diameter RHO about the

equilibrium point, EP, in NP-space [37].

We are assured of the existence of the point EP by the
Mean Value Theorem, since each f(j) is continuous in any
neighborhood of EP(j) and takes on both positive and nega-
tive values in this neighborhood. Because of the complexity
of the full expression for f(.), it is very difficult to
manipulate the derivative symbolically --although it is
possible to use the same algorithm which calculates f(.) to

calculate f'(.) numerically.

If we represent the k-th iteration of the vector X by

X(k), we can express equation 2.6.3 as

9(j:X(k))‘91j:X(k-1H='f(j:X(k))/f' (j:X(k)) (2-6-5)
for j=1,..N and k=1,2,...

With this expression in mind, we will take a closer look at
the function f(.). To reduce the complexity of the analy-
sis, we will simplify the model in a way which will not

interfere with the generality of the results.

84

First, we will assume that all block sizes are equal
and that all file accesses are to disk. Furthermore we will
limit the access method to direct (or random) accesses
uniformly distributed over a single disk module. This is
really not as great a simplification as it seems since it is
many times more complicated to calculate response times for
many programs accessing a single disk drive than it is to
compute the response time for each program accessing a
unique disk drive. The latter case involves no iteration
since each program's I/O response will be almost totally
independent of every other program's I/O response, and
therefore easier to compute (this is because there will be

no queuing for devices).

The foregoing assumptions will ensure that the eegyiee
pipe (TSD) for every program will be identical. Ignoring
the channel service time, and designating the disk service
time by TSD, we have some natural constraints to work with.
For example, the access rate for all programs must be less
than 1/TSD since a device cannot be utilized more than 100%.
Representing the instantaneous access rate for program j by
x(j), an analysis of a single iteration of the algorithm
(the function f), we have that f(j;x(j)=0) < 0 and
f(j;x(j)=1/TSD) > 0 for every program j. Furthermore, it

can be shown that f(j;x(j)) s f(j+1;x(j+1)).

We can guarantee that f(N x(N)=0) < 0 by the imposi-

tion of the requirement that the utilization of the disk be

85

less than or equal to 100%. Given the access rates ARAT(1),

and a single disk device, we have

f(j;x(j))=x(j)+sum[ARAT(l):l¢j]-1/TSD,
Sum[ARAT(l)*TSD:l=1,..N-1] < 1,
==> sum[ARAT(l):l=1,..N-1] < 1/TSD, (2.6.6)

==> f(j;x(j)=0)=sum[ARAT(l):l=1,..N-1]-1/TSD < 0.
For x(j) = 1/TSD we have

f(j;x(j)=1/TSD)=1/TSD
+sum[ARAT(l):l=2,..N]-1/TSD (2.6.7)

=sum[ARAT(l):1=2,..N] 2 0,

since each rate ARAT(l) is strictly positive and converges

to zero as x(j) becomes infinitely large.

To show that f(j x(j)) S f(j+1;x(j+1)) we have

f(j;x(j))=f(j+1:x(j+1))-(x(j)**21*(C(j)+x(j)*
(W(j)+IO(j))*ARAT(j)*ARAT(j+1). (2.6.8)

Since all of the variables following the negative sign in
the above expression are greater than or equal to zero, the
necessary condition prevails. Because the functions
involved are continuous in the domain of interest, we are

assured of a solution.

1/_ "

86

It can also be shown that the functions f(.) are
differentiable and that the partial derivatives of f(j) with

respect to x(j)--Df(j)/dx(j)--are ergereé

Df(1)/dx(1)SDf(2)/dx(2)S...SDf(N)/dx(N)=1, (2.6.9)

and the second partials DDf(j)/dx(j) are non-negative

everywhere,

DDf(1)/dx(1)ZDDf(2)/dx(2)2..ZDDf(N)/dx(N)=0. (2.6.10)

These conditions assure us of a unique solution and they
also assure us that the derivative of each function is
greater than zero and less than or equal to one (1) in the
neighborhood of the solution (the derivatives are very

nearly equal to one in this neighborhood). We have that

Df(1:EP(j))/dx(1)>TSD*(f(1:1/TSD)'f(1:0)) (2.6.11)

(g(j:x(j))-g(j:y(j))|=|x(j)-y(j)+f(j:x(j))/f'(j:x(j))
“f(3=Y(j))/f'(j=Y(3))l- (2-5-12)

By Taylor's Theorem,

f(j=2(j))=f(j=EP(j))+f'(j=EP(j))*(Z(j)'EP(j)): (2-6-13)

implying that

 

87

|g(j:x(j))-g(j:x(j))|=|x(j)-y(j)+CHI*(y(j)-x(j))I
=I1'CHII*IX(3)’Y(3)I: (2-6-1“)

where CHI is a non-negative scalar which is less than one.
Using ||X|| = max[x(j):j=1,..N] as the definition of the
norm, we chose RHO so that [IX-EPII < RHO implies by contin-

uity that

f'(j=EP(j))*“/5<f'(j=X(j))<f'(j=EP(j))*5/“.
implying that

4/5<f'(j:EP(j))/f'(j:x(j))<5/4. (2.6.15)

This means that 4/5 < CHI < 5/4 or -1/4 < 1-CBI < 1/5,

therefore

|1-CHI)< 1/u. (2.6.16)

Substituting this into the previously derived expression, we

have

IIG(X)‘G(Y)IISI1’CHII*|IX'YII<(1/“)*1IX'YII- (2-6-17)

With this we conclude the demonstration that the use
of Newton's method is warranted and that the algorithm
converges to a unique solution. To accelerate the rate of
convergence of this process, we apply Aitken's Delta Square
algorithm to estimate a new point from three previous

estimates by Newtons method. The task now is to develop the

88

experimental apparatus to validate the model and determine

its usability.

III. MODEL VALIDATION AND EXPERIMENTAL DESIGN

The computer runs for estimation and validation were
performed on an IBM System/370 Model 148 in the Detroit
Datacenter. The IBM 05/751 operating system and the IBM
software products, Systems Measurement Facility (SMF), and
05/751 Utilization Monitor were used to collect and manipu-
late performance data. Making the experimental runs on the
same computer as the base runs does not detract from the
results since the programs in the base runs and the experi-
mental runs were in totally gifgegeee eegbinatieee and with
different priorities. This is, after all what the model is
supposed to predict. The only additional information that
could have been gained by executing the experimental runs on
a different computer from the base runs is the extent to
which execution timings on different model computers are not
proportional. For example the ratio of execution timings
for scientific and commercial instruction mixes on two
different models of the same computer "family" will be
different. The model developed here does not attempt to

give an answer to this problem and a decision was made to
“SE? published instruction execution rates for an "average"

commercial job mix as is the common practice.

89

 

90

To a certain extent the proposed model is dependent
upon the particular implementation of the operating system
used. The implementation of the operating system used in
making the measurements is i_§eg;gp§ ggiye_ pgiegiey; esug_
dispatching for batch programs. This research was conducted
using only one operating system but this does not seriously
limit the generality of conclusions reached because differ-
ences within the class of operating systems defined above'
are differences in pegeeeee; yeleee and furthermore this
dispatcher is representative of the majority of operating
systems in current use. Examples of these parameters are

the quanta of CPU overhead for I/O interrupt, start I/O,

dispatch processing, and paging overhead.

Estimates were obtained for the paging submodel
coefficients which correspond to the page-out/page-in ratios
(BETA), working set allocation parameter (L), and the paging
indices (A). The estimation began with the execution of a
selected workload and the measurement of its performance.
The measurement data was then fitted to the model by use of
multivariate regression analysis. Finn's MULTIVARIANCE
program was used for estimation of the regression coeffi-

oients (22].

Performance measurement was achieved by means of
software monitors and job accounting data collection. The
response data collected included program CPU usage, total

CPU usage, channel busy counts, device busy counts, paging

f

91

and other I/O counts and working set sizes. These data were
manipulated to give the performance values of interest:
total CPU utilization, program CPU utilization, channel
utilization, average working set sizes, paging rates,
program I/O operations. Further transformations were made
to estimate each program's page index (A), mean CPU service

time (1/MU), and mean I/O response time (10).

To complete the experimental plan, the apparatus
employed, and the programs used in parameter estimation and
model validation are described. Next, a description is
given of the instruments or measurement software and data
reduction programs used, and the measures to which they
relate. The last section of the chapter provides descrip-
tions of the experimental design for estimation, the design

for validation, and the respective hypotheses tested.

ggeggee_ fee geasggement egg Contr l

 

The programs used in this research were of three types

(1) those written by the experimenter to predict program
behavior and estimate parameters, (2) those designed and
programmed by the experimenter so that the experimental
variables could be controlled in model validation, and (3)
statistical programs. The programs used for prediction are
basically the model developed in chapter two. This model
was programmed in APL [31] and executed on an IBM
System/370. The experimental programs were written in COBOL

and the statistical programs used in this research were

92

Finn's MULTIVARIANCE package [22].

The parameter estimation workload is composed of 8
programs randomly chosen from a population consisting of 8
"CPU bound" programs and 8 “I/O bound" programs. The
definition for "I/O bound" used here is that, for a program
run in isolation, greater than 75% of the elapsed time would
be spent waiting for I/O operations. Likewise "CPU bound",
as used here, means that the program would consume more than

50% of the available CPU cycles during its execution.

The programs used in the model validation were "syn-
thetic" programs [10,42] specially constructed to vary over
the cells of the experimental design. The synthetic
programs were written as extensions to the synthetic
programs mentioned earlier in this thesis but they were
modified to enable the paging index A(j) to be varied. More
will be said about this in the discussion of experimental

design.

Engineers; 12me 68

There are 5 independent variables and 7 dependent
variables in the experiment. The program related indepen-
dent variables are: (1) the page index, A(j), and (2) the
variance of CPU service time WS(j). Three other independent
variables are environment related and consist of: (1) The
variance of I/O service time VI(j), (2) the priority of the

program being measured, j, and (3) the total number of

93

programs in the multiprogramming mix, N. A decision was
made to eliminate the memory variable R, because its effect

is confounded with the factor N, the number of programs.

The dependent variables are: (1) problem program CPU
utilization CP(j), (2) program elapsed cycle time ET(j), (3)
total program paging rate PR(j), (4) program page-in rate
PI(j), (5) program channel utilization RCU(j), (6) total
system overhead OHT(N), and (7) average program working set

size w(j).

The system parameters are: (1) CPU overhead for normal
I/O, ALP(j), (2) CPU overhead for Paging, DEL(j), (3) the
ratio of page-writes to page-reads, BETA(j), and (4) the

working set allocation weights, L(j).

ns rue_gts and Measu_ee

 

The instruments used in the collection of elapsed
times, problem state time, wait time, paging counts and
working set size samples by program were the job accounting
facility of the IBM OS/VS1 operating sytem, the Systems
Measurement Facility (SMF), and the IBM proprietary program,
OS/VS1 Utilization Monitor. The channel and device utiliza-
tions and working set sizes are based on sampling. System
paging and other I/O rates are based on counts and CPU times

are based on actual measurements.

In other cases the variables in this experiment were

not directly observable, so indirect measures were

94

constructed. While it is fairly easy to estimate the
problem state CPU utilization CP(j) as the ratio of total
problem state time to elapsed time, the overhead gee to a
specific program is not generally attributed to that program
by job accounting routines or program monitors. Because of
this, estimates of COH(j) must be achieved by a partition of
the total system overhead. The total system overhead is an
estimate based on the difference between the elapsed time
and the wait time and problem state time or it is based on

sampling by performance monitors.

In order to estimate the parameters ALP(j) and DEL(j),
estimates must be obtained for COH(j), but it has already
been stated that COH(j) is not measurable. One way to
partition total system overhead for each program is to
assume that the ratio of problem state CPU utilization to
overhead is the same for each program as new programs are
added to the multiprogramming mix. Suppose that we have L
programs in the mix and CP(j) and COH(j) for j=1,..,L. We
then add program K+1 and get CPT, the total CPU utilization,
and the program utilizations CP'(j) for j=1,..,K+1. The

above assumption implies that

COH'(j)=COH(j)*CP'(j)/CP(j). for j=1,..,K (3.4.1)

 

95

coa(j)/CP(11=CER(j1*DELtj)+uU(j)*ALP(j) (3.4-2)

=CER' (3') *DEI (j) +910 (j) *ALP (3')
w(j)=w'(j) for j=1,..,K. (3.4.3)

Thus the above assumptions imply that the working sets for
program j is identical in the runs with K and K+1 programs .

This is only possible if memory is not a binding constraint.

Since we have neither a direct nor an indirect means
of estimating COH(j), and since COH(j) is a function of ALP
and DEL-- both of which must also be estimated--some other
means of partitioning the total system overhead had to be
devised. It was then hypothesized that ALP and DEL increase
with increasing j (lower priority) since the dispatcher must
process more dispatch queue entries before arriving at the
last queue entry. It is also likely that ALP and DEL both
increase with increasing N since more lists and tables must
be checked by the operating system at every interrupt.
Finally, it was hypothesized that the amount of processing
that must be accomplished by the operating system before an
I/O or paging operation can be sucessfully initiated is
proportional to the depth of the I/O queues, and the depth
of the I/O queues is proportional to the the aggregate rate

of I/O operations, T(j). For the overhead variables we have

96

ALP(j,N,T(j))=K1+j*K2+N*K3+T(j)*K4, (3.4.4)
and

DEL(j,N,T(j))=L1+j*L2+N*L3+T(j)*L4, (3.4.5)

where T(j)=sum[1/ET(l): l=1,..N; l¢j]. We then have for

program j's overhead

C03 (3) /CP(j) =((143 U) *5“ (j) ) *DEL (j,N,T (3))
+30) *HUU)*AI-PticNoTUH/BU): (3-‘1-5)
or
(303(3) =(1‘BU) *30 (I!) ) *EEL(j:N.TU) )IETU)
+B(j)*UU(3)*BLPU.N.T(SIM/BTU)- (3-“-7)

We now assume that there is insignificant paging
(B(j)*MU(j)=1) and sum the individual program overhead terms

giving

OHT(N)=X(N)*K1+Y(N)*K2+N*X(N)*K3+(X(N)**2-S(N))*K4

for “=1'oo’7' (30“.8’

where X(N)=sum[1/ET(j):j=1,..N], Y(N)=sum[j/ET(j):j=1,..N],
and S(N)=sum[1/ET(j)**2:j=1,..N]. We have a system of 7
equations in 4 unknowns for which a solution exists if the
matrix of coefficients times its transpose is non-singular.
This will be the case if no column or row is equal to a
combination of other columns or rows. The condition that
there be negligible paging can be guaranteed by manipulation

of the large real memory of the experimental system.

97

Executing the ALP estimation workload in a paging
environment provided the basis for computing the overhead
caused by paging as the residue of the CPU overhead due to
I/O which was estimated with the ALP coefficients. For the

residue due to paging we have

OHT'(N)=X(N)*L1+Y(N)*L2+N*X(N)*L3+(X(N)**2-S(N))*L4

for N=u'oc'80 (3.“.9)

The first 3 runs were eliminated because there was no paging
for N<4 since the size of each program was identical and the
size of main memory was constrained to the size of three
copies of the program. From two replications of these 5
experiments, equations were derived to estimate the 4 paging
coefficients. The overhead expressions were then combined

to get the overhead for the "mixed" model.

To estimate the ratio of page-writes (PO(j)) to
page-reads (PG(j)), consider the conditions which result in
a page-write operation. A page exception will result in a
page-write operation if there are no page frames that are
unreferenced and there are no referenced pages that have not
been modified. The probability of this
event--BETA(j,N)--times the number of trials (PO(j)) will
give the number of pages written or PO(j) = BETA(j,N)*PG(j).
It was thought that BETA would be a function of the amount
of real memory in the system and the amount of memory

demanded by all the programs in the system and not depend on

98

program j alone, i.e. a system parameter.

The experiment was carried out with printer output
spooled but not printed until after program termination
because of the need to reduce the complexity introduced by
spooling and measuring additional programs. This does not
reduce the usefulness of the model because each execution of
a spool task can be considered to be the execution of a
separate program which requires a printer and "spooled"

print output from some other program.

A spool program is just like any other in the model
except that its priority will be set higher than that of a
normal program. In this sense the model would allow the
priority to "float" according to the number of system

programs (spool tasks) which are active.

Esmeriment l Resign

Design for Parameter Estimation

The design for parameter estimation consisted of 16
runs of one to 8 programs for the estimation of coefficients
for ALP, and 10 runs for the estimation of coefficients for
DEL, L and BETA (see Figure 7). The hypotheses tested were
the following:

1. Larger values for the program I/O overhead parameter

(ALP) are expected for programs executing at lower

priority levels (larger values of j)

2. Larger values of the program I/O overhead parameter

 

10.

99

(ALP) are expected for programs executing in larger
multiprogramming sets (N)

Larger values of the program I/O overhead parameter
(ALP) are expected with larger system I/O access rates
(T(j))

Larger values of the paging overhead parameter (DEL)
are expected for programs executing at lower priority
levels (larger j)

Larger values of the paging overhead parameter (DEL)
are expected for programs executing in larger multipro-
gramming sets (N)

Larger values of the paging overhead parameter (DEL)
are expected with larger system I/O access rates (T(j))
The page-write to page-read ratios BETA are expected to
be identical for programs executing concurrently

The page-write to page-read ratio BETA is expected to
be larger when a program is executed in a larger
multiprogramming set than it is when the program is
executed in a smaller multiprogramming set at the same
priority

The overall system page-write to page-read ratio BETA
is expected to be larger for larger multiprogramming
sets

The page allocation weights (L(j)) for all programs

executing concurrently are expected to be identical

 

100

Number of

Programs--- 1 2 3 4 5 6 7

Priority
1 PGMl PGMl PGMl PGMl PGMl PGMl PGMI
2 PGMZ PGMZ PGMZ PGMZ PGMZ PGM2
3 PGM3 PGM3 PGM3 PGM3 PGMB
4 PGM‘) PGM4 PGM4 PGM4
5 PGMS PGMS PGMS
6 PGM6 PGM6
7 PGM7

Note: PGMx is the x-th program randomly chosen.

Figure 7.

Experimental Design for Estimation

 

101

This concludes the discussion of experimental design for

estimation.
Experimental Design for Validation.

The objective of the experiment was the determination
of the limits of the model and the extent of its validity.
The five independent variables were varied over high and low
values to form a fixed crossed factorial design with 32
cells. Defining a low number of tasks as 6 and a high
number of tasks as 7, the experiment was effected by 16
computer runs of 6 programs each and 16 computer runs of 7

programs each for a total of 32 computer runs.

The experiment used 4 disk drives in addition to the
"system" disks. The first disk was accessed by all programs
for paging and the third disk was accessed by the experimen-'
tal program only. The second and fourth disks were only

accessed by the control programs (CNTL).

The criterion variables for this design were relative
errors consisting of the differences between the predicted
values of the variables and the experimental or measured
values divided by the experimental or measured values. Thus
a positive value of the transformed variable signifies
over-prediction (positive error) and a negative value
signifies under-prediction (negative error). Using a linear
model to test hypotheses by means of the analysis of vari-

ance, multivariate tests were performed for each of the

102

independent variables or effects.

To allow for sufficient degrees of freedom to estimate
the error term, at least two replications were required in
each cell of the design. Since the number of computer runs
involved would have been quite large (64), a decision was
made to use a 1/2 or "fractional" replicate, allowing the
estimation of all main effects as well as two factor inter-
actions [16,37]. This design reduced the size of the
experiment to 2 replications of 16 runs (see Figure 8). The
experimental data was tested using the Analysis of Variance

routines of Finn's MULTIVARIANCE program [22].

 

 

 

 

 

 

 

 

 

 

 

1CCCCC 1CCCCCC CCC1CC CCClCCC
00000 00001 00010 00011
2CCCCC 2CCCCCC CCCZCCC CCCZCCC
00100 00101 00110 00111
3CCCCC 3CCCCCC CCC3CC CCCBCCC
01000 01001 01010 01011
4CCCCC 4CCCCCC CCC4CC CCC4CCC
01100 01101 01110 01111
SCCCCC SCCCCCC CCCSCC CCCSCCC
10000 10001 10010 10011
6CCCCC GCCCCCC CCCGCC CCCGCCC
10100 10101 10110 10111
7CCCCC 7CCCCCC CCC7CC CCC7CCC
11000 11001 11010 11011
8CCCCC BCCCCCC CCC8CC CCC8CCC
11100 11101 11110 11111

 

 

Note: Numbers 1-8 represent "Synthetic" programs, and "C"
represents the control program. The underlined
cell identifications are the cells which must be
included in a "half-replicate" fractional design.-

Figure 8. Experimental Design for Validation

‘I’
I 4.__-__

104

The five-way factorial design for the experiment was
chosen to permit testing of the following null hypotheses:
1. The mean relative error in the prediction of the
experimental outcomes is less than or equal to .15
2. The mean relative error in the prediction of the
experimental outcomes due to variation in page index
(A) is less than or equal to .15
3. The mean relative error in the prediction of the
experimental outcomes due to variation in CPU service
variance (VS) is less than or equal to .15
4. The mean relative error in the prediction of the
experimental outcomes due to variation in I/O service
variance (VI) is less than or equal to .15
S. The mean relative error in the prediction of the
experimental outcomes due to variation in multiprogram-
ming level (N) is less than or equal to .15
6. The mean relative error in the prediction of the
experimental outcomes due to variation in priority

level (j) is less than or equal to .15

Before testing the above hypotheses, tests of all 10

two-factor interactions were planned.

One characteristic of the Analysis of Variance is that
tests of main and fixed effects are only meaningful in the
absence of significant interactions. The step-wise strategy
for significance testing is to test interactions first and

if the null hypothesis is maintained, to continue testing

. A L .‘J‘Ap_

105

the fixed effects in reverse order. If a significant
interaction is found, main and fixed effects cannot be

tested because they are "confounded" with the interactions.

A partial solution to this dilemma is to construct
confidence intervals about the means of the criterion
variables. This will give some information about errors but
will not answer any questions about sources of error. The
following null hypotheses were tested by means of 95%

confidence intervals:

7. I Relative error predicting CP I S 0.15
8. I Relative error predicting ET I S 0.15
9. I Relative error predicting PR I S 0.15
10. I Relative error predicting PI I S 0.15
11. I Relative error predicting RCUI s 0.15
12. I Relative error predicting OHTI S 0.15
13. I Relative error predicting w I S 0.15

For significance testing, the independent variables
were arranged in the order in which the largest errors could
he predicted based on taking "approximate" derivatives of
the dependent variables with respect to the independent
variables. The basis for this assertion is that an error in
measurement of an independent variable or in estimation of a
parameter may be considered to be a perturbation of the
variable or parameter. If it is assumed that the structure
of the model is an adequate representation of the phenomena

under investigation, the effect of the perturbation may be

 

106

viewed as a change in the dependent variable, i.e. a partial

derivative.

An example of this technique is the effect on working
set size of a change in the variance of the CPU service time

(for Erlang-1 service),

DI"(Ii)/<1V$(ID="(II)"'(1“‘II(2))/R)"‘ (3-5-1)
(1'C(j)/ET (j) )/“*V5 (3) -

The effect on working set size of a change in real memory
when memory constraints are active (sum[S(j):j=1,..N] > R)

is given by

Dw(j)/dR=w(j)/R. (3.5.2)

This type of "approximate" derivative indicated that the
appropriate order of the variables for step-down and
step-wise testing was: w, PI, PR, ET, CP, C08, and RCU for
the dependent variables, and A, VS, VI, N, and j for the

independent variables.

A feature of the experimental design is that each run
was a measurement of only one program, with the other
programs in the multiprogramming set controlling the envi-
ronment. The program being measured ran in a "matrix" of
copies of a "control" program. The control program was

selected to exhibit "average" behavior compared to the

 

107

experimental programs. A table of level values for each
independent variable, and a table of levels for each of the

synthetic programs is given in Figure 9.

 

108

PROGRAM-RELATED'VARIABLES

 

 

Synthetic Paging CPU I/O
Program. Activity Service Response Expected
Number Code Index Variance Variance Error
1 000 low high high low
2 001 low high low
3 010 low low high
4 011 low low low
5 100 high high high
6 101 high high low
7 110 high low high
8 111 high low low high
ENVIRONMENTAL VARIABLES
Pageable Number of Expected
Code Memos! Priority Programs Error
000 fixed low low low
001 fixed low high
010 fixed high low
011 fixed high high
100 fixed low low
101 fixed low high
110 fixed high low
111 fixed high high high
Figure 9. Levels of Independent Variables

 

109
The Synthetic Program

At this point a discussion of the programs used in the
experimental manipulations is in order. The experimental
programs are COBOL programs which process a sequential file
of "transactions" from disk against a direct update file on
disk. The program is written so that it can be run any
number of times against the files. Control information is
provided by means of execution parameters on a control card.
These parameters tell the program how large the records on
the disk are, how many records there are in each file, how
many passes to make through the transaction file, the
average number of times to execute the "compute kernal"
between I/O's, the interval of variation to use in computing
a random number of kernal executions on each transaction
record and the amount of variation to use in accessing its

own instructions in memory for paging.

Variation of the I/O is provided by the placement of
the data sets, the size of the direct files, and the record
sizes that are built when the files are created. The files
are created by a separate program which creates the direct
file sequentially then creates the sequential file by
generating random relative record numbers of records in the
direct access file. In practice, the variation in I/O
response was controlled by data set placement and record

sizes were held constant.

The compute kernal of the program selected 10 numbers

110

from an array of 1000 numbers and performed non-trivial
arithmetic operations on them which were self-checking (they
had to match previously computed numbers in the transaction
file). This turned out to be so compute-bound that the
number of compute passes between I/O had to be held to 2.
The actual number of compute passes could be held to exactly
two by selecting a variation of 0 or the maximum variation
could by generated by selecting 2, causing the program to
randomly choose an equally likely integer in the interval

[0:4]-

The program's executable code (as well as the 1000
entry numeric array) was duplicated five times. Each
section of code was sufficiently different that a paging
parameter of 0 would result in sequential execution of each
block of code processing the data in corresponding data
blocks. A paging parameter of 1 caused the program to skip
one-half of the code and data in each block. A paging
parameter of 2 caused the program to skip one-fourth of the
code and data in each block. This particular scheme was
chosen because it was felt that sequential execution would
result in the lowest paging rates since fewer pages would be

referenced during a specific execution interval.

The parameter levels chosen for the control programs
were 1 for paging index, 1 for CPU variance, and 2 for
number of kernal passes. The I/O was determined by making

the files 7 and 2 cylinders in size and locating their

 

111

centers 4 cylinders apart on the same phsysical disk drives.
The I/O for the experimental programs was controlled by
making the files 9 and 3 cylinders in size and locating
their centers 3 and 15 cylinders apart for the high variance

and the low variance versions of the program respectively.

As in parameterization, the experimental data were
collected by SMF and OS/VS1 Utilization Monitor and reduced

for input (along with the predicted results) to the MULTI-

 

VARIANCE program.

IV. EXPERIHENTAL RESULTS AND ANALYSIS

Paramete; Estimatieg

 

EQEBQL £49 9122.229

Hypotheses 1, g, e_g 3, Two replications of these runs were
made and the measured and calculated results are listed in
appendix A, Tables A1oA4. The data were analyzed using
linear regression on the model equation 3.4.4. Initial
analysis yielded a significant contribution to variance for
all but the last factor. The aggregate file I/O access rate
contributed to an increase in ALP but the increase was not
significant. This term was subsequently dropped from the
model and the data re-analyzed. The statistics for the
revised regression analysis is given in Table 1. The
statistics in Table 1 supports hypotheses 1 at the .0001
level. The effect of multiprogramming set (N) is signifi-
cant at the .0001 level but its directiOn is contradictory
to hypothesis 3. Adjustment of the predicted mean so that
it is equal to the observed mean yields a positive inter-
cept. This reflects the fact that in a lightly loaded

system, the page management routines use idle CPU time to

search page tables and maintain page queues.

 

113

Table 1. I/O Overhead Regression Analysis

goungg RAH REGR ggsrrs s D §_§g_ 9. ES
CONSTANT 2.2733703-03 8.0777652-05
PRIORITY (j) 3.8395762-04 3.4289222-05
ups SET (N) -2.85u969E-05 n.8937953-06

STD DEV OP DEP VARIABLE (OH) = 0.0212
HULT-R-SQUARED=.999Q P(3,12)=6776.6“8 P<.0001
5.32.2121: §§§§§§§TQT
CONSTANT P(1,1u)= 753.360 98.18% ADDL VAR P<.0001

PRIORITY P(1,13)= 91.806 1.60% ABEL VAR P<.0001
MPG SET F(1,12)= 30.03“ 0.17% ABEL VAR P<.0001

ADJUSTBENT TO HEAR = 0.008075

229$.Q £19 QEEEQEEQ

The same programs that were run in a non-memory
constrained environment for ALP estimation were then run
with pageable memory constrained to 312K. This amount of
memory was deemed appropriate to run three of the programs
(each with total memory requirement of 162K) with negligible
paging. The paging experiments were started with u programs
in the mix and another program was added with each succeed-
ing run up to a final run with 8 programs. In the final run
the system was under such stress that the monitor data that
was being logged was incomplete and it was not possible to
get measurements on two of the programs in the mix. when
the sequence of runs was repeated, the replication with 8

programs was therefore omitted.

114

The data collected from these runs is reflected in
appendix A, Tables AS-AB. Equation 3.4.4 was then used with
the coefficients for ALP to predict the overhead due to
non-paging I/o. The resultant overhead was subtracted from
the measured overhead to form the residue. The data were
then analyzed using linear regression with the I/O rates in
equation 3.4.5 replaced by the 223129 rates. A surprising
aspect of this process was that the overhead with paging was
actually lggg; than it was without paging at smaller multi-
programming sets (N). Analysis of these runs (Table 2)

shows that the intercept for paging overhead is negative.

An interpretation for this anomaly is that the page
management routines use CPU idle time to perform housekeep-
ing functions and look for work to do even in the abggggg 9;
paging. When the CPU is highly utilized and no paging is
taking place this discretionary work is effectively
bypassed. However when paging is taking place this work
cannot be bypassed and much of it is performed as a part of
normal page I/O processing. Ideally the intercepts for ALP
and DEL should cancel each other since no overhead should be

expended when I/O is not being done.

ﬁypgthg§g_ 4, §, a.g g. Initial statistics for analysis of
the model equation 3.4.5 indicated that the independent
variables for priority level j, multiprogramming set N, and
the aggregate paging rate T(j). all had effects in the

hypothesized direction on the paging overhead parameter DEL.

 

 

115

Table 2. Paging Overhead Regression Analysis

§QQ§Q§ BA! BEEB §Q§§Z§ -§1. .3593 Q: .§I
CONSTANT 6.165776E-03 6.7830703-05
HULT-R-SQUARED=.9993 F(1,6) =8262.718 P<.0001

ADJUSTMENT FOR HEANS=-0.0129923

However the effects were not statistically significant since
the significance levels for these effects were .79, .99, and

.40 respectively.

The constant term was left as the only term in equa-
tion 3.4.5, making paging overhead directly proportional to
paging rate. The data were re-analyzed using only the
constant factor in equation 3.4.5, yielding the coefficients

in Table 2.

Eéﬂiﬂﬂ Befig §§_imation

 

Since paging was needed to perform this estimation the
same set of runs were used for both DEL estimation and
paging ratios (BETA) estimation. An immediate source of
difficulty is the fact that page-out counts recorded by most
operating systems are not the same as those used in this
formulation. The theory predicts the number of pages that
must be written out to accomodate pages that are read for a
specific program. The identification of the program which
is losing the page is of no concern. However job accounting

routines do not record the identification of the program

 

116

which caused the page to be written but only adds to a

counter of pages lost for the program losing the page.

There are two alternatives to this dilemma. Hypothesis
7 could be agggggg and hypothesis 9 tested, or it can be
agsu ed that the number of pages written on behalf of a
program is equivalent to the number of pages which that
program g_g§g§ to be written and proceed with the analysis.
Proceeding on the latter assumption, results of the hypothe-

sis tests must be interpreted in view of the above discus-

sion.

Hypgthgggs 1 and g. Hypotheses 7 and 8 were tested by
performing a 4X4 factorial analysis of variance on the PO/PI
ratios of the same 4 programs for multiprogramming sets N =
4, S, 6 and 7. The within-group sum of squares was used as
the error term. Results of this analysis are summarized in
Table 3. Since there are no significant interactions main
effects may be tested. Hypothesis 7 was not supported since
there was a priority effect at the .0009 significance level.
The multiprogramming effect was positive (consistent with
hypothesis 8) but the significance level was .124. The cell
means predicted from the coefficients in Table 3 were used

as the BETA(j,N) for the predictive model.

Hypothesis 2. Hypotheses 9 was tested by performing a
one-way analysis of variance on the system-wide paging
ratios computed from the paging estimation runs with 4, 5, 6

and 7 programs in the multiprogramming sets, yielding 8

 

117

samples. From observations of cell means it was decided to
test the hypothesis using orthoganal polynomial contrasts.
Hypothesis 9 was not supported at the higher multiprogram-
ming levels since a quadratic effect was found at the .0004
significance level. The Analysis of Variance table for this

analysis is given in Table 4.

Page Allocatiop w iqht

 

This factor tests the Paging submodel's accuracy in
predicting working set allocations. The measure for this
factor L, is derived from the Kuhn-Tucker conditions for
minimization of the total paging rate where program j's
paging rate is weighted by L(j). Summing the L(j)s to 1 and
assuming that paging indices A(j) are identical, we have the

following estimator for L(j):

L(ij)=('(j)**2)/CPPU)*S‘III ("(1)**2)/CPP(1131=1o- . J]-
(“o 1.1)

To make the L(j) comparable across multiprogramming
sets, The L(j) calculated by equation 4.1.1 will be multi-
plied by the number of programs in the multiprogramming set,
N. The page allocation weights can be used in this form
since it is the relative sizes of the weights that is

important.

.J

 

118

Table 3. Page I/O Ratio Analysis

 

§QQ.§§ g.£ 3.93 .szzs §.2 T3393 9: .§T
GRAND HEAN .603356 .085417
HPG SET .329145 .241595
MPG SET .617560 .241595
HPG SET .239244 .241595
PRIORITY -1.104013 .241595
PRIORITY -1.083273 .241595
PRIORITY -.917266 .241595
INTERACTION -1.173274 .683333
INTERACTION -1.567103 .683333
INTERACTION -.423430 .683333
INTERACTION -1.337678 .683333
INTERACTION -1.306999 .683333
INTERACTION -1.669633 .683333
INTERACTION -.464727 .683333
INTERACTION -1.358317 .683333
INTERACTION -.564918 .683333
$3.23.; ALAN smug; 2.12 z 2
GRAND MEAN 11.649 1 49.896 .0001
HPG EFFECT 0.521 3 2.231 .1241
PRIORITY 2.198 3 9.413 .0009
INTERACTIONS 0.263 9 1.128 .3985
ERROR TERH 0.233 16

Both replications of Paging Estimation runs with 4, S,
6, and 7 programs were used giving 44 observations of L(j). i
The data for these runs is given in the Appendix, Tables A9
and A10. The equalized estimates for L are given in Table 5
and the ANOVA table for a 4X4 factorial design is given in
Table 6. Tests of hypothesis 10 use only the first four

programs in the analysis.

Examination of the ANOVA table reveals a significant
interaction at the .0001 level, hypothesis 10 is not

supported. Regression coefficients from this analysis

 

119

Table 4. System-wide Paging Ratio Analysis

SELL SELL !£.!§ £.LL §.2 2:1.
N = 4 0.257946 3.380890E-02
N = 5 0.477944 3.876854E-02
N = 6 0.563385 1.867964E-02
N = 7 0.354059 6.027378E-03
§QQE.§ 32.! §QEAE§ 22 z 2
LINEAR 0.014 1 18.436 .0128
QUADRIATIC 0.092 1 121.610 .0004
CUBIC 0.003 1 3.387 .1396
ERROR TERH 0.001 4

cannot be used since priority 5, 6, and 7 programs have not
been included. Regression analysis was therefore performed
using all 44 observations with covariates: multiprogramming
level (N), priority (j), priority squared (j**2), multipro-
gramming level squared (N**2), and the cross-product of
multiprogramming level and priority. The coefficients
resulting from this analysis are given in the bottom of
Table 6. These coefficients are used in computing the

weights used in the predictive model.

Since there are significant deviations from unity in
the ANOVA it can be concluded that the paging model is
deficient. At this point it will be assumed that the
working set weights that have been estimated are not specif-
ic to the programs used in the analysis but reflect error in

the paging submodel and continue with the analysis.

120
Table 5. Estimated Working Set Heights

HULTIPROGRAHHING SET (N)
G

 

2.1325.le (.1) 2 .~ .5. 2

2.140348 0.699155 0.375096 0.355586

1.799340 0.862835 0.377862‘ 0.368515

0.384188 0.323695 0.205488 0.162483

0.552756 0.363280 0.230628 0.175931

0.421240 0.418820 0.367578 0.279153

0.467356 0.360490 0.268770 0.273000

1.054224 0.708645 0.519882 0.411845

1.180548 0.675260 0.475926 0.426566

. . . 2.849685 1.347792 1.815296

. . . 3.098625 1.236636 2.007369

6 . . . . . . 3.184158 1.917405
6 . . . . . . 3.410190 1.985942
7 O C C C O C C O 0 2.059225
7 C O O O O C O O 0 1.762670

 

Higggllaneg_§ Parameters

Prior to using the model to predict program perfor-
mance, the CPU rates (HU), paging indices (A), maximum
storage (5) and critical memory point (H) must be estimated.
For this purpose the experimental programs were executed in
8 runs of N=4, with each experimental program being executed
with 3 copies of the control program. The measured and

calculated results of these runs are given in appendix A,

Table A11 and A12.

121

Table 6. Working Set Weight Analysis

 

999399 92.3 99933; 92 I E

GRAND HEAN 9.77u 1 1499.218 .0001

NRC EFFECT 0.802 3 122.997 .0001

PRIORITY 0.590 3 90.430 .0001

INTERACTIONS 0.196 9 29.991 .0001

ERROR TERH 0.007 16

.993.3 333 3293 COEFF§ STE 3339. 9: §§.

PRIORITY (j) -0.286627 0.301902

j*j .143958 0.035327 .
j*N -0.060599 0.065588 .
HPG EFFECT (N) .180209 1.019103 ~4
N*N -0.025178 0.090079

CPU Rate Estimation

CPU rates for the control programs were estimated from
the 4 runs of the control program with N = 4, 5, 6 and 7.
The parameter MD was found to vary considerably, even within
a single run. This effect is thought to be related to the
method used to account for program time and the processing
which takes place following an interrupt and before the
value of the CPU clock is stored. This has the effect of
making CPU service intervals for low priority programs
appear larger and therefore the CPU rates (HU) would appear

smaller.

The rates computed from each of the runs in appendix
A, Table A9 and A10 are weighted averages of the rates
computed from each program with I/O access rates used as
weights. The expansion factors (CYT) were used to increase

these rates (decrease service times) by applying the

122

equations in section 2.5, approximating the instantaneous
I/O rates by LAH = 1/ET*(1-CP). The rates from the 4 runs
were then averaged together and used in subsequent calcula-

tions and in the predictive model.

The CPU rates for the experimental programs were then
estimated from the 8 runs in appendix A, Tables A11 and A12
The same procedures as before were used to estimate CYT and
thereby remove some of the effects of "cycle stealing" due

to I/O.
Virtual Storage Estimation

This turned out to be trivial since the programs were
constructed to have the same maximum sizes (162K). This
number was taken from the job listings which gives a

measurement of the maximum virtual storage used.
Storage and Paging Index Estimation

These parameters were estimated in a sequential
fashion. First a "quasi" paging index A' was computed with
the ratio of the sum of critical storage to real storage

equal to 1. We have

A'=V(3)*PI(j)/CP(j)*K*(5(j)‘9(j))- (9-1-2)

A linear model was then fitted to this data using ln(A') as
the dependent variable and ln(N) as the independent varia-

ble. Symbolically we have A = A'*(N*H/R)**-q or A =

123

A'*((H/R)**-q)*N**-q. The choice for H = 56 was justified
on the basis of what follows. If the paging rate for the
entire system is less than 2 pages a second it may be
concluded that memory constraints are 395 active. Depending
on the length of a program run, an apparent system paging
rate of 2 pages/second can be caused by the initial flurry
of activity when the programs begin execution and open
files. The fact that the constraint is not active at N = 4
may be inferred from the size of the working sets of the low
priority programs compared to the higher priority programs.
The memory constraint appears to be just barely active at N

= 5 so an appropriate value for H is H = R/S = 280/5 = 56K.

Using H = 56K and equation 4.1.2, A is estimated for 4
different priorities of the CNTL program. From Table 7 it
may be observed that the values of A vary from a low of
0.000504336 to a high of 0.21720546. This wide variation in
the values of A are again indicative of the paging

submodel's failure to accurately predict paging rates.

To determine what is a reasonable value for A, dimen-
sional analysis is applied to the defining equation for
paging.

The dimensions of interest are those of K and A. In the
estimation process, a value of K = 490 was used for the
8/370 model 148 (meaning that the 148's average speed is
490,000 instructions per second). This makes the units of K

instructions per thousanths of a second and therefore the

Table 7.

124

Paging Index Calculations

 

A' VALUES FOR H = 56K
PRIORITY LEVELS
399 LIL (3) 1 2 3 9
4 0.001343 0.000504 0.001292 0.008048
5 0.002593 0.002228 0.002144 0.004502
6 0.028177 0.029452 0.044289 0.070589
7 0.051738 0.045349 0.070277 0.119278
CORR 0.962743 0.977337 0.942172 0.849545
b -16.743895 T19.677474 -18.106903 -13.270179
k 7.110351 8.672056 7.998743 5.655085
A 0.004301 0.002735 0.004414 0.013748
NORHALIZE 1.572647 1.000000 1.613795 5.026601
NORHAIIZATION FACTORS
PRIORITIES (j)
929 §E111 (A) 1 2 .3. 9 2 .9 1
4 2.663 1.000 2.562 15.957 . . . . . . . . .
5 1.164 1.000 0.962 2.021 2.345 . . . . . .
6 0.957 1.000 1.504 2.397 3.591 7.375 . . .
7 1.141 1.000 1.550 2.630 4.707 4.556 4.425

units of A are in memory references per instruction divided

by 1000. Expressing A in terms of memory references per
instruction for the A~values from Table 7, A = 4.3, 2.7, 4.4

and 13.7 respectively.

System /370 instructions reference either 0, 1, 2 or 3

locations in main memory. Of these instruction, the ones
that reference 0 storage locations are: supervisory call
instructions, register instructions and those used for
integer and floating point arithmetic. These are not
frequently used in commercial programs and, even when they
are used they must be used with instructions which reference

1 storage location to retrieve and store calculated data.

125

The instructions which reference 3 storage locations are
also infrequently used. These are instructions which
actually reference two starting addresses but involve
movement of enough data that ocassionally the data overlaps
multiple pages. An example of an instruction of this type

is the long move instruction.

There are a large number of instructions which refer-
ence a single location and they are very frequently used.
Instructions in this class include branching, register
loading and storing and immediate instructions which contain
a single byte of data in the instruction. Instructions
which reference 2 storage locations are also very frequently
used and these include storage to storage moves and
compares. These instructions are particularly frequent in
commericial applications where they are used extensively for

logic and data formatting.

Based on the previous discussion and observations of
instruction profiles of many programs it is very likely that
the A value for most programs would be in the rangeof 1 to
2 storage references per instruction. The smallest of the
estimates for A in Table 7 (and the only one in the realm of
possibility) indicates 2.73 storage references per instruc-
tion. Although this is possible it is not very likely.
However the above analysis does point out that the only
estimates for paging index in the practical range are those

developed from partition 2 of Table 7. In order to make A

126

values estimable using this format, it was decided to
normalize A' estimates by the corresponding factors from the
bottom of Table 7 by dividing by the appropriate factor.
These factors were derived by dividing the A' calculated in
each partition of the four runs of the CNTL program by the
corresponding estimate for A' from partition 2 (column 2 in

the table).

To estimate the page indices for the experimental
programs, 8 runs of 4 programs each were performed. These
runs consisted of a copy of the experimental program and 3
copies of the CNTL program. The measured and calculated
results of these runs (see appendix A Tables A11 and A12)
and the conversions in Table 7 were applied to compute the

estimates of A in Table 8.

From the page index estimates of Table 8 it may be
observed that the paging manipulation was not successful
since the estimates for A are apparently correlated with low
I/O variance rather than with the high paging indicators
(1X1 in the program identification). Prom the program
estimation runs in the appendix, Table A10 it may also be
observed that higher CPU service times are apparently
correlated with high priority execution. Along with this
effect a slight increase in process time may be observed,
however the increase in process time is not proportional to
the increase in the number of I/O operations, yielding a

higher CPU rate. Fewer I/O's are reported at higher priori-

 

127

ties because the lower priority jobs were started first in
order to delay some of the effects of priority on job
initiation.

At the same time the lower priority jobs are delayed in
starting to execute so it makes no sense to use observations
during this initiation stage. Thus only those measurements
which are taken while all programs are executing are includ~
ed. This procedure should yield results which are contrary
to the "expansion effect" and also contrary to the argument
that the lower priority programs should have smaller rates
(large average service times) because they include "start-
up" time and time for initially opening the files. Hhile
this effect appears to be systematic the exact cause is

unknown at this time.

33.2.1 9.2....r' 481139; “.lﬁan ' PHAAEAQAS

To the extent that the paging manipulation does result

in higher paging rates the manipulation was successful.

However the previous discussion on page index estima-
tion points out that the attempt to-manipulate page index
was not successful in terms of the estimates of A since this
index is correlated with the I/O variance manipulation and

not with the paging manipulation.

128

Table 8. Experimental Program Estimates

2399339 A 3 .99 A ELI 231.99
000 6 1 126.584 0.002069 3 1 4
011 6 1 126.800 0.016921 3 2 3
101 6 1 125.073 0.020515 3 2 3
110 6 1 125.528 0.003124 3 1 4
001 6 4 128.534 0.002655 3 2 3
010 6 4 128.350 0.000941 3 1 4
100 6 4 128.764 0.001193 3 1 4
111 6 4 127.095 0.002889 3 2 3
001 7 1 128.534 0.002655 3 2 3
010 7 1 128.350 0.000941 3 1 4
100 7 1 128.764 0.001193 3 1 4
111 7 1 127.095 0.002889 3 2 3
000 7 4 126.584 0.002069 3 1 4
011 7 4 126.800 0.016921 3 2 3
101 7 4 125.073 0.020515 3 2 3
110 7 4 125.528 0.003124 3 1 4

CNTL 6 2 54.296 0.002735 2 3 4

CNTL 6 3 54.296 0.002735 2 5 6

CNTL 6 4 54.296 0.002735 2 7 8

CNTL 6 5 54.296 0.002735 4 1 2

CNTL 6 6 54.296 0.002735 4 3 4

CNTL 6 1 54.296 0.002735 2 1 2

CNTL 6 2 54.296 0.002735 2 3 4

CNTL 6 3 54.296 0.002735 2 5 6

CNTL 6 5 54.296 0.002735 4 1 2

CNTL 6 6 54.296 0.002735 4 3 4

CNTL 7 2 54.296 0.002735 2 3 4

CNTL 7 -3 54.296 0.002735 2 5 6

CNTL 7 4 54.296 0.002735 2 7 8

CNTL 7 5 54.296 0.002735 4 1 2

CNTL 7 6 54.296 0.002735 4 3 4

CNTL 7 7 54.296 0.002735 4 5 6

CNTL 7 1 54.296 0.002735 2 1 2

CNTL 7 2 54.296 0.002735 2 3 4

CNTL 7 3 54.296 0.002735 2 5 6

CNTL 7 5 54.296 0.002735 4 1 2

CNTL 7 6 54.296 0.002735 4 3 4

CNTL 7 7 54.296 0.002735 4 5 6

Note: Program definitions are given in Figure 9.

 

129

£29 LLLaACG .49....2111 “4163.15.22

From the closeness of all estimates for CPU service
time (rate) it may be inferred that the manipulation for CPU
service variance was successful although the measurement
tools used in this research do not permit estimation of the
absolute magnitude. The following formulas were used to

construct the CPU variance manipulations:

S=(1+X/K)/HU**2,
V(S)=v*(v+1)/3*(K*HU)**2, (4.2.1)

E(S)=1/HU,

where K is an integer representing the number of times the
compute "kernal" is to be executed and X is an integer in
[-v,v] chosen with probability 1/(2*v+1), and v is an
integer less than or equal to K. In all cases K = 2 is
used. For the low variance programs (11X), v = 0 is used
for zero variance. For the high variance programs, (XOX) v
= 2 was used for a variance of 1/2*HU or approximately
0.000032746 seconds squared compared to a mean service time
of 0.00784314 seconds. For the CNTL program v = 1 was
chosen. This yields a variance of 1/6*HU or 0.00005653
seconds squared compared to a mean service time of approxi-

mately 0.0184176 seconds.

130
H nipplation

The I/O variance was again manipulated by construc-
tion. The high I/o variance program files were set up with
their centers 15 cylinders apart and file sizes of 9 cylin-
ders. The low I/O variance programs were set up with their
centers 3 cylinders apart and with file sizes of 3 cylinders
each. The I/o variance for the low and high variance
programs are approximately 93 H5 squared and 123 Hs squared
respectively. However, the means are also different in this
case, approximately 28 H5 for the high variance programs and

18 Hs for the low variance programs.

:3
o.
m
H
1m
10
a.
0
1+
"-0
to
1m

.1...‘ n...-

Sixteen runs of the APL model were made to develop the
predictions to be used in the analysis of variance. The
results of these runs for the experimental programs is given
in Table 9. In making these predictions the APL model was
found to converge quite rapidly. 0f the 16 runs, 13
converged in 6 iterations, 2 converged in 10 iterations and

one required 16 iterations.

The convergence criteria was to stop iterating when
the maximum absolute difference in I/O rates from one
iteration to the next was less than 0.002. This results in
an difference of less than 0.002 seconds in I/O service time
for the very lowest priority programs. A diagram of the

relationship between the various submodels and the

 

H P CPU CYCLE PAGE I/O CHNL OVER NRKG
2399 L 3 NEIL TIRE 3313 3313 BILL 9939 §§I
000 6 1 0.21 0.034 3.5 2.9 26.65 0.033 0.23 53
000 7 4 .04 .114 5.6 4.1 4.60 0.004 .34 24
001 6 4 .05 .131 2.3 1.7 5.79 0.007 .21 30
001 7 1 .18 .032 9.6 8.3 22.90 0.018 .36 57
010 6 4 .05 .138 1.5 1.1 6.08 0.008 .20 19
010 7 1 .19 .034 6.5 5.5 23.47 0.019 .33 36
011 6 1 .22 .031 5.8 4.8 26.85 0.034 .26 111
011 7 4 .02 .097 10.4 7.6 2.65 0.002 .37 46
100 6 4 .05 .137 1.7 1.2 5.95 0.007 .20 22
100 7 1 .18 .034 7.3 6.3 22.65 0.018 .34 40
101 6 1 .22 .031 5.7 4.7 26.76 0.034 .27 119
101 7 4 .02 .096 10.9 8.0 2.40 0.002 .38 48
110 6 1 .21 .034 4.0 3.3 26.18 0.033 .24 60
110 7 4 .04 .111 6.3 4.6 4.30 0.003 .35 27
111 6 4 .05 .130 2-5 1.9 5.75 0.007 .21 32
111 7 1 0.18 0.032 10.1 8.7 22.50 0.017 0.36 60

Table 9.

Hodel Predictions

Note: Program definitions are given in Figure 9.

 

convergence algorithm is given in Figure 10.

Hgdel gglidatio

 

The experimental programs were then run on the System
/370 Hodel 148. Since each run was repeated there were a
total of 32 runs. The measured and the calculated perfor~
mance variables from the experimental programs are given in

Tables 10 and 11.

132

DE
AND
mﬂ<

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Hove: scauomumusn mcaaemnmommauasz .oa gunman
damm.Q.m.<.M.m
v
9m.mo Hmcosnmm Hm
madman
m m
2‘ mm...
IKIIKII
ﬂaw
mmu Hx
Hmcosnmm moo Ham Hmcosnsm
3.8 9.8 mom O\H
9 .m. .0H
.m. .3 532662 59
m.couzmz

 

 

no
04
AND
mm

MB
H0

133

Table 10. Experimental Runs - Heasured

H P PAGES PAGES OTHER ELAPSED PROC OVER- HRKG
99.9 9 9 999 99 929 9999 9999 9999 999
000 6 1 99 303 8,720 342.09 69.75 69.62 50
000 6 1 68 203 8,720 330.70 69.56 65.29 52
000 7 4 1,241 7,413 8,720 1,859.87 72.39 442.99 42
000 7 4 821 5,409 8,462 1,511.70 69.30 362.51 42
001 6 4 234 1,277 8,718, 833.58 69.23 176.30 52
001 6 4 288 1,423 8,718 842.08 69.31 182.45 50
001 7 1 128 1,009 5,800 322.98 47.06 75.22 54
001 7 1 217 1,462 8,353 469.79 67.93 110.99 48
010 6 4 197 1,172 8,720 848.99 69.32 182.52 54
010 6 4 182 1,048 8,720 880.52 69.18 181.84 46
010 7 1 125 1,574 7,709 489.00 63.11 115.97 48
010 7 1 106 1,479 8,057 478.64 65.76 114.60 54
011 6 1 119 426 8,718 330.65 70.36 65.86 48
011 6 1 142 419 8,718 325.86 70.55 67.47 52
011 7 4 815 5,086 7,331 1,365.27 60.97 325.92 46
011 7 4 950 5,700 6,765 1,455.22 56.81 344.14 44
100 6 4 919 4,413 8,720 1,266.70 70.30 296.02 58
100 6 4 1,191 5,808 8,720 1,463.46 70.93 347.36 56
100 7 1 223 3,537 8,498 743.75 69.70 178.52 54
100 7 1 275 4,008 8,513 819.89 70.10 196.13 62
101 6 1 127 578 8,718 337.04 70.57 70.14 58
101 6 1 129 525 8,718 330.45 70.49 69.06 60
101 7 4 1,862 7,220 1,329 1,460.52 13.25 346.28 38
101 7 4 1,641 6,061 1,060 1,244.75 10.70 292.30 36
110 6 1 83 492 8,720 367.05 70.33 78.89 58
110 6 1 84 409 8,720 358.19 70.47 74.85 58
110 7 4 1,693 6,133 927 1,257.55 9.47 295.34 40
110 7 4 1,627 6,206 1,224 1,254.70 11.85 296.95 42
111 6 4 708 3,477 8,718 1,103.52 71.18 255.99 56
111 6 4 678 3,482 8,718 1,100.18 70.92_ 254.93 56
111 7 1 241 2,618 8,601 620.79 71.15 147.87 56
111 7 1 259 2,468 8,432 593.80 69.73 141.51 54

Note: Program definitions are given in Figure 9.

 

001
001
001
001

010
010
010
010

011
011
011
011

100
100
100
100

101
101
101
101

110
110
110
110

111
111
111
111

~J~IO\O\ QQmCh #4010) 1H3
.=£=a.a 4.9;“: d-aSLc cue-Ad mam

QQGG

ﬁk—I—I d-‘kk

\lsJO‘O‘ QQO‘O‘ ~1qu QQO‘G
CC-I—I

4.19:4:

Table 11.

CYCLE PAGE

CPU
HEEL 2295
0.20 0.038
.20 .037
.04 .115
.05 .109
.08 .083
.08 .083
.15 .047
.15 .048
.08 .086
.08 .090
.13 .053
.14 .050
.21 .036
.22 .036
.05 .110
.04 .117
.06 .096
.05 .101
.09 .062
.09 .065
.21 .036
.21 .036
.01 .171
.01 .175
.19 .040
.20 .039
.01 .178
.01 .169
.07 .090
.06 .090
.12 .055
0.12 0.054

O O I
admh)
o o o 6

£80.;
03300
d-IQU') 010010

“-1de

o 0 m
0000....
o o o o

14.10130 ONUIOCD

ww-A—s
cum-1.:
0

ANN:

m o o
@Us’t‘h

SC-I-i
40de NOADONQ \DCDOU'I \dew

0 O
10de
4:423:01 ww—I-I
O...

O‘O‘NN U1U'1C:

l O 0 0 O O O
ONO.)
kt—I—I ca---:
0000 0000

sauna 001—5.4
mmmm MM‘O‘

kcww
mono

NNNN

I/O
99!!

25.49
26.37
4.69
5.60

10.46
10.35
17.96
17.78

10.27

9.90
15.77
16.84

26.37
26.76
5.37
4.65

6.89
6.00
11.43
10.38

25.87
26.39
0.91
0.85

23.76
24.35
0.74
0.98

7.90
7.93
13.86
14.20

CHNL

QTIL

0.021
0.021
0.015

0.014.

0.012
0.012
0.022
0.022

0.011
0.011
0.020
0.020

0.023
0.023
0.015
0.015

0.016
0.016
0.020
0.019

0.024
0.024
0.016
0.016

0.021
0.021
0.016
0.017

0.015
0.015
0.021
0.022

Experimental Runs - Calculated

Note: Program definitions are given in Figure 9.

OVER HRKG
9999 999
0.20 50
.20 52
.24 42
.24 42
.21 52
.22 50
.23 54
.24 48
.22 54
.21 46
.24 48
.20 48
.21 52
.24 46
.24 44
.24 56
.24 54
.24 62
.21 58
.21 60
.24 48
.24 48
.22 60
.21 60
.24 27
.24 27
.23 32
.23 32
.24 60
0.24 60

135

The predictions and these results were then merged and
made inputs to an analysis of variance using Finn's HULTI-
VARIANCE [22]. The criterion variables were transformed
into a relative error by dividing the differences between

the predictions and the outcomes by the outcomes.

Results

13..

999.1..819 f V 999.99

 

Hypgghg§_§ 1:6. Examination of the Hultivariate Analysis of
Variance results (Table 12) indicates significance for all
10 interactions. This prevents tests for main effects
because the main effects are gggfgggggg with the interac-
tions. Based on the analysis of variance it could well be
that there is not a fixed effect but this cannot be shown
using these tests. This effectly bars statements about the
fixed effects of the five factors other than to say that
there is apparently significant errors in the predictive
model. Thus these techniques are unable to help further in

analyzing the model. Univariate techniques will now be

explored for some partial answers.

 

 

Table 12.

pg

136

ANOVA Table

LEAST SQUARES ESTIMATES OF

.3236 -.0636

.9023

of Prediction Errors

Q

B

.99

EFFECTS-GRAND MEAN

.8109“ -02218 .3

149

-.0369

STD ERRORS OF LEAST SQUARE ESTIMATES - GRAND MEAN

.0181 .0044 .0436
.1‘_E__P;_I E EGR
GRAND HEAN
PAGE INDEX

CPU VARIANCE

I/O VARIANCE

MPG LEVEL

PRIORITY

PGNDX

PGNDX

PGNDX

PGNDX

CPVAR

CPVAR

CPVAR

IOVAR

IOVAR

MPG X

X

X

X

CPVAR

IOVAR

MPG

PRIORITY

IOVAR

MPG

PRIORITY

MPG

PRIORITY

PRIORITY

.0486

.0

020 .0

F(7,10)= 2,268.646

F(7,10)=
F(7,10)=
F(7,10)=

154.835
4.626
128.399

F(7,10)= 3,296.047

F(7,10)= 6,596.086

F(7,10)=
F(7,10)=
F (7,10) =
F (7,10) =
F(7,10)=
F(7,10)=
F(7,10)=
F(7,10)=
F(7,10)=
F(7,10)=

32.045
17.399
41.494
56.054
23.016
27.486
6.103
32.918
36.309
376.800

372

.0091

P<.0001
P<.0001
P<.0150
P<.0001
P<.0001
P<.0001
P<.0001
P<.0001
P<.0001
P<.0001
P<.0001
P<.0001
P<.0057
P<.0001
P<.0001

P<.0001

 

137

9999 99. 9 99999999 99.91999

Hypgthe s 1313. To help assess the effectiveness of the
model, confidence interval analysis will now be employed.
The null hypotheses will be supported if the confidence
intervals calculated from estimates of the of the grand
mean of the relative error lies in the interval [-.15,+.15].
The confidence intervals are of the form u1t*s/n**.5, where

u is the estimate of the mean, and t is the value with 16

 

degrees of freedom at the .025 probability level (2.12), n
is the number of observations (32), and s is the standard
error of the estimate [22]. The confidence intervals for
the dependent variables are as follows:
PROGRAH CPU UTILIZATION (CP): [0.282928,0.364252]
PROGRAM CYCLE TIHE (ET): [-.073485,-.053715]
TOTAL PAGING RATE (PR): [0.740463,0.958317]
PAGING INPUT RATE (PI): [0.804467,1.000070]
CHANNEL UTILIZATION (RCU): [-.226226,-.217274]
CPU OVERHEAD (CHT): [0.231618,0.398242]
HORKING SET (w): [-.057384,0.016416]
From this it can observed that hypotheses 8 and 13 are
supported but hypotheses 7, 9, 10, 11, and 12 do not receive
support. Thus from a statistical point of view, the model

does not predict the performance variables of interest

within the desired error margin of 15%.

 

A possible source of error might have been the fact
that the CPU submodel does not expligitly account for
variance in I/O service time. The I/O service time depends
on a random pattern of accesses by device, channel conten-
tion, rotational position sensing, and even device errors
and retry. Hinor sources of variation in CPU service times
may be attributable to error routines and exception rout—

ines.

A potential source of error in the CPU submodel is the
fact that the model does not address the portion of CPU
overhead which is not igtggpuptiblg, nor does it handle the
portion of I/O processing which g_p interrupt higher priori-

ty programs.

A major source of error is the scaling effect due to
differences in small values of CPU utilization. For exam-
ple, the difference between a predicted CPU utilization of
'0.02 and an observed utilization of 0.01 is of little
practical importance but the relative error of 100% serious-

1y biases the CPU utilization statistic.

The major source of error in the model is the subsys-
tem which deals with paging. Host previous authors have
either considered the paging rate to be fixed or ignored it
altogether. Some authors have assumed the rate of paging
operations to be proportional to the instruction execution
rate for the task itself (this approach ignores data refer-

ences). Others predict paging rates but only for certain

 

"safe" values (where the paging "function" is more or less

linear).

The choice of two factors, critical storage size and
page index to represent a program's paging behavior is an
oversimplification. A program's address space may consist
of many distinct types of data, each of which may have a
different reference pattern. For example, the executable
instructions of a program may be composed of loops, jumps,
or straight-line code and the instructions may or may not
modify themselves. The data portion may be scanned sequen-
tially or searched using a binary or indexed technique.

Data areas may be separate from instructions or the data may
be interspersed among the instructions, a situation referred

to as 99999991 99‘999999999-

The use of a program's full size as a parameter of
page demand is somewhat unrealistic in view of the fact that
a certain portion of many programs are composed of error

routines, initialization and termination routines which

140
execute an insignificant amount of time.

The proposed model includes some of the methods of
previous authors, and therefore includes some of the shortv
comings. In particular the paging coefficients estimated
for the page-in and page-out operations may be more repre-
sentative of the characteristics of the programs run on the
system than they are of the operating system and memory size
of the system. If this is indeed the case, the error in
predicting the performance of a particular set of programs
will vary according to their deviation from the "typical"

programs in an installation.

The assumption that CPU overhead processing is carried
out in non-pageable memory is contrary to the known opera-
tion of the Operating system used in these experiments.

Some portions of the operating system used in this research
compete with the user programs for the available real

storage pages.

A final source of error in the paging submodel is that
the paging indices were estimated at a multiprogramming
level of N=4. From previous comments about the lack of a
real memory constraint at N=4, it is likely that more stable
estimates for this parameter could be obtained at higher

multiprogramming levels.

141

gggtvExperimentg; Paging Agglysig

-9999999 999999 99940de1

 

The results of the program runs with 4, 5, 6, and 7
copies of the control program were fitted to a model of the
form w(j) = Q*CP(j)**1/k. This model was found to account
for 6%, 69%, 99%, and 97% of the variation in w(j) for N =
4, 5, 6, and 7 respectively. The constant of proportionali-
ty (Q) was found to fit the expression R/sum[CP(l):l=1,..,N]
very closely. This suggests a general expression for

working set size of of the form

"(3)=Q*(3(j)*CP(3))**1/ko (“-5-1)

where Q = R/sum[(A(l)*CP(l))**1/k:l=1,..,N]. Working
backward from this expression, the following expressions are

suggested for the real-time paging rate:

PI(j)=CP(j)*A(j)*K*EXP(N)/(k-1)*w(j)**(k-1), (“-5-2)

EXP(N)=(sum[H(l):1=1,..,N]/R)**q.

The value of g was estimated to be 7.14 and the value
of k was estimated to be approximately 4. Subsequent runs
of the predictive model with the revised paging submodel and
new estimates of A for N=7 yielded the following mean

errors:

142

SE 99 £11 2.1. 99.9 9.112 9'.

0.285 0.006 0.343 0.298 0.179 0.068 0.010

This significantly reduces the errors in paging and the size
of the relative errors in CPU utilization (CP) and channel
utilization (RCU) is believed to be due to the scaling

effect mentioned earlier.

9292..ar9929 9....ith 9999999 9..od999

Fitting the control program runs to the Belady-Kuehner
"lifetime function“ [6] accounts for 47S, 47S, 98%, and 96%
of the variation in paging rates. Likewise the paging
expansion factor EXP(N) was found to account for 96% of the
variation in the system paging rate. The large exponent
value (7.14) was unexpected in view of quadratic and linear
segments in Bard's Paging Index [3]. Chamberlain, Fuller,
and Liu's Half-life function [12] was found to account for
49%, 39%, 99%, and 96% of the variation in paging rates when

fitted to the data from the control program runs.

V. CONCLUSIONS AND RECOMMENDATIONS

Research gonglu519g§

The knowledge gained by study of the model discussed
herein is a step toward better predictive methods for
computer performance, I/O configuration performance and

program performance.

The system overhead due to handling I/O requests was
found to be proportional to the I/O rates and there was a
small but significant positive effect due to program priori-
ty. There was a small but significant negative effect due
to multiprogramming level. The paging overhead was found be
proportional to paging rates but the small effects due to
priority and multiprogramming levels were not statistically

significant.

The increase in the ratio of page-writes to page-reads
was not significant but there was a significant increase

with increasing priority.

The method of sucessive estimation and the convergence
techniques used in this model were found to be very effec-
tive. Convergence was usually obtained in less than 6

iterations. Improvements in the algorithm made after the

143

 

 

144

experiments had been performed reduced the number of itera-
tions to the range of 4 to 5 iterations for about half of

the model executions.

While the original goals of this research have not
been achieved, positive benefits have been gained from this
effort. For one thing the least elaborate part of the
entire model, the paging submodel, has turned out to be the
source of much of the error. Part of the reason why this
part of the model is so simple is because this area is not
as well understood as the operation of the CPU and I/O
devices. Failure of the model to accurately represent
paging behavior has a more serious effect on some of the
criterion variables than others. For example, the criterion
variables that seem to have the largest errors are paging
counts (as expected) but also CPU overhead. This is not so
hard to understand in view of the demonstrated effect of

paging rates on CPU overhead.

The techniques employed with the variable paging
manipulation of the synthetic programs can be used as a tool
in controlling loads on other systems during performance
measurement, systems tuning and benchmarks. The CPU and I/O
submodels are well elaborated and it is possible that they
can be of benefit to others who will study the kinds of
systems studied in this research. The reasons why these
models are so elaborate is that suitable models for this

research were not generally available and the insights

 

145

gained from the model building process stimulated further

elaboration.

Failure to accurately predict paging behavior does not
prevent the model in this thesis from explaining many other
events such as CPU and I/O service intervals, initial wait
and completion times since paging rates are a matter of
measurement in existing systems. It is possible that the

use of the revised paging submodel can make this method

 

worthwhile for practical applications. The inherent errors
in this submodel at unconstrained memory levels is not too
serious a defect since working set estimates are usually
unimportant at unconstrained memory levels and the paging
rates in these cases is usually neglible. Another benefit
of this effort is that it clearly points out the direction

that further research in this area ought to take.

gggommendatiggg g_£ Fgrthg; Research

 

Clearly the direction for future research in this area is
the development of better analytical models of paging
behavior. A likely candidate for this research would be
some variation of Belady and Kuehner's lifetime function
approximation [6]. Another possibility is to validate the
analytic paging model based on the relationship between the
CPU utilization and working set size. Post-experimental
analysis of the data obtained in this research revealed

strong evidence for a relationship of the form w=Q*CP**1/k.

 

146

This could become the basis of more research into paging in

priority interrupt driven systems.

The page-write/page-read ratios were not central to
this research, however questions posed here suggests further
study in this area. There is no reason why the data for
such a study cannot be drawn from the job accounting data
collected from production systems rather than from experi-

ments.

Methods of analysis other than factorial analysis of
variance is recommended since there is some indication that
interactions will reduce the usefulness of this kind of
analysis. One way to overcome this problem is to perform
enough observations so that independent one-way analyses can
be performed. A larger number of sample points can be
obtained with fewer actual program runs by including 2;;

programs from each run in the analysis.

The size of the relative errors in predicting the
cycle time (ET) and the system overhead (OHT) suggests that
the large relative errors in the program CPU utilization
(CP) and the channel utilization (RCU) are scaling effects.
It is therefore recommended that these variables should be
scaled so that the contribution to overall error of differ-

ences in small values is proportional to their importance.

It is also recommended that smaller compute kernals be

used for synthetic programs. This would allow a greater

147

range and control of the paging rates and the CPU variance.
For example compute kernals of about 20% the size of the
ones used in this research would have been more appropriate

and would have allowed a greater range of CPU variance.

The needs expressed in the beginning of this thesis
will continue to inspire the investigation necessary to the
accomplishment of better ways to perform capacity manage-
ment, resource scheduling, and data processing system

tuning.

 

GLOSSARY OF TERMS

g_g§§§ 25g: The mechanism which is used to physically

position the read/write heads of disk-type devices for
I/O operations.

gpplicaglgg pgggra : A program which serves an end-use
function in a computing system rather than a utility
function. Examples are: payroll, inventory, etc.

bggghgggk: Execution of a prescribed set of programs on
one or more different computer systems for the purpose
of performance measurement or comparison.

blgggg pggg allocatlon: A method of partitioning the
pageable real memory of a virtual computer system
which arbitrarily favors certain programs over others
with respect to minimization of the system paging
rate.

bgffering: A method of I/O data management which reads or
writes data from two or more areas of real memory,
allowing CPU and I/O operations of the same program to
be overlapped.

ggggggl servgg: The node of a queuing network which
represents the CPU in some program models. The other
nodes of the network represent the I/O devices.

ghgnnel: The data path between an I/O device or control
unit and the main memory of a computer system.

 

instant a program accesses the CPU until the program
generates an I/O or paging operation.

gggpute kggggl: A collection of machine (computer)
instructions which is artificially constructed to
represent a "typical" program or to serve as a basis
for comparisons in benchmarks.

92g 99399: A situation which prevails when the CPU is the
limiting resource in a computer system. In this
thesis a program is considered to be CPU bound if its
CPU utilization exceeds 50% when no other programs are
active on the system.

148

 

149

gggflgggatlgg: A particular combination of memory,
processors, I/O devices and channels in a computer
system.

gyllgggg: The collection of all records which can be read
or written with one physical positioning of a disk
access mechanism (arm).

glgggl gggggg gevlce: An I/O device with the ability to
read or write data at any location on the recording
medium without regard to sequence. Examples of direct
access devices are disks and drums.

glgggg gggggg gethod: The method of data access to disks
or drums which takes advantage of the direct access
characteristics of the device. This usually implies
that the data is organized for direct access and the
physical address of the data is generated by the
program based on a randomizing algorithm.

glgk §ghgduling: The queue management and service discip-
line for disk I/O requests. Examples of disk schedul-
ing strategies are First-Come-First-Served (FCES),

Priority, and SCAN.

glgk: A type of direct access device which uses iron
oxide coated platters or "disks" as a recording
medium. Data is recorded in concentric circles about
the axis of a shaft which rotates the surfaces under
read/write heads. A number of disk surfaces may be

mounted on a single shaft.

glgpgtchigg: The process of storing the status (or state)
of the program which has control of the CPU and
restoring the status of a suspended program. The
process of switching control of the CPU from one
program to another.

_l§§: Execute-Channel-Program. A request by an executing
program to the Operating System to commence a specific
I/O operation.

FC §: A queue service policy, First-Come-First-Served
(see FIFO).
9.1..-

: A queue service policy, First-In-First-Out (see
5

The mean program execution interval between page

 

150

lLQ pagag: A situation which prevails when the I/O
devices are the limiting resources in a computer
system. A program is considered to be I/O bound if it
spends greater than 75% of its elapsed execution time
waiting for I/O when it is the only program active on
a computer system.

laaaagg aggagg: A method of disk access which requires
that the location of a particular disk record be
retrieved from an index before the data record itself
can be read. The index is a disk file but it may be
made core resident during program execution.

laltlal £2953 The elapsed time from the completion of a
program's I/O until the program has access to the CPU.
This implies that another program is using the CPU
when the program in question becomes ready to run.

lglgggapt: An event which triggers a change in the
program which is in control of the CPU. An interrupt
may be voluntary, as in a request for I/O, or it may
be involuntary, as is the case for a page exception or
the completion of another program's I/O. In the
latter case control will be ultimately given to the
highest priority program that is ready to run.

l§: A queue service discipline for a server with infinite
capacity, or an Infinite Server. This differs from
the PS server in that the arrival of new requests does
not diminish the rate at which previous arrivals are
served.

and recording I/O counts, paging counts, storage
allocations and elapsed time and cumulative CPU time
for the processes or programs in a computer system.

I"

._§§: Last-Come-First-Served queue service discipline.
The most recently arrived request is served immediate-
ly and the earliest arriving requests that are still
unsatisfied are not served until all later arrivals

are served.

lifetiaa fgactiga: A relation between the amount of real
memory allocated to a process (program) in a virtual
system and the process time between page exceptions.

RU: A paging algorithm which attempts to select the
Least-Recently-Used page frame to satisfy a paging
exception. A stack procedure which is used to imple-

ment the LRU algorithm.

 

151

aaahgg a; falaa pgalplga: A method for accelerating
convergence of a sequence by approximating the deriva-
tive in Newton's Hethod with a divided difference of
previously computed sequence values.

agﬁgla: The collection of disk surfaces which rotate
about a common shaft in a disk I/O device. It is also
referred to as a spindle.

mu l_lp_ggramming: The process of concurrent execution of
tw wo or more programs in a single computer system.

mult £pgggaaming gap: The collection of all the programs
or tasks that are concurrently executing in a single
computer system during the interval under considera-
tion.

929229929299 agagay: The portion of high speed memory
which is fixed and not allowed to be paged in a
virtual system. Generally this portion of memory is
used by the operating system for critical system tasks
such as I/O or paging management.

gapancy: The probability of executing the instructions

in a particular page of a program's address space.

gyggheag: The portion of elapsed time during which the
operating system is performing functions on behalf of
problem (user) programs. This time is not attributed
to any specific program by job accounting.

pagg aagapglga: An interrupt which occurs whenever
reference is made to a page of virtual memory which is
not in real memory, is. page fault.

pagg faglt: A page exception.

paga: The smallest unit of virtual memory that is managed
by the paging mechanism of the virtual system. In
this thesis a page is fixed and either 2048 or 4096
bytes.

paglag: The process of selecting a page frame of real
memory, possibly writing its contents onto auxiliary
storage (disk or drum), and reading the missing
virtual page into the selected page frame in virtual
systems.

paglpheral: An I/O unit or piece of data processing
equipment which is locally attached to the central
processor (CPU) but is not a part of it.

152

pglgglly aghgggllag: The operating system policy of
giving control to the program which has the greatest
importance (possibly on an arbitrary scale) and is
ready to run.

pglgglay;£a§aag garvice: The queue service discipline
which services the highest priority arrival in the
queue and resumes service to any interrupted arrival
only after all higher priority arrivals have completed
service.

pggblaa 599553 The state of the CPU when user program
instructions are being executed. In this state,
certain CPU instructions which may compromise the
integrity of the entire system are prohibited. This
is in contradiction to system state, during which any
instruction may be executed.

pgggggnggra: The form of the solution for network
queuing problems which expresses the probabilities for
network and subnetwork states as the product of the
probibilities for the individual nodes of the
network. Such a problem is said to be separable.

g§: A queue service discipline for a server which has
Processor Sharing. In this case the server can
service all arrivals simultaneously, but with the
individual service rates identical and inversely
proportional to the number of arrivals being served.
This construct is the limiting case for round-robin
scheduling.

gaag_a aggaag: See direct access method.

aagponse: In the case of terminals, usually the elapsed
time from data input until the answer or acknowledge-
ment is received at the terminal. In the case of
program I/O, response time is the time from them
request to begin an I/O operation until the operation
is completed.

_§§: Rotational Position Sensing. The technology which
allows some disk or drum I/O units to disconnect from
the channel during rotational delay and to reconnect
to the channel when the desired records come under the
read/write heads. This greatly increases the data

carrying capacity of the channel.

§lT_: An I/O scheduling method for disks and drums which
services a queue of I/O requests so that requests with
the shortest access time (from the current angular
displacement of the recording medium or the current
sector) are serviced first,

Shortest-Access-Time-First.

153

§§_N: An I/O scheduling method for disks which moves the
access mechanism back and forth over the disk surfac-
es, stopping at cylinders which have outstanding
requests to be serviced.

agaedullag: In this thesis, scheduling is understood to
mean the arrangement and sequencing of programs or
jobs for production as performed by a person or a
program which is not a part of the operating system.
In other uses of the word "scheduling", the meaning is
made clear by the presence of another word such as
"disk", "I/O", etc.

seek: A physical movement of a disk access mechanism or
arm from one position or cylinder location to another.

§aal;ga;kg§§ paocess: A stochastic process characterized
by the fact that the probability distribution of the
time between changes in system states depends only on
the initial and terminal states and is otherwise
independent of the previous history of the system.

aapagable: A problem which can be formulated in such a
way that the solution is expressible as a product of
probabilities (see product form).

gaaggg aagggy: Portions of virtual memory having subrout-
ines which are executed by two or more independent
programs. Such subroutines may not modify instruc-
tions or data areas in the shared memory, and are said
to be gazaaarant.

spindla: See module or disk.

§pggllag: A method of increasing overall computer system
efficiency by freeing programs from direct interaction
with low speed I/O devices. The input is read from
data input devices by a "reader" program, blocked and
written to disk. When the actual processing program
is executed, its card and print output is blocked and
written to disk. Actual punching and printing are
accomplished asynchronously by "writer" programs.

_§l§: A disk scheduling method which services I/O
requests in the order of the shortest seek times from
the current location of the access arm,
Shortest-Seek-Time-First.

154

51555! 52595: The state of the CPU when operating system
instructions are being executed. In this state, all
instructions are allowed including special system
(privileged) instructions. Host of the time in this
state is not attributed to any specific program and
may be thought of as overhead.

alag;sharlag: The process of giving computer access to
terminal users so that the user can interact with the
computer with little regard to the absence or presence
of other users. Time-sharing systems generally use
the round-robin or time-slicing scheduling discipline.
The service discipline rotates access to the CPU among
the members of the queue in discrete increments of
time.

tgaaaactiga: A logical exchange of data between a custom-
er or user and the data processing system. Examples
of transactions are payments, orders, requests,

requisitions and inquiries.

algagal systaa: A computer system (including hardware and
software) which is capable of executing programs of
any size without regard to the size of the computer's
real memory.

aggklag gap: The collection of all pages of a program (or
system) which has been referenced during some arbi-
trary interval.

 

APPENDIX A

Table A1.

lW'U

GwN—b

thaw»...

O‘U’ISUJN.‘

PAGES
OUT

APPENDIX A

Non-paging Runs Repl I - Heasured

PAGES OTHER
991

929
11,961

12,002
6,334

11,900
5,571
2,811

11,712
6,718
3,736
2,151

11,662
6,507
3,820
2,332

985

11,448
6,673
3,942
2,469
1,131

528

156

9999

301.01
301.01

342.50
342.50
342.50

357.18
357.18
355.12
355.12

431.66
431.66
431.66
429.59
429.59

437.97
437.97
437.97
437.97
431.77
431.77

452.48
452.48
452.48
452.48
452.48
435.96
435.96

ELAPSED PROCESS

9999

97.87
34.07

96.98
117.44
53.16

95.62
102.71
52.45
59.24

93.67
123.99
68.68
40.35
73.81

92.54
119.86
70.04
43.94
18.97
76.10

90.86
122.83
72.24
46.40
21.70
9.97
77.49

URKG
999

58
1136

60
56
1136

60
58
54
1136

62
56
56
56
113

60
58
58
58
56
1136

62
58
58
60
56
56
1136

 

157

Table A1(Cont.).

1511'!)

qmmcww...

(DQONU'ISWN...

PAGES PAGES OTHER ELAPSED PROCESS NRKG
999 99 929 9999 9999 999
. . . . . . 10,600 429.87 83.78 62
. . . . . . 6,343 429.87 116.74 60
. . . . . . 3,766 431.93 68.97 56
. . . . . . 2,370 431.93 44.36 58
. . . . . . 1,149 431.93 21.88 58
. . . . . . 547 431.93 10.11 58
. . . . . . 177 411.27 3.90 56

411.27 73.09 1136
. . . . . . 10,284 419.55 81.35 66
. . . . . . 6,115 419.55 112.55 58
. . . . . . 3,648 421.62 67.05 56
. . . . . . 2,317 421.62 43.35 58
. . . . . . 1,121 421.62 21.36 58
. . . . . . 557 421.62 10.43 56
. . . . . . 216 421.62 4.70 58
. . . . . . 75 351.39 1.57 54

351.39 62.32 1138

 

 

29.9
01101
OVRHED

01101
02011
OVRHED

01101
02011
03001
OVRHED

01101
02011
03001
04000
OVRHED

01101
02011
03001
04000
05110
OVRHED

01101
02011
03001
04000
05110
06101
OVRHED

Table

CPU
9999

0.33
.11

.28
.34
.16

.27
.29
.15
.17

.22
.29
.16
.09
.17

.21
.27
.16
.10
.04
.18

.20
.27
.16
.10
.05
.02
.18

A2.

CYCLE

9999

0.025

.029
.054

.030
.064
.126

.037
.064
.116
.200

.038
.067
.115
.188
.438

.040
.068
.115
.183
.400
.824

TI/O
RATE

35.05
18.50

33.32
15.60
7.92

27.14
15.57
8.66
5.01

26.63
14.86
8.72
5.33
2.28

25.30
14.75
8.71
5.46
2.50
1.21

158

CPU CPU

3929 9999999
122.22 0.00818
123.76 .00808
53.94 .01854
124.46 .00804
54.25 .01843
53.61 .01865
125.05 .00800
54.19 .01845
54.42 .01838
53.33 .01875
126.03 .00794
54.30 .01842
54.56 .01833
53.09 .01884
51.97 .01924
126.00 .00794
54.33 .01841
54.58 .01832
53.23 .01879
52.17 .01917
53.05 .01885

PAGEIN
9999

Non-paging Runs Repl I - Calculated

I/O
9999

39.74

35.05
18.50

33.32
15.60
7.92

27.14
15.57
8.66
5.01

26.63
14.86
8.72
5.33
2.28

25.30
14.75
8.71
5.46
2.50
1.21

CYCLE
EACTQR

1.000

0.997
0.992

0.996
0.992
0.987

0.995
0.992
0.989
0.985

0.995
0.992
0.989
0.986
0.983

0.995
0.992
0.989
0.986
0.983
0.982

 

OVRHED

01101
02011
03001
04000
05110
06101
07100
08000

OVRHED

CPU
9.2;;

.20
.27
.16
.10
.05
.02
.01
.18

.19
.27
.16
.10
.05
.03
.01
0.00
.18

CYCLE

El!!!

.041
.068
.115
.182
.376
.788
2.311

.041
.069
.116
.182
.376
.756
1.943
4.624

159

Table A2(Cont.).

TI/O
5.5.2.3.

24.66
14.76
8.72
5.49
2.66
1.27
0.43

24.52
14.58
8.66
5.50
2.66
1.32
0.52
0.22

CPU CPU

3521:; 52an
126.53 .00790
54.34 .01840
54.62 .01831
53.44 .01871
52.57 .01902
54.22 .01844
45.63 .02192
126.43 .00791
54.34 .01840
54.43 .01837
53.47 .01870
52.53 .01904
53.48 .01870
46.20 .02165

48.44 0.02065

PAGEIN

139.23

I/O CYCLE

BEE

24.66
14.76
8.72
5.49
2.66
1.27
0.43

24.52
14.58
8.66
5.50
2.66
1.32
0.52
0.22

245.193

0.995
0.992
0.989
0.987
0.985
0.983
0.982

0.995
0.992
0.989
0.986
0.984
0.982
0.981
0.981

 

Table A3.

(113de SCUM—I (”N-d N-b .a

O‘UlﬁuJN-L

gonnawm-n

0340\mﬁwN—l

PAGES
922

PAGES
l!

160

OTHER
£49

11,977

11,982
6,388

11,965
5,606
2,856

11,838
6,810
3,736
2,151

11,724
6,571
3,858
2,391
1,042

11,582
6,751
4,075
2,523
1,183

544

11,159
6,607
3,948
2,441
1,169

585
198

10,327
6,182
3,785
2,380
1,161

596
228
81

ELAPSED

2w:

301.03
301.03
342.51
344.58
344.51
359.20
361.27
357.14
357.14
431.71
431.71
433.77
431.71
431.79
444.15
444.15
444.15
444.15
435.89
435.89
460.81
460.81
460.81
460.81
460.81
448.41
448.41
450.57
450.57
450.57
450.57
450.57
450.57
429.92
429.92
427.90
427.90
427.90
427.90
427.90
427.90
425.83
354.72
354.72

Non-paging Runs Repl II - Heasured

PROCESS

2132

97.70
33.94
96.78
117.73
53.45
96.29
103.00
52.76
59.86
94.14
124.76
68.35
40.10
74.73
93.30
120.78
70.55
44.88
19.99
77.16
91.96
123.82
74.41
47.29
22.64
10.13
79.96
88.46
121.88
72.14
45.90

22.16.

10.86
4.21
76.83
82.02
113.20
69.14
44.21
21.99
11.09
4.92
1.59
63.14

 

BBQE

01101
OVRHED
01101
02011
OVRHED
01101
02011
03001
OVRHED
01101
02011
03001
04000
OVRHED
01101
02011
03001
04000
05110
OVRHED
01101
02011
03001
04000
05110
06101
OVRHED
01101
02011
03001
04000
05110
06101
07100
OVRHED
01101
02011
03001
04000
05110
06101
07100
08000
OVRHED

Table A4.
CPU CYCLE
92;; an;

0.33 0.025
.11
.28 .029
.34 .054
.16
.27 .030
.29 .064
.15 .125
.17
.22 .036
.29 .063
.16 .116
.09 .201
.17
.21 .038
.27 .068
.16 .115
.10 .186
.05 .418
.18
.20 .040
.27 .068
.16 .113
.10 .183
.05 .389
.02 .823
.18
.20 .040
.27 .068
.16 .114
.10 .185
.05 .385
.02 .769
.01 2.160
.18
.19 .041
.27 .069
.16 .113
.10 .180
.05 .368
.03 .717
.01 1.860

0.00 4.326

TI/O
3.42.1;

39.79

34.99
18.54

33.31
15.52
8.00

27.42
15.78
8.62
4.99

26.40
14.80
8.69
5.39
2.39

25.14
14.65
8.85
5.48
2.57
1.22

24.77
14.67
8.77
5.42
2.60
1.30
0.46

24.14
14.45
8.85
5.56
2.72
1.40
0.54
0.23

161

Non-paging Buns

CPU
3.9.23;

122.60

123.81
54.27

124.27
54.44
54.15

125.76
54.59
54.67
53.66

125.68
54.41
54.70
53.30
52.18

125.96
54.53
54.78
53.35
52.31
53.80

126.16
54.21
54.74
53.20
52.80
53.97
47.25

125.92
54.62
54.76
53.85
52.83
53.83
46.55
51.74

CPU

QLANIEE

0.00816

.00808
.01843

.00805
.01837
.01847

.00795
.01832
.01829
.01864

.00796
.01838
.01828
.01876
.01916

.00794
.01834
.01826
.01874
.01912
.01859

.00793
.01845
.01827
.01880
.01894
.01853
.02117

.00794
.01831
.01826
.01857
.01893
.01858
.02149

0.01933

PAGEIN
£223

Repl II - Calculated

I/O
RAT§

39.79

34.99
18.54

33.31
15.52
8.00

27.42
15.78
8.62
4.99

26.40
14.80
8.69
5.39
2.39

25.14
14.65
8.85
5.48
2.57
1.22

24.77
14.67
8.77
5.42
2.60
1.30
0.46

24.14
14.45
8.85
5.56
2.72
1.40
0.54
0.23

CYCLE
FACTOR

1.000

0.997
0.992

0.996
0.992
0.987

0.995
0.992
0.988
0.985

0.995
0.992
0.989
0.986
0.983

0.995
0.992
0.989
0.986
0.983
0.982

0.995
0.992
0.989
0.986
0.983
0.982
0.981

0.995
0.992
0.989
0.986
0.984
0.982
0.981
0.981

 

Table A5.

worn

4:de

OU'ISUJN—I mch-a

~301U14=WN4

mummawwd

PAGES
992

41
14
2
2
10

95
39
28
34
73
26

150
132

66
141
448
569

97

197
253
347
458
826
732
590
237

1,392
2,187

3,225
2,930
2,691
3,206

922

162

Paging Runs Repl I - Measured

PAGES
~H

160
68
3

(4
31

296
120
34
13
111
93

476
336
159
163
491
1,090
341

1,261
987
949
954

2,111

1,751

1,390
648

10,734
8,671

8,202
7,501
6,921
6,618
3,136

OTHER
29

12,002
7,015
3,909
2,331

12,002
6,830
2,593
1,081

12,002
6,983
4,348
2,763
1,149

261

12,002
7,288
4,062
2,581

429
178
100

11,921
10,564

525
460
367
866

ELAPSED PROCESS

PIPE.

441.10
445.16
447.19
447.19
441.10

461.23
467.54
467.54
467.54
467.54
461.23

500.05
508.69
517.17
517.17
517.17
517.17
500.05

597.96
609.39
584.91
611.67
611.67
611.67
611.67
584.91

2,183.56
2,183.56

2,185.5

2,185.59
2,185.59
2,185.59
2,183.56

22.14;:

95.26
129.29
71.72
43.69
72.11

95.29
125.87
75.22
48.51
20.71
78.74

95.63
128.51
79.51
51.76
21.97
5.32
94.21

96.44
135.87
75.56
49.02
9.05
3.67
2.38
127.99

99.97
201.39

12.55
10.64
8.97
19.84
506.51

HRKG
£31

110
54
42
52

120

66
50
44
46
60
62

54
46
48
46
48
36
34

58
46
46
44
40
26
22
34

50
44

26
24
22

32
32

01101
02011
03001
04000
OVRHED

01101
02011
03001
04000
05110
OVRHED

01101
02011
03001
04000
05110
06101
OVRHED

01101
02011
03001
04000
05110
06101
07100
OVRHED

01101
02011
03001
04000
05110
06101
07100
08000
OVRHED

Table A6.
CPU CYCLE
9.1;; 1.1.9};

0.22 0.036
.29 .063
.16 .114
.10 .191
.16
.21 .038
.27 .067
.16 .113
.10 .179
.04 .392
.17
.19 .040
.25 .069
.15 .115
.10 .177
.04 .315
.01 .383
.19
.16 .045
.22 .074
.13 .117
.08 .173
.02 ..241
.01 .317
.00 .410
.22
.05 .096
.09 .114
.01 .250
.01 .275
.00 .300
.01 .292

163

Paging Buns Repl I - Calculated

TI/O
BATE

27.58
15.91
8.75
5.22

26.67

14.87
8.89
5.58
2.552

24.96
14.39
8.72
5.66
3.17
2.61

22.18
13.58
8.57
5.78
4.15
3.16
2.44

10.38
8.81

4.00
3.64
3.34
3.43

CPU

3.3.2.}; 999329.}!

CPU

126.00 0.00783

54.26
54.52
53.37

125.97
54.27
54.78
53.47
52.25

125.52
54.35
54.70
53.40
52.35
49.26

124.46
53.65
53.77
52.68
47.50
48.80
42.44

119.26
52.46

41.93
43.32
41.01
43.69

.01825
.01833
.01871

.00775
.01811
.01811
.01861
.01736

.00766
.01756
.01764
.01769
.01339
.00393

.00727
.01642
.01508
.01386
.00356
.00190
.00160

.00441
.01047

.00144
.00134
.00123

0.00265

PAGEIN I/o CYCLE
gggg FACTOR

BATE.

0.4

.2

0
0
1

6
3
1
0
2
2

0
7
3
3
9

2.1

:4: ANNwd-I-IN

.300wa

noon-no 0000.... q
1:016:00 dmeIGGO‘d

\I

040

27.21
15.76
8.74
5.22

26.02
14.61
8.81
5.55
2.31

24.00
13.73
8.41
5.35
2.22
0.51

20.07
11.96
6.95
4.22
0.70
0.29
0.17

4.92
4.84

0.24
0.21
0.17
OCuo

0.995‘
0.992
0.988
0.985
0.994

0.995
0.992
0.988
0.985
0.982
0.993

0.994
0.991
0.988
0.985
0.983
0.981
0.992

0.991
0.989
0.986
0.984
0.982
0.980
0.978
0.989

Table A7.

cum..- Iw'U

01013:de (”CUM-A

QONU'l-FLUNA

-PAGES
OUT

45
16

0
21
18

116
63
24
37
81
36

150
97
31

112

388

531
74

234
298
402
521
951
807
629
255

PAGES
2!

166
85
3

7
52

302
170
34
20
86
95

459
252
86

88
378
1,067
282

1,556
1,212
1,156
1,151
2,336
1,954
1,556

755

164

OTHER
I‘O

12,002
6,919
3,916
2,295

12,002
6,859
4,160
2,610
1,110

12,002
6,953
4,242
2,838
1,173

249

12,002
7,444
4,180
2,770

537
183
91

Elﬁﬁ

441.43
443.54
445.57
445.57
441.43

462.56
468.95
471.04
471.04
471.04
462.56

487.47
502.52
504.88
504.88
504.88
504.88
487.47

644.82
654.00
627.30
656.28
656.28
656.28
656.28

627.297

Paging Runs Repl II - Measured

ELAPSED PROCESS

QIHE

95.46
127.67
71.89
43.00
72.22

95.49
126.63
76.01
48.80
21.22
78.78

95.54
128.55
78.36
53.16
22.46
4.89
89.78

96.51
138.82
78.19
52.67

11.176

3.81
2.30
138.81

WRKG
SP:
100
64
44
54
110

70
52
40
44
62
62

56
50
42
46
48
38
34

56
46
44
44
44
26
20
34

OVRHED

01101
02011
03001
04000
05110
OVRHED

01101
02011
03001
04000
05110
06101
OVRHED

01101
02011
03001
04000
05110
06101
07100
OVRHED

Table A8.
CPU CYCLE
9.2;; PISS
0.22 0.036
.29 .063
.16 .114
.10 .193
.16
.21 .038
.27 .067
.16 .112
.10 .179
.05 .394
.17
.20 .039
.26 .070
.16 .117
.11 .172
.04 .325
.01 .383
.18
.15 .048
.21 .076
.13 .118
.08 .167
.02 .228
.01 .307
.00 .398
0.22

Paging

TI/O
BALE.

27.57
15.79
8.80
5.17

26.60
14.99
8.91
5.59
2.54

25.57
14.34
8.57
5.80
3.07
2.61

21.03
13.24
8.51
5.98
4.38
3.26
2.51

165

CPU

ELIE 92.3.1112!

CPU

125.74 0.00784

54.20
54.48
53.40

125.71
54.17
54.75
53.51
52.37

125.63
54.10
54.15
53.41
52.28
51.17

124.38
53.63
53.47
52.62
48.14
48.29
40.00

.01823
.01834
.01867

.00776
.01801
.01812
.01855
.01772

.00796
.01784
.01810
.01816
.01447
.00371

.00712
.01604
.01465
.01343
.00389
.00178

0.00140

PAGEIN

RATE

0.4
.2
.0
.0
.1

.7
.4
.1
.0
.2
.2

0.9
.5
.2
.2
.7
2.1
.6

dNUU-l-A-LN
I. O O 0 l 0 0
Nkchmmk

Runs Repl II - Calculated

I/O CYCLE
342:3.- ..P 142.192.

27.19
15.60
8.79
5.15

25.95
14.63
8.83
5.54
2.36

24.62
13.84
8.40
5.62
2.33
0.50

18.61
11.38
6.67
4.22
0.82
0.28
0.14

0.995
0.992
0.988
0.985
0.994

0.995
0.992
0.988
0.985
0.982
0.993

0.994
0.991
0.988
0.985
0.983
0.981
0.992

0.991
0.989
0.987
0.984
0.983
0.981
0.979
0.989

"R _

 

23.9.

CNTL
CNTL
CNTL
CNTL
OVRHED

CNTL
CNTL
CNTL
CNTL
CNTL
OVRHED

CNTL
CNTL
CNTL
CNTL
CNTL
CNTL
OVRHED

CNTL
CNTL
CNTL
CNTL
CNTL
CNTL
CNTL
OVRHED

Table A9.

cum-s

GUISWN-B mbN-i

downstate...

PAGES
921

28
23
33
57

70
90
83
143
184

312
312
419
600
685

575
579
793
915
970
901
805

166

PAGES OTHER

;3

143

73
110
275

263
269
210
326
381

1,534
1,502
1,554
1,746
1,854

3,306
2,787
2,682
2,500
2,411
2,192
1,095

.119

12,144
9,123
6,223
3,409

8,101
6,679
5,353
3,904
3,975

3,785
2,712
1,848
1,144

538

3,647
3,161
1,693
711
333
194
156

2;!3

761.46
761.46
762.73
762.73
761.46

655.25
655.25
655.25
649.05
636.48
636.48

393.59
393.59
393.59
393.58
391.53

391.53

609.72
609.72
609.72
609.72
606.47
606.47
584.38
584.38

Page Allocation Runs - Heasured

ELAPSED PROCESS

ELLE

222.39
164.80
113.59

64.75
110.71

142.30
116.44
94.19
70.22
74.19
95.19

68.75
49.12
33.66
21.24
10.87

91.72

68.90
60.90
32.77
14.02
6.82
4.33
3.54
138.50

HRKG
SP:

82
58
64
78

66
52
52
52
54

62
52
52
48
38
26

56
52
48
40
36
26
24

EBQE

CNTL 1
CNTL 2
CNTL 3
CNTL 4
OVRHED

CNTL 1
CNTL 2
CNTL 3
CNTL 4
CNTL 5

D

Table A10.
CPU CYCLE
HEEL TEE;

0.29 0.062
.22 .083
.15 .120
.08 .207
.15
.22 .078
.18 .094
.14 .118
.11 .153
.12 .146
.15
.18 .074
.13 .093
.09 .116
.05 .136
.03 .164
.01 . . .
.23
.11 .088
.10 .103
.05 .139
.02 .190
.01 .221
.01 .254
.01 0.286

0.24

167

Page Allocation Runs - Calculated

TI/O
BATE

16.17
12.08
8.30
4.83

12.77
10.60
8.49
6.52
6.85

13.52
10.71
8.65
7.35
6.11
4.20

11.40
9.76
7.18
5.27
4.52
3.93
3.50

CPU CPU

SAPS QWUANTQE
54.61 0.01801
55.37 .01792
54.78 .01794
52.66 .01757
56.94 .01701
57.37 .01677
56.85 .01693
55.60 .01660
53.59 .01703
55.07 .01292
55.23 .01165
54.93 .00989
53.90 .00735
49.49 .00455
52.95 .00990
51.92 .00964
51.69 .00749
50.77 .00437
48.95 .00249
45.01 .00181
44.40 0.00171

PAGEIN
BATE

0.2
.1
.1
.4

coon-o
OQSOQNO

waawww

UURngU'I
0000000

NGOJkO‘S

I/O
RATE

15.95
11.98
8.16
4.47

12.37
10.20
8.17
6.02
6.25

9.62
6.89
4.70
2.91
1.37
0.31

5.98
5.19
2.78
1.17
0.55
0.32
0.26

CYCLE
PAcggg

0.996
.994
.993
.992

.994
.993
.993
.992
.992

.989
.988
.987
.987
.986
.985

.997
.996
.996
.995
.995
.995
0.995

25.9

000
CTL
CTL
CTL
OVRHED

011
CTL
CTL
CTL
OVRHED

101
CTL
CTL
CTL
OVRHED

110
CTL
CTL
CTL
OVRHED

CTL
CTL
CTL
001
OVRHED

CTL
CTL
CTL
010
OVRHED

CTL
CTL
CTL
100
OVRHED

CTL
CTL
CTL
111
OVRHED

Table A11.

lw'U

SWN-ﬂ GWN-i sumo... #UJN—b 1:de BOON—I 1:de

SWIG-l

PAGES
992

1

9
23
51
12

23
0
28
57
5

22
0
6

59

15

5
11
25
60
15

24
29
39
70
16

26
17
34
52
21

38
27
36
59
25

31
26
35
65
19

168

PAGES OTHER

2!

44
21
47
76
32

144
(4
47
101
14

165
a
8

112

49

44
23
43
97
29

117
78
97

239
60

143
57
117
115
53

167
68
128
125
72

166
81
116
240
65

149

8,462
4,302
2,467
1,293

8,409
4,015
2,230
1,114

8,313
3,777
2,129
1,050

8,440
4,415
2,575
1,359

11,915
8,961
6,044
8,718

12,395
9,330
6,418
8,720

12,075
9,174
6,255
8,720

11,941
9,025
6,148
8,718

Iléﬁ

288.76
288.76
290.81
280.56
280.56

270.62
270.62
270.62
258.43
258.43

260.51
260.51
262.60
250.35
250.35

294.87
294.87
296.96
286.67
286.67

746.14
748.17
748.17
737.97
737.97

776.57
776.57
778.60
768.41
768.41

760.68
760.68
762.76
752.52
752.52

752.53
752.53
752.53
742.29
742.29

Program Estimation Runs - Measured

ELAPSED PROCESS

r;u§

67.18
79.81
46.37
24.54
45.27

67.66
74.35
42.04
21.15
42.63

66.82
69.83
39.77
19.99
41.53

67.58
81.51
48.29
25.81
46.35

218.38
161.44
110.52

68.36
108.75

227.01
168.15
117.22

68.48

219.93
164.09
113.55

68.26
110.79

219.04
162.67
112.41

69.14
109.74

wPKG
§§T

56
52
64
96
96

92
54
56
76
94

94
56
58
64
82

72
58
62
84
80

84
58
66
70
94

90
60
66
58
96

82
62
64
64
84

82
58
64
.74
88

169

Table A12. Program Estimation Runs - Calculated

CPU CYCLE TI/O CPU PAGE PAGEIN EXP ADJ-CPU
2595 PEEL IlEE SAPS 2A2; BATE BATE EASIQB BATE
000 1 0.23 0.034 29.46 125.98 0.2 0.2 0.995 126.58
CNTL 2 .28 .067 14.93 54.92 0.1 0.1 0.991 54.38
CNTL 3 .16 .116 8.65 53.23 0.2 0.2 0.990 53.79
CNTL 4 .09 .205 4.88 52.73 0.5 0.3 0.989 53.32
OVRHED .16 0.2 0.1
011 1 .25 .032 31.08 126.20 0.6 0.5 0.995 126.80
CNTL 2 .28 .067 15.83 54.02 0.0 0.0 0.991 54.51
CNTL 3 .16 .119 8.42 53.07 0.3 0.2 0.989 53.65
CNTL 4 .08 .213 4.71 52.71 0.6 0.4 0.988 53.32
OVRHED .17 0.1 0.1
101 1 .26 .031 32.57 124.50 0.7 0.6 0.995 125.07
CNTL 2 .27 .069 14.52 54.10 0.0 0.0 0.993 54.48
CNTL 3 .15 .123 8.14 53.57 0.1 0.0 0.991 54.05
CNTL 4 .08 .215 4.65 52.58 0.7 0.4 0.990 53.09
OVRHED .17 ' 0.3 0.2
110 1 .23 .035 28.78 124.91 0.2 0.1 0.995 125.53
CNTL 2 .28 .066 15.05 54.18 0.1 0.1 0.992 54.64
CNTL 3 .16 .113 8.82 53.37 0.2 0.1 0.990 53.90
CNTL 4 .09 .197 5.08 52.55 0.5 0.3 0.989 53.14
OVRHED .16 0.2 0.1
CNTL 1 .29 .062 16.13 54.57 0.2 0.2 0.995 54.86
CNTL 2 .22 .083 12.08 55.52 0.1 0.1 0.993 55.91
CNTL 3 .15 .122 8.21 54.70 0.2 0.1 0.992 55.15
001 4 .09 .082 12.14 127.55 0.4 0.3 0.992 128.53
OVRHED .15 0.1 0.1
CNTL 1 .29 .062 16.15 54.61 0.2 0.2 0.995 54.90
CNTL 2 .22 .083 12.09 55.49 0.1 0.1 0.993 .55.88
CNTL 3 .15 .119 8.39 54.76 0.2 0.2 0.992 55.21
010 4 .09 .087 11.50 127.36 0.2 0.2 0.992 128.35
OVRHED .15 0.1 0.1
CNTL 1 .29 .062 16.09 54.91 0.3 0.2 0.995 55.20
CNTL 2 .22 .082 12.15 55.91 0.1 0.1 0.993 56.31
CNTL 3 .15 .119 8.37 55.10 0.2 0.2 0.992 55.55
100 4 .09 .085 11.76 127.77 0.2 0.2 0.992 128.76
OVRHED .15 0.1 0.1
CNTL 1 .29 .062 16.09 54.52 0.3 0.2 0.995 54.82
CNTL 2 .22 .083 12.10 55.49 0.1 0.1 0.993 55.88
CNTL 3 .15 .120 8.33 54.70 0.2 0.2 0.992 55.16
100 4 .09 0.083 12.07 126.12 0.4 0.3 0.992 127.10
OVRHED 0.15 0.1 0.1

SELECTED BIBLIOGRAPHY

1.

SELECTED BIBLIOGRAPHY

Bard, Y. "Characterization of Program Paging in a
Time-Sharing Environment", Inn gonnnel e; Reseanen
nne Devenepmenn, Vol. 17, No. 5, (Sept 1973), pp.
387-393.

-- "Application of the Page Survival Index
(PSI) to Virtual-Eemory System Performance", Inn
Journa; e; Researen nnn nevelopnenn, Vol. 19, (Hay

1973f? pp. 212-220.

 

_. "A Characterization of VH/370 Uorkloads",
Inn gembnigge Scientific Centen, Technical Report

Baskett, F., and Gomez, F.P. "Processor Sharing in a
Central Server Queuing Hodel of Hultiprogramming
with Applications", girth nnnne; Princenen gengen-
enee en Inﬁormation Scieneee eng S stem , (1972),
pp. 598-603.

Bass, L.J. "0n Optimal Processor Scheduling for Hulti-
programming", SIAM g. QOHPUT., Vol. 2, No. 4,
(December 1973), pp. 273-80.

Belady, L.A., and Kuehner, C.J. "Dynamic Space-Sharing
in Computer Systems", Conmunicetions 92 ene neg,
Vol. 12, No. 5, (Bay 1969), pp. 282-288.

 

Boyd, D.L. "A Hultiple Resource Hodel for_A Batch-Pro-
cessing Hultiprogralming System", Natione; gechniee;
Infonnenien Senviee, U.S. Department of Commerce,
Report an-722332, (narch 1971).

 

Boyse, J.i., and Earn, D.P. "A Straightforward Hodel
for Computer Performance Prediction", gelputing
gnngeys, Vol. 7, No. 2, (June 1975), pp. 73-93.

Brandwajn, A. "A Queuing Hodel of hultiprogrammed
Computer Systems Under Pull Load Conditions",
£23222i9é212P§ 2: Lbs ASA. (April 1977). PP-
222-240.

171

10.

11.

12.

13.

14.

15.

16.

17.

18.

19.

20.

21.

172

Buchholz, W. "A Synthetic Job for Heasuring System
Performance", 183 §yetems Jounnel, Vol. 8, No. 4,

Burge, 3.8., and Konheim, A.G. "An Accessing Hodel",
gennnal en nne ngn, Vol. 18, No. 3, (July 1971), pp.

Chamberlain, D.D., Fuller, 5.8., and Liu, L.Y. An
Analysis of Page Allocation Strategies for Hultipro-
gramming Systems with Virtual Hemory", IE! gennnel
2; §§§§§£EP 229 PesslPPPSPP. Vol- 17. NO- 5. (Sept
1973), pp. 404-412.

Chang, H. "Single-Server Queuing Processes in Computing
Systems", 22! Systems gournel, Vol. 9, No. 1,

Courtois, P.J. "Decomposibility, Instabilities, and
Saturation in multiprogramming Systems", genmuniee-
£i29§ PE 322 LEE. Vol. 18. No. 7. (July 1975). PP-
371-376.

Cox, D.R. "A Use of Complex Probabilities in the
Theory of Stochastic Processes", Proeeednnge e; nne
generidge gnilosophicn; §ogiet1, Vol. 51, (1955),
pp. 313-319.

..--....- Planning PI §£P§£12§P$§o New York: John
Wiley and Sons,Inc., 1958.

Denning, P.J. "Effects of Scheduling on File Hemory
Operations", Spying Join; ggnputen gengerenee,
(1967), pp. 9-21.

=. "The Working Set Hodel for Prograh Behav-
ior", gommunicatione en nne ngn, Vol. 11, No. 5,
(May 1968), pp. 323-33.

 

Denning, P.J., and Schwartz, S.C. "Properties of the
Working Set Hodel", Communicntiene en nne neg, Vol.
15, No. 1, (Harch 1972), pp. 191-198.

Fenichel, R.R., and Grossman, A.J. "An Analytic Hodel
of Unltiprogrammed Computing", Pnoceedinge e; nne

Sarina 92mm 9.222....uter segsrsmcg. (1969). pp.

Fernandez, E.B., and Lang, T. "Computation of Lower
Bounds for Hultiprocessor Schedules", Inn gennnn; en
Reseenen enn nenelepnenn, Vol. 19, No. 5, (September
1975), pp. 435-44.

22.

23.

24.

25.

26.

27.

28.

29.

30.

31.

32.

33.

34.

173

Finn. J... A £32222; Eggs; :2; PPlIiYSIISES APPIYSAS:
New York: Holt, Rinehart and Winston, Inc., 1974.

Forbes, K., and Goldsworthy, A.W. "A Prescheduling
Algorithm -- Scheduling a Suitable Nix Prior to
Processing", Eng gggggzg; gennnel, Vol. 20, No. 1,
(1977), pp. 27-29.

Franklin, N.A., and Gupta, R.K. "Computation of Page
Fault Probability from Program Transition Diagrams",
Semngaisasieaé 2E :23 ASE. Vol- 17. No. 4. (April
1974), pp. 186-191.

Gaver, Jr., D.P. "A Waiting Line with Interrupted
Service Including Priorities", gennne; e; nne Regen
gnennenica; genieny, Series 13, No. 24, (1962), pp.
73-90.

_____ . "Probability Bodels for Uultiprogramming
Computer Systems", geurnn; 93 ene AC8, Vol. 14,No.

Ghanem, 8.2. "Dynamic Partitioning of Rain Memory Using
the Working Set Concept", Inn gennnn; en geeeenen
eng nenenepment, Vol. 19, No. 5, (September 1975),
pp. 445-50.

Gordon, W.J., and Newell, G.F. "Closed Queuing Systems
with Exponential Servers", Openenions Reseenen, Vol.
15, No. 2, (April 1967), pp. 254-265.

IBM Corporation "Analysis of Some Queuing Models in
Real-Time Systems",Form GF20-0007, Data Processing
Division, White Plains, N.Y. 10604.

Isaacson, 8., and Keller, 8.8. nnelysis en Nguenieel
nennods, New York: John Wiley 8 Sons, Inc., 1966.

Iverson, K.E. n Pnognnnning Lengnege, New York. John
Wiley 8 Sons, Inc., 1962.

Jackson, J.R. "Jobshop-Like Queuing Systems”, Ranage-
ment §enence, Vol. 10, No. 1, (Oct 1963), pp.
131-142.

Kimbleton, S.R. "Batch Computer Scheduling: A Heuristi-
cally activated Approach", Offnee e; gene; Reseanen,
Report AD-A007922 (September 1974).

Lewis, P.A.W., and Schedler, 6.5. “A Cyclic Queue Hodel
of Multiprogramming", Jounne; e; nne neg, Vol. 18,

No. 2, (April 1971), pp. 199—220.

35.

36.

37.

38.

39.

40.

41.

42.

43.

44.

45.

46.

174

Biller, Jr., R.G. "Priority Queues", Tne nnnals en
APPASPPSASPA SESSAPEIPS: Vol- 31. (1960). PP-
86-103.

Muntz, R.L. "Poisson Departure Processes and Queuing
Networks", Conferenee en Innennation Scieneee enn
gyenene, (Barch 1973), pp. 435-440.

National Bureau of Standards, Enactionel geotorie;

2§§292§ £2; §é§£2£§ 23 2 L§!§l§v Washington: U- 5-
Department of Commerce, 1957.

Oden, P.8., and Schedler, G.S. "A Model of Remory
Contention in a Paging Rachine", Communications 9;

nne ngn, Vol. 15, N0. 8, (August 1972), pp 761-771.

Reiser, 8. "Interactive Bodeling of Computer Systems",
IP! §1§£§2§ 922222;. Vol. 15. so. a. (1976). pp-
308-27.

Saltzer, J.8. "A Simple Linear Hodel of Demand Paging
Performance", gemmunieatione en the ngn, Vol. 17,
No. 4, (April 1974), pp. 181-186.

Seaman, P.8., Lind, R.A., and Wilson, T.L. ”0n Telep-
rocessing System Design, Part IV: An Analysis of
Auxiliary Storage Activity", Inn Systene gournel,
Vol. 5, No. 3, (1968), pp. 158-170.

Sreenivasan, K., and Kleinman, A.J. "0n the Construc-
tion of a Representative Synthetic Workload",
Communicenions e; nne ngn, Vol. 17, No. 3, (Raroh

1974), pp. 127-33.

Teorey, T.J. "Properties of Disk Scheduling Policies on
Hultiprogrammed Computer Systems", gel; geint
gemnuten gonfenenee, (1972), pp. 1-11.

Waters, S.W. "Estimating magnetic Disk Seeks", Ine
gempunen gennnel, Vol. 18, No. 1, (1975), pp. 12-17.

Wilhelm, N.C. "A General Hodel for the Performance of
Disk Systems", gourne; e; nne ngn, Vol. 24, No. 1,
(Jan 1977), pp. 14-31.

Welch, P.D. "0n Pre-Emptive Resume Priority Queues",

Tne nnnnle en nathemeniee; §§etistice, Vol. 35,
(1964), pp. 1340-48.

“1111111111114auguijnﬁgwﬂmm“

3