E < "manta“; : _ . , . ."fib. . . I . .y I, to. .1 . 5 I)! .t Le .a n1 . z .121"? . E: . .r .5175»): aliflt .. a... .. . .I .5. .9. .L, Etfivunqrrlflc$rrnim .. 2:32: :1. In... :13?! l... 5...} .» :36! 1. 5. .—r.-§l..s . 1:9 ..... 121:2. I x‘l‘-z¢m $35.15;...“ a. 2 . . .3 3‘... L“ in“. viii-«I... .3 5.x?» «an. 2-..: .. x. . ‘Ig..\x.o.\fifis. a? 24.9 .. . 1.1? .4: :ifi..i.:i .n‘ .frsh $3.... 2... .1 33$ . f. .. I33)- I! in .. ‘59....wrt. 1,2... . 5 « ”39%.... hr Ina—Wu“ (I .1. 3. g. 5. ‘ ‘3'?!» a,» u... 3...: .71 15¢...Ozvr t): 3 .x . ‘1}.135 ‘l: (I £121.: 5”- 2.: .It- . .l! i . 27', In: .93!va , u.‘ . l .- nun» 301w” . II o 103..-. z .2. .., f... i z. “insatf. an“? r t :- tvlpvz 31. 9.. l .3: f... .I: :3. w ill... alts-e: I z... .. $.51: .3... his: v5. :3. I: 31.11.». .5) .. )6}? {vVAtpt \i’ n .11.- . THESlS K (5» P) (N LIBRARY Michigan State Unlverslty This is to certify that the dissertation entitled A FRAMEWORK FOR SERVICE DIFFERENTIATING INTERNET SERVERS presented by Xiangping Chen has been accepted towards fulfillment of the requirements for Doctoral degree in Computer Science & Engineering flWManW Date 4 r1 «93/ 91307) MS U is an Affirmative Action/Equal Opportunity Institution 0- 12771 PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE 'DUE DATE DUE DATE DUE moo mm.“ A FRAMEWORK FOR SERVICE DIFFERENTIATING INTERNET SERVERS By Xiangping Chen A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Computer Science and Engineering 2000 ABSTRACT A FRAMEWORK FOR SERVICE DIFFERENTIATING INTERNET SERVERS By Xiangping Chen The variances in application types and the service quality requirements have been increasing continuously. The current bestefiort service model of the Internet and its servers might not be able to satisfy the evolving service quality demands of diverse ap- plications. Therefore, dz'flerentz’ated service has been proposed as a potential solution to provide alternative quality of services (QoS) and is expected to be supported in the next generation Internet (NGI). The Service Differentiating Internet Server (SDIS) is prOposed in this dissertation to provide service guarantee to prioritized client re— quests, which enforces predictable C208 by categorizing requests into different groups based on task types and client identifications and allocating resources based on the groups. Considering the popularity of the WWW on the Internet and its capability of providing a common interface to almost all the protocols on the Internet, Web server performance issues are emphasized in this study. Four major steps are taken to conduct this study. First, the workload behavior of Web servers under different environments are studied. The size, access pattern, lifetime distribution, and behavior of modification of different types of documents in different classes of web sites are analyzed and compared. Results indicate significant differences in static as well as dynamic characteristics of documents types in different web site classes, which suggests that server workload characterization based on the application environments helps to improve the web server caching efficiency and thus improve the server throughput. Second, the conceptual correctness of service differentiating Internet servers (SDIS) is examined and verified. 
Approaches to providing differentiated and QoS based web services, and modeling and analysis of web server scheduling under different workload situations, are studied. Experimental studies and analyses show that an SDIS provides significantly better service to high priority tasks than a traditional Internet server under high system utilization. Third, overload management versus QoS assurance in a web server is explored. The performance of an overloaded server becomes unstable, and the server may eventually crash. To assure stable and predictable service, a simple and efficient admission control algorithm, ACES, is proposed. The ACES algorithm bounds the response delay of requests by allocating computational credits based on the estimation of task service times. Experimental study shows that the ACES algorithm provides effective control of response delay bounds under highly variant workload environments while maintaining the system throughput. We further extend the research on performance assurance issues of SDIS and present a novel and efficient admission control algorithm, PACERS, which provides services based on the server workload characteristics. Different levels of quality of service are assured by periodic allocation of system resources based on the estimation of the request rate and the service requirements of prioritized tasks. Theoretical analysis and experimental study show that the PACERS algorithm provides effective control of throughput and response delay bounds for the prioritized tasks, and preserves system throughput under various workload situations.

In summary, this research makes the following contributions: traffic characterization, which is essential for designing efficient and high performance Internet servers; the conceptualization of the SDIS framework and its design to meet the service quality demands of evolving applications and services; and the development of efficient admission and overload control techniques aimed at performance assurance of Internet servers.

TABLE OF CONTENTS

LIST OF TABLES .............................. vii
LIST OF FIGURES ............................. viii
CHAPTER 1 INTRODUCTION ..................... 1
CHAPTER 2 BACKGROUND ...................... 9
2.1 History and Growth of the Internet ................... 10
2.2 Definition of the Internet ......................... 12
2.3 Service Model of the Internet ...................... 13
2.4 Classification of Internet Servers ..................... 15
2.4.1 Best-effort Internet Servers .................... 15
2.4.2 Real-time Internet Servers .................... 16
2.4.3 Hybrid Internet Servers ...................... 17
2.5 Performance Challenge of Web Servers ................. 18
2.5.1 E-commerce Services ....................... 18
2.5.2 Multimedia Services ....................... 19
2.5.3 Service Path and Performance Bottlenecks ........... 19
2.5.4 Optimization of Retrieval Behaviors ............... 22
2.6 Improving Server Performance ...................... 23
2.6.1 High Performance Web Servers ................. 24
2.6.2 Distributed Web Servers ..................... 25
2.6.3 Server QoS ............................ 27
2.6.4 Any Layer QoS Collaboration .................. 29
2.6.5 Service Differentiating Internet Servers ............. 31
CHAPTER 3 SERVER WORKLOAD CHARACTERIZATION .. 33
3.1 Overview .................................. 33
3.2 WWW Traces ............................... 35
3.3 Experimental Study ............................ 35
3.3.1 Summary of Traces ........................ 36
3.3.2 Document Types ......................... 37
3.3.3 Lifetime Calculation ....................... 39
3.4 Results ................................... 41
3.4.1 Access Frequency vs. File Size .................. 41
3.4.2 Average Lifetime ......................... 42
3.4.3 Modification Distribution .................... 43
3.4.4 Total Lifetime Distribution .................... 47
3.5 Design Issues ............................... 49
3.5.1 Document Classification ..................... 49
3.5.2 A Two-State TTL Algorithm .................. 50
3.5.3 Caching and Prefetching ..................... 53
3.6 Summary ................................. 55
CHAPTER 4 SERVICE DIFFERENTIATING INTERNET SERVERS 56
4.1 Overview .................................. 56
4.2 Service Differentiation .......................... 58
4.2.1 Prioritized Services ........................ 58
4.2.2 Customized Services ....................... 59
4.3 A Generalized Internet Server ...................... 60
4.3.1 Service Differentiating Internet Server Model .......... 61
4.3.2 Server Processing ......................... 63
4.3.3 Admission Control ........................ 64
4.3.4 Process Scheduling ........................ 65
4.3.5 Task Assignment ......................... 65
4.3.6 System Overhead ......................... 66
4.4 Analysis of Waiting Time ........................ 67
4.4.1 Waiting Time in Non-prioritized Systems ............ 67
4.4.2 Waiting Time of Prioritized Systems .............. 69
4.5 Simulation ................................. 71
4.5.1 Workload Generation ....................... 72
4.5.2 Server Processing ......................... 73
4.5.3 Performance Metrics ....................... 74
4.6 Results ................................... 74
4.6.1 Effectiveness of Priority Based Scheduling ........... 74
4.6.2 Maximum High Priority Ratio .................. 77
4.6.3 Low Priority Task Performance ................. 78
4.6.4 Task Assignment Schemes .................... 79
4.6.5 Preferential Task Assignment .................. 83
4.6.6 Admission Control Performance ................. 84
4.7 Related Works ............................... 87
4.8 Summary ................................. 88
CHAPTER 5 BOUNDING RESPONSE DELAYS IN BUSY WEB SERVERS 90
5.1 Overview .................................. 90
5.2 Admission Control Algorithm ...................... 92
5.2.1 Overview of the Algorithm .................... 92
5.2.2 Delay Bounds Assurance ..................... 93
5.2.3 Admission Control and Service Time .............. 94
5.2.4 The Double Queue Structure ................... 95
5.3 Service Time Estimation ......................... 96
5.3.1 Web Object Distribution ..................... 96
5.3.2 Service Time Distribution .................... 98
5.4 Simulation Methodology ......................... 101
5.5 Results ................................... 103
5.5.1 Throughput Comparison ..................... 104
5.5.2 Average Delay ........................... 105
5.5.3 Delay Bounds Miss Probability ................. 105
5.6 Summary ................................. 107
CHAPTER 6 AN EFFICIENT ADMISSION CONTROL ALGORITHM FOR SDIS 109
6.1 Overview .................................. 109
6.2 Workload Characterization ........................ 112
6.2.1 Access Distribution ........................ 112
6.2.2 Object Type Distribution ..................... 115
6.3 Admission Control Algorithm ...................... 116
6.3.1 Overview of the Algorithm .................... 116
6.3.2 Admission Control and Delay Bounds .............. 117
6.3.3 Estimation of Request Rate ................... 119
6.3.4 Estimation of Service Time ................... 120
6.3.5 The Double Queue Structure ................... 121
6.4 Response Delay Analysis ......................... 122
6.4.1 Ideal Case Delay Bounds ..................... 122
6.4.2 Waiting Time Estimation .................... 125
6.5 Simulation Model and Parameters .................... 126
6.6 Results and Evaluation .......................... 128
6.6.1 Throughput Performance ..................... 128
6.6.2 Delay Performance ........................ 129
6.6.3 Sensitivity Test .......................... 131
6.7 Related Works ............................... 133
6.8 Summary ................................. 134
CHAPTER 7 CONCLUSIONS AND FUTURE WORK ....... 136

LIST OF TABLES

3.1 Summary of logs from EDU class. ............... 36
3.2 Summary of logs from COM class. ............... 37
3.3 Summary of logs from NEWS class. .............. 37
3.4 Types of Web documents. .................... 38
3.5 Access pattern of different types in EDU class. ........ 38
3.6 Access pattern of different types in COM class. ........ 39
3.7 Access pattern of different file types in NEWS class. ..... 39
3.8 Not-modified vs. Get retrieval. ................. 41
3.9 Average lifetime (unit: days). .................. 43
3.10 Highly mutable document ratio. ................ 50
4.1 Trace data distribution. ..................... 72
4.2 Simulation parameters. ..................... 73
5.1 Trace data distribution. ..................... 97
5.2 Simulation configuration. .................... 102
5.3 CQs allocated to each object type. ............... 103
6.1 Traffic distribution vs. object type. .............. 115
6.2 Simulation configuration. .................... 127
6.3 CQs requested by each object type. .............. 127

LIST OF FIGURES

2.1 Growth of the Internet Hosts. ................. 11
2.2 The path of web connections. .................. 21
2.3 Request processing steps. .................... 21
2.4 Mapping between the OSI model and Internet infrastructure. . 31
3.1 Traffic vs. doc. size of HTM. .................. 42
3.2 Traffic vs. doc. size of GIF. ................... 42
3.3 Modification distribution in EDU class. ............ 44
3.4 Modification distribution in COM class. ............ 45
3.5 Modification distribution in NEWS class. ........... 46
3.6 Total lifetime distribution of documents in the NEWS class. . 48
3.7 The two-state TTL consistency algorithm. .......... 51
3.8 Performance comparison of three cache consistency algorithms. 52
4.1 Queuing network model for a Web server. ........... 62
4.2 Mean task response time vs. scheduling schemes. ....... 75
4.3 Mean task slowdown vs. scheduling schemes. ......... 75
4.4 95th percentile response time vs. scheduling schemes. .... 76
4.5 Mean response time vs. high priority ratio. .......... 77
4.6 95th percentile response time vs. high priority ratio. ..... 77
4.7 Mean response time vs. high priority ratio. .......... 78
4.8 95th percentile response time vs. high priority ratio. ..... 78
4.9 Mean slowdown vs. high priority ratio. ............ 79
4.10 High priority task mean response time, priority ratio 1:1. .. 80
4.11 95th percentile high priority task response time, priority ratio 1:1. 80
4.12 High priority task mean response time, priority ratio 4:1. .. 81
4.13 95th percentile high priority task response time, priority ratio 4:1. 81
4.14 Low priority task mean response time, priority ratio 1:1. .. 82
4.15 95th percentile low priority task response time, priority ratio 1:1. 82
4.16 Low priority task mean response time, priority ratio 4:1. .. 83
4.17 95th percentile low priority task response time, priority ratio 4:1. 83
4.18 High priority task mean response time vs. PTA. ....... 84
4.19 95th percentile high priority task response time vs. PTA. .. 84
4.20 Low priority task mean response time vs. PTA.
4.21 95th percentile low priority task response time vs. PTA.
4.22 Reject ratio vs. task assignment schemes.
4.23 Reject and abort ratio vs. priority groups.
5.1 Mean response time of web objects.
5.2 Mean service time of web objects.
5.3 Web server system structure.
5.4 Throughput of stress test.
5.5 Throughput of sensitivity test.
5.6 Delay of stress test.
5.7 Delay of sensitivity test.
5.8 Delay bound miss rate of stress test.
5.9 Delay bound miss rate of sensitivity test.
6.1 Access distribution over a day (1 sec).
6.2 Traffic distribution over a day (1 sec).
6.3 Access distribution over a day (1 min).
6.4 Traffic distribution over a day (1 min).
6.5 Access trends in a week.
6.6 Traffic trends in a week.
6.7 Web server system structure.
6.8 Throughput using SAC.
6.9 Throughput using PACERS.
6.10 Performance comparison of low priority tasks.
6.11 Performance comparison of high priority tasks.
6.12 Throughput under fluctuating load.
6.13 Delay under fluctuating load.
6.14 Delay bound miss ratio under fluctuating load.
6.15 Delay bounds miss ratio under fluctuating load.
6.16 High priority throughput under high background traffic.
6.17 High priority delay under high background traffic.

CHAPTER 1

INTRODUCTION

With the exponential growth of the Internet, many of the latest developments in technology have been aimed at providing increasingly sophisticated information services on top of basic Internet data communications. Much attention has been on the use of this global information infrastructure to support commercial services, also well known as e-commerce. Due to its capability of providing quick and easy access to a large variety of information from sites all over the world, the World Wide Web (WWW, or Web) has become the most widely used tool for access and dissemination of commercial, educational, and news information on the Internet. The effort of using the Web in revenue generating activities is increasing at a fast rate [1] and is likely to maintain its growth rate in the future. It has become typical for popular web sites to receive tens of millions of retrieval requests per day.

One problem that exists widely in current web service is the unpredictability of the response time, which is not acceptable to time critical transactions that have limited tolerance to response delay.
Although contemporary web servers are able to serve thousands of requests in one second, the response delay of a popular server can be several orders of magnitude higher than the average value during high load periods, causing de facto "denial-of-service" effects. It was estimated that in 1998 about 10-25% of e-commerce transactions were aborted owing to long response delays, which translated into about 1.9 billion dollars of lost revenue [2]. Another challenge imposed on web services is the increase of web pages with continuous media (CM) objects, such as audio or video clips. These streaming data types have time constraints, or deadlines, that need to be met for their meaningful delivery. For them, sustained bandwidth assurance and a low deadline miss ratio are more important than fast response.

The end user perceivable performance of Internet services is often measured as the round trip delay (RTT) of a request/response pair. The RTT is determined by two factors: the data transfer rate of the underlying networks, and the Internet server processing capacity. For an Internet server, performance is generally measured by the service time of each request and by the throughput, i.e., how many requests it can process per unit time. To maintain or improve the service quality of WWW retrieval, several efforts have been made, such as shortening the retrieval path by client/proxy caching, decreasing the retrieval delay by client side prefetching, reducing traffic volume by delta transmission or filtering, and temporally or geographically distributing server load by server pushing/replication and server side caching. All of these approaches help to improve service performance for traditional web sites, which contain mainly static information in the form of small files. In web server environments with a significant amount of dynamic and CM objects, which are deemed "uncacheable" because of frequent updates or prohibitively large sizes, the performance gain from caching, prefetching, or replication diminishes.

To improve server throughput and response delay, web server vendors continue to upgrade server hardware and optimize server software. Although this increases the throughput and reduces the response delay of a web server, migrating a service from an existing infrastructure to a more advanced one is expensive and provides only temporary relief. Even the most powerful servers can be overwhelmed by the continuously increasing request rate, because the number of requests per second that a single server can handle is limited while the Internet user population is potentially unbounded. Even if high performance web servers or server clusters are available, response times can hardly be assured in all time frames under fluctuating workloads. Besides, service requirements such as sustained bandwidth or session protection cannot be fulfilled by simply speeding up the response.

Recent studies of web servers [3, 4, 5, 6] have addressed the technology of assuring service quality by prioritized task processing in a web server. By prioritizing tasks, it is possible to provide a high quality of service to high priority tasks during high load periods by blocking or dropping tasks with lower priorities. We call this kind of server a Service Differentiating Internet Server (SDIS). Similar efforts have been ongoing to introduce quality of service (QoS) to the Internet.
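To make the idea of prioritized task processing concrete, the following minimal sketch shows a two-level request queue that always serves high priority requests first and sheds low priority requests once a backlog threshold is crossed. It is an illustration only, not the scheduling design developed later in this dissertation; the class name, threshold, and example requests are invented for the example.

```python
import heapq
import itertools

HIGH, LOW = 0, 1  # smaller value = higher priority

class PriorityRequestQueue:
    """Illustrative two-level queue for a service differentiating server.

    High priority requests are always dequeued first; low priority
    requests are rejected once the backlog exceeds `overload_threshold`,
    which approximates load shedding during overload.
    """

    def __init__(self, overload_threshold=100):
        self._heap = []
        self._seq = itertools.count()  # preserves FCFS order within a priority level
        self.overload_threshold = overload_threshold

    def enqueue(self, request, priority):
        if priority == LOW and len(self._heap) >= self.overload_threshold:
            return False  # shed low priority work under overload
        heapq.heappush(self._heap, (priority, next(self._seq), request))
        return True

    def dequeue(self):
        if not self._heap:
            return None
        _, _, request = heapq.heappop(self._heap)
        return request

# Example: under a backlog, premium requests are still admitted and served first.
q = PriorityRequestQueue(overload_threshold=2)
q.enqueue("GET /index.html", LOW)
q.enqueue("GET /cart/checkout", HIGH)
assert not q.enqueue("GET /banner.gif", LOW)   # rejected: queue is "overloaded"
assert q.enqueue("POST /order", HIGH)          # high priority is still admitted
assert q.dequeue() == "GET /cart/checkout"     # served before the earlier LOW request
```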
The Internet integrated services framework (IntServ) [7] provides the capability for applications to choose among multiple, controlled levels of delivery service for their data packets. A resource reservation setup protocol such as RSVP communicates the requirements of applications to network elements along the path and conveys QoS management information between network elements and the application. In an integrated services enabled network, an application is able to reserve resources along a route from the source to the destination. QoS aware routers then schedule and prioritize packets to fulfill the reserved service requirements. One major drawback of the reservation scheme is poor scalability, since the number of reservation setup control messages processed by each router is proportional to the number of flows going through the router. For similar reasons, the resource reservation approach is not appropriate for Internet servers: poor scalability, low system utilization, and/or long setup delay.

The Internet differentiated services framework (DiffServ) [8] aims to implement scalable service differentiation in the Internet. It achieves scalability by aggregating traffic classification state, which is conveyed by means of IP-layer packet marking, and by regulating per-hop behavior (PHB). Data packets are classified and marked to receive a particular PHB forwarding treatment on network elements along their path. Sophisticated classification, marking, policing, and shaping operations need only be implemented at network boundaries. Network resources are allocated to traffic streams by service provisioning policies, which govern how traffic is marked and conditioned upon entry to a differentiated services capable network, and how that traffic is forwarded within that network. A wide variety of services can be implemented on top of these building blocks, or combined with resource reservation schemes. The differentiated services approach has been proposed as an efficient and scalable solution to provide better service for next generation Internet (NGI) communication [9]. (A small sketch of this edge marking and policing idea follows the motivation list below.)

The objective of this dissertation is to design a service differentiation enabled Internet server which reciprocates the QoS efforts of Internet transmission using the IntServ or DiffServ architectures. The server should be able to provide fast response to high priority tasks, timely delivery to real-time tasks, and minimal performance penalty to low priority tasks, without degrading the overall system throughput. To summarize, the main motivations of the SDIS project can be itemized as follows:

- Several evolving applications use continuous media (CM) objects, such as Web based radio and TV channels. These data types have time constraints or deadlines that need to be met in order for any meaningful delivery. Thus, they may need service priority over non-real-time tasks in terms of isochronous processing, high storage capacity, and sustained data delivery.

- The exploding e-commerce market supports several types of transactions and may need to classify services based on revenue and customer groups. Prioritized transactions may be necessary and can help absorb the burstiness of e-commerce traffic.

- The possible adoption of differentiated services in the next generation Internet will make best-effort servers unacceptable and may defy its purpose. Thus, in order to provide end-to-end QoS guarantees, Internet servers also need to provide differentiated services.
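As the sketch promised above, here is a minimal token-bucket policer of the kind a DiffServ boundary node might apply when marking traffic. The rate, bucket depth, and the two marking outcomes are illustrative assumptions, not part of the DiffServ specification or of this dissertation's design.

```python
import time

class TokenBucketMarker:
    """Illustrative DiffServ-style edge conditioner: packets within the
    contracted rate are marked "in-profile" (premium PHB treatment);
    excess packets are remarked "out-of-profile" (best-effort treatment).
    """

    def __init__(self, rate_bps, bucket_bytes):
        self.rate = rate_bps / 8.0       # token fill rate in bytes/second
        self.capacity = bucket_bytes     # bucket depth bounds the allowed burst
        self.tokens = bucket_bytes
        self.last = time.monotonic()

    def mark(self, packet_len):
        now = time.monotonic()
        # Refill tokens for the elapsed interval, capped at the bucket depth.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_len:
            self.tokens -= packet_len
            return "in-profile"          # conforming: forwarded with the premium PHB
        return "out-of-profile"          # excess: remarked to best effort

# Example: a 1 Mbit/s contract with an 8 KB burst allowance.
marker = TokenBucketMarker(rate_bps=1_000_000, bucket_bytes=8192)
print(marker.mark(1500))  # "in-profile" while the burst allowance lasts
```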
In this dissertation, we develop a model of service differentiating Internet servers, which can be implemented with various resource allocation algorithms, including admission control, prioritized scheduling, task assignment, and load balancing. Policies for, and the feasibility of, service differentiation are studied. Multiple schemes for quality of service assurance are tested and their performance is evaluated. This research shows that traffic characterization is essential for designing efficient and high performance Internet servers. The SDIS framework is able to meet the different service quality demands of the next generation Internet, as well as of evolving applications and services. Furthermore, it is shown that the performance of an Internet server, such as response delay, drop rate, and throughput, can be improved and assured through efficient admission control and overload management. The main contributions of this dissertation can be summarized as follows.

1. Web server workload characterization: The study of the characteristics of web server workloads helps to determine how current web servers can be improved to provide better services. This study characterizes web server workloads with respect to the web object distribution and the server environment. Nine web server traces are collected to represent three different classes of Web environments: educational, commercial, and news. The size, access pattern, lifetime distribution, and modification behavior of Web object types in different classes of Web sites are analyzed and compared. Results demonstrate significant differences in static as well as dynamic characteristics of object types in different Web site classes. The efficiency of server caching, and thus server response performance, can be improved when caching priorities and Time-to-Live (TTL) preferences are given to certain types of Web objects. Based on the workload characterization study, a two-state cache consistency algorithm is proposed to improve the cache hit ratio and decrease unnecessary polling overhead. Preliminary results show satisfactory performance improvement compared to current cache consistency algorithms. We also provide guidelines for the design and development of caching and prefetching techniques that exploit the Web workload characteristics of different environments.

2. Service Differentiating Internet Servers (SDIS): The conceptual design of SDIS is examined and analyzed. Approaches to providing differentiated and QoS based web services, and modeling and analysis of web server scheduling under different workload situations, are studied. The workload includes the following major types: static small web objects, database workload (mainly dynamic data), multimedia workload (mainly CM data with bounded delay delivery constraints), and the combination of the three. Experimental study and analyses show that under high system utilization, a service differentiating server provides significantly better services to high priority tasks compared to a traditional Internet server. Through analytical modeling and simulation study, we show the feasibility and performance benefits of the SDIS. Various aspects, such as admission control, scheduling, and task assignment schemes for SDIS, are evaluated through real workload traces. The results of these studies are used as the foundation for further studies on the design and prototype development of the SDIS.
3. An admission control algorithm based on Estimation of Service times (ACES): The ACES algorithm is designed to provide user perceivable response delay bounds from busy web servers and to prevent servers from overloading. The ACES algorithm bounds the response delay of requests by allocating computational credits based on the estimation of task service times; a sketch of this credit-based admission test follows this list. The service times are estimated on the basis of the type of requests. A double-queue structure is implemented to mask the inaccuracy of the service time estimation and to make use of the spare capacity of the server, thus increasing the system throughput. Experimental study shows that the ACES algorithm provides effective control of response delay bounds under highly variant workload environments while maintaining the system throughput.

4. The PACERS admission control algorithm for SDIS: The PACERS algorithm is proposed to provide different levels of service based on the server workload characteristics. Alternative levels of QoS are assured by periodic allocation of system resources based on the estimation of the request rate and the service requirements of prioritized tasks. Admission of lower priority tasks is restricted during high load periods to prevent denial of service to high priority tasks. The response delays of most prioritized tasks are bounded by the length of the prediction period. Theoretical analysis and experimental study show that the PACERS algorithm provides effective control of throughput and response delay bounds for the prioritized tasks, and preserves system throughput under various workload situations.
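As a concrete illustration of the estimation-based admission idea shared by ACES and PACERS, consider the minimal sketch below. The per-type service time table, the credit budget, and the period length are invented for the example; they are not the dissertation's measured values, and the full algorithms additionally use a double-queue structure and per-priority resource allocation.

```python
# Hypothetical per-type service time estimates (seconds), e.g. derived
# from past measurements of requests for the same object type.
EST_SERVICE_TIME = {"static": 0.002, "dynamic": 0.050, "cm": 0.500}

class CreditAdmissionController:
    """Illustrative credit-based admission test: each period the server is
    granted a budget of computational credits equal to the CPU time it can
    spend while still meeting the target delay bound; a request is admitted
    only if its estimated service time still fits in the remaining budget.
    """

    def __init__(self, credits_per_period):
        self.budget = credits_per_period
        self.remaining = credits_per_period

    def new_period(self):
        self.remaining = self.budget  # replenish the credits each period

    def admit(self, request_type):
        cost = EST_SERVICE_TIME[request_type]
        if cost <= self.remaining:
            self.remaining -= cost    # reserve capacity for this request
            return True
        return False                  # admitting it could violate the delay bound

# Example: a 1-second period in which roughly 1 second of work may be admitted.
ac = CreditAdmissionController(credits_per_period=1.0)
print(ac.admit("cm"), ac.admit("dynamic"))  # True True while credits remain
```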
In brief, the objective of this study is to answer the following questions. First, why is service differentiation needed in an Internet server, what is the performance bottleneck in Internet services, and why not just build best-effort high performance servers? Second, what is an appropriate model for the service differentiating Internet server (SDIS), and can the desired QoS be achieved in the SDIS? If it can, what is the cost of introducing QoS in Internet servers? Third, how can the performance of a busy web server be maintained? Last, how can QoS be enforced in the SDIS?

The remainder of this dissertation is organized as follows. Chapter 2 presents the background material related to this work, including the definition and growth of Internet services, advances in contemporary Internet servers, and QoS research on the Internet and its servers; in a word, why service differentiation is needed in an Internet server. Chapter 3 presents the server workload characterization study, which provides the baseline of the SDIS design. The conceptual model of the SDIS is discussed in Chapter 4. The ACES algorithm and overload management of a server system are described in Chapter 5. Chapter 6 presents the PACERS algorithm and performance assurance methods in the SDIS. The conclusions and future work are discussed in Chapter 7.

CHAPTER 2

BACKGROUND

The Internet is a global network of thousands of computer networks connecting millions of computers across the world. A simple, standard addressing system and communication protocol suite called TCP/IP (Transmission Control Protocol/Internet Protocol) provides an "open" infrastructure to connect different networks, which allows data exchange among the various computers with Internet connectivity. Since the beginning of the 1990s, both the traffic volume and the capacity of the public Internet have experienced doubling growth each year [10]. The exponential growth of the Internet is driven by the continuing increase in the number of Internet users as well as by increasingly innovative applications running on the Internet. Due to its fast growth and the capability of quick and easy access to a large variety of information from hosts all over the world, the Internet has become a widely used tool for global publication and exchange of information. Because of its success, it has been regarded as a revolution in the computer, communications, and media fields, one that has had, and continues to have, a deep and broad influence on our daily life. The ever increasing traffic volume and evolving Internet applications have been demanding enhanced services from the Internet infrastructure and servers. The objective of this dissertation is to explore the direction of the evolution of the Internet and its servers to support new applications and services with different service requirements. Understanding the Internet and its environment is essential in a performance study of the Internet and its servers. The following section briefly reviews the Internet's history and growth.

2.1 History and Growth of the Internet

The Internet originated from research on the packet switching technique and from the ARPANET project, a Cold War project begun in 1969 to create a communications network immune to a nuclear attack. The current Internet inherits the merits of scalability, reliability, and fault tolerance from its predecessors, but falls short in supporting real-time transmission and response capability due to the limitations of its network transmission protocols. In the first few years, the ARPANET was primarily used for electronic mail services (invented in 1971), well known as e-mail these days. Later, new features and services gradually spawned, such as remote terminal emulation (first appeared in 1972, later known as TELNET), file transfer services (came out in 1973, later known as FTP), news group publication services (also in existence since 1973, later refined into NNTP), and eventually the World Wide Web (WWW, started around 1990). During the early 1980s, the packet switching technique and the TCP/IP protocols matured, and all the networks started using the TCP/IP protocols. The ARPANET became the backbone of the Internet. In 1986, the NSFNET was created, which led to an explosion of connections, especially from universities. The size (number of sites) and capacity (bandwidth) of the NSFNET continued to grow, and it replaced the ARPANET as the backbone of the Internet in 1990. Since 1995, most of the Internet traffic in the United States has been routed through commercial network service providers, and since then the Internet has grown to be a commercial success with billions of dollars of annual investment.

Until the beginning of the 1990s, access to the Internet was restricted to some research institutes and universities, mainly due to the limited backbone capacity and the poorly documented, hard to use service interfaces. The invention of the HTTP protocol, a hypertext transfer protocol, allowed graphic information to be communicated and displayed throughout the Internet for the first time.
Figure 2.1 plots a survey of hosts number on the Internet obtained from the Hobbes’ Internet T imeline v5.0 [11], which shows the exponential growth of the Internet hosts. In December 1969, the first prototype of the Internet was built connecting to 4 hosts. After 30 years, the current Internet consists of about 60 million computers around the world. The growth of the the Internet Domains, sub-Networks, WWW sites, and the bandwidth capacity yields similar trends. Hobbes“ Internet Timeline Copyright @1999 Robert H Zakon http:!Mww.isoc.orngakonilnterneUHistoryIHlThtml 50,000,000 [ . 7 , __.__1 _- [W - ' ._ S 12/69 4 | 05/82 235 'Nw’ may 501300.000 * 06/70 9 | 08/83 562 'Oldsumy 10/70 11 I 10/84 1.024 I 12/70 13 l 10/85 1.961 40'000'000 ‘ 04/71 23 | 02/86 2.308 I a 10/72 31 I 11/86 5.089 g 301100.000q 01/73 35 [ 1.2/87 23,174 I a: 06/74 62 l 07/88 33.000 I ‘ 03/77 111 I 10/88 56,000 . 20.000300 512/79 188 | 07/89 130.000 , ,08/81 213 l 10/89 159,000 I . E. , . . fi _ I . 100000008 9 . i o ’I 0 IIAIIIfiII?II?IIII#II?IIIII?IIIIIIIIIIIIIIIIIIIIIIIIIIIIIllllllllllIIIIIIlllllIIIIIIIIIlllllllIllllllllll Orrmmwmmvv¢0m©®~hhwmmm 9’°.’9°979’9’9’9’9’9’.9°°?9’9’:¢29°9’39’°3 .6 3... O1 > n— _ Q )4 .5 8.. GI I— _ o£2§§§2§§%sgoesee§2s§% Figure 2.1 Growth of the Internet Hosts. The Internet continues to change and evolve with the technical advances of the computer and communications industry. With the upgrade of network bandwidth, it is now evolving to provide such new services as real-time data transportation, for example, audio and video streams. The introduction of wireless/ mobile access to the Internet, and development of affordable portable network access equipment such as 11 hand-held computers, is making possible the pervasion of nomadic computing and communications. The evolution of Internet applications introduces diverse workload characteristics and server requirements, thus demands much more sophisticated forms of service provisioning and billing policies, which in turn will drive further the evolu- tion of the Internet. 2.2 Definition of the Internet The deployment of the TCP/IP protocols and interconnectivity among different networks are essential components in the evolution of the Internet. To further study the Internet and its servers, we need to have an accurate definition of the term “In- ternet”. In 1995, the Federal Networking Council (F N C) passed a resolution defining the “Internet”: The Internet refers to the global information system that — (i) is logically linked together by a globally unique address space based on the Internet Protocol (IP) or its subsequent extensions/follow-ons; (ii) is able to sup- port communications using the Transmission Control Protocol/Internet Protocol (T CP / IP) suite or its subsequent extensions/follow-ons, and/ or other IP-compatible protocols; and (iii) provides, uses or makes accessible, either publicly or privately, high level services layered on the communica- tions and related infrastructure described herein. The IP protocol is a network-layer protocol (Layer 3 in the OSI reference model) that contains addressing and some control information that enables packets to be routed between hosts. Along with the TCP, IP represents the heart of the Internet protocols. It has two primary responsibilities: providing connectionless, best-effort delivery of datagrams through an internetwork; and providing fragmentation and re- assembly of datagrams to support data links with different maximum transmission 12 unit sizes. 
IP addressing scheme enables computers with different hardware configu- rations or using different operating systems to communicate with each other, such as personal computers running DOS, Windows 95 / 98 or Windows NT operating system, l\«‘lacint.osh computers, or UNIX-based workstations. The TCP is a transport—layer protocol (Layer 4 in the OSI reference model) which provides reliable transmission of data in an IP environment. TCP delivers an un- structured stream of bytes identified by sequence numbers. It offers reliability by providing connection-oriented, end-to-end reliable packet delivery through an inter- network. It does this by sequencing bytes with a forwarding acknowledgment number that indicates to the destination the next byte the source expects to receive. Bytes not acknowledged within a specified time period are retransmitted. TCP offers ef- ficient flow control, which means that, when sending acknowledgments back to the source, the receiving TCP process indicates the highest sequence number it can receive without overflowing its internal buffers. The User Datagram Protocol (UDP) is another transport-layer protocol that be- longs to the Internet protocol family, which is a connectionless protocol providing no reliability, flow-control, or error-recovery functions to IP. Because of simplicity, UDP headers contain fewer bytes and consume less network overhead than TCP. UDP is useful in situations where the reliability mechanisms of TCP are not necessary, such as in cases where a higher-layer protocol might provide error and flow control. TCP together with UDP provide an interface between IP and upper-layer applications. 2.3 Service Model of the Internet The most popular network service architecture in the current generation Internet is the client/server architecture, which allows end users to share resources provided by the server in a simple and robust manner. The server is generally a powerful com- puter system that is dedicated to managing shared resources such as storage, network 13 bandwidth, or simply just computation time. Clients are generally less powerful com- puters, such as personal computers or workstations, to access resources provided by servers on behalf of the end users. A client sends out requests to a server throughout the Internet using a specific language (protocols above TCP/IP protocol) understood by the server, and the server sends back response using the same protocol. The client interprets and displays the response to end users. Another type of network architecture, which is known as peer-to-peer architec- ture where each node has equivalent responsibility, is becoming widely used with the progress in computation power and storage capacity of personal computers. The peer-to—peer architecture is still limited by the lack of commercially-available object orientation analysis and design tools. To adopting to the peer—to-peer architecture will require more hardware in the to client hosts, and generally higher overhead in execution. The client/server architecture will still be p0pular and widely used in the Internet in the future. Our research of Internet services performance is thus based on the client/server architecture. The end user perceivable performance in a client/server structured Internet ser- vices is often measured as the round trip delay of a request/response pair. The duration of the round trip delay is determined by the server performance as well as the underlying network bandwidth. 
The network should be able to transmit data fast and with no error. The server should be able to processes requests with little waiting time and service time. How to deliver the information fast and reliably throughout the Internet, and how to provide quality of service in the network transmission is a rich and active research area, and lots of efforts have contributed to this field. Since the early 903, the Internet bandwidth has doubled each year. The capacity of IP routers has also been increasing. New technology combining routing and switching techniques such as multiple protocol label switching (MPLS) [9] are beginning to be implemented to 14 simplify the computation complexity of routing and speed up the packet forward- ing inter networks. Network level Quality of Service (QoS) provisioning have been investigated for a few years for implementation in the next generation Internet. This dissertation is focused on the performance assurance of Internet servers, which is still quite limited in the literature reports. To further investigate the needs and future direction of the Internet servers, a brief review of current Internet servers in terms of service quality requirements is given out in the following section. 2.4 Classification of Internet Servers Internet servers can be roughly classified into the four categories according to the requests QoS requirements: Best-effort servers, Soft real-time servers, Hard real-time servers, and Hybrid servers. 2.4.1 Best-effort Internet Servers Best-effort servers are so far the most popular Internet servers. The major per- formance metrics of a best-effort server is the throughput. The server tries to serve as many client requests as possible in the first—come-first-serve (FCFS) manner. Usually there is no resource management technique is implemented in a best-effort server, and clients do not expect best-effort servers to always respond promptly. Response delay of several seconds, even minutes is acceptable, as long as most of the requests can be processed reliably. An E-Mail server is a good example of a best-effort server. Messages sent through e-mails could arrive within a second, or as long as several hours, depending on the server load and network traffic. Interactive applications such as TELNET are not suitable for the best-effort paradigm, due to the unpredictability of response time which might not. be able to meet the human desired interaction speed. However, TELNET servers are still implemented as best-effort servers in which fast response 15 is approximated by capacity planning of high performance servers. FTP servers are best-effort servers with preliminary admission control. A FTP server should be able provide acceptable bandwidth for bulk data transfer. Thus the number of active clients are limited to protect the transmission speed of existing sessions. Most of current HTTP servers, also called Web servers, are best-effort servers. However, temporary overload situations can degrade the server performance drastically, and in some cases, crash the server. One example of overloaded servers in the recent past is the traflic jam on the eBay on-line auction site, which froze out bidders for hours during the early summer of 1999 [12]. 2.4.2 Real-time Internet Servers Soft real-time and Hard real-time servers are emerging with the growth of multimedia applications on the Internet such as video conferencing, IP telephony, or Web radio/ television broadcasting. 
Unlike best-effort servers, a real-time server should be able provide services within a certain time constraint, or known as deadline. For hard real-time servers, responses which miss the deadline are meaningless and treated as failure. Most of earlier hard real-time services are ensured by using dedicated or private resources. In the current public Internet, hard real-time ap— plications are still rare due to the best-effort nature of the Internet communication protocols. Assurance of the hard real-time constraints are emulated by forcing the server to work in a extremely low workload situation, or by using leased-lines for network connections. For soft real-time servers, occasional missing of deadline is tolerated, but not desired. Most of continuous media data transmission requires soft real-time delivery and transmission assurance. There has been ongoing efforts in introducing real-time communications into the Internet, such as integrated network (IntServ) services using the Resource ReSer- Vation Protocol (RSVP), which provides hard real-time assurance of the round-trip 16 transmission delay by end-to-end channel reservation. Differentiated services (Diff- Serv) in the Internet Engineering Task Force (IETF) community, has been proposed as an efficient and scalable alternative for resource allocation to provide soft real- time assurance by defining the per-hop—behavior (PHB) of an IP packet, and building DiffServ capable routers. However, network layer QoS provisioning is not suflicient in guaranteeing user perceivable high performance if the server does not provide any means of service and performance assurances. Unexpected bursts of client demands may cause long queuing delay or even crash an overloaded web server. For example, a premium data flow with end-to—end QoS guarantee from the Internet transmission may still experience denial-of-semz'ce from an overloaded end server just like any best- effort data flow. Systematic research on real-time response assurance from an Internet server is needed to fulfill the dynamics of Internet applications. 2.4.3 Hybrid Internet Servers Hybrid servers are able to serve requests with different service requirements at the same time. For example, Web servers should be able to serve request with real~time or best-effort requirements fairly. The World Wide Web comprises of In- ternet servers that supports hypertext to access almost every Internet protocol on a single interface, which include e-mail, file transfer, Telnet, News groups, and its own protocol of HTTP. Therefore, a web server should have the capability to accommo- date applications with different service quality restrictions. Mostly due to the success of the Web technology, allowing users easy access to information linked throughout the globe, the Internet has now become almost a “community” service, standing be- side telephone, radio, or television services. Due the capability of organizing almost all kinds of data formats into a single “page”, there has been an explosive increase of evolving applications that use the World Wide Web as a distributed information exchanging interface on the Internet. 17 To most home users, the Internet has been informally renamed as the Web. At the moment, most peOple use the term “Internet” to refer to the physical structure of the network, including computers and network elements that connect computers. They use the term “Web” to refer to the the information space that can be accessed through the Internet. 
In the following discussions, the terms “Internet Server” and “Web Server” are used exchangeably to refer to the sites that provide Internet services and publish information on the Internet. In the next section, we discuss the current performance challenges of Internet. servers. 2.5 Performance Challenge of Web Servers It is common for a popular web site to receive tens of millions of retrieval requests a day. Many of the latest developments in technology have been aimed at providing increasingly sophisticated information services on top of the basic Internet data com- munications. For example, much of the latest attention has been on the use of this global information infrastructure for support of commercial services. Efforts to use the Web in revenue generating activities, or well known as e-commerce, is increasing at a fast rate [1] and likely to maintain its growth rate. 2.5.1 E-commerce Services The e-commerce servers generally provide support for a variety of request types - browsing, products selection, billing arrangement, shipping agreement, and banking management. Some companies are highly successful in using the Web as their market routes, such as Amazon online bookstore and eBay online auction site. One problem that exists widely in contemporary web servers is, however, the unpredictability of the response time, which is not acceptable to time critical transactions which have low tolerance of response delay. The commercial usage of the Web has changed the data distributions in the web sites. More and more dynamic data are generated to provide 18 information sharing between different enterprise management systems and personal- ized services. This type of access requires more processing power and resources and thus adds to the server load and unpredictability. Another performance challenge brought by the e-commerce is the wide use of secure transactions. The security mechanisms such as Secure Sockets Layer (SSL) or IPsec impose a very high load on the Internet servers due to the encryption, decryption and authentication processes. Some recent tests conducted by researchers at Networkshop Inc. indicate that Web servers capable of handling hundreds of transactions per second may be brought down to just a few transactions per second. Some server configurations suffered as much as a fifty-fold degradation in performance from SSL effects. 2.5.2 Multimedia Services Continuous Media (CM) traffic, which includes audio and video streams, differs from traditional data in terms of resource and transmission requirements. These data types have additional needs in terms of temporal contiguity, high storage capacity, high bandwidth, and encoding and decoding support. They also require bounded delay guarantees in transmission, which contributes to the load. The current web servers handle the CM request in the same manner as big files. As a result, non-CM traffic impairs the performance of CM traffic and vice versa. However, more and more CM objects are being used on the Web for audio-visual effects and revenue generation (Web TV), which puts additional burden on the server in the sense that it needs more elaborate resource management. 2.5.3 Service Path and Performance Bottlenecks The current web servers are like enhanced file servers in an extremely large dis- tributed system (i.e., the Internet). The difference is that. “page”s, instead of files, 19 are managed by a server and accessed by clients across the world. A page is a web object. 
in the HyperText Markup Language (HTML) format, which may contain links to other web objects or files. A page can be stored in the server as a disk file or generated on the fly in response to a request. The Unique Resource Locator (URL) protocol specifies how to locate a page stored in a Web site, providing naming and directory services to clients. The HTTP protocol allows pages downloading from and uploading to the server across the Internet. W :‘b clients and servers communicate using the HTTP protocol on top of the TCP layer. Detailed description of a HTTP connection can be found in [13]. When a client makes a request to a particular web server, a TCP connection is established between the client and the server. However, if there is a caching proxy server sitting between the client and the server, the client requests might be able to be served by the proxy server, provided that the proxy server has a fresh copy of the requested URL. Once the connection is established, the client sends a request to the server. The server parses the request and issues a response. After the transmission of the response data completes, the TCP connection is closed. Figure 2.2 shows the path of web connections, where a domain represents a sub-network in the Internet. Upon receiving a request, the server parses the URL and authenticates the user, then loads or computes the requested data. Once the server has obtained the re- quested content, it sends back a HTTP header followed by the content through the established TCP connection. A dynamic web object brings extra burden to the server because the response is generated by anxiliary applications running on the server, which consumes the computing power of the server. A server tries to process as many requests as possible in the FCF S order. Requests which exceed the server capacity are dropped. The use of real-time and streaming data delivery has yet to be considered in a traditional web server. Figure 2.3 dipicts the basic sequential steps for serving a request. 20 Proxy Server . = Browser = Proxy Server E3 = Cached docmcnts H = HTTP Connection = Web Documents 0 = Domain Figure 2.2 The path of web connections. Send Read Data Header Send Data J Parse Request [ Accept ? Connect Processing Time Response generation time Figure 2.3 Request processing steps. Possible performance bottlenecks which may contribute to long response delay of a web based transactions include server side CPU processing power and network bandwidth, and client side network bandwidth. Network backbone is generally not considered as a weak factor in web server performance research [14]. In a fast grow- ing Internet, server CPU and network bandwidth might become bottlenecks of web services because of relatively slow capacity upgrade. 21 2.5.4 Optimization of Retrieval Behaviors To improve the response performance of Web retrieval, several efforts have been made along the retrieval path. Caching data copies at client hosts or proxy servers [15, 16. 17, 18, 19] is regarded as a good technique for reducing both the network traffic and the server load by shortening the retrieval path thus improving retrieval latency of web pages. One problem comes with the caching technique is the maintenance of data consistmrcy. If cached data copies are not consistent with data in the original web server, stale information will be sent back to web users, which impairs the value of web caching. Another technique is called client side prefetching [20, 21]. 
Another technique is client side prefetching [20, 21]. Client side prefetching agents retrieve data copies into a local cache before the client actually requests them, reducing the retrieval latency by predicting the client's browsing behavior. However, misprediction introduces extra traffic and load on the servers and networks. Precise prediction algorithms are needed to balance the cost and effectiveness of the prefetching technique.

Server pushing/replication [26, 27] and caching [22, 23, 24] are used to temporally or geographically distribute server load. Temporal load distribution includes server side pushing of information during light load periods. Geographical distribution of server load includes the generation of mirror sites and the redirection of requests during high load periods.

Delta transmission or filtering [25] is used to decrease the traffic volume between clients and servers, thus improving retrieval performance. However, overhead is introduced on both the client and the server side to compress and decompress data. Multiple versions of the data are needed for delta compression, which occupy more storage space.

All of the above approaches assume a traditional web site which contains mainly static information in the form of small files. When requests for dynamic objects account for a high percentage of the access rate, the effectiveness of the above approaches decreases. Besides, CM data is considered cache inefficient, so media services on the Web can hardly benefit from the above approaches.

2.6 Improving Server Performance

Although the current server processing model provides a simple and robust service paradigm, some problems still restrict the wide usage of Internet servers as an e-commerce vehicle. First, a server has to drop client requests, owing to the limited number of open connections and limited buffer space, when the server processing speed cannot keep up with the request rate. Dropping requests causes retransmissions from the client hosts, which put additional load on the server as well as the network and thus cause a "snowball" effect. The "snowball" effect can drive the system into an overload situation very quickly and can even cause livelock of a web server, wherein no request gets service. Second, a server does not keep any state information for any session. It is quite possible that a nearly finished session is interrupted by a burst of incoming requests. In an online trading web site, success in completing a session or a transaction is critical to bringing in profits; the stateless mode of server processing restricts the potential for revenue generation. Third, the existing best-effort Internet servers do not allocate scarce and expensive network and CPU resources during periods of high workload according to the importance of the requests or the expected response delay, which causes unpredictable response delays that are not acceptable for time critical and important transactions. A new service model and a QoS aware server are needed to accommodate the requirements of emerging applications on the Internet.

Usually the Internet Service Providers (ISPs) and Internet Content Providers (ICPs) serve all client requests with the same level of performance. Extensive efforts have been made by both academia and industry to scale up server performance with the ever increasing server workload.
At the same time, new applications have emerged with much more demanding service quality requirements. As a result, service and content providers are finding it necessary to offer alternative levels of service to meet customer demands for new applications. Differentiation of services allows service providers to increase their revenues through premium pricing policies backed by competitive service assurance. Thus the concept of a QoS-aware web server has been proposed to complement the service differentiation efforts from the network.

2.6.1 High Performance Web Servers

The most direct way to solve the server resource shortage with increasing demand is to upgrade server hardware, such as increasing the CPU speed, using symmetric multiprocessing (SMP) or main-frame computer architectures, employing high capacity caches and memory, or increasing the disk and network I/O capacity. The results published in [28] for web server benchmark tests using SPECweb99 provide a rough idea of the hardware configuration of current commercial web servers. Although upgrading hardware sounds simple and effective, it may be too expensive to be acceptable, since provisioning for the entire spectrum of workload demands far more capacity than is needed most of the time.

Optimization of the web server processing procedure is another ongoing effort and has been mainly focused on the following aspects. Reduction in connection establishment overhead can be achieved by the persistent (or keep-alive) connections deployed in the HTTP/1.1 protocol [29]. Reduction of idle CPU overhead in processing a task can be achieved by multi-process, multi-threaded web servers [30], because a single-process architecture cannot maintain good performance on disk-bound workloads that exceed the server cache capacity. Most current commercial web servers use a multi-threaded structure, which is faster than the multi-process server architecture; however, the multi-threaded model requires kernel-level thread support from the operating system. Reduction of dynamic object processing overhead has encouraged efforts in FastCGI [31], dynamic data caching [32], and in-process dynamic processing such as NSAPI or ISAPI [33]. Other efforts include the wide usage of caching at all levels, and the precomputing of URLs and response headers.

The performance improvement from software optimization and hardware upgrades is comparatively high. The throughput of a single host web server in 1995 was at most 200 static requests per second [34]. In 1998, most commercial web servers, e.g., the Apache web server, Netscape Enterprise Server, and Microsoft Internet Information Server, achieved benchmark throughput of 1000 to 4000 static requests per second [35], which is about 10 times faster than their predecessors.

Although adding hardware resources or buying a bigger server machine improves the throughput and delay performance of a web server site, the replacement of an existing machine with a faster model is not cost-effective and provides only temporary relief. Even the most powerful server will be overwhelmed by the increasing request rate. Furthermore, the strain on Internet resources is likely to become more significant over the next several years as the number of commercial services and the size of the data objects being exchanged are likely to increase dramatically.

2.6.2 Distributed Web Servers

Besides building single high performance web servers, several studies have suggested distributing load to multiple web servers.
One approach to handling popular web sites might be the replication of information across a mirrored-sites network, which provides a list of independent URL sites that have to be manually selected by the users. This solution relies on the user's conjecture for load sharing, which is hardly effective in balancing the load among the mirror sites. It also brings the overhead of user "probing" behaviors [36].

The SWEB [37] project (University of California, Berkeley) of building scalable distributed web servers is based on the physical structure of Networks of Workstations (NOW). A heavily loaded server replies with an HTTP-redirection code and provides a new URL to which the client can resubmit its request. However, with redirection, one access operation may require two or more connections to different web servers, so this mechanism increases the response delay and network traffic in exchange for load sharing among a network of web servers. A variant of the HTTP-redirection approach is used in the scalable web server architectures project [38], where two levels of servers are used: redirection servers and normal HTTP servers. Data are partitioned according to their content and stored on different HTTP servers to achieve better caching efficiency. Redirection servers are used to distribute the users' requests to the corresponding HTTP servers using the redirection mechanism supported by the HTTP protocol.

The Domain Name Server (DNS) approaches map one server host name to a set of server IP addresses [39], so that a cluster of HTTP servers appears to be a single server to clients. Round-Robin or Weighted Round-Robin DNS resolution is used to distribute web requests across the cluster, and a distributed file system mechanism is employed to maintain a synchronized set of documents across the cluster. For example, there are seven IP addresses mapped to a single host name for the web site www.microsoft.com. The problem with this approach is the temporary load imbalance caused by bursts of requests from a client, or caused by caching effects at name servers. The DWS [40, 41] project extends the DNS based load sharing architecture by adding some optimization criteria (e.g., load balancing, minimization of the system response time, minimization of overloaded servers, and client proximity). Another variant of the DNS approach is to map a single server IP address onto several hosts [42] in a Local Area Network (LAN). The temporary load imbalance caused by bursts of requests from one client still remains unsolved.

Another project on building scalable distributed high performance web servers is being conducted at the IBM T. J. Watson Research Center. The reported techniques include server clustering, task distribution by TCP routing [43], and a server accelerator [44]. The TCP routing approach publishes the address of the server-side router. Every client request is sent to the router, which then dispatches the request to an appropriate server based on the load situation. The dispatching is performed by changing the destination IP address of each incoming IP packet to the address of a selected server. The TCP routing approach provides better load balancing and more flexibility than the DNS approach. The server accelerator is a dedicated file server providing fast response to a small set of popular files.
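To make the dispatching idea concrete, the following sketch shows a TCP-router style front end that rewrites the destination address of each incoming connection to a selected backend. This is an illustration of the general technique, not the IBM implementation; the backend addresses and the load metric are hypothetical.

    # Sketch of a TCP-router style dispatcher (illustrative, not the IBM code).
    # The router publishes one address; each incoming connection is forwarded
    # to the least-loaded backend by rewriting the packet's destination IP.

    class TcpRouterDispatcher:
        def __init__(self, backends):
            # backend address -> current load estimate (e.g., open connections)
            self.load = {addr: 0 for addr in backends}

        def pick_backend(self):
            # Least-loaded selection; weighted or round-robin policies fit here too.
            return min(self.load, key=self.load.get)

        def dispatch(self, packet):
            target = self.pick_backend()
            packet["dst_ip"] = target   # destination rewriting, as in TCP routing
            self.load[target] += 1
            return packet

        def connection_closed(self, addr):
            self.load[addr] -= 1

    # Hypothetical usage:
    router = TcpRouterDispatcher(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    forwarded = router.dispatch({"src_ip": "198.51.100.7", "dst_ip": "router"})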
Other efforts in building high performance web servers over high speed networks include the JAWS project conducted at Washington University [45], where a prototype high performance web server that adapts to traffic patterns was built.

2.6.3 Server QoS

Even though high performance web servers or server clusters are available, response times can still hardly be assured at all times under fluctuating workloads. The "bursty" nature and the rapid growth in the volume of web traffic make it expensive and inefficient to scale up network bandwidth and server capacity with the increasing demand. The third approach is to selectively provide resources and services based on the importance and resource requirements of a request, or to make the service aware of Quality of Service (QoS) requests. Guaranteed service quality can be maintained by reservation and pre-allocation of resources for value-desirable (premium) tasks. No matter how high the real workload is, those premium tasks enjoy dedicated service resources and thus are immune to the high load imposed on the servers. The analogous policy for Internet transmission is end-to-end resource reservation through the Resource ReSerVation Protocol (RSVP) [46]. One major drawback of the RSVP protocol is its poor scalability, since the number of RSVP control messages processed by each router is proportional to the number of QoS data flows going through the router. Similarly, the resource reservation approach is not appropriate for Internet servers owing to poor scalability and low utilization.

Strict QoS enforcement is sometimes too pessimistic and expensive, especially for soft real-time applications wherein occasional degradation of service quality is tolerable. Statistical QoS can instead be used in servers where a strict deadline guarantee is not necessary. Differentiated service (DiffServ) [47] has been proposed as an efficient and scalable alternative to the resource reservation approach discussed above, and it is expected to be supported in next generation Internet (NGI) communications [9]. DiffServ creates an "express-way" for high priority traffic by selective reallocation of network resources during peak workload periods. Similarly, prioritized services can be exploited in a service differentiated server to provide predictable services with statistical significance.

Recent studies on QoS support in web servers [3, 4, 6, 48] have addressed the technology of prioritized task processing in a web server and related performance issues. Bhatti and Friedrich [3] addressed the importance of server QoS mechanisms to support tiered service levels and overload management. A Web-QoS architecture prototype was developed by adding a connection manager module to the Apache [49] web server. Admission control and scheduling schemes to assure tiered services were also discussed.

Almeida et al. [4] attempted to augment a web hosting server implementation with differentiated QoS features. They explored priority-based request scheduling at both the user and kernel levels, and found that by controlling the number of processes, one can improve the response time of high-priority requests notably while preserving the system throughput. They also reported that the kernel-level approach tends to penalize low-priority requests less significantly than the user-level approach, while improving the performance of high-priority requests.
Eggert and Heidemann [6] evaluated application-level mechanisms to provide two different levels of web service, including limiting the process pool size, lowering process priorities, limiting the transmission rate, etc. Their results show that less than 17% of high priority tasks are downgraded in all the cases by limiting the sending rate of background responses, even without support from operating systems and networks. Pandey et al. [48] described a distributed HTTP server which enables QoS by prioritizing pages on a web site and allocating server resources based on the priorities.

The above research indicates that, through prioritization of tasks, it is possible to maintain high responsiveness of high priority tasks during high server load periods by blocking or dropping tasks with lower priorities. In fact, Hewlett-Packard recently introduced a web server helper product called WebQoS [50], which already has preliminary functionality for differentiated services on servers.

Other QoS assurance efforts for web servers include QoS-aware file content transcoding [51] and adaptive content delivery [52], which try to improve server throughput by downgrading the quality of web objects during high load periods.

2.6.4 Any Layer QoS Collaboration

The idea of SDIS is stimulated by the concept of "any layer QoS". The Open System Interconnection (OSI) model defines a networking framework for implementing protocols in seven layers. Control is passed from one layer to the next, starting at the application layer at one site, proceeding to the bottom layer, over the communication channel to the next site and back up the hierarchy. Most computer communication systems have layer structures comparable to the OSI model, although two or three OSI layers may be incorporated into one. Figure 2.4 maps the current Internet infrastructure onto the OSI model.

ATM (Asynchronous Transfer Mode) is a protocol that transmits data as fixed-size packets/cells (see http://www.atmforum.com/atmforum/index.html for details). It was designed to make Broadband-ISDN (B-ISDN) a reality. B-ISDN was aimed to function as a communication network that can provide integrated broadband services such as high-speed data service, video phone, video conferencing, and CATV services. ATM is now mostly deployed at the Datalink Layer to provide intermediate fast switching for the network backbone. The IP protocol works at the Network Layer, and the TCP/UDP protocols implement the functionality of the Transport Layer.

ATM is the first network protocol which defines classes of service (CoS) and quantifies QoS. Numerous studies on ATM QoS have been reported in the literature. Interaction between TCP/IP and ATM with respect to QoS assurance is limited, however. Efforts on QoS guarantees from ATM switches sometimes counteract the effects of DiffServ or IntServ. Consider a scenario in which a high priority IP packet with a low loss rate requirement is segmented into several ATM cells in an ATM network. If one of the cells is dropped by an ATM switch, which has no knowledge of the priority status of the IP packet, the entire packet has to be discarded at the destination due to incomplete transmission. There is a need to ensure that the characteristics of IP services based on the Differentiated Services (DS) architecture are maintained end to end across intermediate ATM networks [53].

Application layer QoS study, although still in its infancy, is desirable. Applications have a better understanding of data flow characteristics.
They can provide finer control over resource allocation and distribution. For similar reasons, application-layer QoS should be able to collaborate with network-layer QoS efforts to assure user-perceivable performance. For example, when a request comes to the network interface of a web server, it might carry QoS markers from lower network layer protocols, such as DiffServ or RSVP. The network interface should be able to respect the QoS markers and pass the QoS requirements up to the application layer. Requests can also be served on the basis of the priority associated with the client's IP address, although security issues need to be addressed to prevent forgery of IP addresses.

Figure 2.4 Mapping between the OSI Model and the Internet infrastructure. (Application, presentation, and session layers: e-mail, WWW, FTP; transport layer: TCP/UDP; network layer: IP; datalink layer: ATM, i.e., Asynchronous Transfer Mode.)

2.6.5 Service Differentiating Internet Servers

With the explosive growth of Internet users and capacity, the diversity in Internet service types and their resource requirements has also been increasing. New applications have emerged with much more demanding response time requirements. The demand for near real-time response is increasing, which can hardly be assured by a best-effort Internet server. The contemporary egalitarian service model of the Internet and its servers seems to be inadequate for the evolving demands. As the Internet grows, it becomes obvious that various Internet applications can benefit from quality of service provisioning and differentiation. The Service Differentiating Internet Server (SDIS) is proposed to complement the efforts of Internet QoS provisioning and accommodate the service requirements of evolving web applications. The QoS can be assured through prioritized processing, admission control, and scheduling schemes. During congestion, an SDIS should provide much better services to high-priority requests with minimal impact on the low priority requests.

When a request is sent to the server, the HTTP header provides further information on how to deal with the request; e.g., the URL of an object may provide hints of priority. The processing of a request in the SDIS can be classified as real-time with low rates of denial-of-service, real-time with delay bounds, or best-effort. Resource distribution and scheduling of each task can be done with respect to its classification. Client-side operating system and browser information are also exposed in the HTTP header, which provides hints of client-side network capacity. An SDIS server should be able to tailor the response information a client requested to meet the characteristics of the client's connectivity. The ability to tailor a web page is especially valuable for wireless web browsers and low bandwidth dial-up users. When a response begins to be delivered to the client side, the server should consider the task priority, the delay and delay jitter requirements, and the client-side bandwidth. Careful smoothing of the transmission eliminates congestion control overhead from the lower network layers.

This chapter introduced related work on Internet QoS and server performance. It also pointed out the shortcomings of the current best-effort service model of Internet servers. With the advances in Internet applications, more and more services can benefit from QoS provisioning by the Internet and its servers.
However, the appropriate model for a service differentiating Internet server is still unclear. In the next chapter, we study the workload characteristics of web servers. The characterization of server workload helps clarify the design emphases of a QoS-aware server.

CHAPTER 3

SERVER WORKLOAD CHARACTERIZATION

3.1 Overview

The success of the web is largely due to its capability of providing quick and easy access to a large variety of information from sites all over the world. The variety of information and its behavior, however, causes unpredictability in retrieval latency, which frustrates web users. Characterization of server workload is essential to the study of service optimization and improvement. Several studies have been done on characterizing web traffic, especially the static behavior of web objects, including file size distribution, traffic distribution, file type distribution, etc. Pitkow [54] gives an excellent survey of workload characterization studies in recent years. However, little work on dynamic characteristics, such as the lifetime and modification behaviors of web objects, has been reported in the literature. Chankhunthod et al. [18] periodically sampled 4600 web objects distributed across 2000 web sites and calculated the mean lifetime of these objects. Although they indicated that the lifetime values of web objects vary widely, the lifetime distribution was not given in the paper. Gwertzman and Seltzer [55] mixed the statistical results of web objects from educational and commercial web sites, which concealed possible influences from the web environments. The rate of change of web objects in two corporate communities was studied in [56]. Such aggregated results might not be a true reflection of the characteristics of either kind of site.

Workload characterization with respect to the server environment is also needed. We believe that different performance optimizations should be conducted in different web server environments due to their different service objectives and information characteristics. For example, caching data copies along the retrieval path is regarded as a good technique for reducing both the network traffic and the server load, thus improving the retrieval latency of web objects. One problem existing in caching techniques is data consistency. If cached data copies are not consistent with data in the original web server, stale information will be sent back to web users, which impairs the value of web caching. Thus the modification behavior of web information should be studied to improve data consistency. However, the dynamic behavior of web objects largely depends on the context and environment of a web site.

In the following research, we study the relationship between the characteristics of web objects in different classes of web sites with respect to object types, and the relationship between static and dynamic characteristics of web objects. Characterization of the lifetime and modification of web documents could provide reference data for the design of caching and prefetching techniques. An efficient cache consistency policy may reduce unnecessary client polling, thereby reducing server load, network traffic, and retrieval latency. This research shows that the lifetime and modification behavior of a web document is influenced not only by its type, but also by the usage environment of the web site where it is placed.

The rest of the chapter is organized as follows.
The web traces used in the study are introduced in Section 3.2. Section 3.3 summarizes the traces and the data analysis methods. Section 3.4 compares and analyzes the web object characterization results. The inferences derived from the study and their impact on design issues are outlined in Section 3.5, followed by concluding remarks in Section 3.6.

3.2 WWW Traces

Access logs of nine web servers and proxy servers are used as traces in this research. Each line in an access log contains information about a single request for a document. A common log entry is in the form of:

    HostName - - (UserName)¹ [Timestamp] "Request-Method URL (HTTP-Version)" Status-Code Size

HostName is the name of the remote computer sending out requests on behalf of the user. UserName is an authenticated user name when the User Authentication scheme is used. Following that is the timestamp of the connection. There are three Request-Methods in the access logs: GET, POST, and HEAD. A GET request asks for a document from the server or triggers the execution of a program which produces the output to be sent to the client; it is used for standard page requests. The POST method is used to send formatted input data to a program on the server; the execution of the program produces output that is sent back to the client. The HEAD method is used for testing. Status-Code is a three-digit number which describes the retrieval status of a transaction. Success codes are in the 200s; redirect codes are in the 300s; failure codes are in the 400s; and server error codes are in the 500s. Size refers to the size of the response content.

¹Contents in () are optional.
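As an illustration, entries in this common log format can be parsed with a short script such as the following sketch. The regular expression and field names are our own and are not part of any particular trace-processing toolkit; the example entry is made up for illustration.

    import re

    # Sketch of a parser for the common log format described above.
    LOG_PATTERN = re.compile(
        r'(?P<host>\S+) - (?P<user>\S*) \[(?P<timestamp>[^\]]+)\] '
        r'"(?P<method>GET|POST|HEAD) (?P<url>\S+)(?: (?P<version>[^"]+))?" '
        r'(?P<status>\d{3}) (?P<size>\d+|-)'
    )

    def parse_entry(line):
        m = LOG_PATTERN.match(line)
        if m is None:
            return None                       # malformed or truncated entry
        entry = m.groupdict()
        entry["status"] = int(entry["status"])
        entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
        return entry

    # Hypothetical log entry:
    example = '35.9.20.1 - - [03/Feb/1998:10:21:04 -0600] "GET /index.html HTTP/1.0" 200 4096'
    print(parse_entry(example)["url"])        # prints: /index.html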
3.3 Experimental Study

We have collected trace files from three classes of web servers: educational (EDU), commercial (COM), and news (NEWS). Our objective in this study is to find out the influence of the usage environment on the static and dynamic characteristics of different types of web objects.

3.3.1 Summary of Traces

One educational trace is the access logs from the Department of Electrical and Computer Engineering of Iowa State University. Two other educational traces come from the Internet Traffic Archive (ITA) [57]. The traces from ITA have been used earlier for web server workload characterization in [58]. Commercial and news traces are extracted from the access logs of a proxy server in the National Laboratory for Applied Network Research (NLANR) [59].

EDU class: The three traces used for the EDU class are: Calgary (access logs from a Computer Science departmental web server), Saska (access logs from a university web server), and ISU (access logs from an Electrical and Computer Engineering departmental web server). Table 3.1 summarizes the traces in the EDU class.

COM class: Due to the difficulty of collecting data from commercial web servers directly, the traces for three commercial web servers are extracted from NLANR proxy traces. MS (www.microsoft.com), Compaq (www.compaq.com), and Disney (www.disney.com) were chosen as representatives of commercial web servers. Table 3.2 summarizes the traces in the COM class.

NEWS class: Traces used for the news class also come from the proxy traces of NLANR. CNN (www.cnn.com) and Usa (www.usatoday.com) are two popular online news web sites. Weather (www.weather.com) provides an online national weather service. Table 3.3 summarizes the traces in the NEWS class.

Table 3.1 Summary of logs from EDU class.

  Item             Calgary    Saska      ISU
  Log Duration     353 days   214 days   97 days
  Start Date       10/24/94   06/01/95   02/03/98
  Successful Req.  696,619    2,385,866  1,189,376
  Avg. Req./Day    1,974      11,149     12,262
  Distinct Req.    8,384      8,097      8,603
  Traffic (MB)     7,758      12,637     7,338
  Bytes/Day (MB)   21.98      59.05      75.65
  Bytes/Req. (KB)  11.13      5.30       6.17

Table 3.2 Summary of logs from COM class.

  Item             MS         Compaq     Disney
  Log Duration     68 days    68 days    68 days
  Start Date       03/14/98   03/14/98   03/14/98
  Successful Req.  347,589    103,486    91,397
  Avg. Req./Day    5,112      1,522      1,344
  Distinct Req.    30,716     22,726     17,514
  Traffic (MB)     2,096      665        960
  Bytes/Day (MB)   30.82      9.78       14.12
  Bytes/Req. (KB)  6.03       6.43       10.51

Table 3.3 Summary of logs from NEWS class.

  Item             CNN        Usa        Weather
  Log Duration     68 days    68 days    68 days
  Start Date       03/14/98   03/14/98   03/14/98
  Successful Req.  127,203    147,104    73,267
  Avg. Req./Day    1,871      2,163      1,077
  Distinct Req.    9,906      7,841      2,594
  Traffic (MB)     947        834        571
  Bytes/Day (MB)   13.93      12.26      8.40
  Bytes/Req. (KB)  7.45       5.67       7.80

3.3.2 Document Types

The traces are categorized into 10 different types according to the nature of web documents. The type of a document is distinguished by its URL or by the request method used. For example, the suffix of a file name often tells the type of a file, and a POST request always indicates a form input. The 10 types used in our study are listed in Table 3.4.

Table 3.4 Types of web documents.

  Index  Type  Explanation
  0      TXT   Plain text
  1      HTM   HTML files
  2      ARC   Formatted text files or archives
  3      AUD   Audio files
  4      GIF   GIF files
  5      JPG   JPEG and other image files
  6      FRM   Form input and responses
  7      VDO   Video files
  8      EXE   Executables and script files
  9      OTH   Others

Tables 3.5, 3.6, and 3.7 list the proportion of accesses to different document types in the three server classes. Acc% is the access percentage of each type of file, and Doc% is the percentage of unique URLs of each document type in a web server. AccFrq is the relative access frequency of a web document with respect to the average access frequency of HTM files. The reason for using the relative measurement AccFrq is to improve the comparability between a busy and a less busy web server and get a fair average result. AvgTrf is the average transfer size in kilobytes of each object type, and AvgSiz is the average document size in kilobytes of each object type.

HTM, GIF, and JPG are the top three popular types. The document (unique URL) percentage and the access percentage for the top three file types, however, are not the same in each web server class. Compared to the other two classes, there are more unique HTM documents in the EDU class, more unique GIF documents in the COM class, and many more JPG documents in the NEWS class than in the other classes. The average HTM document size in the NEWS class is about twice that in the COM class, which in turn is about twice that in the EDU class.

Table 3.5 Access pattern of different types in EDU class.

  Type    TXT    HTM   ARC     AUD     GIF    JPG    FRM   VDO     EXE    OTH
  Acc %   0.3    41.5  0.6     0.0     45.6   5.4    5.7   0.0     0.1    0.7
  Doc %   1.3    49.0  3.4     0.1     34.3   4.0    0.8   0.4     0.3    6.3
  AccFrq  0.25   1.00  0.16    0.43    1.79   1.35   50.0  0.11    0.59   0.13
  AvgTrf  43.67  5.66  246.90  72.82   6.28   20.86  2.23  621.82  5.97   58.76
  AvgSiz  25.37  4.63  278.60  170.56  14.51  43.09  4.72  751.39  46.66  58.76

These differences suggest that even web documents of the same type have various average sizes in different server environments. In the following experiments, we also observe that the HTM files in the NEWS class are much more mutable than those in the other two classes.
Here the documents' size and lifetime characteristics indicate the same trend of caching gain, and thus a size-based caching technique could be adopted [60]. However, our study of the size distribution in different classes of web sites shows that this inference is not true in all cases. The study reported here provides more insight into caching and removal policies.

Table 3.6 Access pattern of different types in COM class.

  Type    TXT    HTM    ARC     AUD     GIF   JPG    FRM   VDO      EXE     OTH
  Acc %   0.0    21.1   0.1     0.3     68.0  6.2    0.0   0.1      2.6     1.6
  Doc %   0.1    36.6   0.3     0.5     50.6  8.8    0.1   0.2      2.2     1.2
  AccFrq  0.66   1.00   0.43    0.65    2.44  1.22   0.22  0.56     1.79    2.05
  AvgTrf  54.97  13.00  388.24  111.10  3.16  10.87  1.99  964.62   77.56   33.11
  AvgSiz  46.90  10.47  346.41  186.57  5.61  14.81  2.13  1320.05  239.58  76.49

Table 3.7 Access pattern of different file types in NEWS class.

  Type    TXT   HTM    ARC    AUD     GIF   JPG    FRM   VDO      EXE     OTH
  Acc %   1.6   16.5   0.0    0.0     65.0  13.5   0.1   0.1      1.9     1.4
  Doc %   0.0   40.2   0.0    0.1     35.7  21.1   0.1   0.4      0.8     1.4
  AccFrq  50.0  1.00   2.08   0.61    5.0   2.38   0.88  0.36     5.0     1.82
  AvgTrf  1.05  19.04  38.52  265.11  4.59  10.16  8.22  904.66   31.25   13.72
  AvgSiz  1.49  18.10  34.02  235.40  7.93  9.77   7.48  1002.64  103.17  27.44

3.3.3 Lifetime Calculation

In the lifetime calculation procedure, an access request is treated as a client polling, and the corresponding access log records the result of the polling. A status code of 304 indicates that a requested document has not been modified since the last time the same client requested it. A status code of 200 indicates that either the data copy in the client's local cache is stale or has been swapped out from the cache, or the client requested the document for the first time, and that the document content was sent back to the client successfully.

Since the modification timestamp of a document is unavailable from standard access logs, it is difficult to tell directly whether a cached data copy is stale. File size is the only information about a web document in a log besides the URL. We rely on the assumption that most modifications result in a change of file size. To validate this assumption, modifications of web documents in a departmental web server at ISU were monitored over a one month period. The status of each web document was polled twice a day, and the timestamps of last modification and the file sizes were compared. Statistical results showed that more than 95% of modifications resulted in a change of file size. Using file size as an indicator of modification is thus a reasonable estimation.

If a web document is requested frequently, it is assumed that its size change will show up in the access logs within a short period. This assumption has high credibility in the EDU class, since traces in the EDU class are access logs from the original web servers, where the most frequent polling records are available. For traces in the COM and NEWS classes, high frequencies are ensured by the combination of multiple users' requests and the selection of "hot" web servers. The assumption is based on the fact that the ratio of accesses with status Not-Modified (304) to accesses with status of successful retrieval (200) is generally high, especially in proxy trace files. A high percentage of Not-Modified responses means that the access interval and the default expiration time of web documents are much shorter than their real lifetimes. In other words, any change of a web document can be found within a short period of time by frequent client polling. Table 3.8 shows the percentage of 304 responses to successful 200 responses.
Avg% is the average 304/200 ratio of the web servers in the same class.

The lifetime of a web document is approximated by counting the interval between two consecutive retrievals of the same document name but with different sizes. Here the same document refers to documents having the same URL, and a lifetime of a document is an interval between two successive modifications of the same URL. The jth lifetime of a document i, denoted as LT_ij, can be obtained from

    LT_ij = MT_i(j+1) - MT_ij,    (3.1)

where MT_ij denotes the time of the jth modification of document i.

A couple of factors appear to distort the results in our study. Some documents might never change during the period of trace recording. The longest observed lifetime of a document is made equal to the duration of the trace files, which might be shorter than the real lifetime of the document; the average lifetime would thus be shortened by the limit of the trace duration. However, the observed lifetime distribution is not affected by that limit. Another factor is that the accuracy of the results is influenced by the user reference patterns: the results for frequently accessed documents are more accurate than those for less accessed ones. These two distorting factors are compensatory in nature. Furthermore, the experimental results for the average lifetime of web documents in the EDU class are consistent with previous findings in [55], which suggests that the influences of the above two factors are negligible.

Table 3.8 Not-modified vs. Get retrieval.

  Class  Server   304 #    200 #      304/200  Avg
  EDU    Calgary  97,560   566,833    17.2 %   23%
         Saska    151,607  1,032,966  14.7 %
         ISU      303,474  846,442    35.9 %
  COM    Compaq   37,834   65,002     58.2 %   53%
         MS       128,236  208,765    61.4 %
         Disney   24,802   62,851     39.5 %
  NEWS   CNN      54,835   67,045     81.8 %   56%
         Usa      27,106   119,557    22.7 %
         Weather  27,248   44,025     61.9 %

3.4 Results

In this section, we characterize the nature of different types of web documents with respect to the three classes: EDU, COM, and NEWS.

3.4.1 Access Frequency vs. File Size

Figure 3.1 shows the average transfer bytes vs. the average size of HTML files in the three classes. Very little difference exists between the average transfer bytes (the access-percentage-weighted average document size) and the average document size for an HTML file in all the classes. This suggests that there is no obvious tendency for small HTML files to be more likely to be accessed. To some extent, the sizes of frequently accessed files are even bigger than the average HTML file size.

Figure 3.1 Traffic vs. doc. size of HTM. Figure 3.2 Traffic vs. doc. size of GIF. (Both figures plot average transfer bytes and average document size, in KBytes, for the EDU, COM, and NEWS classes.)

Unlike HTML documents, frequently accessed GIF documents tend to be small, as observed in Figure 3.2, where the average transfer bytes are smaller than the average file sizes. This tendency is more obvious in the EDU class than in the other two classes. Although the average GIF document sizes are different in the three classes, the average transfer sizes of GIF documents are about the same. Our observations are consistent with [61], but different from [62], in which the author claimed that frequently accessed HTML documents also tend to be small. Considering the overhead of TCP connections, giving caching priority to small GIF documents is a cost efficient policy in a limited sized web cache.
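Before turning to the lifetime results, the estimation procedure of Section 3.3.3 can be summarized in code. The sketch below infers a modification whenever two consecutive successful (200) retrievals of the same URL report different sizes, and records the interval LT_ij between inferred modifications. It assumes numeric timestamps and the parsed log fields shown earlier; the helper names are ours.

    from collections import defaultdict

    # Sketch of the lifetime approximation of Section 3.3.3: a size change
    # between two consecutive 200 retrievals of a URL is treated as a
    # modification, and each lifetime LT_ij is the interval between the
    # inferred modification times.

    def estimate_lifetimes(entries):
        last_size = {}                  # url -> size at the last 200 retrieval
        mod_time = {}                   # url -> time of last inferred modification
        lifetimes = defaultdict(list)   # url -> [LT_i1, LT_i2, ...] in seconds
        for e in sorted(entries, key=lambda e: e["timestamp"]):
            if e["status"] != 200:
                continue                # 304 responses confirm freshness, no size info
            url = e["url"]
            if url not in last_size:
                mod_time[url] = e["timestamp"]    # first observation of this URL
            elif last_size[url] != e["size"]:
                # Size changed: record the elapsed lifetime and a new modification.
                lifetimes[url].append(e["timestamp"] - mod_time[url])
                mod_time[url] = e["timestamp"]
            last_size[url] = e["size"]
        return lifetimes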
3.4.2 Average Lifetime

The average lifetimes of the web documents in the different classes were compared and are summarized in Table 3.9. The results show that the documents in the EDU class are much more stable than those in the other two classes. The average lifetime of a document in the EDU class is about 60 days. Web documents in the NEWS class are the most mutable, with only 2 to 3 days of average lifetime. The lifetime of a typical web document in the COM class is about 8 to 10 days.

Table 3.9 Average lifetime (unit: days).

  File Type  TXT   HTM   ARC   AUD   GIF   JPG   FRM  VDO   EXE   OTH
  EDU        56.9  61.5  21.8  13.6  63.4  55.6  N/A  10.3  29.7  45.6
  COM        10.2  8.5   3.9   9.6   8.6   9.3   N/A  9.2   7.7   10.0
  NEWS       8.5   2.9   2.8   2.2   4.4   2.6   N/A  2.0   7.6   4.8

There is no obvious difference between the HTML and GIF average lifetimes within the EDU and COM classes, respectively. In the NEWS class, the average lifetime of GIF files is about 50% longer than that of HTML files. However, the average access frequency of a GIF file during its lifetime is 2 to 5 times that of an HTML file. Audio and video clips are usually expected to be more stable than other documents. Contrary to our expectations, the results show that the lifetime characteristics of audio/video files do not differ significantly from those of other types of files.

3.4.3 Modification Distribution

The relationship between a document's lifetime and its access frequency is examined. The average lifetime, modification times, and accesses during the lifetime of the most popular files are collected. Web documents are sorted by their access frequency; the most popular files are defined as the small set of files that are responsible for 80% of the retrieval requests. The distributions of modification times versus document percentage and access percentage are collected and shown in Figures 3.3, 3.4, and 3.5.

In each of these figures, there are four diagrams. The first shows the document distribution over all file types; the second shows the percentage of documents with different modification frequencies for each type of file; the third shows the access distribution over all file types; and the last shows the percentage of accesses with different modification frequencies for each file type. The figures indicate the distribution mode of each web object type.

Figure 3.3 Modification distribution in EDU class. (Four panels: document distribution, document distribution vs. modification times, access distribution, and access distribution vs. modification times.)

Figure 3.3 suggests that some popular documents, especially GIF documents, are seldom modified in the EDU class. A similar observation was reported by Bestavros in [63]. The average lifetime is about 2 months for a web document in the EDU class. On the other hand, the 1.3% of the HTML files that are frequently updated accounted for 42% of the access requests, which suggests that users have a tendency to access more frequently modified data.

Figure 3.4 Modification distribution in COM class. (Same four panels as Figure 3.3.)
Figure 3.4 shows that more than 94% of HTML documents in the COM class have an average lifetime of more than 10 days. However, the access percentage to those documents is less than 50%. On the other hand, 0.2% of HTML documents are modified on a daily basis, and the access requests to those documents are about 17% of the total accesses. This observation suggests that Bestavros' conclusion [63] about the popularity of a document being inversely proportional to the possibility of its modification is not valid for the HTML documents in the COM class. GIF documents in the COM class also show that more popular documents are more mutable. The documents modified 11 to 50 times are responsible for 38% of the access requests, and the 1% of documents that are modified more than 50 times are responsible for 25% of the total requests.

Figure 3.5 Modification distribution in NEWS class. (Same four panels as Figure 3.3.)

Figure 3.5 shows that most references in the NEWS class are to documents with more than 50 modifications. Considering the total trace collection period of 68 days, the documents with lifetimes of 1 day or less account for most of the requests. This behavior is expected of the NEWS class. 2.7% of the HTML documents in the NEWS class are modified more than 50 times, and they constitute 50% of the access requests. Although there are some rarely updated HTML documents, access times do not seem to increase with the lifetime of those documents. For the GIF documents in the NEWS class, the more popular a document is, the shorter its lifetime.

3.4.4 Total Lifetime Distribution

In this research, documents with the same URL in trace files are considered to be the same document, and the total lifetime of a web document i, TL_i, is defined as the duration between the first access, denoted as MT_i1, and the last access, denoted as MT_in, to the same URL. In other words, the total lifetime of a document is equal to the sum of the individual lifetimes of the document, where an individual lifetime is defined as the duration between any two successive modifications of the document. TL_i can be expressed as

    TL_i = MT_in - MT_i1 = sum_{j=1..n-1} LT_ij.    (3.2)

Figure 3.6 depicts the total lifetime distribution of three major types of documents (HTML, GIF, and JPG) in the NEWS class. It can be observed from Figure 3.6 that, among the popular files, the majority of web documents exist for only a short period. A specific kind of document seems to be very popular in the NEWS class, characterized by a long total lifetime with frequent modifications. Although the percentage of such documents is not very high, they are responsible for a high percentage of the access requests. For example, during the trace collection period of 68 days, the front page of CNN (http://www.cnn.com/) was accessed 5,114 times and was modified 4,867 times. To most individual users, this is also a kind of "uncacheable" document due to its frequent modification behavior.
An interesting phenomenon presented in Figure 3.6 is that the total lifetimes of documents and access requests are almost uniformly distributed, except for short-lived documents and frequently updated documents.

Figure 3.6 Total lifetime distribution of documents in the NEWS class. (Three panels, for HTML, GIF, and JPG/other graphic documents; each plots document percentage, access percentage, modification times, and average traffic in KB against total lifetime in days.)

However, the modification times of a document tend to increase with its total lifetime TL_i. Considering that the average lifetime of a document is equal to its total lifetime divided by the number of modifications, we can conclude that the average lifetime and the access frequency of a document do not increase linearly with respect to its total lifetime. If the cache hit frequency is treated as the value of a cached document, the caching value of a document reduces with its age in the cache. These analyses suggest that traditional caching policies may not be suitable for retrieving web documents from the NEWS web sites.

The total lifetime distribution of documents in the COM class was also analyzed (the results are not shown due to space limitations). There are also short-lived documents in the COM web sites, with a total lifetime close to 2 days, a day longer than that of the NEWS class. The document percentage and access percentage of the short-lived documents are significantly lower than in the NEWS class. The same situation occurs for the frequently updated documents: their modification frequency and access percentage are also much lower than those of the NEWS class.

3.5 Design Issues

3.5.1 Document Classification

Based on the access, lifetime, and modification characteristics, web documents can be classified into four categories: highly mutable documents, stable documents, short life documents, and others. The existence of highly mutable documents and stable documents was also reported in [55].

A highly mutable document is defined as a document with frequent modifications, e.g., the front page of some web sites. Such documents are usually accessed more frequently than other documents; see Table 3.10. Conventional caching policies such as LRU or LFU might be insensitive in distinguishing highly mutable documents, and lead to a high percentage of stale data in caches. Instead, prefetching or server-pushing based data dissemination might be a suitable option for retrieving documents from web sites with a high percentage of highly mutable documents.

Table 3.10 Highly mutable document ratio.

  Environment  Doc %  Acc %
  EDU          10     20
  COM          3      25
  NEWS         33     80

A stable document is in a steady state with a relatively long lifetime.
Some of them are accessed frequently, while most stable documents have a relatively low access rate. In the EDU class, 44% of the HTML and 78% of the image documents never change during the whole trace period; 20% of the HTML file accesses and 80% of the image file accesses are to those stable documents. Nearly 30% of the HTML documents in the COM class were never modified during the trace period and were responsible for 18% of the accesses.

Short life documents are accessed, or exist in the traces, for only 1 or 2 days. Nearly 1/3 of the documents in the NEWS class are short life documents, and there are about 20% short life documents in the COM class. Short life documents should be distinguished and removed from the local cache to save space for potentially useful data.

3.5.2 A Two-State TTL Algorithm

Keeping highly mutable and short life objects in a cache sometimes does not help to increase the cache hit rate; rather, it could cause potentially useful (to be retrieved in the near future) data copies to be deleted from the cache and thereby increase the number of requests to the original server. Most web caching policies fail to deal efficiently with all classes of web documents. For example, LFU is not suitable for distinguishing short life documents [64], and LRU is not effective for highly mutable documents.

A good cache consistency mechanism helps to prevent a cache from returning stale data as well as improving the cache efficiency. For instance, highly mutable and short life documents are granted a short time-to-live (TTL). When the TTL expires, the document is treated as stale and needs to be retrieved again from the original server. In case of cache replacement, they are treated as expired data and discarded to save space for potentially useful data. This is especially important where clients are personal computers with limited cache space. At the same time, the TTL for stable documents should not be set so short as to introduce bulks of "conditional get" requests, i.e., GET requests with If-Modified-Since fields.

Here, we present a two-state TTL consistency algorithm, which is depicted in Figure 3.7. When a new data copy is cached for the first time, it is set in a transient state with a short TTL, say less than 1 hour. If the data copy is still valid after the short TTL, it is switched into a steady state with a relatively long TTL. This long TTL can be decided based on statistical measurements, or as a portion of the time elapsed since the last modification time. A sketch of the algorithm is given below.

Figure 3.7 The two-state TTL consistency algorithm. (A new copy enters a transient cache with a short TTL; a copy that passes the transient state moves to a steady cache with a long TTL; a copy that expires or is mispredicted returns to the transient state.)

We set up a simulation using a cache simulator from Virginia Tech [60], and adapted it by integrating three consistency algorithms, i.e., adaptive TTL, fixed TTL, and two-state TTL, into its caching policies. In the two-state TTL algorithm, the cache space is divided into two equal areas for transient and steady state documents, respectively. New cached copies only replace documents in the transient state; older steady state documents are replaced by newer steady state documents during cache replacement. The traces from the EDU³, COM, and NEWS⁴ classes are replayed to roughly check the effectiveness of the two-state TTL algorithm. The cache hit rate, the conditional get requests with not-modified responses, and the amount of stale data returned are collected and compared.

³We use the web client traces from Virginia Tech to represent the EDU class.
⁴We use the NLANR proxy traces to represent the COM and the NEWS classes.
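The following is a minimal sketch of the two-state algorithm, assuming the object's last-modification time is known at revalidation. The constants and method names are illustrative; the steady-state TTL is computed adaptively as half the time elapsed since the last modification, one of the options mentioned above.

    import time

    TRANSIENT_TTL = 15 * 60     # a short transient TTL, e.g., 15 minutes

    class TwoStateEntry:
        # Cache entry following the two-state TTL consistency algorithm.

        def __init__(self, last_modified):
            self.state = "transient"
            self.ttl = TRANSIENT_TTL
            self.cached_at = time.time()
            self.last_modified = last_modified

        def is_fresh(self):
            return time.time() - self.cached_at < self.ttl

        def revalidated(self, modified, last_modified=None):
            # Called after a conditional GET once the TTL has expired.
            now = time.time()
            if modified:
                # Misprediction: fall back to the transient state and short TTL.
                self.state, self.ttl = "transient", TRANSIENT_TTL
                self.last_modified = last_modified or now
            elif self.state == "transient":
                # The copy survived the transient state: promote it to the
                # steady state with an adaptive TTL (1/2 of the time since
                # the last modification).
                self.state = "steady"
                self.ttl = (now - self.last_modified) / 2
            self.cached_at = now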
In the fixed TTL algorithm, the default TTL is set to 1/4 of the average lifetime in that class. In the adaptive TTL algorithm, the TTL is set to 1/2 of the time elapsed since the last modification. In the two-state TTL algorithm, the transient TTL is set to 15 minutes in the NEWS class, 12 hours in the COM class, and 24 hours in the EDU class, and adaptive TTL is used in the steady state. The performance comparison of the three cache consistency algorithms, fixed TTL, adaptive TTL, and two-state TTL, is shown in Figure 3.8.

Figure 3.8 Performance comparison of three cache consistency algorithms. (Bars compare fixed TTL, two-state TTL, and adaptive TTL on misprediction rate, mispoll rate, and cache hit rate.)

The results indicate that less than 1% stale data is returned in the EDU class irrespective of the consistency policy. In the COM and NEWS classes, using fixed TTL returns about 19.8% stale data and 10.9% conditional get requests with not-modified responses. Adaptive TTL performs best in keeping cache consistency, returning about 7.3% stale data but 25.4% unnecessary conditional get requests. Two-state TTL decreases the unnecessary conditional get requests by 3.1% compared to the adaptive TTL algorithm at the cost of returning 0.9% more stale data. Furthermore, two-state TTL increases the cache hit rate by an average of 2.8% compared to the other two TTL consistency algorithms for a cache size of 5 MBytes. This shows that the two-state TTL algorithm does help to improve caching efficiency. Detailed performance evaluation can be done by varying the configuration and workload.

3.5.3 Caching and Prefetching

Due to the explosive growth of the Internet, many efforts have been directed towards the implementation of client/proxy caching [15, 16, 17, 18, 19], server caching [22, 23, 24], prefetching [20, 21], and replication [26, 27] techniques to reduce network latency and the server load. However, there have been limited studies reported on the integration of workload characterization and the design of caching techniques in the web environment.

In the context of caching, there exist significant differences between a web environment and a traditional distributed computing system. The workload characteristics in the two environments differ significantly. Most web caching algorithms are transplanted from distributed file systems and do not consider the static and dynamic characteristics of documents on the web. For example, the number of documents on the web is much higher than in traditional distributed file systems, a large quantity of short life documents exists on the web, and accesses to web documents tend to be bursty [65]. Caching algorithms should be developed targeting the characteristics of web documents and specific classes of environments.

Our study has shown that significant differences exist in both the static and dynamic characteristics of web documents in different web environments. Caching algorithms should be adjusted to these characteristics to gain better performance. The average lifetime of documents in the educational web sites is significantly higher than in the commercial and news web sites. A high percentage of stable image files with high access rates exists on the educational web sites. They are the best candidates for conventional caching algorithms.
Special priority should be given to graphic documents, since graphic documents are seldom changed compared to the other types of documents, and users' tendency to retrieve small graphic documents improves cache efficiency. In [60], the authors have presented a good evaluation of caching algorithms suited for documents in the educational web sites.

For documents in the commercial web sites, about 15% of the GIF documents and nearly 30% of the HTML files have short lifetimes with high access frequencies. LRU combined with a two-state expiration time filters out these documents effectively. In a commercial web site, there are also some frequently updated documents with very high access frequency. The assumption that popular files are modified less frequently is not valid here. Prefetching with server invalidation should be adopted to deal with these highly mutable documents; two possible types of prefetching schemes reported in [21] could be examined in these situations. The access frequency of documents on the commercial web sites tends to decrease with the total lifetime for most stable documents. In such cases, LRU might outperform LFU replacement schemes, which can explain why some trace driven simulation results are not consistent across web caching studies [64, 60, 19].

Most documents in the news web sites are highly mutable, and many are accessed only within a short period. Except for large proxy servers, news documents are "not worth caching" for personal web users. Instead, server initiated push caching as in [27] might be used to gain better performance. For the less mutable documents in the news web sites, the TTL should be set significantly shorter than that for the other classes of web sites.

3.6 Summary

Different types of web objects from different sites have different static and dynamic characteristics. In this study, we have reported an extensive study on the size, access, lifetime, and modification behavior of web documents from various web sites, and tried to find rules for improving caching schemes with knowledge of web environment dynamics. Some of the major observations derived from our study are itemized as follows.

- The document size, access pattern, and modification behavior depend on the document type as well as the server environment. The TTL estimation algorithm should be adapted to the document type and/or the server environment. A two-state TTL weak consistency algorithm based on the multi-mode characteristics of web documents, introduced in this study, might outperform current fixed TTL or adaptive TTL algorithms.

- A generalized caching and prefetching technique may not be effective for a diverse WWW environment. Rather, server-specific or data-specific caching and prefetching techniques should be adopted. Cache consistency policy should be taken into consideration in caching schemes to improve the cache efficiency.

- Several additional interesting behaviors of different types of web documents in different classes of servers have been reported in the study. In the COM and NEWS classes, there is a strong trend that users access highly mutable documents more frequently than other documents. The results suggest that the performance of an optimization scheme is influenced by the class of web server as well as by the type of web document. Thus, caching, prefetching, server initiated replication, and other service optimization techniques should be designed such that they exploit these varied characteristics.
CHAPTER 4

SERVICE DIFFERENTIATING INTERNET SERVERS

4.1 Overview

The existing best-effort service model of Internet servers, with First Come First Serve (FCFS) scheduling, leads to the misallocation of scarce and expensive network and CPU resources during heavy load periods, and thus causes unpredictable response delays, which is not acceptable for time critical transactions. The problem of unpredictability in response time becomes extremely serious during information retrieval from the Web, since more and more applications with time constraints are using the Web as their distributed interface. Upgrading hardware to a faster CPU, a more efficient OS, and broader I/O bandwidth is always a solution for improving the response time and throughput of Internet servers. However, the "bursty" nature and the rapid growth of the Internet traffic volume make it expensive and also inefficient to scale up network bandwidth and server capacity with the increasing peak demand. Previous studies [58, 62, 66] have shown that the peak workload of a web server may exceed its average load by orders of magnitude. Thus the server utilization would be very low if we designed the system to satisfy the workload requirements of peak periods.

Another approach to assuring service quality is to pre-allocate resources for value desirable tasks. No matter how high the real workload is, those cherished tasks enjoy dedicated system resources. The analogous policy for Internet transmission is end-to-end resource reservation through the Resource ReSerVation Protocol (RSVP) [46] scheme. One major drawback of the RSVP scheme is poor scalability, since the number of RSVP control messages processed by each router is proportional to the number of flows going through the router. Similarly, the resource reservation approach is not appropriate for Internet servers because of poor scalability, low system utilization, and/or long setup delay.

The differentiated services approach, known as DiffServ [47] in the IETF community, has been proposed as an efficient and scalable solution to provide better service for next generation Internet (NGI) communication [9]. DiffServ creates an "express-way" for high priority traffic by selective reallocation of network resources during peak workload periods. Similarly, prioritized services can be exploited by a Service Differentiating Internet Server (SDIS) to provide predictable services with statistical guarantees. In particular, the response delay of mission critical requests can be bounded by allocating CPU and I/O resources with respect to their priorities. Our research objective is to design Internet servers that can provide fast response to high priority tasks and minimize the performance penalty imposed on low priority tasks by service differentiation, without degrading the overall system throughput. In this study, we present a model of SDIS and evaluate its performance through analysis and simulation. During congestion, an SDIS provides much better services to high-priority requests with minimal impact on the low priority requests. We have also analyzed several features of SDIS, such as admission control, task scheduling, the degree of priorities, and the overheads. The impact of some of these issues on SDIS performance is quantified in our study. The results demonstrate the feasibility and performance advantages of SDIS.

The remainder of this chapter is organized as follows. Section 4.2 describes policies for providing differentiated services from an Internet server.
The server model, admission control, and scheduling issues are discussed in Section 4.3. Section 4.4 presents an analysis of waiting time, the simulation environment is described in Section 4.5, and the results are presented and analyzed in Section 4.6. Section 4.7 describes related work, followed by concluding remarks in Section 4.8.

4.2 Service Differentiation

A traditional Internet server processes requests in FCFS manner. During a high load period, each task has to wait in a queue for a long time before getting service. Overhead from tasks competing for limited resources, such as open connections and network bandwidth, increases. "Retries" from impatient clients worsen the load situation and cause a "snowball" effect. More elaborate resource allocation schemes than the best-effort model need to be adopted to provide predictable services in high load periods. In this section, policies of differentiating services that could be deployed by an Internet server, especially a web server, are discussed.

4.2.1 Prioritized Services

In the DiffServ model for the Internet, different priorities are assigned to packets; high priority packets enjoy timely processing and assurance of service. Similarly, at the server, an incoming request can be assigned to a priority group on the basis of the client, network, content, or owner of the requested object. It can also inherit a performance level defined by low level network protocols, such as the priority assigned by the DiffServ model.

A client based service prioritization scheme provides means for a client to purchase premium services at a certain cost. A client can be authenticated by a combination of cookies, client host ID, a session ID provided by the server, or other client profile information kept at the server. More detailed user authentication gives users more flexibility for QoS selection, while it increases the load of status information maintenance at the server.

Network based service differentiation is determined by the underlying network level protocols. Clients specify the priority of the packets that carry request information. The server extracts the packet priorities and sends replies at the matching service levels.

Content based service differentiation is determined by the "value" or "importance" of the web object being requested. In an on-line stock exchange center, priority can be granted to purchase/sell requests over queries. Similarly, in an on-line shopping web site, the transactions of someone with items in the shopping cart or someone entering payment information should have higher priority than general browsing transactions. Owner based prioritization of service provides selective QoS in a web hosting environment. A web host can grant priorities to requests made to web pages belonging to owner organizations based on the prices they paid. The latter two forms of service differentiation are based on the URL or request method, GET or POST, of a web object, as defined in the HTTP protocol set.

Several open issues associated with prioritized service, and their studies and results, are reported in the following sections. How many levels of priority should be set up to provide both flexibility and efficiency of the system is an important question but is considered beyond the scope of this work. Similarly, quantification of QoS and pricing issues for different levels of service also need further investigation and analysis, and are not emphasized in this study.
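To make these classification criteria concrete, the sketch below shows one way a request might be mapped onto a priority class using client-, network-, content-, and owner-based rules. It is a minimal illustration under assumed conventions, not the implementation studied here: the Request fields, the rule set, and the two-level scheme (0 denotes the highest priority) are all invented for the example.

```python
# A minimal sketch of request-to-priority classification, assuming a
# two-level scheme (HIGH = 0 is the highest priority) and illustrative
# rules; field names and rule contents are hypothetical.

from dataclasses import dataclass

HIGH, LOW = 0, 1

@dataclass
class Request:
    client_id: str      # authenticated client identity (cookie/session based)
    net_priority: int   # priority inherited from DiffServ marking, if any
    url: str            # requested object, for content/owner based rules
    method: str         # "GET" or "POST"

PREMIUM_CLIENTS = {"client-42"}     # client-based rule (hypothetical)
PREMIUM_OWNERS = ("/acme/",)        # owner-based rule (hypothetical prefix)

def classify(req: Request) -> int:
    if req.net_priority > 0:              # network-based: inherit packet priority
        return HIGH
    if req.client_id in PREMIUM_CLIENTS:  # client-based: purchased premium service
        return HIGH
    if req.method == "POST":              # content-based: transactions over browsing
        return HIGH
    if req.url.startswith(PREMIUM_OWNERS):  # owner-based: hosted premium pages
        return HIGH
    return LOW
```

In practice such rules would be evaluated in whatever order the hosting policy dictates; the point is only that each of the four criteria reduces to a cheap per-request test.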
4.2.2 Customized Services

The resources consumed by an Internet server include CPU processing power, network bandwidth, disk I/O, etc. Different types of tasks have various resource requirements. Serving a dynamic web page is much more CPU intensive than retrieving a static web page. Consequently, it is not uncommon for the response times of dynamic requests to be orders of magnitude higher than the response times of static requests. In case the CPU load becomes high, low priority dynamic requests should be restricted to prevent CPU overload and preserve system throughput. Some tasks are I/O intensive and consume a large volume of disk and network bandwidth for a sustained duration; most continuous media (CM) tasks fall into this category. Low priority CM requests are restricted to prevent too many open connections from blocking I/O channels. High priority CM tasks should be carefully scheduled to meet bounded delay requirements. However, most contemporary web servers drop requests indiscriminately in case of overload or I/O congestion, which causes further bursts of request flows and service degradation of high priority requests.

If the network supports the IntServ [7] architecture, an Internet server can be enhanced to honor QoS information carried by each data flow. The response delay and dropping rate requirements of an admitted request's data flow should be met, and the rest of the system resources can be allocated fairly across "best-effort" or "better-than-best-effort" traffic flows.

In order to provide customized service, the available resources in each system component should be monitored continuously. Overload avoidance and detection algorithms should be explored to maintain system response time and throughput. An informed dropping mechanism should be adopted to prevent clients from retrying during congestion. Admission control and process scheduling issues that help in service differentiation and overload control are discussed in the following sections.

4.3 A Generalized Internet Server

In this section, we present a model of a generalized Internet server and analyze the feasibility of service differentiation. Although other models of Internet servers exist, the goal of the proposed model is to study the issues involved in service differentiation, not the types of Internet servers.

4.3.1 Service Differentiating Internet Server Model

Figure 4.1 shows a queuing model of a generalized QoS aware Internet server. The server architecture is extracted from the Apache [49] web server source code, which is the most popular server type on the current Internet. QoS aware components are derived from a web server design in [43]. Another kind of distributed web server model can be found in [37]; no task dispatcher is needed in that model, and load sharing and balancing is achieved through distributed negotiation and request forwarding. That architecture can thus be regarded as a group of single servers, each providing prioritized service.

The system consists of four major logical components: a task initiator $T_I$, a task dispatcher $T_D$, a task server pool $S_i$ $(i = 1 \ldots N)$, and a communication channel $N_S$. $\lambda$ represents the rate of requests sent from clients, $q_d$ represents the dropping probability of incoming requests, and $\lambda'$ represents the rate of requests that get served. The initiator $T_I$ maintains the open connections of the system. Incoming requests are queued awaiting acceptance by $T_I$. Each accepted request is assigned a task, and each task is granted an appropriate priority level based on system settings.
The task dispatcher $T_D$ assigns tasks to a task server, and each task server schedules and processes tasks according to their priorities. Responses are sent back to clients through the communication channel $N_S$. To simplify the model, we assume that the server connects to the client through a high speed network. We have ignored blocking and flow control from the client side network connection, as they are beyond the scope of the proposed study.

The task initiator $T_I$ picks up requests from the incoming request queue. The queue length is limited by operating system constraints and the allowable open connections in the server. Incoming requests are tail-dropped if the queue is full. With the cooperation of an intelligent network interface, the network layer priority setting of a request can be passed on to the application layer. $T_I$ can be used to collect priority information from low level network protocols and convey it to the task dispatcher $T_D$. Otherwise, $T_D$ determines the task priority based on the implementation of the service differentiation algorithm.

Figure 4.1 Queuing Network Model for a Web Server.

After determining the priority of each task, $T_D$ selects a task server to process the task. Priority is assigned based on the criteria discussed in Section 4.2. The task dispatcher identifies the clients and checks the profile of the requested web object. At the same time, it monitors the popularity of web objects and the load of system components. Statistics on the popularity of web objects are used for server caching and prefetching optimization, and the load information is used in the task assignment procedure.

Each task server, $S_i$ $(i = 1 \ldots N)$, represents a processing unit that takes care of processing a task. For a static request, the task server translates the URL into the corresponding file path, finds the file, copies the file contents into memory, and sends a reply back to the client through the Internet. For a dynamic request, the server invokes a gateway interface; a web page is generated "on the fly" based on the result returned from the gateway interface and sent back to the client. A task server is an abstract concept in the sense that it can be a child process in a multi-process server, a thread in a multi-threaded server, a processor in a multi-processor server, or a host in a server cluster. Based on the system environment, each task server shares or has an independent waiting queue, memory, cache space, and other resources. Task servers can be symmetric, i.e., with the same hardware configuration, or asymmetric with different hardware configurations. The task dispatcher provides a bridge to seamlessly integrate new hardware equipment into the server system. In an asymmetric task server environment, response time should be predicted when a task is assigned to a specific server.

The capacity of the communication channel $N_S$ is determined by the bandwidth of the server network access point. In the current Internet infrastructure, a server generally has a T1 (192 KB/s) or T3 (5.4 MB/s) connection. An intranet web server generally connects to clients through high-speed local area networks such as 10 Mb or 100 Mb Ethernet. Data throughput is determined by the channel bandwidth and network utilization.
4.3.2 Server Processing

To process each web task, resources are consumed in I/O interrupt handling, TCP checksum computation, URL parsing, user authentication, file location, reply transmission, and logging. We design a server queuing model based on the estimated cost of processing each request using the following formula:

\[ T_s = T_{init} + T_{cpu} + T_{data} + T_{net}. \qquad (4.1) \]

$T_s$ is the sum of the service times of each system component. $T_{init}$ is the time to bring a request from the network interface to the application server. $T_{cpu}$ is the CPU time to handle an incoming request, plus the computational cost of dynamic tasks. $T_{net}$ is the cost of sending the processing results back to the client. $T_{data}$ is the cost of loading the data from disk into the server cache. If the requested data is already in the server cache, $T_{data}$ is zero. If the requested data needs to be loaded from the disks, the value of $T_{data}$ is approximated by a disk model with fixed seek time and disk bandwidth.

The system model can be constructed as a queuing network. The waiting time of a task is determined by the time it spends in every system component, and the service time it gets is the sum of the service times received from each component. Most components are sensitive to system overload [67, 68]. The system becomes unstable and response time increases drastically when the request access rate and resource consumption exceed a certain threshold. QoS can be assured by maintaining the system load below the threshold, which can be achieved by admission control, scheduling, and efficient task assignment schemes (if multiple servers are available).

4.3.3 Admission Control

Admission control can be implemented in the following two steps. The first step of admission control is provided by the initiator $T_I$. Incoming requests are rejected when access rates exceed the processing capacity of the system; in Figure 4.1, a feedback path $S_f$ conveys system capacity information to the initiator. To avoid bulk rejection, early overload detection can be adopted and two levels of thresholds can be used, similar to the Random Early Detection (RED) scheme [69]. Low priority requests are dropped with increasing probability after the system load exceeds the first threshold, and all requests are dropped when the system load exceeds the second threshold. The average queue length at the initiator can be used as the threshold indicator. The values of the thresholds can be manually configured, or dynamically adjusted depending on the application environment or the variety of incoming requests.

The second step of admission control can be provided by the task dispatcher $T_D$. The task dispatcher monitors the task server load and makes decisions about dropping requests to assure that the system works in an acceptable load condition. Contrary to the first step of uninformed dropping, the task dispatcher sends back a brief explanation to clients stating why it cannot fulfill the request. The informed dropping decreases "retry" attempts from clients. When one task server is overloaded, tasks are redirected to the other task servers. If all the task servers are overloaded, low priority tasks experience "informed dropping" to relieve the overload situation. In case of communication channel overload, low priority large size tasks are discarded first, and continuous media data requests are discarded when bounded delay QoS cannot be assured. Available bandwidth is used as the metric of channel load.
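The sketch below illustrates the first-step, RED-like early discard at the initiator. The use of average queue length as the load indicator and the two thresholds follow the text above; the linear ramp of the drop probability between the thresholds is an assumption (RED uses a similar shape), and the function names are invented for the example.

```python
# Sketch of two-threshold early discard at the task initiator, modeled on
# RED: low-priority arrivals are dropped with increasing probability past
# the first threshold; everything is dropped past the second. The linear
# drop-probability ramp is an assumed detail.

import random

def admit(avg_qlen, th_low, th_high, high_priority, rng=random.random):
    if avg_qlen >= th_high:
        return False                      # overload: drop all arrivals
    if avg_qlen >= th_low and not high_priority:
        # drop probability ramps from 0 to 1 between the two thresholds
        p_drop = (avg_qlen - th_low) / (th_high - th_low)
        return rng() >= p_drop
    return True                           # light load: admit everything
```

The second, informed-dropping step at the dispatcher would wrap a rejection message around the same kind of test, driven by task server load instead of queue length.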
4.3.4 Process Scheduling

In a task server, priority based scheduling is adopted to provide faster processing to some tasks than others. In this study, we use a strict priority scheduling policy, i.e., low priority tasks do not get service if higher priority tasks are waiting; other scheduling schemes, such as weighted fair scheduling, could also be considered. Tasks in the same priority group are serviced in FCFS order. Preemptive scheduling allows high priority tasks to preempt low priority tasks, and thus eliminates the need for high priority tasks to wait for the completion of service of lower priority tasks; however, it introduces the complexity of context switching and status preservation of preempted tasks. We consider a non-preemptive scheduling approach in our study to avoid the complexity caused by preemption.

Scheduling in the communication channel also follows priority based queuing. Unlike in the task server, the data being sent out is restricted by the page size of cache memory and the network protocols. Scheduling in the communication channel is therefore similar to prioritized processor sharing.

4.3.5 Task Assignment

A distributed server system introduces the flexibility of selecting a server for a specific task. Selection can be based on task priority, task type, spatial and temporal locality of web objects, and hardware configuration.

One or more high performance web servers can be reserved for high priority tasks to make sure that high priority tasks always encounter a short queue. Another approach is the assignment of task types to different servers. Tasks of the same type often consume about the same amount of processing time, while tasks of different types may consume processing time with very high variance. Task assignment schemes that lead to lower variance in processing time help to improve response time and throughput. CM data require streaming delivery for a sustained time period, for which processor sharing performs better than FCFS. In addition, task assignment schemes considering temporal locality have the potential to improve caching efficiency.

Task assignment can be combined with scheduling. If each task server has the facility of prioritized processing, then tasks can be sent to the corresponding server right away to alleviate the queuing load at the task dispatcher. Otherwise, the task dispatcher blocks low priority tasks unless there are no high priority tasks waiting to be served at a server. A code sketch of such a strict-priority task server is given below.
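The following sketch implements the strict-priority, non-preemptive queueing discipline of Section 4.3.4. The two-level setup (0 denotes the highest priority, matching the simulation later in this chapter) and the class interface are assumptions for illustration, not the simulator's actual code.

```python
# Minimal sketch of a strict-priority, non-preemptive task server: one
# FCFS queue per priority level (index 0 = highest). A task runs to
# completion; then the highest non-empty level is served next.

from collections import deque

class TaskServer:
    def __init__(self, levels=2):
        self.queues = [deque() for _ in range(levels)]  # index 0 = highest

    def enqueue(self, task, priority):
        self.queues[priority].append(task)   # FCFS within a priority group

    def next_task(self):
        # Non-preemptive: called only after the previous task completes.
        for q in self.queues:                # scan from highest priority down
            if q:
                return q.popleft()
        return None                          # server is idle

server = TaskServer()
server.enqueue("low-1", 1)
server.enqueue("high-1", 0)
assert server.next_task() == "high-1"        # high priority is served first
```

A preemptive variant would additionally suspend the running task when a higher-priority one arrives, at the cost of the context-switching complexity noted above.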
4.3.6 System Overhead

Prioritized servicing inevitably brings overhead in two respects: the extra CPU cycles for arbitrating the appropriate priority for each task, and kernel overhead in priority-based scheduling. The overhead brought by priority assignment depends on the priority assignment criteria. If the priority is inherited from the lower network layers, for example with client based priority assignment and DiffServ enabled packet classification, the arbitration overhead is trivial, since we only need to pass a QoS parameter from the network interface. If the priority assignment is based on the URLs of web objects, the priority arbitration can be combined into the URL parsing step of a conventional web server. By combining URL parsing with priority assignment functionality, the overhead is reduced; however, part of the task processing then cannot benefit from prioritized services. Other methods of decreasing overhead include separation of the coding space of cookies, directory caching, and separation of the naming space based on the ownership of objects.

4.4 Analysis of Waiting Time

We first present a comparison of waiting time estimates between prioritized and non-prioritized systems. The analysis is based on simplified queuing model assumptions and is meant for studying the impact of various workload parameters on the waiting time. A detailed performance evaluation is done in the next section.

4.4.1 Waiting Time in Non-prioritized Systems

First we consider a general non-preemptive queueing system with FCFS discipline, assuming infinite queue space. A task's waiting time $w$ can be decomposed into two parts: $w_1$, the residual life of the task being serviced upon its arrival, and $w_2$, the delay it experiences due to tasks already enqueued upon its arrival. The waiting time $w$ can be expressed as:

\[ w = w_1 + w_2. \qquad (4.2) \]

Let $n_q$ be the number of queued tasks when a new task arrives, $N_q$ be the expected queue length of the system, $x(i)$ be the service time of the $i$th task in the queue, and $\bar{X}$ be the expected service time. The waiting time due to queued tasks, $w_2$, and its expected value, $W_2$, equal:

\[ w_2 = \sum_{i=1}^{n_q} x(i), \qquad (4.3) \]

\[ W_2 = E\{n_q\} \cdot E\{x\} = N_q \cdot \bar{X}. \qquad (4.4) \]

Let $\lambda$ be the mean task arrival rate, and $W_1$ be the expected residual life of the task being served when a new task arrives. According to Little's Law, $N_q = \lambda W$, so the expected waiting time of a task, $W$, can be represented as:

\[ W = W_1 + N_q \bar{X} = \frac{W_1}{1 - \bar{X}\lambda}. \qquad (4.5) \]

Equation 4.5 indicates that the expected waiting time of a task can be expressed as a function of the arrival rate, the service time, and the residual life of a task. When the task arrival process is Poisson, the Pollaczek-Khinchin (P-K) mean value formula can be used to find the average waiting time in the queue:

\[ W = \frac{\lambda (\bar{X}^2 + \sigma_x^2)}{2(1 - \bar{X}\lambda)}. \qquad (4.6) \]

However, given the existence of self-similarity (or long-range dependence) characteristics in Internet traffic, a more general arrival distribution estimator should be used. A precise estimate of the waiting time in a G/G/1 queueing system is hard to obtain, but we can get an approximation of the waiting time in a heavy traffic situation [70]. The value can also serve as an upper bound on the mean waiting time in a general queuing system, and is expressed as:

\[ W \approx \frac{\lambda (\sigma_a^2 + \sigma_x^2)}{2(1 - \bar{X}\lambda)}, \qquad (4.7) \]

where $\sigma_a^2$ and $\sigma_x^2$ are the variances of the inter-arrival time and the service time, respectively. It can be observed from Equation 4.7 that the mean waiting time increases nearly linearly with the variance of the inter-arrival and service times during heavy load periods.

4.4.2 Waiting Time in Prioritized Systems

Consider a non-preemptive queueing system with prioritized services, where tasks are queued and processed according to priority groups. The waiting time of a task with priority $p$ $(p = 1, \ldots, P)$ is decomposed into three parts: $w_{1,p}$, the residual life of the executing task when the new task arrives; $w_{2,p}$, the service time of queued tasks with equal or higher priority when the task enters the system; and $w_{3,p}$, the delay due to higher priority tasks arriving during its waiting period. For the highest priority tasks, the last part of the waiting time need not be considered.
Let $\lambda_p$ $(p = 1, \ldots, P)$ be the task arrival rate of the $p$th priority group ($P$ is the highest priority), and $\lambda = \sum_{p=1}^{P} \lambda_p$. The expected waiting time of the tasks in the $p$th priority group can be expressed as:

\[ E\{w_p\} = E\{w_{1,p}\} + E\{w_{2,p}\} + E\{w_{3,p}\} = W_1 + \sum_{i=p}^{P} E\{w_i\}\,\lambda_i E\{x_i\} + E\{w_p\} \sum_{i=p+1}^{P} \lambda_i E\{x_i\}. \qquad (4.8) \]

\[ N_q = W \lambda = \sum_{p=1}^{P} \lambda_p E\{w_p\}. \qquad (4.9) \]

$W_1$ is the mean residual life of an executing task, which is unrelated to the priority distribution due to the non-preemptive discipline of the system. Assume the mean service time of each priority group is $\bar{X}_p$ $(p = 1, \ldots, P)$. Then, combining Equations 4.5, 4.8, and 4.9, we get the expression for $W_p$ as a function of $\bar{X}_i$, $\lambda_i$ $(i = 1, \ldots, P)$ and $W$:

\[ W_p = \frac{W_1}{\left(1 - \sum_{i=p}^{P} \lambda_i \bar{X}_i\right)\left(1 - \sum_{i=p+1}^{P} \lambda_i \bar{X}_i\right)} \qquad (4.10) \]

\[ = \frac{W \left(1 - \sum_{i=1}^{P} \lambda_i \bar{X}_i\right)}{\left(1 - \sum_{i=p}^{P} \lambda_i \bar{X}_i\right)\left(1 - \sum_{i=p+1}^{P} \lambda_i \bar{X}_i\right)}. \qquad (4.11) \]

Equation 4.10 illustrates the waiting time relationship between different priority groups. The overhead with several priorities may be high, and it may not be worthwhile to support a large number of priorities from a cost/performance standpoint. A few priority classes may be enough to provide adequate service differentiation. For the sake of simplicity, let us consider a system with two priorities assigned to tasks, high and low. Let $W_h$ be the expected waiting time of high priority tasks, and $W_l$ the expected waiting time of low priority tasks. $\rho$, $\rho_h$, $\rho_l$ are the utilization factors of the whole system, the high priority group, and the low priority group, respectively. The waiting times can be derived as:

\[ W_h = \frac{W_1}{1 - \rho_h} = W \frac{1 - \rho}{1 - \rho_h}, \quad \text{and} \qquad (4.12) \]

\[ W_l = \frac{W_1}{(1 - \rho_h)(1 - \rho)} = \frac{W}{1 - \rho_h}. \qquad (4.13) \]

From the discussion in the previous subsection, we know that the average waiting time of a task is determined by the distributions of the inter-arrival time and the service time. These distributions are rather stable for an existing system. The average residual life of a task, $W_1$, has an upper bound of $\bar{X}$, which is approached as the system utilization approaches 1. Given the average waiting time of a system, which can easily be derived from the response time statistics of tasks, we can estimate the waiting time of different priority groups versus their utilization factors. For example, assume the service time distributions of the different priority groups are the same, i.e., $\bar{X} = \bar{X}_l = \bar{X}_h$. Let $P_h$ and $P_l$ be the probabilities of a task being in the high priority group and the low priority group. Equations 4.12 and 4.13 can then be rewritten as:

\[ W_h = \frac{W_1}{1 - P_h \rho} \le \frac{\bar{X}}{1 - P_h \bar{X} \lambda}, \qquad (4.14) \]

\[ W_l = \frac{W_1}{(1 - P_h \bar{X} \lambda)(1 - \bar{X} \lambda)} = \frac{W}{1 - P_h \bar{X} \lambda}. \qquad (4.15) \]

The value of $W_h$ increases inversely with respect to $(1 - P_h \rho)$. Even when the system operates close to its full capacity, the expected waiting time of high priority tasks can still be controlled by adjusting the value of $P_h$. The value of $W_l$ increases inversely with respect to the product of $(1 - P_h \rho)$ and $(1 - \rho)$. If the value of $(1 - P_h \rho)$ remains constant, e.g., $C = 1 - P_h \rho$, $0 < C < 1$, then $W_l$ follows the changes of $W$ linearly across various system load situations.

The above analysis provides an average estimate of the response time of tasks with different priorities. In reality, the variance of response time can be quite high, and a conservative system design is needed to assure the response time of high priority tasks. A rule of thumb is to keep the high priority utilization factor no higher than the acceptable system utilization factor. For example, if the delay begins to increase sharply at a server utilization of 0.5, then the system can provide acceptable service to high priority requests as long as the high priority utilization $\rho_h = \lambda_h \bar{X}$ is at most 0.5.
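To illustrate how Equations 4.12 through 4.15 behave numerically, the short sketch below evaluates $W_h$ and $W_l$ for a range of high-priority ratios. The parameter values ($\lambda$, $\bar{X}$, the choice of $W_1 = \bar{X}$ as the residual-life upper bound) are purely illustrative; this is a minimal evaluation of the formulas above, not a performance claim.

```python
# Numeric illustration of Equations 4.12-4.15 (two priority classes with
# identical service-time distributions). All parameter values are examples.

def priority_waits(lam, xbar, p_high, w1=None):
    """Expected waits in a non-preemptive two-priority queueing system.
    lam: total arrival rate; xbar: mean service time;
    p_high: fraction of tasks that are high priority;
    w1: mean residual service time (bounded above by xbar)."""
    rho = lam * xbar                  # total utilization, must be < 1
    rho_h = p_high * rho              # high-priority utilization
    if w1 is None:
        w1 = xbar                     # upper bound on residual life
    w_high = w1 / (1 - rho_h)                  # Eq. 4.12 / 4.14
    w_low = w1 / ((1 - rho_h) * (1 - rho))     # Eq. 4.13 / 4.15
    return w_high, w_low

for p_h in (0.1, 0.5, 0.9):
    wh, wl = priority_waits(lam=450, xbar=0.002, p_high=p_h)  # rho = 0.9
    print(f"P_h={p_h}: W_h={wh * 1000:.2f} ms, W_l={wl * 1000:.2f} ms")
```

At 90% total utilization, the high-priority wait stays within a small multiple of the mean service time while $P_h$ is moderate, which is exactly the lever the rule of thumb above exploits.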
The analysis presented in this section is based on time independent, identically distributed arrival patterns and service times. It is useful for obtaining estimates of the waiting time in prioritized and non-prioritized systems. The results can be used to study the impact of workload parameters, such as the inter-arrival rate, high priority task ratio, service time, and server utilization, on the waiting times. The relative magnitude of the impact of these parameters on high priority and low priority tasks can also be examined.

4.5 Simulation

The request arrival patterns at Internet servers are known to exhibit high self-similarity and long-range dependence. Thus, it is difficult to build a good analytical model that can provide an in-depth view of the performance estimation. In this section, we use a simulation model, with traces from a real web server as workload, to examine the performance of SDIS. We have implemented an event driven simulator for the experimental study of the server model proposed in Section 4.3. The simulator is built using the CSIM [71] simulation package. Response time and slowdown performance under different workloads, along with the scheduling and task assignment schemes, have been examined and compared. Other aspects, including admission control and server side caching algorithms, have been studied as well.

4.5.1 Workload Generation

Previous works [66, 61, 72] have suggested apparent self-similarity and long-term dependency in WWW traffic patterns. However, accurate synthesis of self-similar traffic remains an open problem [68, 73]. In this study, we propose to generate workload from real trace files. We monitored the logs of a departmental web server in the Computer Science department at Michigan State University. The trace files contain 866,587 requests over a one-week period. The access logs provide the request timestamp, client ID, object URL, service status, and reply size of each request. The referrer logs complement burst and session information. The request type distribution from a week of access logs is listed in Table 4.1. The coefficient of variation (CoV) of the request inter-arrival rate is 3.59 during the observation period.

Table 4.1 Trace Data Distribution.

Item                     HTML    Image   Audio   Video    Dynamic   Other   Total
Req. Ratio (%)           19.2    68.8    0.2     0.1      3.9       7.8     100
Traffic Ratio (%)        15.0    49.2    1.6     6.7      5.4       20.2    100
Mean Trans. Size (KB)    5.76    4.98    579.9   2503.9   3.84      19.0    7.39
Transfer Size CoV        1.90    2.46    1.76    1.56     1.33      7.90    14.41

Request replayers take data extracted from the traces, and regenerate and send requests to the server. The number of independent request replayers is varied to generate different workload intensities. The "burst" structure of the request flows is well preserved by using multiple independent request replayers. Timestamps have 1-second resolution, and requests with the same timestamp are assumed to be distributed exponentially within the one second period. Data from each daytime period, 9 AM to 9 PM, are used as input to the simulator.

A study based on traces obtained from ClarkNet [74] was reported by us in [5]. Most of the trends in the inferences obtained here are similar to those obtained through the ClarkNet traces. We use traces from Michigan State University here to reflect the behavior of more recent traffic patterns.
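As an illustration of this replay procedure, the sketch below regenerates request arrivals from an access-log excerpt. The one-second exponential spreading of same-timestamp requests follows the description above; the record format, function names, and the clamping of each burst inside its own second are assumptions for the example.

```python
# Sketch of a trace replayer: requests sharing a 1-second timestamp are
# spread with exponential gaps inside that second. Several independent
# replayers can be run in parallel to scale the workload intensity.

import random

def replay_schedule(records, seed=0):
    """records: list of (timestamp_sec, url) pairs parsed from an access
    log. Returns (send_time, url) pairs with sub-second offsets."""
    rng = random.Random(seed)
    by_sec = {}
    for ts, url in records:
        by_sec.setdefault(ts, []).append(url)
    out = []
    for ts, urls in sorted(by_sec.items()):
        t = float(ts)
        for url in urls:
            # exponential inter-arrival gaps, clamped inside the second
            t = min(t + rng.expovariate(len(urls)), ts + 0.999)
            out.append((t, url))
    return out

trace = [(0, "/index.html"), (0, "/logo.gif"), (1, "/cgi-bin/search")]
for when, url in replay_schedule(trace):
    print(f"{when:.3f}s  GET {url}")
```

Running several such replayers with different seeds preserves burstiness while scaling aggregate intensity, as described above.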
4.5.2 Server Processing

The parameters for task processing behavior are derived from [75, 45], and from monitoring the network traffic to and from our departmental web server. Studies in [76, 77] have shown that smaller files tend to be more frequently accessed, and that caching file copies decreases disk accesses and thereby improves response time. Caching hit ratio data is selected based on the studies in [76, 78]. The cache size of each server is set to 32 MB for static objects smaller than 32 KB. Dynamic and big web objects, i.e., those with sizes equal to or larger than 32 KB, are treated as uncacheable. For a dynamic object request, the service time is dominated by the CPU computation time; for a large file request, it is dominated by I/O processing time. The simulation parameters are shown in Table 4.2.

Table 4.2 Simulation Parameters.

Parameter                               Value
Number of Task Servers                  4
System Capacity for Static Objects      1000 req./sec
Task Dispatcher Capacity                4000 req./sec
Disk Bandwidth                          10 MBps
Disk Seek Overhead                      1 ms
Network Bandwidth                       100 Mbps
Outbound Network BW                     80 Mbps
Cache Size                              32 MB/server
Caching Threshold                       32 KB
Dynamic Object Processing Overhead      10 ms
Priority Levels                         2

4.5.3 Performance Metrics

The effectiveness of a scheduling scheme is measured in terms of mean slowdown, mean response time, and the 95th percentile response time. Mean response time is defined as the time between the acceptance of a request and the completion of the reply, which is the sum of the waiting time and the service time. Mean response time indicates the average time a task stays in the system. The 95th percentile response time shows the response time that the majority of tasks experience, which gives a statistical measure of the predictability of system responsiveness. Service time includes the time incurred in the scheduler, in a task server, and in the network interface. The slowdown of a task is the ratio of its response time to its service time. Slowdown serves as a metric of user tolerance: a user is often willing to wait longer for a "big" task, which is reflected by the slowdown parameter.

4.6 Results

In this section, we present and analyze the results obtained from our simulator using the real workload traces. We examine the impact of priority-based scheduling, task assignment policies, and admission control schemes.

4.6.1 Effectiveness of Priority Based Scheduling

In the Internet environment, both the access interval and the service time distribution are significantly different from the widely used synthetically modeled workloads. Therefore, we have used real workload traces for performance evaluation. The high variance of the inter-arrival time and service rate degrades system performance [79], and forces the web server to operate in a low utilization state. Figure 4.2 records the mean response time of tasks under different scheduling schemes, Figure 4.3 shows the slowdown of tasks under different scheduling schemes, and Figure 4.4 shows the 95th percentile response time of tasks.

Figure 4.2 Mean task response time vs. scheduling schemes.
Figure 4.3 Mean task slowdown vs. scheduling schemes.
Curve (c) in each figure indicates the performance trend of non-prioritized processing of tasks. As can be seen from Figures 4.2, 4.3, and 4.4, the response time increases sharply with the system utilization. Curves (a) and (b) in each figure are the response time and mean slowdown curves of low priority requests and high priority requests, respectively. High priority tags are assigned to half of the requests at random, and the remaining requests are marked as low priority tasks. Thus the ratio of high priority to low priority tasks is 1 to 1, and both types of tasks are randomly distributed over the whole arrival sequence. Tasks are assigned to each task server using Round_Robin scheduling irrespective of their priorities. Tasks queued at each server are served based on their priority. Specific scheduling and task assignment approaches are discussed later in this section. Performance degradation of the high priority task group happens at a much higher utilization compared to the non-priority-based model. On the other hand, the performance curve of low priority tasks is fairly close to the performance curve without priority differentiation.

We only consider the stable states of the system in this study, i.e., the task response time and slowdown before the sharp performance degradation, or the "knee", of the performance curve.

Figure 4.4 95th percentile response time vs. scheduling schemes.

From the preceding results, we observe that the delay is bounded in an acceptable range for high priority tasks in a system whose "performance knee" occurs at about 50% of the system capacity for static objects, when 50% or fewer of the tasks are high priority. A steep rise in the response time of low priority requests occurs at about 50% of the system capacity, which is about the same as the response degradation point without priority differentiation. High priority requests incur low average delay, i.e., less than 2 seconds, even when the system approaches full utilization. Note that the experiment could not achieve the full system capacity due to the overhead incurred by dynamic objects. The 95th percentile response delay of high priority requests is also controlled within a reasonable range, less than 10 seconds in the experiment.

The irregularities in the curves at low server utilization are caused by load imbalance among servers while serving continuous media objects. Round_Robin scheduling treats each task indifferently, which fails to balance load among servers, since continuous media tasks consume much more disk and network bandwidth than other tasks.

In the following subsections, we vary the high priority task proportion from the 0.5 used in the preceding experiment up to 0.9, corresponding to low priority task proportions from 0.5 down to 0.1. In reality, the proportion of high priority tasks would be lower than that of the low priority tasks. The proportions that we have used are rather conservative and may be considered as worst case scenarios.

4.6.2 Maximum High Priority Ratio

Next, we examine the relationship between the high priority task ratio and the "knee of the curves" in the system.
The simulation setup is kept the same as in the last experiment; the only difference is that the high priority task ratio varies from 0.5 to 0.9.

Figures 4.5 and 4.6 show the response delay curves of high priority tasks with the high priority ratio varying from 0.5 to 0.9. As the high priority task ratio increases, the response time curve gets closer to the non-prioritized system response time curve, and the benefit margin obtained from differentiating services diminishes. Figure 4.5 shows that the "knees" of the mean response time of high priority tasks move closer to that of the non-prioritized system with the increase in high priority ratio. However, by controlling the proportion of high priority requests, we can obtain significant performance benefit from the SDIS.

Figure 4.5 Mean response time vs. high priority ratio.
Figure 4.6 95th percentile response time vs. high priority ratio.

Figure 4.6 displays the 95th percentile response time of high priority tasks with the high priority ratio ranging from 0.5 to 0.9. The monitored time frame is set to 60 seconds, which is the default timeout period used in this study. It can be observed from the figure that high priority tasks rarely get timed out and retransmitted. Service availability of high priority tasks is much higher than that of low priority tasks.

4.6.3 Low Priority Task Performance

Figures 4.7 and 4.8 show the mean response time and 95th percentile response time curves of low priority tasks versus traffic intensity, with the high priority ratio ranging from 0.5 to 0.9, i.e., low priority task ratios from 0.5 to 0.1.

Figure 4.7 Mean response time vs. high priority ratio.
Figure 4.8 95th percentile response time vs. high priority ratio.

It can be observed that the response time of low priority tasks becomes worse with the increase in the high priority task proportion, as expected. On the other hand, we find that the spectrum of occurrence of the "performance knee" in the low priority response time curves is relatively narrow, which reflects the minimal influence of service differentiation on low priority tasks. The range of the 95th percentile response times of low priority tasks with varying high priority task ratio is narrower than the mean response time range under the same situation, as shown in Figure 4.8.

Figure 4.9 Mean slowdown vs. high priority ratio.

The slowdown performance of low priority tasks is shown in Figure 4.9, which is rather consistent with the response time performance shown in Figures 4.7 and 4.8.
4.6.4 Task Assignment Schemes

In a distributed server environment, an appropriate task assignment scheme decreases the waiting time variance and thereby improves the system performance. We studied four types of task assignment in the experiment: Round_Robin (rr), Shortest_Queue_First (sqf), Prioritized_Shortest_Queue_First (psqf), and Reserved_PSQF (rpsq). Under the rr task assignment scheme, tasks are assigned to task servers in rotational order; it is the simplest task assignment scheme. The sqf task assignment scheme is based on load balancing techniques: it assigns tasks to the server with the lowest number of active processes. The effectiveness of the sqf assignment scheme depends on the accuracy of the system load information the dispatcher uses. In the experiment, we assume that the scheduler always gets the updated process count in each task server, and that the overhead of system load monitoring is 10% of the service time. To adapt to the differentiated service environment, a psqf task assignment scheme is introduced, in which a new task is assigned to the server with the least number of waiting tasks of equal or higher priority than the incoming task. We also tried a resource reservation scheme, rpsq, in assigning tasks, i.e., some resources are reserved for high priority tasks. High priority task waiting time is expected to decrease by reserving one or two servers to serve high priority tasks exclusively. A code sketch of these four dispatch rules is given below.

The response times of high priority tasks are shown in Figures 4.10, 4.11, 4.12, and 4.13. Figures 4.14, 4.15, 4.16, and 4.17 illustrate the corresponding response time variations of low priority tasks. Experimental parameters are kept the same for each of the task assignment schemes. The ratio between high priority and low priority tasks is one to one in Figures 4.10, 4.11, 4.14, and 4.15, to illustrate a normal load situation for high priority tasks, and four to one in Figures 4.12, 4.13, 4.16, and 4.17, to show the heavy load performance of high priority tasks.

Figure 4.10 High priority task mean response time, priority ratio 1:1.
Figure 4.11 95th percentile high priority task response time, priority ratio 1:1.
Figure 4.12 High priority task mean response time, priority ratio 4:1.
Figure 4.13 95th percentile high priority task response time, priority ratio 4:1.

The rr task assignment scheme incurs the least system overhead, assigning tasks in rotational order over the task servers; there is no need to monitor task server load dynamically to make the decision. However, one big task can cause temporary server overload and drastically degrade the system responsiveness and throughput. As can be seen from Figures 4.10, 4.12, 4.14, and 4.16, under the rr scheme both high priority and low priority tasks experience worse response time performance than under the other task assignment schemes, irrespective of the proportion of high priority tasks. On the other hand, introducing load balancing techniques in the system improves the system performance as a whole.
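The sketch below gives one possible formulation of the four dispatch rules just defined. The server attributes (active_processes, waiting_at_or_above) and the reserved-server count are invented for the illustration; the simulator need not be structured this way. Priority 0 denotes the highest level, consistent with the earlier sketches.

```python
# Sketches of the four task assignment rules compared in this subsection.
# Server objects are assumed to expose an active-process count and a count
# of waiting tasks of equal or higher priority; attribute names are assumed.

import itertools

def make_rr(servers):
    ring = itertools.cycle(servers)
    return lambda task: next(ring)           # rr: rotate, load-blind

def sqf(servers, task):
    # sqf: fewest active processes (plain load balancing)
    return min(servers, key=lambda s: s.active_processes)

def psqf(servers, task):
    # psqf: fewest waiting tasks of equal or higher priority than the task
    return min(servers, key=lambda s: s.waiting_at_or_above(task.priority))

def rpsq(servers, task, reserved=1):
    # rpsq: the first `reserved` servers serve high priority tasks only;
    # high priority (0) tasks may use any server, low priority may not.
    pool = servers if task.priority == 0 else servers[reserved:]
    return psqf(pool, task)
```

The difference between sqf and psqf is only the quantity being minimized: total load versus the backlog that would actually delay the incoming task.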
The experimental results show improved response time performance for both high priority and low priority tasks using the sqf scheme compared to the rr task assignment scheme.

There is not much difference between the sqf and psqf schemes in light load situations. However, the psqf scheme performs better in heavy load situations and keeps the mean and the 95th percentile response times of high priority tasks considerably lower than the sqf scheme; see Figures 4.10 and 4.12. The mean and the 95th percentile response times of low priority tasks are about the same under the sqf and psqf assignment schemes; see Figures 4.14 and 4.16.

Figure 4.14 Low priority task mean response time, priority ratio 1:1.
Figure 4.15 95th percentile low priority task response time, priority ratio 1:1.

The performance of high priority tasks under the reservation based sqf and psqf schemes does not differ much from the non-reservation based sqf and psqf schemes. On the contrary, reservation based psqf, denoted rpsq, degrades the system response time of low priority tasks when the priority ratio is one to one. The "curve knee" of the low priority task response time moves closer to that of the rr scheme. Reserving some resources for high priority tasks proves not to be as effective as load balancing based task assignment schemes.

Comparing the low priority task performance across the different task assignment schemes with varied high priority ratios, we observe that there is not much difference in the occurrence of the "curve knee"s under the different task assignment schemes, although the slopes of the curves differ after the "knee". The results suggest that an increase in the high priority task ratio causes a low performance penalty to low priority tasks if the system load is well balanced.

Figure 4.16 Low priority task mean response time, priority ratio 4:1.
Figure 4.17 95th percentile low priority task response time, priority ratio 4:1.

4.6.5 Preferential Task Assignment

In the previous experiment, we tested the performance of different task assignment schemes based on the concept of load sharing and/or load balancing. The results indicate that a simple rr task assignment scheme is not efficient in balancing the load and improving system performance in diverse workload environments. Temporary system load imbalance caused by big tasks hurts the system responsiveness to a great extent. Next, we study preferential task assignment (PTA) on the basis of memory and type affinity in a distributed server environment. The stateless task assignment schemes discussed earlier are enhanced by stateful delivery. Each type of web object has a preferred primary server. The same, or the same type of, web objects go to a specific server as long as the server load is below a threshold; here we set the load threshold to 1.2 times the mean load of the task servers. Otherwise, tasks are sent to a secondary server with lighter load, at a migration cost.
The migration cost is set to 5% of the mean service time. Popular web objects are detected, and duplicate cache copies are stored in the primary and secondary servers. The number and location of cached copies of a "hot" web object are determined by the object type and the popularity of the object. Continuous media objects are sent to the same server, and processor sharing is used in that particular server. Results are shown in Figures 4.18 through 4.21. The high priority ratio is set to 0.8 to show the performance bounds of high priority tasks.

Figure 4.18 High priority task mean response time vs. PTA.
Figure 4.19 95th percentile high priority task response time vs. PTA.

Figures 4.18 and 4.19 compare the response time versus task arrival rate of the stateless task assignment schemes and those of the preferential task assignment schemes. Preferential task assignment does help to move the performance "knee" of high priority tasks from about 0.8 to 0.9 of the system utilization. The performance improvement comes from an improved server caching hit ratio and decreased service time variance. Performance changes in low priority tasks are negligible, as observed in Figures 4.20 and 4.21. The results indicate that stateful task assignment based on memory or type affinity improves the system performance as a whole.

Figure 4.20 Low priority task mean response time vs. PTA.
Figure 4.21 95th percentile low priority task response time vs. PTA.

4.6.6 Admission Control Performance

In the previous experiments, we assumed that processor queue space is unlimited, as is the lifetime of a task. In a realistic scenario, the number of open connections and the queue space are restricted to system predefined values, corresponding to the availability of resources. High priority tasks experience the same probability of denial of service in an overloaded system if there is no appropriate admission control scheme. In this study, we explore the effectiveness of reserving buffer space for high priority tasks with an admission control scheme, the early discard admission control (EDAC) scheme discussed in Section 4.3.3. We find that EDAC is suited for light to medium workloads. During heavy load periods, EDAC is not effective in assuring that high priority tasks get system resources. The accepted low priority tasks starve for a long time awaiting service, and those waiting tasks occupy buffer space and cause denial of service to high priority tasks even when the access rate of high priority tasks is below the system capacity, as shown in Figure 4.22.
The ratio between high and low priority tasks is one to one, the early detection threshold for low priority tasks is set to 0.5 of the buffer space, and the threshold for high priority tasks is the full buffer space. We collected the rejection rate data of high and low priority tasks using the sqf and psqf task assignment schemes, which were found to be the near-optimal and optimal task assignment schemes, as examined earlier. The slope of the rejection rate of high priority tasks is almost the same as that of low priority tasks, irrespective of the task assignment scheme. In the following experiment with timeouts, we only present rejection rate and abort rate data using sqf task assignment, since there are no noticeable differences in performance between the sqf and psqf schemes.

Figure 4.22 Reject ratio vs. task assignment schemes.
Figure 4.23 Reject and abort ratio vs. priority groups.

The experimental results shown in Figure 4.22 suggest that EDAC is not effective in keeping the rejection rate of high priority tasks low under heavy load: the rejection rate of high priority tasks increases to more than 10% in high load situations, which may be considered high. We further consider adding a timeout to release system resources from starved tasks; any operations related to tasks whose timeout has expired are aborted. The abort rate and rejection rate performance is collected and displayed in Figure 4.23. In the results shown in Figure 4.23, a timeout of 2 seconds is added to release system resources from stale low priority tasks. We repeat the experiment with increased high priority task traffic volume until the peak access rate equals the system capacity. The results show that 99.5% of high priority tasks are served within 2 seconds. No high priority task experiences denial of service across the various task inter-arrival rates. The abort rate of high priority tasks varies from 0.05% to 0.5%, depending on the distribution of the high priority task access rate. It is worth noticing that the abort rate of high priority requests decreases when the task access rate approaches system capacity, due to the increased rejection rate of low priority tasks. The abort rate of low priority tasks is about the same as that of high priority tasks under light to medium load, and continues to increase with the workload. There is no significant difference between the rejection rates of low priority tasks with and without the timeout setting. The results indicate that EDAC and timeouts effectively reallocate system resources to high priority tasks during server overload periods.

4.7 Related Works

Although a significant amount of research has been done on service differentiation at the network level, work on service differentiation in Internet servers has been limited. Almeida et al. [65] implemented a prototype web server that can provide prioritized service in a web hosting environment, and studied performance issues in both user space and kernel space. They compared the performance of high priority tasks and low priority tasks using synthetic benchmarks. Their research results encouraged us to study prioritized service in a general web server environment.
Our work differs from theirs in that we use empirical distribution input representing the real workload of a web server. In addition, we have analyzed several other issues of SDIS, including task assignment, admission control, the impact of memory affinity, etc. Bhatti and Friedrich at HP Labs [80] built a prototype task classifier and scheduler on an Apache web server, and studied response time and throughput issues of prioritized services on a web server. They used stress tests, which might not adequately simulate the high variance of web server workload environments. The pricing issues related to QoS provision on the Internet were analyzed in [81]. They discussed several QoS classification options, which can also be provided by our server model; we extend their work in terms of service differentiation and classification. The authors of [82] studied task assignment policies in a distributed server system model in which each host processes tasks in FCFS order and the task resource demand is known in advance. Their results suggested that a size-based policy performs best in an environment where task processing times have high variance and are linear in reply size. However, they did not consider QoS assurance issues. The authors of [37] investigated the issues of building a scalable web server system on workstation clusters. Load balancing was achieved by the redirection mechanism provided by the HTTP protocol; extra round-trip delay was the drawback of their scheme. The authors of [43] introduced another type of scalable distributed web server, in which load sharing and balancing is achieved by combining a DNS approach with router dispatching. They did not consider QoS issues in their studies.

4.8 Summary

The next generation Internet will demand differentiated services from Internet servers, which can be achieved through priority based service. Service differentiating Internet servers are needed to provide high quality of service to high priority tasks even under high system utilization. An incoming request can be assigned to a priority group on the basis of the client, network, content, or owner of the requested object, and the current server load situation. It can also inherit a performance level defined by low level network protocols. Resources are allocated and processes are scheduled according to priority groups. In this study, we show that under near-saturation web server utilization, differentiated services provide significantly better service to high priority tasks compared to a traditional web server, with minimal performance penalty on low priority tasks. We also present quantitative performance estimates for the different levels of tasks.

A high priority task's waiting time is determined by the high priority arrival rate and the whole system utilization. To maintain stable system response and throughput, the arrival rate of high priority tasks should not exceed the "knee" of the whole system performance curve. Thus, we have provided a framework for determining the maximum acceptable rate of high priority tasks. Task assignment schemes that decrease the variation of waiting time and service time shorten the perceived average waiting time of each priority group. A distributed server environment and shortest queue first task assignment schemes help in balancing the load between the task servers, thus reducing the response time variance and the average response time.
The results suggest that system load monitoring and balancing improve performance for both high priority and low priority tasks. We also explore how to make use of state information in improving system performance. Corresponding preferential task assignment and caching schemes based on object behavior and types help to extend the stable operating region in terms of system utilization. Reservation of system resources, realized by early detection admission control and timeout schemes, improves system availability for high priority tasks as well as the system throughput.

The results from this study show that the combination of selective early discard, timeout, and priority queuing is necessary, and may be sufficient, to provide predictable responses from the next generation of Internet servers. We can configure our server system so that the response time of high priority requests can be controlled irrespective of the volume of low priority requests.

CHAPTER 5
BOUNDING RESPONSE DELAYS IN BUSY WEB SERVERS

5.1 Overview

There has been an explosive increase in evolving applications that use the World Wide Web (WWW) as a distributed information exchange interface on the Internet. Efforts to use the WWW in revenue generating activities, well known as e-commerce, are also increasing at a fast rate [1]. A widespread problem in contemporary Web servers, however, is the unpredictability of response times, which is not acceptable to time critical transactions with response delay constraints. On the other hand, a one second response time is expected from Web sites, which is appropriate to human response speed [83]. Although current Web servers can serve thousands of requests in one second, the average response delay of a popular server may be orders of magnitude higher than expected during high load periods.

Traditional Web servers are designed more or less like file servers, wherein the majority of activities are retrievals of small static file objects such as HTML or image files. Evolving web servers are more like middleware for emerging new applications that demand diverse and much improved service qualities. For example, the average CPU time for satisfying dynamic requests is 10 to 100 times higher than the average CPU time for static requests. Secure transactions on the Web, which are popular in e-commerce, consume much more CPU resource than regular transactions. The growth of Internet traffic and the dynamics of server processing impose continuous stress on server load. With the increase in load, the performance of an Internet server may degrade gradually up to a certain level. Beyond this level, which we call the overload point, the performance drops drastically and the server behavior becomes unstable; the server may eventually crash [5]. Because of the increase in response time during overloads, users become impatient and sometimes abort their retrievals, which wastes precious and expensive server resources. Effective admission control (AC) mechanisms can be used to reject requests with high abort probability, preserving the response delay and throughput of servers. Most contemporary web servers use a rather naive admission control scheme, called tail-dropping, in which incoming requests are dropped when the request queue becomes full. The tail-dropping AC scheme requires careful system capacity planning and works well only in static workload situations. In a highly variable workload environment, more flexible AC schemes are needed to adapt to the dynamics of Web traffic.
In this study, we propose a simple and effective AC algorithm called ACES (Admission Control based on Estimation of Service time) to provide bounded response delay to incoming requests under highly variant workload and service processing environments. The ACES algorithm admits or rejects requests based on the estimation of task service times. The service time estimation for each task is determined by the task type. Admission of a request is decided by comparing the available computation power for a duration equivalent to the delay bound with the estimated service time of the request. If the task is admitted based on this constraint, then it is likely that the request will be served within the specified time bound. A double-queueing structure is used to compensate for the inaccuracy of the estimation and make use of the spare capacity of the system, while simplifying queue management. Experimental study shows that the ACES algorithm provides effective control of response delay bounds to tasks, and preserves the system throughput under various workload conditions.

The ACES algorithm can also be used for session management in a generic web server system. Session based predictive AC has been proven to be effective in a uniform workload situation [84], i.e., where the resource requirement of every request is regarded as the same in the experiment. The ACES algorithm is similar to the predictive AC proposed in their study. We extend the model by reserving queue space for premium sessions and packets.

The rest of the study is organized as follows. Section 5.2 discusses the ACES algorithm in detail. Section 5.3 explains how the estimation is made by analyzing the workload characteristics of a server from real web server traces. The simulation experiment and performance evaluation of the algorithm are reported in Sections 5.4 and 5.5, respectively. Section 5.6 concludes the study.

5.2 Admission Control Algorithm

The goal of the ACES admission control algorithm is to provide response delay bounds to incoming requests while preserving the system throughput of busy web servers.

5.2.1 Overview of the algorithm

Usually, a queue of incoming requests awaiting processing is maintained in a web server. Using the tail-dropping AC scheme, incoming requests are put in the queue until the queue is full. Queue length is not always a good indicator of system overload, especially when the variance of processing time is high. Without effective admission control, the server response time and throughput deteriorate drastically when the aggregate request rate exceeds the server capacity, indiscriminately affecting all clients. Abdelzaher and Bhatti [52] reported that as much as half of a web system's processing capacity is wasted on eventually aborted/rejected requests when the load is high.

To accommodate the high variance in the service pattern of web server systems, we propose a simple and adaptive admission control algorithm, Admission Control based on Estimation of Service time (ACES), to provide assurance of bounded response delay while preserving the system throughput. The service time of each task is estimated based on the request type. Inaccurate estimations are dynamically adjusted, and bounded delay is achieved by a double-queue architecture.
5.2.2 Delay Bounds Assurance

In a stable system, i.e., if the server has the capacity to process all the requests in the steady state, a queue and the associated queuing delay are introduced to smooth out the short term variations in the job arrival rate. When the short term arrival rate is faster than the processing rate, newly arrived requests have to wait in the queue for service. If the variations of the job arrival rate and processing rate are high, more queueing buffer is needed, and therefore the average queueing time and response delay will be high. One way to increase the response time predictability of a system is to limit the number of accepted requests in a given time period to no more than what the system can handle in that period of time. Borowsky [85] has provided a theoretical proof of short term response time versus arrival rate in a FCFS work-conserving system. The theorem is described as below:

Let K(t) be the queue length at time t, and let \sum_{n=0}^{K(t)} S_n be the sum of the service times of the requests in the queue at time t, where S_0 is the residual service time of the request in service. Let N(t, t') be the number of requests arriving in the period (t, t'), and \sum_{i=1}^{N(t,t')} S_i the sum of the service times required by those requests. Therefore,

\sum_{n=0}^{K(t)} S_n \le \sum_{i=1}^{N(t-T,t)} S_i \le T.   (5.1)

The response delay required by pending jobs is no more than T, if the sum of the service times required by arriving jobs in any duration T is bounded by T.

The response time (or delay) of all requests can be bounded by T by restricting the workload arriving in every interval of length T. Assume that time is divided into slots (0, T), (T, 2T), ..., (kT, (k+1)T), .... For simplicity, we use the beginning point of each time period, kT, to represent the duration (kT, (k+1)T). Let c_{unit} be the unit processing capacity of the system, C(kT) the predicted system processing capacity in period kT, S(i, kT) the service time needed by the ith task in period kT, and n(kT) the number of admitted tasks in period kT. If the server has precise knowledge of the service time needed by each task, the admission decision can be made based on the following equation:

C(kT) = c_{unit} * T \ge \sum_{i=1}^{n(kT)} S(i, kT).   (5.2)

If the expression is true, then the request is admitted; otherwise it is rejected. The maximum response delay of each task is bounded by the value of T if the service time of each task is known prior to the admission decision. In other words, the admission control manager should have knowledge of the service time of incoming tasks.

5.2.3 Admission Control and Service Time

In deciding to accept or reject a request, the admission control manager should ensure that the sum of the service times of accepted tasks does not exceed the system capacity. In reality, it is not possible to know the service time S(i, kT) in advance. High variance in resource requirements is a widely recognized characteristic of web server workloads. As indicated in the previous section, however, the service time and bandwidth requirements of the same type of requests are more or less consistent. The service time of web objects can thus be estimated based on the request type distribution of the server access pattern. We approximate the service time of each task by using a weighted computation quantum (CQ) matching the CPU processing time of the different types of tasks. When a new request arrives, the admission control manager checks if there are enough CQs available to grant to the request. Only requests that are granted enough CQs can be enqueued and served eventually.

The number of CQs issued to each task is determined by the resource requirement of the request. For example, the CQs granted to a dynamic request might be 10 times more than the CQs granted to a static request. Here we denote T_n as the number of types of requests to a web server, which can be derived from studies of the type characteristics of a web server. Let N_i(kT) be the number of requests of type i in period kT, and CQ_i the weighted CQ matching the CPU processing time of type i tasks. Then Equation (5.2) can be approximated as:

C(kT) \ge \sum_{i=1}^{T_n} CQ_i * N_i(kT).   (5.3)

As in the case of Equation (5.2), a request is admitted if Equation (5.3) holds true; otherwise it is rejected.
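The per-period test of Equations (5.2) and (5.3) can be sketched as a CQ budget that is reset at every period boundary. The Python fragment below is a simplified illustration, not the simulator's implementation; the class and method names are ours, and the CQ weights are the values later assigned in Table 5.3:

```python
# CQ weights per request type (the values used later, in Table 5.3).
CQ_WEIGHTS = {"static": 1, "dynamic": 10, "audio": 20, "video": 100}

class AcesAdmission:
    """Per-period admission test of Equations (5.2)-(5.3): admit a request
    only while the CQs granted this period stay within C(kT) = c_unit * T."""
    def __init__(self, c_unit, period):
        self.capacity = c_unit * period   # CQs available per period
        self.granted = 0                  # CQs granted so far this period

    def new_period(self):
        self.granted = 0                  # reset at each boundary kT

    def admit(self, request_type):
        cq = CQ_WEIGHTS[request_type]     # estimated service time in CQs
        if self.granted + cq > self.capacity:
            return False                  # would exceed C(kT): reject
        self.granted += cq
        return True
```

Because the granted counter is reset at every boundary kT, the test never commits more estimated service time than one period can absorb.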
5.2.4 The Double Queue Structure

Since Equation (5.3) is based on an estimation of service time, we should be careful to amortize the accumulated delay influence of possible over-allocation during a time period. One simple solution is to discard unfinished tasks at the end of each period, although this solution may waste a lot of the resources that have already been consumed by the discarded requests. Similarly, a part of the processing power of the system could be wasted in under-allocation situations. To handle the over/under allocation problems, we propose to use a double-queue structure. A primary queue is used as the incoming task queue, and a secondary queue is added for the backed up requests (we call this the backup queue). Incoming requests are put into the primary queue, if admitted; otherwise they are dropped. At the beginning of each period, all unfinished tasks in the primary queue are moved to the backup queue. Tasks in the backup queue cannot be served if there are tasks in the primary queue. Whenever the primary queue becomes empty, tasks in the backup queue get served. When the backup queue becomes full, requests in the backup queue are discarded. By using a double-queue structure, newly accepted requests need not wait in the queue for a long time to get service; thus bounded delay is achieved for most of the requests. More details about the double queueing scheme follow in the next section.

A similar algorithm can be extended to I/O admission control. Because the transmission of a sustained CM object may extend over several sample periods, a double-queue structure cannot be used directly, and the available bandwidth varies in different periods. Let B_{unit} be the unit time processing capacity of the system, B(kT) the effective bandwidth in period kT, and wb_i the bandwidth weight of each type of task. Let B_r(nT) be the sum of the bandwidth needed by unfinished tasks at the end of ((n-1)T, nT). Let m be the maximum number of periods a task can extend. The decision of whether a web object gets fully serviced can be made based on the equations below:

B(kT) \ge \sum_{i=1}^{T_n} wb_i * N_i(kT),   (5.4)

B(kT) = B_{unit} * T - \sum_{i=k-m}^{k} B_r(iT).   (5.5)
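The double-queue bookkeeping described above translates almost line by line into code. The following is a minimal sketch following the rules of Section 5.2.4; the names are illustrative:

```python
from collections import deque

class DoubleQueue:
    """Primary queue Qp for freshly admitted requests; backup queue Qb for
    work left unfinished at a period boundary (Section 5.2.4)."""
    def __init__(self, backup_capacity):
        self.primary = deque()
        self.backup = deque()
        self.backup_capacity = backup_capacity

    def enqueue(self, request):
        self.primary.append(request)       # admitted requests enter Qp

    def end_of_period(self):
        self.backup.extend(self.primary)   # move unfinished work to Qb
        self.primary.clear()
        if len(self.backup) > self.backup_capacity:
            self.backup.clear()            # Qb full: discard backed-up tasks

    def next_request(self):
        if self.primary:                   # Qb is served only when Qp is empty
            return self.primary.popleft()
        if self.backup:
            return self.backup.popleft()
        return None
```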
5.3 Service Time Estimation

5.3.1 Web Object Distribution

To explore a simple and effective way to estimate the service time of web objects, we analyzed the traces of a busy web server to track trends and characteristics of the resource requirements of the requests. The trace data was collected for a week from the web server of the Department of Computer Science and Engineering at Michigan State University, which encountered about a million requests during the period.

Table 5.1 lists the requested Web object types and the corresponding traffic distribution of the traces. Web object types are categorized on the same basis as reported in [86]. The first and second rows of the table show the request and traffic ratios of each major type of web object. It can be observed that requests for small and static web objects, i.e., HTML and image files, still dominate the incoming requests and Web traffic. The third row shows the mean response size in kilobytes of each object type. The fourth row shows the coefficient of variance (CoV) of the reply size of each web object type. The data show that the CoV of web traffic is as low as 1 to 2 for each type of web object. However, the CoV of the aggregated web traffic can be as high as 14.41.

Table 5.1 Trace Data Distribution.
  Item                HTML   Image   Audio   Video    Dynamic   Other   Total
  Req. Ratio (%)      19.2   68.8    0.2     0.1      4.9       6.8     100
  Traffic Ratio (%)   15.0   49.2    2.6     6.7      4.4       20.2    100
  Mean Traffic (KB)   5.76   4.98    579.9   2503.9   6.84      19.0    7.39
  Traffic CoV         1.90   2.46    1.76    1.56     1.33      7.90    14.41

There is a non-negligible percentage of dynamic objects. This type of access requires more processing power, and thus increases the server load and the unpredictability of response delay. Previous studies [14] have estimated that a dynamic object requires 10 to 100 times the service time of a typical static object. The processing time difference was confirmed by recent benchmark tests of popular commercial web servers [35] and the following service time characteristic study. The throughput of serving dynamic tasks is usually lower than that of serving static tasks by a factor of 10. Continuous media (CM) objects (audio and video clips) are now being used more extensively in web environments (compared to the previous results in [86]). These data types require high storage capacity (and are thus uncacheable), high density I/O processing, and sustained bandwidth with bounded delay constraints. The average CPU time for serving a CM object is also much higher than the average CPU time for serving small static objects. Significant differences in the sizes of different types of web objects, the high volume of CM objects, and computation-intensive dynamic requests suggest that the service time variance of the requested web objects should be considered in estimating the server capacity and response time. Classification of Web objects can be used to approximate and regulate the server processing, thus decreasing the variance in server processing and improving the delay performance.

5.3.2 Service Time Distribution

Previous studies [52, 82, 87] suggested that the service time of retrieving static web objects such as HTML and image files can be divided into two parts: a rather fixed URL processing time, and a content transfer time which increases linearly with the file size. To collect the service time attributes, we set up an experiment environment consisting of one Apache (version 1.3.12) web server and three WebStone (version 2.5) clients, connected through a 10 Mbps Ethernet. The web server hardware is based on a Pentium II 233 MHz CPU with 128 MB of memory. The operating system used in the server is Linux 2.2.9. The three clients run on Sun UltraSparc 10 machines with 128 MB of memory, running Solaris 2.6.

CPU cycles are used as the timing scale to obtain a precise time resolution, since the typical service time of a request is much less than one second; even millisecond resolution is not sufficient to provide an accurate picture of the service time distribution in the different processing procedures. The performance monitoring counters provided by the Intel P6 family processors are used to record the elapsed busy CPU cycles of the current process.
The response time is defined as the duration from the time that the server accepts a client request to the time that the response transmission finishes. The response time includes CPU busy times and idle times waiting for resources. Figure 5.1 plots the response time of requests versus file sizes and maximum process numbers. It can be observed from the figure that the response times are rather constant if the files are smaller than 64 Kbytes. If the file size exceeds 64 Kbytes, the mean response times are determined by the network bandwidth. For example, the effective bandwidth of the TCP traffic in the Ethernet is 1 Mbyte per second. The response time of a 1 Mbyte file request is around one second in the single process situation, and around ten seconds if the maximum process number is 10.

The service time is defined as the CPU cycles consumed between the time that the server accepts a client request and the time that the server finishes sending the response. Figure 5.2 depicts the mean service time of static objects versus requested file size under different maximum process numbers (MPN) in the server. It is well known that the MPN has a direct influence on the service time distribution, which is discussed in detail in the following paragraphs. CPU cycles have been converted into seconds in the figure. The CoV of the obtained service times varies between 0.09 and 0.11.

Figure 5.1 Mean response time of web objects. Figure 5.2 Mean service time of web objects.

It can be observed from the figures that the curves have two phases. If the files are smaller than 64 Kbytes, the service times with the same MPN are rather constant, and the service times with MPN > 1 are about twice the service time with MPN = 1. If the file sizes exceed 64 Kbytes, the mean service times increase linearly with the file sizes and the MPN value. We call this phenomenon the 64 KB leap. The 64 KB leap is mainly due to a restriction in the IP protocol. Since the maximum IP packet size is 64 KB, any response bigger than 64 KB cannot be transferred in one packet. Memory contention overhead and context switching overhead are introduced due to the network I/O blocking between packets. Some other factors also contribute to the 64 KB leap. For example, asynchronous disk I/O is widely used in current UNIX operating systems, and the default size for the read-ahead operation is 64 KB; at most 64 KB can be loaded from the hard disk in one I/O operation. The slopes of the service time curves increase linearly with the maximum number of processes, because an increase in process number causes a higher probability of page faults, more context switching, and synchronization overhead. Based on the observations from Figure 5.2, the service time of a task T(s, n) can be estimated from the file size s (in KB) and the MPN value n:

T(s, n) = a + \lceil s/64 \rceil * (b + c * n),   (5.6)

where a is the service time for small static web objects, i.e., requests for files smaller than 64 KB, b is the data transfer time factor, and c is the context switching overhead factor.
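For illustration, Equation (5.6) can be evaluated directly. The sketch below normalizes a to one unit, uses the regression ratios reported next (a : b : c = 1 : 0.06 : 0.18), and reads the bracket in Equation (5.6) as a ceiling, so the result is expressed in units of a:

```python
import math

# Regression ratios from Section 5.3.2: a : b : c = 1 : 0.06 : 0.18,
# with a (the small-file service time) normalized to one unit.
A, B, C = 1.0, 0.06, 0.18

def estimated_service_time(size_kb, mpn):
    """Equation (5.6): T(s, n) = a + ceil(s/64) * (b + c * n), where s is
    the requested file size in KB and n is the maximum process number."""
    return A + math.ceil(size_kb / 64.0) * (B + C * mpn)
```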
Using linear regression, we get the relative values for a, b, and c as a : b : c = 1 : 0.06 : 0.18. Rechecking the file size distributions, we find that less than 1% of HTML and image files are larger than 64 KB. Thus the service times of most HTML and image files are rather uniform. Service times of dynamic objects, mainly CGI requests, depend on the computational complexity of the URL processing instead of the response size. Using the testing CGI scripts provided by the WebStone 2.5 test set, the average service time of a CGI request is around one order of magnitude higher than the service time of a static object with a file size less than 64 KB. The experiment results indicate that the object type is a good indicator of the CPU time needed, and it can be derived from the requested URL easily. Besides, classification of object types introduces less overhead than retrieving file size information, which requires one fstat() system call in UNIX operating systems.

5.4 Simulation Methodology

To test the effectiveness of the ACES algorithm, we developed an event driven simulator for the web server using an empirical workload from real trace files. The reference locality, size, and object type distributions extracted from the logs of the Computer Science departmental web server at Michigan State University are used for the workload generation. The simulated system structure is shown in Figure 5.3. An incoming request is first sent to the admission control manager AC. The AC classifies the request type and decides if the request can be enqueued based on the AC algorithm. Enqueued requests wait to be served in the primary queue Qp. At the end of each period, or the beginning of the next period, unfinished requests are sent to the backup queue Qb. When Qb gets full, it is cleared up and all its tasks are dropped. The task scheduler TS picks up requests from the queues and sends them to the server pool. No request in the backup queue is serviced unless the primary queue is empty. Replies are sent back through the server network interface Nb. The system configuration is set based on the average configuration of current popular web servers, as in Table 5.2.

Figure 5.3 Web server system structure.

Table 5.2 Simulation Configuration.
  Parameter                             Value
  Priority Level                        2
  Scheduling period                     1 second
  System Capacity for Static Objects    1000 req./sec
  Network Bandwidth                     50 Mbps
  Disk Bandwidth                        10 Mbps
  Caching hit ratio                     0.7
  Dynamic Objects Processing Overhead   10 ms
  Maximum Open connections              1000
  Total queue length                    1000
  Response delay bounds                 1 second

Three performance metrics are used in the simulation: server throughput, mean response time, and response delay bound miss rate. System throughput indicates the server capability and measures the rate at which requests are served. Mean response time and delay bound miss rate quantify the service qualities. The delay bound miss rate is measured as the ratio of the number of requests with a response delay of more than one second to the total number of admitted requests.

Three kinds of admission control algorithms are implemented to examine the effectiveness of the ACES algorithm in terms of providing bounded delay and high system throughput. The first admission control algorithm, which we call the simple admission control (SAC) algorithm, is analogous to the leaky bucket [88] algorithm used in traffic engineering for network transmissions. Using the SAC scheme, each admitted request is allocated one CQ irrespective of the request type; thus there is no estimation of service time. The SAC scheme performs better than the tail-dropping scheme, since it smoothes out the web server traffic and provides preliminary overload protection.
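Since SAC allocates exactly one CQ per admitted request, it reduces to a token counter refilled at a constant rate. A minimal, illustrative sketch (names are ours):

```python
class SimpleAdmissionControl:
    """SAC, a leaky-bucket style controller: CQs are replenished at a
    constant rate matching the server throughput, and every admitted
    request consumes exactly one CQ regardless of its type."""
    def __init__(self, cq_per_period):
        self.cq_per_period = cq_per_period
        self.tokens = cq_per_period

    def new_period(self):
        self.tokens = self.cq_per_period   # constant-rate refill

    def admit(self, request):
        if self.tokens < 1:
            return False                   # no CQ left: drop the request
        self.tokens -= 1
        return True
```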
Another admission control algorithm used for comparison purposes is the conservative admission control (CAC) scheme. The CAC scheme is a hypothetical scheme in which the server is assumed to have precise knowledge of the service time a request needs. Using the CAC scheme, the allocated CQs match the service time requirements exactly. Although an implementation of CAC is unrealistic, we use it to compare against the performance of the ACES scheme.

Based on the data collected, web objects are classified into 4 types: static, dynamic, audio, and video objects. The CQs consumed by each object type in the ACES scheme are listed in Table 5.3.

Table 5.3 CQs allocated to each object type.
  Object Types   Static   Dynamic   Audio   Video
  CQs            1        10        20      100

The service times and corresponding CQs assigned to audio and video files are calculated based on their average sizes. The maximum process number is set to 30, which is the default configuration for the Apache web server.

5.5 Results

The three admission control algorithms noted in the previous section are deployed in the web server simulator. Two kinds of workloads are used in the experiment. The first is the stress test workload, in which the traffic intensity increases continuously up to 2.5 times the system capacity. With the stress test, we explore the capability of the three algorithms in preserving the system throughput under extremely high load. The second kind of workload is aimed at examining the sensitivity of the three algorithms under a fluctuating workload. There are two modes in the workload series used in the sensitivity test: one is sustained light load or overload, and the other is occasional light load or overload. The maximum workload is 2 times the system capacity. The occasional overload duration is 2 time units, and the sustained overload duration is 10 time units. Each time unit is 100 times the observation period, i.e., 100 seconds.

5.5.1 Throughput Comparison

Figures 5.4 and 5.5 plot the throughput performance of the three admission control algorithms under the two tests. In Figure 5.4, the X-axis is the aggregate load of the server, and the Y-axis is the normalized throughput. In Figure 5.5, the X-axis is the time series of the load input, the Y-axis is the normalized system throughput, and the dotted line is the load intensity.

Figure 5.4 Throughput of stress test. Figure 5.5 Throughput of sensitivity test.

It can be observed that the throughput of the system increases linearly with the workload during low load periods (lower than 0.5 of system capacity) irrespective of the admission control scheme. With the increase in load, the throughput of the CAC scheme stays behind the other two schemes.
The reason is that the CAC scheme wastes system resources when the inter-arrival rate of requests is temporarily lower than the system capacity. When the system load is around the system capacity, the throughput performance of the ACES scheme is very close to that of the SAC scheme. Under even higher load, the ACES scheme outperforms the SAC scheme because of the variable CQ allocations based on the estimation of service times. Another interesting phenomenon shown in the figures is that the throughput of the system tends to be more stable and less influenced by fluctuations of the workload when using ACES compared to SAC.

5.5.2 Average Delay

Figures 5.6 and 5.7 depict the average delay performance of the three admission control algorithms under the stress test and the sensitivity test. The mean response delay of the ACES scheme is fairly close to that of the CAC scheme, which is about one tenth of the delay bound during overload situations. The average delay of the SAC scheme is slightly higher than that of the other two schemes under moderate load; during overload periods, however, it is one order of magnitude higher. Since nearly all tasks must wait from the tail of the queue for service under high load, the average delay of the SAC scheme during high load periods is determined by the total system queue length. The experiment proves that the system responsiveness of ACES is close to the ideal case. Based on the queue length and waiting time relationship given by Little's Law, the average delay differences between the three AC schemes suggest that the required queue length might be much shorter for ACES than for SAC.

Figure 5.6 Delay of stress test. Figure 5.7 Delay of sensitivity test.

5.5.3 Delay Bounds Miss Probability

Figures 5.8 and 5.9 depict the delay bound miss ratios of the three admission control algorithms under the two tests. As expected, the delay bound miss (Dbm) rate for CAC is zero, since the system stops admitting tasks when it cannot serve them within the delay bound. Using the SAC scheme, on the contrary, the system fails to meet the delay bound for nearly all tasks in high load periods. The reason is that the SAC scheme fails to track the changes in system resource consumption and tends to over-admit tasks. The over-admission leads to the formation of a long queue, thus introducing long waiting times for all admitted tasks. However, under low or medium load (less than about 0.7 of the system capacity), the SAC scheme performs as well as the other two schemes. The performance of ACES is good under reasonably high load (less than 1.7 times the system capacity), and nearly no delay bound miss is incurred at this load. Under extremely high load, the ACES scheme has about a 10% delay bound miss ratio, which is not too high.
Figure 5.8 Dbm rate of stress test. Figure 5.9 Dbm rate of sensitivity test.

The two tests prove that the ACES algorithm is effective in providing delay assurance as well as in preserving system throughput under diverse load and overload situations in a web server system. Its throughput performance is closer to the SAC scheme, and its response delay performance is closer to the CAC scheme.

The CAC scheme is very powerful in controlling the deadline misses of tasks, as it does not incur any bound miss. However, the throughput of CAC is much lower than that of the other two schemes. The low throughput is partly due to the strict admission control: the server does not make full use of its computing resources when the variance of the request rate is high. The CAC scheme is suited to hard real-time services where the deadline guarantee is critical. The SAC scheme is suitable for a system with a uniform workload, as it does not work well in highly diverse workload environments such as a web server. The SAC scheme performs well during low or medium load periods. During high or overload periods, the SAC scheme tends to over-allocate system resources, which causes long response delays. The accumulative effect of overload has a long term influence in degrading system responsiveness and throughput.

5.6 Summary

The growth of the Internet and WWW applications has imposed continuous challenges on Internet server performance and on quality of service assurance in terms of predictable delay and throughput. However, the prevalent "bursty" nature of server access patterns makes it difficult and expensive to maintain fast responses at all times, even with a high performance server. The peak workload of an Internet server may exceed the average workload by orders of magnitude. In this study, we present a simple and effective admission control algorithm, ACES, to adapt to the highly variant processing times of web server environments. Tasks are admitted based on the estimation of service time. A double-queue structure is deployed to compensate for the inaccuracy of service time estimation, prevent accumulation of response delay, and improve the system throughput. A detailed experimental measurement of the service time distribution of web objects is conducted to provide the foundations of service time estimation. Simulation results demonstrate that the ACES algorithm is able to provide assurance of response time while maintaining throughput under various workload situations.

Results also show that the ACES algorithm outperforms the existing tail-dropping admission control scheme with up to 10 times delay improvement. It exceeds the throughput performance of the conservative admission control scheme by about 30% under medium to high load. Although this study is based on the UNIX environment with high speed connections, it can easily be extended to other environments by adapting the service time computation quanta.

CHAPTER 6
AN EFFICIENT ADMISSION CONTROL ALGORITHM FOR SDIS

6.1 Overview

Since the beginning of the 1990s, the Internet and Internet services have experienced exponential growth. Owing to its universal accessibility and low cost, the use of the web in revenue generating activities, well known as e-commerce, is also increasing at a fast rate [1].
A widely existing problem in contemporary web servers, however, is the unpredictability of response times, which is caused by the FCFS service model and the "bursty" workload behaviors. Usually, a one second response time is desired from web sites, which is appropriate to human response speed [83]. Long response delays frustrate users' interest in interacting with servers, thus devaluing the web service quality. Although current web servers are able to serve thousands of requests per second, the average response delay of a popular server can be several orders of magnitude higher than expected during high load periods, causing de facto "denial-of-service" effects.

The response delay of an Internet service is determined by two factors: the quality (delay, delay jitter, loss rate, etc.) of the network transmission, and the processing capacity of the server. Studies of quality of service (QoS) in network transmission have been active in recent years, including efforts to build the Integrated Services (IntServ) architecture [7] with end-to-end QoS guarantees, and the Differentiated Services (DiffServ) [8] architecture with alternative levels of service provisioning. However, network layer QoS is not sufficient to provide user perceivable high performance if the server does not offer any service and performance assurances. Unexpected bursts of client demands may cause long queuing delays or even crash an overloaded web server. Even a premium data flow with an end-to-end QoS guarantee may still experience service rejection when the server is overloaded. On the other hand, session/transaction-based service differentiation can only be provided by application layer servers. With the boost in resource requirements of web based applications, the web is expected to evolve from a "free" to a "paid" web in the near future. Value added services will emerge with competitive differentiation of service offerings based on profits instead of the best-effort service discipline.

Recent studies on QoS support in web servers have addressed the technology of prioritized processing in a web server and related performance issues [3, 4, 5, 6]. By prioritization of tasks, it is possible to maintain low response delays for high priority tasks during high server load periods by blocking or dropping low priority tasks. We call this kind of server a service differentiating Internet server (SDIS), which reciprocates the QoS efforts in Internet transmission. The basic ideas of SDIS include classification of client requests into groups with different service requirements, resource allocation based on the task groups, and prioritized scheduling and task assignment schemes. A detailed study on the concept and performance evaluation of SDIS is reported in [89].

Prioritized scheduling in a web server has been proven effective in previous studies at providing significantly better delay performance to high priority tasks at relatively low cost to lower priority tasks. However, it is still possible that a significant number of high priority tasks are dropped under extremely high load, and user acceptable response delays cannot be ensured. Effective admission control (AC) mechanisms are needed to assure the drop rate and the delay bounds of tasks.

Most contemporary web servers use a rather naive AC scheme, namely tail-dropping AC, in which incoming requests are dropped when the number of tasks awaiting service exceeds a predefined threshold.
The tail-dropping AC scheme requires careful system capacity planning and works well only in steady workload situations. In a highly variable workload environment, more flexible AC schemes are needed to adapt to the dynamics of the traffic. For example, secure transactions on the Web, which are popular in e-commerce, consume much more CPU resources than regular transactions due to encryption/decryption and multiple handshaking overheads. The AC algorithm should be designed to consider the characteristics of a wide variety of requests.

In this study we propose a simple and effective AC algorithm, PACERS (Periodic Admission Control based on Estimation of Request rate and Service time), to provide bounds on response delay for incoming requests under highly variant workload environments. The PACERS algorithm dynamically adjusts the system capacity for each priority group by estimating the request rate of tasks. Admission of a request is decided by comparing the available computation power for a duration equivalent to the predetermined delay bound with the estimated service time of the request. If the task is admitted based on this constraint, then it is very likely that the request will be served within the specified time bound. The service time estimation is done on the basis of the request type. A double-queueing organization is used to diminish the inaccuracy of estimation and exploit the spare capacity of the system, while simplifying task management. Theoretical analysis and experimental study show that the PACERS algorithm bounds the response delay of tasks in different priority groups, assures service availability to high priority tasks even under high load, and preserves system throughput under various workloads.

The rest of the chapter is organized as follows. Section 6.2 explains how to estimate request rates and service times by analyzing the workload characteristics of a real web server. Section 6.3 discusses the PACERS algorithm in detail, along with its implementation issues. Section 6.4 provides a theoretical proof of the delay performance of the PACERS algorithm for each priority group. Simulation results and performance evaluation of the algorithm are reported in Sections 6.5 and 6.6, respectively. Related works are discussed in Section 6.7. Section 6.8 concludes the study.

6.2 Workload Characterization

Admission control policies are used extensively to provide congestion control and to enforce desirable performance in computer and communication systems. The workload on web servers is different from that of network traffic, thus existing network-level AC algorithms might not suit web server environments well. Apparent self-similarity and long-term dependency are prevalent in the WWW traffic pattern [66, 61, 72]. On the other hand, the processing of web requests is much more complicated than the handling of network packets. To explore a simple and effective admission control algorithm which fits the web server environment, we analyzed the traces of a busy web server to track trends and characteristics of the resource requirements of the requests. The trace data was collected for a week from the web server of the Department of Computer Science and Engineering at Michigan State University, which encountered about a million requests during the period.

6.2.1 Access Distribution

The access logs provide the request timestamp, client ID, object URL, service status, and reply size of each request. The referrer logs complement burst and session information.
The Coefficient of Variance (CoV) of the request inter-arrival rate is 3.59 during the observation period. A high CoV means high variance in access rates, which is demonstrated in Figures 6.1, 6.2, 6.3, 6.4, 6.5, and 6.6. Figures 6.1 and 6.2 plot the number of requests and the web traffic volume (reply size in bytes) of the web server per second in a day. Figures 6.3 and 6.4 show the number of requests and the web traffic volume of the web server per minute in a day. The requested web traffic volume has an even higher CoV value of 14.4. Note that the traffic volume is calculated according to the access logs of the web server, in which the reply size of each request is recorded. The access logs record the time point at which a request is finished instead of the traffic transmission duration. The real network traffic should be smoother than indicated in the diagrams due to the throttling effects of network transmission.

Figure 6.1 Access distribution in a day (1 sec). Figure 6.2 Traffic distribution in a day (1 sec).
Figure 6.3 Access distribution in a day (1 min). Figure 6.4 Traffic distribution in a day (1 min).

The CoV of the request inter-arrival rate with 1 second resolution is 3.59 during the observation period, which by and large is due to the cyclic behavior of the server access pattern. Figures 6.5 and 6.6 show the request intensity and traffic volume per hour in a week. Distinct seasonal/periodic behaviors along the time series can be observed from the figures. For example, the access intensity continues to increase from early morning, reaches the high intensity area at 10 AM, stays high till late afternoon, and then decreases slowly to the lowest point of the day, which is about 3 AM the next day. A similar traffic pattern repeats around the same time each day, which suggests that both short and long term history should be used as predictors of future workloads.

Figure 6.5 Access trends in a week. Figure 6.6 Traffic trends in a week.

The fluctuation in traffic intensity can span several orders of magnitude, which causes the high variances in access rate and bandwidth requirements, thus resulting in high average waiting times of tasks in a system. Care should be taken to eliminate the effects of the variances. An interesting phenomenon is that the access pattern of the web server tends to be consistent within a short time window. For example, the CoV of access rates decreases to around 1 when they are measured on an hourly basis. The decrease in CoV is even more obvious during busy hours. Similar observations have been reported in [91, 92, 77], which suggest that a multi-state Markov-Modulated Poisson Process (MMPP) can be used to approximate or predict the burstiness of the aggregate input of the web server; this is discussed in more detail in Section 6.3.3.

Note that during the week's observation period, there is one overload point at which the request access rate is about 10 times higher than the common peak load of the day. We examined the logs and found that the abnormal overload was caused by a group of careless CGI based requests. Those requests directly caused the web server to fail to respond to any other request for 21 minutes. This is one of the examples which justify the need for appropriate AC to prevent system resources from being wasted on careless or malicious request attempts.
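The CoV figures quoted above can be reproduced directly from the access logs. A minimal sketch, assuming `timestamps` is the sorted list of request arrival times (in seconds) parsed from the log:

```python
import statistics

def interarrival_cov(timestamps):
    """Coefficient of variance (std/mean) of request inter-arrival times,
    computed from sorted access-log timestamps given in seconds."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.stdev(gaps) / statistics.mean(gaps)
```

Applying the same computation to timestamps bucketed by hour would show the drop of the CoV toward 1 noted above.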
6.2.2 Object type distribution

Table 6.1 lists the requested Web object types and the corresponding traffic distribution of the traces. Web object types are categorized on the same basis as reported in [86]. The first and second columns of the table show the request and traffic ratios of each major type of web object. It can be observed that requests for small and static web objects, i.e., HTML and image files, still dominate the incoming requests and web traffic. The third column shows the mean reply size of each object type. The fourth column shows the CoV of the reply size of each web object type. The data show that the CoV of web traffic is as low as 1 to 2 for each type of web object. However, the CoV of the aggregated web traffic can be as high as 14.41.

Table 6.1 Traffic distribution vs. object type.
  Item      Req. (%)   Traffic (%)   Mean (KB)   CoV
  HTML      19.2       15.0          5.76        1.90
  Image     68.8       49.2          4.98        2.46
  Audio     0.2        2.5           579.9       1.76
  Video     0.1        6.7           2503.9      1.56
  Dynamic   4.9        4.4           6.84        1.33
  Other     6.8        20.2          19.0        7.90
  Total     100        100           7.39        14.4

There is a non-negligible percentage of dynamic objects. This type of access requires more processing power, and thus increases the server load and the unpredictability of response delay. Previous studies [14] have estimated that a dynamic object requires 10 to 100 times the service time of a typical static object. The processing time difference was confirmed by recent benchmark tests of popular commercial web servers [35] and the service time characteristic study in Section 5.3.2. The throughput of serving dynamic tasks is usually lower than that of serving static tasks by a factor of 10. Continuous media (CM) objects (audio and video clips) are now being used more extensively in web environments (compared to the previous results in [86]). These data types require high storage capacity (and are thus uncacheable), high density I/O processing, and sustained bandwidth with bounded delay constraints. The average CPU time for serving a CM object is also much higher than the average CPU time for serving small static objects. Significant differences in the sizes of different types of web objects, the high volume of CM objects, and computation-intensive dynamic requests suggest that the service time variance of the requested web objects should be considered in estimating the server capacity and response time.

6.3 Admission Control Algorithm

This section presents how the web server workload characteristics discussed in the previous section impact the design of our AC algorithm. The goal of the PACERS algorithm is to provide response delay bounds to incoming requests while preserving the system throughput of busy web servers.

6.3.1 Overview of the algorithm

Usually, a queue of incoming requests awaiting processing is maintained in a web server. Using the tail-dropping AC scheme, incoming requests are put in the queue until the queue is full. Queue length is not always a good indicator of system load, especially when the variance of processing time is high. Without effective AC, the server response time and throughput deteriorate drastically when the aggregate request rate exceeds the server capacity, indiscriminately affecting all clients. Abdelzaher and Bhatti [52] reported that as much as half of a web system's processing capacity is wasted on eventually aborted/rejected requests when the load is high.
To accommodate the high variance in the request rate and service time of web servers, we propose a simple and adaptive AC algorithm, Periodic AC based on Estimation of Request rate and Service time (PACERS), to ensure the availability and delay performance of prioritized tasks. The predictive strategy estimates the periodic request rate of each priority group, and guarantees the performance of higher priority tasks by restricting the admission of lower priority tasks. The request rate estimation is based on the history of the access pattern. The service time of each task is estimated based on the request type, which is more or less delineated by the size of the response. Inaccurate estimations are dynamically adjusted by a double-queue architecture, as described later.

6.3.2 Admission Control and Delay Bounds

We first examine a non-prioritized system to get a systematic understanding of the proposed algorithm. Assume that time is divided into discrete slots (0, T), (T, 2T), ..., (kT, (k+1)T), .... For simplicity, we use the beginning point of each time period, kT, to represent the duration (kT, (k+1)T). Let c be the unit processing capacity of the system, C(kT) the predicted system processing capacity in period kT, S(i, kT) the service time needed by the ith task in period kT, and n(kT) the number of admitted tasks in period kT. If the server has precise knowledge of the service time needed by each task, the admission decision can be made based on the following expression:

C(kT) = c * T \ge \sum_{i=1}^{n(kT)} S(i, kT).   (6.1)

If Expression (6.1) is true, then the request is admitted; otherwise it is rejected. The maximum response delay of each task is bounded by the value of T if the service time of each task is known prior to the admission decision phase.

In a prioritized system, the periodic system capacity seen by different priority groups changes with the prediction of the resource requirements of each priority group. By adjusting the system capacity assigned to each priority group, the PACERS algorithm provides service quality assurance to prioritized tasks. There are two kinds of service disciplines that can be provided by the PACERS algorithm: prioritized resource allocation (PRA) and weighted fair allocation (WFA). PRA is implemented by assigning resources equal to the whole system capacity to the highest priority tasks (or premium tasks); lower priority tasks get the remaining resources. WFA is realized by setting shares of the system resources for each priority group, where each priority group gets at most/least its share. In this study we only discuss the PRA control scheme. The WFA scheme can easily be extended from the PRA scheme by setting a minimum or maximum resource ratio for each priority group.

The objective of the server is to ensure QoS to high priority tasks whenever their arrival rate is lower than the system capacity. Thus, for high priority tasks, the available resources are equal to the system period capacity. For lower priority tasks, the available resources are the system capacity minus the predicted resource requirements of higher priority requests during a time period. Since the priority ratio and type distribution of incoming tasks vary over time, dynamic assignment of the system capacity is needed to preserve the system throughput. Assume all requests are classified and assigned a priority p, (p = 1, ..., P), wherein P denotes the highest priority. Let \lambda_i^{predicted}, (i = 1, ..., P) be the predicted inter-arrival rates of tasks for each priority group.
The system capacity available to priority group p at time kT \le t \le (k+1)T, denoted as C_p(kT, t), is:

C_p(kT, t) = C(kT) - \sum_{i=p+1}^{P} \lambda_i^{predicted}(kT) * T - \lambda_p(kT) * t.   (6.2)

A task with priority p is admitted if its service time is equal to or less than the available capacity C_p(kT, t).

6.3.3 Estimation of Request Rate

The resources allocated to each priority group are based on the prediction of the request rate of incoming prioritized tasks. The apparent seasonal workload patterns corresponding to daily cycles discussed in the previous section can be used to predict the current traffic intensity based on the workload history. On the other hand, reports in [92, 77] suggested that the aggregate web traffic tends to smooth out as Poisson traffic in short observation time windows. This fact was further proved by Morris and Lin in [91]. Based on the above published results, we decided to use the Markov-Modulated Poisson Process (MMPP) described in [93] to capture the seasonal behavior of the aggregate workload of web servers, while preserving the tractability of a modulated Poisson process. The request arrival process is described as a Markov process M(i) with state space 1, 2, ..., i, ..., N. State i has arrivals complying with a Poisson process of rate \lambda_i. To follow the seasonal behavior, especially the day/night cyclic behavior of the web server load, the observed traffic data is chopped into subsets for each hour on a daily basis. The Markov transition matrix Q(n) = [q_{ij}(n)], (n = 1, ..., 24), for each hour can be easily calculated by quantizing the inter-arrival rate in each observation period, and calculating the frequency at which M(n) switches from state i to state j. The predicted access rate \lambda^{predicted}(kT) can be expressed by the following equation:

\lambda^{predicted}(kT) = [\lambda((k-1)T)] * Q(kT),   (6.3)

where [\lambda((k-1)T)] is the state vector of the measured inter-arrival rate in the previous period. Q(kT) can be further adjusted by comparing the differences between predicted and measured data, to catch up with the long term trends of the web server load. In the experiment, we use three-state (p_{inc}, p_{same}, p_{dec}) transition matrices to capture the traffic trends in each observation period: p_{inc} is the probability of an increased request rate, p_{same} the probability of the same request rate, and p_{dec} the probability of a decreased request rate. The delta value for increment and decrement is 10% of the measured value. The experiment shows that there is not much difference in the capability of capturing the traffic trends in each observation period between a three-state transition matrix and more complicated state transition matrices.
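One way Equation (6.3) might be realized is sketched below. This is a simplified illustration under our own naming: `state_rates` holds the Poisson rate associated with each MMPP state, and `Q` is the row-stochastic hourly transition matrix presumed to have been estimated from the trace; neither name comes from an actual implementation.

```python
import numpy as np

def predict_rate(measured_rate, state_rates, Q):
    """Equation (6.3): quantize the measured rate of the previous period
    to an MMPP state, take one step of the hourly transition matrix Q,
    and return the expected Poisson rate for the next period."""
    n = len(state_rates)
    state = min(range(n), key=lambda i: abs(state_rates[i] - measured_rate))
    vector = np.zeros(n)
    vector[state] = 1.0               # state vector [lambda((k-1)T)]
    next_dist = vector @ Q            # one-step state distribution
    return float(next_dist @ np.asarray(state_rates))
```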
6.3.4 Estimation of Service Time

While deciding to accept a request, the system should ensure that the sum of the service times of accepted tasks does not exceed the system capacity. In reality, it is not possible to know the service time S(i, kT) in advance. High variance in resource requirements is a widely recognized characteristic of web server workloads. As indicated in the previous section, however, the service time and bandwidth requirements of the same type of requests are more or less consistent. The service time of web objects can thus be estimated based on the request type distribution of the server access pattern. We approximate the service time of each task by a weighted computation quantum (CQ) matching the CPU processing time of the different types of tasks. When a new request arrives, the system checks if there are enough CQs available to grant to the request. Only requests that are granted enough CQs can be enqueued and served eventually. The number of CQs issued to each task is determined by the resource requirement of the request. Here we denote T_n as the number of types of requests to a web server, which can be derived from the workload characterization of the web server. Let N_i(kT) be the number of requests of type i in period kT, and CQ_i the weighted CQ matching the CPU processing time of type i tasks. Then Equations (6.1) and (6.2) can be approximated as:

C(kT) \ge \sum_{i=1}^{T_n} CQ_i * N_i(kT),   (6.4)

C_p(kT) \ge \sum_{i=1}^{T_n} CQ_i * N_{i,p}(kT).   (6.5)

As in the case of Equations (6.1) and (6.2), a request is admitted if Equations (6.4) and (6.5) hold true; otherwise it is rejected.

6.3.5 The Double Queue Structure

Since Equations (6.4) and (6.5) are approximations of Equations (6.1) and (6.2), care needs to be taken to amortize the accumulated delay influence of over-admission during a time period. Assume that restricted prioritized processing is enforced inside a queue, i.e., no lower priority task gets served if there is a higher priority task waiting; incoming requests are queued in order from high priorities to low priorities. Requests of the same priority are queued in FCFS order. When over-admission happens, it is possible that low priority tasks stay in the queue for a long time awaiting service while high priority tasks get dropped due to lack of queue space. On the other hand, under-admission wastes system capacity. A double-queue structure is used to handle the over/under admission problems. A primary queue is used as the incoming task queue, and a secondary queue is added for the backed up requests (we call this the backup queue). The system structure is shown in Figure 6.7. An incoming request is first sent to the AC manager ACM. The ACM classifies the request priority and decides if the request can be enqueued. Enqueued requests wait to be served in the primary queue Qp. At the beginning of the next period, unfinished requests are sent to the backup queue Qb. When Qb becomes full, it is cleared up and the queued tasks are dropped. Other methods for expunging tasks from Qb can be explored. The task scheduler TS picks up requests from the queues and sends them to the server pool. No request in Qb can be picked up unless Qp is empty. Replies are sent back through the server network interface Nb.

Figure 6.7 Web server system structure.

By using a double queue structure, newly accepted requests need not wait for a long time for service; thus bounded delay is achieved for most of the requests.
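Combining the capacity bookkeeping of Equation (6.2) with the CQ test of Equation (6.5), the per-period admission decision under the PRA discipline can be sketched as follows. This is a simplified illustration, not the simulator's code; the class and method names are ours, and the CQ weights are the illustrative values of Table 6.3:

```python
# Illustrative CQ weights per request type (Table 6.3).
CQ_WEIGHTS = {"static": 1, "dynamic": 10, "audio": 20, "video": 100}

class PacersAdmission:
    """PRA discipline of Equations (6.2) and (6.5): the highest priority P
    sees the full period capacity; priority p sees the capacity minus the
    predicted demand of priorities p+1..P, minus its own granted CQs."""
    def __init__(self, capacity_cq, num_priorities):
        self.capacity = capacity_cq            # C(kT), in CQs per period
        self.P = num_priorities
        self.predicted = [0.0] * (num_priorities + 1)  # index 0 unused
        self.granted = [0.0] * (num_priorities + 1)

    def new_period(self, predicted_cq_by_priority):
        """predicted_cq_by_priority[p]: CQs expected from priority p this
        period, derived from the rate prediction of Equation (6.3)."""
        self.predicted = predicted_cq_by_priority
        self.granted = [0.0] * (self.P + 1)

    def admit(self, priority, request_type):
        cq = CQ_WEIGHTS[request_type]
        reserved = sum(self.predicted[priority + 1:])   # higher priorities
        available = self.capacity - reserved - self.granted[priority]
        if cq > available:
            return False
        self.granted[priority] += cq
        return True
```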
If the variations 122 of job arrival rate and processing rate are high, more queue space is needed, and the average queueing time and response delay tend to be high. If enough buffer space is not available, denial of service occurs when the queue is full. One way to increase the response time predictability of a system is to limit the number of requests in a given “short time” period T to no more than what the system can handle in that period of time. Borowsky [85] provided a theoretical proof that the service time required by pending jobs is no more than T in a F CF S work-conserving system, if the sum of service time required by the arriving jobs at any duration T is bounded by time T. Thus the response time (or delay) of all requests can be bounded by T by restricting the workload arriving in every interval of length T time. In a prioritized environment where each request is assigned a specific priority, the processing order and resource allocation are based on the priority of each request. The FCF S processing assumption in the above theorem is no longer valid. In [5], we proved that the mean response time of a request is bounded by the arrival rate with equal or higher priority and the service rate of the system, if requests with different priorities have the same service requirement distribution and a strict priority scheduling is used. Let p(1 S p S P) denote the priority of the requests and P be the total number of priorities. Let sp(i) be the service time requirement of task i belonging to priority p, and Np(t — T, t) be the number of requests of priority p arriving during the period T. Lemma : In a prioritized preemptive work-conserving environment with no pend- ing work, if N(t — T, t) requests arrive belonging to a single priority level p, then their response time is bounded by T1,,(Tp < T) if 2:194") sp(i) 3 TP (proved in [85]). Theorem : If the priority levels are in the increasing order of p, the response time of a task with priority p is bounded by Zfzp T,,. If 25:17} 3 T, the response time of any task is bounded by T. Proof : Let RpJ—(n) be the response time of the ith task in the priority class p during period n, and brusy_time(n) be the amount of time in (n) that the system is 123 busy during period 71. Let the request time series be partitioned into periods of n with length T. For n = 1,i.e.,t = (0,T), without loss of generality, let all requests arriving during (0, T) be reshuffied to be served in the decreasing order of p (for p = 1, ..,P), then Rp,,-(1) <= Tp,fori = 1, ..., .\7p(1), and Rp,,(1)<=2q__p+, Tq + Tp <= 25:,0 Tq, forp : l, .., P. Assume the above expressions hold true in the period It — 1, reshuffle the requests in queue in the decreasing order of 1) during period k. Let Qp(k ) be the sum of the service times for the pth priority tasks, then P quk—I) Qp(k) : 11(k — 1) + 2 (1:10 1:1 busy-time(k), (6.6) Clearly, (2,,(k —- 1) <2 busy-time(k). By assumption, 2,,_ p2,:", "(T 1)sq(i) <= 2,1,):qu, so P Nq(k—1) P 2 Z 5,,(i) <=ZTq. (6.7) q=p i=1 q=p Hence, by induction and from Equation (6.7), we get fem-(k) <= 52,, Tq(i = 1,.. .,1\ 'pk( ), forp— — 1,. P, andk = 1, 2, i.e., the response delay boundary of each prioritized task can be achieved by controlling the requested service time in each period of length T. As long as the sum of service times requested by high priority tasks does not exceed T, the system response time can be assured by restricting the number of admitted low priority tasks. The granularity of T effects the system performance. 
The granularity of T affects the system performance. Increasing the value of T smoothes out the access variance between adjacent periods and allows more requests to be admitted. However, a larger value of T also increases the variance of the response time and degrades the system responsiveness. Both the user-perceivable delay and the system throughput should be considered in determining an appropriate value of T.

6.4.2 Waiting Time Estimation

The service time of a web object can be estimated from statistics of the same or similar types of web objects with acceptable variance. The number of requests allowed into the system can then be derived from the expected waiting time and the job arrival rate. Let W_p(t) denote the expected waiting time of tasks with priority p at time t, and A_p(t) the expected arrival rate of tasks with priority p at time t. Let W(t) denote the expected waiting time of a task in the system at time t, and A(t) the expected arrival rate of tasks in the system at time t. In period k, the expected number of tasks in the system equals

    N(k) = W(k) A(k) = \sum_{p=1}^{P} A_p(k) W_p(k).    (6.8)

According to Little's Law [70], the waiting time of each priority group equals

    W_p(k) = ( N(k) - \sum_{i \ne p} A_i(k) W_i(k) ) / A_p(k).    (6.9)

On the other hand, the waiting time of a task in the pth priority group equals the residual life W_0 of the executing task, plus the sum of the service times of the equal or higher priority tasks in the queue, plus the sum of the service times of the higher priority tasks arriving while it waits in the queue. Representing the mean service time of group i in period k as S_i(k), the waiting time is thus

    W_p(k) = W_0 + \sum_{i=p}^{P} N_i(k) S_i(k) + \sum_{i=p+1}^{P} A_i(k) W_p(k) S_i(k).    (6.10)

Let \rho_i(k) = A_i(k) S_i(k). By combining the results of Equations (6.9) and (6.10) we get

    W_p(k) = W_0 / [ c (1 - \sum_{i=p}^{P} \rho_i(k)) (1 - \sum_{i=p+1}^{P} \rho_i(k)) ],    (6.11)

where c is the unit processing capacity of the system; its inverse is the worst-case expected residual life of an executing task, which occurs when the utilization factor approaches 1. Equation (6.11) gives the mean waiting time of prioritized tasks based on the estimated arrival rates of equal or higher priority tasks. Similarly, we can derive the expected task arrival rate, and hence the acceptable number of tasks in each period, from the expected waiting time:

    N_p(k) = A_p(k) T = T [ c W_p(k) (1 - \sum_{i=p+1}^{P} \rho_i(k))^2 - W_0 ] / [ c W_p(k) S_p(k) (1 - \sum_{i=p+1}^{P} \rho_i(k)) ].    (6.12)

Note that the above analysis is based on the periodic data collection assumption. In fact, all the data used in the above equations can be easily obtained from a few counters that are reset periodically. The workload of each priority level is estimated at the end of each period, and the expected number of tasks is determined from the expected arrival rate. During a period, if the number of incoming tasks of one priority level exceeds the expected number, then no new requests of that priority level are accepted until the beginning of the next period.
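Equations (6.11) and (6.12) translate directly into code. The following Python sketch is a transcription under stated assumptions only: priorities are 0-indexed with a higher index meaning higher priority, the per-class loads rho[i] = A_i(k) S_i(k) come from the periodic counters, and the function names are ours.

def expected_waiting_time(p, rho, w0, c):
    """Equation (6.11): mean waiting time of priority-p tasks, where
    rho[i] = A_i(k) * S_i(k), w0 is the residual life of the executing
    task, and c is the unit processing capacity.  Assumes the total
    load of priority >= p stays below 1 (a stable system)."""
    sigma_p = sum(rho[p:])        # load of classes with priority >= p
    sigma_p1 = sum(rho[p + 1:])   # load of classes with priority > p
    return w0 / (c * (1.0 - sigma_p) * (1.0 - sigma_p1))

def admissible_tasks(p, rho, s_p, w_p, w0, c, period):
    """Equation (6.12): number of priority-p tasks admissible per
    period of length `period`, obtained by solving (6.11) for the
    class-p arrival rate and multiplying by the period length."""
    sigma_p1 = sum(rho[p + 1:])
    rho_p = (c * w_p * (1.0 - sigma_p1) ** 2 - w0) / (c * w_p * (1.0 - sigma_p1))
    return period * rho_p / s_p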
6.5 Simulation Model and Parameters

In this study, we simulate an event-driven server model using empirical workload from real trace files. The reference locality, size, and object type distributions are extracted from the logs of the Computer Science departmental web server at Michigan State University and are used for the workload generation. We consider only two priority levels to simplify the study, since the primary concern is to examine the effectiveness of the PACERS algorithm. The system configuration is shown in Table 6.2.

Table 6.2 Simulation configuration.

    Parameter                    Value
    Priority levels              2
    Scheduling period            1 sec
    Static object throughput     1000 req./sec
    Dynamic object throughput    100 req./sec
    Network bandwidth            100 Mbps
    Disk bandwidth               100 Mbps
    Caching hit ratio            0.7
    Maximum open connections     1000
    Maximum queue length         1000
    Maximum server processes     30
    Response delay bound         1 sec

Three performance metrics are used in the simulation: the server throughput, the mean response time, and the response delay bound miss rate of each priority group. The system throughput indicates the server capability, while the mean response time and the delay bound miss rate quantify the service quality in each priority group. The delay bound miss rate is the proportion of tasks whose response time exceeds the bounded delay.

Based on the data reported in Section 6.2, web objects are classified into four types: static, dynamic, audio, and video objects. The CQs consumed by each object type and their input percentages are listed in Table 6.3. The CQs allocated to each object type are based on the results in Section 5.3.2.

Table 6.3 CQs requested by each object type.

    Object type          Static   Dynamic   Audio   Video
    Credits              1        10        20      100
    Request freq. (%)    94.7     5         0.2     0.1

6.6 Results and Evaluation

Our experiments evaluate the performance of the PACERS algorithm against a simple admission control (SAC) algorithm, which is analogous to the leaky bucket [88] algorithm used in traffic engineering for network transmissions. The SAC algorithm produces CQ at a constant rate matching the server throughput, and each admitted task consumes one CQ. Incoming tasks are dropped if no CQ is available; estimation of the request rates, however, is not implemented in the SAC algorithm. The SAC scheme outperforms the tail-drop scheme, since it smoothes out the web server traffic and also provides preliminary overload protection.
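The SAC baseline can be sketched as a token-bucket-style credit generator, as below; the rate and depth parameters are illustrative assumptions rather than values taken from the simulation.

import time

class SimpleAdmissionControl:
    """Sketch of SAC: credits (CQ) are produced at a constant rate
    matching the server throughput, each admitted task consumes one
    credit, and a task arriving when no credit is available is
    dropped -- analogous to a leaky/token bucket."""

    def __init__(self, rate, depth):
        self.rate = rate              # credits produced per second
        self.depth = depth            # maximum credits that may accumulate
        self.credits = depth
        self.last = time.monotonic()

    def admit(self):
        now = time.monotonic()
        elapsed = now - self.last
        self.credits = min(self.depth, self.credits + elapsed * self.rate)
        self.last = now
        if self.credits >= 1.0:
            self.credits -= 1.0       # one credit per admitted task
            return True
        return False                  # no credit available: drop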
" h K A ’1 ‘ f {l x" ‘ 1 / d ' A ‘ 1x oa~ , " ‘ 02L 2‘ x J. ’ . 02 '1 1‘ ‘ .’{ ‘ A I 01 g o # 1 1 A». A .1. , 1 1 1 1 1 1 1 A , 1 0 0.2 04 0.6 0.8 1 1.2 0.1 0.2 03 04 05 06 0.7 03 0.9 1 11 Nomllzod lond'o'h-ohpmmflu lelZOd load Ior mgr onomy lam Figure 6.9 Throughput using PAC- F' 6.8 Tl Th t ' SAC. 1gure iroug pu usmg ERS. Figure 6.9 shows the throughput of each priority group when the PACERS scheme is deployed. The system stops serving low priority tasks when the high priority load approaches about 80% of system capacity. On the other hand, the throughput of high priority tasks equals the request rate of high priority group until the system utilization reaches 0.8, which is the highest throughput of the system using the traces as input workload. The throughput of high priority tasks remain at the 80% level when the high priority tasks overload the system. It can also be observed that the aggregate throughput of the two AC schemes remains at the same level. It suggested that the prediction and reservation behaviors of the PACERS scheme do not degrade the system aggregate throughput. 6.6.2 Delay Performance The average delay of each priority group is also well controlled under the PACERS scheme. As shown in Figures 6.10 and 6.11, the average delay of each priority group is much lower than the predefined delay bounds of 1 second by using the PACERS a1 gorit hm. . Pnomy Ratio H. Law pm 13911 pefiormamo Pnonty Rot-01:1, High poo lull pond-mane. ‘0 f If - u a 3 1o h 1 Y Y T Y i I l 1 J.)- SAC lhrciugrpul : : ' ' SAC dezny 4 t .., , » ~ - PACERS throumpul . [ i < PACERS corny . , . , dela bound: 10 1- 10 [- ~ ; 1 : : f ‘ + - [ L . > 10' f '———‘ ‘3 )10‘ I l g ; ‘ <1 SAC Yh'cuqmui : 5 , F ‘ ' - SAC delay I 8 , ‘ i - PACE RS lhrmghpui .1 z , i - PACERS Oeuy . 4 § . [ deaa bounds , 10 F — A 10 1 c r 1 E 3 *- l . '- l I ‘ a . , t r 102 g ‘ ‘00 .r - i : ‘ i I: — - 4: - - F 6, 9 9 91» k( e 4 t 0 g Q J» '3 ‘waOoUflGPOQ ’0-9'5b , O ' l 8 b a ’3 10 y ,9 mi 91 1 ' I . A A L L A 1 i A L 1 . L L 1 1 l A l 1 J A J 01 02 0,3 04 05 06 O7 08 09 1 11 0,1 0.2 03 04 05 08 07 0.8 0.9 1 1.1 Hammad toad Iov high priority tasks Normanzod load to: high pnomy mm Figure 6.10 Performance comparison Figure 6.11 Performance comparison of low priority tasks. of high priority tasks. Figure 6.10 plots the low priority task delay and throughput performance of the two AC algorithms. Using the SAC algorithm, low priority task throughput is pro- portional to the ratio of low priority traffic. However, extremely long delays are introduced when prioritized processing is used. On the contrary, the PACERS al- gorithm blocks admission of low priority tasks during high load periods in exchange of low delay bounds miss ratio. The prediction and CQ assignment behavior of the PACERS scheme avoids unnecessary waiting of low priority tasks, thus decrease the system overload during high load periods. Figure 6.11 plots the high priority task delay and throughput performance of the two AC algorithms. Using the SAC algorithm, high priority task throughput is proportional to the ratio of high priority traffic to the total traffic volume, although the high priority task traffic is low. Delays of high priority tasks are kept low because prioritized processing is used. On the contrary, the PACERS algorithm preserves throughput performance of high priority tasks during high load periods with the cost of higher average response delay, but still within the delay bounds. 
As we can see from the workload analysis, the traffic intensity varies with time. It is reasonable to believe that the traffic intensity of each priority group also varies with time in service differentiating environments. Request rate estimation helps to preserve system resources for high priority tasks and to reduce wastage of system resources. We therefore test the performance of the PACERS algorithm under fluctuating load.

6.6.3 Sensitivity Test

Figures 6.12, 6.13, and 6.14 plot the throughput and delay performance under fluctuating workload. The experiment is aimed at examining the sensitivity of the PACERS algorithm to variations in the workload. The test workload consists of phases of sustained or occasional light load and overload: the occasional overload duration is 2 time units, the sustained overload duration is 10 time units, and each time unit is 1000 times the observation period.

[Figure 6.12: Throughput under fluctuating load. Figure 6.13: Delay under fluctuating load.]

The maximum resource requirements of the high priority requests equal the system capacity. It can be observed from Figure 6.12 that the PACERS scheme is very sensitive to the load fluctuation: it blocks the low priority tasks under high load to preserve the throughput of high priority tasks, and resumes service of low priority tasks under medium to low load. Figure 6.13 shows the delay performance of the two priority groups. The delay of high priority tasks follows the trend of the incoming high priority workload, but its maximum value is only about one fifth of the delay bound. The delay of low priority tasks is also well controlled under the delay bound.

Figure 6.14 shows the delay bound miss ratio of the two priority groups. High priority tasks experience a zero delay bound miss ratio under various load situations, while the delay bound miss ratio of low priority tasks occasionally exceeds zero.

[Figure 6.14: Delay bound miss ratio under fluctuating load. Figure 6.15: Delay bound miss ratio under fluctuating load.]
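The fluctuating workload of the sensitivity test can be generated from a simple phase schedule, as in the sketch below; the phase durations follow the text (2-time-unit occasional and 10-time-unit sustained overloads), while the request rates and the phase ordering are illustrative assumptions.

def fluctuating_load(base_rate, overload_rate, schedule):
    """Yield a per-time-unit request rate that alternates between
    light load and overload according to `schedule`, a list of
    (duration_in_time_units, overloaded) phases."""
    for duration, overloaded in schedule:
        for _ in range(duration):
            yield overload_rate if overloaded else base_rate

# Illustrative phase plan: light load, occasional overload (2 units),
# light load, sustained overload (10 units).
plan = [(5, False), (2, True), (5, False), (10, True)]
rates = list(fluctuating_load(base_rate=500, overload_rate=2500, schedule=plan))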
6.6.3.1 Performance Isolation of Priority Groups

Figures 6.15, 6.16, and 6.17 show the experimental results when the high priority traffic ratio is 20%. The objective of this experiment is to examine the performance impact when a large quantity of low priority tasks is present. It can be observed that the throughput and delay performance of high priority tasks remain the same with or without a high volume of low priority traffic.

[Figure 6.16: High priority throughput with high background traffic. Figure 6.17: High priority delay with high background traffic.]

The experiment proves that a large amount of low priority traffic does not have any impact on the throughput or average delay performance of high priority tasks. Thus the PACERS scheme offers effective performance protection to high priority requests under various load situations.

6.7 Related Works

Several studies on QoS support in web servers have addressed the technology of prioritized task processing and admission control issues. Bhatti and Friedrich [3] addressed the importance of server QoS mechanisms to support tiered service levels and overload management; a WebQoS architecture prototype was developed by adding a connection manager module to the Apache [49] web server, with admission control implemented by blocking low priority tasks when the number of waiting high priority tasks exceeds a threshold. Eggert and Heidemann [6] evaluated application-level mechanisms to provide two different levels of web services; admission control is implemented by limiting the process pool size and the response transmission rate of different priority groups. Pandey et al. [48] described a distributed HTTP server which enables QoS by prioritizing the pages on a web site; admission control is realized by assigning communication channels to prioritized pages. All of the admission control mechanisms mentioned above are based on a predefined "threshold": the performance of high priority tasks is guaranteed by emulating a fixed-bandwidth "leased line". However, it is expensive to satisfy the requirements of bursty workloads with a "leased line" scheme, since peak loads are several orders of magnitude higher than the average load.

6.8 Summary

When a server is unable to provide satisfactory service to all requests, selective resource allocation is a promising technique to assure service to the requests that are more important to the clients or the server. In this study, we present a novel and effective admission control algorithm, PACERS, which adapts to the highly variant access patterns and processing requirements of web server environments. Tasks are admitted based on the estimation of the periodic behavior of the prioritized task groups and their service times. The performance of high priority tasks is preserved by blocking the traffic of low priority tasks when the system load is high, and the delay of most tasks is bounded by the algorithm's processing period.

Theoretical proof and simulation results demonstrate that the PACERS algorithm is able to provide assurance of response time and throughput performance for each priority group under various workload situations. Its aggregate throughput performance is similar to that of the SAC algorithm, while its peak high priority task throughput is much higher. The average delay is about 10 times lower than the predefined delay bound, and the delay bound miss ratio is 0 during most of the time periods.
Compared to "threshold" based admission control such as tail-dropping, the PACERS algorithm adjusts to bursts in the workload dynamically, increasing throughput during low load periods while ensuring QoS during high load periods. The algorithm can be extended to web server session control if the session/connection establishment requests are assigned a lower priority, so as to protect the existing sessions in the server. The estimation of the average response time of the server system can be used to determine the appropriate queue space, and thus can also be used for capacity planning.

CHAPTER 7 CONCLUSIONS AND FUTURE WORK

The major contributions of this dissertation can be summarized in the answers to the following questions.

Why is service differentiation needed in Internet servers? This question can be further divided into two sub-questions. First, why choose Internet servers as the research subject? The answer is that Internet servers are becoming the performance bottlenecks of Internet services with the increasing service volume and diversity of applications. Second, why is service differentiation needed? The reason is that the widespread e-commerce activity in the Internet environment demands a significant amount of dynamic data generation and secure transactions. Designing a uniform high performance server to meet customer expectations at all times is impractical and cost-ineffective. The service differentiating Internet server (SDIS) is designed to meet customer expectations for evolving, diverse applications, such as continuous media (CM) data with time constraints that need to be met, and e-commerce transactions that need classified services based on revenue generation. The quality of service of SDIS is enforced by categorizing requests into different groups based on task types and client identifications, and then allocating resources based on the priority groups.

What is the appropriate model for a SDIS? After an extensive literature review and analysis of existing web server systems, including the Apache web server and some popular load balancers used in Internet services, a queueing model which allows various resource allocation algorithms is designed to simulate the web server processing procedure. The modular framework of the web server model allows modeling of server processing with various architectures. CPU processing capacity, network bandwidth, open connections, and memory capacity are the critical resources in the server model. Task classification policies, methods of task differentiation, and the desired QoS in each class of tasks are discussed. An incoming request can be assigned to a priority group on the basis of the client, the network, the content or owner of the requested object, and the current server load situation. It can also inherit a performance level defined by lower-level network protocols.

Can the desired performance be achieved in SDIS? To answer this question, we appraise the performance of a SDIS using trace input from real web servers. Experimental study and analyses prove that under high system utilization, a service differentiating server provides significantly better services to high priority tasks at minimal cost to low priority tasks. Through analytical modeling and simulation study, we show the feasibility and performance benefits of the SDIS. Admission control, prioritized scheduling, and task assignment policies are evaluated with respect to the performance assurances of the different groups.
A few limitations, such as the overload effects of a web server and the starvation problem of low priority tasks, are also discussed in this dissertation.

How can stable performance of a web server be maintained? The most important performance metrics of a web server are the response delay and the throughput. Experimental results indicate that the response delay increases slowly with the server load until the load reaches the "overload threshold". The response delay of an overloaded server is several orders of magnitude higher than its normal value, and thus unacceptable to end users. The throughput of an overloaded server also decreases, due to "snowball" effects and resource contention overhead. An admission control scheme, ACES, is designed to assure the user-perceivable response performance and the server throughput. Experimental evaluation proves the effectiveness of the ACES algorithm.

How can quality of service be enforced in SDIS? Due to the diversity of request rates and applications in web servers, static resource allocation mechanisms fail to enforce the QoS of classified tasks under the dynamics of the server environment. Two characteristics of the server workload need to be handled in resource allocation: the cyclic fluctuation of traffic intensity, and the large variance of task processing times combined with the relatively consistent service time of the same type of web objects. We propose to provide different levels of service based on the server workload characteristics. Different levels of QoS are assured by periodic allocation of system resources based on the estimation of the request rate and the service requirements of prioritized tasks. Admission of low priority tasks is restricted or blocked during high load periods to maintain the throughput of high priority tasks and the response delay bound of all the accepted tasks. Theoretical analysis proves the feasibility of the algorithm, and experimental study shows that effective control of throughput and response delay bounds for the prioritized tasks is achieved under various workload situations.

While answering these questions, we present a framework of a service differentiating Internet server in this dissertation. SDIS is not simply another type of web server; it has a much deeper and broader influence on the future of Internet applications. Traditionally, the content published on the Internet is available for "free", and the content providers have no obligation to ensure that the user retrieval process is a pleasant experience. The introduction of SDIS, however, changes the egalitarian philosophy of the Internet and its servers. Alternative levels of service quality become available at certain costs; users have more choices in how services are provided, and the service and content providers can increase their profit margins by allocating resources to value-added transactions.

Today's Internet suffers from its own success: technology designed for a network of thousands of computers is laboring to serve a network with millions of hosts. The Next Generation Internet (NGI) program was initiated to offer reliable, affordable, and secure information delivery at rates thousands of times faster than those of today. In the near future, the Internet will provide a powerful and flexible environment for business, education, culture, and entertainment. People will use this environment to study, work, shop, entertain, and manage finances, and the customer will be able to choose among different levels of services with varying prices.
One goal of the NGI program is to develop and test new network services and techniques, including advances in quality of service provisioning. Several techniques for bringing QoS into the network have existed as individual components for a few years, such as the class of service (CoS) in ATM networks, the RSVP-based integrated services framework, and the PHB-based differentiated services on the Internet. However, application layer QoS is still a new field that needs to be explored, and substantial system integration across layers is required to provide seamless support for QoS-aware applications. The study of SDIS and any-layer QoS collaboration strongly supports the objectives of the NGI advances.

In this dissertation, we have built a general simulation model of SDIS and evaluated its performance under a set of admission control and scheduling schemes. Much work remains to complete the building of a framework for SDIS. When service differentiation is implemented in a real system, many factors need to be reconsidered, such as context switching overhead and policies, memory allocation, and the influence of operating system optimizations. A prototype of SDIS will be implemented to compare the results among the real system, the simulation, and the theoretical derivation. As we stated earlier, to effectively control the resources allocated to tasks in a server, accurate estimation of traffic trends and service requirements is needed, so more extensive workload characterization work should be conducted. For example, the service time and response time distributions versus the server load and connection number would provide adequate guidelines for efficient admission and overload control algorithms. QoS-aware server management issues will also be investigated in the future.

Bibliography

[1] E. J. W. West, "Using the internet for business - web oriented routes to market and existing IT infrastructures," Computer Networks and ISDN Systems, vol. 29, pp. 1769-1776, July 1997.

[2] T. Wilson, "E-biz bucks lost under SSL strain," Internet Week Online, May 20, 1999. http://www.internetwk.com/lead/lead052099.htm.

[3] N. Bhatti and R. Friedrich, "Web server support for tiered services," IEEE Network, pp. 64-71, September/October 1999.

[4] J. Almeida, M. Dabu, A. Manikutty, and P. Cao, "Providing differentiated quality-of-service in web hosting services," in 1998 Workshop on Internet Server Performance, June 1998. http://www.cs.wisc.edu/~cao/publications.html.

[5] X. Chen and P. Mohapatra, "Providing differentiated service from an internet server," in Proceedings of the IEEE International Conference on Computer Communications and Networks, (Boston, MA), October 1999. http://www.cse.msu.edu/rgroups/isal/pubs/conf/.

[6] L. Eggert and J. Heidemann, "Application-level differentiated services for web servers," World Wide Web Journal, vol. 3, no. 2, pp. 133-142, 1999. http://www.isi.edu/~larse/papers/index.html.

[7] Integrated Services (intserv). http://www.ietf.org/html.charters/intserv-charter.html.

[8] Differentiated Services (diffserv). http://www.ietf.org/html.charters/diffserv-charter.html.

[9] X. Xiao and L. M. Ni, "Internet QoS: the big picture," IEEE Network Magazine, pp. 8-18, March/April 1999.

[10] K. G. Coffman and A. M. Odlyzko, "The size and growth rate of the internet," First Monday, 1998. http://www.firstmonday.dk/issues/issue3_10/coffman/index.html.

[11] R. H. Zakon, "Hobbes' internet timeline v3.3," 1998. http://info.isoc.org/guest/zakon/Internet/History/HIT.html.
[12] http://cnn.com/TECH/computing/9905/24/sitefall.idg/.

[13] V. N. Padmanabhan and J. C. Mogul, "Using predictive prefetching to improve world wide web latency," in ACM SIGCOMM Computer Communication Review, July 1996. http://http.cs.berkeley.edu/~padmanab/index.html#Publications.

[14] A. K. Iyengar, E. MacNair, and T. Nguyen, "An analysis of web server performance," in Proceedings of the IEEE 1997 Global Telecommunications Conference (GLOBECOM '97), (Phoenix, AZ), November 1997.

[15] S. Glassman, "A caching relay for the world wide web," in Proceedings of the First International Conference on the World Wide Web, (Geneva, Switzerland), May 1994. http://www1.cern.ch/PapersWWW94/steveg.ps.

[16] P. Cao and S. Irani, "Cost-aware www proxy caching algorithms," in Proceedings of the 2nd Web Caching Workshop, (Boulder, Colorado), June 1997.

[17] M. A. R. Wooster, "Proxy caching that estimates page load delays," in Proceedings of the Sixth International World Wide Web Conference, (Santa Clara, CA), April 1997.

[18] A. Chankhunthod, P. Danzig, C. Neerdaels, M. Schwartz, and K. Worrell, "A hierarchical internet object cache," in Proceedings of the 1996 USENIX Technical Conference, (San Diego, CA), January 1996. http://netweb.usc.edu/danzig/.

[19] A. Bestavros et al., "Application level document caching in the internet," in Proceedings of IEEE SDNE'96, the Second International Workshop on Services in Distributed and Networked Environments, (Whistler, British Columbia), June 1995.

[20] A. Bestavros, "Using speculation to reduce server load and service time on the www," Tech. Rep. BUCS-TR-95-006, Boston University, February 1995.

[21] E. Markatos and C. E. Chronaki, "A top-10 approach to prefetching on the web," Tech. Rep. 173, ICS-FORTH, Heraklion, Crete, Greece, August 1996. http://www.ccsf.caltech.edu/~markatos/avg/www.html.

[22] R. Tewari, H. Vin, et al., "Resource based caching for web servers," in Proceedings of the SPIE/ACM Conference on Multimedia Computing and Networking (MMCN98), (San Jose, CA), 1998.

[23] A. Reddy, "Evaluation of caching strategies for an internet server," in Proceedings of the International Conference on Multimedia Computing and Systems, (Ottawa, Ontario, Canada), June 1997.

[24] E. Markatos, "Main memory caching of web documents," in Proceedings of the Fifth International World Wide Web Conference, (Paris, France), May 1996.

[25] M. Kurcewicz, W. Sylwestrzak, and A. Wierzbicki, "A filtering algorithm for proxy caches," in Proceedings of the 3rd International WWW Caching Workshop, (Manchester, England), TERENA, Trans-European Research and Education Networking Association, June 1998.

[26] M. Baentsch et al., "Introducing application-level replication and naming into today's web," in Proceedings of the Fifth International World Wide Web Conference, (Paris, France), May 1996.

[27] A. Bestavros and C. Cunha, "Server-initiated document dissemination for the www," IEEE Data Engineering Bulletin, vol. 19, pp. 3-11, September 1996. http://www.cs.bu.edu/faculty/best/res/papers/HomeDIS.html.

[28] http://www.specbench.org/osg/web99/results/res99q4/.

[29] H. F. Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W. Lie, and C. Lilley, "Network performance effects of HTTP/1.1, CSS1, and PNG," in W3C, SIGCOMM '97, 1997. http://www.w3.org/People/Frystyk/#Refereed.

[30] http://www.cs.wustl.edu/~jxh/research/research.html.
[31] G. Venkitachalam and T.-c. Chiueh, "High performance common gateway interface invocation," in Proceedings of the IEEE Workshop on Internet Applications (WIAPP '99), (San Jose, CA), July 1999.

[32] A. Iyengar and J. Challenger, "Improving web server performance by caching dynamic data," in USENIX Symposium on Internet Technologies and Systems, (Monterey, CA), December 1997. http://www.usenix.org/publications/library/proceedings/usits97/iyengar.html.

[33] J. C. Collins, "Beyond cgi: Using the apis," PC Magazine, April 1998. http://www.zdnet.com/devhead/stories/articles/0,4413,1600169,00.html.

[34] T. Kwan, R. McGrath, and D. Reed, "NCSA's world wide web server: Design and performance," IEEE Computer Magazine, pp. 68-74, November 1995.

[35] G. Alwang, "Web servers benchmark tests," PC Magazine, May 1998. http://www.zdnet.com/pcmag/features/webserver98/bench.html.

[36] C. Yoshikawa, B. Chun, P. Eastham, A. Vahdat, T. Anderson, and D. Culler, "Using smart clients to build scalable services," in Proceedings of the 1997 USENIX Annual Technical Conference, (Anaheim, California), 1997.

[37] D. Andresen, T. Yang, and O. Ibarra, "Towards a scalable distributed www server on networked workstations," The Journal of Parallel and Distributed Computing (JPDC), 1996. http://alexandria.sdc.ucsb.edu/public-documents/annual-report/nod852.html.

[38] A. Mourad and H. Liu, "Scalable web server architectures," in Proceedings of the IEEE Symposium on Computers and Communications (ISCC '97), (Alexandria, Egypt), July 1997. http://www.computer.org/conferen/proceed/7852abs.htm#E37E7.

[39] E. D. Katz, M. Butler, and R. McGrath, "A scalable http server: The NCSA prototype," in First International World Wide Web Conference, (Geneva, Switzerland), May 1994.

[40] http://traianus.ce.uniroma2.it/dws/.

[41] V. Cardellini, M. Colajanni, and P. S. Yu, "DNS dispatching algorithms with state estimators for scalable web-server clusters," World Wide Web, vol. 2, pp. 101-113, July 1999. http://traianus.ce.uniroma2.it/dws/WWW99.ps.

[42] O. P. Damani, P. E. Chung, Y. Huang, C. Kintala, and Y.-M. Wang, "One-IP: Techniques for hosting a service on a cluster of machines," in The Sixth World Wide Web Conference, 1996. http://www.bell-labs.com/user/emerald/paper/oneip/oneip.html.

[43] D. Dias, W. Kish, R. Mukherjee, and R. Tewari, "A scalable and highly available server," in Proceedings of the IEEE International Computer Conference (COMPCON'96), February 1996. http://www.research.ibm.com/webvideo/pub.html.

[44] A. Iyengar, E. Levy, J. Song, and D. Dias, "Design and performance of a web server accelerator," in Proceedings of IEEE INFOCOM '99, (New York), March 1999. http://www.research.ibm.com/people/i/iyengar/arun2.html.

[45] J. Hu, S. Mungee, and D. Schmidt, "Techniques for developing and measuring high-performance web servers over ATM networks," in INFOCOM '98, 1998. http://www.cs.wustl.edu/~jxh/research/research.html.

[46] R. Braden, L. Zhang, S. Berson, et al., "Resource reservation protocol (RSVP) - version 1." RFC 2205, Proposed Standard, September 1997. http://www.isi.edu/div7/rsvp/pub.html.

[47] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, "An architecture for differentiated services," December 1998. RFC 2475.

[48] R. Pandey, J. F. Barnes, and R. Olsson, "Supporting Quality of Service in HTTP Servers," in Proceedings of the Seventeenth Annual SIGACT-SIGOPS Symposium on Principles of Distributed Computing, (Puerto Vallarta, Mexico), pp. 247-256, ACM, June 1998.
[49] "Apache server project." http://www.apache.org.

[50] http://www.internetsolutions.enterprise.hp.com/webqos/products/overview/index.html.

[51] S. Chandra, C. Ellis, and A. Vahdat, "Differentiated multimedia web services using quality aware transcoding," in Proceedings of the IEEE Infocom 2000 Conference, (Tel-Aviv, Israel), March 2000. http://www.cs.duke.edu/~surendar/infocom00.pdf.

[52] T. F. Abdelzaher and N. Bhatti, "Web server QoS management by adaptive content delivery," in IEEE Infocom 2000, 2000. http://www.ieee-infocom.org/2000/papers.

[53] S. Ayandeh, A. Krishnamu, and A. Malis, "Mapping to ATM classes of service for differentiated services architecture." draft-ayandeh-diffserv-atm-00.txt, November 1999. http://search.ietf.org/internet-drafts/draft-ayandeh-diffserv-atm-00.txt.

[54] J. Pitkow and M. Recker, "Summary of www characterization," Computer Networks and ISDN Systems, vol. 30, pp. 551-558, 1998.

[55] J. Gwertzman and M. Seltzer, "World-wide web cache consistency," in Proceedings of the 1996 USENIX Technical Conference, (San Diego, CA), January 1996. http://www.eecs.harvard.edu/~vino/web/usenix.196/.

[56] F. Douglis, A. Feldmann, et al., "Rate of change and other metrics: A live study of the world wide web," in Proceedings of the USENIX Symposium on Internet Technologies and Systems, December 1997.

[57] http://ita.ee.lbl.gov/html/traces.html.

[58] M. F. Arlitt and C. L. Williamson, "Internet web servers: Workload characterization and performance implications," IEEE/ACM Transactions on Networking, vol. 5, pp. 631-645, October 1997.

[59] http://www.nlanr.net/NA/Learn/popular.html.

[60] S. Williams, M. Abrams, et al., "Removal policies in network caches for world-wide web documents," in ACM SIGCOMM, (Stanford, CA), pp. 293-305, August 1996. http://www.cs.vt.edu/~chitra/pubs.html.

[61] T. T. Kwan, R. E. McGrath, and D. A. Reed, "User access patterns to NCSA's world wide web server," CS Tech Report UIUCDCS-R-95-1934, University of Illinois at Urbana-Champaign, February 1995. ftp://ftp.cs.uiuc.edu/pub/dept/tech_reports/1995/.

[62] C. Cunha, A. Bestavros, and M. Crovella, "Characteristics of www client-based traces," CS Tech Report BU-CS-95-010, Boston University, April 1995. http://www.cs.bu.edu/techreports/.

[63] A. Bestavros, "Demand-based document dissemination to reduce traffic and balance load in distributed information systems," in Proceedings of the 1995 Seventh IEEE Symposium on Parallel and Distributed Processing, (San Antonio, Texas), October 1995. http://www.cs.bu.edu/groups/oceans/papers/Home.html.

[64] J. Pitkow and M. Recker, "Simple yet robust caching algorithm based on dynamic access patterns," in Second International World Wide Web Conference, (Chicago, IL), October 1994.

[65] J. Almeida, M. Dabu, A. Manikutty, and P. Cao, "Providing differentiated quality-of-service in web hosting services," in 1998 Workshop on Internet Server Performance, June 1998. http://www.cs.wisc.edu/~cao/publications.html.

[66] M. E. Crovella and A. Bestavros, "Self-similarity in world wide web traffic: Evidence and possible causes," IEEE/ACM Transactions on Networking, vol. 5, pp. 835-846, December 1997.

[67] L. P. Slothouber, "A model of web server performance," in the Fifth International World Wide Web Conference, (Paris, France), May 1996. http://www.inria.fr/mistral/personnel/Zhen.Liu/weint.html.
[68] P. Barford and M. E. Crovella, "Generating representative web workloads for network and server performance evaluation," in Proceedings of Performance'98/ACM SIGMETRICS'98, (Madison, WI), pp. 151-160, July 1998. http://www.cs.bu.edu/faculty/crovella/papers.html.

[69] S. Floyd and V. Jacobson, "Random early detection gateways for congestion avoidance," IEEE/ACM Transactions on Networking, vol. 1, pp. 397-413, August 1993. http://www.aciri.org/floyd/papers/red/red.html.

[70] L. Kleinrock, Queueing Systems. John Wiley & Sons, 1976.

[71] H. Schwetman, "Object-oriented simulation modeling with C++/CSIM," in Proceedings of the 1995 Winter Simulation Conference, (Washington, D.C.), pp. 529-533, 1995.

[72] V. Almeida, A. Bestavros, M. Crovella, and A. de Oliveira, "Characterizing reference locality in the www," in Proceedings of the Fourth International Conference on Parallel and Distributed Information Systems (PDIS '96), (Miami Beach, FL), IEEE, December 1996.

[73] V. Paxson and S. Floyd, "Why we don't know how to simulate the internet," in Proceedings of the 1997 Winter Simulation Conference, December 1997. http://www-nrg.ee.lbl.gov/floyd/papers.html.

[74] http://ita.ee.lbl.gov/html/contrib/clarknet-http.html.

[75] Y. Hu, A. Nanda, and Q. Yang, "Measurement, analysis and performance improvement of the apache web server," in 18th IEEE International Performance, Computing and Communications Conference (IPCCC'99), (Phoenix/Scottsdale, Arizona), February 1999. http://www.ele.uri.edu/~hu/research.html.

[76] E. P. Markatos, "Main memory caching of web documents," in the Fifth International World Wide Web Conference, (Paris, France), May 1996. http://www5conf.inria.fr/fich_html/papers/P1/Overview.html.

[77] J. C. Mogul, "Network behavior of a busy web server and its clients," Tech. Rep. WRL 95/5, DEC Western Research Laboratory, Palo Alto, CA, October 1995.

[78] P. Rodriguez, E. W. Biersack, and K. W. Ross, "Improving the latency in the web: Caching or multicast?," in 3rd International WWW Caching Workshop, (Manchester, England), June 1998. http://www.eurecom.fr/~rodrigue/.
http: / / dlib.computer.org / conferen / wiapp / 0197 / pdf / 01970054.pdf. K. Li and S. Jamin, “A measurement-based admission-controlled web server,” in Pro- ceedings of the IEEE Infocom 2000 Conference, (Tel-Aviv, Israel), March 2000. K. Sohraby and M. Sidi, “On the performance of bursty and correlated sources subject to leaky bucket rate-based access control schemes,” in Proceedings of the Conference on Computer Communications (IEEE Infocom), (Bal Harbour, Florida), pp. 426—434, April 1991. X. Chen and P. Mohapatra, “Service differentiating Internet servers.” submitted to IEEE Transaction on Computers, 2000. http://http://www.cse.msu.edu/rgroups/isal/pubs/journal/. M. Crovella and A. Bestavros, “Self-similarity in world wide web traffic: Evidence and possible causes,” IEEE/ACM Transactions on Networking, vol. 5, pp. 835—845, December 1997. http: / / www.cs.bu.edu/ groups / oceans / papers / Home.html. R. Morris and D. Lin, “Variance of aggregated web traffic,” in IEEE Infocom 2000, 2000. http: / / www.ieee-infocom.org / 2000 / papers. A. K. Iyengar, M. S. Squillante, and L. Zhang, “Analysis and characterization of large- scale web server access patterns and performance,” World Wide Web, pp. 85-100, 1999. V. S. Frost and B. Melamed, “Traffic modeling for telecommunications networks,” IEEE Communications Magazine, vol. 32, pp. 70—81, March 1994. [1111]" Will] Hill] [ll] 31293 02088 06