ESTIMATING AVAILABLE BANDWIDTH FOR REAL-TIME SUPERMEDIA APPLICATIONS

By

Alexander Chobanyan

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science and Engineering

2007

ABSTRACT

ESTIMATING AVAILABLE BANDWIDTH FOR REAL-TIME SUPERMEDIA APPLICATIONS

By Alexander Chobanyan

The emerging development of remote sensing and control technology gives rise to new types of real-time networking applications that require "pro-active" quality of service guarantees. This dissertation specifies the Quality of Service parameters characterizing the requirements for successful execution of a given real-time tele-operation over the Internet. The dissertation also provides a statistical model that outputs the likelihood of a failure of remote real-time task execution based on the history of behavior of an end-to-end network path and the task requirements. We provide and compare both parametric and non-parametric approaches, conduct their experimental evaluation, and also introduce mechanisms for improving present techniques for measuring the end-to-end quality of a network path.

Available bandwidth (spare capacity) of a network path is a key characteristic of the network path quality. We regard available bandwidth as a random time-series with an existing auto-correlation structure, which is realistic and makes channel quality estimation accurate on real paths over the Internet.

We also consider a set of pro-active measures to save a remote real-time task execution from failure once the likelihood of such a failure exceeds an acceptable threshold. In particular, we give a solution to the Reroute Sequence Planning problem: choosing an optimal sequence of data-stream reallocations across different network paths once the QoS of the old paths becomes unacceptable. An optimal reallocation minimizes the extra capacity of a network link that is required for carrying out the reallocation procedure.

TABLE OF CONTENTS

List of Tables .................................. v
List of Figures ................................. vi
Introduction .................................... 1
Background ...................................... 9
Parametric Approach ............................. 19
3.1 Data collection and preprocessing .................... 19
3.2 Modeling available bandwidth ......................
25 3.2.1 “Crossing probability” ...................... 25 3.2.2 ARCH model ........................... 26 3.2.3 Results from extremal values theory ............... 34 3.3 Experimental evaluation ......................... 36 3.4 Conclusion ................................. 39 Network One Way Delay Trend Detection Mechanism 42 4.1 Main challenges of IC/ CS elimination and trend detection in Pathload 42 4.2 IC/ CS elimination procedure and t-test ................. 46 4.3 Experimental evaluation ......................... 51 4.4 Conclusion ................................. 56 Non-Parametric Approach 57 5.1 Moving block bootstrap ......................... 58 5.2 D-statistic: model verification mechanism ................ 62 5.3 D-statistic: theoretical results ...................... 66 5.4 D-statistic: experimental evaluation ................... 69 5.5 Prediction optimization with weights .................. 74 5.6 R-statistic: improvement of a verification procedure .......... 76 5.6.1 Timing ............................... 76 5.6.2 R-statistic ............................. 77 5.7 R-statistic: theoretical results ...................... 78 5.8 R-statistic: experimental evaluation ................... 84 5.9 Limits of a verification procedure .................... 88 5.10 Conclusion ................................. 91 Re—route Sequence Planning Problem 93 6.1 Introduction ................................ 94 6.2 Setting Of The Problem And Transference Lemma ........... 97 6.3 Application of Transference Lemma to Problem. An estimation of minyr |||x7r||| ................................ 99 6.4 The case of [go .............................. 101 iii 6.5 An algorithm for finding a ........................ 103 7 Summary and Future Research 106 7.1 Summary ................................. 106 7.2 Future research .............................. 109 7.2.1 QoS overlay network infrastructure ............... 109 7.2.2 QoS model improvement ..................... 109 BIBLIOGRAPHY 111 iv 3.1 3.2 4.1 4.2 4.3 4.4 5.1 5.2 LIST OF TABLES Empirically estimated “Crossing” probability for a path MSU—Berkeley 37 Difference between model-based estimated upper bound and empiri— cally estimated “Crossing” probability for a path MSU-Berkeley . . . 39 Comparison of PCT/PDT test and t-test performances on nearly empty network and zero receiver’s processor utilization ........ 54 Comparison of PCT/PDT test and t-test performances on nearly empty network and 5 — 7% receiver’s processor utilization ...... 54 Comparison of PCT/ PDT test and t-test performances on heavy loaded network and zero receiver’s processor utilization ........ 54 Comparison of PCT/ PDT test and t-test performances on average loaded network (around 50Mbps iIOMbps cross-traffic) and zero re- ceiver’s processor utilization ....................... 55 Valid range of history window sizes ................... 73 P-values for different history window sizes before and after weight adjustment. ................................ 85 1.1 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 3.11 3.12 3.13 3.14 4.1 4.2 4.3 4.4 4.5 5.1 5.2 LIST OF FIGURES Estimated available end—to—end bandwidth (Mbps). The time-series is sampled at a rate one observation per minute ............. 4 Available end-to-end bandwidth (Mbps): Michigan State University (35.9.2014) - Massachusetts Institute of Technology .......... 20 Available end-to—end bandwidth (Mbps): Denver (USA) 198.32.154.187 — Canarie, Montreal (Canada) planetl.montreal.canet4.nodes.planet- lab.org ................................... 
20 Available end-to-end bandwidth (Mbps): Michigan State University (35.9.2014) - University of California, Berkeley (169.229.50.12) . . . . 21 Available bandwidth before trend removal and inversion ....... 23 Available bandwidth after trend removal and inversion ........ 23 Michigan State University - University of Sydney, Australia (trend example) .................................. 24 Michigan State University - University of Uupsala, Sweden on the right (routes oscillation example) .................... 24 Measured available end-to-end bandwidth. ............... 28 Simulated ARCH 2(1) observations with [30 = ,31 = 0.25. ....... 28 Fisher Information: standard deviation of 5‘0 ............. 31 Fisher Information: standard deviation of 61 ............. 31 Fisher Information: correlation between 31 and [30 .......... 32 “Crossing” probability for a threshold M=25 Mbps and a number of observations N = 50 ........................... 35 Empirically estimated 95% upper bound over empirically computed “Crossing” probability for a path MSU-Berkeley ............ 40 Shortcomings of IC/ CS elimination procedure in Pathload: before elimination ................................ 45 Shortcomings of IC / CS elimination procedure in Pathload: after elim- ination ................................... 45 OWD pattern at the presence of packet loss. Trains were sent at a rate 40 Mbps higher than available at that time bandwidth. ..... 49 OWD pattern at the presence of packet loss. 'ITains were sent at a rate 30 Mbps higher than available at that time bandwidth. ..... 50 OWD pattern at the presence of packet loss. Trains were sent at a rate 20 Mbps higher than available at that time bandwidth. ..... 50 Window of size N = 3 slides across the history data of size n = 12 for prediction of a crossing probability P over coming N = 3 observations 59 Probability density functions for a p—value conditioned by Null and Alternative hypothesis ........................... 63 vi 5.3 5.4 5.5 5.6 5.7 5.8 6.1 Evolution of the crossing probability along with its 95% confidence intervals throughout the day. M = 65Mbps, N = 5, T = 0.43sec. Probability density functions along with p—values collected for the route 35.9.4290 (Michigan State University) - 137.189.100.235 (Chi— nese University of Hong Kong). ..................... Probability density functions along with p—values collected for the route 35.9.1014 (Michigan State University) — 210.72.135.98 (Shenyang Institute of Automation, Chinese Academy of Sciences) ......... Reducing the time of self-verification procedure when switching from R-statistic to D-statistic ......................... Histograms of acceptable range of lengths of history windows before and after weights adjustment. The black histogram shows the range distribution after adjustment whereas the gray histogram shows the range distribution before adjustment. .................. Dependency of minimal value of a crossing probability for which ver- ification procedure works well on the future time-window over which prediction is made and on prediction/verification technique ...... Network graph example ......................... vii 69 70 70 87 90 Chapter 1 Introduction Over the past several years the importance of Internet-related technologies has rapidly increased. The growth of the Internet has not only a quantitative nature but also affects the type of information that may be transferred and consequently the methods for real-time interaction between communicating parties. 
Many emerging Internet-based technologies may benefit not only by transferring traditional media- types, such as video and audio, but also by remotely manipulating, touching and feeling a remote object. Tele-medicine, for example, can potentially allow a doctor to touch and feel a patient thus further expanding opportunities for conducting remote diagnostics. Tele-education opens wider possibilities to learn about different objects via touching and feeling them. Tele-marketing applications can be designed in order to allow a customer to touch the item he wants to purchase before making a financial transac- tion. Video, audio, haptic, temperature, control commands and other media used for tele-operations are unified under the term “Supermedia” that can be considered as a significant extension of traditional multimedia information streams. This dissertation has been written in the frames of a “Supermedia” project that assumes developing and experimentally testing new methods, algorithms, and tools to establish an infrastructure for real-time supermedia transport over the Internet to support tele—operations. Traditionally tele-operations take place over isolated controlled lines. In such a case, the channel has sufficient degree of reliability for conducting important tele- operations that may have high risk-cost associated with channels failure to give necessary quality of service to communicating parties. On one hand, using the Internet as a transport for tele—operations is much cheaper and therefore more convenient. In addition the Internet is wide-spread and easily accessible medium all over the world. On the other hand, however, the Internet does not provide sufficient quality of service guarantees to communicating parties. A net- work path over the Internet is by its nature unreliable and uncontrolled. Throughput of such a path at any time-instance always depends on the cross-traffic present in the path. Therefore real-time tele—operation over the Internet may always have un- expected network delays, distortions or even communication failures. One can even argue that the absence of communication reliability represents a principle challenge for wide deployment of tele—operations over the Internet. We believe that if “on aver- age” the cost of tele—operation failure is “sufficiently” small then such a tele-operation still can be performed. Under “average” failure-cost we assume the product of the cost associated with tele-operated task failure multiplied by the likelihood of such a failure to happen. In order to support tale-Operations over the Internet it is vital to develop mecha- nisms that efficiently estimate for any particular tele—operation the likelihood of the failure of remote real-time task execution and take pro-active measures to minimize this likelihood. Moreover, based on his knowledge of the cost associated with a fail- ure of a particular real—time task the operator should be able to temporarily suspend real-time task execution until better times. For example, before a robot is directed to approach a wall, it is important to report to the operator the probability based on the channel’s history that control over the robot is lost due to network problems while the robot approaches the wall. Then the operator makes a decision whether he should begin the wall-approach immediately or to wait until better times. 
In this thesis, an available bandwidth (AB) of a network path is considered as a primary QoS characteristic of the path and probabilistic models of tele-operation performance are built around this quantity. Available Bandwidth may be briefly defined as the maximum rate at which a sender can put bits on a wire without affecting existing cross-traffic, or in other words, as a residual capacity of the network path. Previous work [29] concentrated on obtaining the value of the available band- width averaged over a reasonably small time-window without an attempt to provide a broader probabilistic picture characterizing the network path. Figure 1.1 shows an example of how available bandwidth between two nodes located 20 hops apart may behave over time. As depicted in the plot, available bandwidth may instanta- neously drOp far below its mean value thus causing degradation of service, such as distorted video or audio. Despite the fact that such “spikes” may last just a few seconds, they may be fatal for many real-time remotely-operated tasks. As it fol— lows from figure 1.1, the value of the available bandwidth averaged over an arbitrary large (small) time-window cannot characterize the network path well in terms of its bandwidth QoS properties. The available bandwidth time—series is directly related to the cross-traffic that is present in the path. It has been shown [34] that the net- work traffic has self-similar properties that imply a non-summable autocorrelation function and long-range dependence between observations. Therefore observations of available bandwidth cannot be considered as independent, which makes statistical inference more challenging. The principal novelty of approaches introduced in this dissertation is in viewing available bandwidth as a random heavy-tailed time-series with existing autocorre- lation structure instead of reporting just one expected value while assuming that it does not change over the measurement time-period. More precisely, this disser- tation introduces a new (in the context of available bandwidth time series) (208 quantity called “crossing probability” - the probability that the available bandwidth time-series becomes less than pre-defined threshold over pre-defined time-frame. This dissertation introduces parametric and non-parametric statistical models that predict “crossing probability” for a pre—defined threshold and future time-frame. Also the dissertation introduces significant improvement of basic block of present Mbps 90 r + 80 - . 70 — — 60 ~ _ 50 r _ minutes 40 1 n 0 500 1000 1500 Figure 1.1: Estimated available end—to—end bandwidth (Mbps). The time—series is sampled at a rate one observation per minute. probe-rate based available bandwidth measurement tools. The proposed parametric model provides an important probabilistic QoS channel specification in terms of coefficients that allow rapid computation of “crossing prob- ability”. The aforementioned coefficients are evaluated based on observed available bandwidth behavior over a reasonable period of time and are updated at run-time reflecting the changing dynamics of a network environment. Both, the critical value and the time-frame for which we predict should be defined by the QOS specifications of a particular real-time task. The accuracy of the “crossing probability” prediction may vary from path to path and over time. However, the notion of “crossing probability” on its own is conceptually novel with respect to the available bandwidth estimation. 
Real time applications can benefit from this approach by reallocating part of the QoS-sensitive channels over different network paths when the crossing probability exceeds an ac- ceptable threshold. Charmel reallocation in this scenario happens before something fatal occurs regarding the real-time task execution but after the probability for this fatal event exceeds a reasonable value. Despite its own advantages, the parametric approach relies on values of available bandwidth reported by presently known measurement tools and for many network paths these values may not be accurately reported, which significantly limits the range of its applicability. Present measurement tools need around one minute for an accurate reporting of one observation of an available bandwidth time-series. Conse- quently, in order to collect a statistically significant number of observations one has to consider the history window of at least 1-2 hours. On another hand, our para- metric model assumes that the observed available bandwidth time-series is weakly stationary throughout the history time—frame. In order to make this assumption realistic we have to choose a history window size as small as possible. Our experi- ments showed that for many network paths the aforementioned trade-off for a history window size exists. For such paths the parametric model does its job reasonably well. However, for many network paths the sampling rate of available bandwidth mea- surement tools is too small. In addition, when dealing with robotic applications there is a need to make predictions for future windows of the order of seconds whereas a parametric approach works for scales of the order of minutes. The aforementioned factors pushed us to research on improvement of present available bandwidth measurement tools towards decreasing measurement time while preserving measurement accuracy. Measurement of available bandwidth (even as a single value) is a complicated task since cross-traffic in the network path dynamically changes thus causing change of the available bandwidth. The best that present measurement techniques can do is to measure the mean available bandwidth averaged over a certain period of time T. Decreasing the measurement time T is an important issue because the assumption that available bandwidth does not change over a smaller period of time is weaker and more realistic rather than the assumption that the bandwidth does not change ever larger periods. Presently two types of approaches to measurement of available bandwidth exist. Probe-gap models [56, 45] send a pair of packets and make further inference about available bandwidth based on the measurement of a gap between packets at the receiver’s side. Probe—rate models [29, 46, 42] send sequences or “trains” of packets at different rates assuming that a sequence sent at a rate higher than the available bandwidth causes noticeable positive trend in times that packets in the sequence need for travelling from sender to receiver. These times are often called one-way- delays (OWDs). Then the available bandwidth at that time can be found by a simple binary search. Probe-gap approaches are less intrusive than probe-rate approaches. However, probe-gap approaches suffer from the so called “interrupt coalescence / context switch (IC/ CS)” effect [48], which significantly distorts measurements of packets inter-arrival times. Probe-rate models are affected by IC/ CS phenomenon as well but not as much as probe-gap models. 
Probe-rate models send a larger number of packets and therefore an IC / CS effect can at least be clearly observed and to some extent considered by a model deployed in the probe-rate measurement tool. This dissertation introduces a new metric for OWD trend detection of packets in a train together with a new method for dealing with the IC / CS effect. We compare our results with Pathload [29], which to our knowledge is the only relatively reliably working tool that considers the IC/ CS phenomenon. We demonstrate that our metric used for trend detection together with our way of dealing with interrupt coalescence and context switch effects provides much better OWD trend detection accuracy. It can also better tolerate processor utilization of a receiving machine and therefore can serve as a basic block not only in Pathload but also can improve a wide- range of proberate oriented available bandwidth measurement tools that presently use trend detection algorithm introduced by Pathload’s authors. In particular, with our approach in most cases we need to send only up to three trains while achieving the same accuracy as Pathload when it sends 6 trains. We therefore decrease the measurement time without sacrificing accuracy. The dissertation deploys a robust OWD measurement module (described above) for developing a more general non-parametric approach for computation of a “cross- ing probability”. In contrast to above-mentioned parametric approach that relies on measurements of the AB time-series made by Pathload [29], the non-parametric approach relies directly on measurements of indicators showing whether the AB time—series is smaller or larger than a given value. Let us note that every particu- lar measurement of available bandwidth made by Pathload is obtained by a binary search [29], which itself relies on measurements of a number of indicators mentioned above. Therefore the proposed non-parametric approach is more robust, allows faster computation of a crossing probability than described earlier parametric approach. Also non-parametric approach has a larger range of applicability since it does not rely on any parametric time-series model. For computation of a “crossing probability” estimate we use modern non-parametric techniques of statistical analysis of a random time-series such as a moving block bootstrap [39]. Moving block bootstrap considers dependence [34] between obser- vations of indicators and helps in providing 95% confidence intervals around the non—parametric estimator of a “crossing probability”. Since the main goal of our research is prediction of a “crossing probability,” we also introduce two methods for a verification of the accuracy of our prediction based on training and previously un— seen testing data-sets obtained by a real-life measurements across multiple Internet paths. Along with real-life experiments, we present theoretical results that show that our verification procedures are probabilistically legitimate. In addition to “crossing probability” estimation, this dissertation deals with a set of actions that communicating parties might undertake in order to keep tele- operation alive given that “crossing probability” increased above critical threshold. A real-time application may reallocate, for example, some of data streams through different routes that may have better (.208 characteristics at given time-period. In this dissertation we provide preliminary results related to improvement of the solu- tion of a Reroute Sequence Planning problem [41] (RSP). 
The RSP problem chooses an optimal sequence of reallocations of an old set of data-stream paths to a new set of data-stream paths that minimizes the extra network-link capacity needed for carrying out the stream reallocation. We provide theoretical results on the bound of the extra needed capacity and also give a sketch of an algorithm for performing reroute sequence planning.

The rest of this dissertation is structured as follows. Chapter 2 provides definitions and describes previous available bandwidth measurement tools. Chapter 2 also gives a brief background on weak stationarity and self-similarity and describes the heavy-tailed long-range dependent models that have been used for network traffic modeling. Chapter 3 introduces the notion of a "crossing probability" and describes the parametric model for "crossing probability" estimation. Chapter 4 deals with an improvement of previous available bandwidth measurement tools based on a new One Way Delay trend detection mechanism. Chapter 5 then introduces a non-parametric approach to "crossing probability" prediction that uses the measurement module introduced in Chapter 4. Chapter 5 also introduces model verification mechanisms that help to check, on real data-sets, whether predicted values of a "crossing probability" really match the subsequently observed crossing events. Chapter 6 introduces the Reroute Sequence Planning problem and the corresponding mathematical problem of Compact Vector Summation. The chapter gives theoretical results for a link-capacity bound and provides an algorithm for finding a sequence of path replacements for data streams that complies with the above-mentioned capacity bound. In Chapter 7 we summarize the results of this dissertation and discuss future work.

Chapter 2

Background

We consider a network path as a set of store-and-forward links $L_i$, $i = 1, 2, \ldots, k$. The capacity $C_i$ of the $i$-th link may be defined as the maximum bit rate at which the sender may transfer information through this link when no other traffic is present. Now assume that the link is partially used by another communication. Let $T_i(t)$ be the number of bits that have been transferred through link $i$ by time $t$. The spare capacity of link $i$, or in other words its available bandwidth $A_i(t)$ at time $t$, can be defined as
\[ A_i(t) \equiv C_i - \lim_{\tau \to 0} \frac{T_i(t+\tau) - T_i(t)}{\tau}, \quad \text{or} \quad A_i(t) \equiv C_i - \frac{dT_i}{dt}(t). \]
Then the bandwidth $A(t)$ available at time $t$ over the whole path can be defined as
\[ A(t) \equiv \min_{i=1,\ldots,k} A_i(t). \]
From the definition it follows that available bandwidth cannot be measured precisely and therefore needs to be approximated by the mean available bandwidth at time $t$ over an interval $T$:
\[ A_T(t) = \min_{i=1,\ldots,k} \left\{ C_i - \frac{T_i(t+T) - T_i(t)}{T} \right\}. \]
The period of time $T$ over which $A_T(t)$ is averaged depends on the particular network application. If just one value of $A_T(t)$ is reported, then a smaller $T$ does not necessarily guarantee better results, because measurement accuracy degrades with decreasing $T$. However, with a smaller $T$ we have more flexibility when describing the probabilistic properties of available bandwidth behavior. Naturally, with a higher sampling rate one can acquire more "behavioral" information while still accounting for measurement errors within the framework of a chosen probabilistic model. In general, the approximation of available bandwidth is QoS relevant if $T$ is less than the time that the receiver needs to empty its buffer.
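To make the averaged definition above concrete, the short sketch below computes $A_T(t)$ from per-link byte counters. The link capacities, counter values, and the function name are illustrative assumptions for this sketch only; real tools do not have access to per-link counters and must infer the same quantity end-to-end, as discussed next.

```python
from typing import Sequence

def mean_available_bandwidth(capacities_bps: Sequence[float],
                             bits_before: Sequence[float],
                             bits_after: Sequence[float],
                             interval_s: float) -> float:
    """Approximate A_T(t) = min_i { C_i - (T_i(t+T) - T_i(t)) / T }.

    capacities_bps -- capacity C_i of every link on the path (bits/s)
    bits_before    -- T_i(t): bits carried by link i up to time t
    bits_after     -- T_i(t+T): bits carried by link i up to time t+T
    interval_s     -- averaging interval T in seconds
    """
    spare = []
    for c, b0, b1 in zip(capacities_bps, bits_before, bits_after):
        cross_rate = (b1 - b0) / interval_s   # mean utilization of link i over T
        spare.append(c - cross_rate)          # residual capacity of link i
    return min(spare)                         # the path is limited by its tight link

# Illustrative three-hop path, 1-second averaging window.
caps = [100e6, 1e9, 155e6]
t0 = [40e6, 210e6, 9e6]      # bits seen by each link at time t
t1 = [130e6, 400e6, 31e6]    # bits seen by each link at time t+T
print(mean_available_bandwidth(caps, t0, t1, 1.0) / 1e6, "Mbps")
```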
Fluctuations of the available bandwidth should not cause degradation of quality at the receiver as long as those fluctuations do not affect the number of bits that the receiver needs to receive to keep its buffer non-empty.

Significant work that has been done in mean available bandwidth estimation is well reflected by Strauss et al. [56] and Hu et al. [27]. In short, deployed bandwidth estimation models can be divided into probe-gap (or packet-dispersion) and probe-rate classes. Probe-gap models used in Spruce [56] and Delphi [50] analyze the delay between probe packets and make further inference about available bandwidth based on that delay. Probe-gap models can be further divided into direct and iterative approach groups. Tools utilizing the direct approach [50, 56] sample available bandwidth directly, analyzing either delays between packet arrivals or packet travel times. An iterative approach assumes step-by-step convergence to the tentative range where the available bandwidth resides. Each step represents a comparison of the available bandwidth against a specified rate value.

The classical direct approach assumes fluid cross-traffic as well as the existence of a tight and narrow link with known capacity $C_t$. We accept the terminology where the "tight" link is the link with the smallest available bandwidth among all links in the path, whereas the "narrow" link is the link with the smallest capacity among the links in the path. The fluid cross-traffic model assumes that at the input of a router, during any arbitrarily small time interval $\delta$, an amount of cross-traffic equal to the average cross-traffic rate times $\delta$ arrives.

The formula for the available bandwidth $A$ can be derived as follows. Let $R_c$ be the rate at which the measurement sequence arrives at the tight and narrow link. If $R_c$ exceeds the available bandwidth $A$, then the queue of the router at the tight and narrow link grows because of the measurement packets. Let $\delta_1$ be the gap between packets in the incoming sequence and $\delta_2$ the gap between packets after the tight and narrow link. Also let $\Delta q$ be the queue increase during the time $\delta_1$ and $L$ the size of a measurement packet. Then $\Delta q$ can be expressed as $\Delta q = (R_c - A)\delta_1$, where $R_c$ can be viewed as an arrival rate and $A$ as a service rate for the measurement data. Since $\delta_1 = L/R_c$, we have
\[ \Delta q = \frac{L(R_c - A)}{R_c}. \]
Taking into account that the capacity $C_t$ is the rate at which the router services its input queue (measurement packets plus cross-traffic), the queue increase accumulated during $\delta_1$ is processed in the time
\[ \Delta t = \frac{\Delta q}{C_t} = \frac{L(R_c - A)}{R_c C_t}. \]
The gap between packets at the output of the slow link can therefore be expressed as
\[ \delta_2 = \delta_1 + \Delta t = \delta_1 + \frac{L(R_c - A)}{R_c C_t}. \]
Taking into account that $R_c = L/\delta_1$, the only unknown parameter in the above equation is the available bandwidth $A$. Therefore the available bandwidth can be expressed as
\[ A = C_t - \frac{L}{\delta_1}\left(\frac{C_t \delta_2}{L} - 1\right). \]
In the language of rates the above formula can be rewritten as
\[ A = C_t - R_c\left(\frac{C_t}{R_{out}} - 1\right), \]
where $R_{out} \equiv L/\delta_2$ is the rate at which the measurement packets are received.

The fluid traffic model implies that the number of packets in a sequence sent at the rate $R_c$ does not significantly affect measurement accuracy. One may even conclude that an analysis of a pair of packets demonstrates performance similar to the analysis of a large sequence of packets. Despite the seeming advantages of fast sampling of exact values of available bandwidth, the direct approach has several significant drawbacks which, from our point of view, make it inapplicable to available bandwidth measurement.
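Before turning to those drawbacks, the gap-based arithmetic of the direct approach can be exercised with a small numerical sketch. The packet size, capacity, and gap values below are made-up inputs chosen only to illustrate the formula $A = C_t - (L/\delta_1)(C_t\delta_2/L - 1)$; they are not measurements from any of the cited tools.

```python
def direct_probe_gap_estimate(L_bits: float, c_tight_bps: float,
                              gap_in_s: float, gap_out_s: float) -> float:
    """Direct (fluid-model) probe-gap estimate of available bandwidth.

    Implements A = C_t - (L / d1) * (C_t * d2 / L - 1), i.e. the gap form of
    A = C_t - R_c * (C_t / R_out - 1) with R_c = L/d1 and R_out = L/d2.
    """
    r_in = L_bits / gap_in_s      # rate at which probes entered the tight link
    r_out = L_bits / gap_out_s    # rate at which probes were received
    return c_tight_bps - r_in * (c_tight_bps / r_out - 1.0)

# 1500-byte probes through a 100 Mbit/s tight link.
L = 1500 * 8
d1 = L / 100e6        # probes injected back-to-back at the capacity rate
d2 = d1 * 1.4         # output gap expanded by 40% by cross-traffic
print(direct_probe_gap_estimate(L, 100e6, d1, d2) / 1e6, "Mbps")  # -> 60.0
```

With these inputs the estimate is 60 Mbps, i.e. the 40% gap expansion is attributed entirely to fluid cross-traffic occupying 40 Mbps of the 100 Mbps link, which is exactly the assumption that the drawbacks below call into question.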
11 First, there might not exist a revealed tight and narrow link in the network path. Even if such a link exists then its capacity still may not be known. Second, if the packet-size of a cross-traffic is sufliciently large then amount of cross-traffic in- serted between measurement packets should be represented by discrete values which significantly falls out of the fluid model. Jain, et al. demonstrated in [31] that mea— surement error increase from 2% to 40% when cross-traflic packet-size is 1500B and the number of packets in the measurement sequence drops from 100 to 10. However, an increase of number of packets in the measurement sequence also is faulty because of the influence of the long measurement queue onto itself [40]. An iterative probe—gap approach [44, 43, 27, 49] is also based on the assumption that if the rate at which measurement sequence is sent exceeds available bandwidth then gaps between measurement packets increase. The iterative approach, however, does not rely on a known capacity of the tight and narrow link and does not make direct inference about value of available bandwidth based on a single measurement. If gaps between measurement packets increase then the only conclusion made by an iterative approach is that the available bandwidth is lower than the rate at which packets were sent. A variation of this approach estimates rates at which packets were sent and received. Then available bandwidth is believed to be lower than the rate at which packets were sent if the rate at which packets were received is significantly lower than the rate at which they were sent. Various search methods are applied for figuring out what is an approximate value of the available bandwidth. Hu and Steenkiste for example do linear rate variation [27] whereas the Pathchirp measurement tool [49] exponentially increases inter-arrival times between successive packets. All above-mentioned tools and techniques, however, do not consider the effect of cross-traffic burstiness. Jain and Drovolis note in [31] that basic queueing theory says that queues can increase even before server utilization reaches 100%. And as it was shown by Liu, et al. [40], non-empty fluctuating queues at routers are responsible for constant underestimation of an available bandwidth by all above-mentioned probe- gap based iterative techniques. Such an underestimation happens because if the 12 rate at which packets are sent begins to increase then gaps between measurement packets begin to increase far before the rate at which packets were sent reaches the point of an available bandwidth. Probe rate iterative models also utilize sequences or trains of packets sent at different rates. The key difference between probe—rate approaches and probe—gap models is that probe-rate approaches work with packet travel times or so called One Way Delays (OWD) for determining whether the rate at which sequence of packets was sent is higher or lower than the available bandwidth. For example, the probe rate model used in Pathload [29] uses trains of packets sent at different rates. If the delay between packets at the receiver shows an in- creasing trend, then this trend may be considered as an indicator that the rate at which packets were sent is higher than the available bandwidth at that moment. If there is no indication of an increasing delay trend then the rate is smaller than the available bandwidth. Pathload then does a binary search to find the rate that is ap- proximately equal to the available bandwidth. 
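The rate search that such probe-rate tools perform can be sketched as a plain binary search over candidate probing rates. The oracle function below stands in for an actual train measurement plus trend test, so the function name, the bracketing limits, and the tolerance are assumptions made only for this illustration.

```python
def estimate_ab_by_binary_search(rate_is_above_ab, low_mbps: float,
                                 high_mbps: float, tol_mbps: float = 1.0) -> float:
    """Iterative probe-rate search in the spirit of Pathload.

    rate_is_above_ab(rate) -- hypothetical oracle: send a train at `rate` and
                              return True if the one-way delays show an
                              increasing trend (rate is above the available bandwidth).
    low_mbps, high_mbps    -- initial bracketing interval for the search.
    """
    while high_mbps - low_mbps > tol_mbps:
        rate = (low_mbps + high_mbps) / 2.0
        if rate_is_above_ab(rate):
            high_mbps = rate   # delay trend seen: available bandwidth is below this rate
        else:
            low_mbps = rate    # no trend: available bandwidth is at or above this rate
    return (low_mbps + high_mbps) / 2.0

# Toy run against a path whose true available bandwidth is 37 Mbps.
print(estimate_ab_by_binary_search(lambda r: r > 37.0, 0.0, 100.0))
```

Each oracle call corresponds to one (or several) packet trains, which is why the number of trains per decision, discussed below, directly determines the total measurement time.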
As was stated above, fluctuations of queues at routers can increase gaps between packets. However, a pronounced positive OWD trend in a sufficiently large sequence of packets unambiguously points to the inability of one of the routers in the network path to serve its queue, which can be interpreted as a lack of available bandwidth of the path with respect to the measurement traffic rate.

Both probe-rate and probe-gap measurement approaches suffer from the known interrupt coalescence / context switch (IC/CS) effects [48]. If interrupt coalescence is enabled at the receiver's network adapter, then the adapter waits for a group of packets to arrive and only after that issues an interrupt request. Then the entire group is processed with one interrupt. The direct IC impact on network measurements is that packets are not time-stamped by the receiver right after they really arrived, but with a variable delay. Some may wait for an interrupt request for up to hundreds of microseconds, which significantly influences network measurements. Context switch effects have the same impact: if the processor is utilized by another job, then a certain random time period passes after a packet's arrival before it is processed and time-stamped by the receiver. It is fairly complicated to account for the IC/CS influence when only pairs of packets are sent. To our knowledge there are no probe-gap (packet dispersion) based available bandwidth measurement tools that consider the IC/CS effect. Pathload is the only tool among the probe-rate approaches that partially addresses the IC/CS issue. We therefore here-and-after focus our attention on Pathload as, to our knowledge, the most stable and accurate end-to-end available bandwidth measurement tool.

Pathload checks the time that the system needs for delivering packets from the network adapter to the upper layers. All packets that are received within this time-frame are assigned to one group, or burst. Then, in a packet train, Pathload counts only one packet out of each burst and discards the rest of the packets. After the IC/CS-related filtering, Pathload divides the remaining $N$ packets into $\Gamma \approx \sqrt{N}$ groups and takes the median from each group. The remaining packets are discarded as well. Finally, two metrics are computed based on the remaining $\Gamma$ measurements. The Pairwise Comparison Test (PCT) metric is evaluated as
\[ S_{PCT} = \frac{\sum_{k=2}^{\Gamma} I(M_k > M_{k-1})}{\Gamma - 1}, \]
where $I$ is an indicator function evaluated to one when the median $M_k$ of the $k$-th group is larger than the median of the $(k-1)$-st group, and to zero otherwise. As follows from the definition, if a train has an increasing trend then the PCT metric approaches 1. If there is no trend, then the PCT metric should be close to one half. The Pairwise Difference Test (PDT) metric is defined as
\[ S_{PDT} = \frac{M_{\Gamma} - M_1}{\sum_{k=2}^{\Gamma} |M_k - M_{k-1}|}. \]
If the train does not have an increasing OWD trend then $S_{PDT}$ should vary around zero, whereas in the presence of an increasing trend it should be close to one. The PDT metric was introduced for detection of a trend with a significant start-to-end OWD difference. Finally, the two tests together decide what to report for a given train of packets.

The PCT and PDT tests represent the basic mechanism of trend detection in Pathload. In general, the number of packets in a train, the train packet size, and the bandwidth-rate search mechanism may vary among approaches. For example, Man et al. [42] introduce a light-weight train-based tool that is less intrusive than Pathload on one hand but sacrifices accuracy on the other hand.
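Returning to the PCT and PDT statistics defined above, the sketch below computes both from a list of per-group OWD medians. It is a direct transcription of the two formulas rather than code taken from Pathload itself, and the example median values are invented for illustration.

```python
def pct_pdt(medians):
    """Pairwise Comparison Test and Pairwise Difference Test statistics.

    medians -- one-way-delay medians M_1 ... M_G of the G groups of a train.
    Returns (S_PCT, S_PDT) where
      S_PCT = (1/(G-1)) * sum_{k>=2} I(M_k > M_{k-1})
      S_PDT = (M_G - M_1) / sum_{k>=2} |M_k - M_{k-1}|
    """
    g = len(medians)
    if g < 2:
        raise ValueError("need at least two group medians")
    increases = sum(1 for k in range(1, g) if medians[k] > medians[k - 1])
    total_variation = sum(abs(medians[k] - medians[k - 1]) for k in range(1, g))
    s_pct = increases / (g - 1)
    s_pdt = (medians[-1] - medians[0]) / total_variation if total_variation else 0.0
    return s_pct, s_pdt

# A train whose OWDs clearly drift upward (probing rate above the available bandwidth) ...
print(pct_pdt([10.1, 10.4, 10.3, 10.9, 11.2, 11.8, 12.0]))   # high PCT, PDT close to 1
# ... and one that only jitters around a constant level.
print(pct_pdt([10.2, 10.0, 10.3, 10.1, 10.2, 10.0, 10.3]))   # PCT near 0.5, PDT near 0
```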
Nishikawa, et al. [46] propose to evaluate the approximate distribution of available bandwidth as of a random quantity. Basic metrics in these works remained, however, the same as they were introduced first in Pathload. Pathload is believed to be slightly more intrusive than tools using the probe gap model. To our knowledge Pathload is, however, the most stable and accurate among available bandwidth measurement tools to date. Also Pathload creators propose a way to consider the IC / CS influence and therefore Pathload is to our knowledge the only tool that considers an IC/ CS effect. The accuracy of Pathload is nevertheless far from ideal. Recent research [29] have proved that Pathload’s measurements are much more accurate for higher values of available bandwidth than for the lower values. Also our experience with Pathload shows that for many network paths, in a long run, Pathload reports many miss- ing values and sometimes is unable to catch instantaneous rapid drops of available bandwidth that are critical for QoS inference about the channel. Pathload, up to a certain degree, considers IC/ CS influence, but still its performance significantly depends on the processor utilization of measurement machines. Therefore for our experiments we had to use machines with a very light processor utilization factor. From the definition of available bandwidth it follows that it is tightly related with the cross-traffic present at the moment in the network path. Cross-traffic defined as %(t) represents a random time-series that has been extensively analyzed by both statisticians and network engineers for its statistical properties. If the tight link remains unchanged throughout the measurements (that is usually the case) then the available bandwidth time-series can be evaluated by subtracting the cross-traffic from a tight link’s unchanged capacity. It follows that in the aforementioned case 15 the available bandwidth and cross-traffic are exactly the same time—series in terms of their statistical characteristics. Therefore we believe that the model chosen for available bandwidth characterization should incorporate at least basic properties of models that are used presently for Internet traffic modeling. Below we provide some background of such models. All parametric traffic models assume weak stationarity of the traffic time-series. Weak stationarity can be defined as follows: Let {X t} be a random time-series that takes values 1‘1,I2,$3, ...xn at different instants of time. Generally speaking, all these values are observations of different, dependent random variables X 1, X 2, ..., X n. We say that random variables X1, X2, ..., Xt, ..., observed over time, form a weakly stationary time-series if the following conditions are satisfied: expected values of all Xt’s are the same, and the 7X(t, h) defined as Cov(Xt+h, Xt) does not depend on t. 7X(t, h) = 7X01.) is called the autocovariance function of the time—series (ACVF) at lag h. p X(h) defined as p X(h) E % is called an autocorrelation function (ACF) of the time-series at lag h. Weak stationarity means that the correlation picture remains the same under any shift in the time domain. ACF therefore can be treated as a basic characteristic of a weakly-stationary time-series. Leland, et al. [34] showed that network traffic possesses an important char- acteristic called self-similarity. Self-similarity can be formally defined as follows: let pX(h) ~ h'3 as h —+ 00, 0 < i3 < 1. 
For an integer m = 1,2,...,oo let {X gm} be new weakly stationary series defined as X (in) = —71,—,(X km—m +1 + + X km): The time-series {Xt} is called second-order self—similar with self- similarity parameter H = 1 — g if for all m = 1, 2, ..., oo Var(X(m)) ~ Tn—fi and pX(m)(h) = p X (h) for all h > 0. Most important implications of self-similarity are “slowly” (slower than l/h) decreasing autocorrelation implying long-range de- pendence (220:0 p X(h) = 00), and “slowly” decaying variance of the so called aggregated time-series ( {X (m) (k)} that was obtained by averaging the initial time- series over groups of size m). As a motivation for self—similar characteristics of a network traffic one may consider an efficient modeling of a network traflic by a superposition of several 16 “ON / OFF” sources. Each of such a source can reside in two states: “ON” state when it intensively transfers packets and “OFF” state when it is idle. The model assump- tion is that the lengths of activity periods are independent identically distributed random variables. If each source “is driven” by the same distributions then it be- comes possible to model the behavior of superposition of several such sources. The important achievement made by researchers from Bellcore was to model “ON / OF F” periods with the help of infinite variance distributions (such as Pareto with param- eter a in the range from 1 to 2). Superposition of several such “ONN / OFF” Pareto sources allows to derive a self-similar traffic model. There exist many self-similar models by which researchers approximated Internet traffic. Crovella and Bestavros report in [17] that traffic formed by web browsers possesses self-similar characteristics. In their study they analyzed more than 500, 000 requests generated by 37 browsers. Each web—browser was modeled as an ON / OFF source. In particular, it was shown that browser data match heavy-tailed Pareto distribution. Similar results were shown in [5] and in [4]. Borella and Brewster [8] use UDP packets for measuring packet delays and show- ing that delays demonstrate long-range dependent nature. Another study [23] shows that FTP data rate is bursty and the distribution of the number of bytes in bursts has heavy tails. Several works show that data associated with video traffic also has self-similar characteristics. Video compression algorithms assume frames of variable sizes, which allows to view video traffic as a random time-series. Regular video has scenes with a lot of motion, and scenes with a few of motion which makes distribution of data in frames heavy-tailed. Garrett and Willinger, for example show in [21] that transfer of the “Star Wars” movie coded in JPEG format reveals self-similar nature and that length of the frame follows heavy—tailed Pareto distribution. Authors show, in particular that high degree of variability in frame length is associated with scenes variability. In another study Beran at al [7] analyzed 20 different Variable Bit Rate video transfers, which were formed with various coding mechanisms. They showed that 17 long-range dependency exists in all 20 transfers and does not depend on type of video-encoding or on used scene-types. Tsybakov and Georganas [57] report similar results. Zukerman, et al. [60] proposed to use the Poisson Pareto Burst process model for an Internet traffic. In addition to the mean, variance, and self-similarity parameter H, the model has an extra “aggregation” parameter A. 
Zukerman et al. believe that their aggregation parameter significantly improves the accuracy of the model. Li et al. [36] studied the characteristics of streaming media stored in the Internet and came to the conclusion that it has a long-tailed distribution and may therefore contribute to self-similar Internet traffic. Aracil et al. [3] studied FTP and Telnet data arrival time-series and came to the conclusion that they are self-similar and can be matched to a fractal renewal process.

Traffic modeling studies have constructed models that visually look similar to observed time-series and have autocorrelation structures similar to what has been observed. Also, estimated model parameters based on observed data should remain unchanged or vary slowly over time. Finally, the model should be relatively simple and amenable to further statistical analysis. We will follow the same strategy to unify recent work in Internet traffic analysis, available bandwidth estimation tools, and time-series study with respect to the analysis of the QoS characteristics of a network path.

Chapter 3

Parametric Approach

This chapter introduces the notion of a "crossing probability". It emphasizes its importance for tele-operations requiring a "pro-active" reaction to QoS degradation, and gives a parametric model for "crossing probability" estimation based on past observations of an available bandwidth time-series.

3.1 Data collection and preprocessing

Figures 3.1, 3.2, and 3.3 depict the available bandwidth behavior over a period of 800 minutes for various network paths, collected with the help of the Pathload measurement tool. The distance between nodes varies from 4 to 20 hops. For all examples in figures 3.1, 3.2, and 3.3 we observe a small trend of less than 1 Mbps around the mean value. Also, all examples have bursts reflecting the heavy-tailed self-similar nature of Internet traffic-related series.

This work is related to available bandwidth time-series that are stationary. Therefore, we assume that the mean of the time-series does not change or, in other words, that the time-series of interest has an insignificant trend.

Sometimes the available bandwidth series may acquire a trend significantly exceeding a reasonable limit (say 5 Mbps). Figure 3.6 shows an example of such a network path (between nodes located at MSU and the University of Sydney, Australia). The existence of such a trend naturally violates the stationarity assumption, which makes it impossible to apply any stationary time-series model for further analysis.

Figure 3.1: Available end-to-end bandwidth (Mbps): Michigan State University (35.9.2014) - Massachusetts Institute of Technology

Figure 3.2: Available end-to-end bandwidth (Mbps): Denver (USA) 198.32.154.187 - Canarie, Montreal (Canada) planetl.montreal.canet4.nodes.planet-lab.org

Figure 3.3: Available end-to-end bandwidth (Mbps): Michigan State University (35.9.2014) - University of California, Berkeley (169.229.50.12)

We note, however, that if the trend is sufficiently small over the period of time used for the analysis, then stationary models can still be applied. For example, if we use historical data of the last hour for prediction of what may happen in the coming 30 minutes, and if the ongoing trend is insignificant over the 1.5 hours relevant to the prediction, then the stationary model can still be applied.

For further fitting of stationary self-similar models, we preprocess the time-series of interest by subtracting the trend. For all examples considered in figures 3.1, 3.2, and 3.3, the trend subtraction changes each observation by less than 2 Mbps, which is negligible with respect to the spikes that are the primary cause of instantaneous unexpected degradation of service. Predicting the probability of such a spike occurring over a certain QoS-sensitive time-slice is our primary goal. For this purpose we can neglect a 2-3 Mbps trend and further consider the "adjusted" time-series. Figures 3.4 and 3.5 show the original time-series representing the available bandwidth between nodes located at MSU and Berkeley along with the corresponding "adjusted" time-series. Figures 3.4 and 3.5 show that trend removal, inversion, and "lining-up" to the zero level do not significantly change the behavior pattern of the original time-series of interest.

For trend removal we have used the Spencer 15-point moving average filter described by Brockwell et al. [11], which passes polynomials of degree up to three and smoothes everything of higher degree. The Spencer filter can be briefly introduced as follows. Let $\{m_t\}$ be the time-series representing the trend, whereas $\{X_t\}$ is the original series. Then $\{\hat{m}_t\}$ obtained by the Spencer filter is evaluated as
\[ \hat{m}_t = \sum_{j} a_j X_{t-j}, \]
where $a_j = 0$ if $|j| > 7$, $a_j = a_{-j}$ if $|j| \le 7$, and
\[ [a_0, a_1, \ldots, a_7] = \tfrac{1}{320}\,[74, 67, 46, 21, 3, -5, -6, -3]. \]
It is also undesirable to have instantaneous spikes affecting the trend estimation. Therefore a "trimming" procedure is applied before passing data through the Spencer filter. The whole procedure can be done at run-time. The size of the window that slides across the time-series observations is 19. We delete the two maximal and the two minimal observations and pass the remaining 15 observations to the Spencer filter, which outputs the estimated trend. Then the trend is subtracted from the original time-series, the result is inverted and lined up to zero, and thus the "adjusted" time-series is finally obtained.

The following discussion is related to the analysis of the "adjusted" time-series. Note that in most cases the "adjustment" procedure does not significantly distort the data. Even in cases such as the one shown in figure 3.6, if the existing trend has a "long-term" nature then we can still apply the inference about spikes made for the "adjusted" time-series to spikes of the original time-series. In such cases the "zero level" of the "adjusted" time-series during different parts of the day corresponds to different "base levels" of available bandwidth. Under the term "base level" we assume the level of available bandwidth from which spikes of the "adjusted" time-series have to be subtracted in order to obtain an observed value of the available bandwidth. For the examples in figures 3.1, 3.2, and 3.3, as well as for the majority of network paths that we observed, the base level remains the same during the day. For the case shown in figure 3.6 the base level changes throughout the day. One, however, can still make predictions based on a smaller time-scale.

Figure 3.4: Available bandwidth before trend removal and inversion

Figure 3.5: Available bandwidth after trend removal and inversion
22 100 Mbps 80 70 60 50 30 20 Figure 90 Mbps oo 7o so so 40 30 20 I l l l l l 1 1 l 100 200 300 400 500 600 700 300 900 1000 mlnutos 3.4: Available bandwidth before trend removal and inversion I I T I I I I I I 0 100 200 300 400 500 600 700 800 900 1000 minutes Figure 3.5: Available bandwidth after trend removal and inversion 23 20 - trend present “Elli; pinball n 4 l l "o 500 1000 mlnutes 1500 Figure 3.6: Michigan State University — University of Sydney, Australia (trend ex- ample) Mbps 9 L .__l‘ 200 300 I 400 500 600 mlnutes Figure 3.7: Michigan State University - University of Uupsala, Sweden on the right (routes oscillation example) 24 We conclude that if the slope of on-going trend is sufficiently small (with respect to time—series variability) to assume that the “base-level” of an available bandwidth estimated during the history time-frame remains the same during the time-frame for which the prediction is made, then trend subtraction does not affect the accuracy of the prediction. We do not discuss the rare cases when the routers on the network path frequently change routes for packets, which makes the available bandwidth time-series unstable, such as depicted in figure 3.7. 3.2 Modeling available bandwidth 3.2.1 “Crossing probability” Available bandwidth is an important QoS characteristic of a network path. As it was mentioned in previous sections, the mean available bandwidth is not enough to reflect the complex nature of bandwidth dynamics. We believe that the key point is to create a statistical model for available bandwidth that efficiently characterizes the network path in terms of its QoS properties. Assume that the operator is about to decide whether to perform a real—time QoS critical job over a potentially unreliable network path. Then the most important question can be stated as follows. What is the probability of experiencing a QoS critical degradation of service during the time required for a given real-time task execution? An example is the probability that the picture transferred to the operator from a remotely-controlled walking robot distorts to a point when operator is not able to understand where the robot is located with respect to an approaching wall. Now let us slightly formalize the problem. The natural assumption is that the operator should be aware of the real-time task duration, minimal allowed transmis- sion rate and the receiver’s buffer size. These three metrics can be considered as basic QoS characteristics of a given task. The QoS requirements are violated if the 25 mean available bandwidth time-series averaged over time receiver’s buffer size minimal allowed transmission rate drops below the threshold of minimal allowed transmission rate during the task’s execution time. Note that the task’s execution time will correspond to a sequence of observations of the mean available bandwidth time-series of a length t *k’ d t'. N: as 8 ura 2071 receiver’ 3 buffer size minimal allowed transmission rate Now rephrase the main QoS question as follows: What is the probability that the minimal value over the next N consecutive observations of the available bandwidth (AB) time-series drops below a given threshold where both the threshold and N are specified by the real-time task constraints? For the “adjusted” and inverted mean available bandwidth time-series, the aforementioned probability can be re- defined as the probability for the inverted AB-time-series maximum over the next consecutive N observations to exceed the pre-defined QoS critical threshold. 
We believe that finding this probability should be an ultimate goal and a statistically correct problem statement for the QoS characteristics of a network path. We here— and-after refer to this probability as to a “crossing probability”. Even finding a reasonable upper bound for such a probability is a good contribution to the QoS characterization of a network path. Below we build a model that describes the observed available bandwidth data. It is self-similar and amenable to analysis with respect to the aforementioned “crossing probability”. 3.2.2 ARCH model In this work we propose a parametric model that allows quick computation of the “crossing probability” for any given number of future observations N and threshold M based on estimated model parameters. We used the “ARCH 2(1)” model [20] for the description of the mean available bandwidth data. Therefore we first give below 26 some background on ARCH models and also show reasons why this model has been chosen as a candidate for the behavior of mean available bandwidth description. The ARCH(p) (Auto—Regressive Conditional Heteroscedasticity) time—series model was introduced by Engle [20] in applications to finance events. The ARCH(p) model can be defined as an autocorrelated Gaussian noise with dependency between obser— vations described as follows. p Xt = (. :30 + Z 3iX52_,-)Zt i=1 where Zt are independent observations following a standard normal distribution. The simplest partial case of the ARCH (1) time-series is defined as Xi = (1/130 + 31192.1)21 Finally, ARCH 2(1) time—series will be defined as Y1 = (.30 + BIYI—Ilzt2 From the definition of ARCH 2(1), it follows that the model is completely defined by a pair of coefficients [60,61]. Both ARCH (1) and ARCH 2(1) models were extensively applied to finance-related time—series, which at least visually have the similarity to “adjusted” mean available bandwidth time-series pattern. Before going any further we provide a picture with “adjusted” mean available bandwidth on the right and simulated ARCH 2( 1) time-series on the left. As shown in figures 3.8, 3.9 the pattern of both time-series is the same. Another encouraging factor, as shown by Embrechts, et al. [19], is the ARCH 2(1) time-series possesses self-similar character- istics. The ARCH 2(1) model allows further computation of a “crossing probability” based on the estimated model coefficients. The mathematical analysis of ARCH 2(p) models, where p > 1, is much more complex and there are no known mathemati- cal results about “crossing probability” computation for such models. The model’s arnenability to further mathematical analysis was the crucial argument in favor of 27 Mbps 70H -1 60H -1 40- - 30* 0 100 200 300 400 500 600 700 000 900 1000 mlnutoo Figure 3.8: Measured available end-to—end bandwidth. 90 1 I T I I I I I T Mbps so— - 70- l 601- 50.. 4o~ 30~ 20- ] i 10- , l o l 1 0 100 200 300 400 500 000 700 000 900 1000 minutes Figure 3.9: Simulated ARCH 2(1) observations with ,6’0 2 131 = 0.25. 28 considering the case when p = 1. However, we conducted simple statistical data analysis aiming to show that collected data do not give any significant evidence for choosing ARCH models with p > 1. As it is known from time-series studies [11] the following quantity called AICC is considered for choosing an optimal order p: 2(p+1)n AICC: —2L ,3 Me (.031 p)+,,_p_2 where L is Quasi Log Likelihood Function evaluated as: l. 4...”... J n—p L030, 13,, sp) = — Z [logmo + 61X§+p_1 + + stEH 1:1 2 Xi+p ] , 2 / 2 and n is the number of observations. 
The AICC criterion is a function of p and β_0, ..., β_p. The "best-fit" order p is the value that minimizes the AICC criterion over those β_0, ..., β_p which, for a given p, themselves minimize the AICC. We computed the AICC criterion for 30 non-intersecting blocks of observations belonging to a real available bandwidth time-series. Each block contains 200 observations, which corresponds to 3.3 hours given that the available bandwidth time-series is sampled at a rate of one observation per minute. The size of the block was chosen large enough to be representative and small enough for assuming that the model coefficients remain unchanged throughout the block. For each data block the AICC criterion was evaluated for p = 1, 2, and 3. Below is shown the AICC-minimizing value of p for each block:

1, 1, 1, 3, 1, 2, 2, 3, 3, 1, 3, 1, 1, 2, 2, 2, 2, 3, 3, 1, 3, 1, 2, 1, 1, 2, 3, 2, 3, 1

Out of 30 trials we have 12 cases when p = 1, 9 cases when p = 2, and 9 cases when p = 3. From the above values it can be seen that the data do not give us sufficient evidence to believe that ARCH models with p > 1 work better than the one with p = 1. Following the well-known paradigm (Occam's razor) that among models with roughly the same performance the best model is the simplest one, we conclude that our decision to choose p = 1 for our model is reasonable.

Finally, we conducted a test for checking the applicability of an ARCH model to real-life measurements of available bandwidth. Real-life measurements of an available bandwidth time-series were first preprocessed (as described in the previous section), and then a statistical test checking the data for heteroscedasticity [20] was performed. The Null hypothesis for such a test was the assumption that the data follow an ARCH model, whereas the alternative hypothesis was the assumption that an ARCH model is not applicable. After trend removal and inversion for real available bandwidth measurements taken across Internet paths, we obtained high p-values, which clearly suggest accepting the Null hypothesis. In particular, for the path MSU-Australia the reported p-value was 0.237, which does not allow us to reject the Null hypothesis at any reasonable significance level. We conclude that we have sufficient preliminary information to apply the ARCH model to an available bandwidth time-series and to check how well it predicts real network paths in the Internet.

In order to apply a parametric model to "crossing probability" estimation, one needs to ensure that the model parameters can be estimated at run-time sufficiently fast. The model then needs to be mathematically analyzed in order to find an efficient method to compute the "crossing probability" based on the estimated model parameters. One also needs to be able to build confidence intervals around the estimated probability in order to be aware of the model's accuracy. Finally, the model needs to be empirically verified. We therefore present below our unification of known results on ARCH²(1) parameter estimation and "crossing probability" computation, together with the scheme of confidence interval construction.

The ARCH(1) model has been studied extensively with regard to the description of time-series in finance. The basic work in ARCH parameter estimation was done by Weiss [59], who proposed to estimate the ARCH parameters by maximizing the quasi log-likelihood function.
For our ARCH²(1) model the quasi log-likelihood function is evaluated as

$$L(\beta_0, \beta_1) = -\frac{1}{2}\sum_{i=1}^{n-1}\left[\log\!\left(\beta_0 + \beta_1 X_i^2\right) + \frac{X_{i+1}^2}{\beta_0 + \beta_1 X_i^2}\right]$$

Here n is the number of observations in the window that is used for parameter estimation. The parameter n is very important, and we discuss its proper adjustment separately below. The advantage of the QMLE (Quasi Maximum Likelihood Estimation) estimator of [β_0, β_1] is that it is asymptotically unbiased. It is also known [51] that [β̂_0, β̂_1] is normally distributed with mean [β_0true, β_1true] and a covariance matrix called the Fisher information. The normality of the estimators provides the basis for computing a confidence interval of any level around the estimated values of the model parameters. The biggest challenge with the QMLE estimator, however, is that it is not given in closed form. Finding the estimators of [β_0, β_1] therefore reduces to the maximization of a function of two variables. The function maximization problem has been intensively studied and many efficient algorithms exist. However, their performance depends significantly on the initial point from which they have to converge, and sometimes convergence cannot be achieved sufficiently fast for an on-line implementation.

Bose et al. [9] proposed another estimator of β = [β_0, β_1]':

$$\hat{\beta} = \left[\sum_i \frac{s_{i-1} s_{i-1}'}{\tilde{\sigma}_i^4}\right]^{-1}\left[\sum_i \frac{s_{i-1} X_i^2}{\tilde{\sigma}_i^4}\right]$$

where s_{i-1} = [1, X²_{i-1}]', σ̃_i²(β) = s'_{i-1} β̃, and β̃ = (S'S)^{-1} S'Y is a preliminary least-squares estimator (the rows of S are the s'_{i-1} and Y_i = X_i²). Although Bose et al. [9] give estimators of [β_0, β_1] in closed form, we ran a verification procedure in order to check how the precision and computation time of this approach compare with QMLE. Our simulation included modeling 8,000,000 different ARCH(1) time-series with varying known [β_0true, β_1true]. We came to the conclusion that for many tested values of [β_0true, β_1true] the estimator proposed by Bose et al. [9] is biased, whereas the QMLE estimator is unbiased for all tested values of [β_0true, β_1true]. Therefore we decided to join the two approaches in the following way. We first find preliminary estimates of [β_0, β_1] by the Bose et al. [9] approach, which takes less than a second. We then use this preliminary estimate as the initial point of convergence for a QMLE algorithm. In this way the QMLE algorithm takes on average approximately 3 seconds to converge.
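As a rough illustration of this two-stage estimation, the sketch below (our own illustrative code, not the implementation used in our experiments) feeds a crude closed-form starting point into a numerical maximizer of the quasi log-likelihood; the unweighted least-squares start stands in for the preliminary closed-form estimator, and `scipy.optimize.minimize` plays the role of the QMLE convergence step.

```python
import numpy as np
from scipy.optimize import minimize

def neg_quasi_loglik(params, y):
    """Negative quasi log-likelihood of ARCH^2(1) for the adjusted (inverted) series y."""
    beta0, beta1 = params
    sigma2 = beta0 + beta1 * y[:-1]
    if np.any(sigma2 <= 0):
        return np.inf
    return 0.5 * np.sum(np.log(sigma2) + y[1:] / sigma2)

def preliminary_ls(y):
    """Closed-form starting point: regress y_t on [1, y_{t-1}]."""
    S = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(S, y[1:], rcond=None)
    return np.clip(coef, 1e-6, None)      # keep the starting point in the admissible region

def fit_arch2_1(y):
    start = preliminary_ls(y)
    res = minimize(neg_quasi_loglik, start, args=(y,),
                   bounds=[(1e-6, None), (1e-6, 0.999)], method="L-BFGS-B")
    return res.x                          # [beta0_hat, beta1_hat]
```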
Standard errors and correlation were calculated by simulating 1000 independent ARCH 2(1) time-series for each fixed pair of 60, 181 and finding the difference between estimated and true values of the aforementioned vector. 60 and 61 then were varied within the region (05,3.5) and (0,06), respectively. This region covers a reasonable re- gion of values that 60 and 61 might take while describing the available bandwidth behavior within the range of 200 Mbps. It is known that the estimated covariance matrix is proportional to nTl. Therefore, when having available tables visualized in figures 3.10, 3.11, 3.12 for n = 720, one can recompute E for an arbitrary n in less than a second. 3.2.3 Results from extremal values theory Below we show some mathematical results that allow us to compute the “crossing probability” for any specified threshold M, number of observations N, and estimated model parameters 130, 51. These results also help us to draw a 95% confidence inter- val for a “crossing probability” by considering the fact that the “crossing probabil- ity” is a monotonically increasing function of both 60 and 61 and finding “crossing probabilities” for sufficiently many points on the 95% ellipse in the 60, 61 space. Embrechts, et al. [19] summarize recent work in extreme values providing formulas for finding “crossing probability” for various time-series models. In particular for the ARCH 2(1) model, Embrechts, et al. have 1 _ I, fit Nlim P(N-E max{X1, ..., XN} S Aim) = exp Clium —+oo that allows us to rewrite this result for “crossing probability” P and large N as follows _ , .—K. P_:1_exp (3le N 34 .o 4&3 .O m I .O _L [U1 .0 o I (’1 "Crossing Probability" 0 rev0 Figure 3.13: “Crossing” probability for a threshold M=25 Mbps and a number of observations N = 50 u where H. is a unique root of the equation (267—3 I‘(u + %) = 1. [Win stands for a threshold for an inverted available bandwidth time—series. It can be related to a threshold I'll by a simple formula Min = constant — III, where the constant is the “base-level” around which the original available bandwidth time—series was inverted (or in other words “mirrored”). Although Embrechts et al. [19] give an expression for CI, it is more efficient to estimate it empirically off—line for different [30 and 31. For that purpose we simulated 800,000 observations for each pair of 130 and [11 coming from a grid with range (05,35) and (0,0.6), respectively. Figure 3.13 shows this probability for a fixed N = 50 and M = 25 Ill bps. Then, based on the probability expression above, this tabulated result can be quickly generalized on—line to arbitrary N and IV. Calculating the estimated probability together with its confidence interval gives 35 an opportunity to judge the size of the window 71 used for prediction of 130 and 61 coefficients. On one hand we want n to be as small as possible because we assume that the model parameters remain unchanged for the time-period related to n. On the other hand, with larger n, accuracy of the prediction increases with correspond- ing shrinking of the 95% confidence region. In our experiments, for example, we put the following condition to the 95% confidence interval for the probability: the difference between the 95% upper bound and the predicted value should not exceed 0.2 for an arbitrary predicted probability value from the range [0;0.5]. Then for N E {1, ..., 100} and M E [60; 90], the lower boundary for n will be evaluated to 120. 
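Collecting the formulas of this subsection, the following sketch (our own illustrative code) solves the root equation for κ with a standard bracketing solver and evaluates the crossing probability for given N and M; the constant c1 is assumed to come from the off-line tabulation described above, and the numbers in the example call are made up.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma

def kappa(beta1):
    """Unique positive root of (2*beta1)**k * Gamma(k + 1/2) / sqrt(pi) = 1."""
    f = lambda k: (2.0 * beta1) ** k * gamma(k + 0.5) / np.sqrt(np.pi) - 1.0
    return brentq(f, 1e-6, 50.0)

def crossing_probability(N, M, c1, beta1, base_level):
    """P that the inverted series exceeds M_inv = base_level - M within N observations."""
    m_inv = base_level - M
    k = kappa(beta1)
    return 1.0 - np.exp(-c1 * m_inv ** (-k) * N)

# Example with made-up numbers; in practice c1 is read from the tabulated simulations.
print(crossing_probability(N=50, M=25, c1=1.0, beta1=0.25, base_level=100.0))
```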
Note that the dependence on N is natural: with increasing of the period of time for which the prediction is made (duration of real-time task execution), the period of time used for prediction also needs to be correspondingly increased. 3.3 Experimental evaluation In this section we describe our method for testing the accuracy of our approach to available bandwidth modeling. In previous sections we introduced the “crossing probability” as a characteristic of a network path with respect to available band- width and the given real-time task that needs to be remotely executed. “Crossing probability” can be viewed as a characteristic of a current state of a network path expressed in the form of a prediction of the bandwidth behavior for the time-period equivalent to the next N observations of the available bandwidth time-series. The “Crossing probability” approach tells that if a network path resides in the present state (evaluated with the help of the past n observations) for the period of the next N observations, then the probability of crossing of a predefined threshold can be quickly evaluated by a proposed parametric model. The final target of the modeling is a prediction of a probability. The only way, therefore, to empirically verify the model is to apply the ergodic theorem, i.e., to have B groups with N observations in each and then count the number L of groups where the maximum over N observations within the group crossed a pro-defined 36 N\Mbps 60 62 64 66_ 68 E 72 74 76 78 80 82 84 86 88 90 15 0.09 0.09 0.08 0.07 0.07 0.07 0.07 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.04 20 0.12 0.12 0.10 0.09 0.09 0.09 0.09 0.07 0.07 0.07 0.07 0.07 0.07 0.06 0.06 0.05 25 0.14 0.14 0.13 0.11 0.11 0.11 0.11 0.09 0.09 0.09 0.09 0.09 0.09 0.08 0.08 0.06 30 0.18 0.13 0.16 0.14 0.14 0.14 0.14 0.11 0.11 0.11 0.11 0.11 0.11 0.10 0.10 0.07 35 0.20 0.20 0.13 0.16 0.16 0.16 0.16 0.13 0.13 0.13 0.13 0.13 0.13 0.11 0.11 0.08 40 0.22 0.22 0.20 0.18 0.18 0.18 0.18 0.15 0.15 0.15 0.15 0.15 0.15 0.13 0.13 0.10 45 0.27 0.27 0.24 0.22 0.22 0.22 0.22 0.17 0.17 0.17 0.17 0.17 0.17 0.15 0.15 0.11 50 0.23 0.28 0.25 0.23 0.23 0.23 0.23 0.19 0.19 0.19 0.19 0.19 0.19 0.17 0.17 0.12 55 0.32 0.32 0.29 0.26 0.26 0.26 0.26 0.21 0.21 0.21 0.21 0.21 0.21 0.19 0.19 0.14 60 0.32 0.32 0.30 0.26 0.26 0.26 0.26 0.24 0.24 0.24 0.24 0.24 0.24 0.21 0.21 0.15 65 0.33 0.33 0.29 0.26 0.26 0.26 0.26 0.22 0.22 0.22 0.22 0.22 0.22 0.19 0.19 0.14 70 0.34 0.34 0.30 0.26 0.26 0.26 0.26 0.24 0.24 0.24 0.24 0.24 0.24 0.20 0.20 0.14 75 0.35 0.35 0.31 0.28 0.28 0.28 0.23 0.25 0.25 0.25 0.25 0.25 0.25 0.22 0.22 0.16 80 0.34 0.34 0.32 0.28 0.28 0.28 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.19 0.19 0.16 85 0.40 0.40 0.35 0.31 0.31 0.31 0.31 0.27 0.27 0.27 0.27 0.27 0.27 0.22 0.22 0.18 90 0.39 0.39 0.32 0.28 0.23 0.28 0.28 0.24 0.24 0.24 0.24 0.24 0.24 0.19 0.19 0.15 95 0.39 0.39 0.35 0.29 0.29 0.29 0.29 0.25 0.25 0.25 0.25 0.25 0.25 0.21 0.21 0.17 100 0.41 0.41 0.35 0.29 0.29 0.29 0.29 0.25 0.25 0.25 0.25 0.25 0.25 0.21 0.21 0.17 Table 3.1: Empirically estimated “Crossing” probability for a path MSU-Berkeley threshold. Then the ratio L / B can be considered as an empirically computed value of a “crossing probability”. We considered as our design accuracy the following criteria: within the frames of the model, the difference between the predicted probability and its 95% upper confidence limit should not exceed 0.2 at any circumstances. 
Our simulations of ARCH 2( 1) series show that such “within-the—model” accuracy on a probability scale for N S 90 dictates the lower bound on number of observations needed for model parameters estimation as n = 120. Higher accuracy leads to higher values of a lower bound for n. In real-life, however, it is impossible to achieve such an ideal design of the exper— iment as it was described in the previous paragraph because of the following factors. First, we obtain B groups of measurements in a real scenario that belong to the same time-series (instead of having multiple independently generated series in the simulation scenario) and therefore they are somehow dependent. Second, even if we assume that the model parameters remain the same over the estimation window of length n (that is reasonable if n correspond to 2 hours) it is too naive to assume that these parameters will remain the same over the whole time-period needed for collec- tion of B groups of size N. Recall that present available bandwidth measurement 37 tools allow to make consistently stable measurements at a rate 1 sample per minute. Therefore, B = 100 reasonably separated groups of N = 100 observations in a group correspond to time needed for collection of 20,000 consecutive observations. With presently available sampling rate this time is evaluated to 20,000 minutes or 2 weeks of consistent measurements. In order to address the above discussed challenges we propose the following scheme for estimating efficiency of the upper bound on “crossing probability”. We consider many network paths aiming at those that have visually the same daily be- havior range. In this section we show an experimental evaluation of one such path between nodes in MSU and Berkeley that are 20 hops apart. These nodes were chosen because their hardware parameters and stable working allowed us to take 15,000 continuous observations with the help of a Pathload measurement tool. For the MSU-Berkeley network path we chose n = 120 and for a fixed N = 50 monitored how estimated 95% upper bound for a “crossing probability” evolved over the pe- riod of one day. Therefore we separated our data-set into a training-validation part that constituted 24 hours or 1440 observations and testing samples. For the first day we evaluated the width of the strip where the estimated probability resided and then computed the empirical probability for the rest of the data. By doing this we computed the “crossing probability” averaged over the period of 1350 observations. Given that the behavior of the path did not have any significant trend or oscillations over all measurement periods, we were expecting an average probability estimated by an ergodic theorem to reside in a strip defined by a first day behavior pattern. Shown in the table 3.1 are estimated empirically values of the “crossing probability” for different N and threshold M varying within the range 15-100 observations and 60—90 Mbps, respectively. As was expected, it turned out that for all thresholds M and sizes N empirically computed probability resides below the estimated bound. Table 3.2 shows the difference between the empirically computed probability and the estimated upper bound, whereas figure 3.14 shows two surfaces where the upper surface represents an upper bound estimated by the model and the lower surface represents an empirically computed probability. 
Figure 3.14 shows, in particular, 38 ‘Wbps 60 62 64 66 68 W721 74 '76'78 80 82 84 86 88 90 15 0.06 0.06 0.06 0.06 0.06 0.05 0.04 0.05 0.05 0.04 0.04 0.03 0.03 0.03 0.03 0.04 20 0.08 0.07 0.07 0.08 0.07 0.06 0.06 0.07 0.06 0.05 0.05 0.04 0.04 0.04 0.04 0.05 25 0.10 0.09 0.09 0.10 0.09 0.08 0.07 0.08 0.07 0.06 0.06 0.05 0.05 0.05 0.04 0.06 30 0.11 0.09 0.09 0.10 0.09 0.08 0.07 0.09 0.08 0.07 0.06 0.05 0.05 0.05 0.05 0.07 35 0.13 0.11 0.11 0.11 0.10 0.09 0.08 0.10 0.09 0.08 0.07 0.06 0.05 0.06 0.05 0.08| 40 0.14 0.12 0.12 0.13 0.11 0.10 0.08 0.11 0.10 0.09 0.08 0.07 0.06 0.07 0.06 0.09] 45 0.13 0.11 0.12 0.12 0.11 0.09 0.08 0.11 0.10 0.09 0.08 0.07 0.06 0.07 0.06 0.09] 50 0.14 0.12 0.14 0.14 0.12 0.11 0.09 0.12 0.11 0.10 0.08 0.07 0.06 0.07 0.06 0.10] 55 0.14 0.12 0.13 0.14 0.12 0.10 0.09 0.12 0.11 0.10 0.09 0.07 0.06 0.08 0.07 am 60 0.16 0.14 0.15 0.16 0.14 0.13 0.11 0.12 0.11 0.09 0.08 0.07 0.06 0.08 0.07 0.11] 65 0.18 0.16 0.19 0.19 0.18 0.16 0.14 0.16 0.14 0.13 0.12 0.10 0.09 0.11 0.10 0.14] 70 0.20 0.18 0.19 0.21 0.19 0.17 0.16 0.17 0.15 0.14 0.12 0.11 0.10 0.12 0.11 0.16| 75 0.21 0.19 0.21 0.22 0.20 0.18 0.16 0.18 0.16 0.14 0.13 0.11 0.10 0.12 0.11 0.16| 80 0.25 0.22 0.22 0.25 0.23 0.21 0.19 0.21 0.20 0.18 0.16 0.15 0.14 0.16 0.15 0.17| 85 0.21 0.19 0.22 0.24 0.22 0.20 0.18 0.20 0.18 0.16 0.15 0.13 0.12 0.15 0.14 0.17| 90 0.24 0.22 0.27 0.29 0.26 0.24 0.22 0.25 0.23 0.21 0.20 0.18 0.17 0.20 0.19 0.21| 95 0.26 0.24 0.26 0.29 0.27 0.25 0.23 0.25 0.24 0.22 0.20 0.19 0.17 0.20 0.19 0.21| 100 0.26 0.24 0.28 0.31 0.29 0.27 0.25 0.27 0.25 0.24 0.22 0.20 0.19 0.22 0.20 0.23] Table 3.2: Difference between model-based estimated upper bound and empirically estimated “Crossing” probability for a path MSU-Berkeley that two surfaces are nearly parallel to each other, i.e., that the model reflects scales of both N and M in reasonable agreement with what was empirically observed. Also figure 3.14 shows that estimation becomes less accurate when N approaches to n, which is understandable. We conclude that for a given channel the model produced verifiable and a rea- sonably accurate prediction. We also note that the model is supposed to do its job for any pair of network paths and real-time tasks for which parameters 60 and 61 do not fall out of pre-defined regions of variability and also significant trend or routes oscillations do not occur for the period of time required for a real-time task execution. 3.4 Conclusion In this chapter we presented a “probabilistic” parametric approach to a description of the available bandwidth behavior. We showed that after trend subtraction and inversion the available bandwidth can be described by the ARCH2(1) parametric model. We have intensively studied and unified known results in ARCH 2(1), tab- 39 "Crossing probability” 7A Threshold: \‘ 120 Mbps 6° 80 100 40 N (over which prediction is made) Figure 3.14: Empirically estimated 95% upper bound over empirically computed “Crossing” probability for a path MSU—Berkeley ulated relevant characteristics and provided an efficient method for computation of a “crossing probability”. Then we tested the accuracy of our model on a channel with relatively stable long—term behavior and proved that the approach has a cer- tain range of applicability. The most significant shortcoming of our experiments is the sampling rate of Pathload and its unstable output over significant number of network paths. The best Pathload’s sampling rate during our experiments was 1 observation per minute. 
Recall that T should be of the order of receiver’s buffer time in order to allow observations to be relevant with respect to a particular real-time task execution. Also as our simulations show we need at least 120 observations for accurate enough estimation of model’s parameters. At the same time the smaller is the size of the history window the more realistic is the assumption that behavior of available bandwidth time—series is stationary over history! time-frame used for pre— diction. Taking into account Pathload’s sampling rate we therefore were restricted with paths with relatively stationary behavior for the period of at least 2 hours. If in the future there are better available bandwidth measurement tools that are 40 able to sample at higher rates, then the range of applicability of the parametric approach will significantly increase. The next chapter introduces work directed to— wards increasing the accuracy of probe-rate based available bandwidth measurement tools and consequently available bandwidth sampling rate. 41 h; Chapter 4 Network One Way Delay Trend Detection Mechanism In this chapter we demonstrate an approach that improves a whole range of “probe- rate” AB measurement tools that send sequences of measurement packets (called “trains”) across the network path and capture a positive trend in their travel times or One Way Delays (OWD). In particular, we show an algorithm for efficient OWD trend detection and compare it to widely used OWD trend detection tests. 4.1 Main challenges of IC / CS elimination and trend detection in Pathload Jain and Dovrolis have shown in [30] that Pathload demonstrates reasonably good performance On paths that have tight and narrow link in the middle of the path. For such cases Pathload’s error was around 5% of available bandwidth value when this value belongs to the range 70-80Mbps and around 10% of available bandwidth value when the value is smaller than 10Mbps. In such cases, however, the slow link, which lies in the middle of the path, stretches packets, whereas fast links contract packets. In other words, suppose that packets are sent with a certain fixed delay. When they pass from a faster link to a slower link the delay between packet arrivals becomes smaller or is eliminated. In 42 contrast, when packets pass from a slow link to a fast link, then the delay between packets increases. Therefore by having a slow link in the middle of the path the authors are guaranteed that there will be a significant delay between packet arrivals at the end which first, makes trend detection easier and second, makes measurements less vulnerable to IC / CS effect. Another “friendly” factor in the experimental setup was good match of a length of a packet train to the path across which train was sent. More precisely, when a train was sent at rates higher than available at that time bandwidth, routers in the middle of the path were experiencing instantaneous increase of packet queues but at the same time were not dropping any significant portion of packets. That experimental design allows one to avoid addressing the problem of packet loss. It is unreasonable, however, to assume that an algorithm of an ideal match of length of a train to queue length on intermediate routers can be developed. Ubik, et al. [58] performed an extensive analysis of Pathloads Operation for var- ious networks, values of available bandwidth and values of a cross-traffic. Tables shown in their work clearly show that performance of a Pathload is far from ideal. 
We set up an experiment where the queues on the routers/switches were small enough that a significant portion of the packets sent by Pathload at a rate higher than the available bandwidth was dropped. In some of our experiments Pathload's PCT/PDT test falsely reported a trend for approximately 25% of the analyzed trains sent at a rate lower than the available bandwidth, while not reporting a trend for 14% of the trains sent at a rate higher than the bandwidth available at that time.

Even when experiencing packet losses, Pathload can still report available bandwidth from the correct range despite the high error rate of trend detection at the level of a particular train. The Pathload creators recognized the challenge of packet losses and introduced a rather artificial way of partially avoiding this problem. They set up two levels of packet losses. If the percentage of losses in at least one train in a set of trains sent at some fixed rate is higher than HIGH LOSS RATE, then the rate is automatically believed to be higher than the available bandwidth. If the losses in more than 50% of the trains sent at a fixed rate exceed MEDIUM LOSS RATE, then the rate is believed to be higher than the available bandwidth as well. These two levels of percentages are initially set in Pathload to 15% and 7%, respectively.

The way of dealing with packet losses described above has its own shortcomings. First, the HIGH and MEDIUM loss rate parameters have to be carefully tuned. Second, packets may be lost because of processor utilization at the receiving computer. As a consequence of the packet-loss handling mechanism implemented in Pathload, the tool fails to report the correct bandwidth even when the processor utilization at the receiving machine reaches 5%. This already has to be considered a significant limitation of the tool's usage, especially for long-lasting measurements, given that the receiver machine is running under any of the widespread non-real-time operating systems.

The way Pathload deals with the IC/CS effect is not perfect either. The time that the operating system needs to deliver packets to the upper layer cannot be precisely estimated. Pathload's PCT/PDT metrics also leave room for improvement. Figures 4.1 and 4.2 illustrate a typical example of a train misclassified by the PCT/PDT test. The train was sent at a rate which was 10 Mbps higher than the bandwidth available at that time. In figure 4.1 the one-way delays of the 100 packets belonging to the train are plotted. We see that the plot has parts with a negative trend, and these parts are strictly "linear", which is a direct consequence of the IC/CS effect. We can think of IC/CS as an additional buffer in which packets spend a certain time and which is periodically emptied. The last packet in a group before the "buffer" is emptied waits the least time, whereas the one that came right after the previous emptying waits the longest. Also, it is known that all packets were released by the sender with some fixed, unchanged delay. It therefore follows that all packets related to a strictly linearly decreasing part of the plot belong to the same group. Now consider the subset of packets constituted by the last packet taken out of each strictly linearly decreasing part. In figure 4.1 one can clearly observe that the aforementioned set constitutes a subsequence with a clearly revealed positive trend. Figure 4.2 depicts the subsequence of points chosen by Pathload's IC/CS elimination procedure.
Figure 4.1: Shortcomings of the IC/CS elimination procedure in Pathload: before elimination (relative one-way delay vs. packet index in the train).

Figure 4.2: Shortcomings of the IC/CS elimination procedure in Pathload: after elimination (relative one-way delay vs. packet index in the train).

One can see in figure 4.2 that at least some points of the chosen subsequence do not characterize the packets that are least delayed by the IC/CS effect. Moreover, the trend that is clearly observed even by the naked eye in figure 4.1 is not so evident in figure 4.2. The information about the indices of the packets finally chosen by the IC/CS elimination procedure is also not considered by Pathload. Finally, let us apply the PCT test to what is left in figure 4.2: only seven points remain, and only half of them are larger than their direct predecessors. As a final result, for this typical real-life example Pathload obtained a value of 0.43 for the PCT metric and 0.17 for the PDT metric, both deep in the region that classifies the train as one without a trend.

The conclusion is that Pathload throws away a significant part of the measurement information at the IC/CS elimination stage, as well as when dividing the remaining data into groups and discarding everything except the median of each group. Also, there is no statistical motivation for the efficiency of the PCT/PDT test, nor any reasoning why the chosen thresholds for the PCT and PDT metrics should do a reasonably good job on an arbitrarily taken network path. The example plotted in figures 4.1 and 4.2 was very frequent in the data that we collected on our network segment and therefore explains the significant inaccuracy of the PCT/PDT test when the experimental setup is not as friendly as one with a clearly pronounced tight and narrow link in the middle of the path. In the next section we explain our way of fixing the deficiencies of Pathload listed above, and later give comparative experimental results demonstrating the effectiveness of our approach.

4.2 IC/CS elimination procedure and t-test

In this section we give a detailed analysis of our way of dealing with the interrupt coalescence / context switch effect, and describe the metric that we choose for efficient trend detection.

The idea of proper IC/CS elimination is very simple. We do not consider the time which the system needs to bring packets from the network adapter to memory, since it cannot be estimated accurately. Instead, we detect cases when packets constitute a strictly linear decreasing trend. If the difference in one-way delays of a pair of packets with neighboring indices is negative and is very close to the same difference for the neighboring packet pairs, then all such packets are assigned to the same group and only the last packet of that group is considered for further analysis. In other words, all packets whose one-way delays constitute a clearly revealed straight line with a negative slope are discarded except the very last one. We require such a group to contain at least three packets in order to discard all packets in the group except the very last packet. At this stage it is important that we keep the information about the indices of the packets left for further statistical analysis. This way of dealing with the IC/CS effect is simpler and more effective than the one proposed in Pathload. We do not assume that the very last packet in a group is processed immediately after it arrives; we only say that it waits the smallest possible time for an interrupt among all packets in its group. Also, we assume that the time which the last packet in a group waits for an interrupt is normally distributed, and therefore methods of linear regression analysis can be applied to the chosen subsequence of packets for trend detection, as sketched below.
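The run-collapsing rule just described is easy to express in code. The following sketch is our own minimal illustration (the names and the equality tolerance are ours, not the tool's): from every strictly linearly decreasing run of at least three one-way delays it keeps only the last packet, and it preserves the original packet indices for the later regression step.

```python
def eliminate_ic_cs(owd, tol=1e-3):
    """Collapse strictly linearly decreasing runs of one-way delays (IC/CS groups).

    owd -- list of (packet_index, one_way_delay) pairs for packets that actually arrived.
    Returns the surviving (packet_index, one_way_delay) pairs, original indices preserved.
    """
    kept, run = [], [owd[0]]
    for prev, cur in zip(owd, owd[1:]):
        diff = cur[1] - prev[1]
        first_diff = run[1][1] - run[0][1] if len(run) > 1 else diff
        # extend the run while consecutive differences are negative and nearly identical
        if diff < 0 and abs(diff - first_diff) <= tol * abs(first_diff):
            run.append(cur)
        else:
            # close the run: a run of >= 3 packets is an IC/CS group, keep only its last packet
            kept.extend([run[-1]] if len(run) >= 3 else run)
            run = [cur]
    kept.extend([run[-1]] if len(run) >= 3 else run)
    return kept
```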
The basic idea of the t-test in linear regression analysis is simple. In our case the so-called "response variable" is the one-way delay of a packet, whereas the "predictor" is the packet index, which incorporates the flow of time. The main question which the t-test for a slope coefficient in linear regression is supposed to answer can be stated as follows: do the train data give sufficient evidence to believe that the one-way delay of a packet depends on its index? The linear regression procedure fits a line across the data points corresponding to one-way packet delays in a way that minimizes the squared distances from the data points to the fitted line. The slope coefficient of the line can be found according to the formula

$$\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

In this formula y_i represents the one-way delay of a packet, x_i represents the packet's index in the train, and n is the total number of packets belonging to the train that finally remain after IC/CS elimination. Since β̂_1 depends on random measurement errors, it also has a random nature. It is known [51] that under the linear regression model assumptions the coefficient β̂_1 follows a t-distribution with n - 1 degrees of freedom centered around the true value of this coefficient. The null hypothesis in our t-test is the assumption that the train does not contain any OWD trend, or in other words that the slope coefficient of the regression line is zero. Therefore the distribution of the random quantity β̂_1 under the null hypothesis is a t-distribution centered around zero with n - 1 degrees of freedom. If we reject the null hypothesis (i.e., decide that there is a trend in the train), then the probability of an error can be calculated as the area under the zero-centered t-distribution density that remains to the right of the rejection threshold. This is the error of classifying a train with no trend as a train with a trend, and it is called a type I error. The probability of a type I error naturally depends on the placement of the rejection threshold. The smallest value of the type I error probability, given that we still reject the null hypothesis, is achieved if we place the rejection threshold exactly at the observed value of β̂_1. The area of the t-distribution density function that lies to the right of the observed value of β̂_1 is called the p-value of the test. We decided to use the obtained p-value of the test as our final quantity for judging trend presence, instead of the value of β̂_1 itself. As follows from the previous paragraphs, low p-values should be considered an indicator of trend presence, whereas high values should be considered in favor of accepting the null hypothesis. We choose the threshold for the p-value to be equal to 0.01. A threshold set in p-value space has a statistical motivation behind it: it means that we want the misclassification probability for trains which really do not have an OWD trend (i.e., trains which were sent at a rate lower than the bandwidth available at that time) to be approximately equal to 0.01.
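In practice this test is a single call to a standard routine. The sketch below is our own illustration using scipy.stats.linregress, which reports a two-sided p-value for the slope; since only a positive trend is of interest here, the sketch halves it when the fitted slope is positive (the variable names are ours).

```python
from scipy.stats import linregress

def owd_trend_pvalue(points):
    """One-sided p-value for a positive OWD trend.

    points -- (packet_index, one_way_delay) pairs surviving IC/CS elimination.
    """
    x = [idx for idx, _ in points]
    y = [delay for _, delay in points]
    res = linregress(x, y)
    # two-sided p-value -> one-sided p-value for the "positive slope" alternative
    return res.pvalue / 2.0 if res.slope > 0 else 1.0 - res.pvalue / 2.0

def has_trend(points, threshold=0.01):
    """A train is flagged as carrying an OWD trend when the p-value falls below 0.01."""
    return owd_trend_pvalue(points) < threshold
```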
We have to accept that there is not much one can say about the type II error probability (the probability of classifying a train with a trend as a train without a trend) unless it is estimated by an experiment for a fixed value of the decision threshold. However, we now have at least some statistical reasoning behind choosing a particular level of the p-value as a decision threshold, instead of choosing rejection regions for the PCT and PDT metrics on a purely empirical basis.

Figure 4.3: OWD pattern in the presence of packet loss (relative one-way delay vs. packet index). Trains were sent at a rate 40 Mbps higher than the bandwidth available at that time.

A consideration of how much information is still available after IC/CS elimination for further analysis can be viewed as one more argument for choosing the p-value as the final statistic of interest. With a decreasing number of points remaining after IC/CS elimination, and the corresponding decrease of the degrees of freedom n, the t-distribution becomes more and more "stretched out". This means that for a fixed value of β̂_1 the area to its right becomes larger and larger as the amount of available information (i.e., available data points or degrees of freedom) decreases. Correspondingly, in order to have a sufficient basis for rejecting the null hypothesis, we need a much higher value of β̂_1 than in the case when more data points are available for the analysis.

Another advantage of our scheme of IC/CS elimination and trend detection is that it deals better with packet losses. The influence of packet loss on trend detection is very significant. When routers in the path discard a significant number of packets, they may "recover" from an "instantaneous" overflow of their queues. Therefore trend detection becomes possible only when each contiguous chunk of received packets is considered separately. The effect of packet loss is well illustrated by figures 4.3, 4.4, and 4.5.

Figure 4.4: OWD pattern in the presence of packet loss. Trains were sent at a rate 30 Mbps higher than the bandwidth available at that time.

Figure 4.5: OWD pattern in the presence of packet loss (lost packets appear as gaps). Trains were sent at a rate 20 Mbps higher than the bandwidth available at that time.

We can clearly see there that even after elimination of the IC/CS parts the whole picture does not give the impression that there is a trend, whereas every contiguously received chunk does give such an impression. In such cases Pathload reports a high percentage of "unclear" trains because it throws away too much information; Pathload frequently has no reserve left for further splitting a train into a number of sub-trains. Our scheme can afford such splitting and therefore reports far fewer "unclear" trains, which results in a significant reduction of the number of trains that are required to be sent in order to achieve the desired detection accuracy. We report a train as "unclear" only when fewer than four data points are finally left.
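A sketch of this chunk-wise treatment (our own illustrative code, building on the hypothetical helpers eliminate_ic_cs and owd_trend_pvalue from the earlier sketches) is shown below: the received packets are split into contiguous runs at every loss gap, each sufficiently long run is tested separately, and the per-chunk p-values are returned for the decision rule described next.

```python
def split_at_losses(packets):
    """Split (packet_index, one_way_delay) pairs into contiguous chunks at loss gaps."""
    chunks, current = [], [packets[0]]
    for prev, cur in zip(packets, packets[1:]):
        if cur[0] == prev[0] + 1:
            current.append(cur)
        else:                                  # one or more packets were lost here
            chunks.append(current)
            current = [cur]
    chunks.append(current)
    return chunks

def subtrain_pvalues(packets, min_points=4):
    """Per-chunk trend p-values; chunks left with fewer than min_points are skipped ("unclear")."""
    pvalues = []
    for chunk in split_at_losses(packets):
        kept = eliminate_ic_cs(chunk)
        # note: the additional rule of discarding an IC/CS run that ends right at a
        # loss gap is omitted here for brevity
        if len(kept) >= min_points:
            pvalues.append(owd_trend_pvalue(kept))
    return pvalues
```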
If we have a number of reported p-values related to a number of sub-trains, then we decide about the trend in the whole train by a majority-vote rule over the sub-trains. Note that if an IC/CS packet group is directly followed by lost packets, then we throw away the whole group of packets, including the last one. With a decreasing number of trains required to be sent, the measurement time is also significantly reduced. This gives our approach a higher degree of flexibility in available-bandwidth-related planning, which is of high importance for present real-time network applications. In the next section we describe our experimental setup and compare the performance of our scheme against Pathload.

4.3 Experimental evaluation

We experimented within our laboratory network, which consists of three 100 Mbps capacity segments. The sender and receiver were placed in the first and third segments, while the path from the first to the third segment passed through the second segment. We wanted both Pathload's PCT/PDT test and our approach to have absolutely equal conditions. Therefore we modified the original version of Pathload by adding a number of printouts that wrote to a separate file the OWD times measured by Pathload. Also, for each train we output the values of the PCT/PDT metrics together with Pathload's final decision about the train. We then processed the same train data with our code, which eliminated the IC/CS effect, conducted the t-test, output the p-values, and finally made a decision about the presence of a trend.

During the first part of the experiment it was known with a high degree of confidence that the background traffic in all three segments is very low (around 1 Mbps) and does not exceed 2-3 Mbps over any period of time of length 200 milliseconds. For that part of the experiments we therefore knew that all trains sent at a rate lower than 95 Mbps should be reported as trains without a trend. We also knew that all three link capacities were 100 Mbps, and therefore all trains that were sent at rates of 98 Mbps and higher should be reported as trains with a trend. This experimental setup did not contain any tight/narrow link in the middle of the path. We ran Pathload continuously for a period of approximately 7 minutes and analyzed 4650 trains in total. We then imposed a slight load on the processor of the receiver and repeated the experiment. Table 4.1 shows how the performance of the PCT/PDT test deployed in Pathload compares with the performance of our approach when no processor utilization was imposed on the measurement machine responsible for receiving packets. Table 4.2 shows the detection performance comparison when the processor utilization was 5-7%.

The tables are structured as follows. The second bold column shows the type II errors for the PCT/PDT test and the t-test, respectively; in particular, it shows the portion of trains that were incorrectly classified as trains without a trend when a trend was really present. Note that this portion has been computed "conditionally", i.e., as a portion of the total number of trains that were sent in the specified rate range and were not assigned to the "unclear" train set. The third bold column shows the type I error for multiple rate ranges. Recall that we have set the decision threshold for the p-value equal to 0.01.
We see in all tables that experimentally evaluated type I error is not that far from “theoretically-predicted” value 0.01, which can be considered to some extent as a verification of model’s applicability. “Unclear trains” colunm is followed by a column showing the number of trains that the t-test based tool needs to send in order to 52 achieve accuracy of PCT/ PDT test sending 6 trains at a given rate. Naturally, if a number in this column is greater than 6 then PCT/ PDT test demonstrates better performance for a corresponding rate and vice-versa otherwise. For the case when there was no any extra processor utilization, t-test performed better for all trains sent at the rate higher than 98Mbps as well in the range 15- 55Mbps. Although we have slightly higher error rate for a range 55-95Mbps than PCT/ PDT test has, still both approaches need to send three trains for achieving 99% accuracy. Difference in performances for the mentioned range therefore can be considered as not significant whereas for other rates our trend detection mechanism significantly outperforms the PCT/ PDT test. Table 4.2 clearly shows that the PCT / PDT performs worse when operating even under very small on-going processor utilization. In particular, the PCT/ PDT test fails to correctly classify trains sent at the rate-range 15—55Mbps. Its error ap- proaches 0.5 and sometimes even exceeds this value, which makes the rate classifica- tion procedure either unacceptably long or not applicable at all. At the same time in the global scale Pathload was unable to report even once available bandwidth from a correct range when processor utilization was imposed. For example, the maximal reported value of available bandwidth out of twenty runs was 70Mbps when the channel was almost free and processor utilization was 3.5%. The responsibility of this failure is Pathload’s method to deal with packet losses. During the second part of our experiment it was known with a high degree of confidence that the total cross—traffic was always exceeding the level of 65Mbps. In other words, we were guaranteed that if trains that were sent at rate 35Mbps or higher are classified as those with trend, then this classification is correct. There was no processor utilization imposed to a measurement machine. The goal of the experiment was to check how efficient t-test and PCT/ PDT test detect trend in trains that are sent at the rate lower than the capacity but higher than the available bandwidth of the network path. Table 4.3 shows results of t-test and PCT / PDT test accuracy for trains sent at the rate higher than the available bandwidth. 
Instead of type I error in the third bolded 53 N of trains Rate at 1 Trend detection that t-test which Classified as 0- Classified as mor accuracy for a needs to N of trains required to Total trains 95 Mbps (withou than 98 Mbps Unclear trains fleet consisting achieve 6- achieve at least 0.99 number of were trend) (with trend) out of 6 trains train accuracy analyzed sent accuracy of trains PCT/PDT PCT/PDT t-teet PCT/PDT t-teet PCT/PDT t-test PCT/PDT t-tost test PCT/PDT Host 1535 0.000 0.014 0.500 0.014 0.985 1.0071 3 7 3 72 3555 0.560 0.023 0.233 0.000 0.312 1.000 1 >100 3 564 55-75 0.002 0.040 0.093 0.000 1.000 0.999 9 3 3 420 7595 0.002 0.054 0.098 0.006 1.000 0.997 11 3 3 1296 > 98 0.083fi0.026 0.296 0.000 0.971 1.000 1 9 3 2226 Table 4.1: Comparison of PCT/ PDT test and t-test performances on nearly empty network and zero receiver’s processor utilization N of trains Rate at Trend detection that t-test which Classified as 0- Classified as mor accuracy tor a needs to N of trains required to Total trains 95 Mbps (withou than 98 Mbps Unclear trains tleet consisting achieve 6- achieve at least 0.99 number of were trend) (with trend) out 016 trains train accuracy analyzed sent accuracy of trains PCT/PDT PCT/PDT t-tost PCT/PDT Hut PCT/PDT t-test PCT/PDT t-test test PCT/PDT Most 15-35 0.441 0.010 0.190 0.055 0.508 1 1 >100 3 199 35-55 0493 0.008 0. I 52 0.063 0.412 1 1 >100 3 1186 55-75 0063 0.0I0 0.077 0.074 0.994 1 3 5 3 1070 75-95 0033 0.011 0.053 0.069 0.999 1 5 4 3 1926 > 98 0.081 [0.061 0.302 0.046 0.973 0.996 3 9 5 710 Table 4.2: Comparison of PCT/ PDT test and t-test performances on nearly empty network and 5 — 7% receiver’s processor utilization Rate at Classified as Trend detection Classified as less more than 35 accuracy tor a fleet N ot trains required to Total which trains than 35 Mbps Unclear trains consisting out 0t 6 achieve at least 0.99 number ot (without trend) Mbps (With trend) trains accuracy analyzed were sent trains PCT/PDT Host PCT/PDT Host PCT/PDT Host PCT/PDT Most PCT/PDT t-tost 35-5_5 0.005 0.922 0.453r 0.000 0.000 0.932 >100 5 592 5575 0.021 0.761 0.958 0.000 0.004 0.853 >100 17 426 7595 0.325 0.958 0.394 0.000 0.157 0.999 >100 3 335 > 98 0.276 [ 0.082 0.305 0.001 0.766 0.991 38 5 1601 Table 4.3: Comparison of PCT/ PDT test and t-test performances on heavy loaded network and zero receiver’s processor utilization 54 Difference N of trains between that t-test rate at which Classification error needs to N of trains required Total trains were Unclear trains achieve 6- to achieve at least number train 0.99 accuracy of sent and accuracy analyzed available I of bandwidth PCT/PDT t-test PCT/PD t-test PCT /PDT PCT/PDT t-tost 15-35 0.122 0.008 0.464 0.000 1 16 1 1023 3555 0.054 0.000 0.299 0.002 1 7 1 711 55—75 0.020 0.003 0.344 0.001 1 6 1 751 7595 0.443 0.292 0.397 0.001 1 >100 29 741 > 98 0.476 0.065 0.346 0.001 1 >100 5 664 Table 4.4: Comparison of PCT / PDT test and t-test performances on average loaded network (around 50Mbps :tIOM bps cross—traffic) and zero receiver’s processor uti— lization column we show detection accuracy for a specified range of rates. Table 4.3 shows that the PCT/ PDT test completely fails to detect trend when suffering packet losses and non-friendly experimental setup. We also note that despite this failure at the level of trains, Pathload still always reported values of the available bandwidth from the correct range because of the artificially implemented packet-loss mechanism. 
We believe, however, that an available bandwidth measurement tool can be significantly improved after excluding an artificial mechanism of dealing with packet losses and respectively relying solely on a train-level reported statistic. In the third part of our experiment we artificially generated and monitored cross- traffic in order to obtain a clearer pattern of error percentage as a function of differ— ence between the rate at which a train was sent and available bandwidth. Unfortu- nately we were unable to capture all traffic on the network and were able to monitor traffic only with a certain error. In our setup generated traffic was fluctuating around 50 Mbps. Since there was no 100% confidence in cross-traffic measurements and in order to be on a safe side we monitored trains which were sent at rates either at least 15Mbps higher or 15Mbps lower than tentatively reported available bandwidth. Based on the setup of our experiment we were able with sufficiently high degree of confidence to assume that these train rates were either higher or lower than the 55 available bandwidth. Table 4.4 shows an output for this part of the test. Now we have only one column showing classification errors. For lines where difference between train rate and available bandwidth is negative, the error represents type I error. For lines with positive difference between rate and available bandwidth, the error represents type II error. Table 4.4 shows that our approach again outperforms PCT/ PDT test for all shown ranges. 4.4 Conclusion In this chapter we proposed a simple algorithm of dealing with the IC/ CS effect together with a trend detection algorithm based on a simple linear regression model and p—value reported by t-test. We showed that our algorithm is more robust with respect to the IC / CS effect and packet losses. Our real-life experiments clearly Show better performance of our approach in comparison to PCT/ PDT test deployed first in Pathload and later in other probe-rate measurement tools [42, 46]. As a result, with our scheme either the accuracy of any probe-rate based available bandwidth measurement tool can be significantly improved or the measurement time can be reduced at least twice while preserving the same measurement accuracy. However, despite the algorithm introduced in this chapter allows an increase of at least twice the available bandwidth sampling rate, the range of applicability of such a measurement tool for “crossing probability” estimation is still limited. We know that for most real-time applications, the averaging time 7' is of the order of seconds rather than minutes. In the next chapter we introduce a non—parametric model for “crossing proba- bility” estimation, which is based directly on the OWD trend detection technique. Taking into account that the OWD at a given rate can be detected for the period of the order of tens of a second, our non-parametric model can predict crossing proba- bility over the next second based on the behavioral history of last ten minutes. The next chapter gives details about our non-parametric approach. 56 Chapter 5 Non-Parametric Approach In this chapter we provide a more general non-parametric approach for computation of a “crossing probability”. In contrast to Chapter 3 that relies on measurements of the available bandwidth (AB) time-series, a non-parametric approach in this chapter relies directly on measurements of indicators showing whether the AB time-series is smaller or larger than a given value. 
Therefore a non-parametric approach is robust, allows faster computation of a crossing probability than the parametric approach in Chapter 3, and has a larger range of applicability since it does not rely on any parametric time-series model. Moreover, for computation of indicator values that compare available bandwidth against a specified value, we do not rely on the Pairwise Comparison Test / Pairwise Difference Test (PCT/ PDT) metrics [29], but use our own way of trend detection introduced in Chapter 4. We suggest to further optimize estimation of a “crossing probability” by “tuning” it to a specific network path and even the day of the week at which measurements are made in order to make estimation more efficient. First, different network paths have different time-frames at which available bandwidth may be treated as a stationary time—series. Second, different paths may “forget” about their history at different rates. In other words, even while assuming that the available bandwidth time-series is stationary, it might be applicable to assign larger weights to the recent past and smaller weights to the older history. The rate at which weights decay is network—path specific and needs to be adjusted as well. 57 Finally, since the main goal of our research is a prediction of a “crossing proba- bility,” we introduce and compare methods for a verification of the accuracy of our prediction based on training and previously unseen testing data-sets obtained by real-life measurements across multiple Internet paths. Along with real-life experi- ments, we present theoretical results that show that our verification procedures are probabilistically legitimate. 5.1 Moving block bootstrap Let us consider a stationary sequence X1, X2, ..., Xn of observations of zeros and ones where each observation is related to the period of time of approximately 0.5 sec- ond. Let the value zero correspond to an event when the available bandwidth time- series is larger than a given threshold whereas one is related to the case when the available bandwidth is smaller than the threshold. Each observation can be obtained with the help of the probe-rate approach. “One” is reported when there exists a positive trend in travel times of packets in a sequence sent at a given threshold rate. “Zero” is reported when there is no indication of such a trend. For obtaining observations of zeros and ones related to a specified threshold rate, sequences or trains of packets were sent exactly at that rate. Since each train was sent within the same period of time, the “averaging time” 7' does not vary from train to train. Each observation is related to one train of packets and therefore 7' can be defined as the time that one train of packets requires to be delivered from sender to receiver. The problem of finding the crossing probability is reduced to finding a probability that among N consecutive observations of “zeroes” and “ones” there exists at least one “one”. Note, that observations of zeros and ones are tightly related to cross— traffic existing in the path and therefore we have to assume that these observations depend on each other. More precisely, they form a stationary sequence that possesses long-range dependency and self-similar properties [34]. Based on aforesaid, the natural way of non-parametric estimation of a cross- ing probability is as follows. 
Observe n values X_1, X_2, ..., X_n, X_i ∈ {0, 1}, i ∈ {1, 2, ..., n}, of the time-series of interest. Then establish a window of size N sliding across the sequence of observations of size n by one observation at a time (see figure 5.1). At each cycle, increase the counter C when among the N observations in the sliding window there exists at least one "one". When the sliding window of size N reaches the end of the sequence of size n, divide the counter C by the total number of cycles (n - N + 1) and consider this ratio as the estimator of the crossing probability for the next N observations.

Figure 5.1: A window of size N = 3 slides across the history data of size n = 12 for prediction of a crossing probability P over the coming N = 3 observations; in this example C = 7 and P = 0.7.

If n - N + 1 is large, and at the same time the time-series is stationary during the period of time that corresponds to the n observations, then the ergodic theorem says that the aforementioned ratio is a legitimate estimator of the crossing probability (i.e., it converges to the crossing probability as n tends to infinity).

The estimator of the crossing probability, C/(n - N + 1), depends on observations of a random time-series and therefore is itself a random quantity. It is important to report not only the value of the estimator but also its 95% confidence interval. This is a random interval around the estimator such that the true value of the crossing probability falls into the interval with a likelihood of 95%. The only presently known way of non-parametric estimation of the distribution of an estimator is the bootstrap technique. However, the bootstrap in its basic formulation helps to find the distribution of a random quantity T(data) that is a function of independent identically distributed random observations. Given k independent random observations, the bootstrap proposes to consider B groups of k observations each, where each group is sampled with replacement. Therefore in some group there may exist multiple copies of a particular observation, and some observations may not appear in a certain group at all. The bootstrap proposes to compute B values of the quantity T, where each value is computed on one particular group sampled with replacement. The distribution of the random quantity T can then be estimated from its B bootstrap values. In particular, the 95% confidence interval can be found with the help of the 2.5% and 97.5% quantiles of the obtained dataset of size B. Our quantity of interest T can be defined as

$$T = \frac{\sum_{i=1}^{n-N+1} C_i}{n - N + 1}, \qquad C_i = \mathbf{1}\{\max\{X_i, X_{i+1}, \ldots, X_{i+N-1}\} = 1\}$$

In our case, however, the n - N + 1 observations of the indicator C_i depend on each other. Therefore application of the bootstrap in its original formulation is not legitimate. Liu and Singh [39] introduce the notion of Moving Block Bootstrap (MBB) re-sampling for the case of m-dependent random sequences. They believe, however, that their procedure works well on more general stationary sequences. Therefore we believe that it is legitimate to use the Moving Block Bootstrap procedure in our work. We next describe the procedure.

Let b be the size of a block of consecutive observations of the time-series, X_i, X_{i+1}, ..., X_{i+b-1}. If this block "moves" across the whole dataset of size n, then n - b + 1 such blocks can be obtained. Note that the length of the block b should be sufficiently large to capture the full correlation structure of the underlying time-series; more precisely, b should exceed the lag h at which the autocorrelation function ρ(h) becomes negligibly small. Then ⌈n/b⌉ blocks are chosen with replacement out of the n - b + 1 existing blocks. These blocks are concatenated, one after another, and thus a new "re-sampled" time-series of n observations is obtained. The first bootstrap value of the crossing probability estimator, p̂^{1*}, is obtained from these n observations. We repeat the above procedure 1000 times and obtain 1000 different bootstrap values p̂^{B*}, B = 1, ..., 1000. Then an empirical distribution of the crossing probability estimator can be obtained from the p̂^{B*} bootstrap values. In particular, the 95% confidence interval can be constructed from the 2.5% and 97.5% quantiles of the 1000 observations of p̂^{B*}.
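The estimator and its moving-block-bootstrap confidence interval can be sketched in a few lines. The code below is our own illustration of the procedure described above; apart from the block count ⌈n/b⌉ and the percentile method stated in the text, everything (names, the example data, the block size) is an assumption of the sketch.

```python
import numpy as np

def crossing_estimate(x, N):
    """Sliding-window estimator C / (n - N + 1) for a 0/1 indicator series x."""
    windows = np.lib.stride_tricks.sliding_window_view(x, N)
    return windows.max(axis=1).mean()

def mbb_confidence_interval(x, N, b, reps=1000, seed=0):
    """Moving block bootstrap 95% CI for the crossing-probability estimator."""
    rng = np.random.default_rng(seed)
    n = len(x)
    blocks = np.lib.stride_tricks.sliding_window_view(x, b)    # the n - b + 1 overlapping blocks
    boot = np.empty(reps)
    for r in range(reps):
        picks = rng.integers(0, len(blocks), size=int(np.ceil(n / b)))
        resampled = np.concatenate(blocks[picks])[:n]           # glue the chosen blocks, trim to length n
        boot[r] = crossing_estimate(resampled, N)
    return np.percentile(boot, [2.5, 97.5])

# Example: a 10-minute history of n = 2000 indicators, N = 10, block size b = 200.
x = (np.random.default_rng(1).random(2000) < 0.02).astype(int)
print(crossing_estimate(x, 10), mbb_confidence_interval(x, 10, 200))
```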
More precisely, $b$ should exceed the lag $H$ at which the autocorrelation function $\rho(h)$ is negligibly small. Then $\lceil n/b \rceil$ blocks are chosen with replacement out of the $n - b + 1$ existing blocks. These blocks are concatenated, one after another, and thus a new "re-sampled" time-series of $n$ observations is obtained. The first bootstrap value $\hat{p}^{1*}$ of the crossing probability estimator is computed from these $n$ observations. We repeat the above procedure 1000 times and obtain 1000 different bootstrap values $\hat{p}^{B*}$, $B = 1, \ldots, 1000$. Then an empirical distribution of the crossing probability estimator can be obtained from the $\hat{p}^{B*}$ bootstrap values. In particular, the 95% confidence interval can be constructed from the 2.5% and 97.5% quantiles of the 1000 observations of $\hat{p}^{B*}$.

The size of the block $b$, as well as the total number of available bootstrap blocks $n - b + 1$, are important parameters. With an increase of the period of time defined by the number $N$ over which the prediction is made, the "history" period of time that is used for computation of the predictor has to be increased as well. On the other hand, we tried to keep the "history" period as small as we could, because the "stationarity" assumption on the "zero-one" time-series is more realistic for smaller "history" time-periods. We believe that the following rule achieves the best tradeoff between keeping the size $n$ of the history window small and still having a representative statistic. Each block in the MBB approach has to contain at least 20 non-overlapping windows of length $N$, and the "history" window should contain at least 10 non-overlapping blocks. Therefore the size $n$ of the window that is used for prediction should be approximately 200 times the size $N$ of the window over which the prediction is made. For example, if the rate threshold for which we predict the crossing probability is 80 Mbps, then one train of 120 packets can be sent in approximately 0.3 of a second. Suppose then that the real-time task takes 3 seconds to execute with $\tau = 0.3$ seconds, which means that $N = 10$. Then $n$ should be of the order of 2000, which corresponds to a time-period of 600 seconds, or 10 minutes. Therefore we conclude that the prediction of the crossing probability for a period of 3 seconds can be made based on the channel's behavior over the most recent 10 minutes, which is a period over which assuming stationarity of the available bandwidth time-series is reasonable.
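The estimation procedure of this section can be summarized by the following minimal sketch (Python with numpy; the synthetic indicator series, the function names, and the block size in the example are our own illustrative choices and are not part of the measurement software):

```python
import numpy as np

def crossing_estimate(x, N):
    """Fraction of length-N sliding windows of x that contain at least one 'one':
    the estimator C / (n - N + 1)."""
    windows = np.lib.stride_tricks.sliding_window_view(np.asarray(x), N)
    return windows.max(axis=1).mean()

def mbb_confidence_interval(x, N, b, B=1000, level=0.95, rng=None):
    """Moving block bootstrap confidence interval for the crossing probability."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    blocks = np.lib.stride_tricks.sliding_window_view(x, b)   # the n - b + 1 moving blocks
    n_blocks = int(np.ceil(n / b))
    estimates = np.empty(B)
    for j in range(B):
        picks = rng.integers(0, len(blocks), size=n_blocks)   # sample blocks with replacement
        resampled = np.concatenate(blocks[picks])[:n]         # glue blocks, trim to length n
        estimates[j] = crossing_estimate(resampled, N)
    lo, hi = np.quantile(estimates, [(1 - level) / 2, 1 - (1 - level) / 2])
    return crossing_estimate(x, N), (lo, hi)

# Sizes as in the text: N = 10, n about 200 * N, block of 20 non-overlapping windows.
x = (np.random.default_rng(0).random(2000) < 0.05).astype(int)   # synthetic indicator series
p_hat, ci = mbb_confidence_interval(x, N=10, b=200)
print(p_hat, ci)
```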
5.2 D-statistic: model verification mechanism

The moving block bootstrap technique described above helps to report a non-parametric estimate of the probability (along with its confidence interval) that among the on-coming $N$ observations there is at least one "one". The problem now is to verify how accurately this non-parametric prediction technique works for a real-life time-series of indicators. We predict the probability of the realization of a certain event, and the natural way to verify the accuracy of our prediction is to observe whether, on real unseen data, the predicted event has been realized. Then, from many independent predictions and realizations or non-realizations of the predicted events, one might make an inference about the accuracy of the prediction. Multiple predictions, however, need to be independent, and therefore for the same network channel these predictions have to be significantly separated from each other in time. These predictions, separated in time, may report different values of the crossing probability, because the stationarity assumption for the available bandwidth time-series does not hold over large periods of time. Below we show the design of a statistically correct verification test that clearly states the Null and Alternative hypotheses. We also provide theoretical background proving the legitimacy of the test.

Assume that we have a set of predictors $\hat{p}_i$, $i \in 1, \ldots, l$, of the success probabilities of independent Bernoulli random variables, along with the actual observations $Y_i$ of these variables. Let the Null hypothesis be the assumption that the predictors $\hat{p}_i$ match the actual success probabilities of the $Y_i$. Let the alternative hypothesis be the assumption that the $\hat{p}_i$ have no relationship with the realizations $Y_i$. Now consider the following sum:
\[
D = \frac{\sum_{i=1}^{l} (Y_i - \hat{p}_i)}{\sqrt{\sum_{i=1}^{l} \hat{p}_i (1 - \hat{p}_i)}}.
\]
Below, in a separate theoretical section, we prove that although the $Y_i$ are not identically distributed random variables, the above sum, under the Null hypothesis, converges in distribution to a standard normal random variable (i.e., the Central Limit Theorem does hold).

Figure 5.2: Probability density functions for a p-value conditioned on the Null and Alternative hypotheses.

If the $D$ statistic follows the standard normal distribution $N(0,1)$ under the Null hypothesis, then we can reduce the verification problem to a classical Kolmogorov-Smirnov normality test in the following way. Given that one can observe the network channel for an arbitrarily long period of time, it is possible to collect many realizations of the random quantity $D$. Then, once we have at least 100 such observations, we conduct a Kolmogorov-Smirnov normality test and consider the p-value of the test as an indicator of whether our predictions were correctly made. The Null hypothesis $H_0$ for the normality test is the assumption that $D \sim N(0,1)$. The alternative hypothesis says that $D$ does not follow $N(0,1)$. Under the Null hypothesis, the p-value is known to be uniformly distributed on the interval $[0,1]$. At the same time, the p-value for data with a distribution other than $N(0,1)$ takes small values. Therefore low p-values (less than 0.05) can be considered a sign that the normality test failed and that the prediction did not work correctly. High p-values indicate that the data do not give us reasons to claim that the approach has failed to predict correctly.

For a better explanation and visualization of the meaning of the p-value in the Kolmogorov-Smirnov normality test, we performed the following simulation. We generated groups of 30 observations of Bernoulli random variables; each group contained realizations of variables with different success probabilities. Directly from the form of $D$ it is clear that if the approach either constantly overestimates or constantly underestimates the success probability, then the distribution of $D$ will be significantly shifted from zero and therefore, by definition, will not be considered even close to standard normal. In our simulations we considered the scenario where the probabilities $\hat{p}_i$ participating in the construction of the $D$ statistic had nothing to do with the realizations of the simulated $Y_i$: the values $\hat{p}_i$ were simply drawn at random from the interval $[0,1]$.
Once we had a group of reported values of the $D$ statistic of size 200, we conducted a Kolmogorov-Smirnov normality test and reported its p-value. The whole aforementioned procedure was repeated 10000 times, and 10000 p-values were obtained for the scenario where the prediction was fake. Figure 5.2 shows two probability density functions on the interval $[0,1]$. The solid line shows the empirical density of the p-value of the Kolmogorov-Smirnov test for data where the predicted success probabilities were not related to the realizations. The dashed line shows the density of the distribution of the p-value under the Null hypothesis, namely the uniform distribution on $[0,1]$. We provide Figure 5.2 in order to give a clear impression that if the p-value of the Kolmogorov-Smirnov normality test reported for a real dataset is sufficiently high, then this can be considered a reasonable verification that the reported predictions match the on-coming events. Low p-values of the test may be considered an indicator that the predictions are inaccurate.

Remember that the history window size $n$ used for prediction of a crossing probability is an important parameter. On one hand, it should be large enough to provide representative history data. On the other hand, with an increase of the history window, the assumption of stationarity of the time-series $\{X_i\}$ becomes less realistic. Different network paths have different time-periods during which one can treat an available-bandwidth-related time-series as stationary. We refer hereafter to such a time-period as a stationary time-period. Our goal therefore is to detect the stationary time-frame for a particular network path. The "self-verification procedure" described above can be considered a natural way to detect a suitable range of history window sizes. We repeat the same procedure and obtain p-values for the same history data and different history window sizes. The dependency of the reported p-values on the history window size is believed to be unimodal. Naturally, for large window sizes we are supposed to obtain small p-values, because the stationarity assumption does not hold and the predictions are inaccurate. For small windows, p-values should also be small, because there are too few zeroes and ones used for predicting the crossing probability $\hat{p}$ and therefore the prediction is not accurate. Hence the dependency of the p-value on the history window size should have one mode defining the range of acceptable history window sizes.

The approximate time-scale of a prediction/self-verification run is therefore as follows. For predicting a crossing probability over the next 2 seconds we need around 8 minutes of history frame. However, for performing self-verification and reporting a p-value we need 1500 independent realizations/non-realizations of a crossing event. Therefore self-verification may require on the order of 15 hours of data collection. Let us emphasize that the actual prediction of a crossing probability for the next 2 seconds is based on a history of the order of 8 minutes, and stationarity is assumed only within these 8 minutes of the history window. We can start outputting the crossing probability immediately after observing a network path for a period of 8 minutes. In addition, after observing a channel for 10-15 hours we are also able to specify the range of history window sizes that are suitable for prediction and use a window from that range for making our prediction. Finally, if for some network path it turns out that stationarity does not hold for the required time-period and the range of acceptable history window sizes does not exist, then we can detect this phenomenon and report a temporary inability to make accurate predictions. Decreasing the time-period required for self-verification is important, and we address this issue in detail later in this work.
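To make the verification pipeline concrete, the following sketch (Python with numpy and scipy; the fake-prediction simulation mirrors the one described above, and the function names are ours) computes $D$ values from (predictor, realization) groups and reports the Kolmogorov-Smirnov p-value:

```python
import numpy as np
from scipy import stats

def d_statistic(p_hat, y):
    """D = sum(Y_i - p_hat_i) / sqrt(sum(p_hat_i * (1 - p_hat_i)))."""
    p_hat, y = np.asarray(p_hat, float), np.asarray(y, float)
    return (y - p_hat).sum() / np.sqrt((p_hat * (1.0 - p_hat)).sum())

def verification_p_value(groups):
    """groups: list of (p_hat, y) pairs, each pair yielding one value of D.
    Returns the Kolmogorov-Smirnov p-value for the hypothesis D ~ N(0, 1)."""
    d_values = [d_statistic(p, y) for p, y in groups]
    return stats.kstest(d_values, "norm").pvalue

# "Fake prediction" scenario from the text: predictors unrelated to outcomes.
rng = np.random.default_rng(1)
groups = []
for _ in range(200):
    true_p = rng.uniform(0.1, 0.9, size=30)    # actual success probabilities
    y = rng.binomial(1, true_p)                # realizations
    fake_p = rng.uniform(0.0, 1.0, size=30)    # predictors unrelated to the realizations
    groups.append((fake_p, y))
print(verification_p_value(groups))            # typically very small, as in Figure 5.2
```

Replacing `fake_p` with `true_p` in the loop gives the "matched" scenario, for which the reported p-values should behave like a uniform sample on [0, 1].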
5.3 D-statistic: theoretical results

In this section we give the theoretical background for the fact that the $D$ statistic has a standard normal distribution, given that the predictors $\hat{p}_i$ match the actual success probabilities of the corresponding Bernoulli random variables $Y_i$. We start with a basic result that establishes the Central Limit Theorem when the success probabilities $p_i$ are known.

Proposition. Let $Y_1, \ldots, Y_n, \ldots$ be independent with distribution $Y_i \sim \mathrm{Bernoulli}(p_i)$. Assume that
\[
\sum_{i=1}^{\infty} p_i (1 - p_i) = \infty .
\]
Then
\[
\frac{\sum_{i=1}^{l} (Y_i - p_i)}{\sqrt{\sum_{i=1}^{l} p_i (1 - p_i)}} \Rightarrow N(0, 1).
\]

Remark. A simple condition that implies $\sum_{i=1}^{\infty} p_i(1 - p_i) = \infty$ is the existence of $c > 0$ such that $c < p_i < 1 - c$, $i = 1, 2, 3, \ldots$

Proof. Denote
\[
Y_{i,l} = \frac{Y_i - p_i}{\sqrt{\sum_{r=1}^{l} p_r (1 - p_r)}}, \qquad i = 1, \ldots, l .
\]
Since $E(Y_i) = p_i$ for any $i$, we have (a) $E(Y_{i,l}) = 0$, and (b) $\sum_{i=1}^{l} E(Y_{i,l}^2) = 1$. The result follows by a standard application of the Lindeberg-Feller condition: for every $\varepsilon > 0$,
\[
\sum_{i=1}^{l} E\bigl(Y_{i,l}^2 ; |Y_{i,l}| > \varepsilon\bigr) \to 0 \quad \text{as } l \to \infty .
\]
Observe that
\[
\sup_{1 \le i \le l} |Y_{i,l}| \le \frac{1}{\sqrt{\sum_{r=1}^{l} p_r (1 - p_r)}} \to 0 \quad \text{as } l \to \infty ,
\]
so the Lindeberg-Feller condition holds. $\square$

Our next task is to convert the Proposition into a result that matches the setup we are dealing with. Now we assume that the $p_i$ are not known, so we replace them by their estimators $\hat{p}_i$. Let $n$ be a fixed positive integer; we remark that $n$ appears as $n - N + 1$ in other sections. Let $Y_k^{(i)} \sim \mathrm{Bernoulli}(p_i)$ be a triangular array of random variables where each row $\{Y_k^{(i)}, k = 0, 1, \ldots, n\}$ represents a stationary sequence, $i = 1, 2, \ldots$ Further, we assume that the stationary sequences represented by different rows are independent. We estimate each $p_i$ by
\[
\hat{p}_i = \frac{\sum_{k=1}^{n} Y_k^{(i)}}{n}.
\]
To tie our notation to that used in other sections, we denote $Y_i = Y_0^{(i)}$. The following can be proved similarly to the Proposition, and therefore we skip the proof.

Theorem. Assume
\[
\sum_{i=1}^{\infty} \mathrm{Var}(Y_i - \hat{p}_i) = \infty .
\]
Then
\[
\frac{\sum_{i=1}^{l} (Y_i - \hat{p}_i)}{\sqrt{\sum_{i=1}^{l} \mathrm{Var}(Y_i - \hat{p}_i)}} \Rightarrow N(0, 1).
\]

The quantity $\mathrm{Var}(Y_i - \hat{p}_i)$ can be calculated precisely using $p_m^{(i)} = P(Y_k^{(i)} = 1, Y_{k+m}^{(i)} = 1)$, a quantity that, due to the stationarity assumption, does not depend on $k$. Observe that $p_0^{(i)} = p_i$.

Lemma.
\[
\mathrm{Var}(Y_i - \hat{p}_i) = \Bigl(1 + \frac{1}{n}\Bigr) p_0^{(i)} - \sum_{m=1}^{n} \frac{2m}{n^2}\, p_m^{(i)} .
\]

Proof of Lemma. We drop the index $i$ for ease of notation.
\[
\mathrm{Var}(Y - \hat{p}) = \frac{\mathrm{Var}\bigl(\sum_{k=1}^{n} (Y_0 - Y_k)\bigr)}{n^2}
= \frac{\sum_{j,k=1}^{n} E\bigl[(Y_0 - Y_j)(Y_0 - Y_k)\bigr]}{n^2}
= \frac{\sum_{j,k=1}^{n} \bigl(p_0 - p_j - p_k + p_{|k-j|}\bigr)}{n^2}
= \Bigl(1 + \frac{1}{n}\Bigr) p_0 - \sum_{m=1}^{n} \frac{2m}{n^2}\, p_m . \ \square
\]

Using the Lemma we can rewrite the convergence in the Theorem as
\[
\frac{\sum_{i=1}^{l} (Y_i - \hat{p}_i)}{\sqrt{\bigl(1 + \frac{1}{n}\bigr)\sum_{i=1}^{l} p_i - \sum_{i=1}^{l}\sum_{m=1}^{n} \frac{2m}{n^2}\, p_m^{(i)}}} \Rightarrow N(0, 1).
\]
A natural estimate of $p_m^{(i)}$ is given by
\[
\hat{p}_m^{(i)} = \frac{\sum_{k=0}^{n-m} Y_k^{(i)} Y_{k+m}^{(i)}}{n - m}.
\]
We conclude that for large $n$ we get
\[
\frac{\sum_{i=1}^{l} (Y_i - \hat{p}_i)}{\sqrt{\sum_{i=1}^{l} \hat{p}_i - \sum_{i=1}^{l}\sum_{m=1}^{n} \frac{2m}{n^2}\, \hat{p}_m^{(i)}}} \Rightarrow N(0, 1).
\]
In the extreme case where $Y_k^{(i)}$, $k = 0, 1, \ldots, n$, are independent, $p_m^{(i)} = p_i^2$ for $m \ge 1$, and we get
\[
\mathrm{Var}(Y_i - \hat{p}_i) = \Bigl(1 + \frac{1}{n}\Bigr)\bigl(p_i - p_i^2\bigr) \approx p_i (1 - p_i).
\]
In general, we do not have independence but only stationarity. However, if the stationary sequence has enough ergodic (mixing) properties, then we do get $p_m^{(i)} \approx p_i^2$,
at least for large $m$. Now we look at the expression $\sum_{m=1}^{n} \frac{2m}{n^2} p_m^{(i)}$ and we see that it represents a weighted average of $\{p_m^{(i)}\}$ in which most of the weight is placed on large values of $m$. So an assumption of strong ergodicity will lead to
\[
\sum_{m=1}^{n} \frac{2m}{n^2}\, p_m^{(i)} \approx p_i^2 .
\]
In that case we can write that, approximately,
\[
D = \frac{\sum_{i=1}^{l} (Y_i - \hat{p}_i)}{\sqrt{\sum_{i=1}^{l} \hat{p}_i (1 - \hat{p}_i)}} \sim N(0, 1).
\]
The formula above is used in the following section.

Figure 5.3: Evolution of the crossing probability along with its 95% confidence intervals throughout the day. $M = 65$ Mbps, $N = 5$, $\tau = 0.43$ sec (one unit on the time axis corresponds to 21.6 seconds, so 4000 units span 24 hours).

5.4 D-statistic: experimental evaluation

Figure 5.3 shows how the crossing probability evolved throughout the day for the network path between nodes at Michigan State University (35.9.4290) and the Shenyang Institute of Automation of the Chinese Academy of Sciences (210.72.135.98). The accuracy of prediction is clearly shown in the figure by means of the reported 95% confidence intervals.

Figure 5.4: Probability density functions along with p-values collected for the route 35.9.4290 (Michigan State University) - 137.189.100.235 (Chinese University of Hong Kong).

Figure 5.5: Probability density functions along with p-values collected for the route 35.9.1014 (Michigan State University) - 210.72.135.98 (Shenyang Institute of Automation, Chinese Academy of Sciences).

Experimental verification of the correctness of the crossing probability prediction has been performed on multiple network paths, with experimental nodes located in the United States, mainland China, and Hong Kong. On all paths we observed the same pattern, which is well illustrated by the two examples below. The verification procedure described in the previous sections has been applied to real measurements to verify whether the prediction of the crossing probability made by our module is correct. Verification has been performed for different values of the aggregation time $\tau$, the bandwidth threshold $M$, and the number $N$ of observations over which we predict. Note that the machines used for measurements do not belong to the PlanetLab network, because there has been criticism that PlanetLab machines do not reflect typical Internet patterns of available bandwidth between nodes and that processor utilization on PlanetLab machines is often high. Processor utilization of the measurement machines is an important factor, which needs to be low in order not to affect measurements. Therefore the machines that were used in the experiments were completely dedicated to measurement.

Verification for a particular path and particular values of $\tau$, $N$, and $M$ was performed according to the following scenario. First, we collected around 400,000 continuously taken observations of indicators. Given that $\tau$ varied up to 0.5 sec, collecting the data for one set of parameters required observing the channel for a period of two days. Measurement of indicators was performed with the help of modified Pathload code: we enabled logging of one-way delay times in Pathload and used our own module for processing and analyzing these times. We also removed Pathload's binary-search part of the code and forced Pathload to send packets at the user-defined rate.
After obtaining 400,000 continuous observations of indicators, we used a window of size $200 \cdot N$ for making our first prediction of the crossing probability for the next $N$ observations. After the prediction was made, we checked whether the bandwidth threshold $M$ was really crossed during the next $N$ observations, thus obtaining the first realization of the Bernoulli random variable $Y_1$ considered in the previous sections. Then the window used for prediction, of length $200 \cdot N$, was shifted 100 observations ahead and a new prediction was made for the next $N$ observations of indicators. Therefore adjacent groups of size $N$ were separated by 100 observations. We considered such a separation sufficient for claiming that two adjacent groups are independent of each other. Then the same procedure of shifting and predicting was repeated 30 times, allowing us to collect 30 observations of crossing probabilities $\hat{p}_i$ and indicators $Y_i$ of the actual realization of the predicted event. With the help of these observations we construct the first value of the $D$ statistic. Then we continue with another portion of 30 cycles of predicting the crossing probability, checking the realization of the predicted event, and shifting the window ahead. We thereby obtained 130 values of the $D$ statistic and then applied the Kolmogorov-Smirnov normality test to these 130 values. Finally, we report the p-value of this test. The whole procedure of data collection and analysis was repeated for values of the bandwidth threshold $M \in \{55, 60, 65\}$ Mbps and $N \in \{5, 10, 15\}$.
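The verification loop just described can be sketched as follows (Python; the d_statistic helper is repeated from the earlier sketch, and `predictor` stands for any crossing-probability predictor, for instance the MBB estimator above; the parameter values mirror the ones in the text):

```python
import numpy as np
from scipy import stats

def d_statistic(p_hat, y):
    p_hat, y = np.asarray(p_hat, float), np.asarray(y, float)
    return (y - p_hat).sum() / np.sqrt((p_hat * (1.0 - p_hat)).sum())

def field_verification_p_value(x, N, predictor, pairs_per_d=30, shift=100):
    """Slide a 200*N-observation history window across the indicator series x,
    predict the crossing probability for the next N observations, record whether
    a crossing actually occurred, group the pairs into D values and KS-test them."""
    x = np.asarray(x)
    n_hist = 200 * N
    p_hats, ys, d_values = [], [], []
    start = 0
    while start + n_hist + N <= len(x):
        p_hats.append(predictor(x[start:start + n_hist], N))
        ys.append(int(x[start + n_hist:start + n_hist + N].max()))
        if len(p_hats) == pairs_per_d:
            d_values.append(d_statistic(p_hats, ys))
            p_hats, ys = [], []
        start += shift                                   # separate adjacent groups in time
    return stats.kstest(d_values, "norm").pvalue
```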
Our verification test can be viewed as a two-class categorization problem. The first, "matched", class may be defined as the class of input instances of success probabilities of Bernoulli random variables together with their realizations when the realizations match the probabilities. The second, "unmatched", class is the class of instances where the success probabilities are not related to the realizations. The p-value is the only feature based on which we decide to which class a given instance belongs. Figure 5.2 shows the class-conditional densities of the p-value. In other words, for any new unseen instance, Figure 5.2 suggests assigning it to the class which has the higher probability density at the point corresponding to its p-value. Therefore for each given network path we have to check that at least 95% of the p-values related to real-life data clearly suggest assigning those instances to the "matched" class.

Figure 5.4 and Figure 5.5 show p-values for two paths that we monitored for a significant period of time. The p-values for different runs are shown as "+" signs. Note that the y-coordinate of the observed p-values (shown on the plots by "+" signs) was assigned a random value just for better display of the spread of the data along the x-direction. The x-coordinate of the "+" signs is meaningful and represents the observed p-value. For clarity, we copied the two class-conditional densities onto the result plots. With the densities on the plot, it is clear which class density takes the higher value at the points corresponding to p-values of tests conducted against real prediction measurements.

Path   n=400   n=600   n=800   n=1000  n=1200  n=1400   Valid range (obs)   Valid range (sec)
1      0.004   0.035   0.838   0.475   0.0135  0.0045   800-1000            400-500
2      0.041   0.583   0.236   0.846   0.2456  0.0094   600-1200            300-600
3      0.003   0       0.004   0.242   0.5723  0.0389   1000-1200           500-600
4      0.467   0.683   0.007   0.005   0.0023  0.0003   400-600             200-300
5      0.097   0.344   0.272   0.012   0.0241  0.003    600-800             150-200
6      0.572   0.296   0.973   0.386   0.742   0.067    400-1200            100-300
7      0.042   0.634   0.437   0.027   0.0024  0        600-800             150-200
8      0.003   0.048   0.473   0.489   0.0081  0        800-1000            200-250
9      0       0       0.346   0.39    0.6723  0.0038   800-1200            200-300

Table 5.1: Valid range of history window sizes.

Figure 5.4 and Figure 5.5 show that all instances of our verification procedure are clearly assigned to the "matched" class. One can also note that the whole pattern of p-values in both figures is similar to the pattern of realizations of a uniform random variable.

Table 5.1 shows the dependency of the reported p-values on the chosen history window size. Each row of the table represents a different network path. Each column represents a different size of the history window used for prediction. All predictions were made over windows of size $N \in \{2, 3, 4, 5\}$. The two rightmost columns give the range of acceptable history window sizes in observations and in seconds, respectively. As follows from the table, some experiments were made with observations sampled at the rate of 2 observations per second and some at the rate of 4 observations per second. As Figure 5.2 suggests, we assume that a particular history window size works well if the reported p-value is larger than 0.1. Table 5.1 clearly shows that the trade-off between complying with the stationarity assumption on the one hand and having a representative statistic on the other can be achieved in practice for all analyzed cases. We conclude this section with the note that our prediction, without exception, passed the accuracy verification test on all paths that we monitored.

5.5 Prediction optimization with weights

In addition to choosing an optimal history window size as described in Section 5.2, we can use the same data for further increasing the prediction accuracy by applying different weights to the recent and the older past. The predictor $\hat{p} = C/(n - N + 1)$ shown in the previous sections can be viewed as a special case of applying weights. We can view the counter $C$ as a sum of indicators $1_k$, $k \in \{1, 2, \ldots, n - N + 1\}$, where $1_k = 1$ when at least one $X_i = 1$, $i \in \{k, k+1, \ldots, k+N-1\}$, and $1_k = 0$ otherwise. We therefore see that the same weight $\frac{1}{n - N + 1}$ is applied to each of the $n - N + 1$ indicators within the frames of a history window. Naturally, all weights should add up to one. In this work we propose to use weights of the form $k^{\gamma}/H$, where the normalizing constant $H$ can be determined from the equation
\[
H = \sum_{k=1}^{n-N+1} k^{\gamma}.
\]
Therefore the predictor of a crossing probability takes the following form:
\[
\hat{p} = \frac{1}{H} \sum_{k=1}^{n-N+1} k^{\gamma}\, 1_k .
\]
The expression for $\hat{p}$ suggests assigning higher weights to the recent past, which corresponds to higher values of $k$. The parameter $\gamma$ describes the rate at which the weights decay and can be adjusted based on the same history data that were used for the adjustment of the history window size. For the adjustment of $\gamma$, we propose to minimize the total relative error as follows.
If we obtain from a "self-verification" dataset many predictors $\hat{p}_i$ together with the following realizations $Y_i$, then the total relative error can be evaluated as
\[
\mathrm{Error} = \sum_{i} |\hat{p}_i - Y_i| .
\]
The error quantity introduced above does not tell us much in an absolute sense about the accuracy of our prediction. For that purpose we use an appropriate statistical procedure such as the one introduced in Section 5.2, together with its p-value. However, minimization of the relative error can help tune the rate $\gamma$ of weight decay.

Overall, our algorithm for tuning parameters to a specific network path may be described as follows. First, we collect sufficient data for conducting a "self-verification" procedure; minimizing the amount of data and the time required for that purpose is considered later in this work. We assign to $\gamma$ the value 0, which corresponds to the case of uniform weights. Then we run the self-verification algorithm for a number of window sizes, where $n$ varies within the range $[50 \cdot N; 1000 \cdot N]$. We use the obtained p-values to determine an optimal value of $n$, which corresponds to the range where the p-values are high. If a range with sufficiently high p-values does not exist, then we still take the $n$ that corresponds to the maximal reported p-value. With the newly obtained history window size $n$, we minimize the total relative error and thus find the best rate $\gamma$. Then we use the new value of $\gamma$ and again try different window sizes to determine the new range of acceptable history window sizes. The expected outcome is that the range of applicable window sizes either arises or expands after applying the weight-based technique.

The usage of weights within the frames of a stationary time-frame can be motivated by the fact that Internet traffic time-series demonstrate a positive correlation pattern at small lags. In other words, higher values of traffic imply higher values at close time-moments, whereas small values of traffic imply small values. The available bandwidth time-series is closely related to the cross-traffic, and sometimes is directly equal to a constant minus the cross-traffic through the tight and narrow link. Therefore we believe that the most recent values of the available-bandwidth-related time-series should be more important than the others.
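A minimal sketch of the weighted predictor and of the $\gamma$-tuning step might look as follows (Python; the error measure is the sum of absolute deviations $|\hat{p}_i - Y_i|$ discussed above, and the grid of candidate $\gamma$ values and the sliding step are our own illustrative choices):

```python
import numpy as np

def weighted_crossing_predictor(x, N, gamma=0.0):
    """p_hat = (1/H) * sum_k k^gamma * 1_k, with 1_k = 1 if window k contains a 'one'."""
    ones = np.lib.stride_tricks.sliding_window_view(np.asarray(x), N).max(axis=1)  # 1_k
    k = np.arange(1, len(ones) + 1, dtype=float)
    w = k ** gamma                                   # higher weight on the recent past
    return float((w * ones).sum() / w.sum())

def tune_gamma(history, N, n, step=100, gammas=np.linspace(0.0, 3.0, 16)):
    """Pick gamma minimizing sum_i |p_hat_i - Y_i| over a self-verification dataset.
    history: long zero-one indicator series; n: history window size."""
    history = np.asarray(history)
    best_gamma, best_err = 0.0, np.inf
    for g in gammas:
        err, start = 0.0, 0
        while start + n + N <= len(history):
            p_hat = weighted_crossing_predictor(history[start:start + n], N, g)
            y = int(history[start + n:start + n + N].max())   # did a crossing occur?
            err += abs(p_hat - y)
            start += step                                      # slide the window ahead
        if err < best_err:
            best_gamma, best_err = g, err
    return best_gamma
```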
5.6 R-statistic: improvement of a verification procedure

In this section we discuss an optimization of the self-verification technique. In short, the self-verification procedure collects a number of predictions of a crossing probability $\hat{p}_i$ together with realizations/non-realizations of the predicted crossing events, and then applies a statistical test that checks whether the predicted values of the crossing probability really match the on-coming events. The output of the self-verification procedure is used by the parameter-adjustment module introduced in the previous section. Therefore it is important to reduce the number of observations and, respectively, the boot time that the self-verification procedure needs to produce its output. The statistical test shown in Section 5.2 assumes the creation of the $D$ statistic on the basis of pairs of quantities $\hat{p}_i$, $Y_i$. The $D$ statistic follows the standard normal distribution under the null hypothesis, which states that predictions match on-coming events. The procedure has to collect at least 50 independent values of the $D$ statistic, then conduct a normality test on them and report the corresponding p-value.

5.6.1 Timing

First we briefly analyze the time that is required for reporting one value of the $D$ statistic. A time-period of up to 0.25 seconds is required for conducting a robust check of the available bandwidth time-series against a certain threshold. Therefore we assume that one observation of the original time-series $\{X_i\}$ of zeroes and ones requires 0.25 seconds to be reported. Note that $Y_i$ is an observation that equals one when at least 1 out of $N$ observations of the underlying time-series $\{X_i\}$ equals one. Let us assume for concreteness that $N = 4$. Then the time for observing one value $Y_i$ of the realization of a crossing event is 1 second. Computer simulations, confirmed by real-life experiments, show that the time needed to report a good predictor of a crossing probability $\hat{p}_i$ varies from $50 \cdot N$ to $1000 \cdot N$ observations. Assuming that the history window size is of the order of $500 \cdot N$, we note that the history time-frame is of the order of 500 seconds (8.3 minutes). The form of the $D$ statistic assumes that the observations $Y_i$ are independent of each other. This can be achieved in practice only by separating them in time by at least 100 observations of the underlying time-series $\{X_i\}$. Finally, taking into account that we need at least 30 summands to make the $D$ statistic robust, we note that the time for outputting one value of the $D$ statistic can be evaluated as $t = 500 + (100 \cdot 0.25 + 1) \cdot 30 = 1280$ sec (about 21 minutes). We conclude that the boot time of the self-verification module, required for collecting at least 50 values of the $D$ statistic, is of the order of 16 hours.

Our optimization of the self-verification module is a better replacement for the $D$ statistic. Introduced below, the $R$ statistic possesses the same characteristics as the $D$ statistic while requiring, in the above example, only around 500 seconds (8.3 minutes) to be reported. This means that a time of the order of only 6 hours is needed for reporting the p-value of the normality test. Therefore we reduce the boot time of the self-verification procedure by a factor of about 3.

5.6.2 R-statistic

Consider observations of a Bernoulli zero-one time-series $\{X_i\}$, $i \in 1, \ldots, n, n+1, \ldots, n+N$, which fall into a history time-frame plus the following $N$ observations. We create the time-series $W_k$ according to the following rule:
\[
W_k = \max\{X_{k+1}, X_{k+2}, \ldots, X_{k+N}\} - \hat{p}_k , \qquad k \in \{N, \ldots, n\},
\]
where $\hat{p}_k$ is a predictor of the crossing probability based on the first $k$ observations of the time-series $\{X_i\}$. Then the following statement holds:
\[
R = \frac{\sum_{k=N}^{n} W_k}{\sqrt{(n - N + 1)\,\hat{\sigma}_n^2}} \Rightarrow N(0, 1),
\]
where
\[
\hat{\sigma}_n^2 = \frac{1}{n - N + 1}\sum_{i=N}^{n} W_i^2 \; + \; \frac{2}{n - N + 1}\sum_{k=1}^{30}\sum_{i=N}^{n-k} W_i W_{i+k} .
\]
Figure 5.6 shows how the $R$-statistic is constructed, and also illustrates the gain in time that is obtained when moving from the $D$-statistic to the $R$-statistic. For the illustration in Figure 5.6 we have chosen $N = 4$, $n = 2000$, and a sampling rate equal to 4 observations per second.

Figure 5.6: Reducing the time of the self-verification procedure when switching from the D-statistic to the R-statistic. One value of the D-statistic requires a history of size 2000 obs + 30 windows of size 4 obs + 30 gaps of size 100 obs = 5120 obs (1280 seconds at 4 observations per second), whereas one value of the R-statistic requires just the history frame of size 2000 obs + 1 window of size 4 obs = 2004 obs (501 seconds).

To obtain the $R$ statistic, we enter the history window of size $n$, start with a small history of size $N$, and then, for each summand in the numerator, increase the window size by one observation at a time. For each summand we predict the crossing probability $\hat{p}_k$ and then check it against the following (unseen by $\hat{p}_k$) $N$ observations of the history data. Naturally, summands obtained in this way are highly dependent. The main advantage of the $R$ statistic over the $D$ statistic is that the former does not require independence of the underlying observations $W_k$, whereas the $D$ statistic requires the $Y_k$ to be separated in time. One can see that the variance expression $\hat{\sigma}_n^2$ contains a term that accounts for correlations between the observations $W_k$. We can fit one value of the $R$-statistic directly into the history time-frame; therefore, with the $R$-statistic, the available data are used much more efficiently. The following section provides a mathematical proof of the statement above, namely that the $R$ statistic follows a standard normal distribution under the hypothesis that predictions match realizations.
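The construction of $W_k$, the lag-corrected variance estimate, and the resulting $R$ value can be sketched as follows (Python; the 30-lag cut-off mirrors the fixed number of lags used above, and `predictor` stands for any crossing-probability predictor, for example the weighted one sketched in Section 5.5):

```python
import numpy as np

def r_statistic(x, N, predictor, max_lag=30):
    """R = sum(W_k) / sqrt((n - N + 1) * sigma2), with
    W_k = max(X_{k+1}, ..., X_{k+N}) - p_hat_k and a lag-corrected variance."""
    x = np.asarray(x)
    n = len(x) - N                                   # history indices run from N to n
    w = []
    for k in range(N, n + 1):
        p_hat_k = predictor(x[:k], N)                # prediction from the first k observations
        crossed = int(x[k:k + N].max())              # did the next N observations contain a 'one'?
        w.append(crossed - p_hat_k)
    w = np.asarray(w)
    m = len(w)                                       # m = n - N + 1
    sigma2 = (w ** 2).sum() / m
    for lag in range(1, min(max_lag, m - 1) + 1):    # correlation correction term
        sigma2 += 2.0 * (w[:-lag] * w[lag:]).sum() / m
    return w.sum() / np.sqrt(m * sigma2)
```

With the weighted predictor from the previous sketch one would pass, for example, `predictor=lambda h, win: weighted_crossing_predictor(h, win, gamma=1.0)`.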
5.7 R-statistic: theoretical results

We start with the definition of the concept of strong mixing. It is one of the criteria used to quantify the situation in which, the longer the time between observations, the more statistically independent those observations become.

Definition. Let $\ldots, X_{-1}, X_0, X_1, \ldots, X_n, \ldots$ be a strictly stationary sequence of random variables. It is called strongly mixing if $\alpha(n) \to 0$, where $\alpha(n)$ is defined by
\[
\alpha(n) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\, B \in \mathcal{F}_{n}^{\infty}} \bigl| P(AB) - P(A)P(B) \bigr| ,
\]
where $\mathcal{F}_{-\infty}^{0} = \sigma\{X_k : k \le 0\}$ and $\mathcal{F}_{n}^{\infty} = \sigma\{X_k : k \ge n\}$.

Ibragimov and Linnik proved in [28] the following Central Limit Theorem.

Theorem 1 (18.5.4). Let $V_1, \ldots, V_n, \ldots$ be a strictly stationary sequence of random variables with $E(V_n) = \mu$. Assume that the sequence is uniformly bounded (i.e., there is $C > 0$ with $|V_n| \le C$ a.s.) and strongly mixing with $\sum_{n=1}^{\infty} \alpha(n) < \infty$. Then $\sigma^2 = \mathrm{Var}(V_0) + 2\sum_{k=1}^{\infty} \mathrm{Cov}(V_0, V_k)$ converges. If $\sigma^2 \ne 0$, then
\[
\frac{\sum_{k=1}^{n} V_k - n\mu}{\sqrt{n}\,\sigma} \Rightarrow N(0, 1).
\]

A comment about the proof of Theorem 1. It is based on organizing the sequence into blocks that alternate between long and short ones. The total contribution of the short blocks to the limiting distribution is negligible. One then takes advantage of the assumption on the mixing coefficients, namely $\sum_{n=1}^{\infty} \alpha(n) < \infty$, by showing that the long blocks are essentially independent. Since the long blocks are identically distributed due to the stationarity assumption, the result follows.

For our purpose in this work we need to convert Theorem 1 into a Central Limit Theorem in which the mean $\mu$ is replaced by an estimate that is updated regularly with the additional data available to us as time moves on. We denote $\bar{V}_n = \frac{\sum_{i=1}^{n} V_i}{n}$, $n = 1, 2, \ldots$, and we have

Theorem 2. Under the assumptions of Theorem 1 we have
\[
\frac{\sum_{k=1}^{n} W_k}{\sqrt{n}\,\sigma} \Rightarrow N(0, 1), \qquad \text{where } W_k = V_k - \bar{V}_k .
\]

Comments about the proof of Theorem 2. We will not provide a fully detailed proof due to length considerations; instead we make a few comments. Denote $a_{k,n} = 1 - b_{k,n}$, where $b_{k,n} = \sum_{i=k}^{n} \frac{1}{i}$, $k = 1, \ldots, n$. Then, with $\mu = E(V_n)$, we get
\[
\frac{\sum_{k=1}^{n} W_k}{\sqrt{n}} = \frac{\sum_{k=1}^{n} a_{k,n} V_k}{\sqrt{n}} = \frac{\sum_{k=1}^{n} a_{k,n} (V_k - \mu)}{\sqrt{n}} ,
\]
where the first equality can be verified by simple algebra, while the second follows from $\sum_{k=1}^{n} a_{k,n} = 0$, which in turn follows from
\[
\sum_{k=1}^{n} b_{k,n} = \sum_{k=1}^{n} \sum_{i=k}^{n} \frac{1}{i} = \sum_{i=1}^{n} i \cdot \frac{1}{i} = n .
\]
We would like to show now that
\[
(1) \qquad E\Bigl( \sum_{k=1}^{n} \frac{a_{k,n}(V_k - \mu)}{\sqrt{n}} \Bigr)^2 \to \sigma^2 \quad \text{as } n \to \infty .
\]
We will first show that
\[
(2) \qquad \frac{\sum_{k=1}^{n} a_{k,n}^2}{n} \to 1 \quad \text{as } n \to \infty .
\]
Indeed,
\[
\frac{\sum_{k=1}^{n} a_{k,n}^2}{n} = \frac{\sum_{k=1}^{n} (1 - b_{k,n})^2}{n} = 1 - \frac{2\sum_{k=1}^{n} b_{k,n}}{n} + \frac{\sum_{k=1}^{n} b_{k,n}^2}{n} = \frac{\sum_{k=1}^{n} b_{k,n}^2}{n} - 1
\]
(since $\sum_{k=1}^{n} b_{k,n} = n$). So to establish (2) we need to show that
\[
(3) \qquad \frac{\sum_{k=1}^{n} b_{k,n}^2}{n} \to 2 \quad \text{as } n \to \infty .
\]
Since $b_{k,n} \approx -\log(k/n)$, we calculate
\[
\frac{1}{n}\sum_{k=1}^{n} \log^2\!\Bigl(\frac{k}{n}\Bigr) \to \int_0^1 \log^2(x)\,dx = \Bigl[ x\bigl(\log^2(x) - 2\log(x) + 2\bigr) \Bigr]_0^1 = 2 ,
\]
and (3), and hence (2), is established. In a similar way we can prove that for each fixed $k > 0$ we get
\[
(4) \qquad \frac{\sum_{i=1}^{n-k} a_{i,n}\, a_{i+k,n}}{n} \to 1 \quad \text{as } n \to \infty .
\]
A standard calculation gives
\[
E\Bigl( \sum_{k=1}^{n} \frac{a_{k,n}(V_k - \mu)}{\sqrt{n}} \Bigr)^2 = \sum_{k=1}^{n} \frac{a_{k,n}^2}{n}\,\mathrm{Var}(V_0) + 2\sum_{k=1}^{n-1}\sum_{i=1}^{n-k} \frac{a_{i,n}\, a_{i+k,n}}{n}\,\mathrm{Cov}(V_0, V_k) ,
\]
which leads to (1) by using (2) and (4). To prove Theorem 2 one then repeats the idea of alternating short and long blocks explained in the comment after Theorem 1. Finally, we remark that an attempt to prove Theorem 2 by using Theorem 1 and $\bar{V}_n \approx \mu$ (justified by the ergodic theorem) will fail, since the difference between the relevant quantities is not negligible even in the independent case.

Theorem 2' (Extension of Theorem 2 to weighted sample means). Fix $\gamma > 0$ and let $W_n = V_n - \bar{V}_{n,\gamma}$, where $\bar{V}_{n,\gamma} = \frac{\sum_{k=1}^{n} k^{\gamma} V_k}{T_{n,\gamma}}$ and $T_{n,\gamma} = \sum_{k=1}^{n} k^{\gamma}$. Under the assumptions of Theorem 1, we have
\[
\frac{\sum_{k=1}^{n} W_k}{\sqrt{n}\,\sigma} \Rightarrow N\Bigl(0, \frac{1}{2\gamma + 1}\Bigr).
\]

Comments on the proof. Here we define $b_{k,n} = k^{\gamma} \sum_{i=k}^{n} \frac{1}{T_{i,\gamma}}$, $k = 1, \ldots, n$, and we still get $\sum_{k=1}^{n} a_{k,n} = 0$ with $a_{k,n} = 1 - b_{k,n}$, and
\[
\frac{\sum_{k=1}^{n} W_k}{\sqrt{n}} = \frac{\sum_{k=1}^{n} a_{k,n}(V_k - \mu)}{\sqrt{n}} .
\]
As before, we concentrate only on the asymptotic behavior of the second moment:
\[
(1)' \qquad E\Bigl( \sum_{k=1}^{n} \frac{a_{k,n}(V_k - \mu)}{\sqrt{n}} \Bigr)^2 \to \frac{\sigma^2}{2\gamma + 1} \quad \text{as } n \to \infty .
\]
To prove $(1)'$ we need to show
\[
(3)' \qquad \frac{\sum_{k=1}^{n} b_{k,n}^2}{n} \to 1 + \frac{1}{2\gamma + 1} \quad \text{as } n \to \infty ,
\]
which is proved as follows: $T_{n,\gamma} = \sum_{i=1}^{n} i^{\gamma} \approx \frac{n^{\gamma+1}}{\gamma + 1}$, so that
\[
b_{k,n} = k^{\gamma} \sum_{i=k}^{n} \frac{1}{T_{i,\gamma}} \approx \frac{\gamma + 1}{\gamma}\Bigl( 1 - \bigl(\tfrac{k}{n}\bigr)^{\gamma} \Bigr),
\]
and therefore
\[
\frac{\sum_{k=1}^{n} b_{k,n}^2}{n} \to \Bigl(\frac{\gamma + 1}{\gamma}\Bigr)^2 \int_0^1 (1 - x^{\gamma})^2\,dx = \frac{2(\gamma + 1)}{2\gamma + 1} = 1 + \frac{1}{2\gamma + 1} \quad \text{as } n \to \infty .
\]
Equipped with $(3)'$ we get
\[
\frac{\sum_{i=1}^{n-k} a_{i,n}\, a_{i+k,n}}{n} \to \frac{1}{2\gamma + 1}, \qquad k \ge 0 ,
\]
and $(1)'$ follows as before.

It will be useful for us to replace Theorems 2 and 2' by the following version, which follows immediately from Theorem 2 by a standard probability argument.

Theorem 3. Fix $\gamma \ge 0$. Under the assumptions of Theorem 1 we have
\[
\frac{\sum_{k=1}^{n} W_k}{\sqrt{n}\,\hat{\sigma}_n} \Rightarrow N\Bigl(0, \frac{1}{2\gamma + 1}\Bigr)
\]
if $\hat{\sigma}_n^2 / \sigma^2 \to 1$ in probability.

Three methods to estimate $\sigma^2$. Here we use $\gamma = 0$. From (1) we see that if $n$ is large enough we get
\[
E\Bigl( \frac{\sum_{k=1}^{n} W_k}{\sqrt{n}} \Bigr)^2 \approx \sigma^2 .
\]
Using the fact that $\{W_n\}$ is approximately stationary, we can utilize

Method 1:
\[
\hat{\sigma}_n^2 = \frac{\sum_{k=1}^{n} W_k^2}{n} + 2\sum_{k=1}^{M} \frac{\sum_{i=1}^{n-k} W_i W_{i+k}}{n} ,
\]
where $M$ is a fixed number (say 30). This works since, by the ergodic theorem, $\frac{1}{n}\sum_k W_k^2 \approx E(W_0^2)$ and $\frac{1}{n}\sum_i W_i W_{i+k} \approx E(W_0 W_k)$.

Alternatively we may use

Method 2:
\[
\hat{\sigma}_n^2 = \frac{1}{M_n (n - M_n)} \sum_{k=1}^{n - M_n} \Bigl( \sum_{i=k+1}^{k+M_n} W_i \Bigr)^2 ,
\]
where we can take, for example, $M_n \approx \sqrt{n}$. Again, by the ergodic theorem, $\hat{\sigma}_n^2 \approx E\bigl( (\sum_{i=1}^{M_n} W_i)^2 / M_n \bigr)$, so the requirement $\hat{\sigma}_n^2/\sigma^2 \to 1$ in probability does hold.

Finally, following Peligrad and Shao [47], a third estimator $\hat{\sigma}_n^2$ of $\sigma^2$ can be used as Method 3, which under our conditions also satisfies $|\hat{\sigma}_n^2/\sigma^2 - 1| \to 0$ as $n \to \infty$.

5.8 R-statistic: experimental evaluation

We first evaluate our new self-verification procedure and then use it to evaluate the efficiency of the proposed improvement based on the weighted prediction of a crossing probability. To evaluate the self-verification procedure, we first generated stationary sequences of zeroes and ones and ran the self-verification procedure on them.
We first generated ARMA($p$, $q$), $p, q \in \{3, 4, 5, 6, 7, 8, 9\}$, stationary sequences, chose a reasonable threshold, and then produced a "zero" if the time-series remained below the threshold and a "one" otherwise. Since the generated time-series were known to be stationary by the experimental setup, we always expected to obtain high p-values as an output of the self-verification procedure. Note that the zeroes and ones were highly dependent on each other in this setup. We experimented with different window sizes to learn the minimal history window size that works well for the self-verification procedure. This experimentation was conducted because we wanted to clearly separate potential occurrences of low p-values on real data from a failure of the self-verification procedure itself to do its job. It turned out that, beginning from history sizes of 200, the self-verification procedure works well (reported p-values follow the pattern of a uniform distribution on the interval [0, 1]). We conclude with a figure very similar to Figure 5.2, which shows two densities describing the behavior of p-values for our self-verification procedure.
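A synthetic check of this kind can be sketched as follows (Python; the ARMA recursion is written out directly, and the chosen orders, coefficients, and threshold are illustrative, not the exact values used in the experiments; the AR coefficients are picked so that their absolute values sum to less than one, which guarantees stationarity):

```python
import numpy as np

def simulate_arma_indicators(n, ar, ma, threshold, burn=500, rng=None):
    """Simulate a stationary ARMA(p, q) series and threshold it into zeroes and ones."""
    rng = np.random.default_rng(rng)
    p, q = len(ar), len(ma)
    e = rng.standard_normal(n + burn)
    y = np.zeros(n + burn)
    for t in range(max(p, q), n + burn):
        y[t] = ar @ y[t - p:t][::-1] + e[t] + ma @ e[t - q:t][::-1]
    y = y[burn:]                                   # drop the burn-in period
    return (y > threshold).astype(int)

x = simulate_arma_indicators(20_000,
                             ar=np.array([0.5, 0.2, 0.1]),
                             ma=np.array([0.4, 0.3, 0.2]),
                             threshold=1.0, rng=2)
# Feeding x through the prediction and R-statistic self-verification machinery
# sketched earlier should yield p-values that look uniform, since the input is stationary.
```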
For evaluation of the weights with a decay rate adjusted to a particular network path, we used the Auckland-II NLANR traces [1] as well as our own measurements of end-to-end available bandwidth. The Auckland-II dataset [1] is a collection of IP header traces captured with a DAG2 system at the University of Auckland Internet uplink. It consists of multiple chunks of continuously taken measurements. We used this dataset considering that the available bandwidth over a link is merely a constant minus the traffic through the link. We processed the traces, and our output was the link traffic sampled at the rate of 4 observations per second. Then we applied our prediction technique with different window sizes $N$ and different thresholds $M$ against which we wanted to estimate the crossing probability. For each trace in the Auckland-II dataset we followed the same setup: we first ran the prediction and self-verification module, thus obtaining a range of history windows that were suitable for prediction; then we ran the parameters adjustment procedure and after that again obtained the range of windows that were good for prediction.

The second part of our experimental verification deals with measurements of end-to-end available bandwidth that were collected by us over different Internet paths. Measurement nodes were located at Michigan State University, the Chinese University of Hong Kong, and the Shenyang Institute of Automation, Chinese Academy of Sciences. For measuring the time-series of indicators $\{X_i\}$ we used the Pathload [29] measurement tool, modified based on our algorithm introduced in Chapter 4 for robust trend detection. The difference between working with a traffic trace and actual sampling of available bandwidth is that in the latter case we are unable to alter the threshold, because measurements were conducted against one threshold level at a time.

Sample   Weights              n=200    n=400    n=600    n=800    n=1000   n=1200   Valid range (obs)   Valid range (sec)
1        before adjustment    0.0263   0.3196   0.3702   0.0023   0.0217   0.03022  400-600             100-150
1        after adjustment     0.11     0.8845   0.6566   0.1258   0.2026   0.2793   400-1200            100-300
2        before adjustment    0.0162   0.277    0.1436   0.0198   0.0367   0.0338   400-600             100-150
2        after adjustment     0.0759   0.6941   0.6769   0.5685   0.2995   0.0158   400-1000            100-250
3        before adjustment    0        0        0        0.0072   0.073    0.0061   0                   0
3        after adjustment     0        0        0.0324   0.123    0.251    0.0042   800-1000            200-250
4        before adjustment    0.011    0.0974   0.3438   0.272    0.01225  0.0241   600-800             150-200
4        after adjustment     0.279    0.2048   0.4899   0.1303   0.87     0.6      200-1200            50-300
5        before adjustment    0.021    0.054    0.0063   0.003    0.0086   0.0023   0                   0
5        after adjustment     0.026    0.32     0.56     0.054    0.0043   0.0021   400-600             100-150
6        before adjustment    0.0522   0.0776   0.3744   0.1035   0.0007   0        600-800             150-200
6        after adjustment     0.1333   0.4703   0.171    0.6262   0.0198   0.005    200-800             50-200
7        before adjustment    0        0        0        0.005    0.5852   0.0085   1000                250
7        after adjustment     0        0        0.0043   0.103    0.532    0.342    800-1200            200-300

Table 5.2: P-values for different history window sizes before and after weight adjustment.

Table 5.2 shows p-values for different window sizes before and after adjustment of parameters. We present a number of representative samples from the Auckland traces and from our measurements. We predicted the crossing probability for $N$ varying around 5, which corresponds to a time-period of the order of a second. We can easily see that the range of history windows that are valid for a prediction increases after application of the adjustment procedure. In some cases we even see that suitable history window sizes were not available before running the weights adjustment procedure; this means that in those cases the weights adjustment procedure made predictions possible. We emphasize that the p-value of a test is roughly the likelihood of observing what has actually been observed under the assumption that the predictions of a crossing probability match the on-coming events. If this likelihood becomes small (say, less than 0.1), we conclude that it is very unlikely to see what we see under the Null hypothesis, and therefore the Null hypothesis has to be rejected with the conclusion that the predictions are no longer accurate. We therefore assume in our algorithm that if the reported p-value is more than 0.1, then the window that corresponds to that p-value is valid for prediction. According to this scheme we report in the two rightmost columns the ranges of windows that turned out to be valid for prediction: in the first of these columns the range is given in the number of observations, and in the second in seconds.

In order to show the performance of the algorithm on many samples, we provide Figure 5.7. Figure 5.7 shows two histograms of ranges of acceptable window sizes (upper bound minus lower bound) reflecting 100 samples of the behavior of available bandwidth on different days and at different places. The gray histogram is related to the case when uniform weights were used, while the black histogram shows the range of acceptable windows when the weight adjustment procedure has been applied.

Figure 5.7: Histograms of the acceptable range of lengths of history windows before and after weights adjustment. The black histogram shows the range distribution after adjustment, whereas the gray histogram shows the range distribution before adjustment.

We can see that the gray histogram captures zero, which means that for a significant number of traces the acceptable range of history windows does not exist. The black histogram shows that the application of the weight-based adjustment increases, on average, the range of history windows suitable for prediction by more than a factor of 3.
5.9 Limits of a verification procedure

In this section we discuss the limits of applicability of the above-introduced self-evaluation procedure. Our self-evaluation technique naturally fails when the reported values of a crossing probability are extremely small. Consider, for example, the limit case when the whole history window does not capture even one crossing event. Then the predicted value of the crossing probability is evaluated as zero, and the estimated value of the variance $\hat{\sigma}_n^2$ in the expression for the $R$ statistic also becomes zero. The theoretical proof in Section 5.7 also assumes a non-degenerate case for the predicted values of a crossing probability and thus a non-degenerate case for the variance $\sigma^2$. In practice one can observe that if the predicted values of a crossing probability are too close to zero, then the reported p-values become small simply because the self-evaluation procedure cannot do its job. We note that if the crossing probability is nearly zero, then the operator concludes that the network environment is safe and tele-operations can be allowed anyway. An operator needs to track the cases when the crossing probability differs significantly from zero, and for these cases the self-evaluation technique works well. We believe, therefore, that it is important to understand for how small values of a predicted crossing probability the self-evaluation technique still works correctly.

With the Auckland-II dataset we were in possession of exact values of the available bandwidth time-series. Therefore we were able to freely alter the threshold against which a crossing probability had to be estimated. Altering the crossing probability threshold made it possible to determine the range of applicability of our verification technique with respect to the value of the crossing probability that has to be estimated. We note that with a decrease of the value of a crossing probability (by moving down the available bandwidth threshold), the history window size needs to be increased in order to avoid a degenerate case in the denominator of the $R$-statistic; then the history window can still capture a couple of crossing events. In other words, by increasing the history window we increase the granularity at which the crossing probability can be predicted and therefore allow crossing probability values to be closer to zero. At the same time, the upper bound for the length of a history window does not depend on the value of a reported probability: this bound is solely defined by the time-period during which one can still consider the available bandwidth time-series as stationary. Therefore for each fixed future window size $N$ there exists a minimal value of a crossing probability $P$ at which the acceptable range for the history window size vanishes completely. We note that the size $N$ of the future time-frame over which the prediction of a crossing probability is made also affects the range of history window sizes. Suppose, for example, that a crossing probability $P$ is fixed at a certain value. Then it is natural to assume that with an increase of the window size $N$, the history size $n$ also has to be increased (such that the history window of size $n$ still contains a significant number of non-overlapping windows of size $N$). Based on the above, we conclude that the minimal value of a crossing probability for which the self-evaluation test still works well should increase with an increase of the window size $N$ over which the prediction is made.
We used the Auckland-II dataset for the experimental determination of how the above-mentioned minimal predicted value of a crossing probability depends on the size $N$ of the future window over which the prediction has to be made. We also compared the applicability limits of the prediction/verification technique based on the $R$-statistic and weights against the one based on the $D$ statistic.

For each technique (the new one related to the $R$-statistic and weight-based prediction, and the one related to the $D$-statistic) we conducted the following procedure. First, we fixed the future window size $N$ at a certain value. Then we performed a binary search for a threshold which, on one hand, still reports a maximal p-value higher than 0.1 and, on the other hand, yields a value of the crossing probability as small as possible. Note that for each value of $N$ and each threshold level we tried a set of different history window sizes and finally selected the maximal p-value among those that were reported. In other words, if at least one history window size existed for which the reported p-value was higher than 0.1, then we kept decreasing the available bandwidth threshold in order to further decrease the value of the crossing probability. Finally, after the binary search converged, we output the value of $N$ together with the corresponding values of the crossing probability. These minimal acceptable crossing probability values were obtained while checking available bandwidth against the "boundary" threshold obtained with the help of the above-mentioned binary search.

Figure 5.8: Dependency of the minimal value of a crossing probability for which the verification procedure works well on the future time-window over which the prediction is made and on the prediction/verification technique.

In Figure 5.8 we marked these values by red "+" signs for the experiment with the old prediction/verification technique without weights. Minimal acceptable crossing probability values related to the $R$-statistic-based prediction/verification technique are marked in Figure 5.8 by blue "*" signs. The same procedure has been repeated for $N \in \{1, 2, 3, 4, 5, 6\}$, where one observation corresponds to a time-period of 0.25 seconds. Therefore the future time-frame over which the prediction was made varied within the range of [0.25; 1.5] seconds. Note that for presentation purposes we slightly shifted the "+" and "*" signs apart: for example, "+" and "*" related to a time-period of 0.25 seconds are shown as observations related to 0.24 and 0.26 seconds, respectively. Finally, in Figure 5.8 we connected the mean values of the minimal crossing probabilities reported for each $N$ with two solid lines (for the old and new techniques, respectively). From Figure 5.8 we conclude that the verification approach still works for real data when the value of the crossing probability $P$ drops to a value as low as 0.01. Figure 5.8 shows that, as expected, the minimal value of $P$ increases with an increase of the window size $N$ over which the prediction is made. It also clearly follows from Figure 5.8 that the new prediction/verification technique significantly decreases the lower bound for acceptable values of a crossing probability.
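The threshold search used in this experiment can be sketched as follows (Python; `best_p_value` is a stand-in for the routine that runs prediction plus self-verification at a given threshold over a set of candidate history window sizes and returns the maximal p-value together with the corresponding crossing probability; it is an assumed interface, not code from the measurement pipeline):

```python
def minimal_crossing_probability(trace, N, lo, hi, best_p_value, tol=0.5, p_cut=0.1):
    """Binary search over the bandwidth threshold (in Mbps): keep lowering the
    threshold while at least one history window size still yields a p-value > p_cut,
    and return the smallest crossing probability for which verification still works."""
    result = None
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        p_value, crossing_prob = best_p_value(trace, N, threshold=mid)
        if p_value > p_cut:
            result = crossing_prob      # verification still works at this threshold
            hi = mid                    # lower the threshold: smaller crossing probability
        else:
            lo = mid                    # verification failed; back off to larger thresholds
    return result
```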
5.10 Conclusion

The basic result of this chapter is a robust non-parametric method for computation of a "crossing probability". This probability defines the quality of an end-to-end network path with respect to a specific real-time task: it represents the chances that a real-time task execution will fail due to the inability of the path to provide sufficient bandwidth to the communicating parties. Our approach has the capability of dynamically choosing the length of the window $n$ that is used for prediction. Since we provide a robust verification technique, our algorithm can continuously perform a self-check of the accuracy of the prediction at run-time and automatically adjust the length of the prediction window. Low p-values of the test can be considered a clear indicator that, for a given network path, the window size $n$ does not do a good job in prediction and therefore needs to be adjusted.

After presenting the basic non-parametric prediction/verification technique, we introduce two significant improvements of the non-parametric approach to estimation of a crossing probability. The first improvement deals with tuning basic parameters such as the size $n$ of the history used for prediction and the coefficient $\gamma$ that can be treated as the rate at which the network path "forgets" its history. This is important because there exists a large variety of traffic patterns along different Internet links. Some links may be highly multiplexed by a large number of users, and some may have only sporadic traffic bursts associated with instantaneous user activity. On some links the majority of the traffic may be constituted by video transfer, which has a strongly pronounced long-memory nature, while on other links traffic may even be a Gaussian time-series with exponentially decaying autocorrelation and therefore short-memory properties. Since the available bandwidth time-series is tightly related to the cross-traffic, we see automatic tuning to a specific network path as a crucial factor that makes our approach applicable to a wide range of behaviors of available bandwidth time-series across the Internet. The second improvement is an efficient procedure that helps to verify the correctness of the prediction and is therefore the kernel of efficient parameter adjustment to a specific network path. We accompany both the D-statistic and the R-statistic with solid theoretical argumentation proving the validity of the chosen verification procedures. The work therefore represents a combination of theoretical results and experimental verification of the applicability of these results to various real-life datasets. We test the efficiency of the weight-based approach with decay rate $\gamma$ on publicly available traffic datasets as well as on datasets of available bandwidth collected by us on a number of Internet paths.

Chapter 6

Re-route Sequence Planning Problem

As stated in Chapter 3, a tele-operation can be viewed as a number of flows with predefined QoS requirements. These flows need to be allocated across the Internet through the best possible network paths. We believe that a promising approach is to consider allocation of tele-operation "facilitators" at geographically dispersed parts of the Internet. The idea is to set up an application-level virtual network that facilitates allocation and reallocation of channels through different network paths. There are a number of efforts that consider dynamic high-level route allocation and deallocation with the help of overlay networks [2, 37] for optimizing network delay or guaranteeing compliance with given QoS requirements.
In this chapter we take one step towards setting up such an infrastructure by considering a mathematical aspect of the Reroute Sequence Planning problem [32]. The problem can be briefly stated as follows. Assume that a set of channels with given bandwidth requirements, or demands, is allocated through the currently optimal set of network paths. Some time later, another optimal set of network paths is computed. The second set of paths reflects the changed QoS conditions of the network environment and therefore may differ from the previous set. The Reroute Sequence Planning problem chooses the optimal sequence of channel reallocations from the old set of paths to the new set of paths that minimizes the amount of extra network-link bandwidth required for conducting the reallocation procedure.

This chapter deals with the well-known mathematical "Dynamic Vector Summation" problem, to which the Reroute Sequence Planning problem can easily be reduced. In particular, we give a tight bound on the maximum extra link bandwidth capacity required for carrying out the reroute procedure. We also provide an algorithm that finds, in polynomial time, a sequence of route reallocations that complies with the proposed bandwidth bound.

6.1 Introduction

Compact vector summation as a pure mathematical tool was introduced and developed at the beginning of the 20th century by Levy [35] and Steinitz [55]. The problem can be stated as follows: given vectors in a normed linear space, find their rearrangement $\pi$ for which the maximum deviation $R_{\pi}$ of partial sums from zero is as small as possible, or estimate $R_n$ under the best rearrangement $\pi$. There are at least two notable functional-analytic problems where these methods are used. One of them is the description of the set of sums of a conditionally convergent series in a topological vector space. Another is the famous Kolmogorov problem (open so far) on the system of convergence of abstract Fourier series under a rearrangement of the orthonormal basis. For an extensive bibliography devoted to this subject, see [25, 33, 22, 10]. There is also a series of problems in scheduling theory that reduce to the problem stated in the next section (see [52]). The Reroute Sequence Planning (RSP) problem described by Makai [41] is an example of such a scheduling problem in telecommunication networks. Below we give a description of the RSP problem.

A network consists of a set $V$ of vertices and a set $E$ of edges connecting certain pairs from $V$ (not necessarily all pairs), with $|E| = d$ (the number of edges is $d$). Consider a system of paths $\mathcal{P} = (P_1, \ldots, P_n)$, the path $P_i$ connecting the vertices $s_i$ and $t_i$, $i = 1, \ldots, n$. By a path connecting $s_i$ and $t_i$ we mean a broken line consisting of edges with the starting point at $s_i$ and endpoint at $t_i$ (of course one can state the problem formally, entirely in terms of abstract graph theory).

Figure 6.1: Network graph example.

Fig. 6.1 shows an example network consisting of 8 vertices and 8 edges. Consider a system $\mathcal{P} = (P_1, P_2)$ of 2 paths, where $P_1$ connects $s_1$ and $t_1$ and consists of the edges $e_1$, $e_4$, $e_6$ and $e_7$, while $P_2$ connects $s_2$ and $t_2$ and consists of $e_2$, $e_4$ and $e_8$. Each of the paths $P_k$, $1 \le k \le n$, imposes a demand $v_k$ on each of the edges of the path. If an edge belongs to several paths, then the total demand $v(e)$ imposed on this edge is the sum of the corresponding demands. For example, in Fig. 6.1 the edge $e_4$, which is common to $P_1$ and $P_2$, has the total demand $v_1 + v_2$.
The demand on the edges $e_1, e_6$ and $e_7$ is $v_1$, while the demand on $e_2$ and $e_8$ is $v_2$. Each of the edges $e_k$ has a capacity $C_k$ assigned to it. The capacity of an edge represents the maximal demand that can be imposed on that edge. If for every edge in $E$ the total imposed demand does not exceed the edge capacity, then the network is considered to operate without violation.

In reroute sequence planning (RSP) the system $\mathcal{P} = (P_1, \dots, P_n)$ of paths should be replaced by another system $\mathcal{Q} = (Q_1, \dots, Q_n)$ that connects the same vertices $s_i$ and $t_i$, $i = 1, \dots, n$. Each path $Q_i$ in $\mathcal{Q}$ has the same demand $v_i$ as the corresponding path $P_i$ in $\mathcal{P}$. In a reroute sequence procedure, at each step only one $P_i$ is replaced by the corresponding $Q_i$. The RSP problem consists of finding a sequence $g_1, \dots, g_n$, a rearrangement of $\{1, \dots, n\}$, such that the successive replacement of $P_{g_1}$ by $Q_{g_1}$, ..., $P_{g_n}$ by $Q_{g_n}$ does not cause the demand $v(e)$ after any replacement to exceed the capacity $C(e)$ for any $e$.

Denote by $x_i(e)$ the change in demand for the edge $e$ when $P_i$ is replaced by $Q_i$. We have
\[
x_i(e) =
\begin{cases}
-v_i, & \text{if } e \in P_i,\ e \notin Q_i, \\
v_i, & \text{if } e \notin P_i,\ e \in Q_i, \\
0, & \text{otherwise.}
\end{cases}
\]
The vectors $x_1, \dots, x_n$ belong to the space $\mathbb{R}^d$, where $d$, the dimension of the space, is the number of edges. Denote by $L_0(e)$ the starting demand on the edge $e$, and assume we have chosen the permutation defined by a sequence $g_1, \dots, g_n$ of replacements $P_{g_1}$ by $Q_{g_1}$, $P_{g_2}$ by $Q_{g_2}$, ..., $P_{g_n}$ by $Q_{g_n}$. Then the demand on the edge $e$ after the $l$-th step is
\[
L_0(e) + x_{g_1}(e) + \dots + x_{g_l}(e).
\]
This implies that the demand on the edge $e$ at each step does not exceed
\[
L_0(e) + \max_{1 \le l \le n} \max_{e \in E} |x_{g_1}(e) + \dots + x_{g_l}(e)|
= L_0(e) + \max_{1 \le l \le n} \| x_{g_1} + \dots + x_{g_l} \|_\infty
= L_0(e) + |||x_\pi|||,
\]
where $\|\cdot\|_\infty$ denotes the maximum-norm, $|||x_\pi||| = \max_{1 \le l \le n} \| x_{\pi(1)} + \dots + x_{\pi(l)} \|$, and $\pi$ is a permutation of the indices $\{1, \dots, n\}$.

The main problem (on compact vector summation) we study in this chapter is as follows.

PROBLEM: Given elements $x_1, \dots, x_n$ from a normed space $X$:
(i) Estimate $\min_\pi |||x_\pi|||$.
(ii) Find a permutation $\pi\colon \{1, \dots, n\} \to \{1, \dots, n\}$ ensuring the estimate in (i).

We will study both theoretical and algorithmic aspects of the Problem. The following lemma plays a crucial role in solving the problem and in the algorithmic construction of an appropriate permutation $\pi$. For a permutation $\pi\colon \{1, \dots, n\} \to \{1, \dots, n\}$ we write $\pi = (k_1, \dots, k_n)$, where $k_i = \pi(i)$, $i = 1, \dots, n$. The permutation $\pi_b = (k_n, k_{n-1}, \dots, k_1)$ is called the backward permutation. Given an ordered collection of vectors $x = (x_1, \dots, x_n) \subset X$ and a collection of signs $\theta = (\theta_1, \dots, \theta_n)$, $\theta_i = \pm 1$, we put
\[
|||x_{\pi,\theta}||| = \max_{1 \le l \le n} \| x_{k_1}\theta_{k_1} + \dots + x_{k_l}\theta_{k_l} \|.
\]

Lemma 6.2.1. Let $x_1, \dots, x_n \in X$ and $s = -\sum_{k=1}^{n} x_k$. Then for any permutation $\pi\colon \{1, \dots, n\} \to \{1, \dots, n\}$ and any collection of signs $\theta = (\theta_1, \dots, \theta_n)$, $\theta_i = \pm 1$, $i = 1, \dots, n$,
\[
|||x_\pi||| + |||x_{\pi_b,\theta}||| \ge 2\,|||x_{\pi^*}||| - \|s\|, \tag{6.2.1}
\]
where $\pi^* = \pi(k^+_m), \pi(k^+_{m-1}), \dots, \pi(k^+_1), \pi(k^-_1), \pi(k^-_2), \dots, \pi(k^-_{n-m})$, and $k^+_1 < k^+_2 < \dots < k^+_m$ are all the indices with $\theta_{k^+_1} = \theta_{k^+_2} = \dots = \theta_{k^+_m} = +1$, while $k^-_1 < k^-_2 < \dots < k^-_{n-m}$ are all the indices with $\theta_{k^-_1} = \theta_{k^-_2} = \dots = \theta_{k^-_{n-m}} = -1$.

Proof. Using the fact that $|||\cdot|||$ enjoys the properties of a norm in the space of sequences, and applying the triangle inequality for $|||\cdot|||$ (every partial sum along $\pi^*$ can be written as one half of a combination of the total sum, a partial sum along $\pi$, and a partial sum along $(\pi_b, \theta)$), we obtain
\[
|||x_\pi||| + |||x_{\pi_b,\theta}||| \ge 2\,|||x_{\pi^*}||| - \|s\|.
\]
The lemma is proved.
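Before turning to the Transference Lemma, a small toy computation may help fix the correspondence between reroute steps and the quantity $|||x_\pi|||$. The sketch below is only an illustration under assumed data: the helper names, edge indices, routes, and demands are made up and do not reproduce Fig. 6.1 exactly.

```python
def change_vector(old_edges, new_edges, demand, num_edges):
    """x_i in R^d: -v_i on edges the channel leaves, +v_i on edges it joins, 0 elsewhere."""
    x = [0.0] * num_edges
    for e in old_edges - new_edges:
        x[e] = -demand
    for e in new_edges - old_edges:
        x[e] = demand
    return x

def prefix_sum_norm(vectors, order):
    """|||x_pi|||: the largest maximum-norm over the partial sums x_{pi(1)} + ... + x_{pi(l)}."""
    partial = [0.0] * len(vectors[0])
    worst = 0.0
    for i in order:
        partial = [p + v for p, v in zip(partial, vectors[i])]
        worst = max(worst, max(abs(p) for p in partial))
    return worst

# Two channels re-routed over a network with 8 edges (indexed 0..7).
x1 = change_vector({0, 3, 5, 6}, {1, 3, 7}, demand=2.0, num_edges=8)
x2 = change_vector({1, 3, 7}, {0, 5, 6}, demand=1.0, num_edges=8)
for order in ([0, 1], [1, 0]):
    print(order, prefix_sum_norm([x1, x2], order))
```

The extra capacity a reallocation order may require on top of the starting demands $L_0(e)$ is controlled exactly by the value printed for that order, which is why the rest of the chapter is devoted to estimating and minimizing $|||x_\pi|||$.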
The next lemma converts our rearrangement problem into a well-known combinatorial problem on choosing signs.

Lemma 6.2.2 (Transference Lemma; an extension of the Transference Lemma proved in [16]). Let $X$ be a normed space, $x_1, \dots, x_n \in X$, $s = -\sum_{k=1}^{n} x_k$.
(i) There exists a permutation $\sigma\colon \{1, \dots, n\} \to \{1, \dots, n\}$ such that for any $\theta = (\theta_1, \dots, \theta_n)$, $\theta_i = \pm 1$, $i = 1, \dots, n$, the following inequality holds:
\[
|||x_\sigma||| \le |||x_{\sigma_b,\theta}||| + \|s\|, \tag{6.2.2}
\]
where $\sigma_b$ stands for the backward permutation, in which the indices go in the opposite direction. As the permutation $\sigma$ one can take the optimal permutation, i.e. the one at which $|||x_\pi|||$ attains its minimum.
(ii) If $\sum_{k=1}^{n} x_k = 0$, then for the optimal permutation $\sigma$ and any $\theta = (\theta_1, \dots, \theta_n)$, $\theta_i = \pm 1$, $i = 1, \dots, n$,
\[
|||x_\sigma||| \le |||x_{\sigma_b,\theta}|||. \tag{6.2.3}
\]

Proof. (i) follows from Lemma 6.2.1 if we take as $\pi$ in (6.2.1) the optimal permutation $\sigma$. (ii) follows from (i).

Remark. Part (ii) of Lemma 6.2.2 has been proved in [16]. A straightforward use of (6.2.3) along with the triangle inequality would lead to a worse version of (6.2.2),
\[
|||x_\sigma||| \le |||x_{\sigma_b,\theta}||| + 2\|s\|,
\]
which worsens the related algorithmic estimates.

6.3 Application of the Transference Lemma to the Problem. An estimation of $\min_\pi |||x_\pi|||$

Theorem 6.3.1. For any collection $(x_1, \dots, x_n)$ from a normed space $X$ there exists a permutation $\sigma\colon \{1, \dots, n\} \to \{1, \dots, n\}$ such that
\[
|||x_\sigma||| \le E \max_{1 \le k \le n} \Big\| \sum_{i=1}^{k} x_i r_i \Big\| + \Big\| \sum_{i=1}^{n} x_i \Big\|
\le 2\,E\Big\| \sum_{i=1}^{n} x_i r_i \Big\| + \Big\| \sum_{i=1}^{n} x_i \Big\|, \tag{6.3.1}
\]
where $r_1, \dots, r_n$ are Rademacher random variables (independent and taking the two values $-1$ and $+1$ with probability $1/2$), and $E$ is the operation of expectation.

Proof. It is enough to average both sides of the inequality (6.2.2) of Lemma 6.2.2 with respect to all collections of signs $\theta = (\theta_1, \dots, \theta_n)$ and then use the Levy inequality.

Remark. In fact Lemma 6.2.2 gives a stronger result (see [16]), namely
\[
\frac{1}{n!}\sum_{\pi} |||x_\pi||| \le 2\,E\Big\| \sum_{i=1}^{n} x_i r_i \Big\| + \Big\| \sum_{i=1}^{n} x_i \Big\|.
\]
In other words, (6.3.1) holds for a typical permutation $\sigma$. In the case of the $l_2$-norm (or, more generally, for norms of type 2) the last inequality implies
\[
\frac{1}{n!}\sum_{\pi} |||x_\pi|||^2 \le C\Big( \sum_{i=1}^{n} \|x_i\|^2 + \Big\| \sum_{i=1}^{n} x_i \Big\|^2 \Big),
\]
where $C$ depends only on the norm. The inequality goes back to [22], where it was stated for the scalar case. To our knowledge there is no estimation of $\min_\pi |||x_\pi|||$ that is better than (6.3.1) for an infinite-dimensional $X$.

In the rest of this section and in Section 6.4 we will be finding estimations of $|||x_\sigma|||$ for an appropriate $\sigma$ through the Transference Lemma and a known estimation of $|||x_\theta|||$ for an appropriate collection of signs $\theta$. Such a well-known estimation of $|||x_\theta|||$ is the following.

Theorem 6.3.2 ([18]; generalization by [24]). Let $X$ be a normed space of dimension $d$, $x_1, \dots, x_n \in X$, $\|x_i\| \le 1$, $i = 1, \dots, n$. Then there exists a collection of signs $\theta = (\theta_1, \dots, \theta_n)$ such that $|||x_\theta||| \le 2d$.

Corollary 6.3.3. If $x_1, \dots, x_n \in X$, $\dim X = d$, $\|x_i\| \le 1$, $i = 1, \dots, n$, then there is a permutation $\sigma\colon \{1, \dots, n\} \to \{1, \dots, n\}$ such that
\[
|||x_\sigma||| \le 2d + \|x_1 + \dots + x_n\|. \tag{6.3.3}
\]

Remark. The inequality (6.3.3) is the famous Steinitz (1916) inequality. It can be proved in a straightforward manner, without appealing to Lemma 6.2.2 (see [33]). We believe, however, that the proof using Lemma 6.2.2 is simpler.

6.4 The case of $l_\infty^d$

This is the $d$-dimensional space $\mathbb{R}^d$ with the maximum-norm. This case is the most important for applications in scheduling theory. For low dimensions the general estimate (6.3.3) is the best among the known ones. However, in most applications to scheduling theory the dimension $d$ is much larger than $n$.
Instead of using the Dvoretzky-Hanani theorem in this case we exploit the following assertion.

Theorem 6.4.1. If $x_1, \dots, x_n \in l_\infty^d$, $\|x_i\| \le 1$, $i = 1, \dots, n$, then for the Rademacher random variables $r_1, \dots, r_n$:
(i)
\[
E\Big\| \sum_{k=1}^{n} x_k r_k \Big\|^2 \le C\, n \ln d,
\]
where $C$ is an absolute constant;
(ii)
\[
\frac{1}{2^n}\sum_{\theta} |||x_\theta||| \le C_1 \sqrt{n \ln d},
\]
where $C_1$ is an absolute constant;
(iii) There exists a $\theta$ such that
\[
|||x_\theta||| \le C_1 \sqrt{n \ln d}. \tag{6.4.1}
\]

Proof. (i) Denote $X^{(k)} = \sum_{i=1}^{n} x_i^{(k)} r_i$, $k = 1, \dots, d$, where $x_i^{(k)}$ is the $k$-th coordinate of $x_i$. Using the integration by parts formula we get for any $\delta > 0$
\[
E\Big\| \sum_{i=1}^{n} x_i r_i \Big\|^2
= E \max_{1 \le k \le d} |X^{(k)}|^2
= \int_0^\infty P\Big( \max_{1 \le k \le d} |X^{(k)}| > t \Big)\, dt^2
= \int_0^{\delta}(\cdot)\, dt^2 + \int_{\delta}^{\infty}(\cdot)\, dt^2
\le \delta^2 + \int_{\delta}^{\infty} \sum_{k=1}^{d} P\big(|X^{(k)}| > t\big)\, dt^2
\le \delta^2 + d \int_{\delta}^{\infty} P\big(|Y| > t\big)\, dt^2,
\]
where $Y$ is the $X^{(l)}$ for the $l$ at which $\int_{\delta}^{\infty} P\big(|X^{(k)}| > t\big)\, dt^2$ attains its maximum. Now using the following well-known estimation for the tail probability,
\[
P(|Y| > t) \le 2\exp\Big\{ -\frac{t^2}{2\sum_i y_i^2} \Big\},
\]
where $Y = \sum_{i=1}^{n} y_i r_i$, we get
\[
E\Big\| \sum_{i=1}^{n} x_i r_i \Big\|^2 \le \delta^2 + d \int_{\delta}^{\infty} 2 e^{-t^2/(2n)}\, t\, dt = \delta^2 + 2dn\, e^{-\delta^2/(2n)}.
\]
Putting $\delta^2 = 2n \ln 2d$ we find
\[
E\Big\| \sum_{i=1}^{n} x_i r_i \Big\|^2 \le n(1 + 2\ln 2d).
\]
This proves (i).

(ii) is a consequence of (i). Indeed,
\[
\frac{1}{2^n}\sum_{\theta} |||x_\theta||| = E\,|||x_r||| \le \big(E\,|||x_r|||^2\big)^{1/2} \le C\Big(E\Big\| \sum_{i=1}^{n} x_i r_i \Big\|^2\Big)^{1/2} = C_1\sqrt{n \ln d}.
\]
(iii) is an immediate consequence of (ii).

Remark. 1. Theorem 6.4.1 (iii) was first proved by the use of combinatorial methods by Spencer [53]. Statement 6.4.1 (ii) shows that not only does the desired $\theta$ exist, but (6.4.1) holds for a typical $\theta$.
2. In another paper Spencer [54] proved a surprising result saying that in the case $d = n$ there is a collection of signs $\theta = (\theta_1, \dots, \theta_n)$ such that $|||x_\theta||| \le C\sqrt{d}$, where $C$ is an absolute constant. However, as the following example shows, in the case of arbitrary $d$ and $n$ Theorem 6.4.1 (iii) cannot be improved.

Example. Consider a matrix whose rows are all possible sequences of length $n$ consisting of $+1$'s and $-1$'s. Our vectors $x_1, \dots, x_n$ in $l_\infty^d$ are all the columns of the matrix. If one of the columns is multiplied by $-1$, we still get a matrix consisting of all different $\pm 1$-sequences of length $n$, and in particular there is a sequence consisting of $+1$'s only. This implies that $|||x_\theta||| = n$ for any $\theta = (\theta_1, \dots, \theta_n)$. In this example $d = 2^n$, so $|||x_\theta||| = n$ coincides, up to a constant, with $\sqrt{n \ln d}$ in (6.4.1).

The application of the transference lemma leads to the following assertion.

Theorem 6.4.2. If $x_1, \dots, x_n \in l_\infty^d$, $\|x_i\| \le 1$, $i = 1, \dots, n$, then there is a permutation $\sigma\colon \{1, \dots, n\} \to \{1, \dots, n\}$ such that
\[
|||x_\sigma||| \le C\sqrt{n \ln d} + \|x_1 + \dots + x_n\|,
\]
where $C$ is an absolute constant.

6.5 An algorithm for finding $\sigma$

In all previous results concerning the estimation of $\min_\pi |||x_\pi|||$ (Theorem 6.3.1, Corollary 6.3.3, Theorem 6.4.2) we stated only the existence of a permutation $\sigma$ ensuring the corresponding estimate. In this section we give a constructive way of finding $\sigma$, provided that the respective (much simpler) sign-problem possesses an algorithmic solution.

Let us show first that a greedy algorithm (which seems natural) is not, in general, the best. Given vectors $x_1, \dots, x_n \in X$, a greedy algorithm chooses at each step a vector that minimizes the norm of the next partial sum. In other words, on step 1 it chooses an element $x_{n_1}$ that has minimal norm. On step 2 it selects an element $x_{n_2}$, $n_2 \ne n_1$, such that $\|x_{n_1} + x_{n_2}\| \le \|x_{n_1} + x_{n_k}\|$ for any $n_k \ne n_1$, and so on.

The following simple example, constructed by Jakub Wojtaszchik (oral communication), shows that a greedy algorithm is not in general the best one even in the 2-dimensional case.

Example.
Consider $n$ groups of vectors, each consisting of the three vectors $(1, 1)$, $(2, -3)$, and $(-3, 2)$. Obviously, the greedy algorithm chooses at the first $n$ steps the vectors $(1,1), (1,1), \dots, (1,1)$. Therefore, if $\pi_g$ is the greedy permutation, then $|||x_{\pi_g}||| \ge n$; in fact, $|||x_{\pi_g}||| = n + 2$. The optimal algorithm, on the other hand, arranges the vectors as follows: $(1, 1), (2, -3), (-3, 2), (1, 1), (2, -3), (-3, 2)$, etc. For the optimal permutation $\pi_0$ we have $|||x_{\pi_0}||| = 3$. Therefore, the ratio $|||x_{\pi_g}||| / |||x_{\pi_0}||| = (n+2)/3$ can be made arbitrarily large.

Consider now a collection $x_1, \dots, x_n \in X$, $\|x_i\| \le 1$, $i = 1, \dots, n$, where $X$ is a general normed space. Assume that for any permutation $\pi$ there is a collection of signs $\theta = (\theta_1, \dots, \theta_n)$ such that $|||x_{\pi,\theta}||| \le D$ for some $D > 0$, and that the desired $\theta$ can be found algorithmically. This is the case for the vectors involved in Theorem 6.3.2 and Theorem 6.4.1.

Remark. For a general $d$-dimensional $X$ such a polynomial sign-algorithm with $D = 2d$ was constructed in [6]; for $X = l_\infty^d$ such an algorithm with $D = C\sqrt{n \ln d}$ was described by Spencer [53]. Another algorithmic-type result that uses a sign procedure was found earlier in [15]. An assertion similar to Theorem 6.5.1 was stated without proof by Makai [41] for the case $s = 0$; his arguments are based on Transference Lemma 6.2.2.

Below is the description of the permutation-algorithm. We start with an arbitrary permutation $\pi\colon \{1, \dots, n\} \to \{1, \dots, n\}$. Then we find a $\theta$ satisfying $|||x_{\pi_b,\theta}||| \le D$, where $\pi_b$ is the backward permutation (see Lemma 6.2.1). According to Lemma 6.2.1,
\[
|||x_\pi||| + |||x_{\pi_b,\theta}||| \ge 2\,|||x_{\pi_1}||| - \|s\| \tag{6.5.1}
\]
for some permutation $\pi_1$. An easy algorithm determining $\pi_1$ is described in Lemma 6.2.1 and runs in $O(n)$ time. Using (6.5.1) we get
\[
|||x_{\pi_1}||| \le \frac{1}{2}\big( |||x_\pi||| + |||x_{\pi_b,\theta}||| + \|s\| \big) \le \frac{1}{2}\big( |||x_\pi||| + D + \|s\| \big). \tag{6.5.2}
\]
Then we find $\theta^{(1)}$ such that $|||x_{\pi_{1b},\theta^{(1)}}||| \le D$ and a permutation $\pi_2$ such that
\[
|||x_{\pi_1}||| + |||x_{\pi_{1b},\theta^{(1)}}||| \ge 2\,|||x_{\pi_2}||| - \|s\|.
\]
In the same way as in (6.5.2) we get
\[
|||x_{\pi_2}||| \le \frac{1}{2^2}\,|||x_\pi||| + \frac{3}{2^2}\big( D + \|s\| \big).
\]
After $l$ iterations we get
\[
|||x_{\pi_l}||| \le \frac{1}{2^l}\,|||x_\pi||| + \Big(1 - \frac{1}{2^l}\Big)\big( D + \|s\| \big) \le D + \|s\| + \delta,
\]
where $\delta = \frac{1}{2^l}\,|||x_\pi|||$. Noticing that $|||x_\pi||| \le n$, by an appropriate choice of $l$ we can make $\delta$ smaller than any fixed $\varepsilon > 0$ (to get $\delta < \varepsilon$, $l = [\log_2(n/\varepsilon)] + 1$ iterations are needed, where $[x]$ denotes the integral part of $x$). Thus we have proved the following.

Theorem 6.5.1. Let $X$ be a normed space, $x = (x_1, \dots, x_n)$, $x_1, \dots, x_n \in X$, $\|x_i\| \le 1$, $i = 1, \dots, n$. Assume that for any permutation $\pi\colon \{1, \dots, n\} \to \{1, \dots, n\}$ there is an algorithm with polynomial complexity to define $\theta = (\theta_1, \dots, \theta_n)$ such that
\[
|||x_{\pi,\theta}||| \le D,
\]
where $D$ does not depend on $\pi$. Then for any $\varepsilon > 0$ there exists an algorithm with polynomial complexity to define $\sigma$ such that
\[
|||x_\sigma||| \le D + \|s\| + \varepsilon,
\]
where $s = -\sum_{i=1}^{n} x_i$. The complexity of the algorithm is $O(C \cdot \log(n/\varepsilon))$, where $C$ is the complexity of the sign-algorithm (for finding $\theta$).
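As a small illustration of why the constructive procedure above is preferable to the greedy heuristic, the following sketch (helper names are ours) replays, in the maximum-norm, the greedy-versus-periodic comparison from the example at the beginning of this section:

```python
def prefix_sum_norm(vectors, order):
    """Largest maximum-norm of a partial sum taken in the given order."""
    partial = [0.0] * len(vectors[0])
    worst = 0.0
    for i in order:
        partial = [p + v for p, v in zip(partial, vectors[i])]
        worst = max(worst, max(abs(p) for p in partial))
    return worst

def greedy_order(vectors):
    """At each step pick the vector minimizing the max-norm of the next partial sum."""
    partial = [0.0] * len(vectors[0])
    remaining = list(range(len(vectors)))
    order = []
    while remaining:
        best = min(remaining,
                   key=lambda i: max(abs(p + v) for p, v in zip(partial, vectors[i])))
        partial = [p + v for p, v in zip(partial, vectors[best])]
        remaining.remove(best)
        order.append(best)
    return order

# n copies of the group (1, 1), (2, -3), (-3, 2) from the example above.
n = 5
vectors = [(1, 1), (2, -3), (-3, 2)] * n
print(prefix_sum_norm(vectors, greedy_order(vectors)))      # n + 2 = 7
print(prefix_sum_norm(vectors, list(range(len(vectors)))))  # 3
```

For growing $n$ the greedy value grows like $n + 2$ while the periodic order stays at 3, matching the ratio $(n+2)/3$ computed in the example.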
Chapter 7

Summary and Future Research

In this chapter we summarize the findings described in the previous chapters and briefly outline future research directions suggested by these findings.

7.1 Summary

This dissertation proposes to consider a new quantity that describes the behavior of available bandwidth, called the "crossing probability". The crossing probability is the likelihood that the available bandwidth time series drops below a quality-critical threshold $M$ during the quality-critical time-period $T$. Both the threshold and the time-period have to be defined by the specification of the real-time task that is to be executed.

At the beginning we concentrated our efforts on estimating the crossing probability with a parametric model that relies on observations of available bandwidth obtained with the help of the Pathload measurement tool. We used Pathload for collecting available bandwidth measurements because our research showed that it was the most robust and accurate measurement tool among those available. We successfully applied the ARCH model used in econometrics to modeling available bandwidth behavior. In particular, we combined two known estimation techniques used in econometrics, thus obtaining our method for quick estimation of the model parameters together with their confidence intervals. We used known results from extreme values theory for the computation of the crossing probability for an arbitrary threshold $M$ and time-period $T$. We then verified the efficiency of our approach on a number of Internet paths.

Before going any further with available bandwidth modeling, the dissertation discusses some work on improving the One Way Delay trend detection mechanism of present available bandwidth measurement tools. Instead of using empirically defined cutoff thresholds for trend-detection metrics, we propose to detect a trend based on the p-value of a linear-regression t-test conducted against the slope of one way delay as a function of packet sequence number. We also suggest a simple way of dealing with the interrupt coalescence / context switch effect. Our experiments clearly indicated that both Type I and Type II errors of trend detection were significantly smaller with the t-test than the errors reported by the extensively used PCT/PDT test.

The next part of the dissertation deals with a non-parametric approach that relies on binary observations from $\{0, 1\}$, where 0 indicates that the available bandwidth at a particular time-instance is higher than a specified critical threshold and 1 indicates that it is lower than the critical threshold. Our non-parametric model answers the question: "What is the probability that at least one '1' will be present among the next $N$ observations of the indicators?", in contrast to our parametric model, which answers the question: "What is the probability that the minimum of the available bandwidth time series over the next $N$ observations drops below the level $M$?" With the non-parametric approach relying on $\{0, 1\}$ observations, one can predict what happens within the frame of the next couple of seconds while considering a history time-period of 10 minutes. The parametric approach, which relies on slow available bandwidth measurement tools, needs observations collected over a period of at least 2 hours to predict what happens within the next 5 minutes.

In addition to introducing a non-parametric model for the "crossing probability", we developed a self-verification procedure that combines the predicted probabilities of the crossing event with the actual realizations/non-realizations of that event into a statistic that follows the standard normal distribution if the predicted probabilities "match" the oncoming realizations. We also present a theoretical proof of the fact that the proposed self-verification statistic really follows a standard normal distribution given that the prediction module works correctly.
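For illustration, a simplified sketch of a verification statistic of this flavor is given below. It is not the exact D- or R-statistic of Chapter 5; the helper name and the assumption that prediction errors are close to independent are ours. The statistic standardizes the accumulated difference between observed crossing indicators and predicted crossing probabilities, so values far outside roughly plus or minus 2 signal that the predictions do not match reality.

```python
import math

def verification_statistic(predicted_probs, outcomes):
    """Standardized sum of prediction errors.

    predicted_probs : predicted crossing probabilities p_i in (0, 1)
    outcomes        : observed indicators I_i (1 if the crossing happened, else 0)

    If the predictions are correct, each term (I_i - p_i) has mean 0 and
    variance p_i * (1 - p_i), so the standardized sum is approximately
    standard normal for a long enough observation window.
    """
    if len(predicted_probs) != len(outcomes):
        raise ValueError("need one outcome per prediction")
    num = sum(o - p for p, o in zip(predicted_probs, outcomes))
    var = sum(p * (1.0 - p) for p in predicted_probs)
    return num / math.sqrt(var) if var > 0 else 0.0

# A large absolute value indicates the predicted probabilities do not match
# the observed crossings, e.g. because the history window size is inappropriate.
print(verification_statistic([0.1, 0.05, 0.2, 0.1], [0, 0, 1, 0]))
```

In the dissertation, exactly this kind of check drives the automatic adjustment of the prediction window described next.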
The self-verification procedure opened further opportunities to adjust the prediction module to a particular network path. We output predictions of the crossing probability based on a number of sizes of the history time-frame used for the prediction. For the actual prediction we use the window size that is "confirmed" by the self-verification test.

We then added weights to the crossing-probability prediction. Even within the frame of a history window, the most recent past enters the prediction with a higher weight than the less recent past. We adjust the rate at which the weights decay to a particular network path. Experiments with real-life data-sets of available bandwidth time-series confirmed that introducing weights significantly increases the range of history window-sizes valid for crossing-probability prediction.

The last part of this dissertation is dedicated to future research on possible actions that communicating parties may undertake if the crossing probability becomes unacceptably large. One of the promising scenarios for supporting a tele-operation under a degraded quality of service is the reallocation of part of the communication channels through different network paths with the help of external facilitator nodes. In the context of such a reallocation arises an interesting and important problem, "Reroute Sequence Planning (RSP)". The problem chooses the optimal (or nearly optimal) sequence of channel reallocations (once they are planned) in a way that minimizes the extra capacities required along the network for carrying out the reallocation procedure. In the last part of this dissertation we develop a solid theoretical basis for an algorithm that finds a lower bound for the extra capacity needed for carrying out the reallocation, and that also finds in polynomial time a sequence of reallocations from old paths to new paths that fits the mentioned bound.

Parts of the dissertation were published in the following references: [13, 12, 14].

7.2 Future research

7.2.1 QoS overlay network infrastructure

As described in Chapter 6, we see a promising direction in setting up an overlay infrastructure that helps in the dynamic allocation and reallocation of communication channels through various network paths in a way that minimizes the likelihood of a failure of remote real-time task execution. Chapter 6 makes a step towards setting up such an infrastructure. However, it is clear that there are still many unsolved directions to follow. These include algorithms for finding the best set of paths with respect to particular QoS-related probabilistic requirements, detecting an appropriate time-moment for triggering the reroute sequence procedure, finding the best reroute-sequence procedure with respect to probabilistic (rather than hard-coded as in Chapter 6) QoS characteristics, and many others.

7.2.2 QoS model improvement

The Quality of Service model introduced in Chapter 3 assumes that a real-time task can be specified with some fixed value of the rate (bandwidth) that the task requires for safe execution. For most tele-operations, however, the vast majority of the tele-operation traffic is constituted by real-time video. Traffic associated with video transfer may be bursty. Various encoding formats, for example, send only the difference between the current video frame and the previous one when the camera moves slowly and there are no significant scene changes (B-frames).
If a rapid scene change takes place, then the whole newly captured frame has to be sent at once, which causes bursts in the video traffic. It may not be efficient to reserve bandwidth for video traffic at such a "burst level" because most of the time the high bandwidth will remain unused. On the other hand, reservation of bandwidth at lower levels may lead to unexpected degradation of service. Therefore a precisely estimated "crossing probability" as introduced in Chapter 3 may not do a good job of estimating the likelihood of an actual degradation of service during the execution of a remote real-time task. Under a more complex model, the likelihood of degradation of service for the period of time $T$ can be viewed as the probability that the traffic associated with remote real-time task execution exceeds the available bandwidth time-series at some moment during the critical period of time $T$. In order to estimate such a likelihood, one has to model the traffic associated with the task in addition to modeling the available bandwidth time-series. Liu and Mao [38] analyze various prediction algorithms for real-time video traffic and introduce another prediction algorithm. One of our goals therefore is to unify the results presented in [38] with our results in available bandwidth estimation and arrive at a more robust model for estimating the likelihood of a failure of a remote real-time task.

We conclude this dissertation with the hope that the introduced parametric and non-parametric approaches to modeling available end-to-end bandwidth, together with the improvement of the present OWD trend detection mechanism and the novel self-verification / self-adjustment module, will facilitate the extension of tele-operations from "traditional" isolated and fully controlled lines to the inexpensive and easily accessible Internet.

BIBLIOGRAPHY

[1] Auckland-II trace archive. http://pma.nlanr.net/Traces/long/auck2.html.
[2] David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris. Resilient overlay networks. In 18th ACM Symposium on Operating Systems Principles (SOSP), 2001.
[3] Javier Aracil, Richard Edell, and Pravin Varaiya. A phenomenological approach to Internet traffic self-similarity. In 35th Annual Allerton Conference on Communication, Control and Computing, 1997.
[4] M. Arlitt, R. Friedrich, and T. Jin. Workload characterization of a web proxy in a cable modem environment. In ACM Performance Evaluation Review, 1999.
[5] M. Arlitt and T. Jin. A workload characterization study of the 1998 World Cup web site. In IEEE Network, 2000.
[6] I. Barany and V.S. Grinberg. On some combinatorial questions in finite-dimensional spaces. Linear Algebra and its Applications, 41:1-9, 1981.
[7] J. Beran, M. Sherman, S. Taqqu, and W. Willinger. Long-range dependence in variable-bit-rate video traffic. In IEEE Transactions on Communications, 1995.
[8] M. Borella and G. Brewster. Measurement and analyses of long-range packet dependent behavior of Internet packet delay. In IEEE INFOCOM, 1998.
[9] Arup Bose and Kanchan Mukherjee. Estimating the ARCH parameters by solving linear equations. Journal of Time Series Analysis, 24(2):127-136, 2002.
[10] J. Bourgain. Λ-sets in analysis: results, problems and related aspects. Handbook of the Geometry of Banach Spaces, I:195-232, 2001.
[11] Peter J. Brockwell and R.A. Davis. Time Series: Theory and Methods. Springer, New York, 1991.
[12] Alexander Chobanyan, Matt Mutka, Zhiwei Cen, and Ning Xi.
One way delay trend detection for available bandwidth measurement. In IEEE GLOBECOM 2005, 2005.
[13] Alexander Chobanyan, Matt Mutka, Shlomo Levental, and Ning Xi. Behavior of available end-to-end bandwidth: Non-parametric approach. In International Conference on Quantitative Evaluation of SysTems (QEST), 2006.
[14] Alexander Chobanyan, Matt Mutka, V.S. Mandrekar, and Ning Xi. Modeling available bandwidth for an efficient QoS characterization of a network path. In 4th IFIP-TC6 Networking Conference, Networking 2005, 2005.
[15] Sergei Chobanyan. The structure of the set of sums of a conditionally convergent series in a normed space. Mat. Sb. (N.S.), 128(170)(1):50-65, 1985.
[16] Sergej Chobanyan. Convergence a.s. of rearranged series in Banach spaces and associated inequalities. In Probability in Banach Spaces 9 (J. Hoffmann-Jørgensen, J. Kuelbs, and M.B. Marcus, eds.). Birkhäuser, 1994.
[17] M. Crovella and A. Bestavros. Self-similarity in world-wide web traffic: Evidence and possible causes. In ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, 1996.
[18] A. Dvoretzky and H. Hanani. Sur les changements des signes des termes d'une série à termes complexes. C. R. Acad. Sci. Paris, 225:516-518, 1947.
[19] Paul Embrechts, Claudia Klüppelberg, and Thomas Mikosch. Modelling Extremal Events for Insurance and Finance. Springer, Berlin Heidelberg, 1997.
[20] R.F. Engle. Autoregressive conditional heteroscedasticity and estimates of variance of UK inflation. Econometrica, (50):987-1008, 1982.
[21] M. Garrett and W. Willinger. Analysis, modeling, and generation of self-similar VBR video traffic. In SIGCOMM, 1994.
[22] A. Garsia. Topics in Almost Everywhere Convergence. Lectures in Advanced Mathematics. Markham Publishing Co., Chicago, 1970.
[23] N. Giroux and S. Ganti. Quality of Service in ATM Networks. Prentice Hall, Upper Saddle River, NJ, 1999.
[24] V.S. Grinberg and S.V. Sevastyanov. Value of the Steinitz constant. Funktsional. Anal. i Prilozhen., 14:56-57, 1980.
[25] I. Halperin and T. Ando. Bibliography: Series of Vectors and Riemann Sums. Hokkaido University, Sapporo, 1989.
[26] H. Hotelling. Multivariate quality control. In Techniques of Statistical Analysis (C. Eisenhart, M.W. Hastay, and W.A. Wallis, eds.). McGraw-Hill, New York, 1947.
[27] N. Hu and P. Steenkiste. Evaluation and characterization of available bandwidth probing techniques. In IEEE JSAC Special Issue on Internet and WWW Measurement, Mapping, and Modeling, 2003.
[28] I.A. Ibragimov and Yu.V. Linnik. Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, The Netherlands, 1971.
[29] Manish Jain and Constantinos Dovrolis. Pathload: a measurement tool for available bandwidth estimation. In Passive and Active Measurements Workshop, 2002.
[30] Manish Jain and Constantinos Dovrolis. End-to-end available bandwidth: Measurement methodology, dynamics, and relation with TCP throughput. In ACM/IEEE Transactions on Networking, 2003.
[31] Manish Jain and Constantinos Dovrolis. Ten fallacies and pitfalls on end-to-end available bandwidth estimation. In Internet Measurement Conference, 2004.
[32] Balazs Gabor Jozsa and Marton Makai. On the solution of reroute sequence planning problem in MPLS networks. Computer Networks: The International Journal of Computer and Telecommunications Networking, 42(2):199-210, 2003.
[33] M.I. Kadets and V.M. Kadets. Series in Banach Spaces: Conditional and Unconditional Convergence. Birkhäuser Verlag, 1997.
[34] Will E. Leland, Murad S.
Taqqu, Walter Willinger, and Daniel V. Wilson. On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking, 2(1):1-15, 1994.
[35] P. Levy. Sur les séries semi-convergentes. Nouv. Ann. Math., 5:506-511, 1905.
[36] Mingzhe Li, Mark Claypool, Robert Kinicki, and Jim Nichols. Characteristics of streaming media stored on the Internet. Technical Report WPI-CS-TR-03-18, Computer Science Department, Worcester Polytechnic Institute, 100 Institute Road, Worcester, Massachusetts 01609-2280, May 2003.
[37] Zhi Li and P. Mohapatra. QRON: QoS-aware routing in overlay networks. IEEE Journal on Selected Areas in Communications, 22(1):29-40, 2004.
[38] Huabing Liu and Guoqiang Mao. Prediction algorithms for real-time variable-bit-rate video. In Asia Pacific Conference on Communications, 2005.
[39] Regina Y. Liu and Kesar Singh. Moving blocks jackknife and bootstrap capture weak dependence. In Exploring the Limits of Bootstrap (R. LePage and L. Billard, eds.). John Wiley & Sons, New York, 1992.
[40] X. Liu, K. Ravindran, B. Liu, and D. Loguinov. Single-hop probing asymptotics in available bandwidth estimation: A sample-path analysis. In ACM SIGCOMM Conference on Internet Measurement, 2004.
[41] Marton Makai. Reroute sequence planning in telecommunication networks and compact vector summation. Applied Mathematics and Computation, 150(3):785-801, 2004.
[42] Cao Le Thanh Man, Go Hasegawa, and Masayuki Murata. A new available bandwidth measurement technique for service overlay networks. In 6th IFIP/IEEE International Conference on Management of Multimedia Networks and Services, 2003.
[43] B. Melander, M. Bjorkman, and P. Gunningberg. A new end-to-end probing and analysis method for estimating bandwidth bottlenecks. In Global Internet Symposium, 2000.
[44] B. Melander, M. Bjorkman, and P. Gunningberg. Regression-based available bandwidth measurements. In International Symposium on Performance Evaluation of Computer and Telecommunication Systems, 2002.
[45] Jiri Navratil and R. Les Cottrell. ABwE: a practical approach to available bandwidth estimation. In Passive and Active Measurements Workshop, 2003.
[46] Hikori Nishikawa, Takuya Asaka, and Tatsuro Takahashi. ABdis: Approach to estimate available bandwidth distribution using a multi-rate probe. In International Conference on Communication and Broadband Networking (ICBN04), 2004.
[47] M. Peligrad and Q. Shao. Central limit theorem for weakly dependent random variables. Journal of Theoretical Probability, 1994.
[48] Ravi Prasad, Manish Jain, and Constantinos Dovrolis. Effects of interrupt coalescence on network measurements. In Passive and Active Measurements Workshop, 2004.
[49] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurements (PAM) Workshop, 2003.
[50] V.J. Ribeiro, M. Coates, R.H. Riedi, S. Sarvotham, and R.G. Baraniuk. Multifractal cross traffic estimation. In ITC Specialist Seminar on IP Traffic Measurement, 2000.
[51] John A. Rice. Mathematical Statistics and Data Analysis. Duxbury Press, Belmont, California, 1995.
[52] S.V. Sevastyanov. On some combinatorial methods in scheduling theory. Discrete Applied Mathematics, 55:59-82, 1994.
[53] J. Spencer. Balancing games. Journal of Combinatorial Theory, Ser. B, 23:68-74, 1977.
[54] J. Spencer. Balancing vectors in the max norm. Combinatorica, 6(1):55-65, 1986.
[55] E. Steinitz. Bedingt konvergente Reihen und konvexe Systeme.
Journal für die reine und angewandte Mathematik, 146:68-111, 1916.
[56] Jacob Strauss, Dina Katabi, and Frans Kaashoek. A measurement study of available bandwidth estimation tools. In IMC, 2003.
[57] B. Tsybakov and N. Georganas. On self-similar traffic in ATM queues: Definitions, overflow probability bound, and cell delay distribution. In IEEE/ACM Transactions on Networking, 1997.
[58] Sven Ubik and Antonin Kral. End-to-end bandwidth estimation tools. Technical Report WPI-CS-TR-03-18, CESNET, November 2003.
[59] A.A. Weiss. Asymptotic theory for ARCH models: Estimation and testing. Econometric Theory, (2):31, 1986.
[60] Moshe Zukerman, Timothy D. Neame, and Ronald G. Addie. Internet traffic modeling and future technology implications. In IEEE INFOCOM, 2003.