ESTIMATING AVAILABLE BANDWIDTH FOR REAL-TIME SUPERMEDIA APPLICATIONS

By

Alexander Chobanyan

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science and Engineering

2007

ABSTRACT

ESTIMATING AVAILABLE BANDWIDTH FOR REAL-TIME SUPERMEDIA APPLICATIONS

By Alexander Chobanyan

The emerging development of remote sensing and control technology gives rise to new types of real-time networking applications that require "pro-active" quality of service guarantees. This dissertation specifies the Quality of Service parameters characterizing the requirements for successful execution of a given real-time tele-operation over the Internet. The dissertation also provides a statistical model that outputs the likelihood of a failure of remote real-time task execution based on the history of behavior of an end-to-end network path and the task requirements. We provide and compare both parametric and non-parametric approaches, conduct their experimental evaluation, and also introduce mechanisms for improving present techniques for measuring the end-to-end quality of a network path.

Available bandwidth (spare capacity) of a network path is a key characteristic of the network path quality. We regard available bandwidth as a random time-series with an existing auto-correlation structure, which is realistic and makes channel quality estimation accurate on real paths over the Internet.

We also consider a set of pro-active measures to save a remote real-time task execution from failure once the likelihood of such a failure exceeds an acceptable threshold. In particular, we give a solution to the Reroute Sequence Planning problem: choosing an optimal sequence of data-stream reallocations across different network paths once the QoS of the old paths becomes unacceptable. An optimal reallocation minimizes the extra capacity of a network link that is required for carrying out the reallocation procedure.

TABLE OF CONTENTS

List of Tables .................................. v
List of Figures ................................. vi
Introduction .................................... 1
Background ...................................... 9
Parametric Approach ............................. 19
3.1 Data collection and preprocessing .................... 19
3.2 Modeling available bandwidth ......................
25 3.2.1 “Crossing probability” ...................... 25 3.2.2 ARCH model ........................... 26 3.2.3 Results from extremal values theory ............... 34 3.3 Experimental evaluation ......................... 36 3.4 Conclusion ................................. 39 Network One Way Delay Trend Detection Mechanism 42 4.1 Main challenges of IC/ CS elimination and trend detection in Pathload 42 4.2 IC/ CS elimination procedure and t-test ................. 46 4.3 Experimental evaluation ......................... 51 4.4 Conclusion ................................. 56 Non-Parametric Approach 57 5.1 Moving block bootstrap ......................... 58 5.2 D-statistic: model verification mechanism ................ 62 5.3 D-statistic: theoretical results ...................... 66 5.4 D-statistic: experimental evaluation ................... 69 5.5 Prediction optimization with weights .................. 74 5.6 R-statistic: improvement of a verification procedure .......... 76 5.6.1 Timing ............................... 76 5.6.2 R-statistic ............................. 77 5.7 R-statistic: theoretical results ...................... 78 5.8 R-statistic: experimental evaluation ................... 84 5.9 Limits of a verification procedure .................... 88 5.10 Conclusion ................................. 91 Re—route Sequence Planning Problem 93 6.1 Introduction ................................ 94 6.2 Setting Of The Problem And Transference Lemma ........... 97 6.3 Application of Transference Lemma to Problem. An estimation of minyr |||x7r||| ................................ 99 6.4 The case of [go .............................. 101 iii 6.5 An algorithm for finding a ........................ 103 7 Summary and Future Research 106 7.1 Summary ................................. 106 7.2 Future research .............................. 109 7.2.1 QoS overlay network infrastructure ............... 109 7.2.2 QoS model improvement ..................... 109 BIBLIOGRAPHY 111 iv 3.1 3.2 4.1 4.2 4.3 4.4 5.1 5.2 LIST OF TABLES Empirically estimated “Crossing” probability for a path MSU—Berkeley 37 Difference between model-based estimated upper bound and empiri— cally estimated “Crossing” probability for a path MSU-Berkeley . . . 39 Comparison of PCT/PDT test and t-test performances on nearly empty network and zero receiver’s processor utilization ........ 54 Comparison of PCT/PDT test and t-test performances on nearly empty network and 5 — 7% receiver’s processor utilization ...... 54 Comparison of PCT/ PDT test and t-test performances on heavy loaded network and zero receiver’s processor utilization ........ 54 Comparison of PCT/ PDT test and t-test performances on average loaded network (around 50Mbps iIOMbps cross-traffic) and zero re- ceiver’s processor utilization ....................... 55 Valid range of history window sizes ................... 73 P-values for different history window sizes before and after weight adjustment. ................................ 85 1.1 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 3.11 3.12 3.13 3.14 4.1 4.2 4.3 4.4 4.5 5.1 5.2 LIST OF FIGURES Estimated available end—to—end bandwidth (Mbps). The time-series is sampled at a rate one observation per minute ............. 4 Available end-to-end bandwidth (Mbps): Michigan State University (35.9.2014) - Massachusetts Institute of Technology .......... 20 Available end-to—end bandwidth (Mbps): Denver (USA) 198.32.154.187 — Canarie, Montreal (Canada) planetl.montreal.canet4.nodes.planet- lab.org ................................... 
20 Available end-to-end bandwidth (Mbps): Michigan State University (35.9.2014) - University of California, Berkeley (169.229.50.12) . . . . 21 Available bandwidth before trend removal and inversion ....... 23 Available bandwidth after trend removal and inversion ........ 23 Michigan State University - University of Sydney, Australia (trend example) .................................. 24 Michigan State University - University of Uupsala, Sweden on the right (routes oscillation example) .................... 24 Measured available end-to-end bandwidth. ............... 28 Simulated ARCH 2(1) observations with [30 = ,31 = 0.25. ....... 28 Fisher Information: standard deviation of 5‘0 ............. 31 Fisher Information: standard deviation of 61 ............. 31 Fisher Information: correlation between 31 and [30 .......... 32 “Crossing” probability for a threshold M=25 Mbps and a number of observations N = 50 ........................... 35 Empirically estimated 95% upper bound over empirically computed “Crossing” probability for a path MSU-Berkeley ............ 40 Shortcomings of IC/ CS elimination procedure in Pathload: before elimination ................................ 45 Shortcomings of IC / CS elimination procedure in Pathload: after elim- ination ................................... 45 OWD pattern at the presence of packet loss. Trains were sent at a rate 40 Mbps higher than available at that time bandwidth. ..... 49 OWD pattern at the presence of packet loss. 'ITains were sent at a rate 30 Mbps higher than available at that time bandwidth. ..... 50 OWD pattern at the presence of packet loss. Trains were sent at a rate 20 Mbps higher than available at that time bandwidth. ..... 50 Window of size N = 3 slides across the history data of size n = 12 for prediction of a crossing probability P over coming N = 3 observations 59 Probability density functions for a p—value conditioned by Null and Alternative hypothesis ........................... 63 vi 5.3 5.4 5.5 5.6 5.7 5.8 6.1 Evolution of the crossing probability along with its 95% confidence intervals throughout the day. M = 65Mbps, N = 5, T = 0.43sec. Probability density functions along with p—values collected for the route 35.9.4290 (Michigan State University) - 137.189.100.235 (Chi— nese University of Hong Kong). ..................... Probability density functions along with p—values collected for the route 35.9.1014 (Michigan State University) — 210.72.135.98 (Shenyang Institute of Automation, Chinese Academy of Sciences) ......... Reducing the time of self-verification procedure when switching from R-statistic to D-statistic ......................... Histograms of acceptable range of lengths of history windows before and after weights adjustment. The black histogram shows the range distribution after adjustment whereas the gray histogram shows the range distribution before adjustment. .................. Dependency of minimal value of a crossing probability for which ver- ification procedure works well on the future time-window over which prediction is made and on prediction/verification technique ...... Network graph example ......................... vii 69 70 70 87 90 Chapter 1 Introduction Over the past several years the importance of Internet-related technologies has rapidly increased. The growth of the Internet has not only a quantitative nature but also affects the type of information that may be transferred and consequently the methods for real-time interaction between communicating parties. 
Many emerging Internet-based technologies may benefit not only by transferring traditional media- types, such as video and audio, but also by remotely manipulating, touching and feeling a remote object. Tele-medicine, for example, can potentially allow a doctor to touch and feel a patient thus further expanding opportunities for conducting remote diagnostics. Tele-education opens wider possibilities to learn about different objects via touching and feeling them. Tele-marketing applications can be designed in order to allow a customer to touch the item he wants to purchase before making a financial transac- tion. Video, audio, haptic, temperature, control commands and other media used for tele-operations are unified under the term “Supermedia” that can be considered as a significant extension of traditional multimedia information streams. This dissertation has been written in the frames of a “Supermedia” project that assumes developing and experimentally testing new methods, algorithms, and tools to establish an infrastructure for real-time supermedia transport over the Internet to support tele—operations. Traditionally tele-operations take place over isolated controlled lines. In such a case, the channel has sufficient degree of reliability for conducting important tele- operations that may have high risk-cost associated with channels failure to give necessary quality of service to communicating parties. On one hand, using the Internet as a transport for tele—operations is much cheaper and therefore more convenient. In addition the Internet is wide-spread and easily accessible medium all over the world. On the other hand, however, the Internet does not provide sufficient quality of service guarantees to communicating parties. A net- work path over the Internet is by its nature unreliable and uncontrolled. Throughput of such a path at any time-instance always depends on the cross-traffic present in the path. Therefore real-time tele—operation over the Internet may always have un- expected network delays, distortions or even communication failures. One can even argue that the absence of communication reliability represents a principle challenge for wide deployment of tele—operations over the Internet. We believe that if “on aver- age” the cost of tele—operation failure is “sufficiently” small then such a tele-operation still can be performed. Under “average” failure-cost we assume the product of the cost associated with tele-operated task failure multiplied by the likelihood of such a failure to happen. In order to support tale-Operations over the Internet it is vital to develop mecha- nisms that efficiently estimate for any particular tele—operation the likelihood of the failure of remote real-time task execution and take pro-active measures to minimize this likelihood. Moreover, based on his knowledge of the cost associated with a fail- ure of a particular real—time task the operator should be able to temporarily suspend real-time task execution until better times. For example, before a robot is directed to approach a wall, it is important to report to the operator the probability based on the channel’s history that control over the robot is lost due to network problems while the robot approaches the wall. Then the operator makes a decision whether he should begin the wall-approach immediately or to wait until better times. 
In this thesis, an available bandwidth (AB) of a network path is considered as a primary QoS characteristic of the path and probabilistic models of tele-operation performance are built around this quantity. Available Bandwidth may be briefly defined as the maximum rate at which a sender can put bits on a wire without affecting existing cross-traffic, or in other words, as a residual capacity of the network path. Previous work [29] concentrated on obtaining the value of the available band- width averaged over a reasonably small time-window without an attempt to provide a broader probabilistic picture characterizing the network path. Figure 1.1 shows an example of how available bandwidth between two nodes located 20 hops apart may behave over time. As depicted in the plot, available bandwidth may instanta- neously drOp far below its mean value thus causing degradation of service, such as distorted video or audio. Despite the fact that such “spikes” may last just a few seconds, they may be fatal for many real-time remotely-operated tasks. As it fol— lows from figure 1.1, the value of the available bandwidth averaged over an arbitrary large (small) time-window cannot characterize the network path well in terms of its bandwidth QoS properties. The available bandwidth time—series is directly related to the cross-traffic that is present in the path. It has been shown [34] that the net- work traffic has self-similar properties that imply a non-summable autocorrelation function and long-range dependence between observations. Therefore observations of available bandwidth cannot be considered as independent, which makes statistical inference more challenging. The principal novelty of approaches introduced in this dissertation is in viewing available bandwidth as a random heavy-tailed time-series with existing autocorre- lation structure instead of reporting just one expected value while assuming that it does not change over the measurement time-period. More precisely, this disser- tation introduces a new (in the context of available bandwidth time series) (208 quantity called “crossing probability” - the probability that the available bandwidth time-series becomes less than pre-defined threshold over pre-defined time-frame. This dissertation introduces parametric and non-parametric statistical models that predict “crossing probability” for a pre—defined threshold and future time-frame. Also the dissertation introduces significant improvement of basic block of present Mbps 90 r + 80 - . 70 — — 60 ~ _ 50 r _ minutes 40 1 n 0 500 1000 1500 Figure 1.1: Estimated available end—to—end bandwidth (Mbps). The time—series is sampled at a rate one observation per minute. probe-rate based available bandwidth measurement tools. The proposed parametric model provides an important probabilistic QoS channel specification in terms of coefficients that allow rapid computation of “crossing prob- ability”. The aforementioned coefficients are evaluated based on observed available bandwidth behavior over a reasonable period of time and are updated at run-time reflecting the changing dynamics of a network environment. Both, the critical value and the time-frame for which we predict should be defined by the QOS specifications of a particular real-time task. The accuracy of the “crossing probability” prediction may vary from path to path and over time. However, the notion of “crossing probability” on its own is conceptually novel with respect to the available bandwidth estimation. 
Real time applications can benefit from this approach by reallocating part of the QoS-sensitive channels over different network paths when the crossing probability exceeds an ac- ceptable threshold. Charmel reallocation in this scenario happens before something fatal occurs regarding the real-time task execution but after the probability for this fatal event exceeds a reasonable value. Despite its own advantages, the parametric approach relies on values of available bandwidth reported by presently known measurement tools and for many network paths these values may not be accurately reported, which significantly limits the range of its applicability. Present measurement tools need around one minute for an accurate reporting of one observation of an available bandwidth time-series. Conse- quently, in order to collect a statistically significant number of observations one has to consider the history window of at least 1-2 hours. On another hand, our para- metric model assumes that the observed available bandwidth time-series is weakly stationary throughout the history time—frame. In order to make this assumption realistic we have to choose a history window size as small as possible. Our experi- ments showed that for many network paths the aforementioned trade-off for a history window size exists. For such paths the parametric model does its job reasonably well. However, for many network paths the sampling rate of available bandwidth mea- surement tools is too small. In addition, when dealing with robotic applications there is a need to make predictions for future windows of the order of seconds whereas a parametric approach works for scales of the order of minutes. The aforementioned factors pushed us to research on improvement of present available bandwidth measurement tools towards decreasing measurement time while preserving measurement accuracy. Measurement of available bandwidth (even as a single value) is a complicated task since cross-traffic in the network path dynamically changes thus causing change of the available bandwidth. The best that present measurement techniques can do is to measure the mean available bandwidth averaged over a certain period of time T. Decreasing the measurement time T is an important issue because the assumption that available bandwidth does not change over a smaller period of time is weaker and more realistic rather than the assumption that the bandwidth does not change ever larger periods. Presently two types of approaches to measurement of available bandwidth exist. Probe-gap models [56, 45] send a pair of packets and make further inference about available bandwidth based on the measurement of a gap between packets at the receiver’s side. Probe—rate models [29, 46, 42] send sequences or “trains” of packets at different rates assuming that a sequence sent at a rate higher than the available bandwidth causes noticeable positive trend in times that packets in the sequence need for travelling from sender to receiver. These times are often called one-way- delays (OWDs). Then the available bandwidth at that time can be found by a simple binary search. Probe-gap approaches are less intrusive than probe-rate approaches. However, probe-gap approaches suffer from the so called “interrupt coalescence / context switch (IC/ CS)” effect [48], which significantly distorts measurements of packets inter-arrival times. Probe-rate models are affected by IC/ CS phenomenon as well but not as much as probe-gap models. 
Probe-rate models send a larger number of packets and therefore an IC / CS effect can at least be clearly observed and to some extent considered by a model deployed in the probe-rate measurement tool. This dissertation introduces a new metric for OWD trend detection of packets in a train together with a new method for dealing with the IC / CS effect. We compare our results with Pathload [29], which to our knowledge is the only relatively reliably working tool that considers the IC/ CS phenomenon. We demonstrate that our metric used for trend detection together with our way of dealing with interrupt coalescence and context switch effects provides much better OWD trend detection accuracy. It can also better tolerate processor utilization of a receiving machine and therefore can serve as a basic block not only in Pathload but also can improve a wide- range of proberate oriented available bandwidth measurement tools that presently use trend detection algorithm introduced by Pathload’s authors. In particular, with our approach in most cases we need to send only up to three trains while achieving the same accuracy as Pathload when it sends 6 trains. We therefore decrease the measurement time without sacrificing accuracy. The dissertation deploys a robust OWD measurement module (described above) for developing a more general non-parametric approach for computation of a “cross- ing probability”. In contrast to above-mentioned parametric approach that relies on measurements of the AB time-series made by Pathload [29], the non-parametric approach relies directly on measurements of indicators showing whether the AB time—series is smaller or larger than a given value. Let us note that every particu- lar measurement of available bandwidth made by Pathload is obtained by a binary search [29], which itself relies on measurements of a number of indicators mentioned above. Therefore the proposed non-parametric approach is more robust, allows faster computation of a crossing probability than described earlier parametric approach. Also non-parametric approach has a larger range of applicability since it does not rely on any parametric time-series model. For computation of a “crossing probability” estimate we use modern non-parametric techniques of statistical analysis of a random time-series such as a moving block bootstrap [39]. Moving block bootstrap considers dependence [34] between obser- vations of indicators and helps in providing 95% confidence intervals around the non—parametric estimator of a “crossing probability”. Since the main goal of our research is prediction of a “crossing probability,” we also introduce two methods for a verification of the accuracy of our prediction based on training and previously un— seen testing data-sets obtained by a real-life measurements across multiple Internet paths. Along with real-life experiments, we present theoretical results that show that our verification procedures are probabilistically legitimate. In addition to “crossing probability” estimation, this dissertation deals with a set of actions that communicating parties might undertake in order to keep tele- operation alive given that “crossing probability” increased above critical threshold. A real-time application may reallocate, for example, some of data streams through different routes that may have better (.208 characteristics at given time-period. In this dissertation we provide preliminary results related to improvement of the solu- tion of a Reroute Sequence Planning problem [41] (RSP). 
The RSP problem chooses an optimal sequence of reallocations of an old set of data-stream paths to a new set of data-stream paths that minimizes the extra network-link capacity needed for carrying out the stream reallocation. We provide theoretical results on the bound of the extra needed capacity and also give a sketch of an algorithm for performing reroute sequence planning.

The rest of this dissertation is structured as follows. Chapter 2 provides definitions and describes previous available bandwidth measurement tools. Chapter 2 also gives a brief background on weak stationarity and self-similarity and describes the heavy-tailed long-range dependent models that have been used for network traffic modeling. Chapter 3 introduces the notion of a "crossing probability" and describes the parametric model for "crossing probability" estimation. Chapter 4 deals with an improvement of previous available bandwidth measurement tools based on a new One Way Delay trend detection mechanism. Chapter 5 then introduces a non-parametric approach to "crossing probability" prediction that uses the measurement module introduced in Chapter 4. Chapter 5 also introduces model verification mechanisms that help to check, on real data-sets, whether predicted values of a "crossing probability" really match the subsequently observed crossing events. Chapter 6 introduces the Reroute Sequence Planning problem and the corresponding mathematical problem of Compact Vector Summation. The chapter gives theoretical results for a link-capacity bound and provides an algorithm for finding a sequence of path replacements for data streams that complies with the above-mentioned capacity bound. In Chapter 7 we summarize the results of this dissertation and discuss future work.

Chapter 2

Background

We consider a network path as a set of store-and-forward links $L_i$, $i = 1, 2, \ldots, k$. The capacity $C_i$ of the $i$-th link may be defined as the maximum bit rate at which the sender may transfer information through this link when no other traffic is present. Now assume that the link is partially used by another communication. Let $T_i(t)$ be the number of bits that have been transferred through link $i$ by time $t$. The spare capacity of link $i$, or in other words its available bandwidth $A_i(t)$ at time $t$, can be defined as
\[ A_i(t) \equiv C_i - \lim_{\tau \to 0} \frac{T_i(t+\tau) - T_i(t)}{\tau}, \quad \text{or} \quad A_i(t) \equiv C_i - \frac{dT_i}{dt}(t). \]
Then the bandwidth $A(t)$ available at time $t$ over the whole path can be defined as
\[ A(t) \equiv \min_{i=1,\ldots,k} A_i(t). \]
From the definition it follows that available bandwidth cannot be measured precisely and therefore needs to be approximated by the mean available bandwidth at time $t$ over an interval $T$:
\[ A_T(t) = \min_{i=1,\ldots,k} \left\{ C_i - \frac{T_i(t+T) - T_i(t)}{T} \right\}. \]
The period of time $T$ over which $A_T(t)$ is averaged depends on the particular network application. If just one value of $A_T(t)$ is reported, then a smaller $T$ does not necessarily guarantee better results, because measurement accuracy degrades with decreasing $T$. However, with a smaller $T$ we have more flexibility when describing the probabilistic properties of available bandwidth behavior. Naturally, with a higher sampling rate one can acquire more "behavioral" information while still accounting for measurement errors within the framework of a chosen probabilistic model. In general, the approximation of available bandwidth is QoS relevant if $T$ is less than the time that the receiver needs to empty its buffer.
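To make the averaged definition above concrete, the short sketch below computes $A_T(t)$ from per-link byte counters. The link capacities, counter values, and the function name are illustrative assumptions for this sketch only; real tools do not have access to per-link counters and must infer the same quantity end-to-end, as discussed next.

```python
from typing import Sequence

def mean_available_bandwidth(capacities_bps: Sequence[float],
                             bits_before: Sequence[float],
                             bits_after: Sequence[float],
                             interval_s: float) -> float:
    """Approximate A_T(t) = min_i { C_i - (T_i(t+T) - T_i(t)) / T }.

    capacities_bps -- capacity C_i of every link on the path (bits/s)
    bits_before    -- T_i(t): bits carried by link i up to time t
    bits_after     -- T_i(t+T): bits carried by link i up to time t+T
    interval_s     -- averaging interval T in seconds
    """
    spare = []
    for c, b0, b1 in zip(capacities_bps, bits_before, bits_after):
        cross_rate = (b1 - b0) / interval_s   # mean utilization of link i over T
        spare.append(c - cross_rate)          # residual capacity of link i
    return min(spare)                         # the path is limited by its tight link

# Illustrative three-hop path, 1-second averaging window.
caps = [100e6, 1e9, 155e6]
t0 = [40e6, 210e6, 9e6]      # bits seen by each link at time t
t1 = [130e6, 400e6, 31e6]    # bits seen by each link at time t+T
print(mean_available_bandwidth(caps, t0, t1, 1.0) / 1e6, "Mbps")
```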
Fluctuations of the available bandwidth should not cause degradation of quality at the receiver as long as those fluctuations do not affect the number of bits that the receiver needs to receive to keep its buffer non-empty.

Significant work that has been done in mean available bandwidth estimation is well reflected by Strauss et al. [56] and Hu et al. [27]. In short, deployed bandwidth estimation models can be divided into probe-gap (or packet-dispersion) and probe-rate classes. Probe-gap models used in Spruce [56] and Delphi [50] analyze the delay between probe packets and make further inference about available bandwidth based on that delay. Probe-gap models can be further divided into direct and iterative approach groups. Tools utilizing the direct approach [50, 56] sample available bandwidth directly, analyzing either delays between packet arrivals or packet travel times. An iterative approach assumes step-by-step convergence to the tentative range where the available bandwidth resides. Each step represents a comparison of the available bandwidth against a specified rate value.

The classical direct approach assumes fluid cross-traffic as well as the existence of a tight and narrow link with known capacity $C_t$. We accept the terminology where the "tight" link is the link with the smallest available bandwidth among all links in the path, whereas the "narrow" link is the link with the smallest capacity among the links in the path. The fluid cross-traffic model assumes that at the input of a router, during any arbitrarily small time interval $\delta$, an amount of cross-traffic equal to the average cross-traffic rate times $\delta$ arrives.

The formula for the available bandwidth $A$ can be derived as follows. Let $R_c$ be the rate at which the measurement sequence arrives at the tight and narrow link. If $R_c$ exceeds the available bandwidth $A$, then the queue of the router at the tight and narrow link grows because of the measurement packets. Let $\delta_1$ be the gap between packets in the incoming sequence and $\delta_2$ the gap between packets after the tight and narrow link. Also let $\Delta q$ be the queue increase during the time $\delta_1$ and $L$ the size of a measurement packet. Then $\Delta q$ can be expressed as $\Delta q = (R_c - A)\delta_1$, where $R_c$ can be viewed as an arrival rate and $A$ as a service rate for the measurement data. Since $\delta_1 = L/R_c$, we have
\[ \Delta q = \frac{L(R_c - A)}{R_c}. \]
Taking into account that the capacity $C_t$ is the rate at which the router services its input queue (measurement packets plus cross-traffic), the queue increase accumulated during $\delta_1$ is processed in the time
\[ \Delta t = \frac{\Delta q}{C_t} = \frac{L(R_c - A)}{R_c C_t}. \]
The gap between packets at the output of the slow link can therefore be expressed as
\[ \delta_2 = \delta_1 + \Delta t = \delta_1 + \frac{L(R_c - A)}{R_c C_t}. \]
Taking into account that $R_c = L/\delta_1$, the only unknown parameter in the above equation is the available bandwidth $A$. Therefore the available bandwidth can be expressed as
\[ A = C_t - \frac{L}{\delta_1}\left(\frac{C_t \delta_2}{L} - 1\right). \]
In the language of rates the above formula can be rewritten as
\[ A = C_t - R_c\left(\frac{C_t}{R_{out}} - 1\right), \]
where $R_{out} \equiv L/\delta_2$ is the rate at which the measurement packets are received.

The fluid traffic model implies that the number of packets in a sequence sent at the rate $R_c$ does not significantly affect measurement accuracy. One may even conclude that an analysis of a pair of packets demonstrates performance similar to the analysis of a large sequence of packets. Despite the seeming advantages of fast sampling of exact values of available bandwidth, the direct approach has several significant drawbacks which, from our point of view, make it inapplicable to available bandwidth measurement.
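Before turning to those drawbacks, the gap-based arithmetic of the direct approach can be exercised with a small numerical sketch. The packet size, capacity, and gap values below are made-up inputs chosen only to illustrate the formula $A = C_t - (L/\delta_1)(C_t\delta_2/L - 1)$; they are not measurements from any of the cited tools.

```python
def direct_probe_gap_estimate(L_bits: float, c_tight_bps: float,
                              gap_in_s: float, gap_out_s: float) -> float:
    """Direct (fluid-model) probe-gap estimate of available bandwidth.

    Implements A = C_t - (L / d1) * (C_t * d2 / L - 1), i.e. the gap form of
    A = C_t - R_c * (C_t / R_out - 1) with R_c = L/d1 and R_out = L/d2.
    """
    r_in = L_bits / gap_in_s      # rate at which probes entered the tight link
    r_out = L_bits / gap_out_s    # rate at which probes were received
    return c_tight_bps - r_in * (c_tight_bps / r_out - 1.0)

# 1500-byte probes through a 100 Mbit/s tight link.
L = 1500 * 8
d1 = L / 100e6        # probes injected back-to-back at the capacity rate
d2 = d1 * 1.4         # output gap expanded by 40% by cross-traffic
print(direct_probe_gap_estimate(L, 100e6, d1, d2) / 1e6, "Mbps")  # -> 60.0
```

With these inputs the estimate is 60 Mbps, i.e. the 40% gap expansion is attributed entirely to fluid cross-traffic occupying 40 Mbps of the 100 Mbps link, which is exactly the assumption that the drawbacks below call into question.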
11 First, there might not exist a revealed tight and narrow link in the network path. Even if such a link exists then its capacity still may not be known. Second, if the packet-size of a cross-traffic is sufliciently large then amount of cross-traffic in- serted between measurement packets should be represented by discrete values which significantly falls out of the fluid model. Jain, et al. demonstrated in [31] that mea— surement error increase from 2% to 40% when cross-traflic packet-size is 1500B and the number of packets in the measurement sequence drops from 100 to 10. However, an increase of number of packets in the measurement sequence also is faulty because of the influence of the long measurement queue onto itself [40]. An iterative probe—gap approach [44, 43, 27, 49] is also based on the assumption that if the rate at which measurement sequence is sent exceeds available bandwidth then gaps between measurement packets increase. The iterative approach, however, does not rely on a known capacity of the tight and narrow link and does not make direct inference about value of available bandwidth based on a single measurement. If gaps between measurement packets increase then the only conclusion made by an iterative approach is that the available bandwidth is lower than the rate at which packets were sent. A variation of this approach estimates rates at which packets were sent and received. Then available bandwidth is believed to be lower than the rate at which packets were sent if the rate at which packets were received is significantly lower than the rate at which they were sent. Various search methods are applied for figuring out what is an approximate value of the available bandwidth. Hu and Steenkiste for example do linear rate variation [27] whereas the Pathchirp measurement tool [49] exponentially increases inter-arrival times between successive packets. All above-mentioned tools and techniques, however, do not consider the effect of cross-traffic burstiness. Jain and Drovolis note in [31] that basic queueing theory says that queues can increase even before server utilization reaches 100%. And as it was shown by Liu, et al. [40], non-empty fluctuating queues at routers are responsible for constant underestimation of an available bandwidth by all above-mentioned probe- gap based iterative techniques. Such an underestimation happens because if the 12 rate at which packets are sent begins to increase then gaps between measurement packets begin to increase far before the rate at which packets were sent reaches the point of an available bandwidth. Probe rate iterative models also utilize sequences or trains of packets sent at different rates. The key difference between probe—rate approaches and probe—gap models is that probe-rate approaches work with packet travel times or so called One Way Delays (OWD) for determining whether the rate at which sequence of packets was sent is higher or lower than the available bandwidth. For example, the probe rate model used in Pathload [29] uses trains of packets sent at different rates. If the delay between packets at the receiver shows an in- creasing trend, then this trend may be considered as an indicator that the rate at which packets were sent is higher than the available bandwidth at that moment. If there is no indication of an increasing delay trend then the rate is smaller than the available bandwidth. Pathload then does a binary search to find the rate that is ap- proximately equal to the available bandwidth. 
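The rate search that such probe-rate tools perform can be sketched as a plain binary search over candidate probing rates. The oracle function below stands in for an actual train measurement plus trend test, so the function name, the bracketing limits, and the tolerance are assumptions made only for this illustration.

```python
def estimate_ab_by_binary_search(rate_is_above_ab, low_mbps: float,
                                 high_mbps: float, tol_mbps: float = 1.0) -> float:
    """Iterative probe-rate search in the spirit of Pathload.

    rate_is_above_ab(rate) -- hypothetical oracle: send a train at `rate` and
                              return True if the one-way delays show an
                              increasing trend (rate is above the available bandwidth).
    low_mbps, high_mbps    -- initial bracketing interval for the search.
    """
    while high_mbps - low_mbps > tol_mbps:
        rate = (low_mbps + high_mbps) / 2.0
        if rate_is_above_ab(rate):
            high_mbps = rate   # delay trend seen: available bandwidth is below this rate
        else:
            low_mbps = rate    # no trend: available bandwidth is at or above this rate
    return (low_mbps + high_mbps) / 2.0

# Toy run against a path whose true available bandwidth is 37 Mbps.
print(estimate_ab_by_binary_search(lambda r: r > 37.0, 0.0, 100.0))
```

Each oracle call corresponds to one (or several) packet trains, which is why the number of trains per decision, discussed below, directly determines the total measurement time.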
As was stated above, fluctuations of queues at routers can increase gaps between packets. However, a pronounced positive OWD trend in a sufficiently large sequence of packets unambiguously points to the inability of one of the routers in the network path to serve its queue, which can be interpreted as a lack of available bandwidth of the path with respect to the measurement traffic rate.

Both probe-rate and probe-gap measurement approaches suffer from the known interrupt coalescence / context switch (IC/CS) effects [48]. If interrupt coalescence is enabled at the receiver's network adapter, then the adapter waits for a group of packets to arrive and only after that issues an interrupt request. Then the entire group is processed with one interrupt. The direct IC impact on network measurements is that packets are not time-stamped by the receiver right after they really arrived, but with a variable delay. Some may wait for an interrupt request for up to hundreds of microseconds, which significantly influences network measurements. Context switch effects have the same impact: if the processor is utilized by another job, then a certain random time period passes after a packet's arrival before it is processed and time-stamped by the receiver. It is fairly complicated to account for the IC/CS influence when only pairs of packets are sent. To our knowledge there are no probe-gap (packet dispersion) based available bandwidth measurement tools that consider the IC/CS effect. Pathload is the only tool among the probe-rate approaches that partially addresses the IC/CS issue. We therefore here-and-after focus our attention on Pathload as, to our knowledge, the most stable and accurate end-to-end available bandwidth measurement tool.

Pathload checks the time that the system needs for delivering packets from the network adapter to the upper layers. All packets that are received within this time-frame are assigned to one group, or burst. Then, in a packet train, Pathload counts only one packet out of each burst and discards the rest of the packets. After the IC/CS-related filtering, Pathload divides the remaining $N$ packets into $\Gamma \approx \sqrt{N}$ groups and takes the median from each group. The remaining packets are discarded as well. Finally, two metrics are computed based on the remaining $\Gamma$ measurements. The Pairwise Comparison Test (PCT) metric is evaluated as
\[ S_{PCT} = \frac{\sum_{k=2}^{\Gamma} I(M_k > M_{k-1})}{\Gamma - 1}, \]
where $I$ is an indicator function evaluated to one when the median $M_k$ of the $k$-th group is larger than the median of the $(k-1)$-st group, and to zero otherwise. As follows from the definition, if a train has an increasing trend then the PCT metric approaches 1. If there is no trend, then the PCT metric should be close to one half. The Pairwise Difference Test (PDT) metric is defined as
\[ S_{PDT} = \frac{M_{\Gamma} - M_1}{\sum_{k=2}^{\Gamma} |M_k - M_{k-1}|}. \]
If the train does not have an increasing OWD trend then $S_{PDT}$ should vary around zero, whereas in the presence of an increasing trend it should be close to one. The PDT metric was introduced for detection of a trend with a significant start-to-end OWD difference. Finally, the two tests together decide what to report for a given train of packets.

The PCT and PDT tests represent the basic mechanism of trend detection in Pathload. In general, the number of packets in a train, the train packet size, and the bandwidth-rate search mechanism may vary among approaches. For example, Man et al. [42] introduce a light-weight train-based tool that is less intrusive than Pathload on one hand but sacrifices accuracy on the other hand.
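Returning to the PCT and PDT statistics defined above, the sketch below computes both from a list of per-group OWD medians. It is a direct transcription of the two formulas rather than code taken from Pathload itself, and the example median values are invented for illustration.

```python
def pct_pdt(medians):
    """Pairwise Comparison Test and Pairwise Difference Test statistics.

    medians -- one-way-delay medians M_1 ... M_G of the G groups of a train.
    Returns (S_PCT, S_PDT) where
      S_PCT = (1/(G-1)) * sum_{k>=2} I(M_k > M_{k-1})
      S_PDT = (M_G - M_1) / sum_{k>=2} |M_k - M_{k-1}|
    """
    g = len(medians)
    if g < 2:
        raise ValueError("need at least two group medians")
    increases = sum(1 for k in range(1, g) if medians[k] > medians[k - 1])
    total_variation = sum(abs(medians[k] - medians[k - 1]) for k in range(1, g))
    s_pct = increases / (g - 1)
    s_pdt = (medians[-1] - medians[0]) / total_variation if total_variation else 0.0
    return s_pct, s_pdt

# A train whose OWDs clearly drift upward (probing rate above the available bandwidth) ...
print(pct_pdt([10.1, 10.4, 10.3, 10.9, 11.2, 11.8, 12.0]))   # high PCT, PDT close to 1
# ... and one that only jitters around a constant level.
print(pct_pdt([10.2, 10.0, 10.3, 10.1, 10.2, 10.0, 10.3]))   # PCT near 0.5, PDT near 0
```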
Nishikawa, et al. [46] propose to evaluate the approximate distribution of available bandwidth as of a random quantity. Basic metrics in these works remained, however, the same as they were introduced first in Pathload. Pathload is believed to be slightly more intrusive than tools using the probe gap model. To our knowledge Pathload is, however, the most stable and accurate among available bandwidth measurement tools to date. Also Pathload creators propose a way to consider the IC / CS influence and therefore Pathload is to our knowledge the only tool that considers an IC/ CS effect. The accuracy of Pathload is nevertheless far from ideal. Recent research [29] have proved that Pathload’s measurements are much more accurate for higher values of available bandwidth than for the lower values. Also our experience with Pathload shows that for many network paths, in a long run, Pathload reports many miss- ing values and sometimes is unable to catch instantaneous rapid drops of available bandwidth that are critical for QoS inference about the channel. Pathload, up to a certain degree, considers IC/ CS influence, but still its performance significantly depends on the processor utilization of measurement machines. Therefore for our experiments we had to use machines with a very light processor utilization factor. From the definition of available bandwidth it follows that it is tightly related with the cross-traffic present at the moment in the network path. Cross-traffic defined as %(t) represents a random time-series that has been extensively analyzed by both statisticians and network engineers for its statistical properties. If the tight link remains unchanged throughout the measurements (that is usually the case) then the available bandwidth time-series can be evaluated by subtracting the cross-traffic from a tight link’s unchanged capacity. It follows that in the aforementioned case 15 the available bandwidth and cross-traffic are exactly the same time—series in terms of their statistical characteristics. Therefore we believe that the model chosen for available bandwidth characterization should incorporate at least basic properties of models that are used presently for Internet traffic modeling. Below we provide some background of such models. All parametric traffic models assume weak stationarity of the traffic time-series. Weak stationarity can be defined as follows: Let {X t} be a random time-series that takes values 1‘1,I2,$3, ...xn at different instants of time. Generally speaking, all these values are observations of different, dependent random variables X 1, X 2, ..., X n. We say that random variables X1, X2, ..., Xt, ..., observed over time, form a weakly stationary time-series if the following conditions are satisfied: expected values of all Xt’s are the same, and the 7X(t, h) defined as Cov(Xt+h, Xt) does not depend on t. 7X(t, h) = 7X01.) is called the autocovariance function of the time—series (ACVF) at lag h. p X(h) defined as p X(h) E % is called an autocorrelation function (ACF) of the time-series at lag h. Weak stationarity means that the correlation picture remains the same under any shift in the time domain. ACF therefore can be treated as a basic characteristic of a weakly-stationary time-series. Leland, et al. [34] showed that network traffic possesses an important char- acteristic called self-similarity. Self-similarity can be formally defined as follows: let pX(h) ~ h'3 as h —+ 00, 0 < i3 < 1. 
For an integer m = 1,2,...,oo let {X gm} be new weakly stationary series defined as X (in) = —71,—,(X km—m +1 + + X km): The time-series {Xt} is called second-order self—similar with self- similarity parameter H = 1 — g if for all m = 1, 2, ..., oo Var(X(m)) ~ Tn—fi and pX(m)(h) = p X (h) for all h > 0. Most important implications of self-similarity are “slowly” (slower than l/h) decreasing autocorrelation implying long-range de- pendence (220:0 p X(h) = 00), and “slowly” decaying variance of the so called aggregated time-series ( {X (m) (k)} that was obtained by averaging the initial time- series over groups of size m). As a motivation for self—similar characteristics of a network traffic one may consider an efficient modeling of a network traflic by a superposition of several 16 “ON / OFF” sources. Each of such a source can reside in two states: “ON” state when it intensively transfers packets and “OFF” state when it is idle. The model assump- tion is that the lengths of activity periods are independent identically distributed random variables. If each source “is driven” by the same distributions then it be- comes possible to model the behavior of superposition of several such sources. The important achievement made by researchers from Bellcore was to model “ON / OF F” periods with the help of infinite variance distributions (such as Pareto with param- eter a in the range from 1 to 2). Superposition of several such “ONN / OFF” Pareto sources allows to derive a self-similar traffic model. There exist many self-similar models by which researchers approximated Internet traffic. Crovella and Bestavros report in [17] that traffic formed by web browsers possesses self-similar characteristics. In their study they analyzed more than 500, 000 requests generated by 37 browsers. Each web—browser was modeled as an ON / OFF source. In particular, it was shown that browser data match heavy-tailed Pareto distribution. Similar results were shown in [5] and in [4]. Borella and Brewster [8] use UDP packets for measuring packet delays and show- ing that delays demonstrate long-range dependent nature. Another study [23] shows that FTP data rate is bursty and the distribution of the number of bytes in bursts has heavy tails. Several works show that data associated with video traffic also has self-similar characteristics. Video compression algorithms assume frames of variable sizes, which allows to view video traffic as a random time-series. Regular video has scenes with a lot of motion, and scenes with a few of motion which makes distribution of data in frames heavy-tailed. Garrett and Willinger, for example show in [21] that transfer of the “Star Wars” movie coded in JPEG format reveals self-similar nature and that length of the frame follows heavy—tailed Pareto distribution. Authors show, in particular that high degree of variability in frame length is associated with scenes variability. In another study Beran at al [7] analyzed 20 different Variable Bit Rate video transfers, which were formed with various coding mechanisms. They showed that 17 long-range dependency exists in all 20 transfers and does not depend on type of video-encoding or on used scene-types. Tsybakov and Georganas [57] report similar results. Zukerman, et al. [60] proposed to use the Poisson Pareto Burst process model for an Internet traffic. In addition to the mean, variance, and self-similarity parameter H, the model has an extra “aggregation” parameter A. 
Zukerman et al. believe that their aggregation parameter significantly improves the accuracy of the model. Li et al. [36] studied the characteristics of streaming media stored in the Internet and came to the conclusion that it has a long-tailed distribution and may therefore contribute to self-similar Internet traffic. Aracil et al. [3] studied FTP and Telnet data arrival time-series and came to the conclusion that they are self-similar and can be matched to a fractal renewal process.

Traffic modeling studies have constructed models that visually look similar to observed time-series and have autocorrelation structures similar to what has been observed. Also, estimated model parameters based on observed data should remain unchanged or vary slowly over time. Finally, the model should be relatively simple and amenable to further statistical analysis. We will follow the same strategy to unify recent work in Internet traffic analysis, available bandwidth estimation tools, and time-series study with respect to the analysis of the QoS characteristics of a network path.

Chapter 3

Parametric Approach

This chapter introduces the notion of a "crossing probability". It emphasizes its importance for tele-operations requiring a "pro-active" reaction to QoS degradation, and gives a parametric model for "crossing probability" estimation based on past observations of an available bandwidth time-series.

3.1 Data collection and preprocessing

Figures 3.1, 3.2, and 3.3 depict the available bandwidth behavior over a period of 800 minutes for various network paths, collected with the help of the Pathload measurement tool. The distance between nodes varies from 4 to 20 hops. For all examples in figures 3.1, 3.2, and 3.3 we observe a small trend of less than 1 Mbps around the mean value. Also, all examples have bursts reflecting the heavy-tailed self-similar nature of Internet traffic-related series.

This work is related to available bandwidth time-series that are stationary. Therefore, we assume that the mean of the time-series does not change or, in other words, that the time-series of interest has an insignificant trend.

Sometimes the available bandwidth series may acquire a trend significantly exceeding a reasonable limit (say 5 Mbps). Figure 3.6 shows an example of such a network path (between nodes located at MSU and the University of Sydney, Australia). The existence of such a trend naturally violates the stationarity assumption, which makes it impossible to apply any stationary time-series model for further analysis.

Figure 3.1: Available end-to-end bandwidth (Mbps): Michigan State University (35.9.2014) - Massachusetts Institute of Technology

Figure 3.2: Available end-to-end bandwidth (Mbps): Denver (USA) 198.32.154.187 - Canarie, Montreal (Canada) planetl.montreal.canet4.nodes.planet-lab.org

Figure 3.3: Available end-to-end bandwidth (Mbps): Michigan State University (35.9.2014) - University of California, Berkeley (169.229.50.12)

We note, however, that if the trend is sufficiently small over the period of time used for the analysis, then stationary models can still be applied. For example, if we use historical data of the last hour for prediction of what may happen in the coming 30 minutes, and if the ongoing trend is insignificant over the 1.5 hours relevant to the prediction, then the stationary model can still be applied.

For further fitting of stationary self-similar models, we preprocess the time-series of interest by subtracting the trend. For all examples considered in figures 3.1, 3.2, and 3.3, the trend subtraction changes each observation by less than 2 Mbps, which is negligible with respect to the spikes that are the primary cause of instantaneous unexpected degradation of service. Predicting the probability of such a spike occurring over a certain QoS-sensitive time-slice is our primary goal. For this purpose we can neglect a 2-3 Mbps trend and further consider the "adjusted" time-series. Figures 3.4 and 3.5 show the original time-series representing the available bandwidth between nodes located at MSU and Berkeley along with the corresponding "adjusted" time-series. Figures 3.4 and 3.5 show that trend removal, inversion, and "lining-up" to the zero level do not significantly change the behavior pattern of the original time-series of interest.

For trend removal we have used the Spencer 15-point moving average filter described by Brockwell et al. [11], which passes polynomials of degree up to three and smoothes everything of higher degree. The Spencer filter can be briefly introduced as follows. Let $\{m_t\}$ be the time-series representing the trend, whereas $\{X_t\}$ is the original series. Then $\{\hat{m}_t\}$ obtained by the Spencer filter is evaluated as
\[ \hat{m}_t = \sum_{j} a_j X_{t-j}, \]
where $a_j = 0$ if $|j| > 7$, $a_j = a_{-j}$ if $|j| \le 7$, and
\[ [a_0, a_1, \ldots, a_7] = \tfrac{1}{320}\,[74, 67, 46, 21, 3, -5, -6, -3]. \]
It is also undesirable to have instantaneous spikes affecting the trend estimation. Therefore a "trimming" procedure is applied before passing data through the Spencer filter. The whole procedure can be done at run-time. The size of the window that slides across the time-series observations is 19. We delete the two maximal and the two minimal observations and pass the remaining 15 observations to the Spencer filter, which outputs the estimated trend. Then the trend is subtracted from the original time-series, the result is inverted and lined up to zero, and thus the "adjusted" time-series is finally obtained.

The following discussion is related to the analysis of the "adjusted" time-series. Note that in most cases the "adjustment" procedure does not significantly distort the data. Even in cases such as the one shown in figure 3.6, if the existing trend has a "long-term" nature then we can still apply the inference about spikes made for the "adjusted" time-series to spikes of the original time-series. In such cases the "zero level" of the "adjusted" time-series during different parts of the day corresponds to different "base levels" of available bandwidth. Under the term "base level" we assume the level of available bandwidth from which spikes of the "adjusted" time-series have to be subtracted in order to obtain an observed value of the available bandwidth. For the examples in figures 3.1, 3.2, and 3.3, as well as for the majority of network paths that we observed, the base level remains the same during the day. For the case shown in figure 3.6 the base level changes throughout the day. One, however, can still make predictions based on a smaller time-scale.

Figure 3.4: Available bandwidth before trend removal and inversion

Figure 3.5: Available bandwidth after trend removal and inversion
22 100 Mbps 80 70 60 50 30 20 Figure 90 Mbps oo 7o so so 40 30 20 I l l l l l 1 1 l 100 200 300 400 500 600 700 300 900 1000 mlnutos 3.4: Available bandwidth before trend removal and inversion I I T I I I I I I 0 100 200 300 400 500 600 700 800 900 1000 minutes Figure 3.5: Available bandwidth after trend removal and inversion 23 20 - trend present “Elli; pinball n 4 l l "o 500 1000 mlnutes 1500 Figure 3.6: Michigan State University — University of Sydney, Australia (trend ex- ample) Mbps 9 L .__l‘ 200 300 I 400 500 600 mlnutes Figure 3.7: Michigan State University - University of Uupsala, Sweden on the right (routes oscillation example) 24 We conclude that if the slope of on-going trend is sufficiently small (with respect to time—series variability) to assume that the “base-level” of an available bandwidth estimated during the history time-frame remains the same during the time-frame for which the prediction is made, then trend subtraction does not affect the accuracy of the prediction. We do not discuss the rare cases when the routers on the network path frequently change routes for packets, which makes the available bandwidth time-series unstable, such as depicted in figure 3.7. 3.2 Modeling available bandwidth 3.2.1 “Crossing probability” Available bandwidth is an important QoS characteristic of a network path. As it was mentioned in previous sections, the mean available bandwidth is not enough to reflect the complex nature of bandwidth dynamics. We believe that the key point is to create a statistical model for available bandwidth that efficiently characterizes the network path in terms of its QoS properties. Assume that the operator is about to decide whether to perform a real—time QoS critical job over a potentially unreliable network path. Then the most important question can be stated as follows. What is the probability of experiencing a QoS critical degradation of service during the time required for a given real-time task execution? An example is the probability that the picture transferred to the operator from a remotely-controlled walking robot distorts to a point when operator is not able to understand where the robot is located with respect to an approaching wall. Now let us slightly formalize the problem. The natural assumption is that the operator should be aware of the real-time task duration, minimal allowed transmis- sion rate and the receiver’s buffer size. These three metrics can be considered as basic QoS characteristics of a given task. The QoS requirements are violated if the 25 mean available bandwidth time-series averaged over time receiver’s buffer size minimal allowed transmission rate drops below the threshold of minimal allowed transmission rate during the task’s execution time. Note that the task’s execution time will correspond to a sequence of observations of the mean available bandwidth time-series of a length t *k’ d t'. N: as 8 ura 2071 receiver’ 3 buffer size minimal allowed transmission rate Now rephrase the main QoS question as follows: What is the probability that the minimal value over the next N consecutive observations of the available bandwidth (AB) time-series drops below a given threshold where both the threshold and N are specified by the real-time task constraints? For the “adjusted” and inverted mean available bandwidth time-series, the aforementioned probability can be re- defined as the probability for the inverted AB-time-series maximum over the next consecutive N observations to exceed the pre-defined QoS critical threshold. 
We believe that finding this probability should be an ultimate goal and a statistically correct problem statement for the QoS characteristics of a network path. We here— and-after refer to this probability as to a “crossing probability”. Even finding a reasonable upper bound for such a probability is a good contribution to the QoS characterization of a network path. Below we build a model that describes the observed available bandwidth data. It is self-similar and amenable to analysis with respect to the aforementioned “crossing probability”. 3.2.2 ARCH model In this work we propose a parametric model that allows quick computation of the “crossing probability” for any given number of future observations N and threshold M based on estimated model parameters. We used the “ARCH 2(1)” model [20] for the description of the mean available bandwidth data. Therefore we first give below 26 some background on ARCH models and also show reasons why this model has been chosen as a candidate for the behavior of mean available bandwidth description. The ARCH(p) (Auto—Regressive Conditional Heteroscedasticity) time—series model was introduced by Engle [20] in applications to finance events. The ARCH(p) model can be defined as an autocorrelated Gaussian noise with dependency between obser— vations described as follows. p Xt = (. :30 + Z 3iX52_,-)Zt i=1 where Zt are independent observations following a standard normal distribution. The simplest partial case of the ARCH (1) time-series is defined as Xi = (1/130 + 31192.1)21 Finally, ARCH 2(1) time—series will be defined as Y1 = (.30 + BIYI—Ilzt2 From the definition of ARCH 2(1), it follows that the model is completely defined by a pair of coefficients [60,61]. Both ARCH (1) and ARCH 2(1) models were extensively applied to finance-related time—series, which at least visually have the similarity to “adjusted” mean available bandwidth time-series pattern. Before going any further we provide a picture with “adjusted” mean available bandwidth on the right and simulated ARCH 2( 1) time-series on the left. As shown in figures 3.8, 3.9 the pattern of both time-series is the same. Another encouraging factor, as shown by Embrechts, et al. [19], is the ARCH 2(1) time-series possesses self-similar character- istics. The ARCH 2(1) model allows further computation of a “crossing probability” based on the estimated model coefficients. The mathematical analysis of ARCH 2(p) models, where p > 1, is much more complex and there are no known mathemati- cal results about “crossing probability” computation for such models. The model’s arnenability to further mathematical analysis was the crucial argument in favor of 27 Mbps 70H -1 60H -1 40- - 30* 0 100 200 300 400 500 600 700 000 900 1000 mlnutoo Figure 3.8: Measured available end-to—end bandwidth. 90 1 I T I I I I I T Mbps so— - 70- l 601- 50.. 4o~ 30~ 20- ] i 10- , l o l 1 0 100 200 300 400 500 000 700 000 900 1000 minutes Figure 3.9: Simulated ARCH 2(1) observations with ,6’0 2 131 = 0.25. 28 considering the case when p = 1. However, we conducted simple statistical data analysis aiming to show that collected data do not give any significant evidence for choosing ARCH models with p > 1. As it is known from time-series studies [11] the following quantity called AICC is considered for choosing an optimal order p: 2(p+1)n AICC: —2L ,3 Me (.031 p)+,,_p_2 where L is Quasi Log Likelihood Function evaluated as: l. 4...”... J n—p L030, 13,, sp) = — Z [logmo + 61X§+p_1 + + stEH 1:1 2 Xi+p ] , 2 / 2 and n is the number of observations. 
The AICC criterion is a function of p and β_0, ..., β_p. The "best-fit" order p is the value that minimizes the AICC criterion over those β_0, ..., β_p which, for a given p, themselves minimize the AICC. We computed the AICC criterion for 30 non-intersecting blocks of observations belonging to a real available bandwidth time-series. Each block contains 200 observations, which corresponds to 3.3 hours given that the available bandwidth time-series is sampled at a rate of one observation per minute. The size of the block was chosen large enough to be representative and small enough for assuming that the model coefficients remain unchanged throughout the block. For each data block the AICC criterion was evaluated for p = 1, 2, and 3. Below is shown the AICC-minimizing value of p for each block:

1, 1, 1, 3, 1, 2, 2, 3, 3, 1, 3, 1, 1, 2, 2, 2, 2, 3, 3, 1, 3, 1, 2, 1, 1, 2, 3, 2, 3, 1

Out of 30 trials we have 12 cases when p = 1, 9 cases when p = 2, and 9 cases when p = 3. From the above values it can be seen that the data do not give us sufficient evidence to believe that ARCH models with p > 1 work better than the one with p = 1. Following the well-known paradigm (Occam's razor) that among models with roughly the same performance the best model is the simplest one, we conclude that our decision to choose p = 1 for our model is reasonable.

Finally, we conducted a test for checking the applicability of an ARCH model to real-life measurements of available bandwidth. Real-life measurements of an available bandwidth time-series were first preprocessed (as described in the previous section), and then a statistical test checking the data for heteroscedasticity [20] was performed. The Null hypothesis for such a test was the assumption that the data follow an ARCH model, whereas the alternative hypothesis was the assumption that an ARCH model is not applicable. After trend removal and inversion for real available bandwidth measurements taken across Internet paths, we obtained high p-values, which clearly suggest accepting the Null hypothesis. In particular, for the path MSU-Australia the reported p-value was 0.237, which does not allow us to reject the Null hypothesis at any reasonable significance level. We conclude that we have sufficient preliminary information to apply the ARCH model to an available bandwidth time-series and to check how well it predicts real network paths in the Internet.

In order to apply a parametric model to "crossing probability" estimation, one needs to ensure that the model parameters can be estimated at run-time sufficiently fast. The model then needs to be mathematically analyzed in order to find an efficient method to compute the "crossing probability" based on the estimated model parameters. One also needs to be able to build confidence intervals around the estimated probability in order to be aware of the model's accuracy. Finally, the model needs to be empirically verified. We therefore present below our unification of known results on ARCH²(1) parameter estimation and "crossing probability" computation, together with the scheme of confidence interval construction.

The ARCH(1) model has been studied extensively with regard to the description of time-series in finance. The basic work in ARCH parameter estimation was done by Weiss [59], who proposed to estimate the ARCH parameters by maximizing the quasi log-likelihood function.
For our ARCH²(1) model the quasi log-likelihood function is evaluated as

$$L(\beta_0, \beta_1) = -\frac{1}{2}\sum_{i=1}^{n-1}\left[\log\!\left(\beta_0 + \beta_1 X_i^2\right) + \frac{X_{i+1}^2}{\beta_0 + \beta_1 X_i^2}\right]$$

Here n is the number of observations in the window that is used for parameter estimation. The parameter n is very important, and we discuss its proper adjustment separately below. The advantage of the QMLE (Quasi Maximum Likelihood Estimation) estimator of [β_0, β_1] is that it is asymptotically unbiased. It is also known [51] that [β̂_0, β̂_1] is normally distributed with mean [β_0true, β_1true] and a covariance matrix called the Fisher information. The normality of the estimators provides the basis for computing a confidence interval of any level around the estimated values of the model parameters. The biggest challenge with the QMLE estimator, however, is that it is not given in closed form. Finding the estimators of [β_0, β_1] therefore reduces to the maximization of a function of two variables. The function maximization problem has been intensively studied and many efficient algorithms exist. However, their performance depends significantly on the initial point from which they have to converge, and sometimes convergence cannot be achieved sufficiently fast for an on-line implementation.

Bose et al. [9] proposed another estimator of β = [β_0, β_1]':

$$\hat{\beta} = \left[\sum_i \frac{s_{i-1} s_{i-1}'}{\tilde{\sigma}_i^4}\right]^{-1}\left[\sum_i \frac{s_{i-1} X_i^2}{\tilde{\sigma}_i^4}\right]$$

where s_{i-1} = [1, X²_{i-1}]', σ̃_i²(β) = s'_{i-1} β̃, and β̃ = (S'S)^{-1} S'Y is a preliminary least-squares estimator (the rows of S are the s'_{i-1} and Y_i = X_i²). Although Bose et al. [9] give estimators of [β_0, β_1] in closed form, we ran a verification procedure in order to check how the precision and computation time of this approach compare with QMLE. Our simulation included modeling 8,000,000 different ARCH(1) time-series with varying known [β_0true, β_1true]. We came to the conclusion that for many tested values of [β_0true, β_1true] the estimator proposed by Bose et al. [9] is biased, whereas the QMLE estimator is unbiased for all tested values of [β_0true, β_1true]. Therefore we decided to join the two approaches in the following way. We first find preliminary estimates of [β_0, β_1] by the Bose et al. [9] approach, which takes less than a second. We then use this preliminary estimate as the initial point of convergence for a QMLE algorithm. In this way the QMLE algorithm takes on average approximately 3 seconds to converge.
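As a rough illustration of this two-stage estimation, the sketch below (our own illustrative code, not the implementation used in our experiments) feeds a crude closed-form starting point into a numerical maximizer of the quasi log-likelihood; the unweighted least-squares start stands in for the preliminary closed-form estimator, and `scipy.optimize.minimize` plays the role of the QMLE convergence step.

```python
import numpy as np
from scipy.optimize import minimize

def neg_quasi_loglik(params, y):
    """Negative quasi log-likelihood of ARCH^2(1) for the adjusted (inverted) series y."""
    beta0, beta1 = params
    sigma2 = beta0 + beta1 * y[:-1]
    if np.any(sigma2 <= 0):
        return np.inf
    return 0.5 * np.sum(np.log(sigma2) + y[1:] / sigma2)

def preliminary_ls(y):
    """Closed-form starting point: regress y_t on [1, y_{t-1}]."""
    S = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(S, y[1:], rcond=None)
    return np.clip(coef, 1e-6, None)      # keep the starting point in the admissible region

def fit_arch2_1(y):
    start = preliminary_ls(y)
    res = minimize(neg_quasi_loglik, start, args=(y,),
                   bounds=[(1e-6, None), (1e-6, 0.999)], method="L-BFGS-B")
    return res.x                          # [beta0_hat, beta1_hat]
```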
Standard errors and correlation were calculated by simulating 1000 independent ARCH 2(1) time-series for each fixed pair of 60, 181 and finding the difference between estimated and true values of the aforementioned vector. 60 and 61 then were varied within the region (05,3.5) and (0,06), respectively. This region covers a reasonable re- gion of values that 60 and 61 might take while describing the available bandwidth behavior within the range of 200 Mbps. It is known that the estimated covariance matrix is proportional to nTl. Therefore, when having available tables visualized in figures 3.10, 3.11, 3.12 for n = 720, one can recompute E for an arbitrary n in less than a second. 3.2.3 Results from extremal values theory Below we show some mathematical results that allow us to compute the “crossing probability” for any specified threshold M, number of observations N, and estimated model parameters 130, 51. These results also help us to draw a 95% confidence inter- val for a “crossing probability” by considering the fact that the “crossing probabil- ity” is a monotonically increasing function of both 60 and 61 and finding “crossing probabilities” for sufficiently many points on the 95% ellipse in the 60, 61 space. Embrechts, et al. [19] summarize recent work in extreme values providing formulas for finding “crossing probability” for various time-series models. In particular for the ARCH 2(1) model, Embrechts, et al. have 1 _ I, fit Nlim P(N-E max{X1, ..., XN} S Aim) = exp Clium —+oo that allows us to rewrite this result for “crossing probability” P and large N as follows _ , .—K. P_:1_exp (3le N 34 .o 4&3 .O m I .O _L [U1 .0 o I (’1 "Crossing Probability" 0 rev0 Figure 3.13: “Crossing” probability for a threshold M=25 Mbps and a number of observations N = 50 u where H. is a unique root of the equation (267—3 I‘(u + %) = 1. [Win stands for a threshold for an inverted available bandwidth time—series. It can be related to a threshold I'll by a simple formula Min = constant — III, where the constant is the “base-level” around which the original available bandwidth time—series was inverted (or in other words “mirrored”). Although Embrechts et al. [19] give an expression for CI, it is more efficient to estimate it empirically off—line for different [30 and 31. For that purpose we simulated 800,000 observations for each pair of 130 and [11 coming from a grid with range (05,35) and (0,0.6), respectively. Figure 3.13 shows this probability for a fixed N = 50 and M = 25 Ill bps. Then, based on the probability expression above, this tabulated result can be quickly generalized on—line to arbitrary N and IV. Calculating the estimated probability together with its confidence interval gives 35 an opportunity to judge the size of the window 71 used for prediction of 130 and 61 coefficients. On one hand we want n to be as small as possible because we assume that the model parameters remain unchanged for the time-period related to n. On the other hand, with larger n, accuracy of the prediction increases with correspond- ing shrinking of the 95% confidence region. In our experiments, for example, we put the following condition to the 95% confidence interval for the probability: the difference between the 95% upper bound and the predicted value should not exceed 0.2 for an arbitrary predicted probability value from the range [0;0.5]. Then for N E {1, ..., 100} and M E [60; 90], the lower boundary for n will be evaluated to 120. 
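Collecting the formulas of this subsection, the following sketch (our own illustrative code) solves the root equation for κ with a standard bracketing solver and evaluates the crossing probability for given N and M; the constant c1 is assumed to come from the off-line tabulation described above, and the numbers in the example call are made up.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma

def kappa(beta1):
    """Unique positive root of (2*beta1)**k * Gamma(k + 1/2) / sqrt(pi) = 1."""
    f = lambda k: (2.0 * beta1) ** k * gamma(k + 0.5) / np.sqrt(np.pi) - 1.0
    return brentq(f, 1e-6, 50.0)

def crossing_probability(N, M, c1, beta1, base_level):
    """P that the inverted series exceeds M_inv = base_level - M within N observations."""
    m_inv = base_level - M
    k = kappa(beta1)
    return 1.0 - np.exp(-c1 * m_inv ** (-k) * N)

# Example with made-up numbers; in practice c1 is read from the tabulated simulations.
print(crossing_probability(N=50, M=25, c1=1.0, beta1=0.25, base_level=100.0))
```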
Note that the dependence on N is natural: with increasing of the period of time for which the prediction is made (duration of real-time task execution), the period of time used for prediction also needs to be correspondingly increased. 3.3 Experimental evaluation In this section we describe our method for testing the accuracy of our approach to available bandwidth modeling. In previous sections we introduced the “crossing probability” as a characteristic of a network path with respect to available band- width and the given real-time task that needs to be remotely executed. “Crossing probability” can be viewed as a characteristic of a current state of a network path expressed in the form of a prediction of the bandwidth behavior for the time-period equivalent to the next N observations of the available bandwidth time-series. The “Crossing probability” approach tells that if a network path resides in the present state (evaluated with the help of the past n observations) for the period of the next N observations, then the probability of crossing of a predefined threshold can be quickly evaluated by a proposed parametric model. The final target of the modeling is a prediction of a probability. The only way, therefore, to empirically verify the model is to apply the ergodic theorem, i.e., to have B groups with N observations in each and then count the number L of groups where the maximum over N observations within the group crossed a pro-defined 36 N\Mbps 60 62 64 66_ 68 E 72 74 76 78 80 82 84 86 88 90 15 0.09 0.09 0.08 0.07 0.07 0.07 0.07 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.04 20 0.12 0.12 0.10 0.09 0.09 0.09 0.09 0.07 0.07 0.07 0.07 0.07 0.07 0.06 0.06 0.05 25 0.14 0.14 0.13 0.11 0.11 0.11 0.11 0.09 0.09 0.09 0.09 0.09 0.09 0.08 0.08 0.06 30 0.18 0.13 0.16 0.14 0.14 0.14 0.14 0.11 0.11 0.11 0.11 0.11 0.11 0.10 0.10 0.07 35 0.20 0.20 0.13 0.16 0.16 0.16 0.16 0.13 0.13 0.13 0.13 0.13 0.13 0.11 0.11 0.08 40 0.22 0.22 0.20 0.18 0.18 0.18 0.18 0.15 0.15 0.15 0.15 0.15 0.15 0.13 0.13 0.10 45 0.27 0.27 0.24 0.22 0.22 0.22 0.22 0.17 0.17 0.17 0.17 0.17 0.17 0.15 0.15 0.11 50 0.23 0.28 0.25 0.23 0.23 0.23 0.23 0.19 0.19 0.19 0.19 0.19 0.19 0.17 0.17 0.12 55 0.32 0.32 0.29 0.26 0.26 0.26 0.26 0.21 0.21 0.21 0.21 0.21 0.21 0.19 0.19 0.14 60 0.32 0.32 0.30 0.26 0.26 0.26 0.26 0.24 0.24 0.24 0.24 0.24 0.24 0.21 0.21 0.15 65 0.33 0.33 0.29 0.26 0.26 0.26 0.26 0.22 0.22 0.22 0.22 0.22 0.22 0.19 0.19 0.14 70 0.34 0.34 0.30 0.26 0.26 0.26 0.26 0.24 0.24 0.24 0.24 0.24 0.24 0.20 0.20 0.14 75 0.35 0.35 0.31 0.28 0.28 0.28 0.23 0.25 0.25 0.25 0.25 0.25 0.25 0.22 0.22 0.16 80 0.34 0.34 0.32 0.28 0.28 0.28 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.19 0.19 0.16 85 0.40 0.40 0.35 0.31 0.31 0.31 0.31 0.27 0.27 0.27 0.27 0.27 0.27 0.22 0.22 0.18 90 0.39 0.39 0.32 0.28 0.23 0.28 0.28 0.24 0.24 0.24 0.24 0.24 0.24 0.19 0.19 0.15 95 0.39 0.39 0.35 0.29 0.29 0.29 0.29 0.25 0.25 0.25 0.25 0.25 0.25 0.21 0.21 0.17 100 0.41 0.41 0.35 0.29 0.29 0.29 0.29 0.25 0.25 0.25 0.25 0.25 0.25 0.21 0.21 0.17 Table 3.1: Empirically estimated “Crossing” probability for a path MSU-Berkeley threshold. Then the ratio L / B can be considered as an empirically computed value of a “crossing probability”. We considered as our design accuracy the following criteria: within the frames of the model, the difference between the predicted probability and its 95% upper confidence limit should not exceed 0.2 at any circumstances. 
Our simulations of ARCH 2( 1) series show that such “within-the—model” accuracy on a probability scale for N S 90 dictates the lower bound on number of observations needed for model parameters estimation as n = 120. Higher accuracy leads to higher values of a lower bound for n. In real-life, however, it is impossible to achieve such an ideal design of the exper— iment as it was described in the previous paragraph because of the following factors. First, we obtain B groups of measurements in a real scenario that belong to the same time-series (instead of having multiple independently generated series in the simulation scenario) and therefore they are somehow dependent. Second, even if we assume that the model parameters remain the same over the estimation window of length n (that is reasonable if n correspond to 2 hours) it is too naive to assume that these parameters will remain the same over the whole time-period needed for collec- tion of B groups of size N. Recall that present available bandwidth measurement 37 tools allow to make consistently stable measurements at a rate 1 sample per minute. Therefore, B = 100 reasonably separated groups of N = 100 observations in a group correspond to time needed for collection of 20,000 consecutive observations. With presently available sampling rate this time is evaluated to 20,000 minutes or 2 weeks of consistent measurements. In order to address the above discussed challenges we propose the following scheme for estimating efficiency of the upper bound on “crossing probability”. We consider many network paths aiming at those that have visually the same daily be- havior range. In this section we show an experimental evaluation of one such path between nodes in MSU and Berkeley that are 20 hops apart. These nodes were chosen because their hardware parameters and stable working allowed us to take 15,000 continuous observations with the help of a Pathload measurement tool. For the MSU-Berkeley network path we chose n = 120 and for a fixed N = 50 monitored how estimated 95% upper bound for a “crossing probability” evolved over the pe- riod of one day. Therefore we separated our data-set into a training-validation part that constituted 24 hours or 1440 observations and testing samples. For the first day we evaluated the width of the strip where the estimated probability resided and then computed the empirical probability for the rest of the data. By doing this we computed the “crossing probability” averaged over the period of 1350 observations. Given that the behavior of the path did not have any significant trend or oscillations over all measurement periods, we were expecting an average probability estimated by an ergodic theorem to reside in a strip defined by a first day behavior pattern. Shown in the table 3.1 are estimated empirically values of the “crossing probability” for different N and threshold M varying within the range 15-100 observations and 60—90 Mbps, respectively. As was expected, it turned out that for all thresholds M and sizes N empirically computed probability resides below the estimated bound. Table 3.2 shows the difference between the empirically computed probability and the estimated upper bound, whereas figure 3.14 shows two surfaces where the upper surface represents an upper bound estimated by the model and the lower surface represents an empirically computed probability. 
Figure 3.14 shows, in particular, 38 ‘Wbps 60 62 64 66 68 W721 74 '76'78 80 82 84 86 88 90 15 0.06 0.06 0.06 0.06 0.06 0.05 0.04 0.05 0.05 0.04 0.04 0.03 0.03 0.03 0.03 0.04 20 0.08 0.07 0.07 0.08 0.07 0.06 0.06 0.07 0.06 0.05 0.05 0.04 0.04 0.04 0.04 0.05 25 0.10 0.09 0.09 0.10 0.09 0.08 0.07 0.08 0.07 0.06 0.06 0.05 0.05 0.05 0.04 0.06 30 0.11 0.09 0.09 0.10 0.09 0.08 0.07 0.09 0.08 0.07 0.06 0.05 0.05 0.05 0.05 0.07 35 0.13 0.11 0.11 0.11 0.10 0.09 0.08 0.10 0.09 0.08 0.07 0.06 0.05 0.06 0.05 0.08| 40 0.14 0.12 0.12 0.13 0.11 0.10 0.08 0.11 0.10 0.09 0.08 0.07 0.06 0.07 0.06 0.09] 45 0.13 0.11 0.12 0.12 0.11 0.09 0.08 0.11 0.10 0.09 0.08 0.07 0.06 0.07 0.06 0.09] 50 0.14 0.12 0.14 0.14 0.12 0.11 0.09 0.12 0.11 0.10 0.08 0.07 0.06 0.07 0.06 0.10] 55 0.14 0.12 0.13 0.14 0.12 0.10 0.09 0.12 0.11 0.10 0.09 0.07 0.06 0.08 0.07 am 60 0.16 0.14 0.15 0.16 0.14 0.13 0.11 0.12 0.11 0.09 0.08 0.07 0.06 0.08 0.07 0.11] 65 0.18 0.16 0.19 0.19 0.18 0.16 0.14 0.16 0.14 0.13 0.12 0.10 0.09 0.11 0.10 0.14] 70 0.20 0.18 0.19 0.21 0.19 0.17 0.16 0.17 0.15 0.14 0.12 0.11 0.10 0.12 0.11 0.16| 75 0.21 0.19 0.21 0.22 0.20 0.18 0.16 0.18 0.16 0.14 0.13 0.11 0.10 0.12 0.11 0.16| 80 0.25 0.22 0.22 0.25 0.23 0.21 0.19 0.21 0.20 0.18 0.16 0.15 0.14 0.16 0.15 0.17| 85 0.21 0.19 0.22 0.24 0.22 0.20 0.18 0.20 0.18 0.16 0.15 0.13 0.12 0.15 0.14 0.17| 90 0.24 0.22 0.27 0.29 0.26 0.24 0.22 0.25 0.23 0.21 0.20 0.18 0.17 0.20 0.19 0.21| 95 0.26 0.24 0.26 0.29 0.27 0.25 0.23 0.25 0.24 0.22 0.20 0.19 0.17 0.20 0.19 0.21| 100 0.26 0.24 0.28 0.31 0.29 0.27 0.25 0.27 0.25 0.24 0.22 0.20 0.19 0.22 0.20 0.23] Table 3.2: Difference between model-based estimated upper bound and empirically estimated “Crossing” probability for a path MSU-Berkeley that two surfaces are nearly parallel to each other, i.e., that the model reflects scales of both N and M in reasonable agreement with what was empirically observed. Also figure 3.14 shows that estimation becomes less accurate when N approaches to n, which is understandable. We conclude that for a given channel the model produced verifiable and a rea- sonably accurate prediction. We also note that the model is supposed to do its job for any pair of network paths and real-time tasks for which parameters 60 and 61 do not fall out of pre-defined regions of variability and also significant trend or routes oscillations do not occur for the period of time required for a real-time task execution. 3.4 Conclusion In this chapter we presented a “probabilistic” parametric approach to a description of the available bandwidth behavior. We showed that after trend subtraction and inversion the available bandwidth can be described by the ARCH2(1) parametric model. We have intensively studied and unified known results in ARCH 2(1), tab- 39 "Crossing probability” 7A Threshold: \‘ 120 Mbps 6° 80 100 40 N (over which prediction is made) Figure 3.14: Empirically estimated 95% upper bound over empirically computed “Crossing” probability for a path MSU—Berkeley ulated relevant characteristics and provided an efficient method for computation of a “crossing probability”. Then we tested the accuracy of our model on a channel with relatively stable long—term behavior and proved that the approach has a cer- tain range of applicability. The most significant shortcoming of our experiments is the sampling rate of Pathload and its unstable output over significant number of network paths. The best Pathload’s sampling rate during our experiments was 1 observation per minute. 
Recall that T should be of the order of receiver’s buffer time in order to allow observations to be relevant with respect to a particular real-time task execution. Also as our simulations show we need at least 120 observations for accurate enough estimation of model’s parameters. At the same time the smaller is the size of the history window the more realistic is the assumption that behavior of available bandwidth time—series is stationary over history! time-frame used for pre— diction. Taking into account Pathload’s sampling rate we therefore were restricted with paths with relatively stationary behavior for the period of at least 2 hours. If in the future there are better available bandwidth measurement tools that are 40 able to sample at higher rates, then the range of applicability of the parametric approach will significantly increase. The next chapter introduces work directed to— wards increasing the accuracy of probe-rate based available bandwidth measurement tools and consequently available bandwidth sampling rate. 41 h; Chapter 4 Network One Way Delay Trend Detection Mechanism In this chapter we demonstrate an approach that improves a whole range of “probe- rate” AB measurement tools that send sequences of measurement packets (called “trains”) across the network path and capture a positive trend in their travel times or One Way Delays (OWD). In particular, we show an algorithm for efficient OWD trend detection and compare it to widely used OWD trend detection tests. 4.1 Main challenges of IC / CS elimination and trend detection in Pathload Jain and Dovrolis have shown in [30] that Pathload demonstrates reasonably good performance On paths that have tight and narrow link in the middle of the path. For such cases Pathload’s error was around 5% of available bandwidth value when this value belongs to the range 70-80Mbps and around 10% of available bandwidth value when the value is smaller than 10Mbps. In such cases, however, the slow link, which lies in the middle of the path, stretches packets, whereas fast links contract packets. In other words, suppose that packets are sent with a certain fixed delay. When they pass from a faster link to a slower link the delay between packet arrivals becomes smaller or is eliminated. In 42 contrast, when packets pass from a slow link to a fast link, then the delay between packets increases. Therefore by having a slow link in the middle of the path the authors are guaranteed that there will be a significant delay between packet arrivals at the end which first, makes trend detection easier and second, makes measurements less vulnerable to IC / CS effect. Another “friendly” factor in the experimental setup was good match of a length of a packet train to the path across which train was sent. More precisely, when a train was sent at rates higher than available at that time bandwidth, routers in the middle of the path were experiencing instantaneous increase of packet queues but at the same time were not dropping any significant portion of packets. That experimental design allows one to avoid addressing the problem of packet loss. It is unreasonable, however, to assume that an algorithm of an ideal match of length of a train to queue length on intermediate routers can be developed. Ubik, et al. [58] performed an extensive analysis of Pathloads Operation for var- ious networks, values of available bandwidth and values of a cross-traffic. Tables shown in their work clearly show that performance of a Pathload is far from ideal. 
We set up an experiment where the queues on the routers/switches were small enough that a significant portion of the packets sent by Pathload at a rate higher than the available bandwidth was dropped. In some of our experiments Pathload's PCT/PDT test falsely reported a trend for approximately 25% of the analyzed trains sent at a rate lower than the available bandwidth, while not reporting a trend for 14% of the trains sent at a rate higher than the bandwidth available at that time.

Even when experiencing packet losses, Pathload can still report available bandwidth from the correct range despite the high error rate of trend detection at the level of a particular train. The Pathload creators recognized the challenge of packet losses and introduced a rather artificial way of partially avoiding this problem. They set up two levels of packet losses. If the percentage of losses in at least one train in a set of trains sent at some fixed rate is higher than HIGH LOSS RATE, then the rate is automatically believed to be higher than the available bandwidth. If the losses in more than 50% of the trains sent at a fixed rate exceed MEDIUM LOSS RATE, then the rate is believed to be higher than the available bandwidth as well. These two levels of percentages are initially set in Pathload to 15% and 7%, respectively.

The way of dealing with packet losses described above has its own shortcomings. First, the HIGH and MEDIUM loss rate parameters have to be carefully tuned. Second, packets may be lost because of processor utilization at the receiving computer. As a consequence of the packet-loss handling mechanism implemented in Pathload, the tool fails to report the correct bandwidth even when the processor utilization at the receiving machine reaches 5%. This already has to be considered a significant limitation of the tool's usage, especially for long-lasting measurements, given that the receiver machine is running under any of the widespread non-real-time operating systems.

The way Pathload deals with the IC/CS effect is not perfect either. The time that the operating system needs to deliver packets to the upper layer cannot be precisely estimated. Pathload's PCT/PDT metrics also leave room for improvement. Figures 4.1 and 4.2 illustrate a typical example of a train misclassified by the PCT/PDT test. The train was sent at a rate which was 10 Mbps higher than the bandwidth available at that time. In figure 4.1 the one-way delays of the 100 packets belonging to the train are plotted. We see that the plot has parts with a negative trend, and these parts are strictly "linear", which is a direct consequence of the IC/CS effect. We can think of IC/CS as an additional buffer in which packets spend a certain time and which is periodically emptied. The last packet in a group before the "buffer" is emptied waits the least time, whereas the one that came right after the previous emptying waits the longest. Also, it is known that all packets were released by the sender with some fixed, unchanged delay. It therefore follows that all packets related to a strictly linearly decreasing part of the plot belong to the same group. Now consider the subset of packets constituted by the last packet taken out of each strictly linearly decreasing part. In figure 4.1 one can clearly observe that the aforementioned set constitutes a subsequence with a clearly revealed positive trend. Figure 4.2 depicts the subsequence of points chosen by Pathload's IC/CS elimination procedure.
Figure 4.1: Shortcomings of the IC/CS elimination procedure in Pathload: before elimination (relative one-way delay vs. packet index in the train).

Figure 4.2: Shortcomings of the IC/CS elimination procedure in Pathload: after elimination (relative one-way delay vs. packet index in the train).

One can see in figure 4.2 that at least some points of the chosen subsequence do not characterize the packets that are least delayed by the IC/CS effect. Moreover, the trend that is clearly observed even by the naked eye in figure 4.1 is not so evident in figure 4.2. The information about the indices of the packets finally chosen by the IC/CS elimination procedure is also not considered by Pathload. Finally, let us apply the PCT test to what is left in figure 4.2: only seven points remain, and only half of them are larger than their direct predecessors. As a final result, for this typical real-life example Pathload obtained a value of 0.43 for the PCT metric and 0.17 for the PDT metric, both deep in the region that classifies the train as one without a trend.

The conclusion is that Pathload throws away a significant part of the measurement information at the IC/CS elimination stage, as well as when dividing the remaining data into groups and discarding everything except the median of each group. Also, there is no statistical motivation for the efficiency of the PCT/PDT test, nor any reasoning why the chosen thresholds for the PCT and PDT metrics should do a reasonably good job on an arbitrarily taken network path. The example plotted in figures 4.1 and 4.2 was very frequent in the data that we collected on our network segment and therefore explains the significant inaccuracy of the PCT/PDT test when the experimental setup is not as friendly as one with a clearly pronounced tight and narrow link in the middle of the path. In the next section we explain our way of fixing the deficiencies of Pathload listed above, and later give comparative experimental results demonstrating the effectiveness of our approach.

4.2 IC/CS elimination procedure and t-test

In this section we give a detailed analysis of our way of dealing with the interrupt coalescence / context switch effect, and describe the metric that we choose for efficient trend detection.

The idea of proper IC/CS elimination is very simple. We do not consider the time which the system needs to bring packets from the network adapter to memory, since it cannot be estimated accurately. Instead, we detect cases when packets constitute a strictly linear decreasing trend. If the difference in one-way delays of a pair of packets with neighboring indices is negative and is very close to the same difference for the neighboring packet pairs, then all such packets are assigned to the same group and only the last packet of that group is considered for further analysis. In other words, all packets whose one-way delays constitute a clearly revealed straight line with a negative slope are discarded except the very last one. We require such a group to contain at least three packets in order to discard all packets in the group except the very last packet. At this stage it is important that we keep the information about the indices of the packets left for further statistical analysis. This way of dealing with the IC/CS effect is simpler and more effective than the one proposed in Pathload. We do not assume that the very last packet in a group is processed immediately after it arrives; we only say that it waits the smallest possible time for an interrupt among all packets in its group. Also, we assume that the time which the last packet in a group waits for an interrupt is normally distributed, and therefore methods of linear regression analysis can be applied to the chosen subsequence of packets for trend detection, as sketched below.
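The run-collapsing rule just described is easy to express in code. The following sketch is our own minimal illustration (the names and the equality tolerance are ours, not the tool's): from every strictly linearly decreasing run of at least three one-way delays it keeps only the last packet, and it preserves the original packet indices for the later regression step.

```python
def eliminate_ic_cs(owd, tol=1e-3):
    """Collapse strictly linearly decreasing runs of one-way delays (IC/CS groups).

    owd -- list of (packet_index, one_way_delay) pairs for packets that actually arrived.
    Returns the surviving (packet_index, one_way_delay) pairs, original indices preserved.
    """
    kept, run = [], [owd[0]]
    for prev, cur in zip(owd, owd[1:]):
        diff = cur[1] - prev[1]
        first_diff = run[1][1] - run[0][1] if len(run) > 1 else diff
        # extend the run while consecutive differences are negative and nearly identical
        if diff < 0 and abs(diff - first_diff) <= tol * abs(first_diff):
            run.append(cur)
        else:
            # close the run: a run of >= 3 packets is an IC/CS group, keep only its last packet
            kept.extend([run[-1]] if len(run) >= 3 else run)
            run = [cur]
    kept.extend([run[-1]] if len(run) >= 3 else run)
    return kept
```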
The basic idea of the t-test in linear regression analysis is simple. In our case the so-called "response variable" is the one-way delay of a packet, whereas the "predictor" is the packet index, which incorporates the flow of time. The main question which the t-test for a slope coefficient in linear regression is supposed to answer can be stated as follows: do the train data give sufficient evidence to believe that the one-way delay of a packet depends on its index? The linear regression procedure fits a line across the data points corresponding to one-way packet delays in a way that minimizes the squared distances from the data points to the fitted line. The slope coefficient of the line can be found according to the formula

$$\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

In this formula y_i represents the one-way delay of a packet, x_i represents the packet's index in the train, and n is the total number of packets belonging to the train that finally remain after IC/CS elimination. Since β̂_1 depends on random measurement errors, it also has a random nature. It is known [51] that under the linear regression model assumptions the coefficient β̂_1 follows a t-distribution with n - 1 degrees of freedom centered around the true value of this coefficient. The null hypothesis in our t-test is the assumption that the train does not contain any OWD trend, or in other words that the slope coefficient of the regression line is zero. Therefore the distribution of the random quantity β̂_1 under the null hypothesis is a t-distribution centered around zero with n - 1 degrees of freedom. If we reject the null hypothesis (i.e., decide that there is a trend in the train), then the probability of an error can be calculated as the area under the zero-centered t-distribution density that remains to the right of the rejection threshold. This is the error of classifying a train with no trend as a train with a trend, and it is called a type I error. The probability of a type I error naturally depends on the placement of the rejection threshold. The smallest value of the type I error probability, given that we still reject the null hypothesis, is achieved if we place the rejection threshold exactly at the observed value of β̂_1. The area of the t-distribution density function that lies to the right of the observed value of β̂_1 is called the p-value of the test. We decided to use the obtained p-value of the test as our final quantity for judging trend presence, instead of the value of β̂_1 itself. As follows from the previous paragraphs, low p-values should be considered an indicator of trend presence, whereas high values should be considered in favor of accepting the null hypothesis. We choose the threshold for the p-value to be equal to 0.01. A threshold set in p-value space has a statistical motivation behind it: it means that we want the misclassification probability for trains which really do not have an OWD trend (i.e., trains which were sent at a rate lower than the bandwidth available at that time) to be approximately equal to 0.01.
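In practice this test is a single call to a standard routine. The sketch below is our own illustration using scipy.stats.linregress, which reports a two-sided p-value for the slope; since only a positive trend is of interest here, the sketch halves it when the fitted slope is positive (the variable names are ours).

```python
from scipy.stats import linregress

def owd_trend_pvalue(points):
    """One-sided p-value for a positive OWD trend.

    points -- (packet_index, one_way_delay) pairs surviving IC/CS elimination.
    """
    x = [idx for idx, _ in points]
    y = [delay for _, delay in points]
    res = linregress(x, y)
    # two-sided p-value -> one-sided p-value for the "positive slope" alternative
    return res.pvalue / 2.0 if res.slope > 0 else 1.0 - res.pvalue / 2.0

def has_trend(points, threshold=0.01):
    """A train is flagged as carrying an OWD trend when the p-value falls below 0.01."""
    return owd_trend_pvalue(points) < threshold
```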
We have to accept that there is not much one can say about the type II error probability (the probability of classifying a train with a trend as a train without a trend) unless it is estimated by an experiment for a fixed value of the decision threshold. However, we now have at least some statistical reasoning behind choosing a particular level of the p-value as a decision threshold, instead of choosing rejection regions for the PCT and PDT metrics on a purely empirical basis.

Figure 4.3: OWD pattern in the presence of packet loss (relative one-way delay vs. packet index). Trains were sent at a rate 40 Mbps higher than the bandwidth available at that time.

A consideration of how much information is still available after IC/CS elimination for further analysis can be viewed as one more argument for choosing the p-value as the final statistic of interest. With a decreasing number of points remaining after IC/CS elimination, and the corresponding decrease of the degrees of freedom n, the t-distribution becomes more and more "stretched out". This means that for a fixed value of β̂_1 the area to its right becomes larger and larger as the amount of available information (i.e., available data points or degrees of freedom) decreases. Correspondingly, in order to have a sufficient basis for rejecting the null hypothesis, we need a much higher value of β̂_1 than in the case when more data points are available for the analysis.

Another advantage of our scheme of IC/CS elimination and trend detection is that it deals better with packet losses. The influence of packet loss on trend detection is very significant. When routers in the path discard a significant number of packets, they may "recover" from an "instantaneous" overflow of their queues. Therefore trend detection becomes possible only when each contiguous chunk of received packets is considered separately. The effect of packet loss is well illustrated by figures 4.3, 4.4, and 4.5.

Figure 4.4: OWD pattern in the presence of packet loss. Trains were sent at a rate 30 Mbps higher than the bandwidth available at that time.

Figure 4.5: OWD pattern in the presence of packet loss (lost packets appear as gaps). Trains were sent at a rate 20 Mbps higher than the bandwidth available at that time.

We can clearly see there that even after elimination of the IC/CS parts the whole picture does not give the impression that there is a trend, whereas every contiguously received chunk does give such an impression. In such cases Pathload reports a high percentage of "unclear" trains because it throws away too much information; Pathload frequently has no reserve left for further splitting a train into a number of sub-trains. Our scheme can afford such splitting and therefore reports far fewer "unclear" trains, which results in a significant reduction of the number of trains that are required to be sent in order to achieve the desired detection accuracy. We report a train as "unclear" only when fewer than four data points are finally left.
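A sketch of this chunk-wise treatment (our own illustrative code, building on the hypothetical helpers eliminate_ic_cs and owd_trend_pvalue from the earlier sketches) is shown below: the received packets are split into contiguous runs at every loss gap, each sufficiently long run is tested separately, and the per-chunk p-values are returned for the decision rule described next.

```python
def split_at_losses(packets):
    """Split (packet_index, one_way_delay) pairs into contiguous chunks at loss gaps."""
    chunks, current = [], [packets[0]]
    for prev, cur in zip(packets, packets[1:]):
        if cur[0] == prev[0] + 1:
            current.append(cur)
        else:                                  # one or more packets were lost here
            chunks.append(current)
            current = [cur]
    chunks.append(current)
    return chunks

def subtrain_pvalues(packets, min_points=4):
    """Per-chunk trend p-values; chunks left with fewer than min_points are skipped ("unclear")."""
    pvalues = []
    for chunk in split_at_losses(packets):
        kept = eliminate_ic_cs(chunk)
        # note: the additional rule of discarding an IC/CS run that ends right at a
        # loss gap is omitted here for brevity
        if len(kept) >= min_points:
            pvalues.append(owd_trend_pvalue(kept))
    return pvalues
```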
If we have a number of reported p-values related to a number of sub-trains, then we decide about the trend in the whole train by a majority-vote rule over the sub-trains. Note that if an IC/CS packet group is directly followed by lost packets, then we throw away the whole group of packets, including the last one. With a decreasing number of trains required to be sent, the measurement time is also significantly reduced. This gives our approach a higher degree of flexibility in available-bandwidth-related planning, which is of high importance for present real-time network applications. In the next section we describe our experimental setup and compare the performance of our scheme against Pathload.

4.3 Experimental evaluation

We experimented within our laboratory network, which consists of three 100 Mbps capacity segments. The sender and receiver were placed in the first and third segments, while the path from the first to the third segment passed through the second segment. We wanted both Pathload's PCT/PDT test and our approach to have absolutely equal conditions. Therefore we modified the original version of Pathload by adding a number of printouts that wrote to a separate file the OWD times measured by Pathload. Also, for each train we output the values of the PCT/PDT metrics together with Pathload's final decision about the train. We then processed the same train data with our code, which eliminated the IC/CS effect, conducted the t-test, output the p-values, and finally made a decision about the presence of a trend.

During the first part of the experiment it was known with a high degree of confidence that the background traffic in all three segments is very low (around 1 Mbps) and does not exceed 2-3 Mbps over any period of time of length 200 milliseconds. For that part of the experiments we therefore knew that all trains sent at a rate lower than 95 Mbps should be reported as trains without a trend. We also knew that all three link capacities were 100 Mbps, and therefore all trains that were sent at rates of 98 Mbps and higher should be reported as trains with a trend. This experimental setup did not contain any tight/narrow link in the middle of the path. We ran Pathload continuously for a period of approximately 7 minutes and analyzed 4650 trains in total. We then imposed a slight load on the processor of the receiver and repeated the experiment. Table 4.1 shows how the performance of the PCT/PDT test deployed in Pathload compares with the performance of our approach when no processor utilization was imposed on the measurement machine responsible for receiving packets. Table 4.2 shows the detection performance comparison when the processor utilization was 5-7%.

The tables are structured as follows. The second bold column shows the type II errors for the PCT/PDT test and the t-test, respectively; in particular, it shows the portion of trains that were incorrectly classified as trains without a trend when a trend was really present. Note that this portion has been computed "conditionally", i.e., as a portion of the total number of trains that were sent in the specified rate range and were not assigned to the "unclear" train set. The third bold column shows the type I error for multiple rate ranges. Recall that we have set the decision threshold for the p-value equal to 0.01.
We see in all tables that experimentally evaluated type I error is not that far from “theoretically-predicted” value 0.01, which can be considered to some extent as a verification of model’s applicability. “Unclear trains” colunm is followed by a column showing the number of trains that the t-test based tool needs to send in order to 52 achieve accuracy of PCT/ PDT test sending 6 trains at a given rate. Naturally, if a number in this column is greater than 6 then PCT/ PDT test demonstrates better performance for a corresponding rate and vice-versa otherwise. For the case when there was no any extra processor utilization, t-test performed better for all trains sent at the rate higher than 98Mbps as well in the range 15- 55Mbps. Although we have slightly higher error rate for a range 55-95Mbps than PCT/ PDT test has, still both approaches need to send three trains for achieving 99% accuracy. Difference in performances for the mentioned range therefore can be considered as not significant whereas for other rates our trend detection mechanism significantly outperforms the PCT/ PDT test. Table 4.2 clearly shows that the PCT / PDT performs worse when operating even under very small on-going processor utilization. In particular, the PCT/ PDT test fails to correctly classify trains sent at the rate-range 15—55Mbps. Its error ap- proaches 0.5 and sometimes even exceeds this value, which makes the rate classifica- tion procedure either unacceptably long or not applicable at all. At the same time in the global scale Pathload was unable to report even once available bandwidth from a correct range when processor utilization was imposed. For example, the maximal reported value of available bandwidth out of twenty runs was 70Mbps when the channel was almost free and processor utilization was 3.5%. The responsibility of this failure is Pathload’s method to deal with packet losses. During the second part of our experiment it was known with a high degree of confidence that the total cross—traffic was always exceeding the level of 65Mbps. In other words, we were guaranteed that if trains that were sent at rate 35Mbps or higher are classified as those with trend, then this classification is correct. There was no processor utilization imposed to a measurement machine. The goal of the experiment was to check how efficient t-test and PCT/ PDT test detect trend in trains that are sent at the rate lower than the capacity but higher than the available bandwidth of the network path. Table 4.3 shows results of t-test and PCT / PDT test accuracy for trains sent at the rate higher than the available bandwidth. 
Instead of type I error in the third bolded 53 N of trains Rate at 1 Trend detection that t-test which Classified as 0- Classified as mor accuracy for a needs to N of trains required to Total trains 95 Mbps (withou than 98 Mbps Unclear trains fleet consisting achieve 6- achieve at least 0.99 number of were trend) (with trend) out of 6 trains train accuracy analyzed sent accuracy of trains PCT/PDT PCT/PDT t-teet PCT/PDT t-teet PCT/PDT t-test PCT/PDT t-tost test PCT/PDT Host 1535 0.000 0.014 0.500 0.014 0.985 1.0071 3 7 3 72 3555 0.560 0.023 0.233 0.000 0.312 1.000 1 >100 3 564 55-75 0.002 0.040 0.093 0.000 1.000 0.999 9 3 3 420 7595 0.002 0.054 0.098 0.006 1.000 0.997 11 3 3 1296 > 98 0.083fi0.026 0.296 0.000 0.971 1.000 1 9 3 2226 Table 4.1: Comparison of PCT/ PDT test and t-test performances on nearly empty network and zero receiver’s processor utilization N of trains Rate at Trend detection that t-test which Classified as 0- Classified as mor accuracy tor a needs to N of trains required to Total trains 95 Mbps (withou than 98 Mbps Unclear trains tleet consisting achieve 6- achieve at least 0.99 number of were trend) (with trend) out 016 trains train accuracy analyzed sent accuracy of trains PCT/PDT PCT/PDT t-tost PCT/PDT Hut PCT/PDT t-test PCT/PDT t-test test PCT/PDT Most 15-35 0.441 0.010 0.190 0.055 0.508 1 1 >100 3 199 35-55 0493 0.008 0. I 52 0.063 0.412 1 1 >100 3 1186 55-75 0063 0.0I0 0.077 0.074 0.994 1 3 5 3 1070 75-95 0033 0.011 0.053 0.069 0.999 1 5 4 3 1926 > 98 0.081 [0.061 0.302 0.046 0.973 0.996 3 9 5 710 Table 4.2: Comparison of PCT/ PDT test and t-test performances on nearly empty network and 5 — 7% receiver’s processor utilization Rate at Classified as Trend detection Classified as less more than 35 accuracy tor a fleet N ot trains required to Total which trains than 35 Mbps Unclear trains consisting out 0t 6 achieve at least 0.99 number ot (without trend) Mbps (With trend) trains accuracy analyzed were sent trains PCT/PDT Host PCT/PDT Host PCT/PDT Host PCT/PDT Most PCT/PDT t-tost 35-5_5 0.005 0.922 0.453r 0.000 0.000 0.932 >100 5 592 5575 0.021 0.761 0.958 0.000 0.004 0.853 >100 17 426 7595 0.325 0.958 0.394 0.000 0.157 0.999 >100 3 335 > 98 0.276 [ 0.082 0.305 0.001 0.766 0.991 38 5 1601 Table 4.3: Comparison of PCT/ PDT test and t-test performances on heavy loaded network and zero receiver’s processor utilization 54 Difference N of trains between that t-test rate at which Classification error needs to N of trains required Total trains were Unclear trains achieve 6- to achieve at least number train 0.99 accuracy of sent and accuracy analyzed available I of bandwidth PCT/PDT t-test PCT/PD t-test PCT /PDT PCT/PDT t-tost 15-35 0.122 0.008 0.464 0.000 1 16 1 1023 3555 0.054 0.000 0.299 0.002 1 7 1 711 55—75 0.020 0.003 0.344 0.001 1 6 1 751 7595 0.443 0.292 0.397 0.001 1 >100 29 741 > 98 0.476 0.065 0.346 0.001 1 >100 5 664 Table 4.4: Comparison of PCT / PDT test and t-test performances on average loaded network (around 50Mbps :tIOM bps cross—traffic) and zero receiver’s processor uti— lization column we show detection accuracy for a specified range of rates. Table 4.3 shows that the PCT/ PDT test completely fails to detect trend when suffering packet losses and non-friendly experimental setup. We also note that despite this failure at the level of trains, Pathload still always reported values of the available bandwidth from the correct range because of the artificially implemented packet-loss mechanism. 
We believe, however, that an available bandwidth measurement tool can be significantly improved after excluding an artificial mechanism of dealing with packet losses and respectively relying solely on a train-level reported statistic. In the third part of our experiment we artificially generated and monitored cross- traffic in order to obtain a clearer pattern of error percentage as a function of differ— ence between the rate at which a train was sent and available bandwidth. Unfortu- nately we were unable to capture all traffic on the network and were able to monitor traffic only with a certain error. In our setup generated traffic was fluctuating around 50 Mbps. Since there was no 100% confidence in cross-traffic measurements and in order to be on a safe side we monitored trains which were sent at rates either at least 15Mbps higher or 15Mbps lower than tentatively reported available bandwidth. Based on the setup of our experiment we were able with sufficiently high degree of confidence to assume that these train rates were either higher or lower than the 55 available bandwidth. Table 4.4 shows an output for this part of the test. Now we have only one column showing classification errors. For lines where difference between train rate and available bandwidth is negative, the error represents type I error. For lines with positive difference between rate and available bandwidth, the error represents type II error. Table 4.4 shows that our approach again outperforms PCT/ PDT test for all shown ranges. 4.4 Conclusion In this chapter we proposed a simple algorithm of dealing with the IC/ CS effect together with a trend detection algorithm based on a simple linear regression model and p—value reported by t-test. We showed that our algorithm is more robust with respect to the IC / CS effect and packet losses. Our real-life experiments clearly Show better performance of our approach in comparison to PCT/ PDT test deployed first in Pathload and later in other probe-rate measurement tools [42, 46]. As a result, with our scheme either the accuracy of any probe-rate based available bandwidth measurement tool can be significantly improved or the measurement time can be reduced at least twice while preserving the same measurement accuracy. However, despite the algorithm introduced in this chapter allows an increase of at least twice the available bandwidth sampling rate, the range of applicability of such a measurement tool for “crossing probability” estimation is still limited. We know that for most real-time applications, the averaging time 7' is of the order of seconds rather than minutes. In the next chapter we introduce a non—parametric model for “crossing proba- bility” estimation, which is based directly on the OWD trend detection technique. Taking into account that the OWD at a given rate can be detected for the period of the order of tens of a second, our non-parametric model can predict crossing proba- bility over the next second based on the behavioral history of last ten minutes. The next chapter gives details about our non-parametric approach. 56 Chapter 5 Non-Parametric Approach In this chapter we provide a more general non-parametric approach for computation of a “crossing probability”. In contrast to Chapter 3 that relies on measurements of the available bandwidth (AB) time-series, a non-parametric approach in this chapter relies directly on measurements of indicators showing whether the AB time-series is smaller or larger than a given value. 
Therefore a non-parametric approach is robust, allows faster computation of a crossing probability than the parametric approach in Chapter 3, and has a larger range of applicability since it does not rely on any parametric time-series model. Moreover, for computation of indicator values that compare available bandwidth against a specified value, we do not rely on the Pairwise Comparison Test / Pairwise Difference Test (PCT/ PDT) metrics [29], but use our own way of trend detection introduced in Chapter 4. We suggest to further optimize estimation of a “crossing probability” by “tuning” it to a specific network path and even the day of the week at which measurements are made in order to make estimation more efficient. First, different network paths have different time-frames at which available bandwidth may be treated as a stationary time—series. Second, different paths may “forget” about their history at different rates. In other words, even while assuming that the available bandwidth time-series is stationary, it might be applicable to assign larger weights to the recent past and smaller weights to the older history. The rate at which weights decay is network—path specific and needs to be adjusted as well. 57 Finally, since the main goal of our research is a prediction of a “crossing proba- bility,” we introduce and compare methods for a verification of the accuracy of our prediction based on training and previously unseen testing data-sets obtained by real-life measurements across multiple Internet paths. Along with real-life experi- ments, we present theoretical results that show that our verification procedures are probabilistically legitimate. 5.1 Moving block bootstrap Let us consider a stationary sequence X1, X2, ..., Xn of observations of zeros and ones where each observation is related to the period of time of approximately 0.5 sec- ond. Let the value zero correspond to an event when the available bandwidth time- series is larger than a given threshold whereas one is related to the case when the available bandwidth is smaller than the threshold. Each observation can be obtained with the help of the probe-rate approach. “One” is reported when there exists a positive trend in travel times of packets in a sequence sent at a given threshold rate. “Zero” is reported when there is no indication of such a trend. For obtaining observations of zeros and ones related to a specified threshold rate, sequences or trains of packets were sent exactly at that rate. Since each train was sent within the same period of time, the “averaging time” 7' does not vary from train to train. Each observation is related to one train of packets and therefore 7' can be defined as the time that one train of packets requires to be delivered from sender to receiver. The problem of finding the crossing probability is reduced to finding a probability that among N consecutive observations of “zeroes” and “ones” there exists at least one “one”. Note, that observations of zeros and ones are tightly related to cross— traffic existing in the path and therefore we have to assume that these observations depend on each other. More precisely, they form a stationary sequence that possesses long-range dependency and self-similar properties [34]. Based on aforesaid, the natural way of non-parametric estimation of a cross- ing probability is as follows. 
Observe n values X_1, X_2, ..., X_n, X_i ∈ {0, 1}, i ∈ {1, 2, ..., n}, of the time-series of interest. Then establish a window of size N sliding across the sequence of observations of size n by one observation at a time (see figure 5.1). At each cycle, increase the counter C when among the N observations in the sliding window there exists at least one "one". When the sliding window of size N reaches the end of the sequence of size n, divide the counter C by the total number of cycles (n - N + 1) and consider this ratio as the estimator of the crossing probability for the next N observations.

Figure 5.1: A window of size N = 3 slides across the history data of size n = 12 for prediction of a crossing probability P over the coming N = 3 observations; in this example C = 7 and P = 0.7.

If n - N + 1 is large, and at the same time the time-series is stationary during the period of time that corresponds to the n observations, then the ergodic theorem says that the aforementioned ratio is a legitimate estimator of the crossing probability (i.e., it converges to the crossing probability as n tends to infinity).

The estimator of the crossing probability, C/(n - N + 1), depends on observations of a random time-series and therefore is itself a random quantity. It is important to report not only the value of the estimator but also its 95% confidence interval. This is a random interval around the estimator such that the true value of the crossing probability falls into the interval with a likelihood of 95%. The only presently known way of non-parametric estimation of the distribution of an estimator is the bootstrap technique. However, the bootstrap in its basic formulation helps to find the distribution of a random quantity T(data) that is a function of independent identically distributed random observations. Given k independent random observations, the bootstrap proposes to consider B groups of k observations each, where each group is sampled with replacement. Therefore in some group there may exist multiple copies of a particular observation, and some observations may not appear in a certain group at all. The bootstrap proposes to compute B values of the quantity T, where each value is computed on one particular group sampled with replacement. The distribution of the random quantity T can then be estimated from its B bootstrap values. In particular, the 95% confidence interval can be found with the help of the 2.5% and 97.5% quantiles of the obtained dataset of size B. Our quantity of interest T can be defined as

$$T = \frac{\sum_{i=1}^{n-N+1} C_i}{n - N + 1}, \qquad C_i = \mathbf{1}\{\max\{X_i, X_{i+1}, \ldots, X_{i+N-1}\} = 1\}$$

In our case, however, the n - N + 1 observations of the indicator C_i depend on each other. Therefore application of the bootstrap in its original formulation is not legitimate. Liu and Singh [39] introduce the notion of Moving Block Bootstrap (MBB) re-sampling for the case of m-dependent random sequences. They believe, however, that their procedure works well on more general stationary sequences. Therefore we believe that it is legitimate to use the Moving Block Bootstrap procedure in our work. We next describe the procedure.

Let b be the size of a block of consecutive observations of the time-series, X_i, X_{i+1}, ..., X_{i+b-1}. If this block "moves" across the whole dataset of size n, then n - b + 1 such blocks can be obtained. Note that the length of the block b should be sufficiently large to capture the full correlation structure of the underlying time-series; more precisely, b should exceed the lag h at which the autocorrelation function ρ(h) becomes negligibly small. Then ⌈n/b⌉ blocks are chosen with replacement out of the n - b + 1 existing blocks. These blocks are concatenated, one after another, and thus a new "re-sampled" time-series of n observations is obtained. The first bootstrap value of the crossing probability estimator, p̂^{1*}, is obtained from these n observations. We repeat the above procedure 1000 times and obtain 1000 different bootstrap values p̂^{B*}, B = 1, ..., 1000. Then an empirical distribution of the crossing probability estimator can be obtained from the p̂^{B*} bootstrap values. In particular, the 95% confidence interval can be constructed from the 2.5% and 97.5% quantiles of the 1000 observations of p̂^{B*}.
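The estimator and its moving-block-bootstrap confidence interval can be sketched in a few lines. The code below is our own illustration of the procedure described above; apart from the block count ⌈n/b⌉ and the percentile method stated in the text, everything (names, the example data, the block size) is an assumption of the sketch.

```python
import numpy as np

def crossing_estimate(x, N):
    """Sliding-window estimator C / (n - N + 1) for a 0/1 indicator series x."""
    windows = np.lib.stride_tricks.sliding_window_view(x, N)
    return windows.max(axis=1).mean()

def mbb_confidence_interval(x, N, b, reps=1000, seed=0):
    """Moving block bootstrap 95% CI for the crossing-probability estimator."""
    rng = np.random.default_rng(seed)
    n = len(x)
    blocks = np.lib.stride_tricks.sliding_window_view(x, b)    # the n - b + 1 overlapping blocks
    boot = np.empty(reps)
    for r in range(reps):
        picks = rng.integers(0, len(blocks), size=int(np.ceil(n / b)))
        resampled = np.concatenate(blocks[picks])[:n]           # glue the chosen blocks, trim to length n
        boot[r] = crossing_estimate(resampled, N)
    return np.percentile(boot, [2.5, 97.5])

# Example: a 10-minute history of n = 2000 indicators, N = 10, block size b = 200.
x = (np.random.default_rng(1).random(2000) < 0.02).astype(int)
print(crossing_estimate(x, 10), mbb_confidence_interval(x, 10, 200))
```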
More precisely, $b$ should exceed the lag $H$ at which the autocorrelation function $\rho(h)$ is negligibly small. Then $\lceil n/b \rceil$ blocks are chosen with replacement out of the $n - b + 1$ existing blocks. These blocks are concatenated, one after another, and thus a new "re-sampled" time-series of $n$ observations is obtained. The first bootstrap value $\hat{p}^{1*}$ of the crossing probability estimator is computed from these $n$ observations. We repeat the above procedure 1000 times and obtain 1000 different bootstrap values $\hat{p}^{B*}$, $B = 1, \ldots, 1000$. Then an empirical distribution of the crossing probability estimator can be obtained from the $\hat{p}^{B*}$ bootstrap values. In particular, the 95% confidence interval can be constructed from the 2.5% and 97.5% quantiles of the 1000 observations of $\hat{p}^{B*}$.

The size of the block $b$, as well as the total number of available bootstrap blocks $n - b + 1$, are important parameters. With an increase of the period of time defined by the number $N$ over which the prediction is made, the "history" period of time that is used for computation of the predictor has to be increased as well. On the other hand, we tried to keep the "history" period as small as we could, because the "stationarity" assumption on the "zero-one" time-series is more realistic for smaller "history" time-periods. We believe that the following rule achieves the best tradeoff between keeping the size $n$ of the history window small and still having a representative statistic. Each block in the MBB approach has to contain at least 20 non-overlapping windows of length $N$, and the "history" window should contain at least 10 non-overlapping blocks. Therefore the size $n$ of the window that is used for prediction should be approximately 200 times the size $N$ of the window over which the prediction is made. For example, if the rate threshold for which we predict the crossing probability is 80 Mbps, then one train of 120 packets can be sent in approximately 0.3 of a second. Suppose then that the real-time task takes 3 seconds to execute with $\tau = 0.3$ seconds, which means that $N = 10$. Then $n$ should be of the order of 2000, which corresponds to a time-period of 600 seconds, or 10 minutes. Therefore we conclude that the prediction of the crossing probability for a period of 3 seconds can be made based on the channel's behavior over the most recent 10 minutes, which is a period over which assuming stationarity of the available bandwidth time-series is reasonable.
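The estimation procedure of this section can be summarized by the following minimal sketch (Python with numpy; the synthetic indicator series, the function names, and the block size in the example are our own illustrative choices and are not part of the measurement software):

```python
import numpy as np

def crossing_estimate(x, N):
    """Fraction of length-N sliding windows of x that contain at least one 'one':
    the estimator C / (n - N + 1)."""
    windows = np.lib.stride_tricks.sliding_window_view(np.asarray(x), N)
    return windows.max(axis=1).mean()

def mbb_confidence_interval(x, N, b, B=1000, level=0.95, rng=None):
    """Moving block bootstrap confidence interval for the crossing probability."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    blocks = np.lib.stride_tricks.sliding_window_view(x, b)   # the n - b + 1 moving blocks
    n_blocks = int(np.ceil(n / b))
    estimates = np.empty(B)
    for j in range(B):
        picks = rng.integers(0, len(blocks), size=n_blocks)   # sample blocks with replacement
        resampled = np.concatenate(blocks[picks])[:n]         # glue blocks, trim to length n
        estimates[j] = crossing_estimate(resampled, N)
    lo, hi = np.quantile(estimates, [(1 - level) / 2, 1 - (1 - level) / 2])
    return crossing_estimate(x, N), (lo, hi)

# Sizes as in the text: N = 10, n about 200 * N, block of 20 non-overlapping windows.
x = (np.random.default_rng(0).random(2000) < 0.05).astype(int)   # synthetic indicator series
p_hat, ci = mbb_confidence_interval(x, N=10, b=200)
print(p_hat, ci)
```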
5.2 D-statistic: model verification mechanism

The moving block bootstrap technique described above helps to report a non-parametric estimate of the probability (along with its confidence interval) that among the on-coming $N$ observations there is at least one "one". The problem now is to verify how accurately this non-parametric prediction technique works for a real-life time-series of indicators. We predict the probability of the realization of a certain event, and the natural way to verify the accuracy of our prediction is to observe whether, on real unseen data, the predicted event has been realized. Then, from many independent predictions and realizations or non-realizations of the predicted events, one might make an inference about the accuracy of the prediction. Multiple predictions, however, need to be independent, and therefore for the same network channel these predictions have to be significantly separated from each other in time. These predictions, separated in time, may report different values of the crossing probability, because the stationarity assumption for the available bandwidth time-series does not hold over large periods of time. Below we show the design of a statistically correct verification test that clearly states the Null and Alternative hypotheses. We also provide theoretical background proving the legitimacy of the test.

Assume that we have a set of predictors $\hat{p}_i$, $i \in 1, \ldots, l$, of the success probabilities of independent Bernoulli random variables, along with the actual observations $Y_i$ of these variables. Let the Null hypothesis be the assumption that the predictors $\hat{p}_i$ match the actual success probabilities of the $Y_i$. Let the alternative hypothesis be the assumption that the $\hat{p}_i$ have no relationship with the realizations $Y_i$. Now consider the following sum:
\[
D = \frac{\sum_{i=1}^{l} (Y_i - \hat{p}_i)}{\sqrt{\sum_{i=1}^{l} \hat{p}_i (1 - \hat{p}_i)}}.
\]
Below, in a separate theoretical section, we prove that although the $Y_i$ are not identically distributed random variables, the above sum, under the Null hypothesis, converges in distribution to a standard normal random variable (i.e., the Central Limit Theorem does hold).

Figure 5.2: Probability density functions for a p-value conditioned on the Null and Alternative hypotheses.

If the $D$ statistic follows the standard normal distribution $N(0,1)$ under the Null hypothesis, then we can reduce the verification problem to a classical Kolmogorov-Smirnov normality test in the following way. Given that one can observe the network channel for an arbitrarily long period of time, it is possible to collect many realizations of the random quantity $D$. Then, once we have at least 100 such observations, we conduct a Kolmogorov-Smirnov normality test and consider the p-value of the test as an indicator of whether our predictions were correctly made. The Null hypothesis $H_0$ for the normality test is the assumption that $D \sim N(0,1)$. The alternative hypothesis says that $D$ does not follow $N(0,1)$. Under the Null hypothesis, the p-value is known to be uniformly distributed on the interval $[0,1]$. At the same time, the p-value for data with a distribution other than $N(0,1)$ takes small values. Therefore low p-values (less than 0.05) can be considered a sign that the normality test failed and that the prediction did not work correctly. High p-values indicate that the data do not give us reasons to claim that the approach has failed to predict correctly.

For a better explanation and visualization of the meaning of the p-value in the Kolmogorov-Smirnov normality test, we performed the following simulation. We generated groups of 30 observations of Bernoulli random variables; each group contained realizations of variables with different success probabilities. Directly from the form of $D$ it is clear that if the approach either constantly overestimates or constantly underestimates the success probability, then the distribution of $D$ will be significantly shifted from zero and therefore, by definition, will not be considered even close to standard normal. In our simulations we considered the scenario where the probabilities $\hat{p}_i$ participating in the construction of the $D$ statistic had nothing to do with the realizations of the simulated $Y_i$: the values $\hat{p}_i$ were simply drawn at random from the interval $[0,1]$.
Once we had a group of reported values of the $D$ statistic of size 200, we conducted a Kolmogorov-Smirnov normality test and reported its p-value. The whole aforementioned procedure was repeated 10000 times, and 10000 p-values were obtained for the scenario where the prediction was fake. Figure 5.2 shows two probability density functions on the interval $[0,1]$. The solid line shows the empirical density of the p-value of the Kolmogorov-Smirnov test for data where the predicted success probabilities were not related to the realizations. The dashed line shows the density of the distribution of the p-value under the Null hypothesis, namely the uniform distribution on $[0,1]$. We provide Figure 5.2 in order to give a clear impression that if the p-value of the Kolmogorov-Smirnov normality test reported for a real dataset is sufficiently high, then this can be considered a reasonable verification that the reported predictions match the on-coming events. Low p-values of the test may be considered an indicator that the predictions are inaccurate.

Remember that the history window size $n$ used for prediction of a crossing probability is an important parameter. On one hand, it should be large enough to provide representative history data. On the other hand, with an increase of the history window, the assumption of stationarity of the time-series $\{X_i\}$ becomes less realistic. Different network paths have different time-periods during which one can treat an available-bandwidth-related time-series as stationary. We refer hereafter to such a time-period as a stationary time-period. Our goal therefore is to detect the stationary time-frame for a particular network path. The "self-verification procedure" described above can be considered a natural way to detect a suitable range of history window sizes. We repeat the same procedure and obtain p-values for the same history data and different history window sizes. The dependency of the reported p-values on the history window size is believed to be unimodal. Naturally, for large window sizes we are supposed to obtain small p-values, because the stationarity assumption does not hold and the predictions are inaccurate. For small windows, p-values should also be small, because there are too few zeroes and ones used for predicting the crossing probability $\hat{p}$ and therefore the prediction is not accurate. Hence the dependency of the p-value on the history window size should have one mode defining the range of acceptable history window sizes.

The approximate time-scale of a prediction/self-verification run is therefore as follows. For predicting a crossing probability over the next 2 seconds we need around 8 minutes of history frame. However, for performing self-verification and reporting a p-value we need 1500 independent realizations/non-realizations of a crossing event. Therefore self-verification may require on the order of 15 hours of data collection. Let us emphasize that the actual prediction of a crossing probability for the next 2 seconds is based on a history of the order of 8 minutes, and stationarity is assumed only within these 8 minutes of the history window. We can start outputting the crossing probability immediately after observing a network path for a period of 8 minutes. In addition, after observing a channel for 10-15 hours we are also able to specify the range of history window sizes that are suitable for prediction and use a window from that range for making our prediction. Finally, if for some network path it turns out that stationarity does not hold for the required time-period and the range of acceptable history window sizes does not exist, then we can detect this phenomenon and report a temporary inability to make accurate predictions. Decreasing the time-period required for self-verification is important, and we address this issue in detail later in this work.
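To make the verification pipeline concrete, the following sketch (Python with numpy and scipy; the fake-prediction simulation mirrors the one described above, and the function names are ours) computes $D$ values from (predictor, realization) groups and reports the Kolmogorov-Smirnov p-value:

```python
import numpy as np
from scipy import stats

def d_statistic(p_hat, y):
    """D = sum(Y_i - p_hat_i) / sqrt(sum(p_hat_i * (1 - p_hat_i)))."""
    p_hat, y = np.asarray(p_hat, float), np.asarray(y, float)
    return (y - p_hat).sum() / np.sqrt((p_hat * (1.0 - p_hat)).sum())

def verification_p_value(groups):
    """groups: list of (p_hat, y) pairs, each pair yielding one value of D.
    Returns the Kolmogorov-Smirnov p-value for the hypothesis D ~ N(0, 1)."""
    d_values = [d_statistic(p, y) for p, y in groups]
    return stats.kstest(d_values, "norm").pvalue

# "Fake prediction" scenario from the text: predictors unrelated to outcomes.
rng = np.random.default_rng(1)
groups = []
for _ in range(200):
    true_p = rng.uniform(0.1, 0.9, size=30)    # actual success probabilities
    y = rng.binomial(1, true_p)                # realizations
    fake_p = rng.uniform(0.0, 1.0, size=30)    # predictors unrelated to the realizations
    groups.append((fake_p, y))
print(verification_p_value(groups))            # typically very small, as in Figure 5.2
```

Replacing `fake_p` with `true_p` in the loop gives the "matched" scenario, for which the reported p-values should behave like a uniform sample on [0, 1].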
5.3 D-statistic: theoretical results

In this section we give the theoretical background for the fact that the $D$ statistic has a standard normal distribution, given that the predictors $\hat{p}_i$ match the actual success probabilities of the corresponding Bernoulli random variables $Y_i$. We start with a basic result that establishes the Central Limit Theorem when the success probabilities $p_i$ are known.

Proposition. Let $Y_1, \ldots, Y_n, \ldots$ be independent with distribution $Y_i \sim \mathrm{Bernoulli}(p_i)$. Assume that
\[
\sum_{i=1}^{\infty} p_i (1 - p_i) = \infty .
\]
Then
\[
\frac{\sum_{i=1}^{l} (Y_i - p_i)}{\sqrt{\sum_{i=1}^{l} p_i (1 - p_i)}} \Rightarrow N(0, 1).
\]

Remark. A simple condition that implies $\sum_{i=1}^{\infty} p_i(1 - p_i) = \infty$ is the existence of $c > 0$ such that $c < p_i < 1 - c$, $i = 1, 2, 3, \ldots$

Proof. Denote
\[
Y_{i,l} = \frac{Y_i - p_i}{\sqrt{\sum_{r=1}^{l} p_r (1 - p_r)}}, \qquad i = 1, \ldots, l .
\]
Since $E(Y_i) = p_i$ for any $i$, we have (a) $E(Y_{i,l}) = 0$, and (b) $\sum_{i=1}^{l} E(Y_{i,l}^2) = 1$. The result follows by a standard application of the Lindeberg-Feller condition: for every $\varepsilon > 0$,
\[
\sum_{i=1}^{l} E\bigl(Y_{i,l}^2 ; |Y_{i,l}| > \varepsilon\bigr) \to 0 \quad \text{as } l \to \infty .
\]
Observe that
\[
\sup_{1 \le i \le l} |Y_{i,l}| \le \frac{1}{\sqrt{\sum_{r=1}^{l} p_r (1 - p_r)}} \to 0 \quad \text{as } l \to \infty ,
\]
so the Lindeberg-Feller condition holds. $\square$

Our next task is to convert the Proposition into a result that matches the setup we are dealing with. Now we assume that the $p_i$ are not known, so we replace them by their estimators $\hat{p}_i$. Let $n$ be a fixed positive integer; we remark that $n$ appears as $n - N + 1$ in other sections. Let $Y_k^{(i)} \sim \mathrm{Bernoulli}(p_i)$ be a triangular array of random variables where each row $\{Y_k^{(i)}, k = 0, 1, \ldots, n\}$ represents a stationary sequence, $i = 1, 2, \ldots$ Further, we assume that the stationary sequences represented by different rows are independent. We estimate each $p_i$ by
\[
\hat{p}_i = \frac{\sum_{k=1}^{n} Y_k^{(i)}}{n}.
\]
To tie our notation to that used in other sections, we denote $Y_i = Y_0^{(i)}$. The following can be proved similarly to the Proposition, and therefore we skip the proof.

Theorem. Assume
\[
\sum_{i=1}^{\infty} \mathrm{Var}(Y_i - \hat{p}_i) = \infty .
\]
Then
\[
\frac{\sum_{i=1}^{l} (Y_i - \hat{p}_i)}{\sqrt{\sum_{i=1}^{l} \mathrm{Var}(Y_i - \hat{p}_i)}} \Rightarrow N(0, 1).
\]

The quantity $\mathrm{Var}(Y_i - \hat{p}_i)$ can be calculated precisely using $p_m^{(i)} = P(Y_k^{(i)} = 1, Y_{k+m}^{(i)} = 1)$, a quantity that, due to the stationarity assumption, does not depend on $k$. Observe that $p_0^{(i)} = p_i$.

Lemma.
\[
\mathrm{Var}(Y_i - \hat{p}_i) = \Bigl(1 + \frac{1}{n}\Bigr) p_0^{(i)} - \sum_{m=1}^{n} \frac{2m}{n^2}\, p_m^{(i)} .
\]

Proof of Lemma. We drop the index $i$ for ease of notation.
\[
\mathrm{Var}(Y - \hat{p}) = \frac{\mathrm{Var}\bigl(\sum_{k=1}^{n} (Y_0 - Y_k)\bigr)}{n^2}
= \frac{\sum_{j,k=1}^{n} E\bigl[(Y_0 - Y_j)(Y_0 - Y_k)\bigr]}{n^2}
= \frac{\sum_{j,k=1}^{n} \bigl(p_0 - p_j - p_k + p_{|k-j|}\bigr)}{n^2}
= \Bigl(1 + \frac{1}{n}\Bigr) p_0 - \sum_{m=1}^{n} \frac{2m}{n^2}\, p_m . \ \square
\]

Using the Lemma we can rewrite the convergence in the Theorem as
\[
\frac{\sum_{i=1}^{l} (Y_i - \hat{p}_i)}{\sqrt{\bigl(1 + \frac{1}{n}\bigr)\sum_{i=1}^{l} p_i - \sum_{i=1}^{l}\sum_{m=1}^{n} \frac{2m}{n^2}\, p_m^{(i)}}} \Rightarrow N(0, 1).
\]
A natural estimate of $p_m^{(i)}$ is given by
\[
\hat{p}_m^{(i)} = \frac{\sum_{k=0}^{n-m} Y_k^{(i)} Y_{k+m}^{(i)}}{n - m}.
\]
We conclude that for large $n$ we get
\[
\frac{\sum_{i=1}^{l} (Y_i - \hat{p}_i)}{\sqrt{\sum_{i=1}^{l} \hat{p}_i - \sum_{i=1}^{l}\sum_{m=1}^{n} \frac{2m}{n^2}\, \hat{p}_m^{(i)}}} \Rightarrow N(0, 1).
\]
In the extreme case where $Y_k^{(i)}$, $k = 0, 1, \ldots, n$, are independent, $p_m^{(i)} = p_i^2$ for $m \ge 1$, and we get
\[
\mathrm{Var}(Y_i - \hat{p}_i) = \Bigl(1 + \frac{1}{n}\Bigr)\bigl(p_i - p_i^2\bigr) \approx p_i (1 - p_i).
\]
In general, we do not have independence but only stationarity. However, if the stationary sequence has enough ergodic (mixing) properties, then we do get $p_m^{(i)} \approx p_i^2$,
at least for large $m$. Now we look at the expression $\sum_{m=1}^{n} \frac{2m}{n^2} p_m^{(i)}$ and we see that it represents a weighted average of $\{p_m^{(i)}\}$ in which most of the weight is placed on large values of $m$. So an assumption of strong ergodicity will lead to
\[
\sum_{m=1}^{n} \frac{2m}{n^2}\, p_m^{(i)} \approx p_i^2 .
\]
In that case we can write that, approximately,
\[
D = \frac{\sum_{i=1}^{l} (Y_i - \hat{p}_i)}{\sqrt{\sum_{i=1}^{l} \hat{p}_i (1 - \hat{p}_i)}} \sim N(0, 1).
\]
The formula above is used in the following section.

Figure 5.3: Evolution of the crossing probability along with its 95% confidence intervals throughout the day. $M = 65$ Mbps, $N = 5$, $\tau = 0.43$ sec (one unit on the time axis corresponds to 21.6 seconds, so 4000 units span 24 hours).

5.4 D-statistic: experimental evaluation

Figure 5.3 shows how the crossing probability evolved throughout the day for the network path between nodes at Michigan State University (35.9.4290) and the Shenyang Institute of Automation of the Chinese Academy of Sciences (210.72.135.98). The accuracy of prediction is clearly shown in the figure by means of the reported 95% confidence intervals.

Figure 5.4: Probability density functions along with p-values collected for the route 35.9.4290 (Michigan State University) - 137.189.100.235 (Chinese University of Hong Kong).

Figure 5.5: Probability density functions along with p-values collected for the route 35.9.1014 (Michigan State University) - 210.72.135.98 (Shenyang Institute of Automation, Chinese Academy of Sciences).

Experimental verification of the correctness of the crossing probability prediction has been performed on multiple network paths, with experimental nodes located in the United States, mainland China, and Hong Kong. On all paths we observed the same pattern, which is well illustrated by the two examples below. The verification procedure described in the previous sections has been applied to real measurements to verify whether the prediction of the crossing probability made by our module is correct. Verification has been performed for different values of the aggregation time $\tau$, the bandwidth threshold $M$, and the number $N$ of observations over which we predict. Note that the machines used for measurements do not belong to the PlanetLab network, because there has been criticism that PlanetLab machines do not reflect typical Internet patterns of available bandwidth between nodes and that processor utilization on PlanetLab machines is often high. Processor utilization of the measurement machines is an important factor, which needs to be low in order not to affect measurements. Therefore the machines that were used in the experiments were completely dedicated to measurement.

Verification for a particular path and particular values of $\tau$, $N$, and $M$ was performed according to the following scenario. First, we collected around 400,000 continuously taken observations of indicators. Given that $\tau$ varied up to 0.5 sec, collecting the data for one set of parameters required observing the channel for a period of two days. Measurement of indicators was performed with the help of modified Pathload code: we enabled logging of one-way delay times in Pathload and used our own module for processing and analyzing these times. We also removed Pathload's binary-search part of the code and forced Pathload to send packets at the user-defined rate.
After obtaining 400,000 continuous observations of indicators, we used a window of size $200 \cdot N$ for making our first prediction of the crossing probability for the next $N$ observations. After the prediction was made, we checked whether the bandwidth threshold $M$ was really crossed during the next $N$ observations, thus obtaining the first realization of the Bernoulli random variable $Y_1$ considered in the previous sections. Then the window used for prediction, of length $200 \cdot N$, was shifted 100 observations ahead and a new prediction was made for the next $N$ observations of indicators. Therefore adjacent groups of size $N$ were separated by 100 observations. We considered such a separation sufficient for claiming that two adjacent groups are independent of each other. Then the same procedure of shifting and predicting was repeated 30 times, allowing us to collect 30 observations of crossing probabilities $\hat{p}_i$ and indicators $Y_i$ of the actual realization of the predicted event. With the help of these observations we construct the first value of the $D$ statistic. Then we continue with another portion of 30 cycles of predicting the crossing probability, checking the realization of the predicted event, and shifting the window ahead. We thereby obtained 130 values of the $D$ statistic and then applied the Kolmogorov-Smirnov normality test to these 130 values. Finally, we report the p-value of this test. The whole procedure of data collection and analysis was repeated for values of the bandwidth threshold $M \in \{55, 60, 65\}$ Mbps and $N \in \{5, 10, 15\}$.
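The verification loop just described can be sketched as follows (Python; the d_statistic helper is repeated from the earlier sketch, and `predictor` stands for any crossing-probability predictor, for instance the MBB estimator above; the parameter values mirror the ones in the text):

```python
import numpy as np
from scipy import stats

def d_statistic(p_hat, y):
    p_hat, y = np.asarray(p_hat, float), np.asarray(y, float)
    return (y - p_hat).sum() / np.sqrt((p_hat * (1.0 - p_hat)).sum())

def field_verification_p_value(x, N, predictor, pairs_per_d=30, shift=100):
    """Slide a 200*N-observation history window across the indicator series x,
    predict the crossing probability for the next N observations, record whether
    a crossing actually occurred, group the pairs into D values and KS-test them."""
    x = np.asarray(x)
    n_hist = 200 * N
    p_hats, ys, d_values = [], [], []
    start = 0
    while start + n_hist + N <= len(x):
        p_hats.append(predictor(x[start:start + n_hist], N))
        ys.append(int(x[start + n_hist:start + n_hist + N].max()))
        if len(p_hats) == pairs_per_d:
            d_values.append(d_statistic(p_hats, ys))
            p_hats, ys = [], []
        start += shift                                   # separate adjacent groups in time
    return stats.kstest(d_values, "norm").pvalue
```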
Our verification test can be viewed as a two-class categorization problem. The first, "matched", class may be defined as the class of input instances of success probabilities of Bernoulli random variables together with their realizations when the realizations match the probabilities. The second, "unmatched", class is the class of instances where the success probabilities are not related to the realizations. The p-value is the only feature based on which we decide to which class a given instance belongs. Figure 5.2 shows the class-conditional densities of the p-value. In other words, for any new unseen instance, Figure 5.2 suggests assigning it to the class which has the higher probability density at the point corresponding to its p-value. Therefore for each given network path we have to check that at least 95% of the p-values related to real-life data clearly suggest assigning those instances to the "matched" class.

Figure 5.4 and Figure 5.5 show p-values for two paths that we monitored for a significant period of time. The p-values for different runs are shown as "+" signs. Note that the y-coordinate of the observed p-values (shown on the plots by "+" signs) was assigned a random value just for better display of the spread of the data along the x-direction. The x-coordinate of the "+" signs is meaningful and represents the observed p-value. For clarity, we copied the two class-conditional densities onto the result plots. With the densities on the plot, it is clear which class density takes the higher value at the points corresponding to p-values of tests conducted against real prediction measurements.

Path   n=400   n=600   n=800   n=1000  n=1200  n=1400   Valid range (obs)   Valid range (sec)
1      0.004   0.035   0.838   0.475   0.0135  0.0045   800-1000            400-500
2      0.041   0.583   0.236   0.846   0.2456  0.0094   600-1200            300-600
3      0.003   0       0.004   0.242   0.5723  0.0389   1000-1200           500-600
4      0.467   0.683   0.007   0.005   0.0023  0.0003   400-600             200-300
5      0.097   0.344   0.272   0.012   0.0241  0.003    600-800             150-200
6      0.572   0.296   0.973   0.386   0.742   0.067    400-1200            100-300
7      0.042   0.634   0.437   0.027   0.0024  0        600-800             150-200
8      0.003   0.048   0.473   0.489   0.0081  0        800-1000            200-250
9      0       0       0.346   0.39    0.6723  0.0038   800-1200            200-300

Table 5.1: Valid range of history window sizes.

Figure 5.4 and Figure 5.5 show that all instances of our verification procedure are clearly assigned to the "matched" class. One can also note that the whole pattern of p-values in both figures is similar to the pattern of realizations of a uniform random variable.

Table 5.1 shows the dependency of the reported p-values on the chosen history window size. Each row of the table represents a different network path. Each column represents a different size of the history window used for prediction. All predictions were made over windows of size $N \in \{2, 3, 4, 5\}$. The two rightmost columns give the range of acceptable history window sizes in observations and in seconds, respectively. As follows from the table, some experiments were made with observations sampled at the rate of 2 observations per second and some at the rate of 4 observations per second. As Figure 5.2 suggests, we assume that a particular history window size works well if the reported p-value is larger than 0.1. Table 5.1 clearly shows that the trade-off between complying with the stationarity assumption on the one hand and having a representative statistic on the other can be achieved in practice for all analyzed cases. We conclude this section with the note that our prediction, without exception, passed the accuracy verification test on all paths that we monitored.

5.5 Prediction optimization with weights

In addition to choosing an optimal history window size as described in Section 5.2, we can use the same data for further increasing the prediction accuracy by applying different weights to the recent and the older past. The predictor $\hat{p} = C/(n - N + 1)$ shown in the previous sections can be viewed as a special case of applying weights. We can view the counter $C$ as a sum of indicators $1_k$, $k \in \{1, 2, \ldots, n - N + 1\}$, where $1_k = 1$ when at least one $X_i = 1$, $i \in \{k, k+1, \ldots, k+N-1\}$, and $1_k = 0$ otherwise. We therefore see that the same weight $\frac{1}{n - N + 1}$ is applied to each of the $n - N + 1$ indicators within the frames of a history window. Naturally, all weights should add up to one. In this work we propose to use weights of the form $k^{\gamma}/H$, where the normalizing constant $H$ can be determined from the equation
\[
H = \sum_{k=1}^{n-N+1} k^{\gamma}.
\]
Therefore the predictor of a crossing probability takes the following form:
\[
\hat{p} = \frac{1}{H} \sum_{k=1}^{n-N+1} k^{\gamma}\, 1_k .
\]
The expression for $\hat{p}$ suggests assigning higher weights to the recent past, which corresponds to higher values of $k$. The parameter $\gamma$ describes the rate at which the weights decay and can be adjusted based on the same history data that were used for the adjustment of the history window size. For the adjustment of $\gamma$, we propose to minimize the total relative error as follows.
If we obtain from a "self-verification" dataset many predictors $\hat{p}_i$ together with the following realizations $Y_i$, then the total relative error can be evaluated as
\[
\mathrm{Error} = \sum_{i} |\hat{p}_i - Y_i| .
\]
The error quantity introduced above does not tell us much in an absolute sense about the accuracy of our prediction. For that purpose we use an appropriate statistical procedure such as the one introduced in Section 5.2, together with its p-value. However, minimization of the relative error can help tune the rate $\gamma$ of weight decay.

Overall, our algorithm for tuning parameters to a specific network path may be described as follows. First, we collect sufficient data for conducting a "self-verification" procedure; minimizing the amount of data and the time required for that purpose is considered later in this work. We assign to $\gamma$ the value 0, which corresponds to the case of uniform weights. Then we run the self-verification algorithm for a number of window sizes, where $n$ varies within the range $[50 \cdot N; 1000 \cdot N]$. We use the obtained p-values to determine an optimal value of $n$, which corresponds to the range where the p-values are high. If a range with sufficiently high p-values does not exist, then we still take the $n$ that corresponds to the maximal reported p-value. With the newly obtained history window size $n$, we minimize the total relative error and thus find the best rate $\gamma$. Then we use the new value of $\gamma$ and again try different window sizes to determine the new range of acceptable history window sizes. The expected outcome is that the range of applicable window sizes either arises or expands after applying the weight-based technique.

The usage of weights within the frames of a stationary time-frame can be motivated by the fact that Internet traffic time-series demonstrate a positive correlation pattern at small lags. In other words, higher values of traffic imply higher values at close time-moments, whereas small values of traffic imply small values. The available bandwidth time-series is closely related to the cross-traffic, and sometimes is directly equal to a constant minus the cross-traffic through the tight and narrow link. Therefore we believe that the most recent values of the available-bandwidth-related time-series should be more important than the others.
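A minimal sketch of the weighted predictor and of the $\gamma$-tuning step might look as follows (Python; the error measure is the sum of absolute deviations $|\hat{p}_i - Y_i|$ discussed above, and the grid of candidate $\gamma$ values and the sliding step are our own illustrative choices):

```python
import numpy as np

def weighted_crossing_predictor(x, N, gamma=0.0):
    """p_hat = (1/H) * sum_k k^gamma * 1_k, with 1_k = 1 if window k contains a 'one'."""
    ones = np.lib.stride_tricks.sliding_window_view(np.asarray(x), N).max(axis=1)  # 1_k
    k = np.arange(1, len(ones) + 1, dtype=float)
    w = k ** gamma                                   # higher weight on the recent past
    return float((w * ones).sum() / w.sum())

def tune_gamma(history, N, n, step=100, gammas=np.linspace(0.0, 3.0, 16)):
    """Pick gamma minimizing sum_i |p_hat_i - Y_i| over a self-verification dataset.
    history: long zero-one indicator series; n: history window size."""
    history = np.asarray(history)
    best_gamma, best_err = 0.0, np.inf
    for g in gammas:
        err, start = 0.0, 0
        while start + n + N <= len(history):
            p_hat = weighted_crossing_predictor(history[start:start + n], N, g)
            y = int(history[start + n:start + n + N].max())   # did a crossing occur?
            err += abs(p_hat - y)
            start += step                                      # slide the window ahead
        if err < best_err:
            best_gamma, best_err = g, err
    return best_gamma
```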
5.6 R-statistic: improvement of a verification procedure

In this section we discuss an optimization of the self-verification technique. In short, the self-verification procedure collects a number of predictions of a crossing probability $\hat{p}_i$ together with realizations/non-realizations of the predicted crossing events, and then applies a statistical test that checks whether the predicted values of the crossing probability really match the on-coming events. The output of the self-verification procedure is used by the parameter-adjustment module introduced in the previous section. Therefore it is important to reduce the number of observations and, respectively, the boot time that the self-verification procedure needs to produce its output. The statistical test shown in Section 5.2 assumes the creation of the $D$ statistic on the basis of pairs of quantities $\hat{p}_i$, $Y_i$. The $D$ statistic follows the standard normal distribution under the null hypothesis, which states that predictions match on-coming events. The procedure has to collect at least 50 independent values of the $D$ statistic, then conduct a normality test on them and report the corresponding p-value.

5.6.1 Timing

First we briefly analyze the time that is required for reporting one value of the $D$ statistic. A time-period of up to 0.25 seconds is required for conducting a robust check of the available bandwidth time-series against a certain threshold. Therefore we assume that one observation of the original time-series $\{X_i\}$ of zeroes and ones requires 0.25 seconds to be reported. Note that $Y_i$ is an observation that equals one when at least 1 out of $N$ observations of the underlying time-series $\{X_i\}$ equals one. Let us assume for concreteness that $N = 4$. Then the time for observing one value $Y_i$ of the realization of a crossing event is 1 second. Computer simulations, confirmed by real-life experiments, show that the time needed to report a good predictor of a crossing probability $\hat{p}_i$ varies from $50 \cdot N$ to $1000 \cdot N$ observations. Assuming that the history window size is of the order of $500 \cdot N$, we note that the history time-frame is of the order of 500 seconds (8.3 minutes). The form of the $D$ statistic assumes that the observations $Y_i$ are independent of each other. This can be achieved in practice only by separating them in time by at least 100 observations of the underlying time-series $\{X_i\}$. Finally, taking into account that we need at least 30 summands to make the $D$ statistic robust, we note that the time for outputting one value of the $D$ statistic can be evaluated as $t = 500 + (100 \cdot 0.25 + 1) \cdot 30 = 1280$ sec (about 21 minutes). We conclude that the boot time of the self-verification module, required for collecting at least 50 values of the $D$ statistic, is of the order of 16 hours.

Our optimization of the self-verification module is a better replacement for the $D$ statistic. Introduced below, the $R$ statistic possesses the same characteristics as the $D$ statistic while requiring, in the above example, only around 500 seconds (8.3 minutes) to be reported. This means that a time of the order of only 6 hours is needed for reporting the p-value of the normality test. Therefore we reduce the boot time of the self-verification procedure by a factor of about 3.

5.6.2 R-statistic

Consider observations of a Bernoulli zero-one time-series $\{X_i\}$, $i \in 1, \ldots, n, n+1, \ldots, n+N$, which fall into a history time-frame plus the following $N$ observations. We create the time-series $W_k$ according to the following rule:
\[
W_k = \max\{X_{k+1}, X_{k+2}, \ldots, X_{k+N}\} - \hat{p}_k , \qquad k \in \{N, \ldots, n\},
\]
where $\hat{p}_k$ is a predictor of the crossing probability based on the first $k$ observations of the time-series $\{X_i\}$. Then the following statement holds:
\[
R = \frac{\sum_{k=N}^{n} W_k}{\sqrt{(n - N + 1)\,\hat{\sigma}_n^2}} \Rightarrow N(0, 1),
\]
where
\[
\hat{\sigma}_n^2 = \frac{1}{n - N + 1}\sum_{i=N}^{n} W_i^2 \; + \; \frac{2}{n - N + 1}\sum_{k=1}^{30}\sum_{i=N}^{n-k} W_i W_{i+k} .
\]
Figure 5.6 shows how the $R$-statistic is constructed, and also illustrates the gain in time that is obtained when moving from the $D$-statistic to the $R$-statistic. For the illustration in Figure 5.6 we have chosen $N = 4$, $n = 2000$, and a sampling rate equal to 4 observations per second.

Figure 5.6: Reducing the time of the self-verification procedure when switching from the D-statistic to the R-statistic. One value of the D-statistic requires a history of size 2000 obs + 30 windows of size 4 obs + 30 gaps of size 100 obs = 5120 obs (1280 seconds at 4 observations per second), whereas one value of the R-statistic requires just the history frame of size 2000 obs + 1 window of size 4 obs = 2004 obs (501 seconds).

To obtain the $R$ statistic, we enter the history window of size $n$, start with a small history of size $N$, and then, for each summand in the numerator, increase the window size by one observation at a time. For each summand we predict the crossing probability $\hat{p}_k$ and then check it against the following (unseen by $\hat{p}_k$) $N$ observations of the history data. Naturally, summands obtained in this way are highly dependent. The main advantage of the $R$ statistic over the $D$ statistic is that the former does not require independence of the underlying observations $W_k$, whereas the $D$ statistic requires the $Y_k$ to be separated in time. One can see that the variance expression $\hat{\sigma}_n^2$ contains a term that accounts for correlations between the observations $W_k$. We can fit one value of the $R$-statistic directly into the history time-frame; therefore, with the $R$-statistic, the available data are used much more efficiently. The following section provides a mathematical proof of the statement above, namely that the $R$ statistic follows a standard normal distribution under the hypothesis that predictions match realizations.
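The construction of $W_k$, the lag-corrected variance estimate, and the resulting $R$ value can be sketched as follows (Python; the 30-lag cut-off mirrors the fixed number of lags used above, and `predictor` stands for any crossing-probability predictor, for example the weighted one sketched in Section 5.5):

```python
import numpy as np

def r_statistic(x, N, predictor, max_lag=30):
    """R = sum(W_k) / sqrt((n - N + 1) * sigma2), with
    W_k = max(X_{k+1}, ..., X_{k+N}) - p_hat_k and a lag-corrected variance."""
    x = np.asarray(x)
    n = len(x) - N                                   # history indices run from N to n
    w = []
    for k in range(N, n + 1):
        p_hat_k = predictor(x[:k], N)                # prediction from the first k observations
        crossed = int(x[k:k + N].max())              # did the next N observations contain a 'one'?
        w.append(crossed - p_hat_k)
    w = np.asarray(w)
    m = len(w)                                       # m = n - N + 1
    sigma2 = (w ** 2).sum() / m
    for lag in range(1, min(max_lag, m - 1) + 1):    # correlation correction term
        sigma2 += 2.0 * (w[:-lag] * w[lag:]).sum() / m
    return w.sum() / np.sqrt(m * sigma2)
```

With the weighted predictor from the previous sketch one would pass, for example, `predictor=lambda h, win: weighted_crossing_predictor(h, win, gamma=1.0)`.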
5.7 R-statistic: theoretical results

We start with the definition of the concept of strong mixing. It is one of the criteria used to quantify the situation in which, the longer the time between observations, the more statistically independent those observations become.

Definition. Let $\ldots, X_{-1}, X_0, X_1, \ldots, X_n, \ldots$ be a strictly stationary sequence of random variables. It is called strongly mixing if $\alpha(n) \to 0$, where $\alpha(n)$ is defined by
\[
\alpha(n) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\, B \in \mathcal{F}_{n}^{\infty}} \bigl| P(AB) - P(A)P(B) \bigr| ,
\]
where $\mathcal{F}_{-\infty}^{0} = \sigma\{X_k : k \le 0\}$ and $\mathcal{F}_{n}^{\infty} = \sigma\{X_k : k \ge n\}$.

Ibragimov and Linnik proved in [28] the following Central Limit Theorem.

Theorem 1 (18.5.4). Let $V_1, \ldots, V_n, \ldots$ be a strictly stationary sequence of random variables with $E(V_n) = \mu$. Assume that the sequence is uniformly bounded (i.e., there is $C > 0$ with $|V_n| \le C$ a.s.) and strongly mixing with $\sum_{n=1}^{\infty} \alpha(n) < \infty$. Then $\sigma^2 = \mathrm{Var}(V_0) + 2\sum_{k=1}^{\infty} \mathrm{Cov}(V_0, V_k)$ converges. If $\sigma^2 \ne 0$, then
\[
\frac{\sum_{k=1}^{n} V_k - n\mu}{\sqrt{n}\,\sigma} \Rightarrow N(0, 1).
\]

A comment about the proof of Theorem 1. It is based on organizing the sequence into blocks that alternate between long and short ones. The total contribution of the short blocks to the limiting distribution is negligible. One then takes advantage of the assumption on the mixing coefficients, namely $\sum_{n=1}^{\infty} \alpha(n) < \infty$, by showing that the long blocks are essentially independent. Since the long blocks are identically distributed due to the stationarity assumption, the result follows.

For our purpose in this work we need to convert Theorem 1 into a Central Limit Theorem in which the mean $\mu$ is replaced by an estimate that is updated regularly with the additional data available to us as time moves on. We denote $\bar{V}_n = \frac{\sum_{i=1}^{n} V_i}{n}$, $n = 1, 2, \ldots$, and we have

Theorem 2. Under the assumptions of Theorem 1 we have
\[
\frac{\sum_{k=1}^{n} W_k}{\sqrt{n}\,\sigma} \Rightarrow N(0, 1), \qquad \text{where } W_k = V_k - \bar{V}_k .
\]

Comments about the proof of Theorem 2. We will not provide a fully detailed proof due to length considerations; instead we make a few comments. Denote $a_{k,n} = 1 - b_{k,n}$, where $b_{k,n} = \sum_{i=k}^{n} \frac{1}{i}$, $k = 1, \ldots, n$. Then, with $\mu = E(V_n)$, we get
\[
\frac{\sum_{k=1}^{n} W_k}{\sqrt{n}} = \frac{\sum_{k=1}^{n} a_{k,n} V_k}{\sqrt{n}} = \frac{\sum_{k=1}^{n} a_{k,n} (V_k - \mu)}{\sqrt{n}} ,
\]
where the first equality can be verified by simple algebra, while the second follows from $\sum_{k=1}^{n} a_{k,n} = 0$, which in turn follows from
\[
\sum_{k=1}^{n} b_{k,n} = \sum_{k=1}^{n} \sum_{i=k}^{n} \frac{1}{i} = \sum_{i=1}^{n} i \cdot \frac{1}{i} = n .
\]
We would like to show now that
\[
(1) \qquad E\Bigl( \sum_{k=1}^{n} \frac{a_{k,n}(V_k - \mu)}{\sqrt{n}} \Bigr)^2 \to \sigma^2 \quad \text{as } n \to \infty .
\]
We will first show that
\[
(2) \qquad \frac{\sum_{k=1}^{n} a_{k,n}^2}{n} \to 1 \quad \text{as } n \to \infty .
\]
Indeed,
\[
\frac{\sum_{k=1}^{n} a_{k,n}^2}{n} = \frac{\sum_{k=1}^{n} (1 - b_{k,n})^2}{n} = 1 - \frac{2\sum_{k=1}^{n} b_{k,n}}{n} + \frac{\sum_{k=1}^{n} b_{k,n}^2}{n} = \frac{\sum_{k=1}^{n} b_{k,n}^2}{n} - 1
\]
(since $\sum_{k=1}^{n} b_{k,n} = n$). So to establish (2) we need to show that
\[
(3) \qquad \frac{\sum_{k=1}^{n} b_{k,n}^2}{n} \to 2 \quad \text{as } n \to \infty .
\]
Since $b_{k,n} \approx -\log(k/n)$, we calculate
\[
\frac{1}{n}\sum_{k=1}^{n} \log^2\!\Bigl(\frac{k}{n}\Bigr) \to \int_0^1 \log^2(x)\,dx = \Bigl[ x\bigl(\log^2(x) - 2\log(x) + 2\bigr) \Bigr]_0^1 = 2 ,
\]
and (3), and hence (2), is established. In a similar way we can prove that for each fixed $k > 0$ we get
\[
(4) \qquad \frac{\sum_{i=1}^{n-k} a_{i,n}\, a_{i+k,n}}{n} \to 1 \quad \text{as } n \to \infty .
\]
A standard calculation gives
\[
E\Bigl( \sum_{k=1}^{n} \frac{a_{k,n}(V_k - \mu)}{\sqrt{n}} \Bigr)^2 = \sum_{k=1}^{n} \frac{a_{k,n}^2}{n}\,\mathrm{Var}(V_0) + 2\sum_{k=1}^{n-1}\sum_{i=1}^{n-k} \frac{a_{i,n}\, a_{i+k,n}}{n}\,\mathrm{Cov}(V_0, V_k) ,
\]
which leads to (1) by using (2) and (4). To prove Theorem 2 one then repeats the idea of alternating short and long blocks explained in the comment after Theorem 1. Finally, we remark that an attempt to prove Theorem 2 by using Theorem 1 and $\bar{V}_n \approx \mu$ (justified by the ergodic theorem) will fail, since the difference between the relevant quantities is not negligible even in the independent case.

Theorem 2' (Extension of Theorem 2 to weighted sample means). Fix $\gamma > 0$ and let $W_n = V_n - \bar{V}_{n,\gamma}$, where $\bar{V}_{n,\gamma} = \frac{\sum_{k=1}^{n} k^{\gamma} V_k}{T_{n,\gamma}}$ and $T_{n,\gamma} = \sum_{k=1}^{n} k^{\gamma}$. Under the assumptions of Theorem 1, we have
\[
\frac{\sum_{k=1}^{n} W_k}{\sqrt{n}\,\sigma} \Rightarrow N\Bigl(0, \frac{1}{2\gamma + 1}\Bigr).
\]

Comments on the proof. Here we define $b_{k,n} = k^{\gamma} \sum_{i=k}^{n} \frac{1}{T_{i,\gamma}}$, $k = 1, \ldots, n$, and we still get $\sum_{k=1}^{n} a_{k,n} = 0$ with $a_{k,n} = 1 - b_{k,n}$, and
\[
\frac{\sum_{k=1}^{n} W_k}{\sqrt{n}} = \frac{\sum_{k=1}^{n} a_{k,n}(V_k - \mu)}{\sqrt{n}} .
\]
As before, we concentrate only on the asymptotic behavior of the second moment:
\[
(1)' \qquad E\Bigl( \sum_{k=1}^{n} \frac{a_{k,n}(V_k - \mu)}{\sqrt{n}} \Bigr)^2 \to \frac{\sigma^2}{2\gamma + 1} \quad \text{as } n \to \infty .
\]
To prove $(1)'$ we need to show
\[
(3)' \qquad \frac{\sum_{k=1}^{n} b_{k,n}^2}{n} \to 1 + \frac{1}{2\gamma + 1} \quad \text{as } n \to \infty ,
\]
which is proved as follows: $T_{n,\gamma} = \sum_{i=1}^{n} i^{\gamma} \approx \frac{n^{\gamma+1}}{\gamma + 1}$, so that
\[
b_{k,n} = k^{\gamma} \sum_{i=k}^{n} \frac{1}{T_{i,\gamma}} \approx \frac{\gamma + 1}{\gamma}\Bigl( 1 - \bigl(\tfrac{k}{n}\bigr)^{\gamma} \Bigr),
\]
and therefore
\[
\frac{\sum_{k=1}^{n} b_{k,n}^2}{n} \to \Bigl(\frac{\gamma + 1}{\gamma}\Bigr)^2 \int_0^1 (1 - x^{\gamma})^2\,dx = \frac{2(\gamma + 1)}{2\gamma + 1} = 1 + \frac{1}{2\gamma + 1} \quad \text{as } n \to \infty .
\]
Equipped with $(3)'$ we get
\[
\frac{\sum_{i=1}^{n-k} a_{i,n}\, a_{i+k,n}}{n} \to \frac{1}{2\gamma + 1}, \qquad k \ge 0 ,
\]
and $(1)'$ follows as before.

It will be useful for us to replace Theorems 2 and 2' by the following version, which follows immediately from Theorem 2 by a standard probability argument.

Theorem 3. Fix $\gamma \ge 0$. Under the assumptions of Theorem 1 we have
\[
\frac{\sum_{k=1}^{n} W_k}{\sqrt{n}\,\hat{\sigma}_n} \Rightarrow N\Bigl(0, \frac{1}{2\gamma + 1}\Bigr)
\]
if $\hat{\sigma}_n^2 / \sigma^2 \to 1$ in probability.

Three methods to estimate $\sigma^2$. Here we use $\gamma = 0$. From (1) we see that if $n$ is large enough we get
\[
E\Bigl( \frac{\sum_{k=1}^{n} W_k}{\sqrt{n}} \Bigr)^2 \approx \sigma^2 .
\]
Using the fact that $\{W_n\}$ is approximately stationary, we can utilize

Method 1:
\[
\hat{\sigma}_n^2 = \frac{\sum_{k=1}^{n} W_k^2}{n} + 2\sum_{k=1}^{M} \frac{\sum_{i=1}^{n-k} W_i W_{i+k}}{n} ,
\]
where $M$ is a fixed number (say 30). This works since, by the ergodic theorem, $\frac{1}{n}\sum_k W_k^2 \approx E(W_0^2)$ and $\frac{1}{n}\sum_i W_i W_{i+k} \approx E(W_0 W_k)$.

Alternatively we may use

Method 2:
\[
\hat{\sigma}_n^2 = \frac{1}{M_n (n - M_n)} \sum_{k=1}^{n - M_n} \Bigl( \sum_{i=k+1}^{k+M_n} W_i \Bigr)^2 ,
\]
where we can take, for example, $M_n \approx \sqrt{n}$. Again, by the ergodic theorem, $\hat{\sigma}_n^2 \approx E\bigl( (\sum_{i=1}^{M_n} W_i)^2 / M_n \bigr)$, so the requirement $\hat{\sigma}_n^2/\sigma^2 \to 1$ in probability does hold.

Finally, following Peligrad and Shao [47], a third estimator $\hat{\sigma}_n^2$ of $\sigma^2$ can be used as Method 3, which under our conditions also satisfies $|\hat{\sigma}_n^2/\sigma^2 - 1| \to 0$ as $n \to \infty$.

5.8 R-statistic: experimental evaluation

We first evaluate our new self-verification procedure and then use it to evaluate the efficiency of the proposed improvement based on the weighted prediction of a crossing probability. To evaluate the self-verification procedure, we first generated stationary sequences of zeroes and ones and ran the self-verification procedure on them.
We first generated ARMA($p$, $q$), $p, q \in \{3, 4, 5, 6, 7, 8, 9\}$, stationary sequences, chose a reasonable threshold, and then produced a "zero" if the time-series remained below the threshold and a "one" otherwise. Since the generated time-series were known to be stationary by the experimental setup, we always expected to obtain high p-values as an output of the self-verification procedure. Note that the zeroes and ones were highly dependent on each other in this setup. We experimented with different window sizes to learn the minimal history window size that works well for the self-verification procedure. This experimentation was conducted because we wanted to clearly separate potential occurrences of low p-values on real data from a failure of the self-verification procedure itself to do its job. It turned out that, beginning from history sizes of 200, the self-verification procedure works well (reported p-values follow the pattern of a uniform distribution on the interval [0, 1]). We conclude with a figure very similar to Figure 5.2, which shows two densities describing the behavior of p-values for our self-verification procedure.
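A synthetic check of this kind can be sketched as follows (Python; the ARMA recursion is written out directly, and the chosen orders, coefficients, and threshold are illustrative, not the exact values used in the experiments; the AR coefficients are picked so that their absolute values sum to less than one, which guarantees stationarity):

```python
import numpy as np

def simulate_arma_indicators(n, ar, ma, threshold, burn=500, rng=None):
    """Simulate a stationary ARMA(p, q) series and threshold it into zeroes and ones."""
    rng = np.random.default_rng(rng)
    p, q = len(ar), len(ma)
    e = rng.standard_normal(n + burn)
    y = np.zeros(n + burn)
    for t in range(max(p, q), n + burn):
        y[t] = ar @ y[t - p:t][::-1] + e[t] + ma @ e[t - q:t][::-1]
    y = y[burn:]                                   # drop the burn-in period
    return (y > threshold).astype(int)

x = simulate_arma_indicators(20_000,
                             ar=np.array([0.5, 0.2, 0.1]),
                             ma=np.array([0.4, 0.3, 0.2]),
                             threshold=1.0, rng=2)
# Feeding x through the prediction and R-statistic self-verification machinery
# sketched earlier should yield p-values that look uniform, since the input is stationary.
```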
For evaluation of the weights with a decay rate adjusted to a particular network path, we used the Auckland-II NLANR traces [1] as well as our own measurements of end-to-end available bandwidth. The Auckland-II dataset [1] is a collection of IP header traces captured with a DAG2 system at the University of Auckland Internet uplink. It consists of multiple chunks of continuously taken measurements. We used this dataset considering that the available bandwidth over a link is merely a constant minus the traffic through the link. We processed the traces, and our output was the link traffic sampled at the rate of 4 observations per second. Then we applied our prediction technique with different window sizes $N$ and different thresholds $M$ against which we wanted to estimate the crossing probability. For each trace in the Auckland-II dataset we followed the same setup: we first ran the prediction and self-verification module, thus obtaining a range of history windows that were suitable for prediction; then we ran the parameters adjustment procedure and after that again obtained the range of windows that were good for prediction.

The second part of our experimental verification deals with measurements of end-to-end available bandwidth that were collected by us over different Internet paths. Measurement nodes were located at Michigan State University, the Chinese University of Hong Kong, and the Shenyang Institute of Automation, Chinese Academy of Sciences. For measuring the time-series of indicators $\{X_i\}$ we used the Pathload [29] measurement tool, modified based on our algorithm introduced in Chapter 4 for robust trend detection. The difference between working with a traffic trace and actual sampling of available bandwidth is that in the latter case we are unable to alter the threshold, because measurements were conducted against one threshold level at a time.

Sample   Weights              n=200    n=400    n=600    n=800    n=1000   n=1200   Valid range (obs)   Valid range (sec)
1        before adjustment    0.0263   0.3196   0.3702   0.0023   0.0217   0.03022  400-600             100-150
1        after adjustment     0.11     0.8845   0.6566   0.1258   0.2026   0.2793   400-1200            100-300
2        before adjustment    0.0162   0.277    0.1436   0.0198   0.0367   0.0338   400-600             100-150
2        after adjustment     0.0759   0.6941   0.6769   0.5685   0.2995   0.0158   400-1000            100-250
3        before adjustment    0        0        0        0.0072   0.073    0.0061   0                   0
3        after adjustment     0        0        0.0324   0.123    0.251    0.0042   800-1000            200-250
4        before adjustment    0.011    0.0974   0.3438   0.272    0.01225  0.0241   600-800             150-200
4        after adjustment     0.279    0.2048   0.4899   0.1303   0.87     0.6      200-1200            50-300
5        before adjustment    0.021    0.054    0.0063   0.003    0.0086   0.0023   0                   0
5        after adjustment     0.026    0.32     0.56     0.054    0.0043   0.0021   400-600             100-150
6        before adjustment    0.0522   0.0776   0.3744   0.1035   0.0007   0        600-800             150-200
6        after adjustment     0.1333   0.4703   0.171    0.6262   0.0198   0.005    200-800             50-200
7        before adjustment    0        0        0        0.005    0.5852   0.0085   1000                250
7        after adjustment     0        0        0.0043   0.103    0.532    0.342    800-1200            200-300

Table 5.2: P-values for different history window sizes before and after weight adjustment.

Table 5.2 shows p-values for different window sizes before and after adjustment of parameters. We present a number of representative samples from the Auckland traces and from our measurements. We predicted the crossing probability for $N$ varying around 5, which corresponds to a time-period of the order of a second. We can easily see that the range of history windows that are valid for a prediction increases after application of the adjustment procedure. In some cases we even see that suitable history window sizes were not available before running the weights adjustment procedure; this means that in those cases the weights adjustment procedure made predictions possible. We emphasize that the p-value of a test is roughly the likelihood of observing what has actually been observed under the assumption that the predictions of a crossing probability match the on-coming events. If this likelihood becomes small (say, less than 0.1), we conclude that it is very unlikely to see what we see under the Null hypothesis, and therefore the Null hypothesis has to be rejected with the conclusion that the predictions are no longer accurate. We therefore assume in our algorithm that if the reported p-value is more than 0.1, then the window that corresponds to that p-value is valid for prediction. According to this scheme we report in the two rightmost columns the ranges of windows that turned out to be valid for prediction: in the first of these columns the range is given in the number of observations, and in the second in seconds.

In order to show the performance of the algorithm on many samples, we provide Figure 5.7. Figure 5.7 shows two histograms of ranges of acceptable window sizes (upper bound minus lower bound) reflecting 100 samples of the behavior of available bandwidth on different days and at different places. The gray histogram is related to the case when uniform weights were used, while the black histogram shows the range of acceptable windows when the weight adjustment procedure has been applied.

Figure 5.7: Histograms of the acceptable range of lengths of history windows before and after weights adjustment. The black histogram shows the range distribution after adjustment, whereas the gray histogram shows the range distribution before adjustment.

We can see that the gray histogram captures zero, which means that for a significant number of traces the acceptable range of history windows does not exist. The black histogram shows that the application of the weight-based adjustment increases, on average, the range of history windows suitable for prediction by more than a factor of 3.
5.9 Limits of a verification procedure

In this section we discuss the limits of applicability of the above-introduced self-evaluation procedure. Our self-evaluation technique naturally fails when the reported values of a crossing probability are extremely small. Consider, for example, the limit case when the whole history window does not capture even one crossing event. Then the predicted value of the crossing probability is evaluated as zero, and the estimated value of the variance $\hat{\sigma}_n^2$ in the expression for the $R$ statistic also becomes zero. The theoretical proof in Section 5.7 also assumes a non-degenerate case for the predicted values of a crossing probability and thus a non-degenerate case for the variance $\sigma^2$. In practice one can observe that if the predicted values of a crossing probability are too close to zero, then the reported p-values become small simply because the self-evaluation procedure cannot do its job. We note that if the crossing probability is nearly zero, then the operator concludes that the network environment is safe and tele-operations can be allowed anyway. An operator needs to track the cases when the crossing probability differs significantly from zero, and for these cases the self-evaluation technique works well. We believe, therefore, that it is important to understand for how small values of a predicted crossing probability the self-evaluation technique still works correctly.

With the Auckland-II dataset we were in possession of exact values of the available bandwidth time-series. Therefore we were able to freely alter the threshold against which a crossing probability had to be estimated. Altering the crossing probability threshold made it possible to determine the range of applicability of our verification technique with respect to the value of the crossing probability that has to be estimated. We note that with a decrease of the value of a crossing probability (by moving down the available bandwidth threshold), the history window size needs to be increased in order to avoid a degenerate case in the denominator of the $R$-statistic; then the history window can still capture a couple of crossing events. In other words, by increasing the history window we increase the granularity at which the crossing probability can be predicted and therefore allow crossing probability values to be closer to zero. At the same time, the upper bound for the length of a history window does not depend on the value of a reported probability: this bound is solely defined by the time-period during which one can still consider the available bandwidth time-series as stationary. Therefore for each fixed future window size $N$ there exists a minimal value of a crossing probability $P$ at which the acceptable range for the history window size vanishes completely. We note that the size $N$ of the future time-frame over which the prediction of a crossing probability is made also affects the range of history window sizes. Suppose, for example, that a crossing probability $P$ is fixed at a certain value. Then it is natural to assume that with an increase of the window size $N$, the history size $n$ also has to be increased (such that the history window of size $n$ still contains a significant number of non-overlapping windows of size $N$). Based on the above, we conclude that the minimal value of a crossing probability for which the self-evaluation test still works well should increase with an increase of the window size $N$ over which the prediction is made.
We used the Auckland-II dataset for the experimental determination of how the above-mentioned minimal predicted value of a crossing probability depends on the size $N$ of the future window over which the prediction has to be made. We also compared the applicability limits of the prediction/verification technique based on the $R$-statistic and weights against the one based on the $D$ statistic.

For each technique (the new one related to the $R$-statistic and weight-based prediction, and the one related to the $D$-statistic) we conducted the following procedure. First, we fixed the future window size $N$ at a certain value. Then we performed a binary search for a threshold which, on one hand, still reports a maximal p-value higher than 0.1 and, on the other hand, yields a value of the crossing probability as small as possible. Note that for each value of $N$ and each threshold level we tried a set of different history window sizes and finally selected the maximal p-value among those that were reported. In other words, if at least one history window size existed for which the reported p-value was higher than 0.1, then we kept decreasing the available bandwidth threshold in order to further decrease the value of the crossing probability. Finally, after the binary search converged, we output the value of $N$ together with the corresponding values of the crossing probability. These minimal acceptable crossing probability values were obtained while checking available bandwidth against the "boundary" threshold obtained with the help of the above-mentioned binary search.

Figure 5.8: Dependency of the minimal value of a crossing probability for which the verification procedure works well on the future time-window over which the prediction is made and on the prediction/verification technique.

In Figure 5.8 we marked these values by red "+" signs for the experiment with the old prediction/verification technique without weights. Minimal acceptable crossing probability values related to the $R$-statistic-based prediction/verification technique are marked in Figure 5.8 by blue "*" signs. The same procedure has been repeated for $N \in \{1, 2, 3, 4, 5, 6\}$, where one observation corresponds to a time-period of 0.25 seconds. Therefore the future time-frame over which the prediction was made varied within the range of [0.25; 1.5] seconds. Note that for presentation purposes we slightly shifted the "+" and "*" signs apart: for example, "+" and "*" related to a time-period of 0.25 seconds are shown as observations related to 0.24 and 0.26 seconds, respectively. Finally, in Figure 5.8 we connected the mean values of the minimal crossing probabilities reported for each $N$ with two solid lines (for the old and new techniques, respectively). From Figure 5.8 we conclude that the verification approach still works for real data when the value of the crossing probability $P$ drops to a value as low as 0.01. Figure 5.8 shows that, as expected, the minimal value of $P$ increases with an increase of the window size $N$ over which the prediction is made. It also clearly follows from Figure 5.8 that the new prediction/verification technique significantly decreases the lower bound for acceptable values of a crossing probability.
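The threshold search used in this experiment can be sketched as follows (Python; `best_p_value` is a stand-in for the routine that runs prediction plus self-verification at a given threshold over a set of candidate history window sizes and returns the maximal p-value together with the corresponding crossing probability; it is an assumed interface, not code from the measurement pipeline):

```python
def minimal_crossing_probability(trace, N, lo, hi, best_p_value, tol=0.5, p_cut=0.1):
    """Binary search over the bandwidth threshold (in Mbps): keep lowering the
    threshold while at least one history window size still yields a p-value > p_cut,
    and return the smallest crossing probability for which verification still works."""
    result = None
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        p_value, crossing_prob = best_p_value(trace, N, threshold=mid)
        if p_value > p_cut:
            result = crossing_prob      # verification still works at this threshold
            hi = mid                    # lower the threshold: smaller crossing probability
        else:
            lo = mid                    # verification failed; back off to larger thresholds
    return result
```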
5.10 Conclusion

The basic result of this chapter is a robust non-parametric method for computation of a "crossing probability". This probability defines the quality of an end-to-end network path with respect to a specific real-time task: it represents the chances that a real-time task execution will fail due to the inability of the path to provide sufficient bandwidth to the communicating parties. Our approach has the capability of dynamically choosing the length of the window $n$ that is used for prediction. Since we provide a robust verification technique, our algorithm can continuously perform a self-check of the accuracy of the prediction at run-time and automatically adjust the length of the prediction window. Low p-values of the test can be considered a clear indicator that, for a given network path, the window size $n$ does not do a good job in prediction and therefore needs to be adjusted.

After presenting the basic non-parametric prediction/verification technique, we introduce two significant improvements of the non-parametric approach to estimation of a crossing probability. The first improvement deals with tuning basic parameters such as the size $n$ of the history used for prediction and the coefficient $\gamma$ that can be treated as the rate at which the network path "forgets" its history. This is important because there exists a large variety of traffic patterns along different Internet links. Some links may be highly multiplexed by a large number of users, and some may have only sporadic traffic bursts associated with instantaneous user activity. On some links the majority of the traffic may be constituted by video transfer, which has a strongly pronounced long-memory nature, while on other links traffic may even be a Gaussian time-series with exponentially decaying autocorrelation and therefore short-memory properties. Since the available bandwidth time-series is tightly related to the cross-traffic, we see automatic tuning to a specific network path as a crucial factor that makes our approach applicable to a wide range of behaviors of available bandwidth time-series across the Internet. The second improvement is an efficient procedure that helps to verify the correctness of the prediction and is therefore the kernel of efficient parameter adjustment to a specific network path. We accompany both the D-statistic and the R-statistic with solid theoretical argumentation proving the validity of the chosen verification procedures. The work therefore represents a combination of theoretical results and experimental verification of the applicability of these results to various real-life datasets. We test the efficiency of the weight-based approach with decay rate $\gamma$ on publicly available traffic datasets as well as on datasets of available bandwidth collected by us on a number of Internet paths.

Chapter 6

Re-route Sequence Planning Problem

As stated in Chapter 3, a tele-operation can be viewed as a number of flows with predefined QoS requirements. These flows need to be allocated across the Internet through the best possible network paths. We believe that a promising approach is to consider allocation of tele-operation "facilitators" at geographically dispersed parts of the Internet. The idea is to set up an application-level virtual network that facilitates allocation and reallocation of channels through different network paths. There are a number of efforts that consider dynamic high-level route allocation and deallocation with the help of overlay networks [2, 37] for optimizing network delay or guaranteeing compliance with given QoS requirements.
In this chapter we take one step towards setting up such an infrastructure by considering a mathematical aspect of the Reroute Sequence Planning problem [32]. The problem can be briefly stated as follows. Assume that a set of channels with given bandwidth requirements, or demands, is allocated through the currently optimal set of network paths. Some time later, another optimal set of network paths is computed. The second set of paths reflects the changed QoS conditions of the network environment and therefore may differ from the previous set. The Reroute Sequence Planning problem chooses the optimal sequence of channel reallocations from the old set of paths to the new set of paths that minimizes the amount of extra network-link bandwidth required for conducting the reallocation procedure.

This chapter deals with the well-known mathematical "Dynamic Vector Summation" problem, to which the Reroute Sequence Planning problem can easily be reduced. In particular, we give a tight bound on the maximum extra link bandwidth capacity required for carrying out the reroute procedure. We also provide an algorithm that finds, in polynomial time, a sequence of route reallocations that complies with the proposed bandwidth bound.

6.1 Introduction

Compact vector summation as a pure mathematical tool was introduced and developed at the beginning of the 20th century by Levy [35] and Steinitz [55]. The problem can be stated as follows: given vectors in a normed linear space, find their rearrangement $\pi$ for which the maximum deviation $R_{\pi}$ of partial sums from zero is as small as possible, or estimate $R_n$ under the best rearrangement $\pi$. There are at least two notable functional-analytic problems where these methods are used. One of them is the description of the set of sums of a conditionally convergent series in a topological vector space. Another is the famous Kolmogorov problem (open so far) on the system of convergence of abstract Fourier series under a rearrangement of the orthonormal basis. For an extensive bibliography devoted to this subject, see [25, 33, 22, 10]. There is also a series of problems in scheduling theory that reduce to the problem stated in the next section (see [52]). The Reroute Sequence Planning (RSP) problem described by Makai [41] is an example of such a scheduling problem in telecommunication networks. Below we give a description of the RSP problem.

A network consists of a set $V$ of vertices and a set $E$ of edges connecting certain pairs from $V$ (not necessarily all pairs), with $|E| = d$ (the number of edges is $d$). Consider a system of paths $\mathcal{P} = (P_1, \ldots, P_n)$, the path $P_i$ connecting the vertices $s_i$ and $t_i$, $i = 1, \ldots, n$. By a path connecting $s_i$ and $t_i$ we mean a broken line consisting of edges with the starting point at $s_i$ and endpoint at $t_i$ (of course one can state the problem formally, entirely in terms of abstract graph theory).

Figure 6.1: Network graph example.

Fig. 6.1 shows an example network consisting of 8 vertices and 8 edges. Consider a system $\mathcal{P} = (P_1, P_2)$ of 2 paths, where $P_1$ connects $s_1$ and $t_1$ and consists of the edges $e_1$, $e_4$, $e_6$ and $e_7$, while $P_2$ connects $s_2$ and $t_2$ and consists of $e_2$, $e_4$ and $e_8$. Each of the paths $P_k$, $1 \le k \le n$, imposes a demand $v_k$ on each of the edges of the path. If an edge belongs to several paths, then the total demand $v(e)$ imposed on this edge is the sum of the corresponding demands. For example, in Fig. 6.1 the edge $e_4$, which is common to $P_1$ and $P_2$, has the total demand $v_1 + v_2$.
The demand on the edges $e_1, e_6$ and $e_7$ is $v_1$, while the demand on $e_2$ and $e_8$ is $v_2$. Each of the edges $e_k$ has a capacity $C_k$ assigned to it. The capacity of an edge represents the maximal demand that can be imposed on that edge. If for every edge in $E$ the total imposed demand does not exceed the edge capacity, then the network is considered to operate without violation.

In reroute sequence planning (RSP) the system $\mathcal{P} = (P_1, \dots, P_n)$ of paths should be replaced by another system $\mathcal{Q} = (Q_1, \dots, Q_n)$ that connects the same vertices $s_i$ and $t_i$, $i = 1, \dots, n$. Each path $Q_i$ in $\mathcal{Q}$ has the same demand $v_i$ as the corresponding path $P_i$ in $\mathcal{P}$. In a reroute sequence procedure, at each step only one $P_i$ is replaced by the corresponding $Q_i$. The RSP problem consists of finding a sequence $g_1, \dots, g_n$, a rearrangement of $\{1, \dots, n\}$, such that the successive replacement of $P_{g_1}$ by $Q_{g_1}$, ..., $P_{g_n}$ by $Q_{g_n}$ does not cause the demand $v(e)$ after any replacement to exceed the capacity $C(e)$ for any $e$.

Denote by $x_i(e)$ the change in demand for the edge $e$ when $P_i$ is replaced by $Q_i$. We have
\[
x_i(e) =
\begin{cases}
-v_i, & \text{if } e \in P_i,\ e \notin Q_i, \\
v_i, & \text{if } e \notin P_i,\ e \in Q_i, \\
0, & \text{otherwise.}
\end{cases}
\]
The vectors $x_1, \dots, x_n$ belong to the space $\mathbb{R}^d$, where $d$, the dimension of the space, is the number of edges. Denote by $L_0(e)$ the starting demand on the edge $e$, and assume we have chosen the permutation defined by a sequence $g_1, \dots, g_n$ of replacements $P_{g_1}$ by $Q_{g_1}$, $P_{g_2}$ by $Q_{g_2}$, ..., $P_{g_n}$ by $Q_{g_n}$. Then the demand on the edge $e$ after the $l$-th step is
\[
L_0(e) + x_{g_1}(e) + \dots + x_{g_l}(e).
\]
This implies that the demand on the edge $e$ at each step does not exceed
\[
L_0(e) + \max_{1 \le l \le n} \max_{e \in E} |x_{g_1}(e) + \dots + x_{g_l}(e)|
= L_0(e) + \max_{1 \le l \le n} \| x_{g_1} + \dots + x_{g_l} \|_\infty
= L_0(e) + |||x_\pi|||,
\]
where $\|\cdot\|_\infty$ denotes the maximum-norm, $|||x_\pi||| = \max_{1 \le l \le n} \| x_{\pi(1)} + \dots + x_{\pi(l)} \|$, and $\pi$ is a permutation of the indices $\{1, \dots, n\}$.

The main problem (on compact vector summation) we study in this chapter is as follows.

PROBLEM: Given elements $x_1, \dots, x_n$ from a normed space $X$:
(i) Estimate $\min_\pi |||x_\pi|||$.
(ii) Find a permutation $\pi\colon \{1, \dots, n\} \to \{1, \dots, n\}$ ensuring the estimate in (i).

We will study both theoretical and algorithmic aspects of the Problem. The following lemma plays a crucial role in solving the problem and in the algorithmic construction of an appropriate permutation $\pi$. For a permutation $\pi\colon \{1, \dots, n\} \to \{1, \dots, n\}$ we write $\pi = (k_1, \dots, k_n)$, where $k_i = \pi(i)$, $i = 1, \dots, n$. The permutation $\pi_b = (k_n, k_{n-1}, \dots, k_1)$ is called the backward permutation. Given an ordered collection of vectors $x = (x_1, \dots, x_n) \subset X$ and a collection of signs $\theta = (\theta_1, \dots, \theta_n)$, $\theta_i = \pm 1$, we put
\[
|||x_{\pi,\theta}||| = \max_{1 \le l \le n} \| x_{k_1}\theta_{k_1} + \dots + x_{k_l}\theta_{k_l} \|.
\]

Lemma 6.2.1. Let $x_1, \dots, x_n \in X$ and $s = -\sum_{k=1}^{n} x_k$. Then for any permutation $\pi\colon \{1, \dots, n\} \to \{1, \dots, n\}$ and any collection of signs $\theta = (\theta_1, \dots, \theta_n)$, $\theta_i = \pm 1$, $i = 1, \dots, n$,
\[
|||x_\pi||| + |||x_{\pi_b,\theta}||| \ge 2\,|||x_{\pi^*}||| - \|s\|, \tag{6.2.1}
\]
where $\pi^* = \pi(k^+_m), \pi(k^+_{m-1}), \dots, \pi(k^+_1), \pi(k^-_1), \pi(k^-_2), \dots, \pi(k^-_{n-m})$, and $k^+_1 < k^+_2 < \dots < k^+_m$ are all the indices with $\theta_{k^+_1} = \theta_{k^+_2} = \dots = \theta_{k^+_m} = +1$, while $k^-_1 < k^-_2 < \dots < k^-_{n-m}$ are all the indices with $\theta_{k^-_1} = \theta_{k^-_2} = \dots = \theta_{k^-_{n-m}} = -1$.

Proof. Using the fact that $|||\cdot|||$ enjoys the properties of a norm in the space of sequences, and applying the triangle inequality for $|||\cdot|||$ (every partial sum along $\pi^*$ can be written as one half of a combination of the total sum, a partial sum along $\pi$, and a partial sum along $(\pi_b, \theta)$), we obtain
\[
|||x_\pi||| + |||x_{\pi_b,\theta}||| \ge 2\,|||x_{\pi^*}||| - \|s\|.
\]
The lemma is proved.
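Before turning to the Transference Lemma, a small toy computation may help fix the correspondence between reroute steps and the quantity $|||x_\pi|||$. The sketch below is only an illustration under assumed data: the helper names, edge indices, routes, and demands are made up and do not reproduce Fig. 6.1 exactly.

```python
def change_vector(old_edges, new_edges, demand, num_edges):
    """x_i in R^d: -v_i on edges the channel leaves, +v_i on edges it joins, 0 elsewhere."""
    x = [0.0] * num_edges
    for e in old_edges - new_edges:
        x[e] = -demand
    for e in new_edges - old_edges:
        x[e] = demand
    return x

def prefix_sum_norm(vectors, order):
    """|||x_pi|||: the largest maximum-norm over the partial sums x_{pi(1)} + ... + x_{pi(l)}."""
    partial = [0.0] * len(vectors[0])
    worst = 0.0
    for i in order:
        partial = [p + v for p, v in zip(partial, vectors[i])]
        worst = max(worst, max(abs(p) for p in partial))
    return worst

# Two channels re-routed over a network with 8 edges (indexed 0..7).
x1 = change_vector({0, 3, 5, 6}, {1, 3, 7}, demand=2.0, num_edges=8)
x2 = change_vector({1, 3, 7}, {0, 5, 6}, demand=1.0, num_edges=8)
for order in ([0, 1], [1, 0]):
    print(order, prefix_sum_norm([x1, x2], order))
```

The extra capacity a reallocation order may require on top of the starting demands $L_0(e)$ is controlled exactly by the value printed for that order, which is why the rest of the chapter is devoted to estimating and minimizing $|||x_\pi|||$.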
The next lemma converts our rearrangement problem into a well-known combinatorial problem on choosing signs.

Lemma 6.2.2 (Transference Lemma; an extension of the Transference Lemma proved in [16]). Let $X$ be a normed space, $x_1, \dots, x_n \in X$, $s = -\sum_{k=1}^{n} x_k$.
(i) There exists a permutation $\sigma\colon \{1, \dots, n\} \to \{1, \dots, n\}$ such that for any $\theta = (\theta_1, \dots, \theta_n)$, $\theta_i = \pm 1$, $i = 1, \dots, n$, the following inequality holds:
\[
|||x_\sigma||| \le |||x_{\sigma_b,\theta}||| + \|s\|, \tag{6.2.2}
\]
where $\sigma_b$ stands for the backward permutation, in which the indices go in the opposite direction. As the permutation $\sigma$ one can take the optimal permutation, i.e. the one at which $|||x_\pi|||$ attains its minimum.
(ii) If $\sum_{k=1}^{n} x_k = 0$, then for the optimal permutation $\sigma$ and any $\theta = (\theta_1, \dots, \theta_n)$, $\theta_i = \pm 1$, $i = 1, \dots, n$,
\[
|||x_\sigma||| \le |||x_{\sigma_b,\theta}|||. \tag{6.2.3}
\]

Proof. (i) follows from Lemma 6.2.1 if we take as $\pi$ in (6.2.1) the optimal permutation $\sigma$. (ii) follows from (i).

Remark. Part (ii) of Lemma 6.2.2 has been proved in [16]. A straightforward use of (6.2.3) along with the triangle inequality would lead to a worse version of (6.2.2),
\[
|||x_\sigma||| \le |||x_{\sigma_b,\theta}||| + 2\|s\|,
\]
which worsens the related algorithmic estimates.

6.3 Application of the Transference Lemma to the Problem. An estimation of $\min_\pi |||x_\pi|||$

Theorem 6.3.1. For any collection $(x_1, \dots, x_n)$ from a normed space $X$ there exists a permutation $\sigma\colon \{1, \dots, n\} \to \{1, \dots, n\}$ such that
\[
|||x_\sigma||| \le E \max_{1 \le k \le n} \Big\| \sum_{i=1}^{k} x_i r_i \Big\| + \Big\| \sum_{i=1}^{n} x_i \Big\|
\le 2\,E\Big\| \sum_{i=1}^{n} x_i r_i \Big\| + \Big\| \sum_{i=1}^{n} x_i \Big\|, \tag{6.3.1}
\]
where $r_1, \dots, r_n$ are Rademacher random variables (independent and taking the two values $-1$ and $+1$ with probability $1/2$), and $E$ is the operation of expectation.

Proof. It is enough to average both sides of the inequality (6.2.2) of Lemma 6.2.2 with respect to all collections of signs $\theta = (\theta_1, \dots, \theta_n)$ and then use the Levy inequality.

Remark. In fact Lemma 6.2.2 gives a stronger result (see [16]), namely
\[
\frac{1}{n!}\sum_{\pi} |||x_\pi||| \le 2\,E\Big\| \sum_{i=1}^{n} x_i r_i \Big\| + \Big\| \sum_{i=1}^{n} x_i \Big\|.
\]
In other words, (6.3.1) holds for a typical permutation $\sigma$. In the case of the $l_2$-norm (or, more generally, for norms of type 2) the last inequality implies
\[
\frac{1}{n!}\sum_{\pi} |||x_\pi|||^2 \le C\Big( \sum_{i=1}^{n} \|x_i\|^2 + \Big\| \sum_{i=1}^{n} x_i \Big\|^2 \Big),
\]
where $C$ depends only on the norm. The inequality goes back to [22], where it was stated for the scalar case. To our knowledge there is no estimation of $\min_\pi |||x_\pi|||$ that is better than (6.3.1) for an infinite-dimensional $X$.

In the rest of this section and in Section 6.4 we will be finding estimations of $|||x_\sigma|||$ for an appropriate $\sigma$ through the Transference Lemma and a known estimation of $|||x_\theta|||$ for an appropriate collection of signs $\theta$. Such a well-known estimation of $|||x_\theta|||$ is the following.

Theorem 6.3.2 ([18]; generalization by [24]). Let $X$ be a normed space of dimension $d$, $x_1, \dots, x_n \in X$, $\|x_i\| \le 1$, $i = 1, \dots, n$. Then there exists a collection of signs $\theta = (\theta_1, \dots, \theta_n)$ such that $|||x_\theta||| \le 2d$.

Corollary 6.3.3. If $x_1, \dots, x_n \in X$, $\dim X = d$, $\|x_i\| \le 1$, $i = 1, \dots, n$, then there is a permutation $\sigma\colon \{1, \dots, n\} \to \{1, \dots, n\}$ such that
\[
|||x_\sigma||| \le 2d + \|x_1 + \dots + x_n\|. \tag{6.3.3}
\]

Remark. The inequality (6.3.3) is the famous Steinitz (1916) inequality. It can be proved in a straightforward manner, without appealing to Lemma 6.2.2 (see [33]). We believe, however, that the proof using Lemma 6.2.2 is simpler.

6.4 The case of $l_\infty^d$

This is the $d$-dimensional space $\mathbb{R}^d$ with the maximum-norm. This case is the most important for applications in scheduling theory. For low dimensions the general estimate (6.3.3) is the best among the known ones. However, in most applications to scheduling theory the dimension $d$ is much larger than $n$.
Instead of using the Dvoretzky-Hanani theorem in this case we exploit the following assertion.

Theorem 6.4.1. If $x_1, \dots, x_n \in l_\infty^d$, $\|x_i\| \le 1$, $i = 1, \dots, n$, then for the Rademacher random variables $r_1, \dots, r_n$:
(i)
\[
E\Big\| \sum_{k=1}^{n} x_k r_k \Big\|^2 \le C\, n \ln d,
\]
where $C$ is an absolute constant;
(ii)
\[
\frac{1}{2^n}\sum_{\theta} |||x_\theta||| \le C_1 \sqrt{n \ln d},
\]
where $C_1$ is an absolute constant;
(iii) There exists a $\theta$ such that
\[
|||x_\theta||| \le C_1 \sqrt{n \ln d}. \tag{6.4.1}
\]

Proof. (i) Denote $X^{(k)} = \sum_{i=1}^{n} x_i^{(k)} r_i$, $k = 1, \dots, d$, where $x_i^{(k)}$ is the $k$-th coordinate of $x_i$. Using the integration by parts formula we get for any $\delta > 0$
\[
E\Big\| \sum_{i=1}^{n} x_i r_i \Big\|^2
= E \max_{1 \le k \le d} |X^{(k)}|^2
= \int_0^\infty P\Big( \max_{1 \le k \le d} |X^{(k)}| > t \Big)\, dt^2
= \int_0^{\delta}(\cdot)\, dt^2 + \int_{\delta}^{\infty}(\cdot)\, dt^2
\le \delta^2 + \int_{\delta}^{\infty} \sum_{k=1}^{d} P\big(|X^{(k)}| > t\big)\, dt^2
\le \delta^2 + d \int_{\delta}^{\infty} P\big(|Y| > t\big)\, dt^2,
\]
where $Y$ is the $X^{(l)}$ for the $l$ at which $\int_{\delta}^{\infty} P\big(|X^{(k)}| > t\big)\, dt^2$ attains its maximum. Now using the following well-known estimation for the tail probability,
\[
P(|Y| > t) \le 2\exp\Big\{ -\frac{t^2}{2\sum_i y_i^2} \Big\},
\]
where $Y = \sum_{i=1}^{n} y_i r_i$, we get
\[
E\Big\| \sum_{i=1}^{n} x_i r_i \Big\|^2 \le \delta^2 + d \int_{\delta}^{\infty} 2 e^{-t^2/(2n)}\, t\, dt = \delta^2 + 2dn\, e^{-\delta^2/(2n)}.
\]
Putting $\delta^2 = 2n \ln 2d$ we find
\[
E\Big\| \sum_{i=1}^{n} x_i r_i \Big\|^2 \le n(1 + 2\ln 2d).
\]
This proves (i).

(ii) is a consequence of (i). Indeed,
\[
\frac{1}{2^n}\sum_{\theta} |||x_\theta||| = E\,|||x_r||| \le \big(E\,|||x_r|||^2\big)^{1/2} \le C\Big(E\Big\| \sum_{i=1}^{n} x_i r_i \Big\|^2\Big)^{1/2} = C_1\sqrt{n \ln d}.
\]
(iii) is an immediate consequence of (ii).

Remark. 1. Theorem 6.4.1 (iii) was first proved by the use of combinatorial methods by Spencer [53]. Statement 6.4.1 (ii) shows that not only does the desired $\theta$ exist, but (6.4.1) holds for a typical $\theta$.
2. In another paper Spencer [54] proved a surprising result saying that in the case $d = n$ there is a collection of signs $\theta = (\theta_1, \dots, \theta_n)$ such that $|||x_\theta||| \le C\sqrt{d}$, where $C$ is an absolute constant. However, as the following example shows, in the case of arbitrary $d$ and $n$ Theorem 6.4.1 (iii) cannot be improved.

Example. Consider a matrix whose rows are all possible sequences of length $n$ consisting of $+1$'s and $-1$'s. Our vectors $x_1, \dots, x_n$ in $l_\infty^d$ are all the columns of the matrix. If one of the columns is multiplied by $-1$, we still get a matrix consisting of all different $\pm 1$-sequences of length $n$, and in particular there is a sequence consisting of $+1$'s only. This implies that $|||x_\theta||| = n$ for any $\theta = (\theta_1, \dots, \theta_n)$. In this example $d = 2^n$, so $|||x_\theta||| = n$ coincides, up to a constant, with $\sqrt{n \ln d}$ in (6.4.1).

The application of the transference lemma leads to the following assertion.

Theorem 6.4.2. If $x_1, \dots, x_n \in l_\infty^d$, $\|x_i\| \le 1$, $i = 1, \dots, n$, then there is a permutation $\sigma\colon \{1, \dots, n\} \to \{1, \dots, n\}$ such that
\[
|||x_\sigma||| \le C\sqrt{n \ln d} + \|x_1 + \dots + x_n\|,
\]
where $C$ is an absolute constant.

6.5 An algorithm for finding $\sigma$

In all previous results concerning the estimation of $\min_\pi |||x_\pi|||$ (Theorem 6.3.1, Corollary 6.3.3, Theorem 6.4.2) we stated only the existence of a permutation $\sigma$ ensuring the corresponding estimate. In this section we give a constructive way of finding $\sigma$, provided that the respective (much simpler) sign-problem possesses an algorithmic solution.

Let us show first that a greedy algorithm (which seems natural) is not, in general, the best. Given vectors $x_1, \dots, x_n \in X$, a greedy algorithm chooses at each step a vector that minimizes the norm of the next partial sum. In other words, on step 1 it chooses an element $x_{n_1}$ that has minimal norm. On step 2 it selects an element $x_{n_2}$, $n_2 \ne n_1$, such that $\|x_{n_1} + x_{n_2}\| \le \|x_{n_1} + x_{n_k}\|$ for any $n_k \ne n_1$, and so on.

The following simple example, constructed by Jakub Wojtaszchik (oral communication), shows that a greedy algorithm is not in general the best one even in the 2-dimensional case.

Example.
Consider $n$ groups of vectors, each consisting of the three vectors $(1, 1)$, $(2, -3)$, and $(-3, 2)$. Obviously, the greedy algorithm chooses at the first $n$ steps the vectors $(1,1), (1,1), \dots, (1,1)$. Therefore, if $\pi_g$ is the greedy permutation, then $|||x_{\pi_g}||| \ge n$; in fact, $|||x_{\pi_g}||| = n + 2$. The optimal algorithm, on the other hand, arranges the vectors as follows: $(1, 1), (2, -3), (-3, 2), (1, 1), (2, -3), (-3, 2)$, etc. For the optimal permutation $\pi_0$ we have $|||x_{\pi_0}||| = 3$. Therefore, the ratio $|||x_{\pi_g}||| / |||x_{\pi_0}||| = (n+2)/3$ can be made arbitrarily large.

Consider now a collection $x_1, \dots, x_n \in X$, $\|x_i\| \le 1$, $i = 1, \dots, n$, where $X$ is a general normed space. Assume that for any permutation $\pi$ there is a collection of signs $\theta = (\theta_1, \dots, \theta_n)$ such that $|||x_{\pi,\theta}||| \le D$ for some $D > 0$, and that the desired $\theta$ can be found algorithmically. This is the case for the vectors involved in Theorem 6.3.2 and Theorem 6.4.1.

Remark. For a general $d$-dimensional $X$ such a polynomial sign-algorithm with $D = 2d$ was constructed in [6]; for $X = l_\infty^d$ such an algorithm with $D = C\sqrt{n \ln d}$ was described by Spencer [53]. Another algorithmic-type result that uses a sign procedure was found earlier in [15]. An assertion similar to Theorem 6.5.1 was stated without proof by Makai [41] for the case $s = 0$; his arguments are based on Transference Lemma 6.2.2.

Below is the description of the permutation-algorithm. We start with an arbitrary permutation $\pi\colon \{1, \dots, n\} \to \{1, \dots, n\}$. Then we find a $\theta$ satisfying $|||x_{\pi_b,\theta}||| \le D$, where $\pi_b$ is the backward permutation (see Lemma 6.2.1). According to Lemma 6.2.1,
\[
|||x_\pi||| + |||x_{\pi_b,\theta}||| \ge 2\,|||x_{\pi_1}||| - \|s\| \tag{6.5.1}
\]
for some permutation $\pi_1$. An easy algorithm determining $\pi_1$ is described in Lemma 6.2.1 and runs in $O(n)$ time. Using (6.5.1) we get
\[
|||x_{\pi_1}||| \le \frac{1}{2}\big( |||x_\pi||| + |||x_{\pi_b,\theta}||| + \|s\| \big) \le \frac{1}{2}\big( |||x_\pi||| + D + \|s\| \big). \tag{6.5.2}
\]
Then we find $\theta^{(1)}$ such that $|||x_{\pi_{1b},\theta^{(1)}}||| \le D$ and a permutation $\pi_2$ such that
\[
|||x_{\pi_1}||| + |||x_{\pi_{1b},\theta^{(1)}}||| \ge 2\,|||x_{\pi_2}||| - \|s\|.
\]
In the same way as in (6.5.2) we get
\[
|||x_{\pi_2}||| \le \frac{1}{2^2}\,|||x_\pi||| + \frac{3}{2^2}\big( D + \|s\| \big).
\]
After $l$ iterations we get
\[
|||x_{\pi_l}||| \le \frac{1}{2^l}\,|||x_\pi||| + \Big(1 - \frac{1}{2^l}\Big)\big( D + \|s\| \big) \le D + \|s\| + \delta,
\]
where $\delta = \frac{1}{2^l}\,|||x_\pi|||$. Noticing that $|||x_\pi||| \le n$, by an appropriate choice of $l$ we can make $\delta$ smaller than any fixed $\varepsilon > 0$ (to get $\delta < \varepsilon$, $l = [\log_2(n/\varepsilon)] + 1$ iterations are needed, where $[x]$ denotes the integral part of $x$). Thus we have proved the following.

Theorem 6.5.1. Let $X$ be a normed space, $x = (x_1, \dots, x_n)$, $x_1, \dots, x_n \in X$, $\|x_i\| \le 1$, $i = 1, \dots, n$. Assume that for any permutation $\pi\colon \{1, \dots, n\} \to \{1, \dots, n\}$ there is an algorithm with polynomial complexity to define $\theta = (\theta_1, \dots, \theta_n)$ such that
\[
|||x_{\pi,\theta}||| \le D,
\]
where $D$ does not depend on $\pi$. Then for any $\varepsilon > 0$ there exists an algorithm with polynomial complexity to define $\sigma$ such that
\[
|||x_\sigma||| \le D + \|s\| + \varepsilon,
\]
where $s = -\sum_{i=1}^{n} x_i$. The complexity of the algorithm is $O(C \cdot \log(n/\varepsilon))$, where $C$ is the complexity of the sign-algorithm (for finding $\theta$).
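As a small illustration of why the constructive procedure above is preferable to the greedy heuristic, the following sketch (helper names are ours) replays, in the maximum-norm, the greedy-versus-periodic comparison from the example at the beginning of this section:

```python
def prefix_sum_norm(vectors, order):
    """Largest maximum-norm of a partial sum taken in the given order."""
    partial = [0.0] * len(vectors[0])
    worst = 0.0
    for i in order:
        partial = [p + v for p, v in zip(partial, vectors[i])]
        worst = max(worst, max(abs(p) for p in partial))
    return worst

def greedy_order(vectors):
    """At each step pick the vector minimizing the max-norm of the next partial sum."""
    partial = [0.0] * len(vectors[0])
    remaining = list(range(len(vectors)))
    order = []
    while remaining:
        best = min(remaining,
                   key=lambda i: max(abs(p + v) for p, v in zip(partial, vectors[i])))
        partial = [p + v for p, v in zip(partial, vectors[best])]
        remaining.remove(best)
        order.append(best)
    return order

# n copies of the group (1, 1), (2, -3), (-3, 2) from the example above.
n = 5
vectors = [(1, 1), (2, -3), (-3, 2)] * n
print(prefix_sum_norm(vectors, greedy_order(vectors)))      # n + 2 = 7
print(prefix_sum_norm(vectors, list(range(len(vectors)))))  # 3
```

For growing $n$ the greedy value grows like $n + 2$ while the periodic order stays at 3, matching the ratio $(n+2)/3$ computed in the example.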
Chapter 7

Summary and Future Research

In this chapter we summarize the findings described in the previous chapters and briefly outline future research directions suggested by these findings.

7.1 Summary

This dissertation proposes to consider a new quantity that describes the behavior of available bandwidth, called the "crossing probability". The crossing probability is the likelihood that the available bandwidth time series drops below a quality-critical threshold $M$ during the quality-critical time-period $T$. Both the threshold and the time-period have to be defined by the specification of the real-time task that is to be executed.

At the beginning we concentrated our efforts on estimating the crossing probability with a parametric model that relies on observations of available bandwidth obtained with the help of the Pathload measurement tool. We used Pathload for collecting available bandwidth measurements because our research showed that it was the most robust and accurate measurement tool among those available. We successfully applied the ARCH model used in econometrics to modeling available bandwidth behavior. In particular, we combined two known estimation techniques used in econometrics, thus obtaining our method for quick estimation of the model parameters together with their confidence intervals. We used known results from extreme values theory for the computation of the crossing probability for an arbitrary threshold $M$ and time-period $T$. We then verified the efficiency of our approach on a number of Internet paths.

Before going any further with available bandwidth modeling, the dissertation discusses some work on improving the One Way Delay trend detection mechanism of present available bandwidth measurement tools. Instead of using empirically defined cutoff thresholds for trend-detection metrics, we propose to detect a trend based on the p-value of a linear-regression t-test conducted against the slope of one way delay as a function of packet sequence number. We also suggest a simple way of dealing with the interrupt coalescence / context switch effect. Our experiments clearly indicated that both Type I and Type II errors of trend detection were significantly smaller with the t-test than the errors reported by the extensively used PCT/PDT test.

The next part of the dissertation deals with a non-parametric approach that relies on binary observations from $\{0, 1\}$, where 0 indicates that the available bandwidth at a particular time-instance is higher than a specified critical threshold and 1 indicates that it is lower than the critical threshold. Our non-parametric model answers the question: "What is the probability that at least one '1' will be present among the next $N$ observations of the indicators?", in contrast to our parametric model, which answers the question: "What is the probability that the minimum of the available bandwidth time series over the next $N$ observations drops below the level $M$?" With the non-parametric approach relying on $\{0, 1\}$ observations, one can predict what happens within the frame of the next couple of seconds while considering a history time-period of 10 minutes. The parametric approach, which relies on slow available bandwidth measurement tools, needs observations collected over a period of at least 2 hours to predict what happens within the next 5 minutes.

In addition to introducing a non-parametric model for the "crossing probability", we developed a self-verification procedure that combines the predicted probabilities of the crossing event with the actual realizations/non-realizations of that event into a statistic that follows the standard normal distribution if the predicted probabilities "match" the oncoming realizations. We also present a theoretical proof of the fact that the proposed self-verification statistic really follows a standard normal distribution given that the prediction module works correctly.
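For illustration, a simplified sketch of a verification statistic of this flavor is given below. It is not the exact D- or R-statistic of Chapter 5; the helper name and the assumption that prediction errors are close to independent are ours. The statistic standardizes the accumulated difference between observed crossing indicators and predicted crossing probabilities, so values far outside roughly plus or minus 2 signal that the predictions do not match reality.

```python
import math

def verification_statistic(predicted_probs, outcomes):
    """Standardized sum of prediction errors.

    predicted_probs : predicted crossing probabilities p_i in (0, 1)
    outcomes        : observed indicators I_i (1 if the crossing happened, else 0)

    If the predictions are correct, each term (I_i - p_i) has mean 0 and
    variance p_i * (1 - p_i), so the standardized sum is approximately
    standard normal for a long enough observation window.
    """
    if len(predicted_probs) != len(outcomes):
        raise ValueError("need one outcome per prediction")
    num = sum(o - p for p, o in zip(predicted_probs, outcomes))
    var = sum(p * (1.0 - p) for p in predicted_probs)
    return num / math.sqrt(var) if var > 0 else 0.0

# A large absolute value indicates the predicted probabilities do not match
# the observed crossings, e.g. because the history window size is inappropriate.
print(verification_statistic([0.1, 0.05, 0.2, 0.1], [0, 0, 1, 0]))
```

In the dissertation, exactly this kind of check drives the automatic adjustment of the prediction window described next.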
The self-verification procedure opened further opportunities to adjust the prediction module to a particular network path. We output predictions of the crossing probability based on a number of sizes of the history time-frame used for the prediction. For the actual prediction we use the window size that is "confirmed" by the self-verification test.

We then added weights to the crossing-probability prediction. Even within the frame of a history window, the most recent past enters the prediction with a higher weight than the less recent past. We adjust the rate at which the weights decay to a particular network path. Experiments with real-life data-sets of available bandwidth time-series confirmed that introducing weights significantly increases the range of history window-sizes valid for crossing-probability prediction.

The last part of this dissertation is dedicated to future research on possible actions that communicating parties may undertake if the crossing probability becomes unacceptably large. One of the promising scenarios for supporting a tele-operation under a degraded quality of service is the reallocation of part of the communication channels through different network paths with the help of external facilitator nodes. In the context of such a reallocation arises an interesting and important problem, "Reroute Sequence Planning (RSP)". The problem chooses the optimal (or nearly optimal) sequence of channel reallocations (once they are planned) in a way that minimizes the extra capacities required along the network for carrying out the reallocation procedure. In the last part of this dissertation we develop a solid theoretical basis for an algorithm that finds a lower bound for the extra capacity needed for carrying out the reallocation, and that also finds in polynomial time a sequence of reallocations from old paths to new paths that fits the mentioned bound.

Parts of the dissertation were published in the following references: [13, 12, 14].

7.2 Future research

7.2.1 QoS overlay network infrastructure

As described in Chapter 6, we see a promising direction in setting up an overlay infrastructure that helps in the dynamic allocation and reallocation of communication channels through various network paths in a way that minimizes the likelihood of a failure of remote real-time task execution. Chapter 6 makes a step towards setting up such an infrastructure. However, it is clear that there are still many unsolved directions to follow. These include algorithms for finding the best set of paths with respect to particular QoS-related probabilistic requirements, detecting an appropriate time-moment for triggering the reroute sequence procedure, finding the best reroute-sequence procedure with respect to probabilistic (rather than hard-coded as in Chapter 6) QoS characteristics, and many others.

7.2.2 QoS model improvement

The Quality of Service model introduced in Chapter 3 assumes that a real-time task can be specified with some fixed value of the rate (bandwidth) that the task requires for safe execution. For most tele-operations, however, the vast majority of the tele-operation traffic is constituted by real-time video. Traffic associated with video transfer may be bursty. Various encoding formats, for example, send only the difference between the current video frame and the previous one when the camera moves slowly and there are no significant scene changes (B-frames).
If a rapid scene change takes place, then the whole newly captured frame has to be sent at once, which causes bursts in the video traffic. It may not be efficient to reserve bandwidth for video traffic at such a "burst level" because most of the time the high bandwidth will remain unused. On the other hand, reservation of bandwidth at lower levels may lead to unexpected degradation of service. Therefore a precisely estimated "crossing probability" as introduced in Chapter 3 may not do a good job of estimating the likelihood of an actual degradation of service during the execution of a remote real-time task. Under a more complex model, the likelihood of degradation of service for the period of time $T$ can be viewed as the probability that the traffic associated with remote real-time task execution exceeds the available bandwidth time-series at some moment during the critical period of time $T$. In order to estimate such a likelihood, one has to model the traffic associated with the task in addition to modeling the available bandwidth time-series. Liu and Mao [38] analyze various prediction algorithms for real-time video traffic and introduce another prediction algorithm. One of our goals therefore is to unify the results presented in [38] with our results in available bandwidth estimation and arrive at a more robust model for estimating the likelihood of a failure of a remote real-time task.

We conclude this dissertation with the hope that the introduced parametric and non-parametric approaches to modeling available end-to-end bandwidth, together with the improvement of the present OWD trend detection mechanism and the novel self-verification / self-adjustment module, will facilitate the extension of tele-operations from "traditional" isolated and fully controlled lines to the inexpensive and easily accessible Internet.

BIBLIOGRAPHY

[1] Auckland-II trace archive. http://pma.nlanr.net/Traces/long/auck2.html.
[2] David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris. Resilient overlay networks. In 18th ACM Symposium on Operating Systems Principles (SOSP), 2001.
[3] Javier Aracil, Richard Edell, and Pravin Varaiya. A phenomenological approach to Internet traffic self-similarity. In 35th Annual Allerton Conference on Communication, Control and Computing, 1997.
[4] M. Arlitt, R. Friedrich, and T. Jin. Workload characterization of a web proxy in a cable modem environment. In ACM Performance Evaluation Review, 1999.
[5] M. Arlitt and T. Jin. A workload characterization study of the 1998 World Cup web site. In IEEE Network, 2000.
[6] I. Barany and V.S. Grinberg. On some combinatorial questions in finite-dimensional spaces. Linear Algebra and its Applications, 41:1-9, 1981.
[7] J. Beran, M. Sherman, S. Taqqu, and W. Willinger. Long-range dependence in variable-bit-rate video traffic. In IEEE Transactions on Communications, 1995.
[8] M. Borella and G. Brewster. Measurement and analyses of long-range packet dependent behavior of Internet packet delay. In IEEE INFOCOM, 1998.
[9] Arup Bose and Kanchan Mukherjee. Estimating the ARCH parameters by solving linear equations. Journal of Time Series Analysis, 24(2):127-136, 2002.
[10] J. Bourgain. Λ-sets in analysis: results, problems and related aspects. Handbook of the Geometry of Banach Spaces, I:195-232, 2001.
[11] Peter J. Brockwell and R.A. Davis. Time Series: Theory and Methods. Springer, New York, 1991.
[12] Alexander Chobanyan, Matt Mutka, Zhiwei Cen, and Ning Xi.
One way delay trend detection for available bandwidth measurement. In IEEE GLOBECOM 2005, 2005.
[13] Alexander Chobanyan, Matt Mutka, Shlomo Levental, and Ning Xi. Behavior of available end-to-end bandwidth: Non-parametric approach. In International Conference on Quantitative Evaluation of SysTems (QEST), 2006.
[14] Alexander Chobanyan, Matt Mutka, V.S. Mandrekar, and Ning Xi. Modeling available bandwidth for an efficient QoS characterization of a network path. In 4th IFIP-TC6 Networking Conference, Networking 2005, 2005.
[15] Sergei Chobanyan. The structure of the set of sums of a conditionally convergent series in a normed space. Mat. Sb. (N.S.), 128(170)(1):50-65, 1985.
[16] Sergej Chobanyan. Convergence a.s. of rearranged series in Banach spaces and associated inequalities. In Probability in Banach Spaces 9 (J. Hoffmann-Jørgensen, J. Kuelbs, and M.B. Marcus, eds.). Birkhäuser, 1994.
[17] M. Crovella and A. Bestavros. Self-similarity in world-wide web traffic: Evidence and possible causes. In ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, 1996.
[18] A. Dvoretzky and H. Hanani. Sur les changements des signes des termes d'une série à termes complexes. C. R. Acad. Sci. Paris, 225:516-518, 1947.
[19] Paul Embrechts, Claudia Klüppelberg, and Thomas Mikosch. Modelling Extremal Events for Insurance and Finance. Springer, Berlin Heidelberg, 1997.
[20] R.F. Engle. Autoregressive conditional heteroscedasticity and estimates of variance of UK inflation. Econometrica, (50):987-1008, 1982.
[21] M. Garrett and W. Willinger. Analysis, modeling, and generation of self-similar VBR video traffic. In SIGCOMM, 1994.
[22] A. Garsia. Topics in Almost Everywhere Convergence. Lectures in Advanced Mathematics. Markham Publishing Co., Chicago, 1970.
[23] N. Giroux and S. Ganti. Quality of Service in ATM Networks. Prentice Hall, Upper Saddle River, NJ, 1999.
[24] V.S. Grinberg and S.V. Sevastyanov. Value of the Steinitz constant. Funktsional. Anal. i Prilozhen., 14:56-57, 1980.
[25] I. Halperin and T. Ando. Bibliography: Series of Vectors and Riemann Sums. Hokkaido University, Sapporo, 1989.
[26] H. Hotelling. Multivariate quality control. In Techniques of Statistical Analysis (C. Eisenhart, M.W. Hastay, and W.A. Wallis, eds.). McGraw-Hill, New York, 1947.
[27] N. Hu and P. Steenkiste. Evaluation and characterization of available bandwidth probing techniques. In IEEE JSAC Special Issue on Internet and WWW Measurement, Mapping, and Modeling, 2003.
[28] I.A. Ibragimov and Yu.V. Linnik. Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, The Netherlands, 1971.
[29] Manish Jain and Constantinos Dovrolis. Pathload: a measurement tool for available bandwidth estimation. In Passive and Active Measurements Workshop, 2002.
[30] Manish Jain and Constantinos Dovrolis. End-to-end available bandwidth: Measurement methodology, dynamics, and relation with TCP throughput. In ACM/IEEE Transactions on Networking, 2003.
[31] Manish Jain and Constantinos Dovrolis. Ten fallacies and pitfalls on end-to-end available bandwidth estimation. In Internet Measurement Conference, 2004.
[32] Balazs Gabor Jozsa and Marton Makai. On the solution of reroute sequence planning problem in MPLS networks. Computer Networks: The International Journal of Computer and Telecommunications Networking, 42(2):199-210, 2003.
[33] M.I. Kadets and V.M. Kadets. Series in Banach Spaces: Conditional and Unconditional Convergence. Birkhäuser Verlag, 1997.
[34] Will E. Leland, Murad S.
Taqqu, Walter Willinger, and Daniel V. Wilson. On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking, 2(1):1-15, 1994.
[35] P. Levy. Sur les séries semi-convergentes. Nouv. Ann. Math., 5:506-511, 1905.
[36] Mingzhe Li, Mark Claypool, Robert Kinicki, and Jim Nichols. Characteristics of streaming media stored on the Internet. Technical Report WPI-CS-TR-03-18, Computer Science Department, Worcester Polytechnic Institute, 100 Institute Road, Worcester, Massachusetts 01609-2280, May 2003.
[37] Zhi Li and P. Mohapatra. QRON: QoS-aware routing in overlay networks. IEEE Journal on Selected Areas in Communications, 22(1):29-40, 2004.
[38] Huabing Liu and Guoqiang Mao. Prediction algorithms for real-time variable-bit-rate video. In Asia Pacific Conference on Communications, 2005.
[39] Regina Y. Liu and Kesar Singh. Moving blocks jackknife and bootstrap capture weak dependence. In Exploring the Limits of Bootstrap (R. LePage and L. Billard, eds.). John Wiley & Sons, New York, 1992.
[40] X. Liu, K. Ravindran, B. Liu, and D. Loguinov. Single-hop probing asymptotics in available bandwidth estimation: A sample-path analysis. In ACM SIGCOMM Conference on Internet Measurement, 2004.
[41] Marton Makai. Reroute sequence planning in telecommunication networks and compact vector summation. Applied Mathematics and Computation, 150(3):785-801, 2004.
[42] Cao Le Thanh Man, Go Hasegawa, and Masayuki Murata. A new available bandwidth measurement technique for service overlay networks. In 6th IFIP/IEEE International Conference on Management of Multimedia Networks and Services, 2003.
[43] B. Melander, M. Bjorkman, and P. Gunningberg. A new end-to-end probing and analysis method for estimating bandwidth bottlenecks. In Global Internet Symposium, 2000.
[44] B. Melander, M. Bjorkman, and P. Gunningberg. Regression-based available bandwidth measurements. In International Symposium on Performance Evaluation of Computer and Telecommunication Systems, 2002.
[45] Jiri Navratil and R. Les Cottrell. ABwE: a practical approach to available bandwidth estimation. In Passive and Active Measurements Workshop, 2003.
[46] Hikori Nishikawa, Takuya Asaka, and Tatsuro Takahashi. ABdis: Approach to estimate available bandwidth distribution using a multi-rate probe. In International Conference on Communication and Broadband Networking (ICBN04), 2004.
[47] M. Peligrad and Q. Shao. Central limit theorem for weakly dependent random variables. Journal of Theoretical Probability, 1994.
[48] Ravi Prasad, Manish Jain, and Constantinos Dovrolis. Effects of interrupt coalescence on network measurements. In Passive and Active Measurements Workshop, 2004.
[49] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurements (PAM) Workshop, 2003.
[50] V.J. Ribeiro, M. Coates, R.H. Riedi, S. Sarvotham, and R.G. Baraniuk. Multifractal cross traffic estimation. In ITC Specialist Seminar on IP Traffic Measurement, 2000.
[51] John A. Rice. Mathematical Statistics and Data Analysis. Duxbury Press, Belmont, California, 1995.
[52] S.V. Sevastyanov. On some combinatorial methods in scheduling theory. Discrete Applied Mathematics, 55:59-82, 1994.
[53] J. Spencer. Balancing games. Journal of Combinatorial Theory, Ser. B, 23:68-74, 1977.
[54] J. Spencer. Balancing vectors in the max norm. Combinatorica, 6(1):55-65, 1986.
[55] E. Steinitz. Bedingt konvergente Reihen und konvexe Systeme.
Journal für die reine und angewandte Mathematik, 146:68-111, 1916.
[56] Jacob Strauss, Dina Katabi, and Frans Kaashoek. A measurement study of available bandwidth estimation tools. In IMC, 2003.
[57] B. Tsybakov and N. Georganas. On self-similar traffic in ATM queues: Definitions, overflow probability bound, and cell delay distribution. In IEEE/ACM Transactions on Networking, 1997.
[58] Sven Ubik and Antonin Kral. End-to-end bandwidth estimation tools. Technical Report WPI-CS-TR-03-18, CESNET, November 2003.
[59] A.A. Weiss. Asymptotic theory for ARCH models: Estimation and testing. Econometric Theory, (2):31, 1986.
[60] Moshe Zukerman, Timothy D. Neame, and Ronald G. Addie. Internet traffic modeling and future technology implications. In IEEE INFOCOM, 2003.