This is to certify that the
dissertation entitled

WIRELESS CHANNEL MODELING AND MALWARE
DETECTION USING STATISTICAL AND INFORMATION-
THEORETIC TOOLS

presented by
SYED ALI KHAYAM

has been accepted towards fulfillment
of the requirements for the

Ph.D. degree in Electrical Engineering

Major Professor's Signature

Date

MSU is an Affirmative Action/Equal Opportunity Institution

 

 

 


 

 

WIRELESS CHANNEL MODELING AND
MALWARE DETECTION USING
STATISTICAL AND INFORMATION-
THEORETIC TOOLS

By

Syed Ali Khayam

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Electrical and Computer Engineering

2006

ABSTRACT

WIRELESS CHANNEL MODELING AND
MALWARE DETECTION USING STATISTICAL
AND INFORMATION-THEORETIC TOOLS

By

Syed Ali Khayam

This is a bipartite thesis that tackles two different research problems: (i) medium
access control (MAC) layer wireless channel modeling and applications of the models in
design, analysis and simulations of wireless systems; and (ii) malicious software
(malware) detection at network endpoints. For both problems, we collect extensive new
datasets which are analyzed and modeled using statistical and information-theoretic tools.

In the first part of this thesis, we provide analysis and modeling of bit-errors at the
802.11b MAC layer. We show that the bit-errors at 2 Mbps and 5.5 Mbps can be modeled
by high-order full-state Markov (FSM) chains. Bit-errors at 11 Mbps are shown to have
long-range dependence (LRD), and consequently a multifractal wavelet model (MWM) is
used to model these LRD bit-errors. The complexity of FSM chains is an exponential
function of the bit-error process's memory-length. To mitigate the exponential FSM
complexity, we derive guidelines for accurate approximation of an FSM chain of
arbitrary memory-length. These guidelines lead to a novel and accurate constant-
complexity model (CCM) which always consists of five states irrespective of a process's
memory-length.
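
The exponential growth in FSM complexity can be illustrated with a minimal Python sketch. It assumes that each distinct history of k bits forms one state of a binary chain with memory-length k, so that at most 2^k states exist. The function name `fit_fsm_chain` and the toy trace are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def fit_fsm_chain(trace, k):
    """Estimate P(next bit = 1 | previous k bits) from a binary trace.

    Assumption: every distinct k-bit history is one chain state,
    so the chain has at most 2**k states.
    """
    ones = Counter()    # times each state was followed by a 1 (bit-error)
    totals = Counter()  # times each state was visited
    for i in range(k, len(trace)):
        state = tuple(trace[i - k:i])
        totals[state] += 1
        ones[state] += trace[i]
    return {state: ones[state] / totals[state] for state in totals}


# Hypothetical bit-error trace: 1 marks a corrupted bit, 0 a correct bit.
trace = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
model = fit_fsm_chain(trace, k=2)

# The number of possible states doubles with every extra bit of memory.
for memory in (2, 6, 10):
    print(memory, 2 ** memory)
```

With a memory-length of 10 the chain already has up to 1024 states, which is why a model with a fixed number of states is attractive.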

Two applications of the proposed channel models are explored. First, we use the
models in a novel maximum-likelihood header estimation framework which can be used
by wireless multimedia applications to realize considerable throughput improvements.
Trace-driven wireless video simulations show that the proposed header estimation
framework provides significant improvements over existing techniques. Second, we use
protocol goodput and retransmission metrics to show that inaccurate channel models can
lead to extremely misleading simulation and analytical results. The models proposed in
this thesis, however, provide highly accurate estimates of goodput and retransmissions.

In the second part of this thesis, we propose three endpoint-based anomaly detection
techniques that detect self-propagating malware in real-time by observing deviations
from a behavioral model derived from a benign data profile. In the first technique, we
leverage the Kullback-Leibler (K-L) information divergence of real-time source and
destination ports' distributions to characterize deviations from the distributions observed
in the benign traffic profile. Experiments using actual endpoint and malware data
demonstrate that the source and destination ports' distributions are perturbed significantly
on a compromised endpoint. K-L perturbations are used to train support vector machines
which provide almost 100% detection rates and negligible false alarm rates.
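
A minimal Python sketch of the K-L divergence between the port distribution of one time-window and that of a benign profile is given here for illustration. The additive smoothing constant `eps`, the direction of the divergence (window relative to profile), the helper names and the port lists are all assumptions made for this sketch; the support vector machine stage is not shown.

```python
import math
from collections import Counter


def port_distribution(ports, support, eps=1e-6):
    """Smoothed relative frequency of each port in `support`.

    Assumption: a small additive constant keeps every probability
    non-zero so that the divergence stays finite.
    """
    counts = Counter(ports)
    total = len(ports) + eps * len(support)
    return {port: (counts[port] + eps) / total for port in support}


def kl_divergence(p, q):
    """D(p || q) in bits for two distributions over the same support."""
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p if p[x] > 0)


# Hypothetical destination ports seen in a benign profile and in one window.
benign = [80, 80, 443, 80, 53, 443, 80, 53]
window = [80, 445, 445, 445, 445, 135, 445, 445]

support = sorted(set(benign) | set(window))
q = port_distribution(benign, support)
p_same = port_distribution(benign, support)
p_infected = port_distribution(window, support)

print(kl_divergence(p_same, q))      # identical distributions
print(kl_divergence(p_infected, q))  # perturbed window
```

A window that matches the profile gives a divergence of zero, while a window dominated by ports that are rare in the profile gives a large value.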

The remaining two malware detection techniques proposed in this thesis employ
perturbations in the distribution of keystrokes that are used to initiate network sessions.
We show that the keystrokes' entropy increases and the session-keystroke mutual
information decreases when an endpoint is compromised by self-propagating malware.
These two types of perturbations are used for real-time malware detection. Both detectors
provide almost 100% detection rates and very low false alarm rates.

Copyright by
SYED ALI KHAYAM
2006

ACKNOWLEDGMENTS

I would like to thank my family for always respecting and supporting my professional
and academic goals. I also thank my academic advisor, Professor Hayder Radha, for
always encouraging me to think out-of-the-box and for helping me identify and refine
research ideas. I sincerely thank my friends, family members and colleagues in WAVES
lab who allowed me to collect network traffic data on their computers. Apama, Mujahid,
Dmitri and Farshad deserve special mention here for discussing and critiquing the theory,
experiments and writing of my research papers. I also thank Shardha, who was a great
friend during my first year, and whom I regrettably forgot to acknowledge in my Master's
thesis. I must also acknowledge the Higher Education Commission of Pakistan and the
National Science Foundation of USA for their continued financial support during my
M.S. and Ph.D. studies. I thank my Ph.D. committee members and Professor Rong Jin for
their technical and editorial guidance. Finally, I thank those associate editors and
anonymous reviewers who gave constructive feedback on my papers. That feedback has
definitely improved the quality of this thesis.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Part A Statistical Models of MAC Layer Wireless Channels and their Applications

CHAPTER A.1 Introduction
A.1.1 Overview of Contributions
A.1.2 Organization of this Part

CHAPTER A.2 Related Work
A.2.1 Channel Modeling
A.2.2 Cross-Layer Design for Wireless Multimedia

CHAPTER A.3 Background
A.3.1 802.11b Wireless Networks
A.3.2 Autocorrelation of Random Processes
A.3.3 Discrete-Time Markov Chains
A.3.4 Burst Representation of Binary Wireless Traces
A.3.5 The Gilbert Channel Model
A.3.6 Full-State Markov Chains for Wireless Channels
A.3.7 Long-Range Dependent Processes
A.3.8 The Multifractal Wavelet Model
A.3.9 Performance Evaluation Measure

CHAPTER A.4 Empirical Analysis and Accurate Modeling of Wireless Channels
A.4.1 Wireless Trace Collection
A.4.2 Empirical Analysis of 802.11b Bit-Errors
A.4.2.1 Autocorrelation Analysis
A.4.2.2 Preliminary Empirical Analysis of FSM Chains
A.4.2.3 Long-Range Dependence in 11 Mbps Bit-Errors
A.4.2.3.1 LRD Evaluation by Observing Energy at Different Scales
A.4.2.3.2 LRD Evaluation using Variance-Time Diagrams
A.4.2.3.3 LRD Evaluation using the Periodogram
A.4.3 Accurate Modeling of 802.11b Bit-Errors
A.4.3.1 Bit-Error Modeling at 5.5 Mbps
A.4.3.2 Bit-Error Modeling at 2 Mbps
A.4.3.3 Bit-Error Modeling at 11 Mbps
A.4.3.3.1 The Multifractal Wavelet Model
A.4.3.3.2 ENK-based Performance Evaluation
A.4.3.3.3 Performance in Capturing Energy at Different Scales
A.4.3.3.4 Performance in Capturing the Variance-Time Characteristics
A.4.4 Discussion

CHAPTER A.5 Complexity Reduction for Markov Channels
A.5.1 The Hierarchical Markov Model
A.5.2 The Hidden Markov Model
A.5.3 FSM Observations
A.5.4 Observations about FSM Chains
A.5.5 Markov Chain Lumpability
A.5.5.1 Lumpability for Wireless Bit-Error Channels
A.5.5.2 Folded Markov Chains
A.5.5.3 Evaluation of Folded Markov Chains
A.5.6 Complexity Reduction by Approximating an FSM Chain's Good- and Bad-Burst Behavior
A.5.6.1 Simplification of Good-bursts Distribution
A.5.6.2 Simplification of Bad-bursts Distribution
A.5.6.3 Guidelines for Approximating an FSM chain
A.5.7 Constant-Complexity Model
A.5.7.1 Performance of the CCM at 2 Mbps
A.5.7.2 Performance of the CCM at 5.5 Mbps
A.5.8 Discussion

CHAPTER A.6 Channel Model Based Header Estimation for Wireless Multimedia
A.6.1 FEC Redundancy Lower Bounds for UDP, UDP-Lite and Header Estimation
A.6.1.1 Redundancy Bounds on the q-ary Symmetric Channel
A.6.1.1.1 FEC Redundancy Bound on a UDP based Protocol Stack
A.6.1.1.2 FEC Redundancy Bound on a UDP-Lite based Protocol Stack
A.6.1.1.3 FEC Redundancy Bound on a Header Estimation based Protocol Stack
A.6.1.1.4 Comparison of the FEC Redundancy Bounds
A.6.1.2 Redundancy Bounds on the Gilbert Channel
A.6.1.2.1 Bound on a UDP based Protocol Stack
A.6.1.2.2 Bound on a UDP-Lite based Protocol Stack
A.6.1.2.3 Bound on a Header Estimation based Protocol Stack
A.6.1.2.4 Comparison of the FEC Redundancy Bounds
A.6.1.3 Discussion
A.6.2 Maximum-Likelihood Header Estimation Framework
A.6.2.1 Functionality at and below a Receiver's MAC layer
A.6.2.2 The Header Estimation Module
A.6.2.3 Processing at a Receiver's Network, Transport and Application Layers
A.6.3 Likelihood Functions for Header Estimation
A.6.3.1 Header Estimation Likelihood Function for FSM Chains
A.6.3.2 Header Estimation Likelihood Function of MWM
A.6.3.3 Extending the FSM Likelihood Function to the CCM
A.6.4 Performance Evaluation of the Header Estimation Framework
A.6.4.1 Experimental Setup
A.6.4.2 Throughput Performance
A.6.4.3 Comparison of Packet Drops
A.6.4.4 False Alarm Rate
A.6.4.5 FEC Performance
A.6.4.6 Video Performance
A.6.5 Discussion

CHAPTER A.7 Impacts of Ignoring Channel Memory on Analysis and Simulation of Wireless Systems
A.7.1 Goodput of an Unreliable Protocol
A.7.1.1 Goodput of a Wireless Channel
A.7.1.2 Goodput of a Binary-Symmetric Channel Model
A.7.1.3 Goodput of a Gilbert Channel Model
A.7.1.4 Goodput of a Full-state Markov Channel Model
A.7.1.5 Goodput of a Constant-Complexity Channel Model
A.7.1.6 Comparison of Estimated Goodputs
A.7.2 Retransmissions of a Reliable Protocol
A.7.2.1 Expected Retransmissions on a Wireless Channel
A.7.2.2 Comparison of Estimated Retransmissions

CHAPTER A.8 Conclusions and Future Work
Part-A References

Part B Self-Propagating Malware Detection at Network Endpoints using Information-Theoretic Tools

CHAPTER B.1 Introduction
B.1.1 Overview of Contributions
B.1.2 Organization of this Part

CHAPTER B.2 Related Work

CHAPTER B.3 Background
B.3.1 Self-Propagating Malware
B.3.2 Support Vector Machines

CHAPTER B.4 Data Collection and Simulation
B.4.1 Benign Traffic-Keystroke Profiles
B.4.2 All-Keystrokes' Profiles
B.4.3 Malware Classification
B.4.4 Real Malware
B.4.5 Simulated Malware
B.4.6 Inserting Malware Data in Benign Traffic Profiles

CHAPTER B.5 Malware Detection using Traffic Features
B.5.1 Malware Detection Using Sample Entropy
B.5.1.1 Entropy of Source and Destination Ports
B.5.1.2 Entropy-based Traffic Perturbations in the Infected Profiles
B.5.2 Malware Detection Using Information Divergence
B.5.2.1 Kullback-Leibler Divergence of Source and Destination Ports
B.5.2.2 K-L-based Traffic Perturbations in the Infected Profiles
B.5.2.3 Evaluating Traffic Perturbations with Other Information Divergences
B.5.3 Leveraging K-L Perturbations in an SVM-based Framework
B.5.3.1 SVM Training
B.5.3.2 Performance Evaluation and Comparison with Existing Techniques
B.5.4 Summary and Discussion

CHAPTER B.6 Malware Detection using Joint Network-Host Features
B.6.1 Correlation in the Session-Key Data
B.6.2 Malware Detection Using Keystroke Entropy
B.6.2.1 Definition of Keystroke Entropy
B.6.2.2 Entropy Perturbations in the Infected Profiles
B.6.3 Malware Detection Using Session-Key Mutual Information
B.6.3.1 Mutual Information of Sessions and Keys
B.6.3.2 Mutual Information Perturbations in the Infected Profiles
B.6.3.3 Automated Detection using Keystroke Perturbations

CHAPTER B.7 Attacks and Countermeasures
B.7.1 Mimicry Attack
B.7.2 Attack by Acquiring System-Level Privileges

CHAPTER B.8 Conclusions And Future Work
Part-B References

LIST OF TABLES

Table 1. Packet-Level Statistics at 2, 5.5 and 11 Mbps
Table 2. Performance of MWM and FSM for the 11 Mbps Bit-Error Process
Table 3. Performance of the hMM for 5.5 Mbps Bit-Error Process
Table 4. Performance of the HMM for the 5.5 Mbps Bit-Error Process
Table 5. Empirical Evidence in Support of Observation 2
Table 6. Statistics of the Benign Profiles
Table 7. Information of Malware Used in This Study

LIST OF FIGURES

Figure 1. The Gilbert channel model [81].

Figure 2. Set up for collection of wireless bit-error traces.

Figure 3. Autocorrelation of bit-error traces.

Figure 4. Percentage of unused FSM states at 2 and 5.5 Mbps.

Figure 5. Aggregates of the 11 Mbps energy process at different time scales.

Figure 6. Variance-time diagrams of two 11 Mbps bit-error traces.

Figure 7. Logscale periodogram of two 11 Mbps bit-error traces.

Figure 8. Performances of varying order FSM chains for the 5.5 Mbps MAC layer
bit-error process.

Figure 9. Performances of varying order FSM chains for the 2 Mbps MAC layer
bit-error process.

Figure 10. Probability mass functions for good- and bad-bursts random variables
derived from an 11 Mbps trace. (Only the probabilities of small bursts are shown here.)

Figure 11. Energy processes of actual and synthesized bit-error traces.

Figure 12. Variance-time diagrams of varying order FSM chains for the 11 Mbps
bit-error process.

Figure 13. Variance-time diagrams of the MWM for the 11 Mbps bit-error process.

Figure 14. The hierarchical Markov model (hMM) [18].

Figure 15. Transition possibilities for an FSM chain (memory-length, k = 4).

Figure 16. Aggregate states S_i and S_j containing FSM states {n, m} and {2n, 2m},
respectively.

Figure 17. Performance of FMCs formed by folding a 512 state FSM to 256, 128, 64, 32,
16, 8, 4 and 2 states; the FSM process is trained using a 5.5 Mbps trace.

Figure 18. State transitions of an FSM with memory-length k and a good-burst of
length l ≥ k.

Figure 19. State aggregation and transitions for the CCM. Each box represents an
aggregate CCM state. The number(s) inside a CCM state are the aggregated FSM states.

Figure 20. ENK based modeling performance versus complexity for the 2 Mbps
bit-error process.

Figure 21. ENK based modeling performance versus memory-length for the 2 Mbps
bit-error process.

Figure 22. ENK based modeling performance versus complexity for the 5.5 Mbps
bit-error process.

Figure 23. ENK based modeling performance versus memory-length for the 5.5 Mbps
bit-error process.

Figure 24. Minimum expected FEC redundancies of UDP, UDP-Lite and Ideal Header
Estimation over a q-ary symmetric channel; m = 8, q = 256, L = 30, n_H = 60,
n_D = 452.

Figure 25. Minimum expected FEC redundancies of UDP, UDP-Lite and Ideal Header
Estimation over a Gilbert channel; m = 8, L = 30, n_H = 60, n_D = 452.

Figure 26. Interactions between the UDP-based header estimation module and different
layers of a wireless receiver's protocol stack; modified protocol stack layers are
shown in different colors and dotted lines represent communications that are not
related to packet reception.

Figure 27. Average packet drops for UDP Normal, UDP-Lite and UDP with Header
Estimation at different data rates and for varying number of video streams per
receiver; each point is averaged over 3 × (# of video streams) × 5 × 25 received
video streams.

Figure 28. Codeword construction for video FEC simulations.

Figure 29. Average FEC redundancy required by UDP Normal, UDP-Lite and UDP with
Header Estimation at different data rates of an 802.11b LAN; each point is averaged
over 3 × 5 × 5 × 25 = 1875 received video streams.

Figure 30. Average PSNR of video sequences for UDP Normal, UDP-Lite and UDP with
Header Estimation using a 30 byte RS codeword with 2 parity bytes; each graph is
averaged over 3 × 5 × 5 = 75 received video streams.

Figure 31. Comparison of the average goodput of the actual traces with the goodput
estimates provided by BSC, Gilbert, 1024-state Markov, and 5-state CCM models;
each result is averaged over five traces.

Figure 32. Comparison of the number of retransmissions per packet estimated by BSC,
Gilbert, 1024-state Markov, and 5-state CCM models; each result is averaged over
five traces.

Figure 33. Number of retransmissions per packet without the BSC model.

Figure 34. Number of retransmissions per packet without the BSC model.

Figure 35. Source and destination port entropies at infected endpoints. Infection start
times are marked with a circle. Infections in (a), (b), and (c) last approximately 15
minutes, while that in (d) lasts approximately one minute. Each non-overlapping
time-window is 20 seconds.

Figure 36. Source and destination ports' K-L divergences at infected endpoints.

Figure 37. Jensen-Shannon (J-S), K- and resistor-average (R-A) divergences of source
and destination ports at infected endpoints.

Figure 38. Comparison of detection and false-alarm rates of the proposed K-L/SVM-based
malware detector with maximum-entropy and rate-limiting detectors. Each point is
averaged over 12 malware with 100 random infections per malware per endpoint.

Figure 39. A generalized flow diagram of the proposed K-L/SVM-based malware
detector. The shaded area contains real-time components.

Figure 40. Normalized histograms of 20 most-used session initiation keystrokes.
Histograms are generated from the session-key data. Virtual key codes 1 and 13
correspond to the left mouse click and the Enter key, respectively [48].

Figure 41. Normalized histograms of 20 most-used keystrokes. Histograms are generated
from the all-keys data. Virtual key codes 40, 38 and 17 correspond to the down
arrow key, the up arrow key and the control key, respectively [48].

Figure 42. Entropy of the keystroke histograms at infected endpoints. Infection start
times are marked with a circle. Infections last approximately 15 minutes. Each
non-overlapping time-window is 60 seconds.

Figure 43. Mutual information of the session and keystroke random variables at infected
endpoints. Infection start times are marked with a circle. Infections last
approximately 15 minutes. Each non-overlapping time-window is 60 seconds.

Figure 44. Comparison of detection and false-alarm rates of the mutual information based
and keystroke-entropy based malware detectors with maximum-entropy [14] and
rate-limiting [20] detectors. Each point is averaged over 9 malicious codes with 100
random infections per malicious code per endpoint.

PART A

STATISTICAL MODELS OF MAC LAYER
WIRELESS CHANNELS AND THEIR
APPLICATIONS

 

CHAPTER A.1 INTRODUCTION

Error modeling has been used to improve the design of communication channels and
systems for many decades [1]–[7]. Stochastic models of wireless medium access control
(MAC) layer packet-losses and bit-errors have recently attracted significant research
attention [8]–[30]. The main objective of analyzing and modeling MAC-to-MAC [29] or
residual [11] bit-errors is to develop accurate simulators which allow experimentation
without having the actual network in place. Moreover, bit-error analysis and modeling
provide important insights into characteristics of the underlying error random process.
This insight is essential for design and performance evaluation of a wide range of
wireless protocols, applications and services. For instance, accurate channel models can
facilitate design, parameter tuning and verification of the following wireless protocols:

• Wireless congestion control protocols, instead of relying on MAC layer
retransmissions, can use accurate MAC layer error models to differentiate between
losses due to congestion, medium degradation or mobility. The inability of wired
congestion control algorithms to differentiate between different types of losses and the
consequent bandwidth underutilization have been repeatedly highlighted by prior
studies [10], [31]–[40]. Knowledge of losses due to channel errors, which is assumed
in many wireless congestion control solutions, can be provided by a real-time
MAC-to-MAC channel model. Understanding of error frequency and burstiness is also
instrumental in parameter tuning of congestion control protocols.

• Cross-layer protocols can use a real-time channel model to choose between reliable
(e.g., using MAC layer retransmissions) and cross-layer (e.g., ignoring data payload
errors [41]–[51]) protocols.

• Reliable routing protocols [52]–[55] for mobile networks can use MAC-to-MAC
channel models to distinguish reliable routes from shortest routes to different
destinations, if the model is able to provide real-time error characterization at
different hops of the network.

• MAC protocols can decide when to increase/decrease the physical transmission data
rate based on real-time channel estimation. An accurate channel model can predict
future error characteristics, thereby saving the MAC layer protocol the overhead of
switching to an inaccurate lower/higher data rate based on short-term observations.

Similarly, design of many wireless applications can be improved by accurate channel
models. For instance:

• Real-time channel estimation provided by an accurate model can be employed by
rate-adaptive applications to perform channel- and/or source-coding rate adaptation for
efficient bandwidth utilization.

• Design of effective error-control schemes for different wireless applications requires a
thorough understanding of errors above the physical layer [56].

• Error-resilience features of contemporary multimedia codecs can be effectively
designed and verified with knowledge of MAC layer error characteristics.

Note that most benefits of a wireless MAC layer channel model can be realized if the
model is able to provide real-time and online channel characterization and prediction. In
complexity- and power-constrained wireless environments, such channel characterization
is only possible with a low-complexity model. Despite some recent interest in reducing
the complexity of wireless models [23]-[29], development of accurate, pragmatic and
low-complexity wireless channel models is still an open problem.

A.1.1 Overview of Contributions

In this part of the thesis, we analyze and model bit-errors propagated to the 802.11
MAC layer at three physical layer data rates of an 802.11b LAN: 2, 5.5 and 11 Mbps
[57], [58]. Our objective is to develop low-complexity MAC-to-MAC channel models
without compromising modeling accuracy. To that end, Chapter A.4 focuses on empirical
analysis and “accurate” modeling of the bit-errors observed at 2, 5.5 and 11 Mbps. After
identifying accurate bit-error models, in Chapter A.5 we reduce the complexity of these
models by approximating their behavior. In Chapter A.6, the accurate and
low-complexity channel models are used in a header estimation framework to improve
wireless multimedia quality. As a ﬁnal contribution of this part, Chapter A.7 shows that
inaccurate channel models can provide extremely misleading results for critical wireless
performance metrics.

Chapter A.4 shows that the MAC-to-MAC bit-error characteristics vary with changes
in the physical layer data rate. We show that the error-rate is quite low at 2 Mbps as
compared to 5.5 and 11 Mbps. At 2 Mbps, approximately 95% of the packets are
received without errors, which is a testament to the high physical layer robustness at 2
Mbps. The loss-rate subsequently increases with an increase in data rate.

We observe that the 2 and 5.5 Mbps bit-errors exhibit decaying correlation and a low
memory-length can be identified. Thus the bit-errors at 2 and 5.5 Mbps can be modeled
using Markov chains [59]. However, the bit-errors at 11 Mbps exhibit very high
correlation even at large lags. Such high correlation is reminiscent of long-range
dependence (LRD) [60] in the 11 Mbps bit-error process. We substantiate the LRD
notion through aggregation, variance-time and periodogram analyses.

Bit-errors at 2 and 5.5 Mbps are accurately modeled using high-order full-state
Markov (FSM) chains [59]. The LRD nature of the 11 Mbps bit-errors renders traditional
stochastic models (e.g., Markov, Poisson) ineffective. Therefore, we employ a
multifractal wavelet model (MWM) [61]—[63] to characterize the 11 Mbps bit-error
random process. For comparison, we also model the 11 Mbps bit-error process using
FSM chains of varying orders. We demonstrate that the MWM outperforms the Markov
models in both complexity and channel approximation.

The complexity of FSM chains increases exponentially with respect to the
memory-length. In Chapter A.5, we mitigate the exponential FSM complexity by approximating
the FSM behavior using low-complexity models. We first show that hierarchical, hidden
and lumped Markov models cannot capture the complex behavior of FSM chains.
Consequently, we directly analyze FSM chains and derive important guidelines that
should be followed to realize accurate, effective and low-complexity models. These
guidelines are used to propose a constant-complexity model (CCM) [30] that always
comprises five states irrespective of the underlying process's memory-length. At both 2
and 5.5 Mbps, the 5-state CCM provides performance that is comparable to the
exponential-complexity FSM chains and better than the linear-complexity models [29].

In Chapter A.6, we leverage the proposed low-complexity channel models in a novel
cross-layer wireless multimedia framework. Under the proposed header estimation
framework, corrupted headers of received packets are estimated using the MAC-to-MAC
channel models. The corrupted packets are in turn passed to the application layer, which
uses forward error correction (FEC) to recover the corrupted data. Trace-driven wireless
video simulations show that signiﬁcantly better bandwidth utilization and video quality
than UDP [64] and UDP-Lite [41]- [43] protocols can be achieved by employing the
header estimation framework. We also show analytically that an ideal header estimation
scheme will always perform better than UDP and UDP-Lite under realistic wireless
channel conditions.

As a ﬁnal contribution of this part, Chapter A.7 shows that an inaccurate channel
model that ignores channel memory can provide extremely misleading results. We use
two critical wireless performance metrics, namely goodput and retransmissions, and show
that highly inaccurate estimates of these metrics are obtained if memory-less or 1st-order
channel models are used. On the other hand, FSM and CCM channel models, which account
for channel memory, provide very accurate goodput and retransmission estimates.

A.1.2 Organization of this Part

The rest of this part is organized as follows. Chapter A.2 provides an overview of the
related work in this area. Chapter A.3 provides background that is required to understand
the material presented in this part. Chapter A.4 focuses on empirical analyses and
“accurate” modeling of the bit-errors at 2, 5.5 and 11 Mbps. Chapter A.5 reduces the
complexity of the proposed models by evaluating low-complexity alternatives. Chapter
A.6 proposes a header estimation framework for wireless multimedia. Chapter A.7 shows
the impact of ignoring channel memory on the design of wireless systems.

CHAPTER A.2 RELATED WORK

A.2.1 Channel Modeling

Recently, link layer modeling for reliable protocols has received some research
attention [9], [10]. In the context of delay-sensitive trafﬁc, a previous study derived
conditions under which block-based residual/MAC-to-MAC errors can be modeled using
a Markov chain [11]. For AT&T WaveLAN, a trace-based link layer investigation was
conducted in [13]. In the context of link layer modeling, Konrad et al. performed analysis
and presented a Markov-based Trace Analysis (MTA) model algorithm for frame-errors
on GSM networks [14], [15]. Ji et al. [16], [17] compared the performance of the MTA, a
full-state k-th order model, a hidden Markov model and an extended ON(error-
free)/OFF(error-ﬁlled) model in capturing the GSM (link layer) frame losses. Based on
the comparison provided in [16], [17], it was concluded that an extended ON/OFF model
with geometric distributions governing the state holding times provides signiﬁcantly
better results than the other three modeling paradigms.

In view of the increasing popularity of 802.11 networks, we studied the 802.11b link
layer in order to facilitate design of effective cross-layer error-control schemes for the
support of real-time services [18], [19], [45]. Since most error-control schemes operate
on byte and/or packet boundaries, we proposed Markov-based models at the packet and
byte levels. We showed that a 2-state Markov model can characterize the packet-loss
process and a hierarchical Markov Model was proposed for the byte-level errors [18].
Willig et al. [26] have performed the only prior study that attempts to analyze and model
bit-error behavior of 802.11b networks with modeling accuracy as a performance
criterion. There are fundamental differences between the measurements, analyses and
modeling of [26] and this thesis. In [26] the authors attempt to capture the impact of
physical layer parameters (e.g., modulation type, antenna diversity etc.) on the bit-error
rate of a wireless LAN in an industrial setting. This study performs all experiments with
default physical layer parameters, thereby capturing a realistic channel that is
omnipresent in most common home/business/classroom settings. Also, in [26] the error-
prone 11 and 5.5 Mbps channels were not evaluated.

Chen et al. [24], [25] investigated Markov chain lumpability to reduce the complexity
of wireless channel models. Since lumpability constraints are too stringent for practical
wireless channels, Chen et al. [24], [25] resorted to an ON-OFF model that stochastically
bounds the sojourn time distributions of the lumped good and bad states. However, and
as asserted by [11], an ON-OFF model assumes geometric (memory-less) distributions
for good and bad periods which is not a valid assumption in most real-life channels.
Bipartite models were proposed for wireless channels by Willig [26]. The accuracy of
bipartite models depends on a selected value of complexity. We argue that model
accuracy is not optional and even a low-complexity model should provide the requisite
accuracy. Moreover, bipartite models require a large number of parameters to achieve a
certain level of accuracy. Köpke et al. [28] used chaotic maps to model 802.11b bit-errors
at low data rates (1 Mbps and 2 Mbps). Due to the focus on low data rates, in [28] it was
observed that: (a) probability of bit-error bursts of more than two bits is very low, and (b)
there is almost no correlation in error traces. The chaotic map model in [28] ignores the
correlation and captures only the heavy-tail behavior of bit-errors. While this assumption
of “no autocorrelation in data” might be suitable for the particular experimental setup
used in [28], it is not generically applicable to network error and loss data. In [20], it was
shown that low-complexity Markov models (such as hidden and hierarchical Markov
models) are inadequate for modeling of an 802.11b link layer bit-error wireless channel.
In [29], two linear-complexity models were proposed which were reasonably effective in
capturing 802.11b bit-errors.

A.2.2 Cross-Layer Design for Wireless Multimedia

The traditional UDP protocol detects and drops corrupted packets using a checksum
operating at the MAC layer [64]. Such packet drops result in significant bandwidth
wastage, especially in the context of error-resilient multimedia applications which can
inherently tolerate some errors and losses in the received content. Larzon et al. proposed
a UDP-Lite protocol which allows delivery of partially corrupted packets to the
application layer [41]- [44]. In its commonly-used form, UDP-Lite disables the MAC
layer checksum while the transport layer partial checksum only covers transport and
application layer headers. Errors in the application layer payload are simply ignored.
Note that support of partial checksum requires modiﬁcations to the multimedia senders,
receivers and/or (multicast or multihop) intermediate nodes.

Many wireless cross-layer studies have shown that UDP-Lite performs better than
UDP on contemporary wireless networks [18], [41]- [50]. In [18], it was shown that over
802.11b LANs an application layer FEC must be employed in conjunction with UDP-
Lite. Otherwise the partially corrupted packets delivered by UDP-Lite result in almost
unintelligible multimedia quality. It was also shown in [18] that UDP-Lite over 802.11b
LANs can only work at the 2 and 5.5 Mbps data rates. At the 11 Mbps data rate, the
errors and losses in the received content are too high for effective FEC-based recovery.

CHAPTER A.3 BACKGROUND

This chapter provides the background that is required to understand the contributions
of this part.

A.3.1 802.11b Wireless Networks

Due to their high data rates and use of the time-tested TCP/IP protocol suite, 802.11b
networks have experienced widespread deployment. These LANs are ﬁnding their way
into homes and businesses ubiquitously. However, like other wireless technologies,
802.11b networks also suffer from severe quality degradation in the presence of physical
obstructions and inter-symbol interference. Two modes of operation are supported in
802.11 networks [57], [58]: (i) ad hoc mode in which wireless nodes can communicate
with each other directly, and (ii) infrastructure mode in which wireless nodes are
arbitrated using a central entity called an access point (AP).

All 802.11b-compliant networks support four basic physical layer data rates of 1
Mbps, 2 Mbps, 5.5 Mbps and 11 Mbps. An increase in the data rate reduces the robustness of
the 802.11b physical layer. In the infrastructure mode, if the number of retransmission
requests exceeds a certain threshold, the AP drops down to a lower data rate than its
current data rate. For retransmissions, 802.11b relies on a 32-bit frame check sequence
(FCS) that computes a checksum over the entire MAC layer frame. Positive
acknowledgement (ACK) frames are employed to signal successful transmission of data
frames. If a frame fails the checksum then it is dropped at the receiver’s MAC layer. The
sender, after timing out, schedules a retransmission.

A.3.2 Autocorrelation of Random Processes

Let $X(n_1)$ and $X(n_2)$ be two random variables derived from a random process
$X(t)$. Written in terms of the lag $n$ between the two samples, the correlation
coefficient of these random variables is defined as [65]

$$\rho(n) = \frac{E\{X(0)X(n)\} - E\{X(0)\}E\{X(n)\}}{\sigma_{X(0)}\,\sigma_{X(n)}}, \tag{A.1}$$

where $E\{X\}$ and $\sigma_X$ represent the mean and standard deviation of the random variable
$X$. When evaluating a dataset, the sample mean and the sample standard deviation are
used to compute the correlation coefﬁcient of (A.1). This sample autocorrelation
coefﬁcient for different values of lag is a direct measure of the level of temporal
dependence in the random process. The lag beyond which the autocorrelation coefficient
drops to an insignificant value corresponds to the memory-length of a random process.
Autocorrelation of a Markov source yields the order of the model required to accurately
characterize the source [66], [67].
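
As an illustration, the sample autocorrelation coefficient and a threshold-based memory-length estimate can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the code used in this thesis: the function names, the biased covariance estimator, and the `threshold` and `max_lag` parameters are hypothetical choices made for the example (the default threshold of 0.15 mirrors the empirical value discussed in Chapter A.4).

```python
from statistics import fmean


def sample_autocorrelation(x, lag):
    """Sample correlation coefficient of a trace at the given lag.

    Uses the sample mean and sample variance of the whole trace
    (biased estimator: the lagged sum is divided by len(x)).
    """
    if not 0 <= lag < len(x):
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mean = fmean(x)
    var = fmean((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0  # constant trace: coefficient undefined, report 0
    cov = sum((x[i] - mean) * (x[i + lag] - mean)
              for i in range(len(x) - lag)) / len(x)
    return cov / var


def memory_length(x, threshold=0.15, max_lag=64):
    """Smallest lag whose autocorrelation falls below `threshold`.

    Returns None when no lag up to `max_lag` satisfies the criterion.
    """
    for lag in range(1, min(max_lag, len(x) - 1) + 1):
        if sample_autocorrelation(x, lag) < threshold:
            return lag
    return None
```

For a trace made of one long good burst followed by one long bad burst, the coefficient decays slowly with the lag, whereas an alternating trace is already strongly negatively correlated at lag 1.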

A.3.3 Discrete-Time Markov Chains

Markov chains are employed to model statistical data with short-term temporal
dependence. Let a stochastic process $X_n$ take on values denoted by non-negative
integers $0, 1, \ldots, M$. If $X_n = i$ then the process is said to be in state $i$ at time $n$.
Whenever the process is in state $i$ there is a fixed probability that the next state of the
process will be state $j$. If that probability can be expressed as

$$\Pr\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = \Pr\{X_{n+1} = j \mid X_n = i\}, \tag{A.2}$$

for all states $i_0, i_1, \ldots, i_{n-1}, i, j$ and all $n \geq 0$, then $X_n$ is a Markov chain [59]. The

property given in (A.2) is commonly referred to as the Markov Property. Thus, for a
Markov chain the conditional distribution of any future state $X_{n+1}$, given the past states
$X_{n-1}, \ldots, X_1, X_0$ and the present state $X_n$, is independent of the past states and depends
only on the present state. Equation (A.2) is also referred to as the homogeneity property since
it ensures that the transition probabilities do not vary with time.

Let $p_{ij} = \Pr\{X_{n+1} = j \mid X_n = i\}$ denote the probability of transiting to state $j$
from $i$. Since $p_{ij}$ represents a probability measure, it exhibits the following properties:
(i) $p_{ij} \geq 0$ for all $i, j \geq 0$, and (ii) $\sum_{j=0}^{M} p_{ij} = 1$ for all $i = 0, 1, \ldots, M$. The probability
of transiting to the next state can be represented in a matrix form. This matrix is referred
to as the one-step state transition probability matrix.
The steady-state or stationary probabilities of a Markov chain represent the long-run
proportion of the time spent in each state. Once the transition probabilities of a Markov
chain are known, the steady-state probabilities of being in a particular state are the unique
non-negative solutions of the following linear system of equations:

$$\pi_j = \sum_{i=0}^{M} \pi_i\, p_{ij}, \quad j = 0, 1, \ldots, M,$$

$$\sum_{j=0}^{M} \pi_j = 1.$$

For stationary Markov chains, the steady-state and transition probabilities do not vary
with time. Throughout this thesis, we use stationary Markov chains for modeling bit-error
processes. The memory-length of a Markov chain is also referred to as its order.
The discussion in the preceding section outlined that autocorrelation analysis can be
performed on the realizations of a random process to determine the appropriate order of
the respective Markov chain. This observation will be used later to identify the orders of
Markov chains.
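
To make the steady-state equations concrete, the following Python sketch approximates the stationary probabilities of a small chain by repeatedly applying the one-step transition matrix. It is only an illustration: the function name, the tolerance, and the use of power iteration (instead of solving the linear system directly) are assumptions of this example, and the iteration presumes an irreducible, aperiodic chain.

```python
def steady_state(P, tol=1e-12, max_iter=100_000):
    """Approximate the stationary distribution of transition matrix P.

    P[i][j] is the probability of moving from state i to state j, so
    every row must sum to one. Starting from the uniform distribution,
    the update pi_j <- sum_i pi_i * P[i][j] is repeated until the
    largest change falls below `tol`.
    """
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")
```

For a two-state chain with rows (0.9, 0.1) and (0.5, 0.5), the result approaches (5/6, 1/6), which satisfies both equations of the linear system above.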

A.3.4 Burst Representation of Binary Wireless Traces
Wireless bit-error traces are generally represented as a binary time-series $\{x(i)\}_{i=1}^{l}$,
where $x(i) \in \{0, 1\}$ and $l$ is the length of the time-series. Throughout this thesis, we
define $x(i)$ as:

$$x(i) = \begin{cases} 0 & \text{error-free bit} \\ 1 & \text{corrupted bit.} \end{cases}$$

Without loss of generality, a binary time-series can be represented as an interleaved
sequence of runs (bursts) of good and bad bits ($x(i) = 0$ and $x(i) = 1$), i.e.,
$(I_1, B_1), (I_2, B_2), \ldots$, where $I_i$ and $B_i$ represent the lengths of the $i$th good and
bad bursts, respectively. Wireless channel modeling studies have established that this
binary data representation is rather suitable for definition and evaluation of a model
[14]-[17], [20]. The burst-lengths of good and bad bits are used for empirical
performance evaluations in this thesis. (Subsequent sections discuss this in further detail.)
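
The burst representation can be extracted from a binary trace with a simple run-length pass. The sketch below is illustrative only; the function name and the choice to return two separate lists (rather than interleaved pairs) are assumptions of this example.

```python
from itertools import groupby


def burst_lengths(bits):
    """Split a 0/1 trace into good-burst and bad-burst lengths.

    A run of 0s is a good burst and a run of 1s is a bad burst.
    Returns (good, bad), each a list of run lengths in trace order.
    """
    good, bad = [], []
    for value, run in groupby(bits):
        length = sum(1 for _ in run)
        (bad if value else good).append(length)
    return good, bad
```

For the trace 0 0 0 1 1 0 1 0 0, the good bursts have lengths 3, 1, 2 and the bad bursts have lengths 2, 1.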

Figure 1. The Gilbert channel model [81].

A.3.5 The Gilbert Channel Model

The Gilbert channel was proposed in [81] to model channels with 1st-order memory.
Since then, it has been used to model many wireless channels at bit, byte and packet
levels [9]-[11], [13]-[15], [18]-[20], [26]. The Gilbert model captures channel memory
through a two-state Markov chain having a good and a bad state. The probability of the
next (good or bad) symbol depends on whether the last received symbol was
good or bad. With $p_{gb}$ and $p_{bg}$ denoting the good-to-bad and bad-to-good transition
probabilities, the steady-state probabilities of staying in the bad and good states are
respectively expressed as:

$$\pi_b = \frac{p_{gb}}{p_{gb} + p_{bg}} \quad \text{and} \quad \pi_g = \frac{p_{bg}}{p_{gb} + p_{bg}}. \tag{A.3}$$

Clearly, $\pi_b + \pi_g = 1$.

Higher probabilities of staying in the present state (i.e., $p_{gg}$ and $p_{bb}$) indicate the
intensity of channel memory. A more appropriate measure to quantify channel memory
was proposed by Mushkin and Bar-David in [82], where memory $\mu$ of a Gilbert channel
was defined as:

$$\mu = 1 - p_{gb} - p_{bg}. \tag{A.4}$$

It can be easily seen that $-1 \leq \mu \leq 1$. Moreover, a closer look at the above equations reveals
that

$$\mu = 0 \;\Rightarrow\; \pi_b = p_{gb} \ \text{and} \ \pi_g = p_{bg}. \tag{A.5}$$

In other words, when $\mu = 0$, the probability of getting a good or a bad symbol at any
time instance is independent of the last symbol value, that is, the channel behaves as a
memory-less channel. In [82], channels with $\mu > 0$ and $\mu < 0$ were referred to as
persistent and oscillatory memory channels, respectively. Real-life channels generally
have persistent memory.
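
The quantities in (A.3) and (A.4) are easy to compute and to check against a simulated trace. The following Python sketch is a hypothetical illustration: the function names, the convention that the chain starts in the good state, and the seeded generator are assumptions of this example rather than part of the Gilbert model itself.

```python
import random


def gilbert_stats(p_gb, p_bg):
    """Return (pi_g, pi_b, mu) for a Gilbert channel.

    p_gb: probability of moving from the good to the bad state.
    p_bg: probability of moving from the bad to the good state.
    """
    total = p_gb + p_bg
    pi_b = p_gb / total          # steady-state probability of bad, (A.3)
    pi_g = p_bg / total          # steady-state probability of good, (A.3)
    mu = 1.0 - p_gb - p_bg       # channel memory, (A.4)
    return pi_g, pi_b, mu


def simulate_gilbert(p_gb, p_bg, n, seed=0):
    """Generate n symbols (0 = good, 1 = bad), starting in the good state."""
    rng = random.Random(seed)
    state, trace = 0, []
    for _ in range(n):
        trace.append(state)
        leave = p_gb if state == 0 else p_bg
        if rng.random() < leave:
            state = 1 - state
    return trace
```

With p_gb = 0.1 and p_bg = 0.4 the channel has persistent memory (mu = 0.5), and the fraction of bad symbols in a long simulated trace should be close to pi_b = 0.2.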

A.3.6 Full-State Markov Chains for Wireless Channels

Wireless bit-error processes are generally bursty and have a memory-length of greater
than one bit, and therefore these processes cannot be modeled using the Gilbert model.
To make such a process comply with the Markov property of (A.2), a Markov chain is
deﬁned such that at each time instance the process is characterized by as many bits as the
memory-length. At each time instance, a new bit is added to the memory-window and the
oldest bit is dropped from the memory-window. As mentioned before, memory-length of
a Markov chain is also referred to as its order.

For a memory length of $k$ bits, a full-state Markov (FSM) chain [20] corresponds to
all the $2^k$ different possible combinations of $k$ consecutive bits. Transition probabilities
between states are computed by sliding a $k$-bit memory-window over the data and
counting the number of times a bit-pattern $[x_1, x_2, \ldots, x_k]$ is followed by another
bit-pattern $[y_1, y_2, \ldots, y_k]$. Note that the number of states of an FSM chain increases
exponentially with an increase in memory-length: $2^k$ states for a memory-length of $k$.
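
The sliding-window training step can be sketched as follows. Because consecutive windows overlap in $k-1$ bits, each state has only two possible successors, so it suffices to estimate the probability that the next bit is 1 given the current $k$-bit pattern. This is a minimal sketch, not the thesis implementation; the function name and the dictionary-based representation are assumptions of this example.

```python
from collections import Counter, defaultdict


def train_fsm(bits, k):
    """Estimate Pr{next bit = 1 | last k bits} from a 0/1 trace.

    Slides a k-bit window over the trace and counts which bit follows
    each observed pattern. Only patterns that actually occur in the
    trace appear in the returned dictionary.
    """
    counts = defaultdict(Counter)
    for i in range(len(bits) - k):
        state = tuple(bits[i:i + k])
        counts[state][bits[i + k]] += 1
    return {s: c[1] / (c[0] + c[1]) for s, c in counts.items()}
```

For the periodic trace 0 0 1 0 0 1 ... with k = 2, only three of the four possible states are ever visited, which illustrates why a trained chain can contain far fewer than $2^k$ populated states.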

A.3.7 Long-Range Dependent Processes

Long-range-dependent processes belong to a generic class of scaling or self-similar
processes [68], [69]. Self-similar processes exhibit similar statistical behavior at different
scales — zooming into or out-of a sample path of the process gives a new process

realization which is statistically similar to the original. A self-similar process $X(t)$
satisfies the relation:

$$X(t) \stackrel{d}{=} c^{H} X(t/c), \tag{A.6}$$

where $\stackrel{d}{=}$ represents equivalence in finite-dimensional distributions, $c$ is a scaling
(compression/dilation) factor and $H$ is known as the Hurst parameter. Self-similar
processes are also referred to as H-ss processes. It is not possible to define a characteristic
scale for H-ss processes which implies that these processes are scale-invariant. A
self-similar process with stationary increments is referred to as an H-sssi process [68]-[70].

Long-range dependent (LRD) processes model stationary increments of a second-order
self-similar process. The Hurst parameter of an LRD process is $1/2 < H < 1$. Also,
the autocovariance $r[k]$ of an LRD process is of the form:

$$r[k] \sim c_r\, k^{-(2-2H)}, \tag{A.7}$$

where $\sim$ represents asymptotic equivalence and $c_r$ is a positive and constant scaling
factor. From (A.7) and the constraint on $H$ it can be seen that summing the
autocorrelation function results in a divergent power series [71], $\sum_k |r[k]| = \infty$. Thus
all samples of an LRD process depend heavily on previous samples, thereby resulting in
occurrence of clusters of similar values. For the present binary process, this observation
simply implies long bursts of zeros and ones.

An important property of LRD processes is that they can be equivalently
characterized in terms of the behavior of the aggregated process:

$$X^{(m)}[k] = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} X[i], \tag{A.8}$$

where $m$ is the aggregation level. For an LRD process (and in general for all second-order
self-similar processes), $\mathrm{var}\{X^{(m)}[k]\} = m^{2H-2}\, \mathrm{var}\{X[k]\}$. Thus for an LRD
process, a log-log plot of $\mathrm{var}\{X^{(m)}[k]\}$ as a function of $m$ is strictly linear with a slope
of $2H - 2$ [70]. This plot, generally known as the variance-time diagram, can be used to
ascertain the presence of LRD in the data and can also render an estimate of $H$.
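
A variance-time estimate of $H$ can be sketched in a few lines of Python. The sketch below aggregates the trace as in (A.8), fits a least-squares line to the log-log variance-time points, and converts the slope into $H$. The function names, the choice of aggregation levels, and the plain least-squares fit are assumptions of this example, not the exact procedure used in this thesis.

```python
import math
from statistics import fmean


def aggregate(x, m):
    """Non-overlapping block means of x at aggregation level m, as in (A.8)."""
    return [fmean(x[b * m:(b + 1) * m]) for b in range(len(x) // m)]


def _variance(values):
    mean = fmean(values)
    return fmean((v - mean) ** 2 for v in values)


def hurst_variance_time(x, levels):
    """Estimate H from the slope (2H - 2) of the variance-time diagram."""
    points = []
    for m in levels:
        var = _variance(aggregate(x, m))
        if var > 0:
            points.append((math.log10(m), math.log10(var)))
    if len(points) < 2:
        raise ValueError("need at least two levels with non-zero variance")
    mx = fmean(p[0] for p in points)
    my = fmean(p[1] for p in points)
    slope = (sum((px - mx) * (py - my) for px, py in points)
             / sum((px - mx) ** 2 for px, _ in points))
    return 1.0 + slope / 2.0
```

For an independent (memory-less) binary trace the variance of the block means decays as $1/m$, so the estimate should be close to $H = 0.5$; values approaching 1 indicate long-range dependence.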

The power spectral density of an LRD random process is the Fourier transform of the
autocorrelation of (A.7), and has been shown to be [70]:

$$\Gamma(\omega) = C_H \left(2 \sin\frac{\omega}{2}\right)^{2} \sum_{i=-\infty}^{\infty} \frac{1}{|\omega + 2\pi i|^{2H+1}} \sim C_H\, |\omega|^{1-2H} \quad \text{as } \omega \to 0, \tag{A.9}$$

where $\omega$ is a frequency and $C_H$ is a constant. Note that the spectral density is
proportional to $|\omega|^{1-2H}$ for frequencies close to the origin. A log-log plot of the power
spectral density as a function of the frequency has a slope of $1 - 2H$, which can be used
to estimate $H$.

A.3.8 The Multifractal Wavelet Model

The multifractal wavelet model (MWM) was proposed in [61]- [63] to analyze and
model LRD network data. The MWM has shown promise in modeling various LRD
network phenomena [61]- [63]. The MWM relies on the premise that network data is
inherently non-negative and generally spiky. Both these properties are clearly true for
wireless bit-error data. Moreover, the scaling properties of wireless bit-errors can be
adequately characterized by wavelet-based analysis.

The MWM employs the Haar wavelet family and applies a constraint that the input
training data are always non-negative. For the Haar wavelet, the scaling and wavelet
coefficients are computed recursively as

$$U_{j,k} = \frac{1}{\sqrt{2}}\left(U_{j+1,2k} + U_{j+1,2k+1}\right) \quad \text{and} \quad W_{j,k} = \frac{1}{\sqrt{2}}\left(U_{j+1,2k} - U_{j+1,2k+1}\right), \tag{A.10}$$

where $U_{j,k}$ and $W_{j,k}$ respectively represent the scaling and wavelet coefficients at time
$k$ and scale/level $j$. With the Haar scaling function, the scaling coefficients are simply
averaged versions of the input signal and thus, due to the non-negative nature of the data,
the scaling coefficients are always non-negative, $U_{j,k} \geq 0$. Rearranging (A.10) yields

$$U_{j+1,2k} = \frac{1}{\sqrt{2}}\left(U_{j,k} + W_{j,k}\right) \quad \text{and} \quad U_{j+1,2k+1} = \frac{1}{\sqrt{2}}\left(U_{j,k} - W_{j,k}\right). \tag{A.11}$$

In the first equation of (A.11), to keep the next level’s scaling coefficients ($U_{j+1,2k}$’s)
non-negative, negative wavelet coefficients are constrained as $|W_{j,k}| \leq U_{j,k}$. Similarly,
to keep the $U_{j+1,2k+1}$’s non-negative, the positive wavelet coefficients are constrained
as $W_{j,k} \leq U_{j,k}$. Combining these two constraints gives a non-negativity constraint that

$$|W_{j,k}| \leq U_{j,k}. \tag{A.12}$$

The above constraint simply ensures that once the inverse transform is taken, the resultant
process is always non-negative. Alternatively, the constraint can be implemented as

$$W_{j,k} = A_{j,k}\, U_{j,k}, \tag{A.13}$$

where $A_{j,k}$ is a random variable defined over the interval $[-1, 1]$.

In order to train the MWM to match the wireless bit-error traces, two random
variables need to be captured. The first random variable is the scaling coefficient at the
coarsest scale, $U_{j_0,k_0}$. The second set of random variables is the $A_{j,k}$’s at each level,
which in turn yield the wavelet coefficients (A.13) at that level. Once a general sense of
probability distribution is ascertained for these random variables, the expectation-
maximization algorithm [76], [77] can be used to fit that distribution to the actual dataset.
The training and synthesis algorithm is detailed in [61]. The complexity of synthesizing a
length-$N$ trace using the MWM is $O(N)$.
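
The coarse-to-fine synthesis implied by (A.11) and (A.13) can be sketched as below. This is only an illustration of the non-negativity mechanism: the function names are hypothetical, and the multipliers $A_{j,k}$ are drawn from a uniform distribution on $[-1, 1]$ purely as a stand-in, whereas the actual MWM fits a distribution to the training data as described in [61].

```python
import math
import random


def haar_analysis(fine):
    """One Haar analysis step (A.10): fine-scale U -> coarse (U, W)."""
    s = math.sqrt(2.0)
    pairs = range(len(fine) // 2)
    U = [(fine[2 * k] + fine[2 * k + 1]) / s for k in pairs]
    W = [(fine[2 * k] - fine[2 * k + 1]) / s for k in pairs]
    return U, W


def haar_synthesis(U, W):
    """One Haar synthesis step (A.11): coarse (U, W) -> fine-scale U."""
    s = math.sqrt(2.0)
    fine = []
    for u, w in zip(U, W):
        fine.append((u + w) / s)
        fine.append((u - w) / s)
    return fine


def mwm_synthesize(u_root, depth, rng):
    """Generate 2**depth non-negative samples from a root coefficient.

    At every level W = A * U with A in [-1, 1] (A.13), which enforces
    |W| <= U (A.12). The uniform law for A is an assumption made here.
    """
    level = [u_root]
    for _ in range(depth):
        W = [rng.uniform(-1.0, 1.0) * u for u in level]
        level = haar_synthesis(level, W)
    return level
```

Each synthesis level doubles the number of samples and touches every coefficient once, so the total work for a length-$N$ trace is proportional to $N$, in line with the stated complexity.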

A.3.9 Performance Evaluation Measure

Entropy is a measure of the average number of bits required to represent all outcomes
of a probability distribution. The Kullback-Leibler divergence quantiﬁes the difference in
the entropies of two probability distributions [78]. In [20] we proposed an entropy
normalized Kullback-Leibler (ENK) divergence measure to quantify the accuracy of a

channel model. The ENK divergence quantifies the source-coding-like overhead incurred
by employing a model instead of the actual source. For two probability distributions
$p(X)$ and $q(X)$ defined over a common alphabet $\Psi$, the ENK divergence is defined as:

$$\mathrm{ENK}\big(p(X) \,\|\, q(X)\big) = \frac{\sum_{X \in \Psi} p(X) \log_2 \left[\frac{p(X)}{q(X)}\right]}{-\sum_{X \in \Psi} p(X) \log_2 \big(p(X)\big)}, \tag{A.14}$$

where the numerator and denominator respectively represent the Kullback-Leibler
divergence and entropy functions.

The ENK divergence inherits basic properties of the Kullback-Leibler divergence: (a)
non-negativity, $\mathrm{ENK}(p \| q) \geq 0$, (b) non-symmetry, $\mathrm{ENK}(p \| q) \neq \mathrm{ENK}(q \| p)$, and (c)
$\mathrm{ENK}(p \| q) = 0 \Leftrightarrow p = q$. Small values of ENK divergence indicate that a model
accurately approximates the actual source. We would expect the ENK between two actual
traces to be a very small value as the traces are realized by the same random source.
Therefore we employ the ENK divergence between two 802.11b traces as a performance
evaluation reference for the ENK divergence between an actual trace and a trace
artificially generated by a model.

The ENK divergence relies on the fact that an appropriate random variable X is
being used to characterize the underlying source. We employ two random variables for
performance evaluation of all the models in this thesis: (i) burst-length of good bits $I$,
where $I$ takes positive integer values; (ii) burst-length of bad/corrupted bits $B$, where
$B$ also takes positive integer values. Throughout this thesis, we refer to $I$ and $B$ as
good-bursts and bad-bursts random variables, respectively.
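
A direct evaluation of (A.14) on empirical burst-length distributions can be sketched as follows. The function names are hypothetical, and the small floor `eps` applied to outcomes that the model never produces is an assumption of this example (without it the divergence would be infinite); it is not part of the definition above.

```python
import math
from collections import Counter


def empirical_pmf(samples):
    """Relative frequency of each observed value (e.g. burst length)."""
    counts = Counter(samples)
    n = len(samples)
    return {value: c / n for value, c in counts.items()}


def enk_divergence(p, q, eps=1e-12):
    """Entropy-normalized Kullback-Leibler divergence of (A.14).

    p: distribution of the actual source, as {outcome: probability}.
    q: distribution of the model, same format.
    """
    kl = sum(pv * math.log2(pv / max(q.get(x, 0.0), eps))
             for x, pv in p.items() if pv > 0)
    entropy = -sum(pv * math.log2(pv) for pv in p.values() if pv > 0)
    if entropy == 0:
        raise ValueError("entropy of p is zero; ENK is undefined")
    return kl / entropy
```

In use, `p` would be built from the good-burst (or bad-burst) lengths of an actual trace and `q` from those of a model-generated trace; swapping the two arguments generally gives a different value, reflecting the non-symmetry noted above.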


CHAPTER A.4 EMPIRICAL ANALYSIS AND
ACCURATE MODELING OF WIRELESS
CHANNELS

In this chapter, we ﬁrst describe the wireless trace collection experiment. We then
evaluate the correlation in the bit-error traces collected at 2, 5.5 and 11 Mbps. We
observe that the correlation at 2 and 5.5 Mbps exhibits a decaying trend, but the 11 Mbps
traces have high correlation even at large lags. Due to their manageable correlation, we
use Markov chains to model the 2 and 5.5 Mbps bit-errors. We show that full-state
Markov (FSM) chains provide highly accurate models of the 2 and 5.5 Mbps bit-errors.
Moreover, we show that FSM chains have unused states which can be ignored to reduce
the complexity of the FSM-based channel modeling paradigm.

Unlike the bit-errors at 2 and 5.5 Mbps, the 11 Mbps bit-error process requires a
model that can capture long-memory. We evaluate the 11 Mbps bit-errors using scaling,
variance-time and periodogram analyses. These evaluations substantiate the presence of
long-memory or long-range dependence (LRD) in 11 Mbps bit-errors. Consequently, we
employ a multifractal wavelet model (MWM) to characterize the 11 Mbps bit-errors. We
show that the MWM captures second-order statistics of the 11 Mbps bit-errors much

more accurately than Markov chains.

A.4.1 Wireless Trace Collection

For this study, five wireless receivers were used to simultaneously collect error traces
on an 802.11b LAN. The receivers were placed at different locations in a room, while the
access point (AP) was placed in a room across a hallway from the receivers to simulate a
realistic home/classroom/office setting as shown in Figure 2.

The receivers’ MAC layer device drivers were modified to pass corrupted packets to
higher layers. The receivers were Linux clients using DLink DWL-650 wireless cards
with the open source linux-wlan-ng device drivers [72]. To capture packets at high
transmission rates, packet dissectors were implemented inside the device drivers. These
packet dissectors ensured that only packets pertinent to our wireless experiment were
processed, while all other packets were simply dropped. Each experiment comprised one
million packets with a payload of 1,000 bytes each, i.e., each trace had approximately 1
GB of data.

A wired sender was used to send multicast packets with a predetermined payload on
the wireless LAN; multicasting disabled MAC layer retransmissions. The sender used
different transmission rates ranging from 4 Kbps to 1 Mbps for each experiment. At the
physical layer, the auto rate selection feature of the AP was disabled and for each
experiment the AP was forced to transmit at a fixed data rate. Each trace collection
experiment was repeated multiple times at 2, 5.5 and 11 Mbps physical layer data rates
and at different times of day.

[Figure labels: Room 1, Room 2, Sender, 802.11b AP, Receiver 0 through Receiver 4 (modified linux-wlan-ng drivers).]

Figure 2. Set up for collection of wireless bit-error traces.

Table 1. Packet-Level Statistics at 2, 5.5 and 11 Mbps

Data rate    Average Packet Error Rate    Min Packet Error Rate    Max Packet Error Rate
2 Mbps       5.97%                        0.75%                    14.31%
5.5 Mbps     9.79%                        0.61%                    22.74%
11 Mbps      39.5%                        10.99%                   77.83%

Table 1 provides some statistics of the traces collected for this study. The packet error

rate is computed as

pkt error rate = (pkts received with one or more errors)/ (total received pkts).

As expected, the average packet error rate increases with an increase in the physical layer
data rate. In particular, the average packet error rate increases from approximately 10%
at 5.5 Mbps to almost 40% at 11 Mbps. Thus traditional higher layer protocols that drop
all corrupted packets (e.g., 802.11 MAC, UDP, TCP etc.) experience profound losses at
11 Mbps, and consequently there is room for considerable improvement. Since the
wireless receivers were placed at different locations, the receivers experienced different
packet error rates. The minimum and maximum error rates in Table 1 show that the
receivers were experiencing both good and bad link conditions.

In our initial experiments all wireless receivers maintained Line of Sight (LoS) with
the access point (AP). The AP was forced to transmit at 2, 5.5 and 11 Mbps for each
trace. It was observed that with clear LoS, the error-rate (at all bitrates) was extremely
low. Such excellent performance rendered further LoS study inconsequential. Hence, we
positioned the receivers in separate rooms to simulate a more realistic
business/classroom/home-network wireless setup as shown in Figure 2.

A.4.2 Empirical Analysis of 802.11b Bit-Errors

To maintain focus, throughout Chapters A.4 and A.5 we show results for two traces at each physical layer data rate. These traces were collected at the same receiver under similar conditions. The results for the remaining traces and receivers are similar.

A.4.2.1 Autocorrelation Analysis

The sample autocorrelations of the 2 Mbps, 5.5 Mbps and 11 Mbps bit-error traces are illustrated in Figure 3. Clearly, the correlation at 11 Mbps is higher than that at 2 and 5.5 Mbps. Let us first concentrate on the autocorrelation of the 2 and 5.5 Mbps traces. It is clear that the autocorrelation at both data rates is a decaying function, i.e., the level of temporal dependence decreases with increasing lag. From the examples provided in Figure 3, we assume that the memory-length is determined by the lag beyond which the normalized correlation is less than 0.15, an empirically-determined threshold. We observed that in some traces the correlation does not drop significantly below 0.15, even for very large lags. However, in general the bit-errors exhibited rapidly decaying correlation as in Figure 3.


Figure 3. Autocorrelation of bit-error traces.

Extensive performance evaluation suggests that correlation of less than 15% does not
play a signiﬁcant role in the error process characteristics.

Based on the threshold, the memory-lengths of the 5.5 Mbps traces of Figure 3 are 12
and 14 respectively. The correlation of both 2 Mbps traces drops below 0.15 at the lag of
16. Hence, we use memory-length 14 and 16 as the maximum order of the 5.5 and 2
Mbps Markov chains respectively. Since the memory-lengths of the 2 and 5.5 Mbps bit-
error processes are not very large as compared to the 11 Mbps process, high-order
Markov chains can appropriately model these processes.
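
The thresholding rule described above can be sketched as follows. This is a minimal illustration, not the code used for the study: the function names are hypothetical, the memory-length is taken as the first lag at which the absolute sample correlation falls below the threshold and stays below it for all computed lags, and the toy trace is synthetic rather than measured.

```python
def sample_autocorrelation(bits, max_lag):
    """Normalized sample autocorrelation of a sequence for lags 0..max_lag."""
    n = len(bits)
    mean = sum(bits) / n
    centered = [b - mean for b in bits]
    energy = sum(c * c for c in centered)
    if energy == 0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def memory_length(acf, threshold=0.15):
    """First lag at which |correlation| drops below threshold and stays there.

    Returns None when the correlation at the largest computed lag is still
    at or above the threshold, i.e. no finite memory-length is identified
    (the situation reported above for the 11 Mbps traces).
    """
    for lag in range(len(acf)):
        if all(abs(r) < threshold for r in acf[lag:]):
            return lag
    return None


if __name__ == "__main__":
    # Synthetic toy trace (assumption, not measured data): 3-bit error bursts.
    trace = ([1, 1, 1] + [0] * 17) * 50
    acf = sample_autocorrelation(trace, max_lag=10)
    print(memory_length(acf))
```

Returning None rather than a large number keeps the "no identifiable memory-length" case explicit, which matters when deciding whether a Markov chain of finite order is a sensible model at all.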

Contrary to the correlations at 2 and 5.5 Mbps, Figure 3 clearly shows that the 11
Mbps bit-error process has high correlation even at large lags. This is reminiscent of
long-range dependence since a low-order memory-length cannot be identiﬁed for the 11
Mbps bit-error process. Consequently, Markov models cannot be used to model 11 Mbps

bit-errors.


A.4.2.2 Preliminary Empirical Analysis of FSM Chains

In accordance with the discussion in Section A.3.4, we represent the bit-error traces as a binary series $\{x(i)\}_{i=1}^{l}$, where $x(i) = 1$ represents a bit-error and $l$ is the length of the series. Also, for a memory-length of $k$, a full-state Markov (FSM) chain has states corresponding to all possible $2^k$ combinations of $k$ consecutive bits. The complexity (i.e., number of states) of the FSM chains increases exponentially with an increase in memory-length. Previous studies employed low-order Markov chains [8], [14]-[17]. However, due to the present interest in capturing high-order behavior, we provide analysis and modeling with high-order FSM chains.

For efficient and accurate representation of the transition probability data and to reduce the complexity, we examined the FSM transition probability matrices for bit-patterns that never occur in the collected traces. We refer to such bit-patterns as the unused states. These states result in all-zero columns in the transition probability matrix. An all-zero column implies that the probability of jumping to that state from any state is zero. While other methods for judicious selection of Markov states exist [67], discarding the unused states provides a simple and effective method of minimizing the model complexity.
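
As a rough illustration of the procedure above, the sketch below builds the k-bit state sequence with a bit-by-bit sliding window, estimates transition probabilities for the observed states only, and reports the percentage of unused states. The function names are hypothetical and the bit ordering (oldest bit in the most significant position) is an assumption; the original tooling is not described in the text.

```python
from collections import Counter, defaultdict


def state_sequence(bits, k):
    """Map a {0,1} trace to k-bit FSM states with a bit-by-bit sliding window."""
    if len(bits) < k:
        return []
    mask = (1 << k) - 1
    state = 0
    for b in bits[:k]:
        state = (state << 1) | b
    states = [state]
    for b in bits[k:]:
        # Shift the oldest bit out and the newest bit in.
        state = ((state << 1) | b) & mask
        states.append(state)
    return states


def transition_probabilities(bits, k):
    """Empirical transition probabilities, stored only for observed states."""
    states = state_sequence(bits, k)
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }


def unused_state_percentage(bits, k):
    """Percentage of the 2**k bit-patterns that never occur in the trace."""
    used = set(state_sequence(bits, k))
    total = 1 << k
    return 100.0 * (total - len(used)) / total
```

Storing the probabilities in nested dictionaries rather than a dense 2^k by 2^k matrix means the unused states cost no memory, which is the practical benefit of dropping them.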

The percentage of unused states for each order is shown in Figure 4. It can be observed that the number of unused states grows as the order of the Markov chain is increased. For example, in the case of a $2^{12}$-state model, approximately 80% and 30% of the states are never used at 2 and 5.5 Mbps, respectively. We lay special emphasis on this observation since the total number of states directly corresponds to the complexity of the model. All following FSM results will employ the used states only. Here we recognize that the number of unused states will decrease as the channel is observed for a significantly long period of time, i.e., the number of unused states is inversely proportional to the trace length. However, as substantiated by the FSM performance evaluation in later sections, FSM chains perform quite reasonably without the unused states, thereby implying that the unused states do not play a significant role in overall channel characterization.

[Figure 4 plot: percentage of unused states versus number of states (logscale), for the 5.5 Mbps and 2 Mbps traces.]

Figure 4. Percentage of unused FSM states at 2 and 5.5 Mbps.

A.4.2.3 Long-Range Dependence in 11 Mbps Bit-Errors

The autocorrelation analysis in Section A.4.2.1 provided initial indications that the 11
Mbps bit-errors are LRD in nature. This section substantiates this preliminary notion of

LRD by analyzing the 11 Mbps bit-error process in further detail.

A.4.2.3.1 LRD Evaluation by Observing Energy at Different Scales

Since LRD processes typically demonstrate second-order self-similarity, zooming out from a sample path of the process should yield a path similar to the original in second-order statistics. As shown by (A.8), in order to determine scaling in a process, we can define an aggregate process by dividing a bit-error trace of length $l$ into non-overlapping blocks of length $m$ and averaging over each block. The resultant aggregate sample path averages $m$ points from the actual sample path. Due to the {0,1} representation of the bit-errors, an $m$-level aggregate process represents the normalized energy of bit-errors in non-overlapping windows of size $m$. We henceforth use the terms aggregate process and energy process synonymously. Aggregation smoothes high variances in the sample path and provides an on-average zoomed-out version of the actual sample path. Thus energy processes at different aggregation levels outline the impact of aggregation on the short-term second moment of the process.

[Figure 5 plots: sample paths of the energy process at successive aggregation levels, including $X^{(1)}$ and $X^{(4)}$.]

Figure 5. Aggregates of the 11 Mbps energy process at different time scales.

Figure 5 outlines three aggregate processes. The top figure is a process sample path $X^{(1)}[k]$ outlining the unnormalized energy (i.e., the total number of errors) observed in each packet (packet transmission time = 1 second). The second figure is a level-4 aggregate of the first sample path which depicts the average energy observed in four packets. Thus, the first point in this level-4 aggregate sample path is

$$X^{(4)}[1] = \frac{1}{4}\left(X^{(1)}[1] + X^{(1)}[2] + X^{(1)}[3] + X^{(1)}[4]\right).$$

Similarly, the remaining two ﬁgures are aggregates at levels 8 and 16. Each aggregate
path is zooming out of the actual sample path and no statistically differentiating features
are revealed by simple observation. Thus it can be inferred that the decrease in variability
with increased smoothing is very slow. This slow-varying decay is further highlighted by

the analysis of second-order statistics in the following section.
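
The block averaging used to form the aggregate (energy) process can be sketched as below. This is a minimal sketch assuming plain non-overlapping block means as described in the text; the function name is hypothetical and any trailing partial block is simply dropped, which is a choice made here rather than one stated in the source.

```python
def aggregate(x, m):
    """Level-m aggregate: mean of x over non-overlapping blocks of length m.

    A trailing block with fewer than m samples is discarded (assumption).
    """
    if m < 1:
        raise ValueError("aggregation level must be a positive integer")
    n_blocks = len(x) // m
    return [sum(x[j * m:(j + 1) * m]) / m for j in range(n_blocks)]


if __name__ == "__main__":
    # Hypothetical per-packet error counts, for illustration only.
    errors_per_packet = [3, 1, 4, 4, 0, 2, 2, 0]
    for level in (1, 4, 8):
        print(level, aggregate(errors_per_packet, level))
```

For a {0,1} bit-error trace the level-m aggregate is the fraction of corrupted bits per window, which is why the text treats the aggregate process and the normalized energy process as the same object.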

A.4.2.3.2 LRD Evaluation using Variance-Time Diagrams
Recall that for an LRD process, the variance $\mathrm{var}\{X^{(m)}\}$ of the aggregate process is equal to $m^{2H-2}\,\mathrm{var}\{X_n\}$. Variance-time diagrams plot the logscale variance of the aggregate process as a function of the logscale aggregation level. Second-order self-similarity is implied if the logscale decay in the variance is strictly linear, that is, the change in the log-variance is directly proportional to the change in the log aggregation level. For an LRD process, the Hurst parameter $H$ can then be estimated by fitting a least-squares line through the plot. A stationary second-order self-similar process is said to be long-range dependent if $1/2 < H < 1$.

[Figure 6 panels: log(Var(X^(m))) versus log(m). (a) trace 11 Mbps 1: Hurst parameter estimate, H = 0.83. (b) trace 11 Mbps 2: Hurst parameter estimate, H = 0.857.]

Figure 6. Variance-time diagrams of two 11 Mbps bit-error traces.

The variance-time diagrams of two 11 Mbps bit-error traces for different aggregation levels are given in Figure 6. Clearly, for both the traces under consideration, the variance has a mostly linear decay with respect to the aggregation level. Least-squares lines of order 1 are fit to the data points of the two variance-time diagrams. The slopes of the two least-squares lines of Figure 6 (a) and (b) are -0.3401 and -0.287, respectively. In accordance with the above discussion, an estimate of the Hurst parameter, $H$, can be obtained by noting that the slope of the variance-time plot should be equal to $2H - 2$. This results in Hurst parameter estimates of $H = 0.83$ and $H = 0.857$ for the two traces. The two values of $H$ are quite close to each other because the two traces are realizations of the same random process. Further, for both Hurst estimates the $1/2 < H < 1$ condition is satisfied, thus implying that the 11 Mbps bit-error process is long-range dependent. To further substantiate the LRD notion, in the following section we provide LRD analysis using a frequency-domain estimator.
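
The variance-time estimate can be sketched as follows: aggregate the trace at several levels, regress the log-variance on the log aggregation level, and map the slope to H through slope = 2H - 2. This is an illustrative sketch with hypothetical function names and an assumed set of aggregation levels (powers of two); the demonstration input is synthetic independent noise, for which an estimate near 0.5 is expected, and is not one of the measured traces.

```python
import math
import random
from statistics import pvariance


def aggregate(x, m):
    """Mean of x over non-overlapping blocks of length m."""
    n_blocks = len(x) // m
    return [sum(x[j * m:(j + 1) * m]) / m for j in range(n_blocks)]


def least_squares_slope(xs, ys):
    """Slope of the order-1 least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def hurst_from_slope(slope):
    """Invert slope = 2H - 2 for the variance-time plot."""
    return 1.0 + slope / 2.0


def variance_time_hurst(x, levels=(1, 2, 4, 8, 16, 32, 64, 128)):
    """Estimate H from the decay of aggregated variances (levels are assumed)."""
    log_m = [math.log10(m) for m in levels]
    log_var = [math.log10(pvariance(aggregate(x, m))) for m in levels]
    return hurst_from_slope(least_squares_slope(log_m, log_var))


if __name__ == "__main__":
    # Slopes reported in the text for the two 11 Mbps traces.
    for slope in (-0.3401, -0.287):
        print(slope, hurst_from_slope(slope))
    # Synthetic independent bit-errors (no long-range dependence).
    rng = random.Random(0)
    noise = [1 if rng.random() < 0.3 else 0 for _ in range(1 << 14)]
    print(variance_time_hurst(noise))
```

Applying hurst_from_slope to the two reported slopes gives 1 - 0.3401/2 and 1 - 0.287/2, which agree with the estimates of 0.83 and 0.857 quoted above after rounding.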

[Figure 7 panels: log periodogram versus log frequency. (a) trace 11 Mbps 1: Hurst parameter estimate, H = 0.874. (b) trace 11 Mbps 2: Hurst parameter estimate, H = 0.877.]

Figure 7. Logscale periodogram of two 11 Mbps bit-error traces.

A.4.2.3.3 LRD Evaluation using the Periodogram

A periodogram renders an estimate of the power spectral density of a process. The periodogram is simply the square of the magnitude of the discrete-time Fourier transformed samples. Mathematically, the periodogram of a discrete-time process $X_n$ is given as:

$$I(\omega) = \frac{1}{2\pi N} \left| \sum_{k=1}^{N} X_k e^{-ik\omega} \right|^2 \qquad \text{(A.15)}$$

where $\omega$ is the frequency, $N$ is the total number of samples and $i = \sqrt{-1}$. Recall from Section A.3.7 that the spectral density of an LRD process is proportional to $|\omega|^{1-2H}$ near the origin, $\omega = 0$. Since the periodogram of (A.15) is an estimate of the spectral density, a regression of the logarithm of the periodogram on the logarithm of the frequency $\omega$ should render an order-1 polynomial with a slope of $1 - 2H$. A frequency-domain estimate of $H$ can thus be obtained by fitting an order-1 least-squares line through a log-log plot of the periodogram versus the frequencies. In general, only the lowest 10% of the frequencies of the periodogram are used for this estimation since the approximation only holds true near the origin.

The logscale periodograms of the two 11 Mbps traces are shown in Figure 7 (a) and (b), respectively. The slopes of the least-squares lines fit to these periodograms yield Hurst parameter estimates of $H = 0.874$ and $H = 0.877$ for the two traces. These estimates satisfy the $1/2 < H < 1$ condition, thus substantiating that the 11 Mbps random process has long-range dependence. Further, note that these estimates are quite close to the estimates rendered by the variance-time diagrams of the last section.
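
A frequency-domain estimate along these lines can be sketched as below. The sketch evaluates the periodogram of (A.15) directly at the Fourier frequencies 2*pi*j/N, keeps the lowest 10% of them, and inverts slope = 1 - 2H. The function names, the choice of Fourier frequencies and the direct (non-FFT) summation are assumptions made for a self-contained illustration; the demonstration input is synthetic independent noise rather than a measured trace.

```python
import cmath
import math
import random


def periodogram(x, omega):
    """I(omega) = |sum_{k=1..N} x_k exp(-i k omega)|^2 / (2 pi N)."""
    n = len(x)
    s = sum(xk * cmath.exp(-1j * k * omega) for k, xk in enumerate(x, start=1))
    return abs(s) ** 2 / (2.0 * math.pi * n)


def least_squares_slope(xs, ys):
    """Slope of the order-1 least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def hurst_from_slope(slope):
    """Invert slope = 1 - 2H for the log-log periodogram."""
    return (1.0 - slope) / 2.0


def periodogram_hurst(x, fraction=0.1):
    """Estimate H from the lowest `fraction` of the positive Fourier frequencies."""
    n = len(x)
    n_freq = max(2, int(fraction * (n // 2)))
    omegas = [2.0 * math.pi * j / n for j in range(1, n_freq + 1)]
    log_w = [math.log10(w) for w in omegas]
    log_i = [math.log10(periodogram(x, w)) for w in omegas]
    return hurst_from_slope(least_squares_slope(log_w, log_i))


if __name__ == "__main__":
    rng = random.Random(1)
    noise = [1 if rng.random() < 0.3 else 0 for _ in range(2048)]
    print(periodogram_hurst(noise))
```

The direct summation costs O(N) per frequency, which is acceptable for a sketch; for traces of realistic length an FFT would be the natural replacement.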

A.4.3 Accurate Modeling of 802.11b Bit-Errors
A.4.3.1 Bit-Error Modeling at 5.5 Mbps

The autocorrelation analysis in preceding sections outlined a maximum memory-length of 14 for the 5.5 Mbps bit-error process. A memory-length of 14 corresponds to an FSM chain with $2^{14} = 16384$ states. ENK-based performances¹ of FSM chains with varying memory-lengths are given in Figure 8. FSM chains perform remarkably well for the bad-bursts random variable. Note that even smaller order chains perform adequately, with a source coding overhead of less than or approximately equal to 0.03 in all cases. However, the good-bursts random variable incurs significant overhead for smaller order chains. For example, the two-state chain renders an overhead of approximately 0.5 and is therefore not a viable option. For higher-order chains, the overhead decreases and drops to a reasonable level, beginning at the 512-state model. Due to data over-fitting considerations, we assume that any overhead less than 0.1 is acceptable. Thus we conclude that all FSM chains of orders 9 and above render appropriate models for the 5.5 Mbps bit-error process.

¹ The terms performance and accuracy are used synonymously throughout this thesis.

A.4.3.2 Bit-Error Modeling at 2 Mbps

The performances of varying order FSM chains are shown in Figure 9. For both random variables, small order FSM models incur profound overhead. For instance, for the good-bursts random variable the overhead of the order-1 (two-state) chain is approximately 0.8. Although lower order FSM chains cannot model the bit-error process effectively, as we move to higher order chains the overhead decreases substantially and drops to a reasonable level. Since the 5.5 Mbps FSM results showed that the actual model order can be smaller than that suggested by the data correlation, in Figure 9 we only provide analysis up to order 10 since the performance improvement saturates after the order-10 (548-state) model.

It is clear from Figure 9 that low-order FSM chains incur significant ENK overhead and hence are unable to capture the 2 Mbps bit-error behavior. For both random variables, the FSM performance subsequently improves with an increase in the order of the model. The accuracy of the order-10 FSM chains is comparable to the divergence between two actual traces. Hence, we conclude that the 548-state FSM renders a good model of the 2 Mbps MAC layer bit-error process.

[Figure 8 panels: performance versus number of states; legend: FSM, 5.5 Mbps bit-error traces. (a) good-bursts. (b) bad-bursts.]

Figure 8. Performances of varying order FSM chains for the 5.5 Mbps MAC layer bit-error process.

[Figure 9 panels: performance versus number of states; legend: FSM, 2 Mbps bit-error traces. (a) good-bursts. (b) bad-bursts.]

Figure 9. Performances of varying order FSM chains for the 2 Mbps MAC layer bit-error process.

A.4.3.3 Bit-Error Modeling at 11 Mbps

In earlier sections, we revealed the LRD nature of the 11 Mbps bit-errors using autocorrelation and scaling analyses. Based on the results from the last two sections, one can conjecture that if high-order FSM simulations are performed, it might be possible to identify an appropriate Markov process of an order lower than what is outlined by the autocorrelation. However, ascertaining such a model order might require simulations with high-order FSM chains, which is computationally infeasible. In this section we show that a multifractal wavelet model (MWM) captures the LRD characteristics of the 11 Mbps bit-error process quite accurately. Although Markov models cannot capture the LRD nature of bit-errors at 11 Mbps, we use Markov chains as a performance reference when evaluating the performance of the MWM.

A.4.3.3.1 The Multifractal Wavelet Model

We used the MWM toolbox [73] to train an MWM. An actual 11 Mbps trace (i.e., a bit-error sequence of zeros and ones) was used for MWM training. Various simulations were performed with $\beta$, point-mass and hybrid $\beta$/point-mass probability distributions for the $A_{j,k}$ random variables and Gaussian and log-normal distributions for the $U_{j_0,k_0}$ random variable. We observed that the performance of the MWM was quite insensitive to the choice of the probability distribution used for the MWM random variables. For brevity we only report results for the $\beta$ distribution.

A.4.3.3.2 ENK-based Performance Evaluation

A cautionary note is in order before we proceed with the ENK-based performance evaluation of the MWM at 11 Mbps. Due to its reliance on entropy, the ENK divergence compares the skew of the probability distributions, but does not place much emphasis on the second-order statistics (e.g., energy, variance, etc.) of the distributions. The MWM (or for that matter any model of LRD data), on the other hand, is designed to capture scaling phenomena (and the consequent second-order statistics) of an LRD random process. Thus for an LRD process, a comparison using only the ENK of the good- and bad-burst distributions can be misleading, and the ENK divergence by itself cannot completely quantify the MWM performance. In addition to ENK, it is imperative that second-order statistics of the random process be compared with the model. We perform such second-order performance evaluation in the subsequent sections.

Table 2. Performance of MWM and FSM for the 11 Mbps Bit-Error Process

                              good-bursts   bad-bursts
ENK(trace1 || trace2)         0.0586        0.00032
ENK(trace1 || MWM)            0.127         0.093
ENK(trace1 || FSM(16))        0.174         0.002
ENK(trace1 || FSM(4096))      0.091         0.00094
ENK(trace2 || MWM)            0.143         0.096
ENK(trace2 || FSM(16))        0.189         0.002
ENK(trace2 || FSM(4096))      0.088         0.0017

The ENK-based performances of the MWM and FSM chains are tabulated in Table 2, where FSM(x) represents an FSM chain with x states. The good-bursts ENK overhead of the MWM is lower than that of the 16-state FSM chain, while its bad-bursts overhead is higher than that of the 16-state FSM chain. The MWM ENK overhead is slightly worse than that of the 4096-state FSM chain. For instance, for the first actual trace the MWM's ENK good-bursts divergence is 0.127 - 0.091 = 0.036 more than the 4096-state FSM. For the same example, the bad-bursts ENK overhead of the MWM is 0.093 - 0.00094 = 0.09206 more than the 4096-state FSM.

[Figure 10 panels: probability versus burst length. (a) good-bursts. (b) bad-bursts.]

Figure 10. Probability mass functions for good- and bad-bursts random variables derived
from an 11 Mbps trace. (Only the probabilities of small bursts are shown here.)

[Figure 11 panels: energy processes of the actual 11 Mbps trace and of synthesized traces, including the MWM and the 16-state FSM chain. (a) aggregation level=8. (b) aggregation level=256.]

Figure 11. Energy processes of actual and synthesized bit-error traces.

The slightly superior performance of the FSM chains is due to the fact that the FSM model is extremely apt at capturing the short-term correlation structure of the random process. This short-term dependence is because of small bursts of good and bad bits. Such small bursts are quite prevalent even in an LRD process such as the present 11 Mbps bit-error process. To substantiate this claim, we show the small burst probabilities of the good- and bad-bursts random variables in Figure 10. Note that burst-lengths of 1, 2, 3, 4 and 5 constitute 78.35% and 98.03% of the probability space of the good- and bad-bursts random variables, respectively. This small burst behavior is very adequately characterized by an FSM chain. Since the skew of both probability distributions is dictated by these highly probable small bursts, the ENK overhead of the FSM chains is quite low, although FSM chains cannot capture the long-term process correlation. The skew-oriented bias of the ENK measure masks the long-term correlation properties of a random process, which are exhibited in the spread (i.e., the variance) of the probability distribution. We henceforth focus solely on second-order analysis of the models under consideration.
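
The small-burst statistic quoted above can be computed with a sketch like the following. It assumes that a good-burst is a maximal run of error-free bits (zeros) and a bad-burst is a maximal run of corrupted bits (ones), which matches the burst-length axes of Figure 10 but is stated here as an assumption; the function names are hypothetical and the example trace is synthetic.

```python
from collections import Counter
from itertools import groupby


def burst_lengths(bits):
    """Split a {0,1} trace into run lengths of good (0) and bad (1) bits."""
    good, bad = [], []
    for value, run in groupby(bits):
        (bad if value == 1 else good).append(sum(1 for _ in run))
    return good, bad


def pmf(lengths):
    """Empirical probability mass function of a list of burst lengths."""
    counts = Counter(lengths)
    total = len(lengths)
    return {length: c / total for length, c in sorted(counts.items())}


def mass_up_to(p, max_length):
    """Total probability of bursts no longer than max_length."""
    return sum(prob for length, prob in p.items() if length <= max_length)


if __name__ == "__main__":
    # Synthetic toy trace, for illustration only.
    trace = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
    good, bad = burst_lengths(trace)
    print(mass_up_to(pmf(good), 5), mass_up_to(pmf(bad), 5))
```

Calling mass_up_to with max_length=5 on the two empirical distributions of a measured trace would give the kind of figures reported in the text (78.35% and 98.03%).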

A.4.3.3.3 Performance in Capturing Energy at Different Scales
We ﬁrst consider energy in non-overlapping windows of the bit-error traces. As

mentioned previously, the deﬁnition of energy (as given in (A.8) and explained in prior
sections) outlines the second moment of the random process in short-term windows. Two
examples of an energy process derived from an actual source and energy processes
synthesized using the MWM, the 16-state FSM chain and the 4096-state FSM chain are
illustrated in Figure 11. It can be observed that the FSM chains project overly pessimistic
energy estimates (i.e., very high error rates), whereas the MWM in general has less
energy per window than the actual 11 Mbps bit-error process. By simple observation, it
can be deduced that the MWM captures the energy characteristics of the 11 Mbps bit-
error process better than the Markov chains. In the next section, we compare the

aggregated variance-time behavior of the FSM chains and the MWM.

A.4.3.3.4 Performance in Capturing the Variance-Time
Characteristics
In this section, we evaluate the accuracy of the MWM and the FSM chains in modeling the decay of aggregated variances. Figure 12 shows the variance-time behavior of FSM chains. Clearly, the FSM chains can capture the short-term correlations of the random process with outstanding accuracy, as shown in the top-left corner of Figure 12. However, the performance degrades sharply as the dependence (i.e., linear decay of the variance) persists at higher scales. Not surprisingly, more and more correlation is captured as we increase the memory-length of FSM chains. Thus if the complexity of an FSM chain that captures all the scales present in the data can be afforded, then such an FSM chain can render a highly accurate model.

Unfortunately, in practical LRD processes the correlation typically persists at very high scales. In such a case, a model that is designed to capture the correlation structure at different scales (e.g., the MWM) is more suitable than FSM chains. This observation is outlined in Figure 13 (a), which shows that the MWM can capture the decay of variance of the 11 Mbps bit-error process quite accurately within an additive constant. The phenomenon of a model's inability to capture the exact variance values is well-known in LRD literature. It has been diagnosed that this problem arises because of non-stationarities introduced by jumps in the mean and slow decaying trends [74]. (The jumps in the mean of the 11 Mbps bit-error process can be easily observed in Figure 11.) Teverovsky and Taqqu [74] proposed to eliminate this problem by fitting the function $C_1 + C_2 m^{2H-2}$ to the variance-time diagram of the LRD process. The corrective factors, $C_1$ and $C_2$, can then be added to the variances produced by a model.

In the present problem, the corrective factors were $C_1 = 0$ and $C_2 = 3.71$. The variance-time diagram of the MWM with the corrective factors is given in Figure 13 (b). Clearly, the corrected MWM captures the decay in the variance quite accurately. Thus we deduce that the MWM is an effective model of the long-range dependence present in the 11 Mbps bit-error process.
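
One simple way to obtain such corrective factors is sketched below. The text does not describe how the fit in [74] is carried out, so this sketch makes a simplifying assumption: H is held fixed at a previously estimated value, which makes the model linear in $C_1$ and $C_2$ and reduces the fit to ordinary least squares on the regressor $m^{2H-2}$. The function name and the synthetic input are hypothetical.

```python
def fit_corrective_factors(levels, variances, hurst):
    """Least-squares fit of variances ~ c1 + c2 * m**(2*hurst - 2), H fixed.

    levels:    aggregation levels m
    variances: aggregated variances at those levels (not their logarithms)
    hurst:     Hurst parameter, assumed known from an earlier estimate
    """
    z = [m ** (2.0 * hurst - 2.0) for m in levels]
    n = len(z)
    mz = sum(z) / n
    mv = sum(variances) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szv = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, variances))
    c2 = szv / szz
    c1 = mv - c2 * mz
    return c1, c2


if __name__ == "__main__":
    # Synthetic variances generated from known factors, for illustration only.
    levels = [2 ** j for j in range(10)]
    variances = [0.2 + 3.0 * m ** (2 * 0.85 - 2) for m in levels]
    print(fit_corrective_factors(levels, variances, hurst=0.85))
```

If H were also unknown, the problem would become nonlinear and would need an iterative fit; holding H fixed is the design shortcut taken in this sketch.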

[Figure 12 plot: log(Var(X^(m))) versus log(m) for the 4-state, 16-state and 4096-state FSM chains and the 11 Mbps trace.]

Figure 12. Variance-time diagrams of varying order FSM chains for the 11 Mbps bit-error process.

[Figure 13 panels: log(Var(X^(m))) versus log(m) for trace 11 Mbps 1 and the MWM. (a) without corrective factors. (b) with corrective factors.]

Figure 13. Variance-time diagrams of the MWM for the 11 Mbps bit-error process.

A.4.4 Discussion

In this chapter, we proposed accurate models of MAC layer bit-error channels at 2,
5.5 and 11 Mbps data rates of an 802.11b LAN. While the MWM model for 11 Mbps bit-
errors has linear-complexity in synthesizing and predicting bit-error behavior, the FSM

chain model’s complexity increases exponentially with respect to the memory-length.


The following chapter reduces the exponential complexity by approximating FSM

chains’ behavior using low-complexity models.


CHAPTER A.5 COMPLEXITY REDUCTION
FOR MARKOV CHANNELS

Most benefits of a wireless MAC layer channel model can be realized if the model is able to provide real-time and online channel characterization and prediction. In complexity- and power-constrained wireless and mobile environments, such channel characterization is only possible with a low-complexity model. Despite some recent interest in reducing the complexity of wireless models [20]-[29], development of accurate, pragmatic and low-complexity wireless channel models is still an open problem. Since low-complexity models have not been thoroughly explored and verified for contemporary wireless and mobile networks, many of the protocols, applications and systems mentioned in Chapter A.1 have not been realized in practical wireless systems.

The number of states of an FSM chain is an exponential function of the random process' memory-length: $2^{10}$ states for a process with a memory-length of 10 bits. This phenomenon is commonly referred to as state explosion. Due to state explosion, although FSM chains can provide accurate models of wireless bit-errors, their high complexity renders them impractical for realistic wireless environments.

To reduce FSM chains' complexity, in this chapter we first consider hierarchical [18] and hidden [75] Markov models. We observe that these models cannot accurately characterize the bit-error channels under consideration. Consequently, we focus on directly approximating FSM chain behavior. We first make insightful observations about underlying characteristics of an FSM model. As a first direct approximation model, we derive and evaluate a new class of lumped Markov chains [59]. However, we observe that lumped Markov chains are also unable to approximate the behavior of FSM chains.

Finally, we analyze how FSM chains capture good- and bad-burst behavior of wireless channels. Using this analysis, we derive important guidelines for the realization of accurate, effective and low-complexity models. These guidelines lead to a constant-complexity model (CCM) that always comprises five states irrespective of the memory-length. We show that the performance of the 5-state CCM in modeling of the 2 and 5.5 Mbps MAC layer bit-error channels is comparable to exponential-complexity FSM chains and better than linear-complexity models [29].

A.5.1 The Hierarchical Markov Model

The hierarchical Markov model (hMM) is based on the observation that error traces exhibit clear delineation between highly bursty error regions and relatively low error regions. Therefore, in an hMM [18], severe- and low-burst regions are identified in the bit-error traces. Each of the burst states has an embedded two-state Markov model as depicted in Figure 14. One of the challenges of the hMM is the delineation of the high-level severe- and low-burst states. The work in [18] employed a state demarcation heuristic to delineate the low- and severe-burst states in the error traces. The state demarcation heuristic relied on two empirically determined thresholds to transit between severe- and low-burst states. One of the thresholds, say threshold 1, determines whether or not a burst of bad bits is a small isolated burst between mostly good bits. The other threshold, say threshold 2, ascertains the number of good bits which can characterize the end of a long/severe burst of bad bits. Small thresholds can make the process transit erratically between the high-level states, whereas large thresholds can unnecessarily increase the sojourn time spent in a high-level state. There is unfortunately no good method of determining the best values of these thresholds, and heuristic experimentation is needed to find somewhat accurate values of these thresholds.

Figure 14. The hierarchical Markov model (hMM) [18].

Table 3. Performance of the hMM for the 5.5 Mbps Bit-Error Process

                                                  good-bursts   bad-bursts
ENK(trace1 || trace2)                             0.0086        0.0029
ENK(trace1 || hMM), threshold1=threshold2=10      0.521         0.009
ENK(trace2 || hMM), threshold1=threshold2=10      0.63          0.0106
ENK(trace1 || hMM), threshold1=threshold2=25      0.62          0.009
ENK(trace2 || hMM), threshold1=threshold2=25      0.676         0.01[?]
ENK(trace1 || hMM), threshold1=threshold2=50      0.587         0.009
ENK(trace2 || hMM), threshold1=threshold2=50      0.594         0.013
ENK(trace1 || hMM), threshold1=threshold2=100     0.624         0.009
ENK(trace2 || hMM), threshold1=threshold2=100     0.682         0.011

[?] Final digit illegible in the source scan.

Table 3 outlines the ENK-based performance of the hMM for varying values of the two thresholds. The ENK divergence of the actual traces [row 1 of Table 3] provides a reference value for performance evaluation of the hMM. It is obvious that irrespective of the threshold values, the hMM always incurs a high overhead of more than 0.5 for the good-bursts random variable. The bad-bursts random variable usually takes small values since most of the bits are not corrupted. From Table 3, it can be seen that for the bad-bursts random variable, the ENK distance between the hMM and actual traces is quite small for all thresholds. This ENK overhead is nevertheless much larger than the ENK divergence between the actual traces. From these results, we conclude that the hMM cannot capture the present MAC layer wireless bit-errors.

A.5.2 The Hidden Markov Model

To apply hidden Markov models (HMMs) to this problem, we need discriminative statistical features that can be used to train the HMM. After much experimentation, we found that bit-error energy in non-overlapping windows can serve as an effective discriminative feature for the low and severe bit-error conditions. Due to the present {0,1} representation of bit-errors, the error-rate simply corresponds to the energy process defined in Section A.4.2.3.1 [equation (A.8)] with the window size representing the aggregation level. We use the bit-error energy as input to the HMM's Baum-Welch forward-backward training algorithm [76], [77].

We ran simulations for varying window sizes and for varying numbers of HMM states. Table 4 enumerates the performances of three HMMs; similar trends were observed for other HMM experiments. Note that the HMM performance is quite sensitive to the window length. For instance, the HMM over a 1000-bit window has far inferior performance to the 2000-bit HMM, even though the 2000-bit HMM has fewer states. In general, the HMM performance improved with an increase in window size. This improvement, however, saturated once the window size became greater than the packet size.

Table 4. Performance of the HMM for the 5.5 Mbps Bit-Error Process

                                                        good-bursts   bad-bursts
ENK(trace1 || HMM), window=2000 bits, HMM states=3      0.403         0.685
ENK(trace2 || HMM), window=2000 bits, HMM states=3      0.409         0.731
ENK(trace1 || HMM), window=1000 bits, HMM states=5      2.466         2.408
ENK(trace2 || HMM), window=1000 bits, HMM states=5      2.406         2.562
ENK(trace1 || HMM), window=4096 bits, HMM states=8      0.109         0.175
ENK(trace2 || HMM), window=4096 bits, HMM states=8      0.114         0.179

Comparing the good-bursts column of Table 4 with Table 3 reveals that the HMMs with 3 and 8 states yield better good-bursts performance than the hMM. However, the ENK values for the bad-bursts random variable in all the HMM cases are orders of magnitude greater than the respective values for the hMM traces. Hence, we conclude that, while the HMM improves the modeling of good-bursts, it does not model the bad-bursts adequately. Thus the overall performance of HMM modeling for the experimental error traces is unsatisfactory.

The poor performance of HMMs arises because, unlike other problem areas where well-defined characteristics of the input data are available for training, the bit-error traces of this study do not provide robust training features. Furthermore, the HMM assumes that the time spent in a state is exponentially distributed, which is not an accurate assumption for wireless bit-errors. This assumption of exponentially distributed state sojourn times results in inaccurate HMM parameterization.

Since the hierarchical and hidden Markov models cannot capture the bit-error process, we now focus on directly approximating the behavior of FSM chains. This direct FSM approximation will be performed by aggregating FSM chain states.

A.5.3 FSM Observations

In this section, we first state two intuitive observations regarding FSM chains. These observations are used in subsequent sections to derive important characteristics of FSM chains. It is important to outline here how we intend to approximate FSM chains. The approximate models of this thesis are developed by creating partitions of the FSM chain state space. All FSM states in a partition are then simply aggregated/grouped into a single aggregate state of the approximate process. Hence, this section mainly addresses the following question: How should one define partitions on the FSM state space such that the resulting aggregate process accurately approximates the underlying FSM chain? In other words, we are trying to find out which FSM states can be aggregated together without compromising the FSM model's performance.

A.5.4 Observations about FSM Chains

The first observation is a direct consequence of the binary nature of the present wireless bit-error process:

Observation 1. If a bit-by-bit sliding window is used to compute the transition probabilities of a $2^k$ state FSM chain, then from a current state, $X_n = i$, in one transition the FSM chain can transit to only two possible states given by:

\[
X_{n+1} =
\begin{cases}
(2i) \bmod 2^k \\
(2i + 1) \bmod 2^k,
\end{cases}
\tag{A.16}
\]

where $k$ is the memory-length of the FSM chain and $i \in \{0, 1, \ldots, 2^k - 1\}$ is an arbitrary state in the FSM chain's state space.
The example in Figure 15 demonstrates this observation. A memory-length of four, $k = 4$, is used in this example, so the set of all possible FSM states is $\{0, 1, 2, \ldots, 2^4 - 1 = 15\}$. The current state is $X_n = (0110)_2 = (6)_{10}$ and, as the window slides by one bit, the 0 in the most significant bit position is dropped and a bit is added at the least significant bit position. Since the data are binary, the chain can transit to either $(1100)_2 = (12)_{10}$ or $(1101)_2 = (13)_{10}$. Thus, in essence, Observation 1 implies that at each slide of the memory-window the process' current state $i$ is subjected to three operations: a left-shift by one bit, which yields $2i$; followed by the addition of a zero ($2i + 0 = 2i$) or of a one ($2i + 1$) at the least significant bit (LSB) position; followed by a modulus operation, which ensures that if the current state of the process is $X_n = 2^{k-1}$ then the next state wraps around to state 0 (for $X_{n+1} = 2i$) or state 1 (for $X_{n+1} = 2i + 1$). For instance, in the preceding example with $k = 4$, if $X_n = (1000)_2 = (8)_{10} = 2^{k-1}$ then the next state will be either $X_{n+1} = (2 \times 8) \bmod 2^4 = (0000)_2$ or $X_{n+1} = (2 \times 8 + 1) \bmod 2^4 = (0001)_2$.

[Figure 15 diagram: a sliding window over the bit sequence, moving from $X_n = 6$ to $X_{n+1} = 2 \times 6 = 12$ or $X_{n+1} = 2 \times 6 + 1 = 13$.]

Figure 15. Transition possibilities for an FSM chain (memory-length, $k = 4$).

Table 5. Empirical Evidence in Support of Observation 2

           2 Mbps    5.5 Mbps
$\pi_0$    0.997     0.974

Since each FSM state has two transition possibilities, each row of the FSM transition probability matrix will have at most two non-zero entries, given by $p_{i,(2i) \bmod 2^k}$ and $p_{i,(2i+1) \bmod 2^k} = 1 - p_{i,(2i) \bmod 2^k}$.
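The shift, append and modulus steps of Observation 1 can be sketched in a few lines of Python. This is only an illustration; the function name `next_states` is hypothetical and does not come from the thesis.

```python
def next_states(i: int, k: int) -> tuple[int, int]:
    """Return the two FSM states reachable from state i in one bit-slide.

    i is the current state (the k-bit window read as an integer) and
    k is the memory-length, so there are 2**k states in total.
    """
    n_states = 1 << k
    # Left-shift by one bit and drop the most significant bit.
    shifted = (2 * i) % n_states
    # Appending a 0 leaves `shifted` unchanged; appending a 1 adds one.
    return shifted, shifted + 1


print(next_states(6, 4))  # (12, 13), the example of Figure 15
print(next_states(8, 4))  # (0, 1), the wrap-around case
```

Because `shifted` is always even and smaller than $2^k$, adding one never overflows, so no second modulus is needed.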
Intuitively, one can claim that the number of error-free bits received over any
reasonable wireless channel should be much more than the number of corrupted bits. The
second observation stated below formulates this claim in terms of FSM chain parameters:
Observation 2. The steady-state probability of state 0 of an FSM chain for wireless channels is much greater than the steady-state probabilities of all other states,

\[
\pi_0 \gg \sum_{j=1}^{2^k - 1} \pi_j,
\tag{A.17}
\]

where $k$ represents the memory-length and $\pi_i$ represents the steady-state probability of being in state $i$ of the FSM chain.

The above observation implies that the mean time spent in state 0 of the FSM chain (i.e., the state with no errors) is much greater than the mean time spent in all other states. It can be intuitively argued that this observation holds for real-life wireless channels. For instance, Table 5 gives the steady-state probability of state 0 for the 802.11b 2 Mbps bit-error FSM chain of order 10 and for the 5.5 Mbps bit-error FSM chain of order 9. Since the steady-state probability of staying in the good (all-zero) FSM state is very close to one for both the channels shown in Table 5, we can safely claim that Observation 2 holds for the wireless channels currently under consideration.
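To make the two observations concrete, the following sketch estimates an FSM chain from a binary error trace with a bit-by-bit sliding window and reports the empirical occupancy of each state. It is an illustration under stated assumptions, not the procedure used to produce Table 5: the function name `estimate_fsm`, the use of NumPy, and the use of empirical state frequencies as a stand-in for the steady-state probabilities are all choices made here.

```python
import numpy as np


def estimate_fsm(bits, k):
    """Estimate FSM transition probabilities and state occupancy.

    bits: sequence of 0/1 values (1 = corrupted bit).
    k:    memory-length, giving 2**k FSM states.
    Returns (P, pi), where P[s, t] is the empirical transition probability
    and pi[s] is the fraction of window positions spent in state s.
    """
    n_states = 1 << k
    mask = n_states - 1

    # The first k bits form the initial window (state).
    state = 0
    for b in bits[:k]:
        state = ((state << 1) | b) & mask
    states = [state]

    # Slide the window one bit at a time.
    for b in bits[k:]:
        state = ((state << 1) | b) & mask
        states.append(state)

    counts = np.zeros((n_states, n_states))
    for s, t in zip(states[:-1], states[1:]):
        counts[s, t] += 1

    # Normalize each visited row; rows of unvisited states stay all-zero.
    row_sums = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    pi = np.bincount(states, minlength=n_states) / len(states)
    return P, pi


# A toy trace with a single corrupted bit.
trace = [0] * 20 + [1] + [0] * 20
P, pi = estimate_fsm(trace, k=2)
print(pi[0])                              # occupancy of the all-zero state
print(np.count_nonzero(P, axis=1).max())  # at most 2 non-zero entries per row
```

On a real trace, `pi[0]` is the quantity that Observation 2 expects to be close to one.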

A.5.5 Markov Chain Lumpability

We ﬁrst evaluate direct applicability of the well-known Markov chain lumpability
technique [59] to the wireless modeling problem under investigation. Chen et al. [24],
[25] showed that on some wireless channels lumpability might be a viable option for
reducing channel modeling complexity. We specialize the general deﬁnition of

lumpability to the binary FSM case using the observations made in the previous section.

A.5.5.1 Lumpability for Wireless Bit-Error Channels

Let the state space of an FSM with $2^k$ states be given as $H = \{0, 1, \ldots, 2^k - 1\}$. Now consider a new process with state space $S = \{S_0, S_1, \ldots, S_{N-1}\}$, where $N \le 2^k$. Let the FSM states belonging to $H$ be disjointly distributed among the states of the new process. In other words, each element of $S$ is in turn a set containing one or more FSM states and is henceforth referred to as an aggregate state. If we impose the condition that an FSM state cannot exist in two different aggregate states simultaneously, then the set $S$ constitutes a partition of the FSM state space.

Before proceeding further, we employ Observation 1 to prove a necessary condition
for deﬁning partitions of the FSM state space. This condition is stated as a lemma.

LEMMA 1. The next state in an aggregate process can be accurately determined only if the FSM states $(2i) \bmod 2^k$ and $(2i + 1) \bmod 2^k$ do not belong to the same aggregate state,

\[
(2i) \bmod 2^k \in S_j \;\Rightarrow\; (2i + 1) \bmod 2^k \notin S_j,
\tag{A.18}
\]

where $k$ is the memory-length, $i \in H$ and $S_j \in S$.

Proof: Lemma 1 is easily proven by contradiction. In essence, this lemma implies that both transition possibilities of an FSM state cannot be aggregated in a single state. As mentioned in Observation 1, $(2i) \bmod 2^k$ and $(2i + 1) \bmod 2^k$ are the only possible transitions for FSM state $i$. Let there exist an aggregate state $S_j$ that contains both FSM states $(2i) \bmod 2^k$ and $(2i + 1) \bmod 2^k$. Also, let $S_q$ represent an aggregate state that contains FSM state $i$. Then $p_{S_q,S_j}$ does not give any information about whether a good- or a bad-bit should be added to the memory-window.

Let $p_{i,S_j} = \sum_{k \in S_j} p_{i,k}$; then $p_{i,S_j}$ represents the probability of moving from FSM state $i$ to aggregate state $S_j$ in one step of the FSM chain. Given Lemma 1 and using Observation 1, $p_{i,S_j}$ can be written as

\[
p_{i,S_j} =
\begin{cases}
p_{i,(2i) \bmod 2^k}, & (2i) \bmod 2^k \in S_j \\
1 - p_{i,(2i) \bmod 2^k}, & (2i + 1) \bmod 2^k \in S_j \\
0, & \text{otherwise.}
\end{cases}
\]

An FSM chain is lumpable [59] with respect to a partition if, for every choice of an FSM chain starting vector, the lumped process is a Markov chain and the transition probabilities do not depend on the choice of the FSM starting vector. A process is said to be weakly lumpable [59] with respect to a partition if at least one starting vector leads to a Markov chain.

The strong lumpability theorem [59] is stated as:

THEOREM 1. A necessary and sufficient condition for an FSM to be lumpable with respect to a partition $S = \{S_0, S_1, \ldots, S_{N-1}\}$ is that for each pair of aggregate sets $S_i$ and $S_j$, the probability $p_{l,S_j}$ has the same value for every FSM state $l$ in $S_i$.

See [59] for proof of a general case of this theorem.

The strong lumpability condition asserts that all FSM states belonging to an aggregate state should have the same probability of moving out of the aggregate state. We illustrate it using an example. Figure 16 shows two aggregate states, $S_i = \{n, m\}$ and $S_j = \{(2n) \bmod 2^k, (2m) \bmod 2^k\}$, where for ease of notation the aggregate set $\{(2n) \bmod 2^k, (2m) \bmod 2^k\}$ is written as $\{2n, 2m\}$. As outlined in Observation 1, FSM state $2n$ represents one of the two possible transition possibilities of FSM state $n$. The probability of this transition is denoted as $p_{n,2n}$ in Figure 16. Similarly, FSM state $m$ can move to FSM state $2m$ in one transition and this probability is denoted as $p_{m,2m}$. The overall probability of moving from aggregate state $S_i$ to aggregate state $S_j$ is given as $p_{S_i,S_j}$. For this example, the lumpability condition requires that

\[
p_{S_i,S_j} = p_{n,2n} = p_{m,2m}.
\]
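The condition of Theorem 1 can be checked numerically for a given transition matrix and partition. The sketch below is a direct transcription of the theorem under the assumption that the partition is supplied as a list of lists of state indices; the function name `is_strongly_lumpable` and the tolerance parameter are hypothetical.

```python
import numpy as np


def is_strongly_lumpable(P, partition, tol=1e-12):
    """Check the strong lumpability condition of Theorem 1.

    P:         square transition probability matrix.
    partition: list of blocks, each a list of state indices.
    Returns True if, for every pair of blocks (S_i, S_j), all states in
    S_i have the same probability of moving into S_j.
    """
    P = np.asarray(P, dtype=float)
    for block_i in partition:
        for block_j in partition:
            # Probability of moving from each state of block_i into block_j.
            into_j = P[np.ix_(block_i, block_j)].sum(axis=1)
            if into_j.max() - into_j.min() > tol:
                return False
    return True


# 4-state FSM (memory-length 2) in which states 0 and 2, and states 1 and 3,
# share the same transition probabilities.
P_equal = [[0.9, 0.1, 0.0, 0.0],
           [0.0, 0.0, 0.6, 0.4],
           [0.9, 0.1, 0.0, 0.0],
           [0.0, 0.0, 0.6, 0.4]]
print(is_strongly_lumpable(P_equal, [[0, 2], [1, 3]]))  # True

# Changing one row breaks the condition, as is typical for measured traces.
P_trace = [[0.9, 0.1, 0.0, 0.0],
           [0.0, 0.0, 0.6, 0.4],
           [0.7, 0.3, 0.0, 0.0],
           [0.0, 0.0, 0.6, 0.4]]
print(is_strongly_lumpable(P_trace, [[0, 2], [1, 3]]))  # False
```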

Since accurate wireless modeling necessitates the derivation of the Markov model parameters from traces collected over an actual network, it is virtually impossible to guarantee that the consequent FSM chain will have a transition probability matrix that strongly or weakly satisfies the lumpability condition. (This assertion can be easily verified by considering any of the real-life traces collected over actual wireless MAC channels.) We hence deduce that lumpability in its precise form is not generically applicable to wireless channel modeling.

The above discussion motivates a new question: Can we somehow relax the lumpability conditions so that they are more readily applicable to the wireless modeling problem under investigation? The following section tackles this question.

Figure 16. Aggregate states $S_i$ and $S_j$ containing FSM states $\{n, m\}$ and $\{2n, 2m\}$, respectively.

A.5.5.2 Folded Markov Chains

The lumpability condition is too stringent to be enforced on wireless models. In this section, we modify an FSM chain's state transition probabilities such that the modified chain can be divided into two equal-sized partitions that satisfy the strong lumpability condition. We show that this state aggregation procedure can be applied recursively to a transition probability matrix to achieve a desired level of complexity. Then we use the 802.11b MAC layer bit-error channel for empirical performance evaluation of this new class of models.

We first note that to reach FSM states $2i$ and $2i + 1$ for $0 \le i \le 2^{k-1} - 1$ in a single transition, the current state of the FSM chain should be either state $i$ or state $(2^{k-1} + i)$. In other words, the following pairs of FSM states have the same set of next possible states:

\[
(0, 2^{k-1}),\ (1, 1 + 2^{k-1}),\ \ldots,\ (2^{k-1} - 1,\ 2^{k-1} - 1 + 2^{k-1} = 2^k - 1).
\tag{A.19}
\]

For instance, a 4-state (i.e., memory-length of 2 bits) FSM transition probability matrix is given by

\[
\begin{bmatrix}
p_{0,0} & p_{0,1} & 0 & 0 \\
0 & 0 & p_{1,2} & p_{1,3} \\
p_{2,0} & p_{2,1} & 0 & 0 \\
0 & 0 & p_{3,2} & p_{3,3}
\end{bmatrix}.
\]

We can see that states 0 and 2 have the same one-step transition possibilities since in one transition both these states can either transit to state 0 or state 1. Now if the probability of transiting to state 0 is the same from both states 0 and 2 then these states will satisfy the lumpability condition and hence can be aggregated together. Similarly, states 1 and 3 have the same transition possibilities. If the probability of transiting to state 2 is the same for both states 1 and 3 then they can be aggregated together. That is, if the above conditions are satisfied then the FSM chain can be lumped with respect to partitions $S_0 = \{0, 0 + 2^{2-1} = 2\}$ and $S_1 = \{1, 1 + 2^{2-1} = 3\}$, thereby giving the following transition probability matrix:

\[
\begin{bmatrix}
p_{S_0,S_0} & p_{S_0,S_1} \\
p_{S_1,S_0} & p_{S_1,S_1}
\end{bmatrix}.
\]

Based on the observation that the state pair $(i, 2^{k-1} + i)$ has the same one-step transition possibilities, we propose to modify an FSM chain's transition probability matrix as follows:

\[
\begin{aligned}
\tilde{p}_{i,(2i) \bmod 2^k} = \tilde{p}_{2^{k-1}+i,(2i) \bmod 2^k} &= \frac{p_{i,(2i) \bmod 2^k} + p_{2^{k-1}+i,(2i) \bmod 2^k}}{2}, \\
\tilde{p}_{i,(2i+1) \bmod 2^k} = \tilde{p}_{2^{k-1}+i,(2i+1) \bmod 2^k} &= 1 - \tilde{p}_{i,(2i) \bmod 2^k},
\end{aligned}
\tag{A.20}
\]

for $i = 0, 1, \ldots, 2^k - 1$, where $p_{i,j}$ and $\tilde{p}_{i,j}$ represent the transition probabilities of the original and modified FSM chains. After this transformation, state pairs $(i, 2^{k-1} + i)$ in the modified transition probability matrix clearly satisfy the lumpability constraint and can be aggregated together.
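The averaging of (A.20) followed by the aggregation of each pair $(i, 2^{k-1} + i)$ can be sketched as a pair of array operations. The function names `fold_once` and `fold_to` are hypothetical, and the sketch assumes the input is a $2^k \times 2^k$ FSM transition matrix in the state ordering used above.

```python
import numpy as np


def fold_once(P):
    """Fold a 2**k-state FSM transition matrix into 2**(k-1) states."""
    P = np.asarray(P, dtype=float)
    half = P.shape[0] // 2
    # (A.20): rows i and 2**(k-1) + i are replaced by their average.
    averaged = 0.5 * (P[:half] + P[half:])
    # States j and 2**(k-1) + j now form one aggregate state, so the
    # probabilities of entering either of them are added together.
    return averaged[:, :half] + averaged[:, half:]


def fold_to(P, n_states):
    """Apply fold_once repeatedly until only n_states states remain."""
    P = np.asarray(P, dtype=float)
    while P.shape[0] > n_states:
        P = fold_once(P)
    return P


# 4-state example: states 0 and 2 differ slightly, as do states 1 and 3.
P4 = [[0.9, 0.1, 0.0, 0.0],
      [0.0, 0.0, 0.6, 0.4],
      [0.7, 0.3, 0.0, 0.0],
      [0.0, 0.0, 0.2, 0.8]]
print(fold_once(P4))  # rows average to approximately [0.8, 0.2] and [0.4, 0.6]
```

Each row of the folded matrix again has its non-zero entries at columns $(2i) \bmod 2^{k-1}$ and $(2i+1) \bmod 2^{k-1}$, so the result has the same structure as an FSM matrix and can be folded again.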

Using the above strategy, any $2^k \times 2^k$ FSM transition probability matrix can be modified and folded about $2^{k-1}$ to give a Markov chain with exactly half the number of states. Since the basic transition probability structure is retained after the folding operation, this state reduction procedure can in fact be applied recursively to a $2^k$ state FSM chain to give a $2^m$ state folded Markov chain, where $m$ is an integer such that $1 \le m < k$. We henceforth refer to these models as folded Markov chains (FMCs).

A folded process is a coarse approximation of an FSM chain because folding simply
ensures a non-zero transition probability between two aggregate states. However, the
FSM transition probabilities for the FSM states that are aggregated/grouped together may
be very different. Thus a folded process represents an on-average behavior of the FSM.

This fact will become clear in the performance evaluation section.

At this point, the following question may be raised: How is a $2^m$ state FMC different from a $2^m$ state FSM? A $2^m$ state FSM represents a process with a memory-length of $m$, whereas a $2^m$ state FMC might be a folded version of an FSM with a memory-length greater than $m$. For instance, in the following section we compare the performance of a 64-state FSM with that of a 64-state FMC.

[Figure 17 plot: panels (a) good-bursts and (b) bad-bursts; x-axis: number of states (2 to 512); curves: FSM and FMC.]

Figure 17. Performance of FMCs formed by folding a 512 state FSM to 256, 128, 64, 32, 16, 8, 4 and 2 states; the FSM process is trained using a 5.5 Mbps trace.

While the number of states is the same in both models, the FSM has a memory-length of 6 whereas the FMC is formed by performing three folding operations on an FSM with a memory-length of 9.

A.5.5.3 Evaluation of Folded Markov Chains

We fold the order-9 FSM chain for the 5.5 Mbps process to FMCs having 256, 128,
64, 32, 16, 8, 4 and 2 states. The performance comparison of these FMCs with FSMs of
memory-lengths 2, 3, 4, 5, 6, 7 and 8 is provided in Figure 17. It can be clearly seen that
the FMC performance for any number of states is similar to or worse than the FSM. Thus
the FMCs do not provide any improvement in performance over varying order FSMs.
FMC performance was similar for the 2 Mbps bit-error channel. It can be deduced that capturing only the on-average FSM behavior, as the FMCs do, is not sufficient, and that more statistical characteristics of FSM chains should be incorporated in an effective model.

A.5.6 Complexity Reduction by Approximating an FSM
Chain’s Good- and Bad-Burst Behavior

Now that we have established that lumpability and relaxed versions of it cannot capture the complex wireless bit-error behavior, we focus on analyzing how an FSM chain captures a channel's good- and bad-bursts. To that end, in this section we derive generalized probability distributions of good- and bad-bursts for an FSM chain of arbitrary order. The probability distributions are derived in terms of FSM chain transition and steady-state probabilities. These distributions render useful insights into important FSM characteristics, which are used to develop guidelines for defining FSM state space partitions. (Recall that the objective of the present analysis is to ascertain partitions of the FSM state space. FSM states in a particular partition are then grouped together to form an aggregate state in the low-complexity approximating process.) We want to define the FSM state space partitions such that the resulting aggregate process, while being less complex, closely matches the FSM chain characteristics.

Let $H$ and $S$ denote the state spaces of an FSM chain and an aggregate (approximate) process, respectively. Let $i \in H$ and $S_i \in S$ denote two arbitrary states of the FSM and the approximate process, respectively. From Lemma 1 we have a necessary condition that should be imposed on the aggregate states $S_i$. To simplify notation, from this point forward we drop the $\bmod\ 2^k$ operation (where $k$ is the memory-length) on each FSM chain state. Thus, an FSM state $(i) \bmod 2^k$ is simply written as state $i$. As in previous sections, let $I$ and $B$ denote the good- and bad-bursts random variables, respectively. We want to derive closed-form expressions for the distributions of $I$ and $B$ in terms of FSM chain parameters.

[Figure 18 diagram: the state path starting from the initial state $X_n = 2i + 1$, through $X_{n+1} = (2i+1)2 \bmod 2^k$, $X_{n+2} = (2i+1)2^2 \bmod 2^k$, ..., $X_{n+k-1} = (2i+1)2^{k-1} \bmod 2^k = 2^{k-1}$, $X_{n+k} = 0$, ..., $X_{n+l} = 0$, and ending at $X_{n+l+1} = 1$.]

Figure 18. State transitions of an FSM with memory-length $k$ and a good-burst of length $l \ge k$.

We expect the expressions for the good- and bad-burst random variables to render insights into how an FSM chain captures these random variables. The following theorem states the FSM probability distribution of good-bursts:

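THEOREM 2. The probability distribution of a good-burst of length exactly $l$, $\Pr\{I = l\}$, for an FSM chain of memory-length $k$ is

\[
\Pr\{I = l\} = \sum_{i=0}^{2^{k-1}-1} \pi_{2i+1} \times \mu_i \times \prod_{j=0}^{\min\{k-2,\,l-2\}} p_{(2i+1)2^j,\,(2i+1)2^{j+1}},
\tag{A.21}
\]

$\forall k, l > 0$, where

\[
\mu_i =
\begin{cases}
p_{(2i+1)2^{l-1},\,(2i+1)2^{l}} \times p_{(2i+1)2^{l},\,(2i+1)2^{l+1}+1}, & l < k \\
p_{2^{k-1},0}\,(p_{0,0})^{l-k}\,p_{0,1}, & l \ge k.
\end{cases}
\]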

Proof. Before proceeding with the proof, we recall that the subscripts of all transition and steady-state probabilities are modulo $2^k$. Let us focus on the proof of the $l \ge k$ case, since the proof of the other case is much simpler and follows a similar procedure. Given any current state, a good-burst (i.e., a burst of 0's) will start if the current state has a 1 in the LSB position of the memory-window, i.e., the current state is an odd-numbered FSM state $X_n = 2i + 1$, $0 \le i \le 2^{k-1} - 1$.

Without loss of generality, consider the state path given in Figure 18. For a good-burst of length $l$ starting in the current odd-numbered FSM state, the next $k - 1$ transitions will be $(2i+1), (2i+1)2, (2i+1)2^2, \ldots, (2i+1)2^{k-1}$. Note that $(2i+1)2^{k-1} \bmod 2^k = 2^{k-1}$ and, based on the discussion in Observation 1, the process wraps around to FSM state 0 at this point, i.e., at point $X_{n+k-1} = 2^{k-1}$ the good-burst continues and the process wraps around, $X_{n+k} = 0$. This transition sequence is followed by $l - k$ zero bits, i.e., the next $l - k$ transitions are from state 0 to state 0, giving $X_{n+k} = X_{n+k+1} = X_{n+k+2} = \cdots = X_{n+l} = 0$. The good-burst ends when a one bit is encountered at the $(l+1)$st transition, and the FSM process moves to $X_{n+l+1} = (00 \ldots 01)_2 = (1)_{10}$. This state-transition path, when expressed in the form of probabilities, will have to be summed over all possible odd-valued FSM states,

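\[
\begin{aligned}
\Pr\{I = l\} ={}& \pi_1 \left[ p_{1,(1)2} \times p_{(1)2,(1)2^2} \times \cdots \times p_{(1)2^{k-2},(1)2^{k-1}} \times p_{(1)2^{k-1},0} \times (p_{0,0})^{l-k} \times p_{0,1} \right] \\
&+ \pi_3 \left[ p_{3,(3)2} \times p_{(3)2,(3)2^2} \times \cdots \times p_{(3)2^{k-2},(3)2^{k-1}} \times p_{(3)2^{k-1},0} \times (p_{0,0})^{l-k} \times p_{0,1} \right] \\
&+ \cdots \\
&+ \pi_{2^k-1} \left[ p_{2^k-1,(2^k-1)2} \times p_{(2^k-1)2,(2^k-1)2^2} \times \cdots \times p_{(2^k-1)2^{k-2},(2^k-1)2^{k-1}} \times p_{(2^k-1)2^{k-1},0} \times (p_{0,0})^{l-k} \times p_{0,1} \right].
\end{aligned}
\]

Taking out common terms yields

\[
\Pr\{I = l\} = p_{2^{k-1},0} \times (p_{0,0})^{l-k} \times p_{0,1} \times \sum_{i=0}^{2^{k-1}-1} \pi_{2i+1} \prod_{j=0}^{k-2} p_{(2i+1)2^j,\,(2i+1)2^{j+1}},
\]

which is the same as the expression in Theorem 2 for all $l \ge k$.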
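The state path used in the proof can also be followed programmatically, which gives a simple way to evaluate the sum in Theorem 2 for any burst length. The sketch below is an illustration, not the author's implementation: the function name `good_burst_prob` is hypothetical, `P` is assumed to be a $2^k \times 2^k$ FSM transition matrix stored as a NumPy array, and `pi` is assumed to hold the steady-state probabilities.

```python
import numpy as np


def good_burst_prob(P, pi, l):
    """Evaluate the sum of Theorem 2 for a good-burst of length l >= 1.

    For every odd start state 2i+1 (last received bit corrupted), follow the
    path of l error-free bits and then one corrupted bit, multiplying the
    transition probabilities along the way.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]                      # n = 2**k FSM states
    total = 0.0
    for start in range(1, n, 2):        # odd states 2i+1
        prob, state = pi[start], start
        for _ in range(l):              # l zero bits: state -> (2*state) mod n
            nxt = (2 * state) % n
            prob *= P[state, nxt]
            state = nxt
        # The burst ends with a one bit: state -> (2*state + 1) mod n.
        prob *= P[state, (2 * state + 1) % n]
        total += prob
    return total


# Small 4-state (k = 2) example with made-up probabilities.
P = np.array([[0.95, 0.05, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.3],
              [0.9, 0.1, 0.0, 0.0],
              [0.0, 0.0, 0.6, 0.4]])
pi = np.array([0.85, 0.05, 0.05, 0.05])
print(good_burst_prob(P, pi, l=3))
```

For $l \ge k$ every path passes through state $2^{k-1}$ and then state 0, which is why the factor $p_{2^{k-1},0}\,(p_{0,0})^{l-k}\,p_{0,1}$ can be taken out of the sum.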

The good-bursts probability distribution given above can be explained as follows. Let $n$ denote the discrete time index at which a good-burst starts. The last bit received before the good-burst must be a corrupted bit, i.e., $x(n-1) = 1$. Thus, at time instance $n - 1$ the FSM chain's memory-window had a "1" at the LSB position. In other words, the FSM chain was in an odd state, i.e., $X_{n-1} = 2i + 1$, where $0 \le i \le 2^{k-1} - 1$. For the good-bursts probability distribution, we have to account for (or sum over) all the odd states of the FSM chain. This fact explains the $\pi_{2i+1}$'s in the additive expression of (A.21). For a good-burst of $l$ bits, the $l$ bits following $x(n-1) = 1$ must be error-free, i.e., $x(n) = 0, x(n+1) = 0, \ldots, x(n+l-1) = 0$. This results in the multiplicative expression following each $\pi_{2i+1}$. Thus the multiplicative expression characterizes the state transition path for $l$ good bits starting in FSM state $2i + 1$. Since the good-burst ends after $l$ bits, the $(l+1)$-th bit must be corrupted, i.e., $x(n+l) = 1$. The $\mu_i$ expression characterizes the transition on the $(n+l)$-th step depending on whether the total burst-length is smaller or longer than the memory-window.

Similar to Theorem 2, the probability distribution of a bad-burst of length l is given
in the following theorem:

THEOREM 3. The probability distribution of a bad-burst of length exactly $l$, $\Pr\{B = l\}$, for an FSM chain of memory-length $k$ is

\[
\Pr\{B = l\} = \sum_{i=0}^{2^{k-1}-1} \pi_{2i} \times \mu_i \times \prod_{j=0}^{\min\{k-2,\,l-2\}} p_{(2i+1)2^j-1,\,(2i+1)2^{j+1}-1},
\tag{A.22}
\]

$\forall k, l > 0$, where

\[
\mu_i =
\begin{cases}
p_{(2i+1)2^{l-1}-1,\,(2i+1)2^{l}-1} \times p_{(2i+1)2^{l}-1,\,(2i+1)2^{l+1}-2}, & l < k \\
p_{2^{k-1}-1,\,2^k-1} \times (p_{2^k-1,\,2^k-1})^{l-k} \times p_{2^k-1,\,2^k-2}, & l \ge k.
\end{cases}
\]

Proof of this theorem is skipped because it is very similar to the proof of Theorem 2.
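Although the proof is omitted, the sum in Theorem 3 can be evaluated by following the mirrored state path: start in an even state, append $l$ one bits, then a zero bit. The sketch below makes the same assumptions as any numerical illustration of these formulas (a NumPy transition matrix `P` and a steady-state vector `pi`); the name `bad_burst_prob` is hypothetical.

```python
import numpy as np


def bad_burst_prob(P, pi, l):
    """Evaluate the sum of Theorem 3 for a bad-burst of length l >= 1."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]                      # n = 2**k FSM states
    total = 0.0
    for start in range(0, n, 2):        # even states 2i: last bit error-free
        prob, state = pi[start], start
        for _ in range(l):              # l one bits: state -> (2*state+1) mod n
            nxt = (2 * state + 1) % n
            prob *= P[state, nxt]
            state = nxt
        # The burst ends with a zero bit: state -> (2*state) mod n.
        prob *= P[state, (2 * state) % n]
        total += prob
    return total


# Small 4-state (k = 2) example with made-up probabilities.
P = np.array([[0.95, 0.05, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.3],
              [0.9, 0.1, 0.0, 0.0],
              [0.0, 0.0, 0.6, 0.4]])
pi = np.array([0.85, 0.05, 0.05, 0.05])
print(bad_burst_prob(P, pi, l=2))
```

Starting from state $2i$, the $j$-th state on this path is $(2i+1)2^j - 1$, which matches the subscripts in (A.22).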

The expressions for the good- and bad-burst probability distributions given in (A.21) and (A.22) are rather convoluted. Hence, in their present forms, these expressions neither offer any obvious insight into the FSM chain behavior nor are they amenable to further analysis. In the following section, we employ Observation 2 to simplify the probability distribution expressions of (A.21) and (A.22). The simplification in turn leads us to the design guidelines that should be followed by a low-complexity model.

A.5.6.1 Simpliﬁcation of Good-bursts Distribution

We know from Observation 2 that the steady-state probability of FSM state 0 is very high. Consequently, the steady-state probabilities of the odd FSM states in the good-bursts expression of (A.21) are negligible. The terms involving a transition to or from state 0 of the FSM will hence dominate the good-burst probability distribution of (A.21). Moreover, since the channel usually stays in the good state for practical wireless networks, the good-burst length should in general be significantly greater than the memory-length. Hence, an effective good-bursts probability distribution $\Pr\{I = l\}$ should accurately capture the $l \ge k$ behavior. An approximation of the good-bursts probability distribution of (A.21) for $l \ge k$ can be written as:

\[
\Pr\{I = l\} \approx p_{2^{k-1},0}\,(p_{0,0})^{l-k}\,p_{0,1}, \quad \forall\, l \ge k > 0.
\tag{A.23}
\]

Although the above expression is an approximation of the FSM chain's good-bursts probability distribution, it is clearly more insightful. For instance, note that the parameter characterizing this approximate probability distribution is the probability of a good bit transmission followed by another good bit transmission, $p_{0,0}$, since this is the only parameter in (A.23) that involves the good-burst-length, $l$. Hence, one important consideration while grouping FSM states should be that the all-zero (i.e., no-error) FSM state is not grouped with a large number of other states. This is a natural consequence of Observation 2, which implies that the mean time spent in the all-zero (i.e., no-error) FSM state is significantly higher than in all other FSM states.

Similarly, in addition to the FSM state 0, two other important FSM states are state $2^{k-1}$ and state 1, since $p_{2^{k-1},0}$ and $p_{0,1}$ are the only parameters, other than $p_{0,0}$, that appear in the approximate probability distribution given in (A.23). Hence, due to their relative importance in describing real-life wireless channels, a good model, in addition to FSM state 0, should not group FSM states 1 and $2^{k-1}$ with too many other states. This guideline will be employed to define the constant-complexity model.
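As a short check on how the three parameters of (A.23) interact, the following worked step is derived here from (A.23) and Observation 1 and is not taken from the thesis. Since row 0 of the FSM matrix has only the two entries $p_{0,0}$ and $p_{0,1}$, we have $p_{0,1} = 1 - p_{0,0}$, so the approximation is geometric in $l$:

```latex
\[
\sum_{l=k}^{\infty} p_{2^{k-1},0}\,(p_{0,0})^{l-k}\,p_{0,1}
  = p_{2^{k-1},0}\,\frac{p_{0,1}}{1 - p_{0,0}}
  = p_{2^{k-1},0},
\qquad
\frac{\Pr\{I = l+1\}}{\Pr\{I = l\}} \approx p_{0,0} \quad (l \ge k).
\]
```

Under this reading, $p_{0,0}$ alone sets how slowly the tail of the good-burst distribution decays, while $p_{2^{k-1},0}$ scales the total mass assigned to bursts of length $l \ge k$.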

A.5.6.2 Simplification of Bad-bursts Distribution

For the bad-bursts probability distribution of (A.22), we again invoke Observation 2 and neglect the terms in (A.22) that are not multiplied by $\pi_0$. Using this approximation, the bad-bursts distribution (A.22) can be written as:

\[
\Pr\{B = l\} \approx \pi_0\,\mu_0 \prod_{j=0}^{\min\{k-2,\,l-2\}} p_{2^j-1,\,2^{j+1}-1},
\tag{A.24}
\]

where $\mu_0$ is the $i = 0$ instance of $\mu_i$ in (A.22); for $l \ge k$, $\mu_0 = p_{2^{k-1}-1,\,2^k-1} \times (p_{2^k-1,\,2^k-1})^{l-k} \times p_{2^k-1,\,2^k-2}$.

The only terms appearing in (A.24) after the approximation involve FSM states 0, $2^k - 2$, and $2^j - 1$, for any $1 \le j \le k$. From Observation 2 and the good-bursts approximation, we have already established that FSM state 0 should not be aggregated with many other states. This deduction is reasserted here. Moreover, it is preferable not to aggregate FSM state $2^k - 2$ with many other states. Also, if possible, the FSM states $2^j - 1$, where $1 \le j \le k$, should not be grouped with too many other states.

A.5.6.3 Guidelines for Approximating an FSM chain

Based on the analyses of previous sections, we now define guidelines that should be followed to develop partitions on the FSM state space. FSM states in each partition are then aggregated to give a low-complexity aggregate model. The FSM state aggregation procedure is based on the underlying assumption that there is a given complexity budget. That is, the required number of states in the aggregate model is specified beforehand. FSM state aggregation should result in a model which has the required number of states.

Given a complexity budget in the form of the total number of states and based on preceding discussions, we define the following guidelines that should be followed to develop an aggregate model with total number of states satisfying the complexity budget:

Guideline 1: Any FSM chain state aggregation should satisfy the condition given in Lemma 1.

Guideline 2: FSM state 0 should not be aggregated with other states.

Guideline 3: FSM states $2^{k-1}$ and 1 should be aggregated with a minimal number of other states.

Guideline 4: FSM states $2^k - 2$ and $2^j - 1$, for all $1 \le j \le k$, should be aggregated with a minimal number of other states.

Note that Guideline 1 and Guideline 2 are more assertive than Guideline 3 and Guideline 4. This is due to the analysis provided in the previous section, which outlined that: (i) Guideline 1 is necessary for an accurate model, and (ii) Guideline 2, which is a consequence of Observation 2, is asserted by the approximate distributions of both good- and bad-bursts. Also note that Guideline 1, Guideline 2, and Guideline 3 can be easily satisfied in a low-complexity model. However, Guideline 4 is somewhat problematic because putting each $2^j - 1$ FSM state, for all $1 \le j \le k$, in a separate partition (i.e., separate aggregate state) makes the total number of states of the approximate model an increasing function of the memory-length $k$. Thus, satisfying Guideline 4 implies that the resultant complexity (i.e., number of states) of the aggregate model will at least be a linear function of the memory-length. We, on the other hand, want to keep the number of states in the model independent of the underlying process' memory-length. In the following section, we develop a constant-complexity model which adheres to the first three guidelines. Performance evaluation of the model for 802.11b channels demonstrates that although the proposed model ignores Guideline 4, it approximates an FSM chain's behavior with outstanding accuracy.

A.5.7 Constant-Complexity Model

In this section, we propose a constant-complexity model (CCM) which adheres to
Guideline 1, Guideline 2, and Guideline 3. Here, it should be emphasized that the FSM
state space partitioning presented in this section is only one of the many possible state
assignments. Future low-complexity channel models can deﬁne other state partitions

which should perform adequately as long as the above guidelines are followed.

The CCM keeps FSM states 0, 1 and $2^{k-1}$ each in a separate partition, while grouping all the remaining FSM states into two partitions. The resulting model always has 5 states irrespective of the memory-length. The structure and transition possibilities of the CCM are illustrated in Figure 19. As Figure 19 shows, the CCM assigns separate states to FSM states 0, 1 and $2^{k-1}$, thereby adhering to Guideline 2 and Guideline 3.

[Figure 19 diagram: five boxes, one per aggregate CCM state, connected by the allowed transitions.]

Figure 19. State aggregation and transitions for the CCM. Each box represents an aggregate CCM state. The number(s) inside a CCM state are the aggregated FSM states.

All remaining even FSM states are grouped in a single aggregate CCM state, while all remaining odd FSM states are grouped in another aggregate state. Note that none of the CCM states contains both an odd and an even FSM state, i.e., an aggregate state contains either even FSM states or odd FSM states. Thus Guideline 1, which states that FSM states $2i$ and $2i + 1$ should not be aggregated together, is also satisfied by the CCM. Based on our analysis, this 5-state CCM should follow the behavior of the underlying $2^k$ state FSM quite closely. The efficacy of the CCM is highlighted in the next section, where we compare its performance with FSM and linear-complexity models.
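The CCM state assignment described above, together with the Lemma 1 check behind Guideline 1, can be sketched as follows. The function names `ccm_partition` and `satisfies_lemma1` are hypothetical, and the restriction to $k \ge 3$ is an assumption of this sketch, made so that all five aggregate states are non-empty; the thesis does not state how shorter memory-lengths are handled.

```python
def ccm_partition(k):
    """Return the five CCM aggregate states for memory-length k."""
    if k < 3:
        raise ValueError("this sketch assumes k >= 3 so that all five blocks are non-empty")
    n = 1 << k                  # 2**k FSM states
    top = 1 << (k - 1)          # FSM state 2**(k-1)
    evens = [s for s in range(2, n, 2) if s != top]   # remaining even states
    odds = list(range(3, n, 2))                       # remaining odd states
    return [[0], [1], [top], evens, odds]


def satisfies_lemma1(partition, k):
    """Check that states 2t and 2t+1 never share an aggregate state."""
    block_of = {s: b for b, block in enumerate(partition) for s in block}
    return all(block_of[2 * t] != block_of[2 * t + 1] for t in range(1 << (k - 1)))


print(ccm_partition(3))                       # [[0], [1], [4], [2, 6], [3, 5, 7]]
print(satisfies_lemma1(ccm_partition(3), 3))  # True
```

Because every block holds only even or only odd FSM states, the Lemma 1 check passes for any memory-length, while the number of blocks stays at five.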

A.5.7.1 Performance of the CCM at 2 Mbps

We provide an ENK based performance comparison between the 548-state FSM and the 5-state CCM for memory-lengths ranging from 1 up to 10 in Figure 20 and Figure 21. We also compare performance with the previously proposed short-term energy model (SEM) and zero-crossing model (ZCM) [29]. These two models constrain the complexity to increase linearly with the memory-length. The performance of the 548-state FSM model serves as the benchmark for evaluating the CCM, SEM and ZCM.

[Figure 20 plot: panels (a) good-bursts and (b) bad-bursts; x-axis: number of states (log scale).]

Figure 20. ENK based modeling performance versus complexity for the 2 Mbps bit-error process.

[Figure 21 plot: panels (a) good-bursts and (b) bad-bursts; x-axis: memory-length.]

Figure 21. ENK based modeling performance versus memory-length for the 2 Mbps bit-error process.

The longest memory-length of 10 yields a 548-state FSM, an 11-state SEM, a 10-state ZCM and a 5-state CCM. Let us first focus on Figure 20, which plots performance versus complexity for FSM chains, CCM, SEM and ZCM. Although all memory-lengths from 1 up to 10 were evaluated, to show the results clearly this figure only plots the ENK values for a certain number of states.

Due to the fixed CCM complexity, only the ENK performance of one CCM (corresponding to a memory-length of 8) is shown in Figure 20. This particular CCM was chosen since it rendered the best overall performance. The performance of CCM models for the remaining memory-lengths will be discussed shortly. It is clear from Figure 20 that for the good-bursts random variable the CCM performs as well as the 548-state FSM. For the same complexity as the CCM (i.e., 5 states), the linear-complexity models have higher ENK overhead. However, the performance of higher order linear-complexity (SEM and ZCM) models is reasonable. Hence, it can be deduced that the CCM captures the good-bursts behavior of the 2 Mbps wireless MAC layer channel accurately and with fewer states than any other model under consideration. Similarly, Figure 20 shows that the CCM ENK overhead for the bad-bursts random variable is also very small and is quite comparable to the corresponding FSM, SEM and ZCM. Specifically, the CCM incurs an ENK overhead of 0.053 as opposed to 0.018 for the 4-state FSM, 0.039 for the 5-state SEM and 0.0386 for the 5-state ZCM.

Figure 21 provides further insight into the performance of CCM for different
memory-lengths. From Figure 21 it can be observed that the CCM performance for all
orders is better than the FSM model, SEM and ZCM for the good-bursts random variable.
In case of the bad-bursts random variable, the performance of all models with memory-
lengths greater than 3 is comparable. The CCM performance for small orders is better
than the linear-complexity models. For high orders, while both linear- and constant-
complexity models have slightly greater overhead than the FSM model, the CCM
performance is comparable to its linear-complexity counterparts.

The ENK divergence highlights that the CCM provides an accurate and low-complexity bit-error model for 802.11b LANs operating at 2 Mbps. This performance substantiates our initial analysis which outlined that a 5-state CCM can render a performance that is comparable to the respective $2^k$-state FSM chain. As shown in [29], the linear-complexity (SEM and ZCM) models also yield very good ENK based performances.

[Figure 22 plot: ENK versus number of states (logscale) for FSM, CCM (memory-length = 6), SEM and ZCM; panels (a) good-bursts and (b) bad-bursts.]
Figure 22. ENK based modeling performance versus complexity for the 5.5 Mbps bit-error process.

[Figure 23 plot: ENK versus memory-length; panels (a) good-bursts and (b) bad-bursts.]
Figure 23. ENK based modeling performance versus memory-length for the 5.5 Mbps bit-error process.

A.5.7.2 Performance of the CCM at 5.5 Mbps

ENK based performances of the FSM chains, CCM, SEM and ZCM at 5.5 Mbps are outlined in Figure 22 and Figure 23. Only a CCM with a memory-length of 6 is shown in Figure 22 since it renders the best overall (good- and bad-bursts) performance. It is clear from Figure 22 that the CCM performance for the good-bursts random variable is comparable to or better than all other modeling techniques. Note however that the ZCM performs slightly better than the CCM. Thus the CCM and ZCM, even at low orders, capture the good-bursts behavior of the 5.5 Mbps channel very accurately. Similarly, Figure 22 shows that the CCM ENK overhead for the bad-bursts random variable is also very small. Figure 23 outlines the performance rendered by CCMs corresponding to different memory-lengths. From Figure 23 it can be observed that the CCM performance for all orders is better than or comparable to the FSM, SEM and the ZCM for the good-bursts random variable. In case of the bad-bursts random variable, the performances of all the models except the SEM are similar.

Thus, while keeping both complexity and modeling performance under consideration, the ENK divergence asserts that the CCM outperforms its linear-complexity counterparts in modeling of the 802.11b bit-errors at 5.5 Mbps.

A.5.8 Discussion

At this point, we have developed accurate and low-complexity models for the wireless bit-error channels under consideration. In the following chapters, we explore the application and usefulness of these models. Specifically, the next chapter uses these models in a novel wireless multimedia framework. The last contribution chapter of this part quantifies the inaccuracies that are incurred if channel memory is ignored and a low-order FSM model is used to simulate and analyze wireless systems.

CHAPTER A.6 CHANNEL MODEL BASED
HEADER ESTIMATION FOR WIRELESS
MULTIMEDIA

Wireless channels incur unpredictable and time-varying packet losses due to channel
interference and node mobility. This data loss is particularly detrimental for real-time
communications since their delay constraints generally do not allow retransmission-based
recovery of lost packets. Consequently, recent multimedia standards have introduced
enhanced error resilience and concealment features (e.g., slices in JVT/H.264 [83] and
reversible VLC in MPEG-4 [84]) to cater for bandwidth-constrained and error-prone
wireless channels. Distortion in multimedia quality at a wireless receiver can be
substantially decreased if corrupted packets, instead of being dropped, are relayed to the
multimedia application. The application can then decide to retain, drop or recover the
corrupted packets.

To improve packet throughput at a wireless receiver, enhanced robustness is provided at the physical layer of emerging wireless protocol stacks. Nevertheless, residual/MAC-to-MAC errors not corrected by the physical layer cause checksum failures at higher (MAC and transport) layers, leading to a significant number of packet drops. The UDP-Lite protocol was proposed to address this problem [41]-[44]. As explained in Section A.2.2, the proposed UDP-Lite based transport schemes ignore errors in the application layer payload, but drop all packets that have one or more bit-errors in the IP, the UDP, or the application layer headers.

It has been shown that UDP-Lite based partial protection with application layer forward error correction (FEC) improves wireless bandwidth utilization [41]-[51]. Support of partial protection necessitates changes to the standard protocols at the multimedia transmitter and/or intermediate network nodes. In many realistic scenarios, modifications to multimedia servers and/or intermediate nodes cannot be dictated by the end-receivers. We argue that the requirement of transmitter modifications in UDP-Lite has hampered its widespread deployment. Furthermore, frequent header errors result in significant packet drops for UDP-Lite, especially at high data rates (footnote 2).

UDP-Lite's shortcomings can be addressed by a receiver-based scheme that, in addition to ignoring payload errors, can estimate corrupted header fields. For such a header estimation scheme to be practical, modifications below the application layer should only be made to the wireless receiver. Thus no additional information (such as FEC redundancy) is available for header estimation at the receiver. However, the corrupted payloads relayed to the receiver's application layer by a header estimation scheme can and should be corrected using application layer FEC decoding. In this chapter, we propose a cross-layer header estimation methodology that employs the MAC layer bit-error channel models developed in the previous chapters to estimate the corrupted headers of a packet.

Before outlining the actual header estimation methodology, we derive and present analytical conditions for the region-of-operation under which header estimation performs better/worse than UDP and UDP-Lite. We show that for any realistic wireless system, the FEC redundancy required by header estimation is always lower than that required by the UDP and UDP-Lite protocols. Since FEC is generally performed on a byte-level, analysis is provided for an arbitrary symbol size with the implicit assumption that the symbol size is greater than one bit. We demonstrate the efficacy of header estimation for two important classes of symbol-level wireless channels: symmetric/memory-less channels and Gilbert channels. We show that an ideal header estimation scheme can provide redundancy reduction (or goodput improvement) of up to 75% over UDP and UDP-Lite.

2 In [18] the authors showed that under realistic settings of an 802.11b network, packets dropped by a UDP-Lite based protocol stack are 5.87% and 36.7% at 5.5 and 11 Mbps, respectively.

The analysis in the first part of this chapter serves as a motivation to develop a practical, effective and accurate header estimation framework to improve wireless multimedia quality. We propose a header estimator that can use the accurate MAC layer bit-error channel models developed in the preceding chapter to estimate the corrupted critical header fields (CHF) of a packet, while non-critical header fields are simply ignored. At a header estimation-based UDP multimedia receiver, the most likely transmitted CHF is estimated through channel parameters. The proposed scheme requires no modifications to the standard protocols at senders and/or intermediate nodes. Only minor protocol stack modifications are needed at the receiver. We map header estimation to a problem of maximum-likelihood (ML) estimation of known parameters in noise [79]. We derive likelihood functions for an arbitrary-order full-state Markov chain model and a multifractal wavelet model [61]-[63]. The FSM likelihood function is extended to provide the likelihood under the constant-complexity model. Trace-driven video simulations at varying data rates of an 802.11b LAN show that the proposed scheme provides significantly better throughput and multimedia quality than normal UDP and UDP-Lite.

A.6.1 FEC Redundancy Lower Bounds for UDP, UDP-Lite and Header Estimation

In this section, we derive theoretical bounds on the improvements provided by an ideal header estimation scheme with application layer FEC operating on a q-ary symmetric channel (SC) and a Gilbert channel (GC). Throughout this section, we consider a MAC layer channel which sends and receives symbols of size $m$ bits. We assume that this symbol size is equal to the FEC symbol size. Since FEC is generally not performed on the bit-level, for the following theoretical analysis we assume that $m > 1$.

The term "ideal header estimation" implies that all corrupted packets intended for a receiver are passed to its application layer. We derive lower bounds on the expected amount of FEC redundancy required to successfully decode one FEC block. Naturally, we want the amount of redundancy to be as low as possible for efficient utilization of scarce wireless bandwidth. The bounds derived in this section answer the following question: Under what conditions does header estimation require less FEC redundancy for payload correction than UDP and UDP-Lite?

As mentioned before, we assume that the transmitter packetizes and transmits symbols of arbitrary size $m$, where $m$ is also the FEC symbol size. A block-based maximum distance separable (MDS) FEC scheme capable of simultaneously correcting errors and erasures operates at the transmitter and receiver application layers. The transmitter packetizes each FEC block into $l$ packets, with each packet having a data payload of $n_D$ symbols. Before transmitting each packet, a header of size $n_H$ is appended to the packet. Thus each packet has a fixed length of $n_H + n_D$ symbols. The FEC algorithm only protects the data symbols, and hence the FEC block-length is $n_D l$ symbols. A total of $r$ out of the $n_D l$ symbols are redundant. A packet dropped by a protocol below the application layer is treated as a packet erasure by the FEC decoder. Since the FEC decoder is operating at symbol level, each packet erasure will result in $n_D$ symbol erasures. We are assuming that before decoding, the FEC decoder can identify missing packets or packet erasures in an FEC block. This can, for example, be achieved by transmitting an FEC-protected sequence number in each packet.

Let $X$ and $Y$ be two random variables which respectively characterize the number of errors and erasures observed at the wireless receiver before FEC decoding. An MDS code can recover all errors $X$ and erasures $Y$ if $2X + Y \le r$ [80].
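The decodability condition can be written as a one-line check. The following Python sketch is only an illustration of the inequality $2X + Y \le r$; the function names `min_redundancy` and `mds_can_decode` are hypothetical and are not taken from the original work.

```python
def min_redundancy(errors: int, erasures: int) -> int:
    """Smallest r for which an MDS code recovers X errors and Y erasures (2X + Y <= r)."""
    if errors < 0 or erasures < 0:
        raise ValueError("counts must be non-negative")
    return 2 * errors + erasures


def mds_can_decode(errors: int, erasures: int, r: int) -> bool:
    """True when r redundant symbols are enough, i.e. 2X + Y <= r."""
    return min_redundancy(errors, erasures) <= r


if __name__ == "__main__":
    # Ten corrupted symbols in relayed packets cost 2 * 10 = 20 redundant symbols.
    print(min_redundancy(errors=10, erasures=0))           # 20
    # One dropped packet with a 452-symbol payload costs 452 redundant symbols.
    print(min_redundancy(errors=0, erasures=452))          # 452
    print(mds_can_decode(errors=10, erasures=452, r=472))  # True
```

The asymmetry is the point of the analysis that follows: an error costs two redundant symbols and an erasure costs one, but a single dropped packet produces a whole payload of erasures.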

A.6.1.1 Redundancy Bounds on the q-ary Symmetric
Channel

The inputs and outputs of a q-ary symmetric channel (SC) are derived from an alphabet of $q = 2^m$ symbols. An SC is characterized by a single parameter $p$, the probability that a transmitted symbol $x_j$ is received as $x_i$, $i \ne j$:

$$p = \Pr\{x_i \text{ is received} \mid x_j \text{ is transmitted}\}, \quad \text{for } i \ne j.$$

The overall probability of a symbol $x_j$ being corrupted over an SC is:

$$p_{SC} = \Pr\{\text{symbol error}\} = (2^m - 1)\,p. \tag{A.25}$$

We now derive FEC redundancy lower bounds on UDP, UDP-Lite and header estimation based protocol stacks operating on an SC.
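As a sanity check on (A.25), the q-ary symmetric channel can be simulated directly. The sketch below is a hypothetical illustration, not the author's simulator: `transmit_symbol` and `empirical_symbol_error_rate` are names introduced here, and symbols are represented as integers in the range 0 to q - 1.

```python
import random


def transmit_symbol(symbol: int, p: float, m: int, rng: random.Random) -> int:
    """Send one m-bit symbol over a q-ary symmetric channel (q = 2**m).

    Each of the q - 1 wrong symbols is received with probability p, so the
    symbol is corrupted with total probability (q - 1) * p.
    """
    q = 2 ** m
    p_sc = (q - 1) * p
    if not 0.0 <= p_sc <= 1.0:
        raise ValueError("(2**m - 1) * p must lie in [0, 1]")
    if rng.random() >= p_sc:
        return symbol                       # received correctly
    wrong = rng.randrange(q - 1)            # uniform over the q - 1 wrong symbols
    return wrong if wrong < symbol else wrong + 1


def empirical_symbol_error_rate(p: float, m: int, trials: int, seed: int = 0) -> float:
    """Fraction of corrupted symbols over a number of independent transmissions."""
    rng = random.Random(seed)
    q = 2 ** m
    errors = 0
    for _ in range(trials):
        sent = rng.randrange(q)
        errors += transmit_symbol(sent, p, m, rng) != sent
    return errors / trials


if __name__ == "__main__":
    # With m = 2 (q = 4) and p = 0.05, (A.25) gives p_SC = 3 * 0.05 = 0.15.
    print(empirical_symbol_error_rate(p=0.05, m=2, trials=20000))
```

The empirical rate should fluctuate around $(2^m - 1)p$; the exact value depends on the random seed.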


A.6.1.1.1 FEC Redundancy Bound on a UDP based Protocol Stack

Traditional wireless protocol stacks perform a checksum on the entire packet and drop all packets that fail the checksum. While the checksum is generally performed at both UDP and MAC layers, for simplicity and brevity, we refer to a protocol stack that drops all corrupted packets as a UDP protocol stack. Throughout this chapter, dropped packets are treated as erasures by the wireless receiver's application layer FEC decoder; each dropped packet results in $n_D$ erased symbols. Since UDP drops all corrupted packets, the number of errors in the received data is always equal to zero, $\Pr\{X = 0 \mid \text{UDP}\} = 1$. In this section, we derive an expression for the expected value of the number of erasures, $Y$, observed with the UDP protocol.

For UDP, an $n_D$-symbol erasure will occur whenever a received packet has one or more symbol-errors. Let $\epsilon_{udp,SC}$ denote the probability of observing a UDP packet erasure over an SC:

$$\epsilon_{udp,SC} = 1 - (1 - p_{SC})^{n_H + n_D},$$

where $p_{SC}$ is the probability of symbol error given in (A.25). The probability of having $k$ packet erasures over UDP is:

$$\Pr\{k \text{ pkt eras} \mid \text{UDP}\} = \binom{l}{k} (\epsilon_{udp,SC})^{k} (1 - \epsilon_{udp,SC})^{l-k},$$

where $l$ is the total number of packets containing one FEC block. Then the expected value of packet erasures is

$$E\{\#\text{ of pkt eras} \mid \text{UDP}\} = l\,\epsilon_{udp,SC}$$

$$\Rightarrow E\{\#\text{ of symbol eras} \mid \text{UDP}\} = E\{Y \mid \text{UDP}\} = n_D l\,\epsilon_{udp,SC}.$$

Since $\Pr\{X = 0 \mid \text{UDP}\} = 1$, $E\{X \mid \text{UDP}\} = 0$. Thus the average amount of redundancy required by an FEC decoder operating on a UDP protocol stack is

$$r_{udp,SC} \ge n_D l\,\epsilon_{udp,SC} = n_D l\left[1 - (1 - p_{SC})^{n_H + n_D}\right]. \tag{A.26}$$
A.6.1.1.2 FEC Redundancy Bound on a UDP-Lite based Protocol Stack

Since a UDP-Lite protocol stack drops all packets that have header errors, the probability of UDP-Lite packet erasures over an SC is $\epsilon_{udplite,SC} = \Pr\{\text{corrupt hdr}\} = 1 - (1 - p_{SC})^{n_H}$. Consequently, the expected number of UDP-Lite erasures is

$$E\{\text{pkt eras} \mid \text{UDPLite}\} = l\,\epsilon_{udplite,SC}$$

$$\Rightarrow E\{\text{symbol eras} \mid \text{UDPLite}\} = E\{Y \mid \text{UDPLite}\} = n_D l\,\epsilon_{udplite,SC} = n_D l\left[1 - (1 - p_{SC})^{n_H}\right].$$

In addition to erasures, a UDP-Lite protocol stack will also have errors in the application layer payload. The probability of having $k$ symbol errors in the total $h = n_D l - E\{Y \mid \text{UDPLite}\}$ symbols received at the FEC decoder is

$$\Pr\{k \text{ symbol errs} \mid \text{UDPLite}\} = \Pr\{X = k \mid \text{UDPLite}\} = \binom{h}{k} (p_{SC})^{k} (1 - p_{SC})^{h-k}.$$

The expected number of symbol errors is

$$E\{\text{symbol errs} \mid \text{UDPLite}\} = E\{X \mid \text{UDPLite}\} = h\,p_{SC} = \left(n_D l - E\{Y \mid \text{UDPLite}\}\right) p_{SC} = n_D l\,(1 - p_{SC})^{n_H}\,p_{SC}.$$

Thus the total expected redundancy required to recover the errors and losses in an FEC block over a UDP-Lite protocol stack is

$$r_{udplite,SC} \ge n_D l\,\epsilon_{udplite,SC} + 2E\{X \mid \text{UDPLite}\} = n_D l\left[1 - (1 - p_{SC})^{n_H} + 2(1 - p_{SC})^{n_H} p_{SC}\right]$$

$$r_{udplite,SC} \ge n_D l\left[1 - (1 - p_{SC})^{n_H}(1 - 2p_{SC})\right]. \tag{A.27}$$

A.6.1.1.3 FEC Redundancy Bound on a Header Estimation based Protocol Stack

Under an ideal header estimation protocol stack, there are no erasures since all packets are passed to the FEC decoder regardless of whether there are errors in the headers or payload. That is, $\Pr\{Y = 0 \mid \text{HdrEst}\} = 1 \Rightarrow E\{Y \mid \text{HdrEst}\} = 0$. Based on previous derivations, the expected number of symbol errors is $E\{X \mid \text{HdrEst}\} = n_D l\,p_{SC}$. Thus the total expected amount of redundancy required by an ideal header estimation scheme that passes all packets to the FEC decoder is

$$r_{hdrest,SC} \ge 2 n_D l\,p_{SC}. \tag{A.28}$$

A.6.1.1.4 Comparison of the FEC Redundancy Bounds

We now compare the minimum expected FEC redundancy of UDP and UDP-Lite with header estimation. Let us first compare the minimum redundancies of UDP-Lite and header estimation:

$$\min(r_{udplite,SC}) - \min(r_{hdrest,SC}) = n_D l\left[1 - (1 - p_{SC})^{n_H}(1 - 2p_{SC}) - 2p_{SC}\right] = n_D l\left[\left(1 - (1 - p_{SC})^{n_H}\right)(1 - 2p_{SC})\right].$$

Clearly, $\min(r_{udplite,SC}) - \min(r_{hdrest,SC}) > 0$ when $p_{SC} < 0.5$. Thus

$$\min(r_{udplite,SC}) > \min(r_{hdrest,SC}) \text{ when } p_{SC} < 0.5. \tag{A.29}$$

The condition $p_{SC} < 0.5$ is true for any realistic wireless channel, and therefore in all practical wireless environments header estimation should always require less FEC redundancy than UDP-Lite. In fact, on most wireless channels, $p_{SC} \ll 0.5$.

Now let us compare the minimum redundancy of header estimation with UDP:

$$\min(r_{udp,SC}) - \min(r_{hdrest,SC}) = n_D l\left[1 - (1 - p_{SC})^{n_H + n_D} - 2p_{SC}\right] = n_D l\left[2(1 - p_{SC}) - \left(1 + (1 - p_{SC})^{n_H + n_D}\right)\right].$$

It can be easily shown that:

$$\min(r_{udp,SC}) > \min(r_{hdrest,SC}) \text{ when } p_{SC} < 0.49 \text{ and } (n_H + n_D) \ge 6. \tag{A.30}$$

In accordance with prior discussions, we know that the $p_{SC} < 0.49$ condition is true for any realistic wireless channel. Also, the size of a wireless packet (headers included), $n_H + n_D$, is always greater than 6. For instance, in 802.11b networks, even without any payload data, the total size of MAC, IP and UDP headers is 60 bytes.
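The three symmetric-channel bounds are easy to evaluate side by side. The sketch below is a hypothetical helper (the name `redundancy_fractions_sc` is introduced here) that returns each bound divided by the block length $n_D l$, so the values can be read as fractions of the FEC block; it assumes the expressions (A.26), (A.27) and (A.28) exactly as written above.

```python
def redundancy_fractions_sc(p_sc: float, n_h: int, n_d: int) -> dict:
    """Minimum expected redundancy per block symbol, r / (n_D * l), on a q-ary SC."""
    keep_hdr = (1.0 - p_sc) ** n_h            # Pr{header is error-free}
    keep_pkt = (1.0 - p_sc) ** (n_h + n_d)    # Pr{whole packet is error-free}
    return {
        "udp": 1.0 - keep_pkt,                            # (A.26)
        "udplite": 1.0 - keep_hdr * (1.0 - 2.0 * p_sc),   # (A.27)
        "hdrest": 2.0 * p_sc,                             # (A.28)
    }


if __name__ == "__main__":
    # Packet dimensions follow the setting used in this section's figures.
    for p_sc in (0.001, 0.01, 0.05):
        fractions = redundancy_fractions_sc(p_sc, n_h=60, n_d=452)
        print(p_sc, {name: round(100 * v, 2) for name, v in fractions.items()})

    # Boundary behaviour behind (A.30): at p_SC = 0.49 the UDP bound exceeds the
    # header estimation bound for a 6-symbol packet but not for a 5-symbol one.
    six = redundancy_fractions_sc(0.49, n_h=3, n_d=3)
    five = redundancy_fractions_sc(0.49, n_h=2, n_d=3)
    print(six["udp"] > six["hdrest"], five["udp"] > five["hdrest"])
```

Because the block length cancels out, the ordering of the three schemes depends only on $p_{SC}$, $n_H$ and $n_D$.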

[Figure 24 plot: redundant FEC symbols (%) versus probability of symbol error, $p_{SC}$.]
Figure 24. Minimum expected FEC redundancies of UDP, UDP-Lite and Ideal Header Estimation over a q-ary symmetric channel; $m = 8$, $q = 256$, $L = 30$, $n_H = 60$, $n_D = 452$.

Figure 24 plots the minimum expected FEC redundancies required by UDP, UDP-Lite and header estimation for symbol error probabilities ranging between 0 and 0.1. It can be clearly seen that header estimation requires significantly lower redundancy than both UDP and UDP-Lite. Note that the difference in redundancy increases with an increase in the probability of symbol error. For $p_{SC} = 0.0013$, the percentage of bandwidth used for redundancy is approximately 0.25%, 7.6% and 47.9% for header estimation, UDP-Lite and UDP, respectively. The FEC redundancy difference becomes much wider for $p_{SC} = 0.01$, where header estimation, UDP-Lite, and UDP respectively use 2.02%, 47.04% and 99.48% of bandwidth in redundant symbol transmission. For $p_{SC} = 0.06$ and higher, the gap between UDP-Lite and UDP narrows, with UDP-Lite and UDP respectively using 78.15% and 100% of bandwidth for redundancy, while header estimation requires 4.84% redundancy, a remarkable goodput improvement of approximately 73% over UDP-Lite and of approximately 95% over UDP.

Thus, while header estimation always requires less FEC redundancy, the advantages are dramatic for somewhat high error-rate channels, e.g., the 5.5 and 11 Mbps 802.11b channels. Later in this chapter, we validate these theoretical findings using a practical header estimator that is tested using actual wireless error traces. In the next section, we derive similar bounds for the Gilbert channel.

A.6.1.2 Redundancy Bounds on the Gilbert Channel

Consider the one-hop symbol-level Gilbert wireless channel of Figure 1. The Gilbert channel (GC) [81] has been used to model many wireless channels [9]-[11], [13]-[15], [18]-[20], [26]. In this section, we compare minimum expected FEC redundancies of UDP-Lite and UDP with header estimation over a GC.

A.6.1.2.1 Bound on a UDP based Protocol Stack

Let $\epsilon_{udp,GC}$ denote the probability of observing a packet erasure on a UDP protocol stack operating over a Gilbert channel (GC). Then $\epsilon_{udp,GC}$ is the probability of having one or more symbol-errors in the received packet, and can be expressed as:

$$\begin{aligned}
\epsilon_{udp,GC} = \Pr\{\text{corrupt pkt}\} &= 1 - \pi_g (p_{gg})^{n_H + n_D} - \pi_b p_{bg} (p_{gg})^{n_H + n_D - 1} \\
&= 1 - \pi_g (p_{gg})^{n_H + n_D - 1} \\
&= 1 - (1 - \pi_b)\left(1 - \pi_b(1 - \mu)\right)^{n_H + n_D - 1},
\end{aligned} \tag{A.31}$$

where $\pi_g$ and $\pi_b$ respectively represent the steady-state probabilities of staying in the good and bad states and $\mu$ is the Gilbert channel's memory as defined in (A.4). Using the derivations in Section A.6.1.1.1, we can express the average amount of redundancy required by an FEC decoder operating on a UDP protocol stack as

$$r_{udp,GC} \ge n_D l\,\epsilon_{udp,GC}. \tag{A.32}$$

A.6.1.2.2 Bound on a UDP-Lite based Protocol Stack

A UDP-Lite based protocol stack drops all packets that have header errors. Thus the probability of packet erasures, $\epsilon_{lite,GC}$, of UDP-Lite over a GC is:

$$\epsilon_{lite,GC} = \Pr\{\text{corrupt hdr}\} = 1 - \pi_g (p_{gg})^{n_H} - \pi_b p_{bg} (p_{gg})^{n_H - 1} = 1 - (p_{gg})^{n_H - 1}\,\pi_g. \tag{A.33}$$

Using derivations of Section A.6.1.1.2, the expected number of UDP-Lite erasures is

$$E\{Y \mid \text{UDPLite}\} = n_D l\,\epsilon_{lite,GC} = n_D l\left[1 - (p_{gg})^{n_H - 1}\,\pi_g\right].$$

In addition to erasures, a UDP-Lite protocol stack will also have errors in the application layer payload. The probability of a symbol error over the Gilbert channel is

$$p_{GC} = \Pr\{\text{symbol err} \mid \text{UDPLite}\} = \pi_g p_{gb} + \pi_b p_{bb} = \pi_b. \tag{A.34}$$

Then the expected number of UDP-Lite symbol errors is

$$E\{X \mid \text{UDPLite}\} = n_D l\left[1 - \epsilon_{lite,GC}\right]\pi_b,$$

and the lower bound on the total expected redundancy required to recover the errors and losses over a UDP-Lite protocol stack over a GC is

$$r_{udplite,GC} \ge n_D l\,\epsilon_{lite,GC} + 2E\{X \mid \text{UDPLite}\} = n_D l\left[\epsilon_{lite,GC} + 2(1 - \epsilon_{lite,GC})\,\pi_b\right]. \tag{A.35}$$

A.6.1.2.3 Bound on a Header Estimation based Protocol Stack

Using the reasoning of Section A.6.1.1.3, the total (expected) amount of redundancy required by an ideal header estimation scheme over a GC is

$$r_{hdrest,GC} \ge 2 n_D l\,p_{GC}. \tag{A.36}$$

Comparison of the above bound with the bounds in (A.32) and (A.35) reveals that the minimum expected redundancy required by header estimation is independent of channel memory. The redundancy is simply a function of the probability of error. Thus the performance of header estimation will remain unchanged with changes in channel memory. On the other hand, the redundancy required by UDP and UDP-Lite is high for low-memory channels and the redundancy decreases with an increase in channel memory.
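The dependence on channel memory can be made concrete by evaluating the Gilbert-channel bounds for a fixed error probability and varying $\mu$. The following Python sketch is an illustration under the expressions given in this section; the function name is hypothetical, and the relation $p_{gg} = 1 - \pi_b(1 - \mu)$ is taken from the last line of (A.31) rather than from the definition of $\mu$ in (A.4), which lies outside this section.

```python
def gilbert_redundancy_fractions(pi_b: float, mu: float, n_h: int, n_d: int) -> dict:
    """Minimum expected redundancy per block symbol, r / (n_D * l), on a Gilbert channel.

    pi_b : steady-state probability of the bad state (equals p_GC by (A.34))
    mu   : channel memory; p_gg = 1 - pi_b * (1 - mu), as used in (A.31)
    """
    pi_g = 1.0 - pi_b
    p_gg = 1.0 - pi_b * (1.0 - mu)
    eps_udp = 1.0 - pi_g * p_gg ** (n_h + n_d - 1)    # (A.31)
    eps_lite = 1.0 - pi_g * p_gg ** (n_h - 1)         # (A.33)
    return {
        "udp": eps_udp,                                        # (A.32)
        "udplite": eps_lite + 2.0 * (1.0 - eps_lite) * pi_b,   # (A.35)
        "hdrest": 2.0 * pi_b,                                  # (A.36)
    }


if __name__ == "__main__":
    for mu in (0.0, 0.5, 0.9, 0.99):
        fractions = gilbert_redundancy_fractions(pi_b=0.01, mu=mu, n_h=60, n_d=452)
        print(mu, {name: round(100 * v, 2) for name, v in fractions.items()})
```

For $\mu = 0$ the UDP entry reduces to the memoryless expression $1 - (1 - \pi_b)^{n_H + n_D}$, and the header estimation entry stays the same for every value of $\mu$.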

A.6.1.2.4 Comparison of the FEC Redundancy Bounds

First, let us compare minimum expected FEC redundancies of UDP-Lite and header estimation:

$$\min(r_{udplite,GC}) - \min(r_{hdrest,GC}) = n_D l\,\epsilon_{lite,GC} + 2 n_D l\,(1 - \epsilon_{lite,GC})\,p_{GC} - 2 n_D l\,p_{GC} = n_D l\left[\epsilon_{lite,GC}(1 - 2p_{GC})\right] > 0 \text{ for } p_{GC} < 0.5.$$

That is,

$$\min(r_{udplite,GC}) > \min(r_{hdrest,GC}) \text{ when } p_{GC} < 0.5. \tag{A.37}$$

This condition is similar to the one derived for the q-ary symmetric channel, implying that header estimation should perform better than UDP-Lite as long as the average probability of error is less than 0.5. For any reasonable Gilbert wireless channel, the probability of symbol error should be considerably smaller than 0.5.

We now compare minimum expected redundancies of UDP and header estimation over a GC:

$$\min(r_{udp,GC}) - \min(r_{hdrest,GC}) = n_D l\,\epsilon_{udp,GC} - 2 n_D l\,p_{GC},$$

where $\epsilon_{udp,GC}$ and $p_{GC}$ are given in (A.31) and (A.34), respectively. Plugging in the values of $\epsilon_{udp,GC}$ and $p_{GC}$ gives

$$\begin{aligned}
\min(r_{udp,GC}) - \min(r_{hdrest,GC})
&= n_D l\left[1 - \pi_g (p_{gg})^{n_H + n_D} - \pi_b p_{bg} (p_{gg})^{n_H + n_D - 1} - 2\pi_g p_{gb} - 2\pi_b p_{bb}\right] \\
&= n_D l\left[-1 + (\pi_g p_{gg} + \pi_b p_{bg})\left[2 - (p_{gg})^{n_H + n_D - 1}\right]\right] \\
&= n_D l\left[-1 + \frac{p_{bg}}{p_{bg} + p_{gb}}\left[2 - (p_{gg})^{n_H + n_D - 1}\right]\right] \\
&= n_D l\left[-1 + \pi_g\left[2 - (p_{gg})^{n_H + n_D - 1}\right]\right].
\end{aligned}$$

Based on the above comparison, we obtain the following condition:

$$\min(r_{udp,GC}) > \min(r_{hdrest,GC}) \text{ when } \pi_g \to 1. \tag{A.38}$$

The above inequality is generally true because on any practical wireless channel $n_H + n_D$ will always be greater than one symbol. Also, $\pi_g$, the overall probability of staying in the error-free state, is generally very high. Thus FEC comparison of UDP versus header estimation for the Gilbert channel essentially converges to the same conclusion as the symmetric channel: Unless the channel has an unreasonably high error-rate, header estimation will always utilize wireless bandwidth more efficiently than UDP.

[Figure 25 plot: FEC redundancy of UDP, UDP-Lite and Header Estimation versus channel memory, $\mu$; panels (a) $p_{GC} = \pi_b = 0.001$ and (b) $p_{GC} = \pi_b = 0.01$.]
Figure 25. Minimum expected FEC redundancies of UDP, UDP-Lite and Ideal Header Estimation over a Gilbert channel; $m = 8$, $L = 30$, $n_H = 60$, $n_D = 452$.

Figure 25 shows the percentage of redundant symbols in each FEC block for UDP, UDP-Lite and header estimation over a Gilbert channel. The redundancy is plotted against channel memory while fixing the probability of error. The leftmost points in Figure 25 represent the memory-less case. It can be seen that header estimation always requires less FEC redundancy to recover corrupted packets than UDP and UDP-Lite. This difference in the amount of required redundancy gets more significant with an increase in the probability of error. In general, due to the large number and bursts of good symbols in a high memory channel, the amount of redundancy required by UDP and UDP-Lite decreases with an increase in channel memory. In all cases, the redundancy required by header estimation is extremely low and independent of the channel memory. Thus, while the design of FEC schemes for UDP and UDP-Lite needs to take channel memory into account, an accurate header estimator can be deployed on a wireless network without any knowledge of the underlying channel's memory.

A.6.1.3 Discussion

At this point, we have theoretically verified that a protocol employing header estimation should require less FEC redundancy at a wireless receiver than UDP and UDP-Lite. This naturally brings us to the practical question of how to realize an accurate header estimation technique for wireless environments. The following section addresses this question by designing a header estimation scheme which utilizes the channel models proposed in preceding chapters.

A.6.2 Maximum-Likelihood Header Estimation
Framework

The maximum-likelihood estimation scheme proposed in this section only estimates
the critical header ﬁelds (CHF) that can uniquely identify a UDP multimedia session at a
receiver and are not liable to change during the course of the multimedia session. In our
experiments, we treat the following as CHF: (i) destination MAC address, (ii) source IP
address, (iii) destination IP address, (iv) source port, and (v) destination port.
Nevertheless, all mathematical treatment is provided for a general case of N critical
ﬁelds.

Under the proposed methodology, a list of active CHF (i.e., CHF of sessions that are currently being received) is provided to a header estimation module by the multimedia application(s). On receiving the first error-free packet of a new session, the multimedia application adds the new session's CHF information to the list of active multimedia sessions. Whenever a corrupted packet is received, a likelihood score of its critical fields is computed with respect to each entry of the CHF list. The CHF rendering the highest likelihood are chosen as the estimated CHF of the received (corrupted) packet.

[Figure 26 diagram: receiver protocol stack (wireless channel, physical layer, MAC layer, network and transport layers, application layer) with the header estimation module. Recoverable labels: received pkt after physical layer processing; error-free pkt; corrupt UDP pkt which has either dst MAC or dst IP address of local receiver; corrupt UDP pkt with estimated CHF; pkt after network and transport layer processing; updated channel model parameters.]
Figure 26. Interactions between the UDP-based header estimation module and different layers of a wireless receiver's protocol stack; modified protocol stack layers are shown in different colors and dotted lines represent communications that are not related to packet reception.

The main objective of header estimation is to pass the maximum number of (error-free and corrupted) packets to the application layer using only parameters of a MAC layer bit-error channel model. We defer discussion on how an application can make use of the corrupted packets to subsequent sections.

A.6.2.1 Functionality at and below a Receiver’s MAC layer

Figure 26 outlines the interactions between the proposed header estimation module and different layers of a wireless receiver's protocol stack. The packets after wireless physical layer processing are passed to the MAC layer which verifies the packet's checksum to determine if the received packet has errors. Instead of dropping a corrupted packet, the packet and its checksum information (i.e., packet passed/failed the checksum) is passed to a module that checks the transport type, the destination MAC address, and the destination IP address of the received packet. Header estimation is invoked only for UDP packets, while TCP and network layer traffic are handled by the conventional protocol stack. Furthermore, the MAC layer does not attempt retransmission-based recovery of corrupt UDP packets, i.e., ACKs are sent even for corrupt UDP packets. Instead of MAC retransmissions, header estimation with application layer FEC is used to recover from errors in the packet. Such retransmission-less recovery is well-suited for delay-sensitive real-time communications.

Header estimation is invoked when all of the following conditions are satisfied: (i) a corrupt UDP packet is received, (ii) either the destination MAC or the destination IP address matches the local receiver's addresses, and (iii) there are one or more active multimedia sessions on the receiver. Three scenarios exist when a packet is received:

(i) Packet is error-free: No need to perform header estimation.

(ii) Packet is corrupt and the packet is intended for the local receiver: Header estimation is invoked and an ACK is sent to the last hop network entity to avoid MAC layer retransmissions.

(iii) Packet is corrupt and the packet is not intended for the local receiver: This case represents a false alarm when, due to channel errors, either destination MAC or destination IP of a packet not intended for the local receiver gets mapped to the MAC or IP address of a receiver. Due to the receiver-based nature of the present scheme, false alarms cannot be detected at a receiver's MAC layer. Thus header estimation is invoked even for false alarm packets, and a MAC layer ACK is sent to the last hop network entity.
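The invocation conditions can be summarized as a small predicate. The Python sketch below is a hypothetical illustration only: the `ReceivedPacket` record, its field names and the function `should_invoke_header_estimation` are assumptions introduced here and do not describe the actual receiver implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceivedPacket:
    """Hypothetical summary of what the MAC layer knows about a received packet."""
    checksum_ok: bool   # result of the MAC layer checksum
    is_udp: bool        # transport type read from the (possibly corrupted) header
    dst_mac: str        # destination MAC address as received
    dst_ip: str         # destination IP address as received


def should_invoke_header_estimation(pkt: ReceivedPacket, local_mac: str,
                                    local_ip: str, active_sessions: int) -> bool:
    """Apply conditions (i)-(iii): corrupt UDP packet, address match, active session."""
    if pkt.checksum_ok or not pkt.is_udp:
        return False    # error-free packets and non-UDP traffic use the normal stack
    addressed_to_us = pkt.dst_mac == local_mac or pkt.dst_ip == local_ip
    return addressed_to_us and active_sessions > 0


if __name__ == "__main__":
    pkt = ReceivedPacket(checksum_ok=False, is_udp=True,
                         dst_mac="00:11:22:33:44:55", dst_ip="10.0.0.9")
    # The IP address is corrupted, but the MAC address still matches the receiver.
    print(should_invoke_header_estimation(pkt, "00:11:22:33:44:55", "10.0.0.2", 1))
```

Note that the predicate cannot tell scenario (ii) from scenario (iii): a false alarm packet satisfies the same address test, which is why the text states that such packets are also passed to header estimation.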


A.6.2.2 The Header Estimation Module

The header estimation module employs a likelihood function to find the most likely transmitted CHF given: (i) the received CHF, (ii) a list of active CHF, and (iii) parameters of the MAC layer error channel model. The list of active CHF is provided by the receiver's application layer as shown in Figure 26. The transmitted/active CHF that renders the maximum value of the likelihood function is chosen as the estimated CHF. The corrupt packet and the estimated CHF are passed to higher layers. In essence, the present header estimation problem is the estimation-theoretic problem of maximum-likelihood (ML) estimation of known parameters in noise [79].

A.6.2.3 Processing at a Receiver’s Network, Transport and
Application Layers

The corrupted packets along with the estimated CHF are passed by the header estimation module to the receiver's network layer. The network layer performs its regular operation with two modifications: (a) instead of the (possibly corrupted) IP addresses in the network layer header, the estimated IP addresses are treated as the true IP addresses; (b) network layer checksum on IP headers is disabled. At the UDP layer, source and destination ports are taken from the estimated CHF and the corrupted packets are passed to the (estimated) multimedia application.

A.6.3 Likelihood Functions for Header Estimation

In this section, we derive header estimation likelihood functions for two previously proposed classes of MAC layer channel models, namely the full-state Markov (FSM) model and the multifractal wavelet model (MWM). Let $A_i = \{x_{i1}, x_{i2}, \ldots, x_{iN}\}$ denote an ordered set of $N$ critical header fields for an arbitrary multimedia session $i$. As mentioned before, in this chapter we have $N = 5$, where $x_{i1}$, $x_{i2}$, $x_{i3}$, $x_{i4}$ and $x_{i5}$ correspond to the destination MAC, source IP, destination IP, source port, and destination port of multimedia session $i$, respectively. A receiver receives $M \ge 1$ simultaneous multimedia streams. Let $\Omega = \{A_1, A_2, \ldots, A_M\}$ denote an unordered set of CHF, each corresponding to a currently active multimedia session on a given receiver. Note that each $A_i = \{x_{i1}, x_{i2}, \ldots, x_{iN}\} \in \Omega$ is in turn a set of critical fields corresponding to a given session, where the first subscript of $x$ is the session index and the second subscript is the CHF index. Let $A_r$ denote the set of CHF of a received packet, i.e., $A_r = \{\tilde{x}_{r1}, \tilde{x}_{r2}, \ldots, \tilde{x}_{rN}\}$ is a possibly corrupted version of an $A_i \in \Omega$. Let $\hat{A}$ denote the estimated CHF.
Let $X$ represent a stochastic MAC layer channel model characterizing the bit-error
channel over which a receiver is receiving its packets. Then, for a critical header field $x_{ij}$
(i.e., critical field $j$ for a multimedia session $i$), our objective is to derive the likelihood
function $\Pr\{x^r_j \mid x_{ij}, X\}$ in terms of the parameters of $X$. In other words, given
parameters of a channel model $X$, we want to find the likelihood that a transmitted
critical header field $x_{ij}$ (after possible channel corruptions) was received as $x^r_j$. We
assume that the likelihood functions of all CHF are independent. Thus the $\Pr\{x^r_j \mid x_{ij}, X\}$’s
for each critical field can be ascertained independently and then the overall likelihood
considering all critical fields is:

$$\Pr\{A^r \mid A_i, X\} = \prod_{j=1}^{N} \Pr\{x^r_j \mid x_{ij}, X\}, \qquad \text{(A.39)}$$

where $1 \leq i \leq M$ is the session index and $j$ is the CHF index. Once $\Pr\{A^r \mid A_i, X\}$ has
been computed for all $1 \leq i \leq M$, the CHF estimate $\hat{A}$ is simply the $A_i$ that renders
the maximum $\Pr\{A^r \mid A_i, X\}$.

The challenge of this ML-based header estimation lies in the derivation of a
likelihood function $\Pr\{x^r_j \mid x_{ij}, X\}$ of a critical field, given parameters of a wireless
channel model. In the following sections, we derive likelihood functions for FSM and
MWM channel models.
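
The selection rule of (A.39) can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the implementation used in the experiments:
the names `estimate_chf` and `iid_bit_likelihood`, the bit-error probability `eps`, and
the two-field toy sessions are all hypothetical. The toy per-field likelihood assumes
independent bit errors and merely stands in for a channel-model likelihood.

```python
from math import prod

def iid_bit_likelihood(received, candidate, eps=0.01):
    """Toy per-field likelihood (assumption): independent bit errors with probability eps."""
    errors = sum(r != c for r, c in zip(received, candidate))
    return (eps ** errors) * ((1.0 - eps) ** (len(candidate) - errors))

def estimate_chf(received_fields, sessions, field_likelihood=iid_bit_likelihood):
    """Return (index, likelihood) of the session that maximizes the product in (A.39)."""
    best_index, best_score = None, -1.0
    for i, candidate_fields in enumerate(sessions):
        # Fields are assumed independent, so the per-field likelihoods multiply.
        score = prod(field_likelihood(r, c)
                     for r, c in zip(received_fields, candidate_fields))
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score

# Two hypothetical sessions, each with two short critical fields (bit strings).
sessions = [["1010", "0011"], ["0101", "1100"]]
received = ["1011", "0011"]          # first session with one flipped bit
best, score = estimate_chf(received, sessions)
```

Passing the likelihood as a parameter keeps the selection step independent of the channel
model, so any per-field likelihood can be substituted for the toy function.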

A.6.3.1 Header Estimation Likelihood Function for FSM
Chains

In this section, we derive the CHF likelihood function for a $k$-th order FSM chain
$X_n$, where $n$ is the bit time index. For clarity, in this chapter we deviate slightly from
the previously used FSM chain notation, and the transition probability between FSM
states $i$ and $j$ is represented as $\Pr\{i \to j\}$. We focus on one arbitrary critical field
$x_{ij} \in A_i$ by fixing the CHF index $j$. Henceforth $x_i$ and $x^r$ respectively represent the
critical field $j$ from $A_i$ and the received critical field $j$. Let us define a new variable:

$$z_i = x^r \oplus x_i, \qquad \text{(A.40)}$$

where $\oplus$ represents a binary exclusive-OR operation. $z_i$ comprises bit locations that are
different between $x^r$ and $x_i$. Assuming that the different bits are in fact the errors
introduced by the FSM channel, $\Pr\{x^r \mid x_i, X_n\}$ is the likelihood of observing error pattern $z_i$
on the channel.

Recall from previous discussions [see Figure 15] that an FSM chain in state $v_i$ can
only transit to two FSM states, $2v_i + 0$ or $2v_i + 1$; all FSM states are mod $2^k$. Thus,
when the bit added to $v_i$ is $z_i[k+1]$, we get $\Pr\{v_i \to 2v_i + z_i[k+1]\}$. From state
$2v_i + z_i[k+1]$, the process will transit to
$2(2v_i + z_i[k+1]) + z_i[k+2] = 4v_i + 2z_i[k+1] + z_i[k+2]$. Using similar logic, the
process will next transit to

$2(4v_i + 2z_i[k+1] + z_i[k+2]) + z_i[k+3] = 8v_i + 4z_i[k+1] + 2z_i[k+2] + z_i[k+3]$.

A recursive relationship in the transition probabilities can be identified at this point.
Generalizing the recursive relationship yields the header estimation likelihood function
for a $k$-th order FSM chain as follows:

$$\Pr\{x^r \mid x_i, X_n = \text{FSM}\} = \pi_{v_i} \Pr\{v_i \to (2v_i + z_i[k+1]) \bmod 2^k\}
\prod_{a=2}^{W-k-1} \Pr\left\{ \left(2^{a-1} v_i + \sum_{b=0}^{a-2} 2^{a-2-b} z_i[k+1+b]\right) \bmod 2^k
\to \left(2^{a} v_i + \sum_{b=0}^{a-1} 2^{a-1-b} z_i[k+1+b]\right) \bmod 2^k \right\}, \qquad \text{(A.41)}$$

where $n$ is the bit time index, $v_i$ is the FSM state represented by the first $k$ bits of $z_i$,
$W$ represents the number of bits in the critical field, $\pi_x$ represents the steady-state
probability of being in FSM state $x$, $\Pr\{x \to y\}$ is the transition probability of going
from FSM state $x$ to state $y$, and $z_i[x]$ represents the value of $z_i$ at the $x$-th bit
location.

The FSM likelihood function answers the following question: What is the probability
that channel errors have changed $x_i$ to $x^r$? Since $z_i = x^r \oplus x_i$ denotes the bit pattern
that would be observed if the channel changed $x_i$ to $x^r$, we have to find the probability
that the channel $X_n$ produced the bit-error pattern $z_i$. Clearly, the FSM channel’s initial
state must be $v_i$ because $v_i$ denotes the FSM state represented by the first
memory-window of $z_i$, leading to the $\pi_{v_i}$ term. This initial state must be followed by a unique
sequence of state transitions that result in the bit-error pattern $z_i$. To quantify the
probability that an FSM channel will follow this “unique state sequence”, recall that in
one transition the FSM process can only transit to two possible states. Also, due to the
Markov property, the probability of transiting to one of the two possible states is only
dependent on the present state. The final likelihood score of $x_i$ is hence characterized by
a multiplication of the transition probabilities of this unique state sequence, as
represented by the multiplicative $\Pr\{x \to y\}$ terms in the likelihood function.
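
The state walk behind (A.41) can be illustrated with a short Python sketch. It is a
minimal illustration under assumptions: the function name `fsm_likelihood`, the
dictionary layout of the model parameters and the toy first-order parameters are
hypothetical, and the loop consumes every bit after the first $k$ bits of the error pattern,
which is an assumption about how the product’s upper limit maps to bit positions.

```python
def fsm_likelihood(z_bits, k, steady_state, transition):
    """Likelihood of the error pattern z_bits on a k-th order FSM chain.

    steady_state[v]    : steady-state probability of FSM state v
    transition[(u, v)] : probability of moving from state u to state v
    """
    modulus = 1 << k
    # Initial state v_i: the first k bits of z, read as a binary number.
    state = int("".join(str(b) for b in z_bits[:k]), 2)
    likelihood = steady_state[state]
    for bit in z_bits[k:]:
        next_state = (2 * state + bit) % modulus   # shift in the next error bit
        likelihood *= transition.get((state, next_state), 0.0)
        state = next_state
    return likelihood

# Toy first-order (k = 1) parameters, chosen only so that the chain is stationary.
steady_state = {0: 0.9, 1: 0.1}
transition = {(0, 0): 0.95, (0, 1): 0.05, (1, 0): 0.45, (1, 1): 0.55}

x_i = [1, 0, 1, 1]                       # stored critical field (bits)
x_r = [1, 0, 0, 1]                       # received critical field (bits)
z = [a ^ b for a, b in zip(x_r, x_i)]    # error pattern, as in (A.40)
score = fsm_likelihood(z, 1, steady_state, transition)
```

Here the walk visits states 0, 0, 1, 0, so the score is the product of the steady-state
probability of state 0 and the three corresponding transition probabilities.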

A.6.3.2 Header Estimation Likelihood Function of MWM

Recall from Section A.3.8 that the multifractal wavelet model (MWM) uses
expectation-maximization to model two random variables: (i) the scaling coefficient at
the coarsest scale $U_{j_0,k_0}$, where $j_0$ and $k_0$ represent the coarsest scale and time,
respectively; (ii) $A_{j,k}$ random variables defined over a $[-1,1]$ interval, $j$ and $k$
representing the scale and time, respectively. In previous chapters, we showed that the 11
Mbps bit-errors have long-range dependence which can be captured using the MWM.
Therefore, in this section we derive the likelihood function for an MWM. Previously we
used the bit-error sequences of zeros and ones to train the MWM. Derivation of a
likelihood function for an MWM trained using such a strategy is somewhat difficult.
Consequently, in this chapter we train an MWM using the number of bit-errors in a
packet as the training sequence.

Let $X_n$ denote the MWM process, where $n$ represents discrete packet time
instances. It was shown in [61] that due to the use of the Haar wavelet transform, the
MWM-predicted number of errors $e[n]$ in packet $n$ can be expressed as
$e[n] = 2^{-m/2} U_{m,n}$ for $n = 0, 1, \ldots, 2^m - 1$. If the packets have a fixed size $C$, then the
probability of bit-errors in the packet received at packet time index $n$ is
$p[n] = e[n]/C = 2^{-m/2} U_{m,n}/C$. Now note that each received bit is basically a value
taken from a binary time series of length $l$, i.e., $\{x[i]\}$, $x[i] \in \{0,1\}$, where $i$
represents the discrete bit time index. Based on equation (A.40), $\sum_{m=1}^{W} z_i[m]$ yields the
total number of bits that are different between $x^r$ and $x_i$, i.e., the Hamming distance
between $x^r$ and $x_i$. If the bits of $z_i$ are in fact the errors introduced by an MWM wireless
channel, then the probability of having $\sum_{m=1}^{W} z_i[m]$ errors is $(p[n])^{\sum_{m=1}^{W} z_i[m]}$
and the probability of having $W - \sum_{m=1}^{W} z_i[m]$ correct bits is
$(1 - p[n])^{W - \sum_{m=1}^{W} z_i[m]}$. The likelihood of the bit pattern $z_i$ is then a multiplication of the
above events. Thus the MWM likelihood function is as follows:

$$\Pr\{x^r \mid x_i, X_n = \text{MWM}\} = \left(2^{-m/2} U_{m,n}/C\right)^{\sum_{m=1}^{W} z_i[m]}
\cdot \left[1 - \left(2^{-m/2} U_{m,n}/C\right)\right]^{W - \sum_{m=1}^{W} z_i[m]}, \qquad \text{(A.42)}$$

where $n = 0, 1, \ldots, 2^m - 1$ is the packet time index, $m$ is the number of scales used to
train the MWM, $C$ is the number of bits in a packet, $W$ is the number of bits in the
critical field, $z_i$ is given in (A.40), and $U_{m,n}$ is the scaling coefficient at scale $m$
and time $n$.

Similar to (A.41), the MWM likelihood function renders the probability that the
bit-error pattern $z_i = x^r \oplus x_i$ is observed on an MWM channel. Since the probability of
bit-error in packet $n$ is given by $2^{-m/2} U_{m,n}/C$, the probability of observing $\sum_{m=1}^{W} z_i[m]$
bit-errors in packet $n$ is $(p[n])^{\sum_{m=1}^{W} z_i[m]} = \left(2^{-m/2} U_{m,n}/C\right)^{\sum_{m=1}^{W} z_i[m]}$. Treating
error-free and corrupted bits as the two outputs of a Bernoulli random variable yields the
MWM likelihood expression.

Once the $\Pr\{x^r \mid x_i, X_n\}$’s for all currently active sessions, $1 \leq i \leq M$, are computed
using the FSM or MWM likelihood functions, the session $i$ that renders the maximum
$\Pr\{A^r \mid A_i, X\}$ is chosen as the estimated CHF, $\hat{A}$. We also introduce a provision that a
packet is dropped if the maximum likelihood is less than 0.25 because in such a case the
estimation confidence is very low.
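
As a rough illustration of (A.42) and of the 0.25 drop threshold, the following Python
sketch evaluates the MWM likelihood of one field. The function names, the scale count
`m = 4` and the scaling coefficient value `16.0` are made-up values for illustration
only; the 512-byte packet size is the one used in the experiments of this chapter.

```python
def mwm_likelihood(z_bits, u_mn, m, packet_bits):
    """Evaluate (A.42) for one critical field.

    z_bits      : error pattern of the field (list of 0/1), length W
    u_mn        : scaling coefficient U_{m,n} for the current packet
    m           : number of scales used to train the MWM
    packet_bits : packet size C in bits
    """
    p = (2.0 ** (-m / 2.0)) * u_mn / packet_bits   # per-bit error probability p[n]
    errors = sum(z_bits)                           # Hamming distance
    correct = len(z_bits) - errors
    return (p ** errors) * ((1.0 - p) ** correct)

def keep_packet(max_likelihood, threshold=0.25):
    """Apply the drop provision: keep the packet only if the best likelihood reaches the threshold."""
    return max_likelihood >= threshold

C = 512 * 8                                        # 512-byte packets
clean = mwm_likelihood([0] * 32, 16.0, 4, C)       # 32-bit field, no differing bits
one_bit_off = mwm_likelihood([1] + [0] * 31, 16.0, 4, C)
```

With these illustrative values the per-bit error probability is 1/1024, so a field that
matches exactly is kept while a field that differs in one bit falls below the threshold.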


A.6.3.3 Extending the FSM Likelihood Function to the
CCM

The complexity of an MWM to generate a length-$l$ sequence is linear. However, the
complexity of FSM chains grows exponentially with respect to memory-length. Due to
their exponential complexity, FSM chains are unreasonably complex to be employed in
the header estimation framework. Therefore, in this section we extend the FSM
likelihood function to the CCM so that the approximating CCM model can be used for
header estimation instead of the FSM model.

Let $S_x$ denote the aggregate CCM state that contains FSM state $x$. Since the CCM
aggregates FSM states, using (A.41) the likelihood function for the CCM can be rewritten
as:

$$\Pr\{x^r \mid x_i, X_n = \text{CCM}\} = \pi_{S_{v_i}} \Pr\{S_{v_i} \to S_{2v_i + z_i[k+1]}\}
\prod_{a=2}^{W-k-1} \Pr\left\{ S_{2^{a-1} v_i + \sum_{b=0}^{a-2} 2^{a-2-b} z_i[k+1+b]}
\to S_{2^{a} v_i + \sum_{b=0}^{a-1} 2^{a-1-b} z_i[k+1+b]} \right\},$$

where the subscripts of all aggregate states $S_x$ are modulo $2^k$ and all other parameters
are defined in Section A.6.3.1. The low complexity of the CCM clearly makes it a natural
alternative to FSM chains in the present header estimation methodology. In all
subsequent performance evaluations of the header estimation methodology, we use
CCMs instead of FSM chains and show that the likelihood function rendered by the CCM
is highly accurate.


A.6.4 Performance Evaluation of the Header Estimation
Framework

A.6.4.1 Experimental Setup

We use the wireless traces described in Section A.4.1 to simulate the wireless
channel. For video evaluations, we report throughput, FEC and PSNR results for five
multimedia receivers. Each receiver receives multiple video streams with a maximum of
five video streams. At each physical layer data rate, we repeat video experiments using
three distinct wireless trace-sets that were collected at different times of day. Video
experiments for each trace-set are repeated 25 times starting at different randomly
selected locations inside the error traces. Thus the throughput and FEC results for 2, 5.5
and 11 Mbps are each averaged over 3 x 5 x 5 x 25 = 1,875 received video streams. Due
to the high complexity of video decoding, for each trace-set the PSNR results are
reported for one (randomly selected) video experiment, that is, PSNR results for 2, 5.5
and 11 Mbps are each averaged over 3 x 5 x 5 = 75 received video streams.

For each packet transmission, a 512 byte packet (452 bytes of video payload and 60
bytes of headers) was corrupted using the bit-error traces. The models used for likelihood
computation on all receivers were trained using error traces which were not used in the
video experiments. In accordance with the results of Sections A.4.3.1 and A.4.3.2, FSM
chains of order-9 and order-10 were employed for the 5.5 and 2 Mbps bit-error processes,
and an MWM trained using the number of bit-errors in a packet was employed for the 11
Mbps process. Each FSM chain was folded to a 5-state CCM.

Video sequences were compressed using the H.264 video coding standard [83], [85].
The sequences had a QCIF frame size and were encoded at a frame-rate of 30 fps. The
streams were encoded at different source coding bitrates ranging from 100 kbps to 1
Mbps. A slice mode with a fixed number of 452 bytes per slice was used for encoding [83].
Intra frame period was set to 12, i.e., each group of pictures (GOP) had 12 frames.
Varying numbers of video streams were assigned to the wireless receivers. Transmission
of packets from each stream was simulated in a round robin fashion according to source
bitrates. In order to achieve successful video decoding, in the simulations we introduced a
provision that the first frame of the video sequence (i.e., the very first I-frame of the first
GOP) is always received correctly.

A.6.4.2 Throughput Performance
The term throughput here refers to the ratio of the total number of packets relayed to
the receiver’s application to the total number of packets sent by the sender’s application
layer. That is, throughput comprises both error-free and corrupted packets. The
percentage packet drop rate is (1 - throughput) x 100. Figure 27 outlines the packet
drops incurred by UDP Normal, UDP-Lite and UDP with header estimation at 2, 5.5, and
11 Mbps. The results are averaged over all receivers and multimedia streams and hence
the packet drops are referred to as average packet drops. The leftmost points in Figure 27
(a), (b), and (c) depict the simplest case, in which each receiver receives only one multimedia
stream. The number of video streams per receiver is then incremented. More than one
multimedia stream per receiver is an important scenario for video conferencing applications.

[Figure 27: three panels, (a) 2 Mbps, (b) 5.5 Mbps and (c) 11 Mbps, plotting packet drops
against video streams per receiver for UDP Normal, UDP Lite and UDP Hdr Est.]

Figure 27. Average packet drops for UDP Normal, UDP-Lite and UDP with Header
Estimation at different data rates and for varying number of video streams per receiver;
each point is averaged over 3 x (# of video streams) x 5 x 25 received video streams.

A.6.4.3 Comparison of Packet Drops

It can be clearly seen in Figure 27 (a), (b) and (c) that header estimation always incurs
fewer packet drops than normal UDP and UDP-Lite. The header estimation packet drops
include: (i) packets that were dropped because both the destination IP and the destination
MAC address were corrupted, and (ii) packets whose critical fields were incorrectly
estimated (resulting in false alarms). At 2 Mbps, header estimation packet drops are
approximately 0.2%, as opposed to approximately 0.4% and 1% in case of UDP-Lite
and normal UDP. Since the 2 Mbps channel has receivers with very low packet error
rates, the margin of improvement is small. At 5.5 Mbps, UDP with header estimation
provides approximately 4% and 2% throughput improvements over normal UDP and
UDP-Lite. Due to the very high data rate at 11 Mbps, the header estimation packet drops
increase to about 3%, but this packet drop rate is still substantially lower than that of
normal UDP (≈ 15%) and UDP-Lite (≈ 30%).

A.6.4.4 False Alarm Rate

A false alarm is a packet that is not intended for a multimedia session, but is relayed
to that session. There are three sources of false alarms: (i) due to channel errors, either
destination MAC or destination IP address of a packet (not intended for the local
receiver) gets mapped to the MAC or IP address of the receiver; (ii) a corrupted packet is
inaccurately estimated; (iii) a corrupt non-multimedia UDP packet is received when one
or more multimedia sessions are active.

For the ﬁve streams per receiver case, cumulative false alarm rates are 0.07%,
0.52% , and 1.3% at 2, 5.5 and 11 Mbps. While these false alarms are quite low, they
must be detected because they can desynchronize the video and/or FEC decoders. To
detect false alarms, we protected the 2 byte H.264 slice sequence numbers (in the RTP
header, with one slice per packet) with 4 bytes of redundancy to ensure that these
sequence numbers can always be recovered at the receiver. A receiver dropped all
packets whose slice numbers were much larger or smaller than the next/expected slice
number. For applications which do not have a slice/packet sequence number, a small
incremental packet sequence number with parity bytes can be easily inserted into each
packet by the sender’s application layer. This sequence number based scheme also
provides erasure locations (i.e., dropped packets) to the FEC decoder.
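
A sequence-number filter of this kind might look like the following Python sketch. The
text does not state how far a slice number may deviate before a packet is dropped, so
the window `max_gap = 8`, the wrap-around handling and the function names are all
assumptions made for illustration.

```python
SEQ_MODULUS = 1 << 16   # 2-byte slice sequence numbers wrap at 65536

def is_false_alarm(seq, expected, max_gap=8):
    """True if seq is much larger or smaller than the expected slice number."""
    forward = (seq - expected) % SEQ_MODULUS    # distance if seq is ahead
    backward = (expected - seq) % SEQ_MODULUS   # distance if seq is behind
    return min(forward, backward) > max_gap

def filter_stream(seqs, first_expected=0, max_gap=8):
    """Return (accepted slice numbers, erasure locations for the FEC decoder)."""
    accepted, erasures, expected = [], [], first_expected
    for seq in seqs:
        if is_false_alarm(seq, expected, max_gap):
            continue                            # drop the suspected false alarm
        ahead = (seq - expected) % SEQ_MODULUS
        if ahead <= max_gap:
            # Slice numbers skipped between expected and seq are reported as erasures.
            erasures.extend((expected + d) % SEQ_MODULUS for d in range(ahead))
            expected = (seq + 1) % SEQ_MODULUS
        accepted.append(seq)
    return accepted, erasures

accepted, erasures = filter_stream([0, 1, 2, 40000, 3, 5, 6])
```

In this example the packet numbered 40000 is rejected as a false alarm, and the gap
between 3 and 5 marks slice 4 as an erasure.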

A.6.4.5 FEC Performance

We now evaluate the amount of FEC redundancy required by the application to
recover from errors and packet drops in the multimedia content. Since the corrupted
packets contain many error-free bytes, this error-free data should facilitate application
layer FEC decoding. As mentioned earlier, for an MDS FEC code, if a codeword has $2t$
redundant symbols then a maximum of $t$ transmission errors in that block can
be corrected [80]. For the same amount of redundancy, $2t$ erasures can be recovered. In
the UDP-Lite and UDP with header estimation scenarios, for an FEC codeword with $e_1$
erasures (i.e., packet drops) and $e_2$ errors, if $e_1 \leq 2t$ then the FEC decoding algorithm
can recover the $e_1$ erasures. After erasure decoding, $e_2$ errors can be corrected if
$e_2 \leq \lfloor (2t - e_1)/2 \rfloor$.

We simulate MDS forward error correction for all three (UDP Normal, UDP-Lite,
UDP with header estimation) protocol variants. A codeword length of $N = 30$ bytes is
used for all experiments. Each codeword is composed of one byte from a different packet,
where each packet consists of 452 bytes of data payload. Thus each packet contributes to
452 separate RS codewords, and each codeword spans 30 packets. The FEC
construction is shown pictorially in Figure 28. For all protocol stack variants, we treat
packet drops as erasures in the received codewords. Note in Figure 28 that a packet drop
results in an erasure in 452 codewords.
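
The byte-interleaved layout described above can be sketched in Python as follows. The
sketch only shows how codewords are assembled from packet payloads; it does not
perform RS encoding, and the helper name `build_codewords` and the dummy payloads
are hypothetical.

```python
def build_codewords(payloads):
    """Codeword j collects byte j of every packet payload (one byte per packet)."""
    payload_len = len(payloads[0])
    if any(len(p) != payload_len for p in payloads):
        raise ValueError("all payloads must have the same length")
    return [bytes(p[j] for p in payloads) for j in range(payload_len)]

# 30 dummy packets of 452 payload bytes; packet n is filled with the byte value n.
payloads = [bytes([n]) * 452 for n in range(30)]
codewords = build_codewords(payloads)

# Dropping packet 4 erases position 4 in every one of the 452 codewords.
dropped_packet = 4
erased_positions = [dropped_packet for _ in codewords]
```

Because position $n$ of every codeword comes from packet $n$, the index of a dropped packet
directly gives the erasure location in all 452 codewords.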

[Figure 28: diagram of 30 packets, each with a packet header and a 452-byte payload; a
packet drop introduces an erasure in all 452 RS codewords.]

Figure 28. Codeword construction for video FEC simulations.

Since normal UDP does not have corrupted packets, all parity bytes are used for
erasure decoding. Unlike normal UDP, FEC codewords for UDP with header estimation
have errors due to corrupted packets and erasures due to incorrect estimations and/or
false alarms. Similarly, FEC codewords for UDP-Lite have errors due to corrupted
packets and erasures due to packets with corrupted headers. For performance evaluation,
we define a simple measure called decodable probability:

$p_d$ = (decodable codewords received) / (codewords transmitted),

where a codeword with $e_1$ erasures and $e_2$ errors is decodable only if $2t \geq e_1 + 2e_2$.
Clearly, $0 \leq p_d \leq 1$, and $p_d = 1$ implies that all received codewords were successfully
decoded.
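
The decodability test and the resulting decodable probability are simple to express in
code. The Python sketch below is illustrative only; the function names and the list of
per-codeword (erasures, errors) counts are hypothetical.

```python
def is_decodable(erasures, errors, parity):
    """MDS errors-and-erasures condition: 2t >= e1 + 2*e2, where parity = 2t."""
    return erasures + 2 * errors <= parity

def decodable_probability(codeword_stats, parity):
    """codeword_stats: one (erasures, errors) pair per transmitted codeword."""
    stats = list(codeword_stats)
    decodable = sum(is_decodable(e1, e2, parity) for e1, e2 in stats)
    return decodable / len(stats)

# Five hypothetical codewords protected by two parity bytes (2t = 2).
observed = [(0, 0), (2, 0), (0, 1), (1, 1), (3, 0)]
p_d = decodable_probability(observed, parity=2)
```

With two parity bytes, a codeword survives two erasures or one error, but not one
erasure together with one error, so three of the five codewords above are decodable.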

[Figure 29: three panels, (a) 2 Mbps, (b) 5.5 Mbps and (c) 11 Mbps, plotted against
message bytes per block for UDP Normal, UDP Lite and UDP Hdr Est.]

Figure 29. Average FEC redundancy required by UDP Normal, UDP-Lite and UDP with
Header Estimation at different data rates of an 802.11b LAN; each point is averaged over
3 x 5 x 5 x 25 = 1875 received video streams.

Figure 29 outlines the decodable probability as a function of the number of message
bytes in an RS codeword for the five streams per receiver experiment. At each data rate,
the results are averaged over all the experiments. From Figure 29 (a), it is clear that at 2
Mbps normal UDP and UDP-Lite require 6 bytes per RS codeword for almost 100%
recovery; that is, approximately 20% bandwidth is wasted in redundancy. UDP with
header estimation achieves almost error-free recovery even if two redundant bytes are
sent per 28 message bytes - approximately 7% bandwidth is used for redundant
symbols. From Figure 29 (b), it can be observed that, due to the increased error-rate at 5.5
Mbps, the performance gap between UDP with header estimation and the other protocols
widens. Normal UDP and UDP-Lite waste approximately 33% bandwidth on FEC
redundancy to achieve almost 100% recovery. UDP with header estimation achieves
almost 100% recovery by wasting merely 20% bandwidth on FEC redundancy. Figure
29 (c) shows that at 11 Mbps the improvements provided by UDP with header estimation
are quite significant; UDP with header estimation requires approximately 27%
redundancy for almost 100% recovery, while both normal UDP and UDP-Lite require
53% redundancy. Thus header estimation salvages the high error rate 11 Mbps channel.

A.6.4.6 Video Performance

In this section, we present results for the 5 streams per receiver experiment, with a
fixed rate FEC having two redundant bytes per RS codeword of 30 bytes. The average
GOP-by-GOP peak signal-to-noise ratio (PSNR) plots at different data rates are given in
Figure 30. All PSNR results are averaged over 75 received video streams. Since we
allow the very first video (I) frame of the first GOP to be received without any errors and
losses, PSNR of the first GOP is not plotted. The dotted line in Figure 30 represents
PSNR of error-free video, which provides a performance upper bound for the protocols
under consideration. PSNR of UDP with header estimation is the closest to the PSNR of
the error-free video at all data rates. At 2 and 5.5 Mbps, respective average PSNRs of
normal UDP and UDP-Lite are approximately 10 dB and 25 dB lower than the PSNR of
UDP with header estimation. However, at 11 Mbps the PSNR of UDP with header
estimation is approximately 25 dB higher than the PSNRs of normal UDP and UDP-Lite,
both of which render equally and extremely low PSNRs at 11 Mbps.

[Figure 30 panels, including (c) 11 Mbps: PSNR per GOP for Error-free, UDP Normal,
UDP Lite and UDP Hdr Est.]

Figure 30. Average PSNR of video sequences for UDP Normal, UDP-Lite and UDP with
Header Estimation using a 30 byte RS codeword with 2 parity bytes; each graph is
averaged over 3 x 5 x 5 = 75 received video streams.

A.6.5 Discussion

In this chapter, we developed an effective header estimation framework for wireless
multimedia applications. The proposed framework used the channel models proposed in
preceding chapters to provide significant improvements in wireless bandwidth utilization.
In the following chapter, we show another use of the proposed channel models by
quantifying the simulation and analysis inaccuracies that are incurred if channel memory
is ignored.


CHAPTER A.7 IMPACTS OF IGNORING
CHANNEL MEMORY ON ANALYSIS AND
SIMULATION OF WIRELESS SYSTEMS

Results of the preceding chapters have established that the MAC layer wireless
bit-error channels have memory. We have also shown that accurate and low-complexity
models can be developed to capture the underlying channel’s memory. The burstiness
and the consequent memory of wireless channels are well-accepted concepts in the
wireless research community. However, much of the contemporary research continues to
use memory-less binary-symmetric and 1st-order Gilbert channels for bit-level theoretical
analysis and experimental evaluation of wireless protocols and applications [86]-[100].
The impacts of these simplistic bit-error channel models on the design and evaluation of
wireless systems are largely unexplored.

In this chapter, we quantify the impact of bit-level Markovian channel memory on
two commonly-used and very meaningful wireless performance metrics:
the expected goodput of an unreliable protocol and the expected number of per-packet
retransmissions for a reliable wireless protocol operating on a single-hop wireless
network. Due to the analytical intractability of the multifractal wavelet model, we focus
solely on the Markov-based channel models considered in this thesis. We derive the two
protocol performance metrics in terms of the parameters of four channel models of
varying memory-lengths, namely a memory-less binary-symmetric channel (BSC) model,
a two-state Gilbert channel (GC) model [81], an order-10 (1024 state) full-state Markov
chain, and an order-20 constant-complexity model (CCM). These models are trained
using actual 802.11b MAC layer bit-error traces and subsequently the trained models are
used to estimate the goodput and retransmissions.

We show that extremely misleading estimates of goodput and retransmissions are
obtained when using a BSC or a GC. In particular, for the retransmission metric the
results obtained under the memory-less assumption can be orders of magnitude more
pessimistic than what is observed on the actual channel. On the other hand, the estimates
provided by channel models with high-order memory (i.e., 1024 state FSM and
constant-complexity models) are highly accurate.

A.7.1 Goodput of an Unreliable Protocol

In this section, we quantify the goodput of an abstract unreliable protocol - such as
the UDP protocol [64] - operating over wireless links. Here goodput refers to the ratio
between the number of received error-free packets and the total number of transmitted
packets. We compare how accurately the following bit-error wireless channel models
estimate the goodput of a wireless channel: (i) a memory-less binary-symmetric channel
(BSC) model, (ii) a 2-state Gilbert channel (GC) model [81], (iii) a full-state Markov
(FSM) channel model, and (iv) a constant-complexity channel model (CCM). We first
analytically derive packet goodput in terms of the channel models’ parameters. We train
these models using actual traces and then estimate the traces’ goodputs using the trained
models. If a model accurately characterizes the bit-error channel then it should provide a
goodput estimate that is very close to the trace-based goodput.

A.7.1.1 Goodput of a Wireless Channel

Since contemporary wireless stacks perform a checksum on each packet to detect and
drop corrupted packets, the present abstract protocol drops all packets with one or more
bit-errors. To cater for end-to-end sessions with multiple hops that include a wired
(Internet) segment followed by a wireless access segment, we assume that only the last
transmission hop is a wireless link. We assume an uncongested path between the sender
and the receiver. Also, the wireless hop employs a CSMA/CA mechanism to resolve
channel contentions, and therefore the number of collisions is negligible. These
assumptions ensure that all packet drops are due to channel noise and interference; i.e.,
for simplicity of analysis, we ignore packet drops due to congestion or collisions.

Since we define goodput as the ratio between the number of received error-free
packets and the total number of transmitted packets, goodput is simply the probability $\gamma$
of receiving an error-free packet on the wireless channel. Goodput is constrained by
$0 \leq \gamma \leq 1$, where $\gamma = 0$ represents the limiting case when all the received packets have
errors and are therefore dropped, and $\gamma = 1$ represents the limiting case when all the
received packets are error-free.

We first derive expressions of goodput estimates $\hat{\gamma}$ in terms of the parameters of the
trained channel models. Second, we compute the actual goodput $\gamma$ of the bit-error traces
used in this study. Then for each wireless trace, we train all four channel models
considered in this chapter. Finally, the actual and estimated goodputs ($\gamma$ and the $\hat{\gamma}$’s) are
compared.


A.7.1.2 Goodput of a Binary-Symmetric Channel Model

A binary symmetric channel (BSC) is a special case of the $q$-ary symmetric channel
mentioned in the last chapter. Specifically, a BSC is a stateless channel that corrupts every
transmitted bit with a probability $\epsilon$. Consequently, goodput or the probability of
receiving an error-free packet of length $L$ over a BSC is simply given by:

$$\hat{\gamma}_{BSC} = \Pr\{\text{error-free pkt} \mid \text{BSC}\} = (1 - \epsilon)^L. \qquad \text{(A.43)}$$

Given training bit-error data, the parameter $\epsilon$ is computed by taking the ratio between the
number of bad bits and the total number of bits in the training data.
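
As a small worked example of (A.43), the Python sketch below trains the BSC on a
bit-error trace and evaluates the goodput estimate. The function name and the synthetic
trace (10 bad bits out of 10,000) are hypothetical; the 512-byte packet length is taken
from the experimental setup of the previous chapter.

```python
def bsc_goodput(trace_bits, packet_bits):
    """Return (epsilon, goodput estimate) for a BSC trained on a bit-error trace."""
    eps = sum(trace_bits) / len(trace_bits)      # bad bits / total bits
    return eps, (1.0 - eps) ** packet_bits       # (A.43)

trace = [0] * 9990 + [1] * 10                    # synthetic trace: 1 marks a bit error
eps, goodput = bsc_goodput(trace, packet_bits=512 * 8)
```

Note that the BSC uses only the fraction of bad bits, so the position of the errors inside
the trace has no effect on the estimate.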

A.7.1.3 Goodput of a Gilbert Channel Model

The Gilbert channel (GC) [81] is a 1st-order Markov chain with a good and a bad
state. In the present bit-error modeling context, the two Gilbert states jointly capture a
process with a memory-length of one bit. The probability of the next (good or bad) bit is
dependent on whether the last received bit was good or bad. Transitions to the good
state result in error-free bits, while transitions to the bad state yield corrupted bits. Due to
the present notation, we represent the good and bad states as state 0 and state 1,
respectively. The GC is completely characterized using two parameters, $p_{0,0}$ and $p_{1,1}$,
where 0 represents the error-free state and 1 represents the error state. Although both
BSC and GC are special cases of FSM chains, we treat them separately because of their
widespread use in wireless studies [86]-[100].

As shown in the last chapter, goodput or the probability of receiving an error-free
packet of length $L$ over a GC is given by:

$$\begin{aligned}
\hat{\gamma}_{GC} &= \Pr\{\text{error-free pkt} \mid \text{GC}\} \\
&= \pi_0 (p_{0,0})^{L} + \pi_1 p_{1,0} (p_{0,0})^{L-1} \\
&= (p_{0,0})^{L-1} \left[\pi_0 p_{0,0} + \pi_1 p_{1,0}\right] \\
&= \pi_0 (p_{0,0})^{L-1}. \qquad \text{(A.44)}
\end{aligned}$$

The above expression shows that the probability of getting a good packet over a Gilbert
channel model is simply the probability of starting in the error-free state and then staying
in that state for the length of the packet.
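
The following Python sketch illustrates (A.44). It estimates the two GC parameters by
counting bit-to-bit transitions in a trace, which is one straightforward way to train the
model and is assumed here rather than taken from the text; the function names and the
short toy trace are hypothetical.

```python
def train_gilbert(trace_bits):
    """Estimate p00 and p11 from transition counts (the trace must contain both 0s and 1s)."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, cur in zip(trace_bits, trace_bits[1:]):
        counts[(prev, cur)] += 1
    p00 = counts[(0, 0)] / (counts[(0, 0)] + counts[(0, 1)])
    p11 = counts[(1, 1)] / (counts[(1, 0)] + counts[(1, 1)])
    return p00, p11

def gilbert_goodput(p00, p11, packet_bits):
    """Evaluate (A.44): pi_0 * p00^(L-1)."""
    pi0 = (1.0 - p11) / (2.0 - p00 - p11)     # steady-state probability of state 0
    return pi0 * p00 ** (packet_bits - 1)

trace = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0]  # toy trace: 1 marks a bit error
p00, p11 = train_gilbert(trace)
goodput = gilbert_goodput(p00, p11, packet_bits=4)
```

Unlike the BSC estimate, this value depends on how the errors cluster, because $p_{0,0}$
measures how long error-free runs persist.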

A.7.1.4 Goodput of a Full-state Markov Channel Model

The probability of receiving an error-free packet of L bits on a k—th order FSM
channel model is dependent on the present state of the model. If the last received bit was
error-free then the least-signiﬁcant bit in the memory-window will be zero, implying that
the FSM chain is in an even state. On the other hand, if the last received bit was corrupted
then the F SM chain would be in an odd state.

Let us ﬁrst focus on the scenario of currently being in an even state and then
receiving L consecutive good bits. Throughout this chapter, we invoke a realistic

assumption that L > k , where k is the memory-length of the process. Let F SM state 273 ,
0 g 2' 3 2k—1 — 1, be the current even state of the FSM channel model. Since all FSM

states have the mod 2" operation, unless otherwise stated, we drop the mod 2k operation
throughout the following text. Recall from Observation 1 in Section A.5.4 that every
FSM state 2' can transit to only two other states. Thus the current state 22' can transit to

either state 2(22’) or state 2(271) + 1. Since we are only concerned with bursts of error-

free bits, the probability of getting an error-free bit starting in state 22' is P2i,2(2i)- Now

112

for the length of the memory-window, the next k —1 transitions will be between even

states giving the following states sequence:
22' = 20 (22') .9 2(22') = 21 (21')... _. (2k‘1 (22)) mod 2k = 0.

Thus after these k — 1 transitions the process will be in F SM state 0. From that state, to

get the remaining error-free bits, the next L — (k — 1) transitions will be from state 0 to

state 0. To generalize the above discussion in terms of F SM chain parameters, the

probability of getting a burst of L good bits starting in FSM state 22' is given by

k—2
L— k—l . . .
71’22' H0 p2j(2z)2j+1(2z)(p00) ( ). Th1s probability has to be summed over all

j=0

2’“ 1—1 k—2 L_(k_1)
possible even FSM states yielding 2 #2,- H 122390214102) (190,0)
2': 0 j=0

An expression for the probability of getting an error-free packet starting in an odd
FSM state can be derived similarly. Adding these expressions gives the goodput of an
FSM channel model as follows:

$$
\begin{aligned}
\hat{\gamma}_{FSM} &= \Pr\{\text{error-free pkt} \mid \text{FSM}\} \\
&= \sum_{i=0}^{2^{k-1}-1} \left[ \pi_{2i} \prod_{j=0}^{k-2} p_{(2i)2^j,\,(2i)2^{j+1}} \left(p_{0,0}\right)^{L-(k-1)} + \pi_{2i+1} \prod_{j=0}^{k-2} p_{(2i+1)2^j,\,(2i+1)2^{j+1}} \left(p_{0,0}\right)^{L-(k-1)} \right] \\
&= \left(p_{0,0}\right)^{L-(k-1)} \sum_{i=0}^{2^{k-1}-1} \left[ \pi_{2i} \prod_{j=0}^{k-2} p_{(2i)2^j,\,(2i)2^{j+1}} + \pi_{2i+1} \prod_{j=0}^{k-2} p_{(2i+1)2^j,\,(2i+1)2^{j+1}} \right]
\end{aligned}
\qquad \text{(A.45)}
$$

The above expression gives the overall probability of getting $L$ consecutive error-free
bits by summing over all possible state paths starting in an even or an odd FSM state.
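As an illustration of the path-following argument above, the following minimal Python
sketch evaluates the probability of $L$ consecutive error-free bits numerically. It follows
the good-bit transition out of every starting state step by step instead of transcribing
the closed form (A.45). The function name `fsm_goodput`, the list-of-lists layout of `P`,
and the convention that a good bit is a 0 shifted into the least-significant position (so
that a good bit moves state $s$ to $2s \bmod 2^k$) are assumptions made for this example.

```python
def fsm_goodput(P, pi, k, L):
    """Probability of L consecutive error-free bits on an order-k FSM chain.

    P[s][t] : transition probability from state s to state t (2**k states)
    pi[s]   : steady-state probability of state s
    Assumed convention: a good bit moves state s to (2*s) mod 2**k.
    """
    n_states = 1 << k
    if len(P) != n_states or len(pi) != n_states:
        raise ValueError("P and pi must cover all 2**k states")
    total = 0.0
    for start in range(n_states):
        prob = pi[start]
        state = start
        # Follow the unique good-bit transition L times from this start state.
        for _ in range(L):
            nxt = (2 * state) % n_states
            prob *= P[state][nxt]
            state = nxt
        total += prob
    return total


if __name__ == "__main__":
    # Toy order-1 chain (two states); the numbers are illustrative only.
    P = [[0.9, 0.1],
         [0.5, 0.5]]
    pi = [5.0 / 6.0, 1.0 / 6.0]  # stationary distribution of this P
    print(fsm_goodput(P, pi, k=1, L=3))
```

For large $k$ and $L$ the inner loop could be shortened by multiplying with a power of
`P[0][0]` once state 0 is reached; the plain loop is kept here for readability.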


A.7.1.5 Goodput of a Constant-Complexity Channel Model
The constant-complexity model (CCM) aggregates states of the FSM chain as shown
in Figure 19. Recall that the CCM aggregates states of an FSM chain of arbitrary order to
a five-state model. Specifically, FSM states 0, 1 and $2^{k-1}$ are kept in three isolated
states of the CCM. The remaining even FSM states are aggregated into one CCM state,
while the remaining odd FSM states are aggregated into another CCM state. Throughout
the following text, we refer to the five CCM states as $c_0$, $c_1$, $c_{2^{k-1}}$, $c_{even}$ and $c_{odd}$.
Note that at any time instance, if the process is in states $c_0$, $c_{2^{k-1}}$ or $c_{even}$ then the last
received bit was error-free. Similarly, the CCM being in state $c_1$ or state $c_{odd}$ implies
that the last received bit was corrupted. The probability of transiting from current CCM
state $c_i$ to CCM state $c_j$ is denoted by $p_{c_i,c_j}$, and $\pi_{c_i}$ represents the steady-state
probability of being in CCM state $c_i$.
[Figure 31: bar chart of goodput (%) at 2 Mbps and 5.5 Mbps. Legend: actual traces;
binary-symmetric channel model; 2-state Gilbert channel model; 1024-state full-state
Markov channel model; 5-state constant-complexity channel model.]

Figure 31. Comparison of the average goodput of the actual traces with the goodput
estimates provided by BSC, Gilbert, 1024-state Markov, and 5-state CCM models; each
result is averaged over five traces.

To get a burst of $L$ error-free bits on a CCM-based channel, we have to consider that
the CCM can be in any of the five states when the burst starts. If the CCM is in state $c_0$
at the start of the burst then the probability that the following $L$ bits are error-free is
simply given by $(p_{c_0,c_0})^L$. If the process is in state $c_1$, for the next bit to be error-free, the
CCM should transit to state $c_{even}$. This transition has to be followed by $k-3$ good bits,
i.e., $k-3$ transitions from $c_{even}$ to $c_{even}$. After that the CCM should transit to state
$c_{2^{k-1}}$ and then to state $c_0$. Once in state $c_0$, the process will continue being in that state
for the following $L-k$ transitions. Summarizing the above discussion gives the probability
of receiving an error-free packet starting in state $c_1$ as

$$\pi_{c_1}\, p_{c_1,c_{even}} \left(p_{c_{even},c_{even}}\right)^{k-3} p_{c_{even},c_{2^{k-1}}}\, p_{c_{2^{k-1}},c_0} \left(p_{c_0,c_0}\right)^{L-k}.$$

Similar expressions can be derived for the remaining CCM states. Now summing over all
possible initial states gives the complete expression for CCM goodput as

$$
\begin{aligned}
\hat{\gamma}_{CCM} &= \Pr\{\text{error-free pkt} \mid \text{CCM}\} \\
&= \pi_{c_0} \left(p_{c_0,c_0}\right)^{L}
 + \pi_{c_1}\, p_{c_1,c_{even}} \left(p_{c_{even},c_{even}}\right)^{k-3} p_{c_{even},c_{2^{k-1}}}\, p_{c_{2^{k-1}},c_0} \left(p_{c_0,c_0}\right)^{L-k} \\
&\quad + \pi_{c_{odd}}\, p_{c_{odd},c_{even}} \left(p_{c_{even},c_{even}}\right)^{k-3} p_{c_{even},c_{2^{k-1}}}\, p_{c_{2^{k-1}},c_0} \left(p_{c_0,c_0}\right)^{L-k} \\
&\quad + \pi_{c_{even}} \left(p_{c_{even},c_{even}}\right)^{k-2} p_{c_{even},c_{2^{k-1}}}\, p_{c_{2^{k-1}},c_0} \left(p_{c_0,c_0}\right)^{L-k}
 + \pi_{c_{2^{k-1}}}\, p_{c_{2^{k-1}},c_0} \left(p_{c_0,c_0}\right)^{L-1}
\end{aligned}
\qquad \text{(A.46)}
$$

Similar to the FSM expression of (A.45), the above probability sums over all possible
CCM state paths of receiving an error-free packet of length $L$ bits.
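The five-term sum of (A.46) can be evaluated with a few lines of code. The sketch below is
one possible transcription, not the author's implementation: the function name
`ccm_goodput`, the dictionary layout, and the state labels `"c0"`, `"c1"`, `"cmid"`
(standing for $c_{2^{k-1}}$), `"even"` and `"odd"` are hypothetical names chosen for this
example, and it assumes $k \ge 3$ and $L > k$ so that every exponent is non-negative.

```python
def ccm_goodput(pi, p, k, L):
    """Evaluate the five-term CCM goodput sum for a packet of L bits.

    pi : dict mapping state label -> steady-state probability
    p  : dict mapping (from_label, to_label) -> transition probability
    Labels "c0", "c1", "cmid", "even", "odd" are hypothetical; "cmid" is the
    CCM state that holds FSM state 2**(k-1).
    """
    if k < 3 or L <= k:
        raise ValueError("this sketch assumes k >= 3 and L > k")
    p00 = p[("c0", "c0")]
    stay_even = p[("even", "even")]
    # Common tail: leave "even", pass through "cmid", then stay in "c0".
    exit_even = p[("even", "cmid")] * p[("cmid", "c0")] * p00 ** (L - k)
    return (
        pi["c0"] * p00 ** L
        + pi["c1"] * p[("c1", "even")] * stay_even ** (k - 3) * exit_even
        + pi["odd"] * p[("odd", "even")] * stay_even ** (k - 3) * exit_even
        + pi["even"] * stay_even ** (k - 2) * exit_even
        + pi["cmid"] * p[("cmid", "c0")] * p00 ** (L - 1)
    )


if __name__ == "__main__":
    # Made-up parameters, for illustration only (not trained on any trace).
    pi = {"c0": 0.5, "c1": 0.1, "cmid": 0.1, "even": 0.2, "odd": 0.1}
    p = {("c0", "c0"): 0.9, ("c1", "even"): 0.6, ("odd", "even"): 0.5,
         ("even", "even"): 0.7, ("even", "cmid"): 0.2, ("cmid", "c0"): 0.8}
    print(ccm_goodput(pi, p, k=3, L=5))
```

Only six transition probabilities and five steady-state probabilities are needed,
regardless of the order of the underlying FSM chain.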

A.7.1.6 Comparison of Estimated Goodputs
In this section, we compare the goodput estimates provided by the channel models
against the goodput computed from an actual trace. For comparison with a trace, we first
train all four models (BSC, GC, FSM and CCM) using that trace. We then plug in the
trained parameters of these models into equations (A.43) to (A.46) in order to get
goodput estimates of the trace from the models.
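The reference value in this comparison, the goodput of an actual trace, can be sketched as
the fraction of fixed-length packets that contain no bit-errors. The snippet below is a
minimal illustration under stated assumptions: the trace is assumed to be a sequence of
0/1 flags with 1 marking a corrupted bit, a trailing partial packet is ignored, and the
function name `trace_goodput` is hypothetical.

```python
def trace_goodput(error_bits, packet_bytes=100):
    """Fraction of whole packets in a bit-error trace that contain no errors.

    error_bits   : sequence of 0/1 flags, 1 meaning the bit was corrupted
                   (this encoding is an assumption of the sketch)
    packet_bytes : packet length in bytes; 100 matches the text
    """
    packet_bits = 8 * packet_bytes
    n_packets = len(error_bits) // packet_bits  # trailing partial packet ignored
    if n_packets == 0:
        raise ValueError("trace is shorter than one packet")
    good = 0
    for n in range(n_packets):
        packet = error_bits[n * packet_bits:(n + 1) * packet_bits]
        if not any(packet):  # a packet is good only if every bit is error-free
            good += 1
    return good / n_packets
```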

Actual and estimated goodputs are compared in Figure 31. The results in Figure 31
are averaged over five traces at each physical layer data rate. The CCM is trained by
aggregating states of an order-20 FSM chain. A packet length of 100 bytes is used to
compute actual and estimated goodputs. It can be clearly seen that for both data rates the
goodput estimates provided by the binary-symmetric and Gilbert channels are highly
pessimistic and inaccurate; at both data rates percentage goodputs estimated by the BSC
and the GC are approximately 20% and 30% respectively, while the actual goodput is
approximately 97%. Since the Gilbert channel incorporates one bit of memory, its
goodput estimate is slightly better than that of the memory-less binary-symmetric channel.
However, both these channel models are too inaccurate to be used in any realistic
measurement or analytical study. The order-10 full-state Markov model provides very
accurate goodput estimates because it incorporates high-order channel memory. While
being significantly less complex than the FSM model, the CCM provides estimates that
are even better than those of the order-10 FSM model because the CCM is constructed by
aggregating states of an order-20 FSM chain.

A.7.2 Retransmissions of a Reliable Protocol
In this section, we show that the expected number of retransmissions per packet can
be modeled as a simple function of the goodput. We then compare the retransmission

estimates provided by the models under consideration.


A.7.2.1 Expected Retransmissions on a Wireless Channel

In this section, we quantify the expected number of retransmissions experienced by a
packet being transported by an abstract reliable protocol - such as the transmission
control protocol (TCP) [101] or the 802.11 MAC layer protocol [58]. We only focus on
the retransmission-due-to-channel-noise aspect of reliable protocols by employing the
following simple abstraction: keep retransmitting until the packet is received correctly.
We acknowledge that this abstraction is somewhat unrealistic because reliable protocols
generally stop retransmitting after a certain threshold. However, this abstraction allows us
to quantify the worst-case performances of the channel models under consideration. Like
the previous section, at the receiver the abstract reliable protocol drops all packets with
one or more bit-errors. Also, we carry the assumption from the last section that only the
last transmission hop is a wireless link.

Let $X$ denote the random variable representing the total number of retransmissions
required to successfully transmit a packet under the abstract retransmission protocol. Due
to the present abstraction, $X$ can be modeled as a geometric random variable with
parameter $\gamma$, where $\gamma$ is defined in the last section as the true probability of a successful
packet on the wireless channel. More specifically, the probability that a packet will
experience $m$ retransmissions can be expressed as $\Pr\{X = m\} = (1-\gamma)^m \gamma$.
Consequently, the expected number of retransmissions $\beta$ is

$$\beta = E\{X\} = \frac{1}{\gamma} - 1. \qquad \text{(A.47)}$$

As expected intuitively, the expected number of retransmissions is inversely related
to the probability of a good packet; an increase in the probability of a good packet $\gamma$ will
cause the $1/\gamma$ expression to decrease.

Until this point, we have assumed that we accurately know the value $\gamma$, the true
probability of a successful packet on the wireless channel. In wireless simulations, an
estimate of this parameter, $\hat{\gamma}$, is provided by a wireless channel model. From the last
section, we know that equations (A.43) to (A.46) provide the $\hat{\gamma}$ estimates for the BSC,
GC, FSM and CCM channel models. Given the $\hat{\gamma}$ estimates, the estimated number of
retransmissions per packet can be computed as:

$$\hat{\beta} = \frac{1}{\hat{\gamma}} - 1. \qquad \text{(A.48)}$$

Plugging in equations (A.43) to (A.46) renders each channel model's estimate of
per-packet retransmissions.
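As a small numerical illustration, the mean of a geometric random variable with
$\Pr\{X = m\} = (1-\gamma)^m \gamma$, $m = 0, 1, 2, \ldots$, is $1/\gamma - 1$. The Python sketch below
computes this value and cross-checks it against a truncated version of the defining sum.
The function names and the truncation limit `max_m` are assumptions of this example.

```python
def expected_retransmissions(goodput):
    """Mean number of retransmissions, 1/goodput - 1, for goodput in (0, 1]."""
    if not 0.0 < goodput <= 1.0:
        raise ValueError("goodput must lie in (0, 1]")
    return 1.0 / goodput - 1.0


def truncated_mean(goodput, max_m=10000):
    """Sum m * Pr{X = m} for m = 0..max_m as a sanity check of the closed form."""
    return sum(m * (1.0 - goodput) ** m * goodput for m in range(max_m + 1))


if __name__ == "__main__":
    for g in (0.97, 0.3, 0.2):  # illustrative goodput values
        print(g, expected_retransmissions(g), truncated_mean(g))
```

Because the estimate grows without bound as the goodput approaches zero, a model that
underestimates goodput by a few orders of magnitude overestimates retransmissions by
roughly the same factor.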

A.7.2.2 Comparison of Estimated Retransmissions

To compute the average number of retransmissions per packet from an actual trace,
we divide the trace into 100-byte packets. Then to emulate transmission of packet $i$, we
count the burst-length of corrupted packets including and following packet $i$. This
burst-length is the number of retransmissions that packet $i$ will experience.
Burst-lengths/retransmissions of all the emulated packets are accumulated. Finally, the
accumulated retransmission count is normalized by the total number of error-free packet
transmissions.
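The counting procedure above can be sketched as follows. This is one reading of the
description, under explicit assumptions: the trace is a sequence of 0/1 flags with 1
marking a corrupted bit, the burst-length for packet $i$ is the number of consecutive
corrupted packets starting at packet $i$ (zero if packet $i$ is error-free), and a trailing
partial packet is ignored. The function name `trace_retransmissions` is hypothetical.

```python
def trace_retransmissions(error_bits, packet_bytes=100):
    """Average retransmissions per packet emulated from a bit-error trace."""
    packet_bits = 8 * packet_bytes
    n_packets = len(error_bits) // packet_bits
    # True for every packet that contains at least one corrupted bit.
    corrupted = [
        any(error_bits[n * packet_bits:(n + 1) * packet_bits])
        for n in range(n_packets)
    ]
    good_packets = corrupted.count(False)
    if good_packets == 0:
        raise ValueError("no error-free packets in the trace")
    # Scan backwards so that `run` is the corrupted burst-length starting at
    # the current packet; accumulate it over all emulated packets.
    total = 0
    run = 0
    for is_bad in reversed(corrupted):
        run = run + 1 if is_bad else 0
        total += run
    return total / good_packets
```

The backward scan computes all burst-lengths in a single pass instead of rescanning the
trace for every packet.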


As before, parameters of the channel models are derived from the traces against
which they are being compared. Note here that the results of (A.48) are not computed by
taking the reciprocal of the averaged goodput results of Figure 31. The retransmission
estimates are computed by applying equation (A.47) to a model that is trained specifically
for a particular trace. Since equation (A.47) takes the reciprocal of $0 \le \hat{\gamma} \le 1$, a model
with low $\hat{\gamma}$ can render very high values of $\hat{\beta}$.

Figure 32 plots the average number of retransmissions per packet observed in an
actual trace compared against the retransmission estimates provided by the binary-
symmetric, Gilbert, full-state Markov and constant-complexity channel models. It can be
clearly seen in Figure 32 that the estimates provided by the BSC model are grossly
inaccurate. For instance, at 2 Mbps the BSC model estimates the expected number of
retransmissions per packet to be approximately 700 whereas the average number of per-
packet retransmissions observed in the actual traces is about 0.02. The highly inaccurate
retransmission estimates by the BSC are mostly due to receiver-4’s traces. The goodput
estimate of the BSC model for this trace is approximately 0.0003 at 2 Mbps. Putting this
value into equation (A.47) gives an extremely inaccurate estimate of more than 3000
retransmissions per packet. This simple result shows the scale of inaccuracy that is
incurred if channel memory is completely ignored during theoretical or experimental

veriﬁcation of a wireless system.


[Figure 32: bar chart of retransmissions per packet at 2 Mbps and 5.5 Mbps. Legend:
actual traces; binary-symmetric channel model; 2-state Gilbert channel model; 1024-state
full-state Markov channel model; 5-state constant-complexity channel model.]

Figure 32. Comparison of the number of retransmissions per packet estimated by BSC,
Gilbert, 1024-state Markov, and 5-state CCM models; each result is averaged over five
traces.

    
 

 

[Figure 33: bar chart of retransmissions per packet at 2 Mbps and 5.5 Mbps. Legend:
actual traces; 2-state Gilbert channel model; 1024-state full-state Markov channel model;
5-state constant-complexity channel model.]

Figure 33. Number of retransmissions per packet without the BSC model.

 

  
    

[Figure 34: bar chart of retransmissions per packet at 2 Mbps and 5.5 Mbps. Legend:
actual traces; 1024-state full-state Markov channel model; 5-state constant-complexity
channel model.]

Figure 34. Number of retransmissions per packet without the BSC model.

The estimates of the BSC model are so overwhelmingly inaccurate that the remaining
plots are not clearly visible in Figure 32. Therefore, in Figure 33 we plot the results
without the BSC model. From Figure 33, it can be seen that at 2 Mbps even the Gilbert
channel provides very inaccurate estimates of the expected number of retransmissions.
The GC estimate is closer to the actual traces at 5.5 Mbps, but is still significantly worse
than the FSM and CCM models. Figure 34 only shows the estimates by the 1024-state
FSM model and the CCM. Since these channel models incorporate high-order memory,
their estimates are extremely close to the retransmissions observed in the actual traces.

CHAPTER A.8 CONCLUSIONS AND
FUTURE WORK

In this part of the thesis, we showed that 802.11b MAC layer bit-errors at 2 and 5.5
Mbps are Markovian, while bit-errors at 11 Mbps are long-range dependent. We
demonstrated that high-order full-state Markov (FSM) chains can model the bit-errors at
2 and 5.5 Mbps. A multifractal wavelet model (MWM) was used to characterize 11 Mbps
bit-errors. We mitigated the complexity of FSM chains by approximating FSM behavior
using a constant-complexity model which always comprised five states and was highly
accurate. We employed the proposed channel models to estimate corrupted packet
headers in an FEC-based wireless multimedia framework. This novel framework
provided significant improvements in bandwidth utilization and multimedia quality.
Finally, we highlighted some of the inaccuracies that are incurred by using inaccurate
models. These inaccuracies can be avoided by using the constant-complexity model
proposed in this thesis.

As future work, we will study the applicability of the proposed models on other
wireless channels. Another ongoing extension of this work is to incorporate the proposed
channel models into network simulators, such as ns-2 [102] and Qualnet
[103]. We are also investigating alternative methods that can reduce the complexity of the
header estimation framework. Finally, we intend to extend analysis similar to Chapter
A.7 to other wireless protocols and systems so that we can quantify the inaccuracies that
are incurred by inaccurate channel models.


PART-A REFERENCES

[1] B. D. Fritchman, “A Binary Channel Characterization using Partitioned Markov
Chains,” IEEE Transactions on Information Theory, vol. 13, pp. 221-227, April
1967.

[2] S. Tsai, “Markov Characterization of the HF Channel,” IEEE Transactions on
Communications Technology, vol. 17, pp. 24-32, February 1969.

[3] H. O. Burton and D. Sullivan, “Errors and Error Control,” Proceedings of the
IEEE, pp. 1293-1301, November 1972.

[4] H. A. Blank and P. J. Trafton, “A Markov Error Channel Model,” IEEE National
Telecommunications Conference, December 1973.

[5] R. T. Chien, A. H. Haddad, B. Goldberg and E. Moyes, “An Analytic Error Model
for Real Channels,” IEEE International Conference on Communications (ICC),
June 1972.

[6] A. H. Haddad, S. Tsai, B. Goldberg, G. C. Ranieri, “Markov Gap Models for Real
Communication Channels,” IEEE Transactions on Communications, vol. 23, no.
11, pp. 1189-1197, 1975.

[7] L. N. Kanal and A. R. K. Sastry, “Models for Channels with Memory and
Applications to Error Control,” Proceedings of the IEEE, vol. 66, no. 7, pp.
724-744, 1978.

[8] M. Yajnik, S. Moon, J. Kurose, and Don Towsley, “Measurement and Modelling
of the Temporal Dependence in Packet Loss,” IEEE Infocom, March 1999.

[9] M. Zorzi and R. R. Rao, “On Channel Modeling for Delay Analysis of Packet
Communications over Wireless Links,” Allerton Conference on Communications,
Control and Computing, September 1998.

[10] H. Balakrishnan and R. Katz, “Explicit Loss Notification and Wireless Web
Performance,” IEEE Globecom, November 1998.

[11] M. Zorzi and R. R. Rao, “On the Statistics of Block Errors in Bursty Channels,”
IEEE Transactions on Communications, vol. 45, no. 6, pp. 660-667, June 1997.

[12] R. R. Rao, “Higher Layer Perspectives on Modeling the Wireless Channel,” IEEE
ITW, June 1998.

[13] G. T. Nguyen, R. Katz, and B. Noble, “A Trace-based Approach for Modeling
Wireless Channel Behavior,” Winter Simulation Conference, December 1996.

[14] A. Konrad, B. Y. Zhao, A. D. Joseph, and R. Ludwig, “A Markov-based Channel
Model Algorithm for Wireless Networks,” ACM Wireless Networks Journal
(WINET), vol. 9, pp. 189-199, 2003.

[15] A. Konrad, B. Y. Zhao, A. D. Joseph, and R. Ludwig, “A Markov-based Channel
Model Algorithm for Wireless Networks,” ACM Mobicom Workshop on
Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM),
July 2001.

[16] P. Ji, B. Liu, D. Towsley, Z. Ge, and J. Kurose, “Modeling Frame-level Errors in
GSM Wireless Channels,” Performance Evaluation Journal, vol. 55, no. 1-2, pp.
165-181, January 2004.

[17] P. Ji, B. Liu, D. Towsley, and J. Kurose, “Modeling Frame-level Errors in GSM
Wireless Channels,” IEEE Globecom, November 2002.

[18] S. A. Khayam, S. Karande, H. Radha, and D. Loguinov, “Performance Analysis
and Modeling of Errors and Losses over 802.11b LANs for High-Bitrate Real-time
Multimedia,” Signal Processing: Image Communication, vol. 18, no. 7, pp.
575-595, August 2003.

[19] S. Karande, S. A. Khayam, M. Krappel, and H. Radha, “Analysis and Modeling
of Errors at the 802.11b Link Layer,” IEEE International Conference on
Multimedia and Expo (ICME), July 2003.

[20] S. A. Khayam and H. Radha, “Markov-based Modeling of Wireless Local Area
Networks,” ACM Mobicom Workshop on Modeling, Analysis and Simulation of
Wireless and Mobile Systems (MSWiM), September 2003.

[21] S. A. Khayam, S. Aviyente, H. Radha, and J. R. Deller, Jr. “Markov and
Multifractal Wavelet Models for Wireless MAC-to-MAC Channels,”
Performance Evaluation, to appear.

[22] S. A. Khayam, S. Aviyente and H. Radha, “On Long-Range Dependence in High-
Bitrate Wireless Residual Channels,” Conference on Information Sciences and
Systems (CISS), March 2005.

[23] R. R. Rao, “Higher Layer Perspectives on Modeling the Wireless Channel,” IEEE
ITW, June 1998.

[24] A. M. Chen and R. R. Rao, “Wireless Channel Models – Coping with
Complexity,” Wireless Multimedia Network Technologies, Kluwer Academic
Publishers, pp. 271-288, 1999.

[25] A. M. Chen and R. R. Rao, “On Tractable Wireless Channel Models,” IEEE
PIMRC, September 1998.

[26] A. Willig, M. Kubisch, C. Hoene, and A. Wolisz, “Measurements of a Wireless
Link in an Industrial Environment using an IEEE 802.11-Compliant Physical
Layer,” IEEE Transactions on Industrial Electronics, vol. 49, no. 6, pp.
1265-1282, 2002.

[27] A. Willig, “A New Class of Packet- and Bit-Level Models for Wireless
Channels,” IEEE PIMRC, October 2001.

[28] A. Kopke, A. Willig, and H. Carl, “Chaotic Maps as Parsimonious Bit Error
Models of Wireless Channels,” IEEE Infocom, March 2003.

[29] S. A. Khayam and H. Radha, “Linear-Complexity Models for Wireless MAC-to-
MAC Channels,” ACM Wireless Networks (WINET) Journal, vol. 11, no. 5,
September 2005.

[30] S. A. Khayam and H. Radha, “Constant-Complexity Models for Wireless
Channels,” IEEE Infocom, April 2006.

[31] R. Caceres and L. Iftode, “Improving the Performance of Reliable Transport
Protocols in Mobile Computing Environments,” IEEE Journal on Selected Areas
in Communications (JSAC), vol. 13, no. 5, 1995.

[32] A. Bakre and B. R. Badrinath, “I-TCP: Indirect TCP for Mobile Hosts,” IEEE
ICDCS, May 1995.

[33] R. Yavatkar and N. Bhagwat, “Improving End-to-End Performance of TCP over
Mobile Internetworks,” Workshop on Mobile Computing Systems and
Applications, Dec. 1994.

[34] H. Balakrishnan, V. N. Padmanabhan, S. Seshan and R. H. Katz, “A Comparison
of Mechanisms for Improving TCP Performance over Wireless Links,”
IEEE/ACM Transactions on Networking, 1997.

[35] G. Holland and N. Vaidya, “Analysis of TCP Performance over Mobile Ad Hoc
Networks,” ACM Wireless Networks (WINET), vol. 8, pp. 275-288, 2002.

[36] Z. Fu, X. Meng, and S. Lu, “How Bad TCP Can Perform In Mobile Ad Hoc
Networks,” IEEE ISCC, 2002.

[37] K. Chandran, S. Raghunathan, S. Venkatesan, and R. Prakash, “A Feedback based
Scheme for Improving TCP Performance in Ad-hoc Wireless Networks,” IEEE
ICDCS, 1998.

[38] T. D. Dyer and R. V. Boppana, “A Comparison of TCP Performance over Three
Routing Protocols for Mobile Ad Hoc Networks,” ACM MobiHoc, Oct. 2001.

[39] C. Parsa and J. J. Garcia-Luna-Aceves, “Improving TCP Performance over
Wireless Networks at the Link Layer,” Mobile Networks and Applications, vol. 5,
pp. 57-71, 2000.

[40] M. Gerla, K. Tang, and R. Bagrodia, “TCP Performance in Wireless Multihop
Networks,” IEEE WMCSA, 1999.

[41] L-A. Larzon, M. Degermark, S. Pink, L-E. Jonsson, and G. Fairhurst, “The
Lightweight User Datagram Protocol (UDP-Lite),” RFC 3828, July 2004.

[42] L-A. Larzon, M. Degermark, and S. Pink, “UDP-Lite for real time multimedia
applications,” IEEE ICC, Jun. 1999.

[43] L-A. Larzon, M. Degermark, and S. Pink, “Efficient use of wireless bandwidth for
multimedia applications,” IEEE MOMUC, Oct. 2000.

[44] L-A. Larzon, M. Degermark, and S. Pink, “The Lightweight User Datagram
Protocol (UDP-Lite),” RFC 3828, Jul. 2004.

[45] H. Zheng and J. Boyce, “An Improved UDP Protocol for Video Transmission
Over Internet-to-Wireless Networks,” IEEE Transactions on Multimedia, vol. 3,
no. 3, pp. 356-365, September 2001.

[46] H. Zheng, “Optimizing Wireless Multimedia Transmissions through Cross Layer
Design,” IEEE International Conference on Multimedia and Expo (ICME), July
2003.

[47] A. Singh, A. Konrad, and A. D. Joseph, “Performance evaluation of UDP-Lite for
cellular video,” ACM NOSSDAV, 2001.

 

[48] A. Servetti and J. C. De Martin, “Error tolerant MAC extension for speech
communications over 802.11 WLANs,” IEEE VTC, 2005.

[49] C. H. Shih, Y. M. T011, and C. K. Shieh, “A self-regulated redundancy control
scheme for wireless video transmission,” IEEE WirelessCom, 2005.

[50] E. Masala, M. Bottero, and J. C. De Martin, “MAC-level partial checksum for
H.264 video transmission over 802.11 ad hoc wireless networks,” IEEE VTC,
2005.

[51] S. A. Khayam, S. Karande, M. Krappel, and H. Radha, “Cross-Layer Protocol
Design for Real-time Multimedia Applications over 802.11b Networks,” IEEE
International Conference on Multimedia and Expo (ICME), July 2003.

[52] Z. Ye, S. V. Krishnamurthy, and S. K. Tripathi, “A Framework for Reliable
Routing in Mobile Ad Hoc Networks,” IEEE Infocom, 2003.

[53] J. Tang, G. Xue, and W. Zhang, “Reliable Routing in Mobile Ad Hoc Networks
Based on Mobility Prediction,” IEEE AMSS, Oct. 2004.

[54] S. Mueller, R. P. Tsang, and D. Ghosal, “Multipath Routing in Mobile Ad Hoc
Networks: Issues and Challenges,” Lecture Notes in Computer Science, 2004.

[55] P. Papadimitratos, Z. J. Haas, and E. G. Sirer, “Path Set Selection in Mobile Ad
Hoc Networks,” ACM MobiHoc, Jun. 2002.

[56] F. Zhai, Y. Eisenberg, T. N. Pappas, R. Berry, and A. K. Katsaggelos, “Rate-
Distortion Optimized Product Code Forward Error Correction for Video
Transmission over IP-based Wireless Networks,” International Conference on
Acoustics, Speech, and Signal Processing (ICASSP), 2004.

[57] ISO/IEC 8802-11:1999(E), “Part 11: Wireless LAN Medium Access Control
(MAC) and Physical Layer (PHY) Specifications,” August 1999.

[58] IEEE Std 802.11b-1999, “Part 11: Wireless LAN Medium Access Control (MAC)
and Physical Layer (PHY) Specifications: Higher-Speed Physical Layer
Extension in the 2.4 GHz band,” September 1999.

[59] J. G. Kemeny and J. L. Snell, Finite Markov Chains, Springer-Verlag: New York,
1976.

[60] D. Cox, “Long-Range Dependence: A Review,” Statistics: An Appraisal, pp.
55-74, 1984.

[61] R. Riedi, M. Crouse, V. Ribeiro, and R. Baraniuk, “A Multifractal Wavelet Model
with Application to Network Traffic,” IEEE Transactions on Information Theory,
45(3), pp. 992-1018, 1999.

[62] P. Abry, R. Baraniuk, P. Flandrin, R. Riedi, and D. Veitch, “Multiscale Nature of
Network Traffic,” IEEE Signal Processing Magazine, 19(3), pp. 28-46, May
2002.

[63] V. Ribeiro, R. Riedi, and R. Baraniuk, “Wavelets and Multifractals for Network
Traffic Modeling and Inference,” IEEE ICASSP, May 2001.

[64] J. Postel, “User datagram protocol,” RFC 768, Aug. 1980.

[65] P. Brockwell and R. Davis, Introduction to Time Series and Forecasting,
Springer-Verlag, 1996.

[66] N. Merhav, M. Gutman, and J. Ziv, “On the Estimation of the Order of a Markov
Chain and Universal Data Compression,” IEEE Transactions on Information
Theory, vol. 35, pp. 1014-1019, September 1989.

[67] M. J. Weinberger, J. J. Rissanen, and M. Feder, “A Universal Finite Memory
Source,” IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 643-652,
1995.

[68] W. Willinger, V. Paxson, R. H. Riedi, and M. S. Taqqu, “Long-Range
Dependence and Data Network Traffic,” Long Range Dependence: Theory and
Applications, Birkhäuser, pp. 373-407, 2002.

[69] P. Abry, P. Flandrin, M. Taqqu, and D. Veitch, “Wavelets for the Analysis,
Estimation and Synthesis of Scaling Data,” Self-Similar Network Traffic Analysis
and Performance Evaluation, Wiley, 2000.

[70] R. J. Adler, R. E. Feldman, and M. Taqqu, A Practical Guide to Heavy Tails,
Birkhäuser, 1998.

[71] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables, Dover, pp. 564-565, 1972.

[72] Homepage of linux-wlan-ng device drivers, http://www.linux-wlan.org.

[73] Multifractal Wavelet Model Toolbox,
http://www-dsp.rice.edu/software/mwm.shtml.

[74] V. Teverovsky and M. Taqqu, “Testing for Long-range Dependence in the
Presence of Shifting Mean or a Slowly Declining Trend using a Variance-type
Estimator,” Journal of Time Series Analysis, vol. 18, no. 3, pp. 279-304(25), May
1997.

[75] L. R. Rabiner, “A Tutorial on Hidden Markov Models and its Applications,”
Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, February 1989.

[76] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum-Likelihood from
Incomplete Data via the EM Algorithm,” Journal of Royal Statistics Society
Series, vol. 39, 1977.

[77] L. E. Baum, “An Inequality and Associated Maximization Technique in Statistical
Estimation of Probabilistic Functions of Markov Processes,” Inequalities, vol. 3,
no. 1, pp. 1-8, 1972.

[78] T. Cover and J. Thomas, Elements of Information Theory, Wiley: New York,
1991.

[79] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I, Wiley:
New York, 2001.

[80] R. E. Blahut, Theory and Practice of Error Control Codes, Addison-Wesley, May
1984.

[81] E. N. Gilbert, “Capacity of a Burst Noise Channel,” Bell. Sys. Tech. Journal, vol.
39, pp. 1253-1265, September 1960.

[82] M. Mushkin and I. Bar-David, “Capacity and coding for the Gilbert-Elliot
channels,” IEEE Transactions on Information Theory, vol. 35, no. 6, pp.
1277-1290, November 1989.

[83] ISO/IEC JTC 1/SC29/WG11 and ITU-T SG16 Q.6, “Draft ITU-T
Recommendation and Final Draft International Standard of Joint Video
Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC),” Mar. 2003.

[84] ISO/IEC JTC 1/SC29/WG11, “Text of ISO/IEC 14496-2:2001 (Unifying N2502,
N3301, N3056, and N3664),” Doc. N4350, July 2001.

[85] H.264/AVC Software Coordination webpage, http://iphome.hhi.de/suehring/tml.

[86] A. Natu and D. Taubman, “Unequal protection of JPEG2000 code-streams in
wireless channels,” IEEE Globecom, Nov. 2002.

[87] M. Grangetto, E. Magli, G. Olmo, “Reliable JPEG 2000 wireless imaging by
means of error-correcting coder,” IEEE ICME, June 2004.

[88] D. Krishnaswamy and S. Kalluri, “Multi-level weighted combining of
retransmitted vectors in wireless communications,” IEEE VTC, 2006.

[89] C. E. Koksal and H. Balakrishnan, “Quality-aware routing metrics for time-
varying wireless mesh networks,” IEEE Journal on Selected Areas in
Communications (JSAC), to appear.

[90] J. Farber and K. Zeger, “Optimality of the natural binary code for quantizers with
channel optimized decoders,” IEEE ISIT, July 2003.

[91] W. S. Lee, M. R. Pickering, M. R. Frater, and J. F. Arnold, “Error resilience in
video and multiplexing layers for very low bit-rate video coding systems,” IEEE
Journal on Selected Areas in Communications (JSAC), vol. 15, no. 9, pp.
1764-1774, 1997.

[92] X. Luo and G. B. Giannakis, “Energy-constrained optimal quantization for
wireless sensor networks,” IEEE SECON, Oct. 2004.

[93] M. U. Ilyas and H. Radha, “End-to-end channel capacity of a wireless sensor
network under reachback,” CISS, Mar. 2006.

[94] M. Godavarti and A. O. Hero III, “Diversity and degrees of freedom in wireless
communications,” IEEE ICASSP, May 2001.

[95] H. Dong, I. D. Chakares, A. Gersho, E. Belding-Royer, and J. D. Gibson,
“Selective bit-error checking at the MAC layer for voice over mobile ad hoc
networks with IEEE 802.11,” IEEE WCNC, Mar. 2004.

[96] L. Bononi, M. Conti, and E. Gregori, “Runtime Optimization of IEEE 802.11
Wireless LANs Performance,” IEEE Transactions on Parallel and Distributed
Computing, vol. 15, no. 1, 2004.

[97] C-F. Chiasserini and E. Magli, “Energy-Efficient Coding and Error Control for
Wireless Video-Surveillance Networks,” Telecommunication Systems, vol. 26, no.
2, pp. 369-387, 2004.

[98] Z-H. Tan, P. Dalsgaard, and B. Lindberg, “A subvector-based error concealment
algorithm for speech recognition over mobile networks,” IEEE ICASSP, May
2004.

 

[99] W. S. Lee, M. R. Frater, M. R. Pickering, and J. F. Arnold, “A robust codec for
transmission of very low bit-rate video over channels with bursty errors,” IEEE
Transactions on Circuits and Systems for Video Technology (CSVT), vol. 10, no.
8, pp. 1403-1412, December 2000.

[100] L. Zhong, F. Alajaji, and G. Takahara, “A queue-based model for wireless
Rayleigh fading channels with memory,” IEEE VTC, Sep. 2005.

[101] J. Postel, “Transmission control protocol,” RFC 793, Sep. 1981.

[102] Homepage of the ns-2 network simulator, http://www.isi.edu/nsnam/ns/.

[103] Homepage of the Qualnet network simulator, http://www.scalable-networks.com/.

PART B
SELF-PROPAGATING MALWARE
DETECTION AT NETWORK ENDPOINTS
USING INFORMATION-THEORETIC TOOLS

CHAPTER B.1 INTRODUCTION

A recent and dramatic increase in automated network intrusions has necessitated
defense mechanisms that can curb the spread of self-propagating malicious software³
(malware) in real-time. Moreover, rapid evolution and mutation of malware stipulate
detection of novel (i.e., previously unknown) attacks with few, if any, assumptions about
the attack strategy. To that end, network-based anomaly detectors attempt to flag
behavior that is anomalous or abnormal for a networked entity or a user [1]-[29]. The
challenge of anomaly detection systems is the characterization of benign behavior. Most
of the contemporary anomaly detectors are either (a) network-based systems that detect
anomalies by observing unusual network traffic patterns [2]-[24] or (b) host-based
systems that detect anomalies by monitoring an endpoint's operating system (OS)
behavior, for instance by tracking OS audit logs, processes, command-lines or keystrokes
[25]-[29]. Contemporary anomaly detectors tend to be computationally complex with
high false alarm rates and slow response times [30]-[32].

Since network endpoints⁴ are serving as extremely potent and viable launch pads and
carriers for malware infections [34], [35], it is important that real-time and effective
defenses be developed specifically for network endpoints. Recently, there has been some
interest in network- and host-based malware detection at endpoints [25]-[29], [19]-[21]
or at servers close to the endpoints [22]-[24]. Most of these studies leverage some
characteristics of past malware for endpoint-based malware detection. While these
malware characteristics hold true for some of the contemporary malware, their validity
and efficacy are currently being questioned [37]-[42]. Consequently, there is a growing
interest in developing behavioral signatures of benign/legitimate behavior [1]. Once a
robust behavioral model is in place, malicious activity can be detected using deviations
from benign behavior rather than relying on prior experiences of malicious activity.

³ Due to the present focus on detection of self-propagating malicious software, throughout this thesis the
term malware corresponds to self-propagating malware.

⁴ “An endpoint is an individual computer system or device that acts as a network client and serves as a
workstation or personal computing device.” [33].

The objective of behavioral anomaly detection is the characterization of an endpoint’s
benign behavior. Naturally, it is desirable to identify behavioral features that will get
perturbed if the endpoint is compromised by any (past, present, or future) self-
propagating malware. This work identiﬁes such behavioral features using information-
theoretic tools and leverages these features for real-time malware detection at network

endpoints.

B.1.1 Overview of Contributions

To obtain benign behavioral proﬁles of end-users, we have spent 12 months
collecting trafﬁc statistics of a diverse set of endpoints in home, office, and university
settings. An endpoint’s trafﬁc proﬁle contains information about session-level network
activity, such as one-way hashed source and destination IP addresses, session direction
(incoming or outgoing), source and destination ports, timestamps, and keystrokes that are
used to initiate sessions. For malicious activity, we use a diverse set of real and simulated
worms. These worms vary in their propagation rates and scanning techniques. We
evaluate the benign data proﬁles for behavioral features that get perturbed when the
endpoint is compromised by a self-propagating malware. Based on the identified features,
we propose three malware detection techniques.


The ﬁrst malware detection technique proposed in this thesis is truly network-based
since it only uses trafﬁc features for malware detection. This technique relies on the
premise that the vulnerabilities targeted by any malware are associated with a small
number of source or destination ports. Thus, on a compromised machine, the distribution
of source or destination ports on which a host communicates should be perturbed after
infection. Information-theoretic measures can quantify such perturbations in port
distributions. We ﬁrst evaluate whether entropy of port distributions can be used to detect
worms. We observe that in many cases entropy cannot identify malware-related port
perturbations because it captures the variance of a distribution rather than the frequencies
of individual ports. As an alternative technique, we propose the use of the Kullback-
Leibler (K-L) divergence measure [43] to characterize perturbations in source and
destination port proﬁles as a means of detecting attempts of malware propagation. In our
framework, we record a small version of each endpoint’s benign trafﬁc proﬁle and
continuously compare it using the K-L divergence measure to the port histograms
observed in the last window of t time-units. Our results with the collected benign and
malware data show that K-L divergence of port histograms is perturbed signiﬁcantly on
compromised endpoints, which allows very accurate detection of malicious activity by
simply observing each host’s trafﬁc. We also experiment with three other information
divergence measures, namely the Jensen-Shannon (J-S) divergence, the K-divergence
and the resistor-average (R-A) divergence [44], [45]. However, these divergence
measures do not provide any substantial improvements over the K-L divergence.
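The contrast between entropy and K-L divergence described above can be illustrated with a minimal sketch. The port counts, the additive smoothing constant `eps`, and the direction of the divergence (recent window relative to the benign profile) are assumptions made for illustration; they are not taken from the thesis data or implementation.

```python
import math
from collections import Counter

def to_distribution(counts, support, eps=1e-6):
    """Smoothed probabilities over `support`; eps (an assumed constant) avoids log(0)."""
    total = sum(counts.get(port, 0) + eps for port in support)
    return {port: (counts.get(port, 0) + eps) / total for port in support}

def entropy_bits(dist):
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values())

def kl_divergence_bits(p, q):
    """D(p || q) in bits for two distributions defined on the same support."""
    return sum(p[k] * math.log2(p[k] / q[k]) for k in p)

# Hypothetical destination-port counts (illustrative only).
benign = Counter({80: 70, 443: 20, 53: 10})
window = Counter({445: 70, 443: 20, 53: 10})  # port 445 takes the place of port 80

support = set(benign) | set(window)
p_window = to_distribution(window, support)
q_benign = to_distribution(benign, support)

h_benign = entropy_bits(q_benign)
h_window = entropy_bits(p_window)
kl = kl_divergence_bits(p_window, q_benign)
print(f"entropy benign={h_benign:.3f} window={h_window:.3f} K-L={kl:.2f}")
```

In this toy case both histograms have the same shape, so their entropies coincide, while the K-L divergence is large because the mass has moved to a port that is essentially absent from the benign profile.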

We use a very small subset of K-L divergences derived from the normal user profiles
and the malware data to train support vector machines (SVMs) [46] for each endpoint.
The trained SVMs are then tested using all other malware which are embedded at
multiple random instances in the normal profiles. For all our experimental evaluations,
we observe almost 100% detection accuracy and negligible false alarm rates. We
compare the performance of our proposed malware detector with two existing anomaly
detectors, namely the maximum-entropy detector [14] and the rate-limiting detector [20].
We show that the proposed K-L/SVM-based detector provides consistently and
substantially better performance than the techniques of [14] and [20].

The remaining two malware detection techniques proposed in this thesis are joint
network-host anomaly detectors which exploit the observation that when a user is
actively using his/her computer most of the benign traffic is triggered by a small subset of
keystrokes and mouse clicks. Based on this observation, we propose to correlate the last
input from the keyboard or mouse hardware buffer with every new network session. We
use marginal keystroke data to show that the session initiation keys are not necessarily
used as frequently by an end-user. To effectively exploit the session-keystroke correlation
in a real-time and automated fashion, we propose two information-theoretic measures,
namely keystrokes’ entropy and session-keystroke mutual information [43].

We compute the keystrokes’ entropy and mutual information on a window-by-
window basis. We observe that the entropy is consistently low and mutual information is
somewhat high in the time windows containing benign data. However, once malicious
trafﬁc with a marginal keystroke distribution is inserted into the benign proﬁle, there is a
signiﬁcant increase in the entropy and simultaneously there is a decrease in the mutual
information. These entropy and mutual information perturbations arise because
many keys that users generally press very frequently are never used to
initiate legitimate network activity. For a user who is active on his/her endpoint, the
malicious network sessions that are not initiated by the user are logged with unlikely and
diverse keystrokes thereby changing the keystrokes’ distribution.
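The entropy effect described above can be sketched on a toy window. The key codes below are hypothetical (13 and 1 stand for Enter and the left mouse button in the Windows virtual-key numbering), and the sketch covers only the entropy measure, not the mutual information measure.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy (bits) of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical virtual key codes logged with the sessions of one time window.
benign_keys = [13] * 8 + [1] * 2          # sessions started with Enter or a mouse click
malicious_keys = list(range(65, 75))      # ten scan sessions tagged with ten different keys
infected_keys = benign_keys + malicious_keys

print(round(entropy_bits(benign_keys), 3))
print(round(entropy_bits(infected_keys), 3))
```

The benign window concentrates on two keys and has low entropy; once sessions tagged with diverse keys are mixed in, the entropy of the window rises sharply.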

To create an automated detection tool based on the keystroke distributions, we use a
small subset of the benign proﬁles to generate the joint and marginal distributions of
keystrokes and network sessions. Based on the statistics of these distributions, we
develop entropy/mutual information thresholds above/below which an alarm is raised. For
both entropy and mutual information based detectors, we observe almost 100% detection
accuracy and very low false-alarm rates. Overall the mutual information detector has
lower false alarm rates than the entropy detector. Nevertheless, both detectors provide
significantly better performance than the existing maximum-entropy and rate-limiting
detectors.
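A threshold detector of this kind might be sketched as follows. The rule used here (mean plus `k` standard deviations of the benign window entropies, with `k = 3`) is purely an assumption for illustration; this overview does not state how the thresholds are actually derived from the distribution statistics.

```python
import statistics

def entropy_threshold(benign_entropies, k=3.0):
    """Assumed rule: alarm level = mean + k * standard deviation of benign entropies."""
    mu = statistics.mean(benign_entropies)
    sigma = statistics.pstdev(benign_entropies)
    return mu + k * sigma

def alarm_windows(window_entropies, threshold):
    """Indices of the windows whose entropy exceeds the threshold."""
    return [i for i, h in enumerate(window_entropies) if h > threshold]

# Hypothetical per-window keystroke entropies from a benign training subset.
training = [0.70, 0.80, 0.75, 0.90, 0.85]
threshold = entropy_threshold(training)
print(alarm_windows([0.8, 3.0, 0.9], threshold))
```

A mutual-information detector would use the mirrored comparison, raising an alarm when the measure falls below its threshold.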

B.1.2 Organization of this Part

The rest of this part is structured as follows. Chapter B.2 describes related work in
this area. Chapter B.3 provides brief background on self-propagating malware and
support vector machines. Chapter B.4 details the benign endpoint proﬁles and malware
collected/simulated for this study. Chapter B.5 proposes a network-based information-
theoretic technique which detects malware by leveraging the K-L divergence between
benign and real-time trafﬁc features in an SVM framework. Chapter B.6 presents two
other techniques which respectively employ entropy and mutual information of
keystrokes and network sessions to detect malware. Chapter 8.7 identiﬁes possible

attacks on the proposed malware detectors and discusses defenses against these attacks.

137

Chapter B.8 summarizes key conclusions of this part and outlines our future research

detections.


CHAPTER B.2 RELATED WORK

Most of the contemporary studies perform network-based anomaly detection at the
enterprise network perimeter or the local network perimeter. Zou et al. [2] propose a
malware warning center (MWC) and distributed ingress and egress sensors at a local
network’s perimeter. Similarly, Wu et al. [3] propose a network architecture and a
distributed algorithm to detect multi-vector worms. Schechter et al. [4] use a combination
of rate limiting and portscan detection in a local network worm detector. Jung et al. [5]
develop a network-level fast portscan detector that uses a threshold random walk (TRW)
on typical access patterns to infer whether a host is malicious or benign. Weaver et al. [6]
simplify the TRW algorithm to make it more amenable to hardware and software
implementations. The simpliﬁed algorithm of [6] can accurately detect very low rate
worms. Soule et al. [11] apply a Kalman ﬁlter to normal trafﬁc and then use multiple
anomaly detection techniques to detect abnormal behavior. Kim et al. [12] propose that
gateway routers score each packet based on its legitimacy. Similarly, anomaly detectors
that monitor blocks of unused IP addresses are also becoming increasingly popular
[15]-[18].

There has been some recent interest in detecting malware at servers near the
endpoints. Whyte et al. [22] detect worms by monitoring (at the gateway router)
connections that are not preceded by a DNS address resolution request. Gupta and Sekar
[23] detect changes in traffic volume at a mail server to detect mass mailing worms.
Xiong [24] traces attachments at mail servers to detect mass mailing worms.


Barford et al. [9] use time-frequency signal analysis to develop a change detection
algorithm. Krishnamurthy et al. [10] propose a sketch-based change detection algorithm.
Lakhina et al. [7], [8] propose a subspace method to detect and characterize network-
wide volumetric trafﬁc anomalies. The authors then extend their work in [13] and use
entropy to detect anomalies. Another recent study by Gu et al. [14] uses maximum-
entropy estimation to quantify a baseline distribution at a network gateway or router,
which is in turn used to classify anomalous activity using the K-L divergence.

The most commonly used endpoint-based network-level malware detection technique
is rate limiting. This technique proposed by Twycross and Williamson [19], [20] limits
the rate of an endpoint’s network trafﬁc to curb and detect malware propagation. Sellke
et al. [21] extend rate limiting by proposing a branching worm propagation model and in
turn using this model to develop a window-based rate limiting mechanism.

Wong et al. [37], [38] show that rate limiting is not very effective on endpoints or
at the local network perimeter, but can provide effective malware throttling if deployed on
backbone routers. Panjwani et al. [42] evaluated whether portscans are precursors to
malicious attacks. It was concluded in [42] that over 50% of attacks are not preceded by a
portscan and, therefore, “port scans should not be considered as precursors to an attack.”
Moreover, Li et al. [39] show that statistical ﬁltering-based defense mechanisms are
effective when they are adapted in accordance with an attack. In [39] it is also shown that
the performance of a statistical ﬁlter degrades signiﬁcantly if the attacker is more
adaptive than the ﬁlter.

In the host-based anomaly detection context, most of the existing detectors
characterize benign user behavior by modeling commands given by a user in a textual OS
environment [26]-[29]. Due to the high market penetration of graphical operating
systems, it is important to model graphical behavioral features of end-users.

A recent technique called BINDER [25] correlates keystrokes with OS processes and
raises an alarm whenever a process is initiated without an end-user’s input. There are
important differences between BINDER and the detector proposed in this thesis. First,
BINDER is purely host-based and does not employ any network session information.
Second, BINDER cannot detect memory-resident malicious codes because its detector is
invoked only when a new process is created. (There have been many well-known worms
that were memory-resident; the two most famous examples are CodeRed II and Witty.)
Since our technique uses both network and host information, it can detect memory-
resident malware. Lastly, BINDER requires a whitelist of legitimate applications before
deployment. The detector proposed in this thesis can be deployed out-of-the-box, after
which all training is done online.


CHAPTER B.3 BACKGROUND

In this section, we provide background material which is required to understand the
contributions of this part.

B.3.1 Self-Propagating Malware

Self-propagating malware is a recent term that is used to refer to a malicious code that
has the ability to spread from one compromised computer to another computer without
any human intervention. These malicious codes generally target vulnerabilities in
background processes or services that are continuously running on vulnerable hosts. After
compromising a vulnerable computer, a self-propagating malware tries to locate and
infect other vulnerable hosts on the network. The process of locating vulnerable hosts is
called scanning. Over the last few years, malware have evolved to use very sophisticated
scanning and infection techniques [41].

There are two prevalent types of self-propagating malware:

• Worms: A worm is a standalone malicious code that propagates copies of itself to
  vulnerable computers;

• Bots: A bot is a malware which, after infecting a computer, contacts a central
  command and control server. This server in turn makes the compromised computer
  part of a bot network (botnet) of compromised computers. These botnets are
  subsequently remote controlled by the central server.

After compromising vulnerable computers, malware can use these computers to
launch distributed denial of service (DDoS) attacks, relay spam, or steal personal
information.

B.3.2 Support Vector Machines

Given training vectors $x_i \in \mathbb{R}^n$, $i = 1, 2, \ldots, l$, in two classes, and a vector
$y \in \mathbb{R}^l$ such that each $y_i \in \{+1, -1\}$, a C-SVM for non-separable data considers
the following primal optimization problem [46]:

$$\min \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \alpha_i y_i \left( w^{T} K(s_i, x) + b \right)$$

such that derivatives of the objective function vanish with respect to $\alpha_i$ and subject to the
constraint that $\alpha_i \ge 0$, $i = 1, 2, \ldots, l$. In the objective function, $w$ is a vector perpendicular
to the hyperplane that separates the positive and negative points, $C$ is a parameter that is used
to cost the $\alpha_i$'s, $K(s_i, x)$ is a non-linear kernel that maps the input data to another
(possibly infinite-dimensional) Euclidean space, and the $s_i$'s are points called the support
vectors that maximize the separation between the positive and negative examples. We use
a degree-3 radial basis kernel function to train the C-SVM.
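For orientation, the soft-margin C-SVM is commonly written in the textbook form below. The slack variables $\xi_i$, the feature map $\phi$, and the kernel width $\gamma$ are notation introduced here for illustration and may differ from the formulation and symbols used above.

```latex
\[
\min_{w,\,b,\,\xi} \;\; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \ldots, l,
\]
\[
K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j), \qquad
K_{\mathrm{RBF}}(x_i, x_j) = \exp\!\left( -\gamma \, \lVert x_i - x_j \rVert^{2} \right), \quad \gamma > 0 .
\]
```

In this form, $C$ trades off margin width against the total slack of misclassified or margin-violating training points.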


CHAPTER B.4 DATA COLLECTION AND
SIMULATION

In this section, we explain the two main datasets collected for this study. The ﬁrst
dataset comprises benign trafﬁc and keystroke proﬁles collected from several hosts with
regular human users. The second dataset comprises real and simulated malware trafﬁc.
Since university policy and user reservations prohibited us from infecting operational
endpoints with malware, we ﬁrst identify network- and host-based features perturbed by
the introduction of malicious code into each system and then perform offline analysis by
inserting malicious traffic at random instances in the endpoints’ benign traffic profiles.

B.4.1 Benign Traffic-Keystroke Profiles

Our ﬁrst step towards the development of a network-based malware detector was to
collect pertinent network and OS-based data. We started by investing up to 12 months in
monitoring network/OS proﬁles of a diverse set of 13 endpoints. Users of these
endpoints included home users, research students, and technical/administrative staff with
Windows 2000/XP laptop and desktop computers. The laptop endpoints were used by
their users both at home and at work. Some endpoints, in particular home computers,
were shared among multiple users. The endpoints used in this study were running
different types of applications, including peer-to-peer ﬁle sharing software, online
multimedia applications, network games, SQL/SAS clients etc.

Data were collected by a multi-threaded Windows application called argus, which
runs as a background process storing network and keystroke activity in a log file. The log
file is periodically and securely uploaded to a secure copy (SCP) server. argus only
logs session-level information where a session corresponds to bidirectional
communication between two IP addresses. Communication between the same pair of IP
addresses on different ports is considered part of the same network session. This session-level
granularity reduces the complexity of the malware detector, while providing complete
information about sessions originating from or terminating at an endpoint. Each session is
logged using the information contained in the first packet of the session. A session
expires if it does not send/receive a packet for more than $\tau$ seconds. In the collected data,
$\tau$ is set to 10 minutes.
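The session rule described above (one session per remote IP address regardless of port, logged from its first packet, and expired after 10 minutes of inactivity) can be sketched as follows. The packet tuple layout and the function name are hypothetical; this is not the argus implementation.

```python
def sessionize(packets, timeout_s=600.0):
    """Collapse a time-sorted packet list into session records.

    packets: iterable of (timestamp_s, remote_ip, src_port, dst_port) tuples
             (a hypothetical layout chosen for this sketch).
    Returns the first packet of every session, where a session is keyed by the
    remote IP only and expires after `timeout_s` seconds without traffic.
    """
    last_seen = {}   # remote_ip -> timestamp of the most recent packet
    sessions = []
    for ts, remote_ip, src_port, dst_port in packets:
        previous = last_seen.get(remote_ip)
        if previous is None or ts - previous > timeout_s:
            sessions.append((ts, remote_ip, src_port, dst_port))
        last_seen[remote_ip] = ts
    return sessions

packets = [
    (0.0, "10.0.0.5", 1000, 80),
    (5.0, "10.0.0.5", 1001, 443),    # same remote IP, different port: same session
    (700.0, "10.0.0.5", 1002, 80),   # more than 600 s of silence: new session
    (701.0, "10.0.0.9", 1003, 53),
]
sessions = sessionize(packets)
print([s[0] for s in sessions])
```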

For each logged session, argus also logs the last keystroke or mouse click that was
pressed before the ﬁrst packet of the session. We generically refer to keyboard and mouse
inputs as keystrokes or keys in this thesis. The last keystroke is associated with a session
only if the key was pressed no more than $\Delta$ seconds before the session. If there was no
key pressed in the last $\Delta$ seconds before a session then a void keystroke value of zero is
inserted. In the collected traces, $\Delta$ is set to 10 seconds. Throughout this thesis, we only
focus on sessions with non-zero keys. We assume that the last pressed key has initiated
the associated session, that is, an inherent correlation relationship is assumed between the
last key and the consequent session. Clearly, this correlation will not be present when a
malicious code is trying to propagate from an oblivious end-user’s computer, and hence
perturbations in the session-keystroke correlation can be leveraged at that point to detect
the malicious code.
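The keystroke association rule can be written as a small function. This is a minimal sketch assuming timestamps in seconds and a 10-second association window; the function and parameter names are illustrative.

```python
def session_key(session_ts, last_key_ts, last_key_code, window_s=10.0):
    """Return the key code to log with a session, or 0 (void) if no key qualifies.

    A key qualifies only if it was pressed at most `window_s` seconds before
    the first packet of the session.
    """
    if last_key_ts is None:
        return 0
    age = session_ts - last_key_ts
    return last_key_code if 0 <= age <= window_s else 0

print(session_key(100.0, 95.0, 13))   # key pressed 5 s before the session
print(session_key(100.0, 80.0, 13))   # key pressed 20 s before the session
```

Sessions that receive the void value are the ones excluded from the analysis, since the text focuses on sessions with non-zero keys.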

Each entry of the log ﬁle has the following seven ﬁelds:

<session id, direction, protocol, src port, dst port, timestamp, virtual key code>,

whose explanation is given below:

• session id: 20-byte SHA-1 hash [47] of the concatenated hostname and remote IP
  address. Hashing preserves privacy, which is important because the collected data are
  going to be publicly available;

• direction: one-byte flag indicating outgoing unicast, incoming unicast, outgoing
  broadcast, or incoming broadcast packets;

• protocol: transport-layer protocol (i.e., TCP or UDP) of the packet;

• src port: source port of the packet;

• dst port: destination port of the packet;

• timestamp: millisecond-resolution time of session initiation;

• virtual key code: one-byte virtual key code, as defined by Microsoft’s MSDN
  library [48], of the last (keyboard or mouse) keystroke that was pressed before the
  session. In view of our stringent privacy considerations, we only log the very last
  keystroke that was pressed right before the first packet of a new session. Throughout
  this thesis, we refer to this jointly collected session and keystroke data as session-key
  or key-session data. Moreover, keystrokes observed in this joint profile are referred to
  as the session initiation keys.
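The seven-field entry listed above maps naturally onto a small record type. The sketch below is illustrative only: the field types, the numeric encoding of the direction flag, and the exact way hostname and remote IP are concatenated before hashing are assumptions, since the text does not specify them.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class LogEntry:
    session_id: bytes        # 20-byte SHA-1 digest
    direction: int           # assumed encoding: 0..3 for the four direction cases
    protocol: str            # "TCP" or "UDP"
    src_port: int
    dst_port: int
    timestamp_ms: int        # session initiation time, millisecond resolution
    virtual_key_code: int    # 0 means no qualifying keystroke

def make_session_id(hostname: str, remote_ip: str) -> bytes:
    # Assumption: plain UTF-8 concatenation with no separator.
    return hashlib.sha1((hostname + remote_ip).encode("utf-8")).digest()

entry = LogEntry(
    session_id=make_session_id("example-host", "192.0.2.10"),
    direction=0,
    protocol="TCP",
    src_port=1025,
    dst_port=80,
    timestamp_ms=1_150_000_000_000,
    virtual_key_code=13,
)
print(len(entry.session_id), entry.dst_port)
```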

Some pertinent statistics of the collected benign data are listed in Table 6. Diversity
of the endpoints used in this study is evident from Table 6, which shows that the
endpoints operate in different environments (and hence run different types of

applications). Also, the total size of the dataset (i.e., total number of sessions) varies from
11,996 for endpoint 13 to 444,345 for endpoint 4. In general, we observed that home
computers generate significantly higher traffic volumes than office and university
computers because: (i) they are generally shared between multiple users, and (ii) they run
peer-to-peer and multimedia applications. The high traffic volumes of home computers
are also evident from the high mean sessions per second [column 4].

Table 6. Statistics of the Benign Profiles

Endpoint  Endpoint     Total      Mean Session  Cumulative Freq   Cumulative Freq   Cumulative Freq
ID        Type         Sessions   Rate (sps)    of Ten Most-Used  of Ten Most-Used  of Ten Most-Used
          (Home/                                Src Ports (%)     Dst Ports (%)     Session Keys (%)
          Office/Univ)
1         Office       33,487     0.25          90.37             88.06             96.01
2         Office       21,066     0.22          47.8              87.53             92.32
3         Home         373,009    1.92          3.95              37.29             94.01
4         Home         444,345    5.28          5.86              10.82             94.86
5         Home/Univ    27,873     0.44          15.91             99.27             95.25
6         Univ         60,979     0.19          54.95             94.0              95.49
7         Univ         171,601    0.28          40.7              96.75             95.56
8         Univ         41,809     0.52          66.1              96.44             96.13
9         Univ         235,133    0.41          44.1              94.84             95.48
10        Univ         152,048    0.21          75.19             95.11             95.27
11        Univ         207,187    0.31          38.85             95.2              95.14
12        Home/Univ    100,702    0.33          24.78             95.0              95.13
13        Univ         11,996     0.23          44.56             95.98             95.95

Another interesting observation is that, with the exception of home computers, the
observed endpoints generally use a small set of source and destination ports very
frequently [columns 5 and 6]. (The source and destination port frequencies in Table 6 are
computed for outgoing unicast packets.) This observation holds particularly true for
destination ports because in most cases ten destination ports are used approximately 90%
of the time, with endpoints 3 and 4 being the exceptions. This is a preliminary
indication that port usage is a statistic that is somewhat consistent across endpoints, and
therefore can be leveraged to detect malicious activity. Also, later in the thesis it is shown
that the different benign behavior of home endpoints poses a considerable challenge to
malware detectors.
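The "cumulative frequency of the ten most-used ports" statistic discussed above can be computed as in the following sketch; the function name and the sample port list are hypothetical.

```python
from collections import Counter

def top_k_cumulative_pct(values, k=10):
    """Percentage of observations accounted for by the k most frequent values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(k))
    return 100.0 * top / total

# Hypothetical destination ports of 100 outgoing sessions: one dominant port
# plus ten ports that each appear once (eleven distinct ports in total).
dst_ports = [80] * 90 + list(range(1000, 1010))
print(top_k_cumulative_pct(dst_ports))
```

The same function applies unchanged to source ports and to session initiation keys.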

The last important observation is that without exception all of the observed endpoints
use a small set of session initiation keys very frequently [column 7]. (The session
initiation key frequencies in Table 6 are computed for outgoing unicast packets with
non-zero keys.) In fact, on all hosts more than 90% of the sessions are initiated using 10
keys. This is a preliminary indication that the correlation of the session-key data is
consistent across endpoints and therefore can be leveraged to detect malicious activity.

The joint session-key data described above provides us correlated information of
keystroke and sessions. In other words, this data can be used to develop a joint session-
key probability distribution. In addition to the correlated/joint data, the keystroke-based
detectors proposed later in this thesis also require marginal distributions of keystrokes.
That is, we need a distribution of all the keystrokes that are pressed on an endpoint. The
following section describes this data.

B.4.2 All-Keystrokes’ Profiles

To develop a marginal distribution of keystrokes, we had to log all the keys that are
pressed on a host. Due to strict privacy constraints imposed by the university, and due in
part to user reservations, it was not possible to collect such data on all the participating
hosts. We installed a custom-developed keylogger on two computers [endpoints 5 and
12] and collected keystroke data for more than a month. Each entry of the keylogger
contains two fields: <timestamp, keystroke>, which are in the same format as
described in the last section.

This dataset is referred to as the all-keys data. For the remaining endpoints, an
average of the all-keys data of endpoints 5 and 12 is used for the keystrokes’ marginal
distribution. This marginal keystroke distribution is simply a normalized histogram of the
frequency of usage of the keystrokes.
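A normalized keystroke histogram and the averaged distribution can be sketched as below. Interpreting "average" as the element-wise mean of the two endpoints' normalized histograms is an assumption; pooling the raw keystrokes of both endpoints would be another reasonable reading.

```python
from collections import Counter

def marginal_distribution(keys):
    """Normalized histogram of key usage: key code -> relative frequency."""
    counts = Counter(keys)
    n = len(keys)
    return {key: count / n for key, count in counts.items()}

def average_distributions(d1, d2):
    """Element-wise mean of two distributions (assumed meaning of 'average')."""
    return {k: 0.5 * (d1.get(k, 0.0) + d2.get(k, 0.0)) for k in set(d1) | set(d2)}

# Hypothetical all-keys logs from two endpoints.
keys_a = [65, 65, 13, 32]
keys_b = [65, 13, 13, 8]
averaged = average_distributions(marginal_distribution(keys_a),
                                 marginal_distribution(keys_b))
print(sorted(averaged.items()))
```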

In addition to benign data, we have also collected malware data generated by real
malicious codes. The following section explains collection of the malicious traffic data.

B.4.3 Malware Classiﬁcation

To generate traffic patterns for each malware, we infected a vulnerable machine with
a malware and observed the traffic generated by the malware using the argus data
utility described in the previous section. (The vulnerable machines used here are different
from the operational endpoints used for benign proﬁle collection.) This section details the
malware collected and simulated in this study. Before we describe malware data
collection, explanation of some terminology is in order.

After compromising a vulnerable host, a malware tries to infect other computers by
sending out scan packets with infectious payloads. A vulnerable machine gets infected if
it receives and processes a scan packet. Throughout this text, scan packets generated by a
malware after compromising a vulnerable host are referred to as outgoing scan packets.
Based on the outgoing scan packets, we classify malware into two broad categories:

• Destination-port malware: destination ports of scan packets are fixed, but the source
  ports may be arbitrary;

• Source-port malware: source ports of scan packets are fixed, but the destination ports
  may be arbitrary.

In the former case, we call the destination ports of a malware attack ports and source
ports non-attack ports. In the latter case, the roles are reversed and we call source ports
attack and destination ports non-attack. With the exception of the Witty worm [59],
[60], all contemporary malware are destination-port malware. However, the above
classification is important to understand later results. Note that a source/destination port
malware can be multi-vector [41] targeting multiple vulnerabilities simultaneously. We
now describe the malware used in this study.
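The two categories can be told apart from a trace of outgoing scan packets by checking which side uses the smaller set of ports. The heuristic below (comparing the number of distinct source and destination ports) is an illustrative assumption, not a procedure described in the text.

```python
def classify_scan_ports(scan_packets):
    """Label a scan trace by which port side is (nearly) fixed.

    scan_packets: iterable of (src_port, dst_port) pairs from outgoing scans.
    """
    src_ports = {src for src, _ in scan_packets}
    dst_ports = {dst for _, dst in scan_packets}
    if len(dst_ports) < len(src_ports):
        return "destination-port"
    if len(src_ports) < len(dst_ports):
        return "source-port"
    return "ambiguous"

# Hypothetical traces: fixed destination port 135 versus fixed source port 4000.
fixed_dst = [(1025 + i, 135) for i in range(50)]
fixed_src = [(4000, 2000 + i) for i in range(50)]
print(classify_scan_ports(fixed_dst), classify_scan_ports(fixed_src))
```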

B.4.4 Real Malware

A critical aim of our study is to use real and diverse malware data to test our detection
techniques. To this end, we installed original and unpatched releases of Windows 2000
and Windows XP on a computer using Microsoft Virtual PC 2004 [49]. The advantage of
using virtual machines (VMs) was that once a virtual host was infected, we could
reinstall it by overriding just a few key ﬁles. We assigned static IP addresses to both
virtual machines and connected them to the Internet. These hosts were then compromised
by the following malware: Zotob.G [50], Forbot-FU [51], Sdbot-AFR [52], and
Dloader-NY [53]. We also requested network administrators and research
collaborators in our university to share malware binaries and source codes with us. This
way we acquired SoBig.E@mm [54] and the C source code of MyDoom.A@mm [55],
which are mass-mailing worms. Finally, we downloaded binaries or source codes of the
following worms from the Internet: Blaster [56], Rbot-AQJ [57], and RBOT.CCC [58].

Table 7. Information of Malware Used in This Study

Malware       Release Date  Avg. Scan Rate (sps)  Port(s) Used
Blaster       Aug 2003      10.5                  TCP 135, 4444, UDP 69
Dloader-NY    Jul 2005      46.84                 TCP 135, 139
Forbot-FU     Sep 2005      32.53                 TCP 445
MyDoom-A      Jan 2006      0.14                  TCP 3127-3198
RBOT.CCC      Aug 2005      9.7                   TCP 139, 445
Rbot-AQJ      Oct 2005      0.68                  TCP 139, 769
Sdbot-AFR     Jan 2006      28.26                 TCP 445
SoBig.E       Jun 2003      21.57                 TCP 135, UDP 53
Zotob.G       Jun 2003      39.34                 TCP 135, 445, UDP 137
Witty         Mar 2004      357.0                 UDP 4000
CodeRed II    Jul 2004      4.95                  TCP 80
Sim Src Port  Simulated     3.57                  TCP 1500

Table 7 shows the diversity of the malware used in this thesis. The malware have
different (and sometimes multiple) attack ports and transport protocols. Also, these
malware include both high- and low-rate malware; among the real malware, Dloader-NY
has the highest scan rate of 46.84 scans per second (sps), while MyDoom-A and Rbot-AQJ
have very low scan rates of 0.14 and 0.68 sps, respectively. We show later that the
low-rate MyDoom-A and Rbot-AQJ are more difficult to detect than high-rate malware.
Blaster is one of the two worms that are used to generate negative examples for SVM
training later in the document.

All real malware collected for this study fall into the widely prevalent category of
destination-port malware. While these malware provided us with a good base for
evaluating our proposed techniques, we wanted to test our methods against an even
broader class of attacks. Consequently, we simulated three additional malware that were
somewhat different from the ones described above. These simulated malware and their
distinguishing characteristics are described next.


B.4.5 Simulated Malware

The first malware simulated for this study is the Witty worm [59], [60]. Among
other distinguishing characteristics, this worm has two unique properties that are of direct
consequence here: (a) it uses a fixed source port 4000 to propagate, while the destination
port is selected randomly; and (b) after every 20,000 transmitted packets Witty
overwrites a random block on the hard disk of the compromised host. Therefore, Witty
not only falls in the rare source-port malware category, but it also potentially crashes
compromised hosts after dispatching only 20,000 scan packets. On an endpoint with
broadband connectivity, Witty demonstrates an average scan rate of 357 sps, peaking
out at 970 sps [60]. At this rate, 20,000 scan packets can be transmitted (and the infected
host crashed) in less than a minute, which presents a tremendous challenge to real-time
detectors. We simulated the Witty worm using the exact pseudo random number
generator parameters and pseudo code provided in [60]. We only test the worst-case
scenario with 20,000 scan packets at the average scan rate of 357 sps.
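The "less than a minute" figure follows directly from the quoted rates, as this short calculation shows (variable names are illustrative):

```python
SCAN_PACKETS = 20_000      # packets sent before Witty overwrites a disk block
AVG_RATE_SPS = 357         # average scan rate quoted above
PEAK_RATE_SPS = 970        # peak scan rate quoted above

seconds_at_avg = SCAN_PACKETS / AVG_RATE_SPS
seconds_at_peak = SCAN_PACKETS / PEAK_RATE_SPS
print(f"average rate: {seconds_at_avg:.1f} s, peak rate: {seconds_at_peak:.1f} s")
```

At the average rate the 20,000 packets take roughly 56 seconds, and at the peak rate roughly 21 seconds.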

In addition to Blaster, we employ Witty as the second worm for training the
SVMs in the network-based malware detector proposed in the following chapter. To
comprehensively evaluate the performance of the proposed detector for source port
malware, we simulate a worm that sends scan packets with a ﬁxed TCP source port of
1500 at an average scan rate of 3.57 sps; note that this scan rate is exactly 100 times
less than Witty’s average scan rate, which makes this simulated worm challenging to
detect.

The last simulated malware of this study is an HTTP worm. We acknowledge that it
is unlikely that an endpoint will be running a service that can be infected by an HTTP
malware. Nevertheless, we simulate an HTTP worm because it uses destination port
80, which is a very common port in the benign profile of an endpoint. Thus it is quite
challenging for network-based frequency/histogram detectors to detect malicious HTTP
traffic. We simulate the HTTP-based CodeRed II worm [61] using an average scan
rate of 4.95 sps [62]. Table 7 gives additional information about the simulated malware.

B.4.6 Inserting Malware Data in Benign Traffic Proﬁles

A vulnerable VM was infected with each of the malicious codes. We then used
argus to log malicious trafﬁc traces from the VM in the same format as the benign
session-key data. While this provided us complete information about the malicious
sessions, we did not have information about the keystrokes that a user will be pressing
when a malicious code is trying to propagate after compromising his/her machine. The
only way to realistically generate such data is to infect participating endpoints with
malicious codes without informing the user of that endpoint. Clearly, such a procedure is
not possible. Therefore, for each malicious session we generate an associated keystroke
using the marginal keystroke distribution generated from the all-keys data.
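Drawing a keystroke for each malicious session from the marginal distribution might look like the sketch below. The distribution values, the seed, and the function name are hypothetical.

```python
import random

def sample_keys(marginal, n_sessions, seed=None):
    """Draw one key code per malicious session from a marginal key distribution.

    marginal: dict mapping key code -> probability (e.g. from the all-keys data).
    """
    rng = random.Random(seed)
    key_codes = list(marginal)
    weights = [marginal[code] for code in key_codes]
    return rng.choices(key_codes, weights=weights, k=n_sessions)

# Hypothetical marginal distribution over three key codes.
marginal = {65: 0.5, 32: 0.3, 13: 0.2}
malicious_session_keys = sample_keys(marginal, 1000, seed=1)
print(malicious_session_keys[:5])
```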

Armed with this information, we insert $T$ minutes of malicious traffic data of each
malicious code in the benign session-key profile of each endpoint at a random time
instance. Specifically, for a given endpoint’s benign session-key profile, we first generate
a random infection time $t_I$ (with millisecond accuracy) between the endpoint’s first and
last session times. Given $n$ malicious sessions starting at times $t_1, \ldots, t_n$, where $t_n \le T$,
we create a special infected profile of each host with these sessions appearing at times
$t_I + t_1, \ldots, t_I + t_n$. Thus in most cases once a malware’s traffic is completely inserted
into a benign profile, the resultant profile contains interleaved benign and malicious
sessions starting at $t_I$ and ending at $t_I + t_n$. For all malware used in this study, we use
$T = 15$ minutes.
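The insertion procedure can be sketched as a merge of two time-sorted session lists. The tuple layout, the labels, and the use of integer millisecond timestamps are assumptions made for this illustration.

```python
import heapq
import random

def infect_profile(benign, malicious_offsets_ms, seed=None):
    """Insert malicious sessions into a benign profile at a random infection time.

    benign: time-sorted list of (timestamp_ms, label) tuples.
    malicious_offsets_ms: sorted session start times relative to infection.
    Returns the infection time and the merged, time-ordered profile.
    """
    rng = random.Random(seed)
    first_ts, last_ts = benign[0][0], benign[-1][0]
    infection_ts = rng.randint(first_ts, last_ts)   # millisecond granularity
    malicious = [(infection_ts + off, "malicious") for off in malicious_offsets_ms]
    return infection_ts, list(heapq.merge(benign, malicious))

benign = [(0, "benign"), (60_000, "benign"), (120_000, "benign")]
offsets = [0, 500, 1_000]
infection_ts, infected = infect_profile(benign, offsets, seed=7)
print(infection_ts, len(infected))
```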

We are now ready to use the infected proﬁles to characterize trafﬁc and keystroke
perturbations observed when an endpoint is compromised by a malicious code. In the
following two chapters, we propose two malware detection techniques that use the data
described in this section.


CHAPTER B.5 MALWARE DETECTION
USING TRAFFIC FEATURES

In this chapter, we propose the ﬁrst of the two information-theoretic malware
detection techniques developed in this thesis. This technique is purely network-based and
does not utilize the keystroke data described in the last section. Thus in this chapter
malware are detected using only trafﬁc perturbations. Like prior endpoint-based studies,
throughout this thesis we focus solely on outgoing unicast trafﬁc since incoming packets
can be easily blocked using ﬁrewalls.

We observe that the vulnerabilities targeted by all malware are associated with a
small number of source or destination ports. Thus, on a compromised machine the
distribution of source or destination ports on which a host communicates should be
perturbed after infection. These perturbations can be quantified using information-
theoretic measures. This chapter evaluates the efficacy of using port perturbations as
features for malware detection and identifies appropriate measures to quantify these
perturbations.

B.5.1 Malware Detection Using Sample Entropy

Lakhina et al. [13] in a recent work showed that sample entropy of source and
destination ports observed at a border router can reveal traffic anomalies. We first
evaluate whether entropy is an appropriate feature to detect traffic anomalies at an
endpoint and then propose an alternative framework that significantly surpasses the
performance of prior detectors when used at endpoints.

B.5.1.1 Entropy of Source and Destination Ports

Entropy characterizes the degree of dispersal (or concentration) of a probability
distribution without regard to the actual values of the random variable under
consideration. This degree of dispersal is characterized by the variance of a probability
distribution. To compute sample trafﬁc entropy as proposed in [13], we generate usage
frequency histograms of source and destination ports for outgoing packets using a 20-
second window (other window sizes produce qualitatively similar results). Source and
destination port histograms for each window are computed by counting the number of
times a particular port is used during the window.

Let $S_n$ and $D_n$ denote the sets of source and destination ports observed in window
$n$, respectively. Define $X_n = \{p_i^n, i \in S_n\}$ and $Y_n = \{q_j^n, j \in D_n\}$ to be
respectively the source and destination port histograms derived from window $n$, where
$p_i^n$ is the number of times source port $i$ was used in time-window $n$ and $q_j^n$ is the
number of times destination port $j$ was used in time-window $n$. Also let us define
$p_n = \sum_{i \in S_n} p_i^n$ as the aggregate frequency of source ports observed in window
$n$ and $q_n = \sum_{j \in D_n} q_j^n$ as the corresponding frequency of destination ports.
Then sample entropies of the source and destination port histograms for window $n$ can be
computed as:

$$H(X_n) = -\sum_{i \in S_n} \frac{p_i^n}{p_n} \log_2 \frac{p_i^n}{p_n} \quad \text{and} \quad H(Y_n) = -\sum_{j \in D_n} \frac{q_j^n}{q_n} \log_2 \frac{q_j^n}{q_n}. \tag{A.49}$$

[Figure 35 image: four panels, each plotting source and destination port entropy against
time window: (a) endpoint 1, Blaster; (b) endpoint 5, MyDoom; (c) endpoint 9, Rbot-AQJ;
(d) endpoint 13, Witty.]

Figure 35. Source and destination port entropies at infected endpoints. Infection start
times are marked with a circle. Infections in (a), (b), and (c) last approximately 15
minutes, while that in (d) lasts approximately one minute. Each non-overlapping time-
window is 20 seconds.

If there is no traffic in a window $n$ (i.e., $p_n = 0$ or $q_n = 0$) then malware detection
is not performed. For simplicity, we refer to sample entropy as entropy and the normalized
port histograms as port distributions.
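The windowed entropy computation described above can be sketched as follows. This is an illustration only: sessions are assumed to be available as (time in seconds, port) pairs, and the helper names `port_histograms` and `sample_entropy` are hypothetical.

```python
import math
from collections import Counter, defaultdict

WINDOW_SECONDS = 20  # window length used in the text

def port_histograms(sessions, window=WINDOW_SECONDS):
    """Group (time, port) pairs into per-window port usage counts."""
    hists = defaultdict(Counter)
    for t, port in sessions:
        hists[int(t // window)][port] += 1
    return hists

def sample_entropy(hist):
    """Sample entropy in bits of one port histogram; None if the window is empty."""
    total = sum(hist.values())
    if total == 0:
        return None  # no traffic: detection is not performed for this window
    return sum(-(c / total) * math.log2(c / total) for c in hist.values())

# Hypothetical sessions: four in window 0, one in window 1.
sessions = [(1.0, 80), (2.5, 80), (7.0, 443), (19.9, 53), (25.0, 80)]
entropies = {w: sample_entropy(h) for w, h in port_histograms(sessions).items()}
```

The same functions apply unchanged to source and destination ports; only the port field extracted from each session differs.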

B.5.1.2 Entropy-based Trafﬁc Perturbations in the Infected
Proﬁles

Figure 35 shows source and destination port entropies of four different endpoints
infected with one random instance of Blaster, MyDoom-A, Rbot-AQJ, and Witty.
Figure 35 shows that the attack port entropies do not reveal any discernible perturbations.
However, in some cases entropy perturbations in non-attack ports can provide useful
information about infection. For instance, Figure 35 (a) shows that the entropy of source
ports exhibits a sudden increase at the time of infection. Similar behavior is observed in
Figure 35 (d), where the non-attack (destination) ports’ entropy jumps at the infection
time. This phenomenon is not observed for low-rate (MyDoom-A and Rbot-AQJ)
malware as shown in Figure 35 (b) and (c). The jump in non-attack ports’ entropy for
high-rate malware is due to the fact that most endpoints initiate only a few sessions
during any given time-window [see Table 6]. Once compromised, while the attack port is
ﬁxed, an endpoint starts communicating through a large number (i.e., one per scan
packet) of non-attack ports. Thus the degree of dispersal of non-attack ports increases
dramatically, in turn leading to an increase in the entropy. Since low-rate malware do not
initiate many simultaneous sessions, no perturbations in the non-attack ports are
discernible for low-rate malware. Thus we conclude that entropy cannot detect
perturbations caused by low-rate malware.

Results of Figure 35 are at odds with [13] which showed that the entropy of
destination ports was perturbed signiﬁcantly on compromised networks. The failure of
entropy-based anomaly detection in Figure 35 is due to the huge difference in the volume
of trafﬁc observed at a network’s perimeter as opposed to that observed at an endpoint.
During an attack, a perimeter router still observes a considerable amount of trafﬁc on
benign ports, thus perturbing the port distributions enough so as to allow entropy-based
detectors to discern the attack. However, this phenomenon does not occur at individual
endpoints as explained by the following example. Consider two windows of activity
observed by an endpoint's entropy-based anomaly detector. The first window has benign
activity with 9 HTTP sessions on port 80 and 1 FTP session on port 21. The second
window contains malicious activity with 900 malicious sessions on port 135 and 100
malicious sessions on port 500. After normalization, both of these windows will render
the same port distribution $\{p_i^n / p_n, i \in S_n\} = \{0.9, 0.1\}$, which is a Bernoulli
distribution with parameter 0.9. Consequently, the entropy of both malicious and benign
windows will be exactly the same although the traffic behavior in each case is completely
different. Thus anomalous activity can go undetected because entropy does not take the
actual values of source/destination ports into consideration.
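For concreteness, the arithmetic behind this example can be written out. The session counts are those given in the text; the entropy is evaluated with base-2 logarithms, as in (A.49), and the numerical value is rounded to three decimals.

```latex
\begin{align*}
H(\text{benign window}) &= -\tfrac{9}{10}\log_2\tfrac{9}{10} - \tfrac{1}{10}\log_2\tfrac{1}{10} \approx 0.469 \text{ bits},\\
H(\text{malicious window}) &= -\tfrac{900}{1000}\log_2\tfrac{900}{1000} - \tfrac{100}{1000}\log_2\tfrac{100}{1000} \approx 0.469 \text{ bits}.
\end{align*}
```

Both windows yield the same value because only the normalized frequencies enter the sum; the port numbers themselves (80 and 21 versus 135 and 500) never appear.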

Results presented in this section show that the entropy-based framework of [13] is not
a robust indicator of infection when applied to attack ports and cannot be used
to detect non-attack port perturbations for low-rate malware. Entropy fails to highlight
anomalies because it does not take the actual values of the source and destination ports
into consideration. To address this problem, the following section employs an
information-theoretic measure that compares port probabilities in the current window to
the corresponding port probabilities in an endpoint's benign profile.

B.5.2 Malware Detection Using Information Divergence

At this point, we have established that for effective malware detection we need to use
a measure that compares the frequencies of individual ports. Information divergence
measures can provide such comparison. In this section, we evaluate four information

divergence measures. First, we evaluate the widely-used Kullback-Leibler divergence.

B.5.2.1 Kullback-Leibler Divergence of Source and
Destination Ports

The Kullback-Leibler (K-L) divergence [43] is an information-theoretic measure of
the similarity or dissimilarity between two probability distributions. Let us denote the
benign source and destination port histograms derived from an endpoint's benign profile
as $X = \{p_i, i \in S\}$ and $Y = \{q_j, j \in D\}$, where $S$ and $D$ respectively denote the
sets of source and destination ports observed in the benign profile. Then the K-L
divergence between the benign and currently observed port histograms can be expressed as:

$$D(X_n \| X) = \sum_{i \in S_n} \frac{p_i^n}{p_n} \log_2 \frac{p_i^n / p_n}{p_i / p} \quad \text{and} \quad D(Y_n \| Y) = \sum_{j \in D_n} \frac{q_j^n}{q_n} \log_2 \frac{q_j^n / q_n}{q_j / q}, \tag{A.50}$$

where $p = \sum_{i \in S} p_i$ and $q = \sum_{j \in D} q_j$ respectively represent the aggregate
source and destination port frequencies observed in the benign profile. Note that K-L
divergence is an asymmetric measure. The advantages of using window-based metrics $X_n$
and $Y_n$ as primary distributions of the K-L divergence are twofold: (a) fewer sessions are
observed in a window as opposed to the benign profile, $|S_n| < |S|$ and $|D_n| < |D|$, which
reduces the complexity of real-time detection; and (b) better detection accuracy can be
achieved if we focus on the specific ports engaged in communication during the current
window $n$.

We generate port histograms of benign profiles using the first 100 sessions on an
endpoint. The training time for the endpoints of this study ranged from 12 hours to 5
days with an average of approximately 2 days. We train with only 100 sessions to
quantify worst-case performance of the proposed detector.

To effectively leverage K-L divergence in the present endpoint-based anomaly
detector, we introduce the following provisions. First, in (A.50) if $p_i = 0$ and
$p_i^n > 0$, for any $i$, then $D(X_n \| X)$ is set to $\infty$. The same problem arises for
$D(Y_n \| Y)$ in (A.50). In other words, $X_n$ and $Y_n$ must be absolutely continuous with
respect to $X$ and $Y$, respectively. To achieve this, before training we initialize the
benign histograms with $p_i = 1$ and $q_i = 1$, for $i = 0, \ldots, 65535$, which assigns
never-used ports very small, non-zero frequencies.
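A minimal sketch of the K-L computation with the add-one initialization is given below. It stores the benign histogram sparsely (observed counts only) and adds the initial count of 1 on lookup, which is equivalent to initializing all 65536 ports to 1; the function names and the sample port lists are hypothetical, and the constant normalization factor mentioned in the text is omitted.

```python
import math
from collections import Counter

NUM_PORTS = 65536  # ports 0..65535, each initialized with a count of 1

def benign_histogram(training_ports):
    """Return (observed counts, aggregate frequency p) for the benign profile."""
    counts = Counter(training_ports)
    # Aggregate includes the initial count of 1 for every possible port.
    return counts, len(training_ports) + NUM_PORTS

def kl_divergence(window_hist, benign_counts, benign_total):
    """D(X_n || X) in bits, summed only over ports seen in the current window."""
    n_total = sum(window_hist.values())
    d = 0.0
    for port, count in window_hist.items():
        p_win = count / n_total
        p_ben = (benign_counts.get(port, 0) + 1) / benign_total  # never zero
        d += p_win * math.log2(p_win / p_ben)
    return d

# Hypothetical benign profile of 100 sessions and two test windows.
training = [80] * 90 + [443] * 10
counts, total = benign_histogram(training)
kl_benign = kl_divergence(Counter({80: 9, 443: 1}), counts, total)
kl_attack = kl_divergence(Counter({135: 1000}), counts, total)
```

Because the sum runs over the window's ports only, the cost per window depends on the number of distinct ports in that window rather than on the size of the benign profile. In this toy data the window concentrated on a never-seen port scores higher than the window that matches the benign ports.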

Second, it is well-known that scaling of training data improves the performance of
learning tools by making the training process better behaved and by mitigating the bias
towards larger input values. Therefore, we normalize the K-L divergence values by a
constant factor.

Finally, to reduce complexity and to filter out noise due to benign data, we introduce
a provision to ignore overtly benign behavior. From the training data, we generate a
histogram of session volume (i.e., total number of sessions) in a window. After
normalization, we compute the histogram's mean $\mu_e$ and variance $\sigma_e^2$ for each
endpoint $e$. We invoke malware detection only when the total number of sessions observed
in a window is greater than the threshold $[\mu_e + \sigma_e]$. This threshold varied between
3 and 13 sessions per 20 second window, with an average of 6.6 sessions per 20 second
window, for the endpoints considered in this study.
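The volume gate can be sketched as below, reading the threshold as the mean plus one standard deviation of the per-window session counts in the training data. Rounding the threshold up to an integer and using the population standard deviation are assumptions, as are the function names and the sample counts.

```python
import math
import statistics

def volume_threshold(sessions_per_window):
    """Threshold on sessions per window derived from benign training windows."""
    mu = statistics.mean(sessions_per_window)
    sigma = statistics.pstdev(sessions_per_window)  # population std. deviation (assumed)
    return math.ceil(mu + sigma)                    # rounding up is an assumption

def should_run_detector(session_count, threshold):
    """Detection is invoked only for windows busier than the benign threshold."""
    return session_count > threshold

# Hypothetical per-window session counts from a benign training period.
training_counts = [2, 4, 4, 4, 5, 5, 7, 9]
threshold = volume_threshold(training_counts)
```

Windows at or below the threshold are skipped entirely, so the divergence computation runs only on unusually busy windows.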

[Figure 36 image: six panels, each plotting source port K-L and destination port K-L
divergence against time window: (a) endpoint 1, Blaster; (b) endpoint 5, MyDoom;
(c) endpoint 9, Rbot-AQJ; (d) endpoint 3, SoBig; (e) endpoint 10, Zotob;
(f) endpoint 13, Witty.]

Figure 36. Source and destination ports' K-L divergences at infected endpoints.

B.5.2.2 K-L-based Trafﬁc Perturbations in the Infected
Proﬁles

The K-L divergences of different endpoints randomly infected with a single infection
of each malware are outlined in Figure 36. We first focus on malware with high scan
rates. From Figure 36 (a), (d), (e), and (f), it is clear that the K-L divergence highlights
anomalous behavior in both attack and non-attack ports for malware with high scan rates.
Comparing Figure 36 (a), (d), (e), and (f) with entropy-based perturbations of Figure 35
(a) and (d) establishes the effectiveness of using a port-by-port divergence measure to
highlight traffic anomalies. Specifically, a K-L-based anomaly detector can reveal
perturbations in the attack port distribution, which is an important characteristic that was
completely missed by entropy. Moreover, for high scan rate malware of Figure 36 (a),
(d), (e), and (f), perturbations in the non-attack port distribution in the K-L divergence are
much more pronounced than the entropy perturbations. These perturbations are revealed for
both destination [i.e., Blaster, Zotob.G, and SoBig.E] and source [i.e., Witty]
port malware.

For the low-rate malware [MyDoom-A and Rbot-AQJ], Figure 36 (b) and (c) show
obvious perturbations in the attack port divergence. Comparing these two figures with
Figure 35 (b), (c) clearly establishes the advantages of using K-L-based detection features
as opposed to entropy. Due to the low rate of these malware, even the K-L divergence
cannot reveal non-attack-port perturbations. Nevertheless, our results show that
perturbations in the attack port feature are more than sufficient to detect infection.

B.5.2.3 Evaluating Trafﬁc Perturbations with Other
Information Divergences

In the last section, we observed that the attack ports’ K-L divergence always gets
perturbed on a compromised endpoint. However, the non-attack ports are not perturbed
for low-rate worms. In this section, we evaluate three other information-theoretic
divergence measures with the objective of identifying a measure which can
simultaneously highlight perturbations in both attack and non-attack port distributions for
low-rate worms. A brief description of the three measures follows.

As before, let $X$ and $Y$ respectively represent the source and destination port
distributions observed in the benign profiles, and $X_n$ and $Y_n$ respectively represent the
source and destination port distributions observed in window $n$. The first information
measure that we employ is the Jensen-Shannon (J-S) divergence measure [44] defined
as:

$$J(X_n \| X) = H(\pi_1 X_n + \pi_2 X) - \pi_1 H(X_n) - \pi_2 H(X)$$
$$\text{and} \quad J(Y_n \| Y) = H(\pi_1 Y_n + \pi_2 Y) - \pi_1 H(Y_n) - \pi_2 H(Y), \tag{A.51}$$

where $\pi_1$ and $\pi_2$ are weighting factors such that $\pi_1 + \pi_2 = 1$, and $H(\cdot)$ is the
entropy function.

The second information divergence measure used in this work is the K directed
divergence measure defined as [44]:

$$K(X_n \| X) = \sum_{i \in S_n} \frac{p_i^n}{p_n} \log_2 \frac{p_i^n / p_n}{\dfrac{p_i^n}{2 p_n} + \dfrac{p_i}{2 p}}$$
$$\text{and} \quad K(Y_n \| Y) = \sum_{j \in D_n} \frac{q_j^n}{q_n} \log_2 \frac{q_j^n / q_n}{\dfrac{q_j^n}{2 q_n} + \dfrac{q_j}{2 q}}, \tag{A.52}$$

where parameters of the above expressions are defined in (A.49) and (A.50).

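The third and last information measure that we use is the Resistor-Average (R-A)
divergence measure defined as [45]:

$$\frac{1}{R(X_n \| X)} = \frac{1}{D(X_n \| X)} + \frac{1}{D(X \| X_n)}$$
$$\text{and} \quad \frac{1}{R(Y_n \| Y)} = \frac{1}{D(Y_n \| Y)} + \frac{1}{D(Y \| Y_n)}, \tag{A.53}$$

where $D(\cdot \| \cdot)$ is the K-L divergence.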
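The three measures can be compared side by side with a short sketch operating on already-normalized distributions given as equal-length lists. It assumes strictly positive probabilities (for example after the add-one initialization), writes J-S in its usual weighted form $H(\pi_1 P + \pi_2 Q) - \pi_1 H(P) - \pi_2 H(Q)$, and uses hypothetical function names and sample distributions.

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return sum(-x * math.log2(x) for x in p if x > 0)

def kl(p, q):
    """K-L divergence D(p || q) in bits; requires q > 0 wherever p > 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

def js(p, q, w1=0.5, w2=0.5):
    """Jensen-Shannon divergence with weights w1 + w2 = 1."""
    mix = [w1 * x + w2 * y for x, y in zip(p, q)]
    return entropy(mix) - w1 * entropy(p) - w2 * entropy(q)

def k_div(p, q):
    """K directed divergence: K-L divergence of p from the equal-weight mixture."""
    return sum(x * math.log2(x / (0.5 * x + 0.5 * y)) for x, y in zip(p, q) if x > 0)

def resistor_average(p, q):
    """R-A divergence: 1/R = 1/D(p||q) + 1/D(q||p)."""
    d_pq, d_qp = kl(p, q), kl(q, p)
    if d_pq == 0 or d_qp == 0:
        return 0.0  # identical distributions
    return 1.0 / (1.0 / d_pq + 1.0 / d_qp)

# Hypothetical three-port distributions.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]
```

Note that `resistor_average` needs both directed K-L divergences, and the reverse one sums over the full support of the benign distribution, which is the source of the extra cost relative to a single K-L computation.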

Figure 37 shows source and destination port perturbations characterized by J-S, K
and R-A divergences. We only show perturbations for the low-rate MyDoom worm
because perturbations due to high-rate worms are adequately highlighted by the K-L
divergence. Figure 37 clearly shows that none of the three divergences under
consideration can highlight perturbations in the source (non-attack) ports. Thus in the
non-attack ports' context, the divergences under consideration do not provide any
advantages. Also, from Figure 37 (a) and (b) it can be seen that even the destination
(attack) port perturbations in the J-S and K divergences are not as clear and pronounced as
in the K-L case [see Figure 36 (b)]. The R-A divergence, however, provides clear
perturbations in the destination ports [Figure 37 (c)]. This divergence measure also has
the advantage of being symmetric, i.e., $R(X_n \| X) = R(X \| X_n)$ and
$R(Y_n \| Y) = R(Y \| Y_n)$. On the other hand, R-A divergence is more complex than K-L
divergence because it requires computation of two K-L divergences. One of these
divergences has to be computed over the entire sample space of the benign profile,
thereby presenting a significant complexity overhead for an endpoint. Hence, in problem
areas where divergence symmetry is important and complexity is not an issue, R-A
divergence is more appropriate than K-L divergence. In the present malware detection
context, we continue to use the K-L divergence for the remainder of this chapter.

[Figure 37 image: three panels, each plotting source and destination port divergence
against time window: (a) J-S Divergence, endpoint 5, MyDoom; (b) K-Divergence,
endpoint 5, MyDoom; (c) Resistor-Average Divergence, endpoint 5, MyDoom.]

Figure 37. Jensen-Shannon (J-S), K- and resistor-average (R-A) divergences of source
and destination ports at infected endpoints.

In the following section, we train a machine learning tool using the K-L divergence of
benign and malicious data, which is then used for automated malware detection and
comparison of our approach with prior methods.

B.5.3 Leveraging K-L Perturbations in an SVM-based
Framework

To use K-L divergences of source and destination ports for automated malware
detection, we ﬁrst used a simple thresholding mechanism where K-L values above and
below a certain threshold were flagged as anomalous. This simple technique, however,
resulted in high false alarm rates. Consequently, in this section we resort to the
sophisticated support vector machines (SVMs) [46] for real-time malware detection. We
ﬁrst train the SVMs using K-L divergence values derived from a subset of the benign
proﬁles and malware data. The SVMs are then used to detect malware in the infected
proﬁles. We also compare the performance of the proposed detector with the techniques
proposed in [14] and [20].

Among contemporary machine learning tools, we select SVMs to classify K-L
divergence values because: (a) SVMs are not probabilistic in nature. Probabilistic
intrusion detectors generally do not take the base rate of incidence into account, thus
yielding low Bayesian detection rates. (b) SVMs employ a small subset of training
examples (called support vectors) for classification, and all remaining examples are
irrelevant to the classification task. Thus SVMs can train with very few positive (benign)
and negative (malware-based) examples, allowing timely and low-complexity training.
Few negative examples also improve detection rates for novel malware. (c) SVMs are
inherently designed for binary-decision tasks, such as anomaly detection.

B.5.3.1 SVM Training

In this section, we use a small subset of malicious and benign data to train support
vector machines for real-time detection of malware using the K-L divergence. Given
positive and negative training examples, an SVM finds a classification boundary that
maximizes the distance between the two classes, while minimizing classification error.
We use a degree-3 radial basis kernel function to train a C-SVM [46].

It should be noted that the use of malware-based, negative training examples does not
compromise the proposed detector’s ability to detect novel malware. The negative
examples only provide a rough quantiﬁcation of the magnitude of K-L perturbations on a
compromised endpoint. This quantiﬁcation can be provided by any malware that
highlights perturbations in the source and/or destination ports’ distributions. In general, a
high-rate source-port malware in conjunction with a high-rate destination-port malware
can encompass the increase and decrease in the K-L divergences of source and
destination ports. Malware traces can be hardcoded into the detector, and the training
algorithm can merge the malware traces with an endpoint’s trafﬁc logs to compute the
negative K-L divergence examples.

We use the source and destination ports’ K-L divergence values to train two SVMs
for each endpoint. To train the SVMs, we take ten K-L divergence values from the
benign trafﬁc proﬁle. These values comprise the positive examples. We then take a total
of 13 negative examples by computing K-L divergence of benign trafﬁc windows with
Blaster- and Witty-infected windows. Performance evaluations in the next section
illustrate that this small subset of the available training data can provide highly accurate
detection of novel malware, where the term novel refers to all the remaining malware not
used for SVM training.

[Figure 38 image: two panels plotted against endpoint ID for the Proposed
K-L/SVM-based Detector, the Maximum-Entropy Detector, and the Rate-Limiting Detector:
(a) average detection rate (%); (b) average false alarm rate (%).]

Figure 38. Comparison of detection and false-alarm rates of the proposed K-L/SVM-
based malware detector with maximum-entropy and rate-limiting detectors. Each point is
averaged over 12 malware with 100 random infections per malware per endpoint.

B.5.3.2 Performance Evaluation and Comparison with
Existing Techniques

In this section, we compare the performance of the proposed malware detector with
two existing techniques proposed in [14] and [20]. The rate-limiting detector [20] is the
only other technique that is designed specifically for endpoints and the maximum-entropy
detector [14] is one of the only two information-theoretic anomaly detection techniques.
We use the same parameter values and learning/detection algorithms that were
employed in [14] and [20]. We also tried to compare with the entropy-based technique by
Lakhina et al. [13]. However, we observed that it was impractical to migrate the detector
of [13] to endpoints because the detector required projection of high-dimensional feature
metrics into benign and anomalous subspaces at a border router. On an endpoint, the
same technique will result in only 3 possible subspaces, and in most cases it is not
possible to classify them as benign and anomalous using the thresholding technique of
[13].

We inserted 100 non-overlapping infections of each malware in every endpoint's
benign profile. As discussed earlier, each infection lasted approximately T = 15 minutes,
with the exception of Witty that had each infection lasting approximately T = 1
minute (i.e., 20,000 packets at 357 sps). Hence, all results provided in this section are
averaged over one hundred experiments per endpoint per malware. We compute detection
and false alarm rates for each experiment as follows. For 100 infections of a particular
malware on an endpoint, the percentage detection rate for that malware is computed by
simply counting the number of infections that are detected by the malware detector. The
false alarm rate is computed by taking the ratio of the total number of false alarms to
the total evaluated time-windows (i.e., windows with one or more sessions).
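The two rates defined above reduce to simple ratios, sketched below. The function names and the sample numbers are hypothetical and are not results from this study; what counts as a "detected" infection (for example, at least one alarm during the infection period) is left to the caller.

```python
def detection_rate(infection_detected):
    """Percentage of inserted infections that the detector flagged.

    infection_detected -- one boolean per inserted infection
    """
    return 100.0 * sum(infection_detected) / len(infection_detected)

def false_alarm_rate(false_alarms, evaluated_windows):
    """Percentage of evaluated windows (one or more sessions) raising a false alarm."""
    return 100.0 * false_alarms / evaluated_windows

# Hypothetical outcome: 97 of 100 infections detected, 3 false alarms in 1500 windows.
dr = detection_rate([True] * 97 + [False] * 3)
far = false_alarm_rate(3, 1500)
```

Note that the two rates use different denominators: infections for the detection rate and evaluated time-windows for the false alarm rate.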

The average detection and false alarm rates for each endpoint are shown in Figure 38.
It can be seen in Figure 38 (b) that the proposed K-L/SVM-based detector has negligible
false alarm rates at all endpoints. The highest false alarm rate we observed was
approximately 0.45%, with endpoints 1, 6, 7, and 8 exhibiting almost no false alarms
at all. Also, the detection rate of the proposed technique is 100% for all endpoints except
endpoints 2 and 4; for endpoints 2 and 4, some instances of the low-rate MyDoom-A
and Rbot-AQJ worms were not detected. Nevertheless, even for endpoints 2 and 4, the
average detection rate is above 90%. Hence, overall the proposed K-L/SVM-based
malware detector provides very high accuracy for the diverse set of endpoints considered
in this study.

Let us now compare the proposed detector and the maximum-entropy detector of
[14]. Figure 38 (a) shows that the proposed K-L/SVM-based detector provides much
higher detection rates than the maximum-entropy detector. Also, for the maximum-entropy
detector, the false alarm rates for the home endpoints [endpoints 3 and 4] are
extremely high. We believe that the high false alarm rates are due to peer-to-peer
applications running on the home endpoints of this study. Moreover, the maximum-entropy
detector was designed for deployment at the perimeter where even in a short period of
time most of the 2,348 packet classes of [14] were observed. On an endpoint, many of
these classes are not present in the benign training data. We observed that even if the
maximum-entropy training is performed using a lot of benign data, the performance still
does not improve. (The maximum-entropy model was trained using 100 and 1000
benign sessions, but the performance in both cases was identical.) Also note that due to
the use of a sliding window, the maximum-entropy detector has higher training
complexity and incurs an inherent detection delay that is not present in our detector. The
run-time complexities of the two techniques are comparable as the maximum-entropy
technique requires frequent computation of K-L divergence over a large sample space of
2,348 outcomes, whereas our technique computes K-L divergence over small sample
spaces followed by SVM classification.

For the rate-limiting detector [20], the detection rates of all endpoints except endpoint
2 are much lower than those of the proposed K-L/SVM-based detector. Also, much like the
maximum-entropy detector, the false-alarm rates for home endpoints are quite high. (A
false alarm is raised when the rate-limiter reports an anomaly, but the session queue of
the rate limiter has no malicious sessions.) Thus the performance of the rate-limiting
detector, although better than the maximum-entropy detector, is still much worse than the
K-L/SVM-based detector proposed in this thesis. The inferior performance of the
rate-limiting detector shows that simply monitoring traffic volume at an endpoint is not
sufficient. In addition to session volume, the actual characteristics of the traffic must also
be taken into account for accurate detection.

[Figure 39 image: flow diagram with the following blocks. Training: generate source and
destination port histogram from benign profile data; check whether benign data > d
sessions (if no, continue collecting); store benign histograms of source and destination
ports, and μ and σ; take network traces of a source port worm and a destination port
worm; train SVMs using K-L divergence of benign and worm data. Detection: observe all
sessions on an endpoint; check whether time since last detection > t seconds; form source
and destination port histograms in last window; check whether sessions in last window
> μ + σ; compute source and destination ports' K-L divergences using benign and
window-based histograms; use SVMs to classify the source and destination ports' K-L
values as benign or anomalous.]

Figure 39. A generalized flow diagram of the proposed K-L/SVM-based malware
detector. The shaded area contains real-time components.

Based on the results of this section, we conclude that the K-L/SVM-based malware
detector proposed in this chapter provides significantly better performance than the
techniques of [14] and [20].

B.5.4 Summary and Discussion

Figure 39 outlines the data flow of the proposed malware detection technique. In
summary, once deployed the detector initially uses d sessions to characterize benign
source and destination port histograms. Traces of a high-rate source port and a high-rate
destination port malware are hardcoded in the detector. K-L divergence of the benign and
malware-based histograms is used to train SVMs. Parameters of the SVMs are then used
for real-time detection of malware. After every window of t seconds, the detector checks
whether the total number of sessions in the window is more than what was statistically
observed in the benign proﬁle. If so, the detector computes the K-L divergence between
the window-based histograms and the benign histograms. The trained SVMs are then
used to classify the source and destination port K-L divergences.

In this chapter, we proposed a network-based malware detector that can detect
self-propagating malicious codes in real-time by leveraging the K-L divergence of benign and
real-time traffic features. In the following chapter, we use the data collected in this thesis
to develop another malware detection technique that correlates both network and host/OS
features to detect self-propagating malware.

 

CHAPTER B.6 MALWARE DETECTION
USING JOINT NETWORK-HOST FEATURES

Traditional anomaly detectors are either host- or network-based. We argue that
significant improvements can be achieved if both network and host features are correlated
and then employed in a joint framework. To that end, in this chapter we propose two
endpoint-based joint network-host anomaly detectors both of which exploit the
observation that when a user is actively using his/her computer most of the benign traffic
is triggered by a small subset of keystrokes and mouse clicks. Based on this observation,
we propose to correlate the last input from the keyboard or mouse hardware buffer with
every new network session in a novel entropy-based information-theoretic framework.

B.6.1 Correlation in the Session-Key Data

As mentioned before, we focus solely on outgoing unicast traffic. Also, for the
present anomaly detector we only focus on the scenario when the end-user is actively
using his/her computer, although he/she may not be accessing the Internet. This is
achieved by only processing sessions with non-zero keystroke values; recall that a zero
keystroke value implies that no key was pressed right before the session. Detection when
a user is inactive cannot employ keystroke data, thereby requiring purely network-based
approaches.

[Figure 40 image: two bar charts of frequency against virtual key code: (a) endpoint 5;
(b) endpoint 12.]

Figure 40. Normalized histograms of 20 most-used session initiation keystrokes.
Histograms are generated from the session-key data. Virtual key codes 1 and 13
correspond to the left mouse click and the Enter key, respectively [48].

Figure 40 shows the normalized frequencies of the 20 most-used session initiation
keys for two endpoints. In both cases more than 85% of network sessions are
initiated by the left mouse click or the Enter key. (Similar results are
observed for the remaining endpoints.) Figure 41 shows the normalized histograms of all
the keystrokes that are pressed on a host. Note that the all-keys distribution looks quite
different from the session-key distribution of Figure 40. For one thing, the marginal
all-keys distribution of Figure 41 is much more spread out than the session-key distribution
of Figure 40. That is, the variance of the marginal all-keys distribution is greater than that
of the session-key distribution. Also, contrary to the session-key-based keystroke histogram,
less than 50% of sessions are initiated by the two most commonly used keys. Lastly, neither
the left mouse click nor the Enter key is among the two most commonly used keys in either
Figure 41 (a) or (b). These results can be summarized as follows: (i) users frequently
employ only a few session initiation keys to trigger network sessions, thus there is strong
correlation between these few session initiation keys and network sessions; (ii)
frequencies of session initiation keys are very consistent across different users,
consequently making this a common benign feature that can be leveraged to detect
abnormal behavior; (iii) frequencies of keys that are generally used on a host are quite
different from frequencies of session initiation keys.

[Figure 41: two bar-chart panels, (a) endpoint 5 and (b) endpoint 12; horizontal axis: virtual key code; vertical axis: frequency.]

Figure 41. Normalized histograms of 20 most-used keystrokes. Histograms are generated
from the all-keys data. Virtual key codes 40, 38 and 17 correspond to the down arrow
key, the up arrow key and the control key, respectively [48].

Based on the above discussion, we deduce that session-key correlation is a feature
that is common across users and can be used for malware detection. There are two
information-theoretic measures that can formally leverage this observation for real-time
worm detection. The first measure is the entropy of the keystroke histogram observed in a
time window. Since entropy quantifies the degree of dispersal or concentration of a
probability distribution, according to Figure 41 the keystroke entropy in a
malware-infected window should be higher than in benign windows, where only a few
keystrokes are used to initiate sessions. The second information-theoretic measure that
we use to quantify the keystroke perturbations is mutual information. From Figure 40 it
can be deduced that in a benign time window the mutual information of sessions and the
keystrokes used to initiate them should be very high. On the other hand, in a
malware-infected window this mutual information should decrease, as the keystrokes will
be drawn from the marginal all-keys distribution. The following sections formally
describe the entropy and mutual information based detectors.

B.6.2 Malware Detection Using Keystroke Entropy
B.6.2.1 Definition of Keystroke Entropy

Entropy is an information-theoretic measure that can capture the spread/variance of a
distribution quite effectively [43]. Define $X_n = \{p_i^n, i \in K_n\}$ as the histogram of
keystrokes in a time-window $n$, where $p_i^n$ is the number of times keystroke $i$ was used
in time-window $n$. Note that due to MSDN's virtual key code definition,
$K_n = \{1, 2, \ldots, 255\}$. Let $p_n = \sum_{i \in K_n} p_i^n$ be the aggregate frequency of
keystrokes observed in window $n$. Then the sample entropy of the keystroke histogram for
window $n$ is

$$H(X_n) = -\sum_{i \in K_n} \frac{p_i^n}{p_n} \log_2 \frac{p_i^n}{p_n} \qquad \text{(A.54)}$$

If there is no traffic in a window $n$ (i.e., $p_n = 0$), then malware detection is not
performed. Based on previous results, we know that for legitimate sessions $X_n$ has
small variance, and therefore the keystrokes' entropy should be low. On the other hand,
once a self-propagating malicious code starts initiating sessions, the keystrokes will be
drawn from the marginal keystroke distribution of the all-keys data. Hence the variance
and consequently the entropy of $X_n$ should increase.

We compute keystroke entropy on a window-by-window basis. The results reported
in this chapter use a window size of 60 seconds. In each window with one or more
sessions, we compute the keystroke histogram $X_n$, which is used in equation (A.54) to
compute the entropy. The marginal keystroke histogram is generated from the first 500
entries of the all-keys data.
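The per-window entropy computation can be illustrated with a minimal Python sketch. The function name `keystroke_entropy` and the sample key lists are assumptions made for illustration only; they are not taken from the data set of this study.

```python
import math
from collections import Counter


def keystroke_entropy(keys):
    """Sample entropy (in bits) of one window's keystroke histogram, as in (A.54).

    `keys` holds the virtual key code of the session initiation keystroke of
    each session in the window. Returns None for an empty window, because
    detection is not performed when no traffic is observed.
    """
    counts = Counter(keys)        # p_i^n: number of times key i was used
    total = sum(counts.values())  # p_n: aggregate keystroke count
    if total == 0:
        return None
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical windows (values invented for illustration).
benign_window = [1, 1, 13, 1, 13, 1, 1, 13]       # left click and Enter dominate
infected_window = [1, 40, 38, 17, 13, 65, 8, 9]   # keys spread over many codes
```

In this toy example the concentrated window yields a lower entropy than the dispersed one, which is the behavior the detector relies on.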

B.6.2.2 Entropy Perturbations in the Infected Proﬁles

We use the infected profiles described in Section B.4.6 to evaluate the performance of
the entropy-based detector throughout this chapter. Since the present detector does not
rely on source and destination ports, there is no need to evaluate against the simulated
malware described in Section B.4.5. Therefore, we only focus on detection using the
9 real worms collected for this study. When we used keystroke entropy for detection of
randomly inserted infections, we observed a number of noisy spikes due to variations in
benign user behavior. We use a median filter to remove these spikes. Henceforth, all
results use an order-7 median filter.
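A median filter of this kind can be sketched in a few lines of Python. The alignment of the filter window is an assumption: the sketch uses a causal window (the current value and the six preceding ones), which suits real-time use, and shortens the window at the start of the series. The name `median_filter` is illustrative.

```python
from statistics import median


def median_filter(values, order=7):
    """Causal sliding median over the most recent `order` samples.

    Assumption: the window covers the current value and the `order - 1`
    values before it, and is shortened at the start of the series.
    """
    filtered = []
    for i in range(len(values)):
        start = max(0, i - order + 1)
        filtered.append(median(values[start:i + 1]))
    return filtered
```

An isolated spike is suppressed, whereas a sustained change (such as one lasting a full infection period) passes through after a short delay.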

The entropies of different endpoints randomly infected with a single infection of a
malicious code are shown in Figure 42. It can be observed in Figure 42 that the
keystrokes' entropy clearly highlights anomalous behavior in all cases. The increase in
entropy is revealed for both high- and low-rate malware, and for endpoints with high and
low session rates. Thus we conclude that entropy of keystroke histograms is a robust
feature that can be leveraged for self-propagating malware detection on network endpoints.

[Figure 42: six panels plotting keystroke entropy against time window: (a) endpoint 1, Blaster; (b) endpoint 3, Forbot-FU; (c) endpoint 6, MyDoom-A; (d) endpoint 9, Rbot-AQJ; (e) endpoint 11, SoBig.E; (f) endpoint 13, Zotob.G.]

Figure 42. Entropy of the keystroke histograms at infected endpoints. Infection start times
are marked with a circle. Infections last approximately 15 minutes. Each non-overlapping
time-window is 60 seconds.


B.6.3 Malware Detection Using Session-Key Mutual Information

In this section, in addition to the keystroke distribution, we also characterize the
session information in a probabilistic framework. We show that the conditional mutual
information of the session and keystroke distributions can clearly highlight anomalous

behavior.

B.6.3.1 Mutual Information of Sessions and Keys

Mutual information [43] is an information-theoretic measure of the statistical dependence
between two random variables. Consider two random variables $X$ and $Y$ with marginal
distributions $p(x)$ and $p(y)$, and a joint distribution $p(x, y)$. The mutual information
of these random variables is defined as

$$I(X;Y) = \sum_{x} \sum_{y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)} \qquad \text{(A.55)}$$

Mutual information is a non-negative measure of the dependence between $X$ and $Y$, with
$I(X;Y) = 0$ when $X$ and $Y$ are independent. In general, $I(X;Y)$ increases with an
increase in the correlation between $X$ and $Y$.
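As a small numerical illustration of this definition, the following Python sketch evaluates the mutual information of a joint distribution supplied as a dictionary. The function name and the two toy distributions are assumptions chosen for illustration.

```python
import math


def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as {(x, y): p(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p   # marginal p(x)
        py[y] = py.get(y, 0.0) + p   # marginal p(y)
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0                      # zero-probability cells contribute nothing
    )


# Toy distributions (invented for illustration).
independent = {(0, "a"): 0.25, (0, "b"): 0.25, (1, "a"): 0.25, (1, "b"): 0.25}
dependent = {(0, "a"): 0.5, (1, "b"): 0.5}
```

The independent table gives zero mutual information, while the fully dependent table gives one bit.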

To leverage mutual information in the present context, we define $X$ as a binary
random variable which characterizes whether or not a session was initiated in the last
time window. That is, $X \in \{0, 1\}$, where 0 denotes no session in the time window and
1 denotes one or more sessions in the time window. Moreover, we define $Y$ as a random
variable characterizing the keystrokes' probability distribution. Specifically, the
marginal $p(Y)$ distribution is simply the normalized all-keys histogram, such as the
ones shown in Figure 41. Then the session-keystroke mutual information can be written as:

$$I(X;Y) = \sum_{y \in K_n} \left[ p(x=0,y) \log_2 \frac{p(x=0,y)}{p(x=0)\,p(y)} + p(x=1,y) \log_2 \frac{p(x=1,y)}{p(x=1)\,p(y)} \right] \qquad \text{(A.56)}$$

We derive the marginal $p(X)$ distribution using the first 500 entries of each endpoint's
benign session-key profile. More specifically, $p(X)$ is computed by counting the total
number of windows $n$ with one or more sessions between the 1st session and the 500th
session. We also count the total number of windows $N$ (with and without sessions) in
that time frame. Then $p(X=1) = n/N$ and $p(X=0) = 1 - p(X=1)$. The joint
distribution $p(x=1, y=j)$ then simply corresponds to the joint probability that a
network session was initiated using keystroke $j$.
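The estimation of $p(X=1)$ can be sketched as follows, taking it as the fraction of windows in the training span that contain at least one session, so that the value lies between 0 and 1. The function name, the use of session start times in seconds, and the alignment of windows to multiples of the window length are assumptions made for illustration.

```python
def session_probability(session_times, window=60.0, training_sessions=500):
    """Estimate p(X=1) over the span of the first `training_sessions` sessions.

    session_times: session start times in seconds (hypothetical input format).
    Assumption: windows are aligned to multiples of `window` seconds.
    """
    times = sorted(session_times)[:training_sessions]
    if not times:
        raise ValueError("need at least one session")
    first = int(times[0] // window)
    last = int(times[-1] // window)
    total_windows = last - first + 1                         # all windows in span
    active_windows = len({int(t // window) for t in times})  # windows with sessions
    return active_windows / total_windows
```

For example, sessions at 0, 10, 130, 200 and 610 seconds fall into 4 distinct windows out of the 11 windows spanned, giving an estimate of 4/11.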

From the data collection chapter, we know that the keystroke information is not
logged when there are no network sessions in a window. That is, we do not have the
distribution $p(x=0, y)$. Hence we cannot use the mutual information expression of
(A.56) in its present form. To resolve this problem, we employ a partial mutual
information measure $I(X=1;Y)$, which only uses the $p(x=1, y)$ probability
distribution. Since the partial mutual information employs only one outcome of the
random variable $X$, it can be written as

$$I(X=1;Y) = \sum_{y \in K_n} p(x=1,y) \log_2 \frac{p(x=1,y)}{p(x=1)\,p(y)} \qquad \text{(A.57)}$$

Note that due to the binary nature of the session random variable $X$, the partial mutual
information $I(X=1;Y)$ is in fact the self-information of $p(x=1, y)$ normalized by
$p(x=1)$. For brevity, we continue to refer to this measure as mutual information.

The above characterization describes the correlation between network sessions and
keystrokes in a simple and intuitive manner. Based on previous results, we know that for
legitimate activity $X$ and $Y$ are highly correlated. Therefore, their mutual information
should be high. Once a self-propagating malicious code starts initiating sessions, the
keystrokes will be drawn from the marginal $p(Y)$ distribution and therefore the
correlation between $X$ and $Y$ should drop.

As in the last section, results reported in this chapter use a window size of 60 seconds.
In each window with one or more sessions, we compute the joint conditional
distribution $p(x, y \mid x=1)$, which is used to compute the conditional mutual
information. The marginal $p(X)$ and $p(Y)$ are generated from the first 500 values of
the session-key and all-keys data, respectively.
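A minimal Python sketch of the per-window partial mutual information is given below. It rests on two assumptions that the text does not spell out: the joint probability $p(x=1, y)$ is taken as $p(X=1)$ times the share of the window's sessions initiated by key $y$, and keys that never appear in the marginal all-keys histogram are assigned a small floor probability to avoid division by zero. The function name, the `floor` parameter and the sample values are illustrative only.

```python
import math
from collections import Counter


def partial_mutual_information(session_keys, p_x1, p_y, floor=1e-6):
    """Partial mutual information I(X=1;Y) of one window, in bits.

    session_keys: key codes that initiated the sessions of this window.
    p_x1: estimate of p(X=1), assumed to be greater than zero.
    p_y: dict mapping key code to its marginal all-keys probability.
    floor: assumed probability for keys missing from p_y (not from the text).
    """
    counts = Counter(session_keys)
    total = sum(counts.values())
    if total == 0:
        return None  # no sessions in the window, nothing to evaluate
    mi = 0.0
    for key, count in counts.items():
        joint = p_x1 * (count / total)            # assumed estimate of p(x=1, y)
        marginal = max(p_y.get(key, 0.0), floor)  # p(y), floored
        mi += joint * math.log2(joint / (p_x1 * marginal))
    return mi


# Hypothetical marginal and windows (values invented for illustration).
p_y = {1: 0.05, 13: 0.05, 40: 0.3, 38: 0.3, 17: 0.3}
benign_window = [1, 1, 13, 1]
infected_window = [40, 38, 17, 40, 38, 17]
```

With these toy values, the window dominated by the left mouse click and the Enter key scores well above the window whose keys follow the all-keys marginal.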

B.6.3.2 Mutual Information Perturbations in the Infected Profiles

Similar to the entropy-based keystroke perturbations, we observed some noisy mutual
information spikes. Therefore, as with the entropy-based technique, we use an order-7
median filter to remove these spikes. The mutual information of different endpoints
randomly infected with a single infection of a malicious code is shown in Figure 43.
Session-keystroke mutual information clearly highlights anomalous behavior for
both high- and low-rate malware and endpoints. In the benign data, the mutual
information is consistently high because only a few keys are used to initiate most of the
sessions. Once an endpoint is compromised, its marginal keystrokes get flagged as session
initiation keys. The mutual information drops in Figure 43 occur because the marginal
all-keys distribution has very little correlation with network sessions.

The keystroke-based measures proposed in this chapter are fairly independent of the
rate of session initiation. This is a unique attribute of the present techniques because other
network-based anomaly detectors implicitly or explicitly use this rate for detection.
Consequently detection and false alarm rates of such detectors are dependent on the
scanning rate of the malicious code. The techniques proposed in this chapter jointly
consider sessions and keystrokes and are therefore not entirely dependent on the session
rate.

In the following section, we develop an automated tool that uses keystroke entropy

and mutual information values for real-time malware detection.

[Figure 43: six panels plotting session-key mutual information against time window: (a) endpoint 1, Blaster; (b) endpoint 3, Forbot-FU; (c) endpoint 6, MyDoom-A; (d) endpoint 9, Rbot-AQJ; (e) endpoint 11, SoBig.E; (f) endpoint 13, Zotob.G.]

Figure 43. Mutual information of the session and keystroke random variables at infected
endpoints. Infection start times are marked with a circle. Infections last approximately 15
minutes. Each non-overlapping time-window is 60 seconds.


B.6.3.3 Automated Detection using Keystroke Perturbations

As mentioned in previous sections, we use an order-7 median filter to filter out the
noise in the keystroke entropy and mutual information values. To leverage the filtered
values in a real-time and automated fashion, we train the entropy detector using the
first 50 benign keystroke entropy values of an endpoint, and the mutual information based
detector using the first 10 benign mutual information values of an endpoint. We find the
sample mean and sample standard deviation of the entropy training values. An
alarm is raised when the filtered entropy value observed in a window is more than the
mean plus three standard deviations. Similarly, we find the sample mean and sample standard
deviation of the mutual information training values. An alarm is raised when the filtered
mutual information value in a window is less than the mean plus one standard deviation.
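The training and alarm rules can be sketched in Python as follows. The function names are illustrative; the multipliers (three for entropy, one for mutual information) and the comparison directions follow the text as written, and the multiplier is left as a parameter so that other settings can be tried.

```python
from statistics import mean, stdev


def fit_threshold(training_values, k):
    """Return mean + k * sample standard deviation of benign training values."""
    return mean(training_values) + k * stdev(training_values)


def entropy_alarm(filtered_entropy, threshold):
    """Alarm when the filtered entropy of a window exceeds the threshold."""
    return filtered_entropy > threshold


def mutual_info_alarm(filtered_mi, threshold):
    """Alarm when the filtered mutual information falls below the threshold."""
    return filtered_mi < threshold
```

In use, the entropy threshold would be fitted on the first 50 filtered benign entropy values with `k=3`, and the mutual information threshold on the first 10 filtered benign values with `k=1`, after which each new filtered value is compared against its threshold.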

We use the infected profiles used in the last chapter for performance evaluation of the
present malware detectors. Thus there are 100 non-overlapping random infections of
each malicious code in every endpoint's benign profile. As discussed earlier, each
infection lasts approximately T = 15 minutes. Hence, all results provided in this section
are averaged over one hundred experiments per endpoint per malicious code. We compute
detection and false alarm rates for each experiment as follows. For 100 infections of a
particular malicious code on an endpoint, the percentage detection rate for that malicious
code is computed by simply counting the number of infections that are detected by the
malware detector. The false alarm rate is computed by taking the ratio of the total number
of false alarms to the total number of evaluated time-windows (i.e., windows with one or
more sessions).
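The two performance measures can be sketched in Python as follows. The representation of infections as inclusive ranges of window indices, the treatment of an alarm as false whenever it falls outside every infection range, and the function names are assumptions made for illustration.

```python
def detection_rate(infection_windows, alarm_windows):
    """Percentage of infections with at least one alarm during the infection.

    infection_windows: list of (start, end) window indices, inclusive.
    alarm_windows: set of window indices in which an alarm was raised.
    """
    detected = sum(
        any(w in alarm_windows for w in range(start, end + 1))
        for start, end in infection_windows
    )
    return 100.0 * detected / len(infection_windows)


def false_alarm_rate(infection_windows, alarm_windows, evaluated_windows):
    """Percentage of evaluated windows that carry an alarm outside any infection.

    evaluated_windows: set of window indices with one or more sessions.
    """
    infected = {w for s, e in infection_windows for w in range(s, e + 1)}
    false_alarms = sum(
        1 for w in alarm_windows if w in evaluated_windows and w not in infected
    )
    return 100.0 * false_alarms / len(evaluated_windows)
```

For instance, with two 15-window infections of which one triggers an alarm, and two stray alarms among 200 evaluated windows, the sketch gives a detection rate of 50% and a false alarm rate of 1%.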

The average detection and false alarm rates of the entropy and mutual information
based detectors are shown in Figure 44. Figure 44 (a) shows that the detection rate of the
entropy-based technique is 100% for all endpoints and all malware. The detection rate of
the mutual information detector is 100% for all endpoints except endpoint 1, which has an
average detection rate of 99.66%. Thus both of the proposed detectors provide very high
detection accuracy. Figure 44 (b) shows that the mutual information detector has
negligible false alarm rates. The keystroke-entropy detector has slightly higher false
alarm rates than the mutual information detector; the highest false alarm rate of 2.39%
was observed at endpoint 12. Hence, overall, both malware detectors proposed in this
chapter provide very high accuracy for the diverse set of endpoints and malware
considered in this study.

The proposed detectors provide much higher detection rates than the maximum-entropy
and the rate-limiting detectors. As mentioned before, the false alarm rates of the
maximum-entropy and rate-limiting detectors for the high session rate endpoints
(endpoints 3 and 4) are extremely high. The reasons for the inferior performance of
these detectors have been highlighted in the last chapter.

The detection accuracy of the keystroke-based detectors proposed in this chapter is
better than that of the K-L/SVM-based detector of the last chapter. The false alarm rate
of the keystroke-entropy detector is slightly higher than that of the K-L/SVM-based
detector. The mutual information detector provides false alarm rates which are comparable
to those of the K-L/SVM detector. Also, the keystroke-based detectors have lower
complexity than the K-L/SVM-based detector since they do not require a complex learning
tool for automated detection. The complexity of computing the keystrokes' entropy and
mutual information is also low because these measures are computed on a very small sample
space comprising only the session initiation keystrokes used in the last time-window.
However, the training time required for the keystroke-based detectors is higher than that
of the K-L/SVM detector. The high detection accuracy and low complexity of the
keystroke-based malware detectors are a consequence of jointly using network- and
host/OS-level information. In summary, if high detection accuracy and low complexity are
the main objectives, then the keystroke-based detectors should be used. If low false alarm
rates and small training times are desired, then the K-L/SVM-based detector is more
suitable. Nevertheless, all detectors proposed in this thesis provide highly accurate and
fast detection of self-propagating malware.

[Figure 44: two panels plotted against endpoint ID, (a) detection rate and (b) false-alarm rate; legend: Mutual Info Detector, Key-Entropy Detector, MaxEnt Detector, Rate-Limiting Detector.]

Figure 44. Comparison of detection and false-alarm rates of the mutual information based
and keystroke-entropy based malware detectors with maximum-entropy [14] and
rate-limiting [20] detectors. Each point is averaged over 9 malicious codes with 100 random
infections per malicious code per endpoint.

CHAPTER B.7 ATTACKS AND COUNTERMEASURES

In this chapter, we discuss attacks that can circumvent the proposed malware

detectors and possible countermeasures to mitigate these attacks.

B.7.1 Mimicry Attack

In a mimicry attack [66], a malware tries to hide its trafﬁc inside benign trafﬁc to
avoid detection. There are two mimicry attacks that can be launched against the K-
L/SVM based malware detector. Under the ﬁrst attack, a malware can use ports that are
frequently used by an endpoint. While this attack can mimic non-attack ports, mimicry of
attack ports is not possible because vulnerabilities targeted by a malware are associated
with ﬁxed ports, and consequently the destination ports of outgoing scan packets are
ﬁxed. Thus, even with mimicked non-attack ports, the proposed detector can detect
perturbations in the attack port distribution, as shown by the CodeRed II results in
Section B.5.2.2.

Another type of mimicry attack on the K-L/SVM detector can be launched by a very
low-rate malware which can hide its traffic within benign traffic, while keeping the total
number of sessions under $\tau$, where $\tau$ is the threshold number of sessions below
which malware detection is not invoked. As mentioned in Section B.5.2.1, for the endpoints
of this study the values of $\tau$ were very small, ranging between 0.15 and 0.65 sessions
per minute, with an average of 0.33 sessions per minute. A mimicking malware with fewer
than $\tau$ sessions per time-window will have a very slow propagation rate, and hence will
allow human countermeasures.

A mimicry attack can be launched against the keystroke-based detectors by a malware
which always initiates its scanning sessions after a certain predefined time has elapsed
since the last keystroke. Such a malicious session will not be evaluated by the proposed
keystroke-based detectors. To mitigate this attack, the time threshold for logging the
session initiation keystroke can be made adaptive. Also, we are currently investigating
the efficacy of the keystroke-based detectors in a scenario in which the last keystroke is
always logged irrespective of the time elapsed since that keystroke.

B.7.2 Attack by Acquiring System-Level Privileges

On an endpoint where security policies and user privileges are not appropriately
defined, a malware, after compromising the endpoint, can gain system-level privileges and
can in turn disable the malware detector or overwrite keyboard/mouse buffers [33]. This
vulnerability is a consequence of the design of contemporary operating systems and the
lack of appropriate user rights management. All endpoint-based malware detectors suffer
from this vulnerability. This attack can be mitigated by appropriate security policing and
user management. To completely defeat this attack, a trusted computing platform [67] or
a virtual machine [49] must be employed. Design of such operating systems is presently
an area of active research [68]-[71].

CHAPTER B.8 CONCLUSIONS AND FUTURE WORK

In this part, we proposed information-theoretic malware detection techniques for
network endpoints. The first technique leveraged the K-L divergence from an endpoint's
benign port usage to detect malicious activity. The second set of techniques used the
entropy and mutual information of keystrokes that are used to initiate network sessions to
detect malware propagation. All of the proposed techniques were highly accurate and
provided significant improvements over existing methods.

As future work, we intend to increase the number of endpoints on which data are
collected. Moreover, we are currently collecting data on local area networks to see if the
network-based malware detector of Section B.5 can provide good performance when
deployed on LANs. We are also investigating effective countermeasures against the
attacks outlined in the last chapter.

PART-B REFERENCES

[1] D. Ellis, J. G. Aiken, K. S. Attwood, and S. D. Tenaglia, “A behavioral approach to worm detection,” ACM WORM, October 2004.

[2] C. C. Zou, L. Gao, W. Gong, and D. Towsley, “Monitoring and early warning of Internet worms,” ACM CCS, October 2003.

[3] J. Wu, S. Vangala, and L. Gao, “An effective architecture and algorithm for detecting worms with various scan techniques,” NDSS, February 2004.

[4] S. E. Schechter, J. Jung, and A. W. Berger, “Fast detection of scanning worm infections,” RAID, September 2004.

[5] J. Jung, V. Paxson, A. W. Berger, and H. Balakrishnan, “Fast portscan detection using sequential hypothesis testing,” IEEE Symposium on Security and Privacy, May 2004.

[6] N. Weaver, S. Staniford, and V. Paxson, “Very fast containment of scanning worms,” Usenix Security Symposium, August 2004.

[7] A. Lakhina, M. Crovella, and C. Diot, “Diagnosing network-wide traffic anomalies,” ACM Sigcomm, August/September 2004.

[8] A. Lakhina, M. Crovella, and C. Diot, “Characterization of network-wide traffic anomalies in traffic flows,” ACM/Usenix IMC, October 2004.

[9] P. Barford, J. Kline, D. Plonka, and A. Ron, “A signal analysis of network traffic anomalies,” ACM/Usenix IMC, November 2002.

[10] B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, “Sketch-based change detection: Methods, evaluation, and applications,” ACM/Usenix IMC, October 2003.

[11] A. Soule, K. Salamatian, and N. Taft, “Combining filtering and statistical methods for anomaly detection,” ACM/Usenix IMC, October 2005.

[12] Y. Kim, W. C. Lau, M. C. Chuah, and H. J. Chao, “PacketScore: Statistics-based overload control against distributed denial-of-service attacks,” IEEE Infocom, March 2004.

[13] A. Lakhina, M. Crovella, and C. Diot, “Mining anomalies using traffic feature distributions,” ACM Sigcomm, August 2005.

[14] Y. Gu, A. McCallum, and D. Towsley, “Detecting anomalies in network traffic using maximum entropy estimation,” ACM/Usenix IMC, October 2005.

[15] D. Moore, C. Shannon, G. M. Voelker, and S. Savage, “Network Telescopes,” CAIDA technical report, http://www.caida.org/outreach/papers/2004/tr-2004-04/.

[16] E. Cooke, M. Bailey, Z. M. Mao, D. Watson, F. Jahanian, and D. McPherson, “Toward Understanding Distributed Blackhole Placement,” ACM WORM, October 2004.

[17] M. Bailey, E. Cooke, F. Jahanian, J. Nazario, and D. Watson, “The Internet Motion Sensor: A distributed blackhole monitoring system,” NDSS, February 2005.

[18] D. Dagon, X. Qin, G. Gu, and W. Lee, “HoneyStat: Local worm detection using Honeypots,” RAID, September 2004.

[19] J. Twycross and M. M. Williamson, “Implementing and testing a virus throttle,” Usenix Security Symposium, August 2003.

[20] M. M. Williamson, “Throttling viruses: Restricting propagation to defeat malicious mobile code,” ACSAC, December 2002.

[21] S. Sellke, N. B. Shroff, and S. Bagchi, “Modeling and automated containment of worms,” DSN, June/July 2005.

[22] D. Whyte, E. Kranakis, and P. C. van Oorschot, “DNS-based detection of scanning worms in an enterprise network,” NDSS, February 2005.

[23] A. Gupta and R. Sekar, “An approach for detecting self-propagating email using anomaly detection,” RAID, September 2003.

[24] J. Xiong, “ACT: Attachment chain tracing scheme for email virus detection and control,” ACM WORM, October 2004.

[25] W. Cui, R. H. Katz and W-T. Tan, “BINDER: An Extrusion-based Break-In Detector for Personal Computers,” Usenix Security Symposium, April 2005.

 

 

[26] K. Ilgun, R. A. Kemmerer, and P. A. Porras, “State Transition Analysis: A Rule-based Intrusion Detection Approach,” IEEE Transactions on Software Engineering, vol. 21, no. 3, pp. 181-199, March 1995.

[27] S. Jha, K. Tan, and R. A. Maxion, “Markov Chains, Classifiers, and Intrusion Detection,” IEEE CSFW, June 2001.

[28] N. Ye, “A Markov Chain Model of Temporal Behavior for Anomaly Detection,” IEEE Workshop on Information Assurance and Security, June 2000.

[29] W. DuMouchel, “Computer Intrusion Detection Based on Bayes Factors for Comparing Command Transition Probabilities,” Tech. Rep. 91, National Institute of Statistical Sciences, 1999.

[30] A. Lazarevic, A. Ozgur, L. Ertoz, J. Srivastava, and V. Kumar, “A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection,” SIAM Conference on Data Mining, May 2003.

[31] R. P. Lippmann et al., “The 1998 DARPA/AFRL Off-line Intrusion Detection Evaluation,” RAID, September 1998.

[32] R. P. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, “The 1999 DARPA Off-line Intrusion Detection Evaluation,” ACM Computer Networks, vol. 34, no. 4, October 2000.

[33] Endpoint Security Homepage, http://www.endpointsecurity.org/.

[34] “Symantec Internet Security Threat Report - Trends for January 05 - June 05,” Volume VIII, September 2005.

[35] T. Raschke, “The New Security Challenge: Endpoints,” IDC/F-Secure, August 2005.

[36] N. Weaver, D. Ellis, S. Staniford, and V. Paxson, “Worms vs. Perimeters: The case for Hard-LANs,” IEEE Symposium on High Performance Interconnects (Hot Interconnects), August 2004.

[37] C. Wong, C. Wang, D. Song, S. Bielski, and G. R. Ganger, “Dynamic quarantine of Internet worms,” DSN, July 2004.

[38] C. Wong, S. Bielski, A. Studer, and C. Wang, “Empirical Analysis of Rate Limiting Mechanisms,” RAID, September 2005.

 

 

[39] Q. Li, E-C Chang, and M. C. Chan, “On effectiveness of DDOS attacks on statistical filtering,” IEEE Infocom, March 2005.

[40] A. Kuzmanovic and E. W. Knightly, “Low-rate TCP-targeted denial of service attacks,” ACM Sigcomm, August 2003.

[41] S. Staniford, V. Paxson, and N. Weaver, “How to Own the Internet in your spare time,” Usenix Security Symposium, August 2002.

[42] S. Panjwani, S. Tan, K. M. Jarrin, and M. Cukier, “An experimental evaluation to determine if port scans are precursor to an attack,” DSN, June/July 2005.

[43] T. M. Cover and J. A. Thomas, “Elements of Information Theory,” Wiley-Interscience, 1991.

[44] J. Lin, “Divergence Measures Based on the Shannon Entropy,” IEEE Transactions on Information Theory, vol. 37, no. 3, January 1991.

[45] D. H. Johnson and S. Sinanovic, “Symmetrizing the Kullback-Leibler Distance,” Technical Report, March 2001.

[46] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.

[47] “The Secure Hash Algorithm,” FIPS PUB 180-1, April 1995.

[48] MSDN Library, http://msdn.microsoft.com.

[49] Microsoft Virtual PC 2004, http://www.microsoft.com/Windows/virtualpc.

[50] Symantec Security Response, W32.Zotob.G, http://securityresponse.symantec.com/avcenter/venc/data/w32.zotob.g.html.

[51] Sophos Virus Info, W32/Forbot-FU, http://www.sophos.com/virusinfo/analyses/w32forbotfu.html.

[52] Sophos Virus Info, W32/Sdbot-AFR, http://www.sophos.com/virusinfo/analyses/w32sdbotafr.html.

[53] Sophos Virus Info, Troj/Dloader-NY, http://www.sophos.com/virusinfo/analyses/trojdloaderny.html.

 

[54] Symantec Security Response, W32.SoBig.E@mm, http://securityresponse.symantec.com/avcenter/venc/data/w32.sobig.e@mm.html.

[55] Symantec Security Response, W32.MyDoom.A@mm, http://securityresponse.symantec.com/avcenter/venc/data/w32.mydoom.a@mm.html.

[56] Symantec Security Response, W32.Blaster.Worm, http://securityresponse.symantec.com/avcenter/venc/data/w32.blaster.worm.html.

[57] Symantec Security Response, W32/Rbot-AQJ, http://www.sophos.com/virusinfo/analyses/w32rbotaqj.html.

[58] TrendMicro Virus Encyclopedia, WORM_RBOT.CCC, http://gutrendmicro-europe.com/smb/vinfo/encyclopedia.php?LYstFVMAINDATA&vNav=3&VName=WORM_RBOT.CCC.

[59] Symantec Security Response, W32.Witty.Worm, http://securityresponse.symantec.com/avcenter/venc/data/w32.witty.worm.html.

[60] C. Shannon and D. Moore, “The spread of the Witty worm,” IEEE Security & Privacy, vol. 2, no. 4, pp. 46-50, July/August 2004.

[61] Symantec Security Response, CodeRed II, http://securityresponse.symantec.com/avcenter/venc/data/codered.ii.html.

[62] D. Moore, C. Shannon, and J. Brown, “Code-Red: A case study on the spread and victims of an Internet worm,” ACM/Usenix IMC, November 2002.

[63] A. Kumar, V. Paxson, and N. Weaver, “Exploiting underlying structure for detailed reconstruction of an Internet-scale event,” ACM/Usenix IMC, October 2005.

[64] W. S. Sarle, “AI FAQ,” http://www.faqs.org/faqs/ai-faq/neural-nets/.

[65] S. Axelsson, “The base-rate fallacy and its implications for the difficulty of intrusion detection,” RAID, September 1999.

[66] D. Wagner and P. Soto, “Mimicry Attacks on Host-Based Intrusion Detection Systems,” ACM CCS, Nov. 2002.

[67] Trusted Computing Alliance, https://www.trustedcomputinggroup.org.

 

 

[68] G. Dunlap, S. King, S. Cinar, M. Basrai, and P. Chen, “ReVirt: Enabling intrusion analysis through virtual-machine logging and replay,” Usenix OSDI, December 2002.

[69] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh, “Terra: A virtual machine-based platform for trusted computing,” ACM SOSP, October 2003.

[70] B. W. Lampson, “Computer security in the real world,” IEEE Computer, vol. 37, no. 6, pp. 37-46, June 2004.

[71] M. Rosenblum and T. Garfinkel, “Virtual Machine Monitors: Current technology and future trends,” IEEE Computer, vol. 38, no. 5, pp. 39-47, May 2005.