n 1.. ans; . . . . . . . . an!” . . . ., 2- £32.51. 92.... . as}... , ‘ . ‘ . . . .13... A 5 .t . . . . . .fiw. .. ‘ ‘ V t H La». .¢£.T 1r .. rgw,” v Qwfihfl . ufim . ?..G ~ 1.3“. I . um . ,figmwfimmm: .. 3., 97.3} ,.r..:?:9 S: 9... a. 2.? a 1.. ‘- Wvfi§.§ur all. . 31:3 I) v3.9.3.4. 7,133.8 ;. I v. ‘ It. s. 3.2!... ‘ r. ... Lukfillr.\)hflfl 1; (-l} , . . . 1 . On A .99.... fit I 5 uh .19. it. ,5 I" . . .t .. ‘ a? L", 1!: . .o‘A'vI‘n ‘12‘ .3. 5.. :91...qu . , A ‘ . .3. 5.14. 45.3.“: .w: «3.». 2.3.1., , . . .‘ . .. .. ‘ n. .. . ..\ .A. has“ :to O"?- This is to certify that the dissertation entitled WIRELESS CHANNEL MODELING AND MALWARE DETECTION USING STATISTICAL AND INFORMATION- THEORETIC TOOLS presented by SYED ALI KHAYAM has been accepted towards fulfillment of the requirements for the PhD. degree in Electrical Engineering j/m//Z/// /’?j /’Major‘Professor’ 5 Signature $6 (. 4/ 2 00 6 Date MSU is an Affirmative Action/Equal Opportunity Institution LIBRARY Michigan State University 00-0O--I-I-u--D-~-I-t-OOIOIO-OI-o-n--I---u—-—.- - — PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 2/05 p:/CIRC/DateDue.indd-p.1 WIRELESS CHANNEL MODELING AND MALWARE DETECTION USING STATISTICAL AND INF ORMATION- THEORETIC TOOLS By Syed Ali Khayam A DISSERTATION Submitted to Michigan State University in partial fi11fillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Electrical and Computer Engineering 2006 ABSTRACT WIRELESS CHANNEL MODELING AND MALWARE DETECTION USING STATISTICAL AND INFORMATION-THEORETIC TOOLS By Syed Ali Khayam This is a bipartite thesis that tackles two different research problems: (i) medium access control (MAC) layer wireless channel modeling and applications of the models in design, analysis and simulations of wireless systems; and (ii) malicious sofiware (malware) detection at network endpoints. For both problems, we collect extensive new datasets which are analyzed and modeled using statistical and infonnation-theoretic tools. In the first part of this thesis, we provide analysis and modeling of bit-errors at the 802.11b MAC layer. We show that the bit-errors at 2 Mbps and 5.5 Mbps can be modeled by high-order full-state Markov (FSM) chains. Bit-errors at 11 Mbps are shown to have long-range dependence (LRD), and consequently a multifractal wavelet model (MWM) is used to model these LRD bit-errors. The complexity of FSM chains is an exponential function of the bit-error process’ memory-length. To mitigate the exponential FSM complexity, we derive guidelines for accurate approximation of an FSM chain of arbitrary memory-length. These guidelines lead to a novel and accurate constant- complexity model (CCM) which always consists of five states irrespective of a process' memory-length. Two applications of the proposed channel models are explored. First, we use the models in a novel maximum-likelihood header estimation framework which can be used by wireless multimedia applications to realize considerable throughput improvements. Trace-driven wireless video simulations show that the proposed header estimation framework provides significant improvements over existing techniques. Second, we use protocol goodput and retransmission metrics to show that inaccurate channel models can lead to extremely misleading simulation and analytical results. The models proposed in this thesis, however, provide highly accurate estimates of goodput and retransmissions. In the second part of this thesis, we propose three endpoint-based anomaly detection techniques that detect self-propagating malware in real-time by observing deviations from a behavioral model derived from a benign data profile. In the first technique, we leverage the Kullback-Leibler (K—L) information divergence of real-time source and destination ports’ distributions to characterize deviations fiom the distributions observed in the benign traffic profile. Experiments using actual endpoint and malware data demonstrate that the source and destination ports’ distributions are perturbed significantly on a compromised endpoint. K-L perturbations are used to train support vector machines which provide almost 100% detection rates and negligible false alarm rates. The remaining two malware detection techniques proposed in this thesis employ perturbations in the distribution of keystrokes that are used to initiate network sessions. We show that the keystrokes’ entropy increases and the session-keystroke mutual information decreases when an endpoint is compromised by a self-propagating malware. These two types of perturbations are used for real-time malware detection. Both detectors provide almost 100% detection rates and very low false alarm rates. Copyright by SYED ALI KHAYAM 2006 ACKNOWLEDGMENTS I would like to thank my family for always respecting and supporting my professional and academic goals. I also thank my academic advisor, Professor Hayder Radha, for always encouraging me to think out-of-the-box and for helping me identify and refine research ideas. I sincerely thank my friends, family members and colleagues in WAVES lab who allowed me to collect network traffic data on their computers. Apama, Mujahid, Dmitri and Farshad deserve special mention here for discussing and critiquing the theory, experiments and writing of my research papers. I also thank Shardha who was a great friend during my first year, and who I regretfully forgot to acknowledge in my Masters thesis. I must also acknowledge the Higher Education Commission of Pakistan and the National Science Foundation of USA for their continued financial support during my MS. and Ph.D. studies. I thank my PhD. committee members and Professor Rong Jin for their technical and editorial guidance. Finally, I thank those associate editors and anonymous reviewers who gave constructive feedback on my papers. That feedback has definitely improved the quality of this thesis. TABLE OF CONTENTS LIST OF TABLES ........................................................................................................ x LIST OF FIGURES ..................................................................................................... xi Part A Statistical Models of MAC Layer Wirless Channels and their Applications 1 CHAPTER A.l Introduction ......................................................................................... 2 A. 1 .1 Overview of Contributions ............................................................................. 4 A. 1 .2 Organization of this Part ................................................................................. 6 CHAPTER A.2 Related Work ...................................................................................... 7 A.2.l Channel Modeling ........................................................................................... 7 A22 Cross-Layer Design for Wireless Multimedia ................................................ 9 CHAPTER A.3 Background ....................................................................................... 11 A.3.1 802.1 lb Wireless Networks .......................................................................... ll A.3.2 Autocorrelation of Random Processes .......................................................... 12 A.3.3 Discrete-Time Markov Chains ...................................................................... 12 A.3.4 Burst Representation of Binary Wireless Traces .......................................... 14 A.3.5 The Gilbert Channel Model .......................................................................... 15 A36 Full-State Markov Chains for Wireless Channels ........................................ 16 A37 Long-Range Dependent Processes ................................................................ 17 A.3.8 The Multifractal Wavelet Model .................................................................. 19 A.3.9 Performance Evaluation Measure ................................................................. 20 CHAPTER A.4 Empirical Analysis and Accurate Modeling of Wireless Channels.. 22 A.4.1 Wireless Trace Collection ............................................................................. 22 A.4.2 Empirical Analysis of 802. lb Bit-Errors ...................................................... 25 A.4.2.l Autocorrelation Analysis .................................................................. 25 A.4.2.2 Preliminary Empirical Analysis of FSM Chains .............................. 27 A.4.2.3 Long-Range Dependence in 11 Mbps Bit-Errors ............................. 28 A.4.2.3.1 LRD Evaluation by Observing Energy at Different Scales ......... 28 A.4.2.3.2 LRD Evaluation using Variance-Time Diagrams ........................ 30 A.4.2.3.3 LRD Evaluation using the Periodogram ...................................... 32 A.4.3 Accurate Modeling of 802.1 lb Bit-Errors .................................................... 33 A.4.3.1 Bit-Error Modeling at 5.5 Mbps ....................................................... 33 A.4.3.2 Bit-Error Modeling at 2 Mbps .......................................................... 34 A.4.3.3 Bit-Error Modeling at 11 Mbps ........................................................ 35 A.4.3.3.1 The Multifractal Wavelet Model .................................................. 36 vi A.4.3.3.2 ENK-based Performance Evaluation ........................................... 36 A.4.3.3.3 Performance in Capturing Energy at Different Scales ................. 39 A.4.3.3.4 Performance in Capturing the Variance-Time Characteristics 39 A44 Discussion ..................................................................................................... 41 CHAPTER A.5 Complexity Reduction for Markov Channels ................................... 43 A.5.l The Hierarchical Markov Model .................................................................. 44 A.5.2 The Hidden Markov Model .......................................................................... 46 A.5.3 FSM Observations ........................................................................................ 48 A.5.4 Observations about FSM Chains .................................................................. 48 A55 Markov Chain Lumpability ........................................................................... 51 A.5.5.1 Lumpability for Wireless Bit-Error Channels ................................... 51 A552 Folded Markov Chains ...................................................................... 55 A.5.5.3 Evaluation of Folded Markov Chains ............................................... 58 A.5.6 Complexity Reduction by Approximating an F SM Chain’s Good- and Bad- Burst Behavior .............................................................................................................. 59 A.5.6.1 Simplification of Good-bursts Distribution ...................................... 64 A562 Simplification of Bad-bursts Distribution ......................................... 65 A563 Guidelines for Approximating an FSM chain ................................... 66 A57 Constant-Complexity Model ......................................................................... 67 A571 Performance of the CCM at 2 Mbps ................................................. 68 A572 Performance of the CCM at 5.5 Mbps .............................................. 71 A.5.8 Discussion ..................................................................................................... 72 CHAPTER A.6 Channel Model Based Header Estimation for Wireless Multimedia 73 A61 F EC Redundancy Lower Bounds for UDP, UDP-Lite and Header Estimation ...................................................................................................................................... 76 A.6. l .1 Redundancy Bounds on the q-ary Symmetric Channel .................... 77 A.6. l . 1.1 FEC Redundancy Bound on a UDP based Protocol Stack .......... 78 A6] .1.2 FEC Redundancy Bound on a UDP-Lite based Protocol Stack... 79 A.6.l.l.3 FEC Redundancy Bound on a Header Estimation based Protocol Stack 80 A.6. 1.1.4 Comparison of the F EC Redundancy Bounds ............................. 80 A.6. l .2 Redundancy Bounds on the Gilbert Channel .................................... 83 A6121 Bound on a UDP based Protocol Stack ........................................ 83 A6122 Bound on a UDP-Lite based Protocol Stack ................................ 84 A6] .2.3 Bound on a Header Estimation based Protocol Stack .................. 85 A6124 Comparison of the FEC Redundancy Bounds ............................. 85 A613 Discussion ......................................................................................... 88 A62 Maximum-Likelihood Header Estimation Framework ................................. 88 A.6.2.1 Functionality at and below a Receiver’s MAC layer ........................ 89 A622 The Header Estimation Module ........................................................ 91 A623 Processing at a Receiver’s Network, Transport and Application Layers 91 A63 Likelihood Functions for Header Estimation ................................................ 91 A631 Header Estimation Likelihood Function for FSM Chains ................ 93 vii A.6.3.2 Header Estimation Likelihood Function of MWM ........................... 95 A633 Extending the FSM Likelihood Function to the CCM ...................... 98 A64 Performance Evaluation of the Header Estimation Framework ................... 99 A641 Experimental Setup ........................................................................... 99 A642 Throughput Performance ................................................................ 100 A643 Comparison of Packet Drops .......................................................... 101 A644 False Alarm Rate ............................................................................. 102 A645 F EC Performance ............................................................................ 103 A646 Video Performance ......................................................................... 106 A65 Discussion ................................................................................................... 107 CHAPTER A.7 Impacts of Ignoring Channel Memory on Analysis and Simulation of Wireless Systems ............................................................................................................ 108 A.7.l Goodput of an Unreliable Protocol ............................................................. 109 A.7. 1.1 Goodput of a Wireless Channel ...................................................... 110 A.7.1.2 Goodput of a Binary-Symmetric Channel Model ........................... 111 A.7.1.3 Goodput of a Gilbert Channel Model ............................................. 111 A.7.1.4 Goodput of a Full-state Markov Channel Model ............................ 112 A.7. l .5 Goodput of a Constant-Complexity Channel Model ...................... 114 A.7. l .6 Comparison of Estimated Goodputs ............................................... 115 A72 Retransmissions of a Reliable Protocol ...................................................... 116 A721 Expected Retransmissions on a Wireless Channel ......................... 117 A722 Comparison of Estimated Retransmissions .................................... 118 CHAPTER A.8 Conclusions and Future Work ........................................................ 122 Part-A References ..................................................................................................... 123 Part B Self-Propagating Malware Detection at Network Endpoints using Information- Theoretic Tools ............................................................................................................... 132 CHAPTER 3.1 Introduction ..................................................................................... 133 3.1.1 Overview of Contributions .......................................................................... 134 B. l .2 Organization of this Part ............................................................................. 137 CHAPTER 32 Related Work .................................................................................. 139 CHAPTER B.3 Background ..................................................................................... 142 B.3.l Self-Propagating Malware .......................................................................... 142 B.3.2 Support Vector Machines ............................................................................ 143 CHAPTER B.4 Data Collection and Simulation ...................................................... 144 8.4.1 Benign Traffic-Keystroke Profiles .............................................................. 144 8.4.2 All-Keystrokes’ Profiles .............................................................................. I48 B.4.3 Malware Classification ................................................................................ 149 B.4.4 Real Malware .............................................................................................. 150 viii 8.4.5 Simulated Malware ..................................................................................... 152 8.4.6 Inserting Malware Data in Benign Traffic Profiles ..................................... 153 CHAPTER 8.5 Malware Detection using Traffic Features ...................................... 155 8.5.1 Malware Detection Using Sample Entropy ................................................. 155 8.5.1.1 Entropy of Source and Destination Ports ........................................ 156 8.5.1.2 Entropy-based Traffic Perturbations in the Infected Profiles ......... 157 8.5.2 Malware Detection Using Information Divergence .................................... 159 8.5.2.1 Kullback-Leibler Divergence of Source and Destination Ports ...... 160 8.5.2.2 K-L-based Traffic Perturbations in the Infected Profiles ............... 163 8.5.2.3 Evaluating Traffic Perturbations with Other Information Divergences 164 8.5.3 Leveraging K-L Perturbations in an SVM-based Framework .................... 167 8.5.3.1 SVM Training ................................................................................. 167 8.5.3.2 Performance Evaluation and Comparison with Existing Techniques 169 8.5.4 Summary and Discussion ............................................................................ 173 CHAPTER 8.6 Malware Detection using Joint Network-Host Features ................. 174 8.6.1 Correlation in the Session-Key Data ........................................................... 174 8.6.2 Malware Detection Using Keystroke Entropy ............................................ 178 8.6.2.1 Definition of Keystroke Entropy .................................................... 178 8.6.2.2 Entropy Perturbations in the Infected Profiles ................................ 179 8.6.3 Malware Detection Using Session-Key Mutual Information ...................... 182 8.6.3.1 Mutual Information of Sessions and Keys ...................................... 182 8.6.3.2 Mutual Information Perturbations in the Infected Profiles ............. 184 8.6.3.3 Automated Detection using Keystroke Perturbations ..................... 187 CHAPTER 8.7 Attacks and Countermeasures ......................................................... 190 8.7.1 Mimicry Attack ........................................................................................... 190 8.7.2 Attack by Acquiring System-Level Privileges ............................................ 191 CHAPTER 8.8 Conclusions And Future Work ....................................................... 192 Part-8 References ..................................................................................................... 193 ix LIST OF TABLES Table 1. Packet-Level Statistics at 2, 5.5 and 11 Mbps .................................................... 24 Table 2. Performance of MWM and F SM for the 11 Mbps Bit-Error Process ................ 37 Table 3. Performance of the hMM for 5.5 Mbps Bit-Error Process ................................. 45 Table 4. Performance of the HMM for the 5.5 Mbps Bit-Error Process .......................... 47 Table 5. Empirical Evidence in Support of Observation 2 ............................................... 50 Table 6. Statistics of the Benign Profiles ........................................................................ 147 Table 7. Information of Malware Used in This Study .................................................... 151 LIST OF FIGURES Figure l. The Gilbert channel model [81]. ....................................................................... 15 Figure 2. Set up for collection of wireless bit-error traces. .............................................. 24 Figure 3. Autocorrelation of bit-error traces. .................................................................... 26 Figure 4. Percentage of unused FSM states at 2 and 5.5 Mbps. ....................................... 28 Figure 5. Aggregates of the 11 Mbps energy process at different time scales. ................ 29 Figure 6. Variance-time diagrams of two 11 Mbps bit-error traces. ................................ 31 Figure 7. Logscale periodogram of two 11 Mbps bit-error traces. ................................... 32 Figure 8. Performances of varying order FSM chains for the 5.5 Mbps MAC layer bit- error process. ............................................................................................................. 35 Figure 9. Performances of varying order FSM chains for the 2 Mbps MAC layer bit-error process. ...................................................................................................................... 35 Figure 10. Probability mass functions for good- and bad-bursts random variables derived from an 11 Mbps trace. (Only the probabilities of small bursts are shown here.) 38 Figure 1 1. Energy processes of actual and synthesized bit-error traces. .......................... 38 Figure 12. Variance-time diagrams of varying order FSM chains for the 11 Mbps bit- error process. ............................................................................................................. 41 Figure 13. Variance-time diagrams of the MWM for the 11 Mbps bit-error process. ..... 41 Figure 14. The hierarchical Markov model (hMM) [18]. ................................................. 45 Figure 15. Transition possibilities for an F SM chain (memory-length, k = 4). ............. 50 Figure 16. Aggregate states S, and SJ. containing FSM states {12, m} and {272, 2m}, respectively. .............................................................................................................. 55 Figure 17. Performance of FMCs formed by folding a 512 state F SM to 256, 128, 64, 32, 16, 8, 4 and 2 states; the FSM process is trained using a 5.5 Mbps trace. ................ 58 Figure 18. State transitions of an FSM with memory-length k and a good-burst of length l 2 k. ........................................................................................................................ 60 xi Figure 19. State aggregation and transitions for the CCM. Each box represents an aggregate CCM state. The number(s) inside a CCM state are the aggregated FSM states. ......................................................................................................................... 68 Figure 20. ENK based modeling performance versus complexity for the 2 Mbps bit-error process ....................................................................................................................... 69 Figure 21. ENK based modeling performance versus memory-length for the 2 Mbps bit- error process. ............................................................................................................. 69 Figure 22. ENK based modeling performance versus complexity for the 5.5 Mbps bit- error process. ............................................................................................................. 71 Figure 23. ENK based modeling performance versus memory-length for the 5.5 Mbps bit-error process. ....................................................................................................... 71 Figure 24. Minimum expected FEC redundancies of UDP, UDP-Lite and Ideal Header Estimation over an q-ary symmetric channel;m = 8 , q = 256 , L = 30, 1211 = 60, up = 452 ................................................................................................................. 82 Figure 25. Minimum expected FEC redundancies of UDP, UDP-Lite and Ideal Header Estimation over a Gilbert channel;m = 8 , L = 30, ng = 60, ”D = 452. ......... 87 Figure 26. Interactions between the UDP-based header estimation module and different layers of a wireless receiver’s protocol stack; modified protocol stack layers are shown in different colors and dotted lines represent communications that are not related to packet reception. ....................................................................................... 89 Figure 27. Average packet drops for UDP Normal, UDP-Lite and UDP with Header Estimation at different data rates and for varying number of video streams per receiver; each point is averaged over 3 x (# of video streams) x 5 x 25 received video streams. ......................................................................................................... 101 Figure 28. Codeword construction for video FEC simulations. ..................................... 104 Figure 29. Average F EC redundancy required by UDP Normal, UDP-Lite and UDP with Header Estimation at different data rates of an 802.11b LAN; each point is averaged over 3 x 5 x 5 x 25 = 1875 received video streams ............................................... 105 Figure 30. Average PSNR of video sequences for UDP Normal, UDP-Lite and UDP with Header Estimation using a 30 byte RS codeword with 2 parity bytes; each graph is averaged over 3 x 5 x 5 = 75 received video streams. .......................................... 107 Figure 31. Comparison of the average goodput of the actual traces with the goodput estimates provided by BSC, Gilbert, 1024-state Markov, and 5-state CCM models; each result is averaged over five traces ................................................................... 115 xii Figure 32. Comparison of the number of retransmissions per packet estimated by BSC, Gilbert, 1024-state Markov, and 5-state CCM models; each result is averaged over five traces. ............................................................................................................... 120 Figure 33. Number of retransmissions per packet without the BSC model .................... 120 Figure 34. Number of retransmissions per packet without the BSC model .................... 120 Figure 35. Source and destination port entropies at infected endpoints. Infection start times are marked with a circle. Infections in (a), (b), and (c) last approximately 15 minutes, while that in (d) lasts approximately one minute. Each non-overlapping time-window is 20 seconds. .................................................................................... 157 Figure 36. Source and destination ports’ K-L divergences at infected endpoints. ......... 162 Figure 37. Jensen-Shannon (J-S), K— and resistor-average (R-A) divergences of source ‘ and destination ports at infected endpoints. ............................................................ 166 Figure 38. Comparison of detection and false-alarm rates of the proposed K—L/SVM- based malware detector with maximum-entropy and rate-limiting detectors. Each point is averaged over 12 malware with 100 random infections per malware per endpoint ................................................................................................................... 169 Figure 39. A generalized flow diagram of the proposed K-L/SVM-based malware detector. The shaded area contains real-time components ...................................... 172 Figure 40. Normalized histograms of 20 most-used session initiation keystrokes. Histograms are generated from the session-key data. Virtual keys codes 1 and 13 correspond to the left mouse click and the Enter key, respectively [48]. ................................................................................................................................. 175 Figure 41. Normalized histograms of 20 most-used keystrokes. Histograms are generated from the all-keys data. Virtual keys codes 40, 38 and 17 correspond to the down arrow key, the up arrow key and the control key, respectively [48]. 177 Figure 42. Entropy of the keystroke histograms at infected endpoints. Infection start times are marked with a circle. Infections last approximately15 minutes. Each non- overlapping time-window is 60 seconds. ................................................................ 181 Figure 43. Mutual information of the session and keystroke random variables at infected endpoints. Infection start times are marked with a circle. Infections last approximately15 minutes. Each non-overlapping time-window is 60 seconds. ..... 186 Figure 44. Comparison of detection and false-alarm rates of the mutual information based and keystroke-entropy based malware detectors with maximum-entropy [14] and rate-limiting [20] detectors. Each point is averaged over 9 malicious codes with 100 random infections per malicious code per endpoint. .............................................. 189 xiii PART A STATISTICAL MODELS OF MAC LAYER WIRLESS CHANNELS AND THEIR APPLICATIONS CHAPTER A.1 INTRODUCTION Error modeling has been used to improve design of communication channels and systems for many decades [1]—[7]. Stochastic models of wireless medium access control (MAC) layer packet-losses and bit-errors have recently attracted significant research attention [8]- [30]. The main objective of analyzing and modeling MC-to-MAC [29] or residual [11] bit-errors is to develop accurate simulators which allow experimentation without having the actual network in place. Moreover, bit-error analysis and modeling provide important insights into characteristics of the underlying error random process. This insight is essential for design and performance evaluation of a wide range of wireless protocols, applications and services. For instance, accurate channel models can facilitate design, parameter tuning and verification of the following wireless protocols: 0 Wireless congestion control protocols, instead of relying on MAC layer retransmissions, can use accurate MAC layer error models to differentiate between losses due to congestion, medium degradation or mobility. The inability of wired congestion control algorithms to differentiate between different types of losses and the consequent bandwidth underutilization have been repeatedly highlighted by prior studies [10], [31]—[40]. Knowledge of losses due to channel errors, which is assumed in many wireless congestion control solutions, can be provided by a real-time MAC- to-MAC channel model. Understanding of error frequency and burstiness is also instrumental in parameter tuning of congestion control protocols. 0 Cross-layer protocols can use a real-time channel model to choose between reliable (e. g., using MAC layer retransmissions) versus cross-layer (e.g., ignoring data payload errors [41 ]—[5 1 ]) protocols. 0 Reliable routing protocols [52]—[55] for mobile networks can use MAC-to-MAC channel models to differentiate reliable versus shortest routes to different destinations, if the model is able to provide real-time error characterization at different hops of the network. 0 MAC protocols can decide when to increase/decrease the physical transmission data rate based on real-time channel estimation. An accurate channel model can predict future error characteristics, thereby saving the MAC layer protocol the overhead of switching to an inaccurate lower/higher data rate based on short-term observations. Similarly, design of many wireless applications can be improved by accurate channel models. For instance: 0 Real-time channel estimation provided by an accurate model can be employed by rate- adaptive applications to perform channel- and/or source-coding rate adaptation for efficient bandwidth utilization. 0 Design of effective error-control schemes for different wireless applications requires a thorough understanding of errors above the physical layer [56]. o Error-resilience features of contemporary multimedia codecs can be effectively designed and verified with knowledge of MAC layer error characteristics. Note that most benefits of a wireless MAC layer channel model can be realized if the model is able to provide real-time and online channel characterization and prediction. In 3 complexity- and power-constrained wireless environments, such channel characterization is only possible with a low-complexity model. Despite some recent interest in reducing the complexity of wireless models [23]- [29], development of accurate, pragmatic and low-complexity wireless channel models is still an open problem. A.1.1 Overview of Contributions In this part of the thesis, we analyze and model bit-errors propagated to the 802.11 MAC layer at three physical layer data rates of an 802.11b LAN: 2, 5.5 and 11 Mbps [57], [58]. Our objective is to develop low-complexity MAC-to-MAC channel models without compromising modeling accuracy. To that end, Chapter A.4 focuses on empirical analysis and “accurate” modeling of the bit-errors observed at 2, 5.5 and 11 Mbps. After identifying accurate bit-error models, in Chapter A.5 we reduce the complexity of these models by approximating their behavior. In Chapter A.6, the accurate and low- complexity charmel models are used in a header estimation framework to improve wireless multimedia quality. As a final contribution of this part, Chapter A.7 shows that inaccurate channel models can provide extremely misleading results for critical wireless performance metrics. Chapter A.4 shows that the MAC-to-MAC bit-error characteristics vary with changes in the physical layer data rate. We show that the error-rate is quite low at 2 Mbps as compared to 5.5 and 11 Mbps. At 2 Mbps, approximately 95% of the packets are received without errors, which is a testament of the high physical layer robustness at 2 Mbps. The loss-rate subsequently increases with an increase in data rate. We observe that the 2 and 5.5 Mbps bit-errors exhibit decaying correlation and a low memory-length can be identified. Thus the bit-errors at 2 and 5.5 Mbps can be modeled 4 using Markov chains [59]. However, the bit-errors at 11 Mbps exhibit very high correlation even at large lags. Such high correlation is reminiscent of long-range dependence (LRD) [60] in the 11 Mbps bit-error process. We substantiate the LRD notion through aggregation, variance-time and periodogram analyses. Bit-errors at 2 and 5.5 Mbps are accurately modeled using high-order full-state Markov (FSM) chains [59]. The LRD nature of the 11 Mbps bit-errors renders traditional stochastic models (e.g., Markov, Poisson) ineffective. Therefore, we employ a multifractal wavelet model (MWM) [61]—[63] to characterize the 11 Mbps bit-error random process. For comparison, we also model the 11 Mbps bit-error process using F SM chains of varying orders. We demonstrate that the MWM outperforms the Markov models in both complexity and channel approximation. The complexity of F SM chains increases exponentially with respect to the memory- length. In Chapter A.5, we mitigate the exponential F SM complexity by approximating the F SM behavior using low-complexity models. We first show that hierarchical, hidden and lumped Markov models cannot capture the complex behavior of FSM chains. Consequently, we directly analyze FSM chains and derive important guidelines that should be followed to realize accurate, effective and low-complexity models. These guidelines are used to propose a constant-complexity model (CCM) [30] that always comprises of five states irrespective of the underlying process’ memory-length. At both 2 and 5.5 Mbps, the 5-state CCM provides performance that is comparable to the exponential-complexity F SM chains and better than the linear-complexity models [29]. In Chapter A.6, we leverage the proposed low-complexity channel models in a novel cross-layer wireless multimedia framework. Under the proposed header estimation framework, corrupted headers of received packets are estimated using the MAC-to-MAC channel models. The corrupted packets are in turn passed to the application layer, which uses forward error correction (FEC) to recover the corrupted data. Trace-driven wireless video simulations show that significantly better bandwidth utilization and video quality than UDP [64] and UDP-Lite [41]- [43] protocols can be achieved by employing the header estimation framework. We also show analytically that an ideal header estimation scheme will always perform better than UDP and UDP-Lite under realistic wireless channel conditions. As a final contribution of this part, Chapter A.7 shows that an inaccurate channel model that ignores channel memory can provide extremely misleading results. We use two critical wireless performance metrics, namely goodput and retransmissions, and show that highly inaccurate estimates of these metrics are obtained if memory-less or IM order channel models are used. On the other hand, F SM and CCM channel models which cater for channel memory provide very accurate goodput and retransmission estimates. A.1.2 Organization of this Part The rest of this part is organized as follows. Chapter A.2 provides an overview of the related work in this area. Chapter A.3 provides background that is required to understand the material presented in this part. Chapter A.4 focuses on empirical analyses and “accurate” modeling of the bit-errors at 2, 5.5 and 11 Mbps. Chapter A.5 reduces the complexity of the proposed models by evaluating low-complexity alternatives. Chapter A.6 proposes a header estimation framework for wireless multimedia. Chapter A.7 shows the impact of ignoring channel memory on the design of wireless systems. CHAPTER A.2 RELATED WORK A.2.1 Channel Modeling Recently, link layer modeling for reliable protocols has received some research attention [9], [10]. In the context of delay-sensitive traffic, a previous study derived conditions under which block-based residual/MAC-to-MAC errors can be modeled using a Markov chain [1 1]. For AT&T WaveLAN, a trace-based link layer investigation was conducted in [13]. In the context of link layer modeling, Konrad et 01. performed analysis and presented a Markov-based Trace Analysis (MTA) model algorithm for frame-errors on GSM networks [14], [15]. Ii et a1. [16], [l 7] compared the performance of the MTA, a full-state k -the order model, a hidden Markov model and an extended ON(error- free)/OFF(error-filled) model in capturing the GSM (link layer) frame losses. Based on the comparison provided in [16], [17], it was concluded that an extended ON/OFF model with geometric distributions governing the state holding times provides significantly better results than the other three modeling paradigms. In view of the increasing popularity of 802.11 networks, we studied the 802.11b link layer in order to facilitate design of effective cross-layer error-control schemes for the support of real-time services [18], [19], [45]. Since most error-control schemes operate on byte and/or packet boundaries, we proposed Markov-based models at the packet and byte levels. We showed that a 2-state Markov model can characterize the packet-loss process and a hierarchical Markov Model was proposed for the byte-level errors [18]. Willig er a1. [26] have performed the only prior study that attempts to analyze and model 7 bit—error behavior of 802.11b networks with modeling accuracy as a performance criterion. There are fundamental differences between the measurements, analyses and modeling of [26] and this thesis. In [26] the authors attempt to capture the impact of physical layer parameters (e.g., modulation type, antenna diversity etc.) on the bit-error rate of a wireless LAN in an industrial setting. This study performs all experiments with default physical layer parameters, thereby capturing a realistic channel that is omnipresent in most common home/business/classroom settings. Also, in [26] the error- prone 11 and 5.5 Mbps channels were not evaluated. Chen et a1. [24], [25] investigated Markov chain lumpability to reduce the complexity of wireless channel models. Since lumpability constraints are too stringent for practical wireless channels, Chen et a1. [24], [25] resorted to an ON-OFF model that stochastically bounds the sojourn time distributions of the lumped good and bad states. However, and as asserted by [11], an ON-OFF model assumes geometric (memory-less) distributions for good and bad periods which is not a valid assumption in most real-life channels. Bipartite models were proposed for wireless channels by Willig [26]. The accuracy of bipartite models depends on a selected value of complexity. We argue that model accuracy is not optional and even a low-complexity model should provide the requisite accuracy. Moreover, bipartite models require a large number of parameters to achieve a certain level of accuracy. deke et a1. [28] used chaotic maps to model 802.11b bit-errors at low data rates (1 Mbps and 2 Mbps). Due to the focus on low data rates, in [28] it was observed that: (a) probability of bit—error bursts of more than two bits is very low, and (b) there is almost no correlation in error traces. The chaotic map model in [28] ignores the correlation and captures only the heavy-tail behavior of bit-errors. While this assumption of “no autocorrelation in data” might be suitable for the particular experimental setup used in [28], it is not generically applicable to network error and loss data. In [20], it was shown that low-complexity Markov models (such as hidden and hierarchical Markov models) are inadequate for modeling of an 802.11b link layer bit-error wireless channel. In [29], two linear-complexity models were proposed which were reasonably effective in capturing 802.11b bit-errors. A.2.2 Cross-Layer Design for Wireless Multimedia The traditional UDP protocol detects and drops corrupted packets using a checksum operating at the MAC layer [64]. Such packet drops results in significant bandwidth wastage, especially in the context of error-resilient multimedia applications which can inherently tolerate some errors and losses in the received content. Larzon et al. proposed a UDP-Lite protocol which allows delivery of partially corrupted packets to the application layer [41]- [44]. In its commonly-used form, UDP-Lite disables the MAC layer checksum while the transport layer partial checksum only covers transport and application layer headers. Errors in the application layer payload are simply ignored. Note that support of partial checksum requires modifications to the multimedia senders, receivers and/or (multicast or multihop) intermediate nodes. Many wireless cross-layer studies have shown that UDP-Lite performs better than UDP on contemporary wireless networks [18], [41]- [50]. In [18], it was shown that over 802.11b LANs an application layer FEC must be employed in conjunction with UDP- Lite. Otherwise the partially corrupted packets delivered by UDP-Lite result in almost unintelligible multimedia quality. It was also shown in [18] that UDP-Lite over 802.1 lb LANs can only work at the 2 and 5.5 Mbps data rates. At the 11 Mbps data rate, the errors and losses in the received content are too high for effective FEC-based recovery. 10 CHAPTER A.3 BACKGROUND This chapter provides the background that is required to understand the contributions of this part. A.3.1 802.11b Wireless Networks Due to their high data rates and use of the time-tested TCP/IP protocol suite, 802.11b networks have experienced widespread deployment. These LANs are finding their way into homes and businesses ubiquitously. However, like other wireless technologies, 802.11b networks also suffer from severe quality degradation in the presence of physical obstructions and inter-symbol-interferences. Two modes of operation are supported in 802.11 networks [57], [58]: (i) ad hoc mode in which wireless nodes can communicate with each other directly, and (ii) infrastructure mode in which wireless nodes are arbitrated using a central entity called an access point (AP). All 802.11b-complaint networks support four basic physical layer data rates of 1 Mbps, 2 Mbps, 5.5 Mbps and 11 Mbps. Increase in the data rate reduces the robustness of the 802.1 lb physical layer. In the infrastructure mode, if the number of retransmission requests exceeds a certain threshold, the AP drops down to a lower data rate than its current data rate. For retransmissions, 802.11b relies on a 32-bit frame check sequence (FCS) that computes checksum over the entire MAC layer frame. Positive acknowledgement (ACK) frames are employed to signal successful transmission of data frames. If a frame fails checksum then it is dropped at the receiver’s MAC layer. The sender after timing out schedules a retransmission. 1 l A.3.2 Autocorrelation of Random Processes Let X (ml) and X (n?) be two random variables derived from a random process X (t). The correlation coefficient of these random variables is defined as [65] p (n) = E{X(0)X(n)} — E{X(O)}E{X(77)}’ (AJ) ‘7 X (0)” X (n) where E {X} and o X represent the mean and standard deviation of the random variable X . When evaluating a dataset, the sample mean and the sample standard deviation are used to compute the correlation coefficient of (A.1). This sample autocorrelation coefficient for different values of lag is a direct measure of the level of temporal dependence in the random process. Lag beyond which the autocorrelation coefficient drops to an insignificant value corresponds to the memory-length of a random process. Autocorrelation of a Markov source yields the order of the model required to accurately characterize the source [66], [67]. A.3.3 Discrete-Time Markov Chains Markov chains are employed to model statistical data with short-term temporal dependence. Let a stochastic process Xn take on values denoted by non-negative integers 0,1,...,M . If Xn =z' then the process is said to be in state i at timen. Whenever the process is in state i there is a fixed probability that the next state of the process will be state j. If that probability can be expressed as Pr{Xn+1=j]Xn =i,Xn_1= in_1,...,X1=i1,X0 = 7'0} = Prixn+1=jIXn = iI’ (A.2) 12 for all states tb,i1,...,in_1,i,j and all n 2 0, then Xn is a Markov chain [59]. The property given in (A.2) is commonly referred to as the Markov Property. Thus, for a Markov chain the conditional distribution of any future state Xn+1, given the past states Xn_1, . . . , X1, X0 and the present state Xn , is independent of the past states and depends only on the present state. Equation (A.2) is also referred to as homogeneity property since it ensures that the transition probabilities do not vary with time. Let pm- = Pr {Xn+1 = jIXn = 2'} denote the probability of transiting to state j from 2'. Since pm- represents a probability measure, it exhibits the following properties: (i) pm- 2 0 for all 2,3" 2 0, and (ii) f: pm- =1 for all 2': 0,1,...,M. The probability j=0 of transiting to the next state can be represented in a matrix form. This matrix is referred to as the one-step state transition probability matrix. The steady-state or stationary probabilities of a Markov chain represent the long-run proportion of the time spent in each state. Once the transitional probabilities of a Markov chain are known, the steady-state probabilities of being in a particular state are the unique non-negative solutions of the following linear system of equations: M "j = Ere-pr], j=0,1,...,M i=0 M Zflj =1. i=0 For stationary Markov chains, the steady-state and transition probabilities do not vary with time. Throughout this thesis, we use stationary Markov chain for modeling bit-error l3 processes. The memory-length of a Markov chain is also referred to as its order. Discussion in the preceding section outlined that autocorrelation analysis can be performed on the realizations of a random process to determine the appropriate order of the respective Markov chain. This observation will be used later to identify the orders of Markov chains. A.3.4 Burst Representation of Binary Wireless Traces Wireless bit-error traces are generally represented as a binary time-series {z(i)}:=1, where :r:(z') 6 {0,1} and l is the length of the time-series. Throughout this thesis, we define :1: (i) as: 0 error-free bit 113(2) = 1 corrupted bit. Without loss of generality, a binary time-series can be represented as an interleaved sequence of runs (bursts) of good and bad bits (27(2') =0 and x(i)=1), i.e., (11,B1),(12,Bg),---,(II,BI), where I,- and B, represent the lengths of the ith good and bad bursts, respectively. Wireless channel modeling studies have established that this binary data representation is rather suitable for definition and evaluation of a model [l4]— [17], [20]. The burst-lengths of good and bad bits are used for empirical performance evaluations in this thesis. (Subsequent sections discuss this in further detail.) 14 Figure l. The Gilbert channel model [81]. A.3.5 The Gilbert Channel Model The Gilbert channel was proposed in [81] to model channels with 1St order memory. Since then, it has been used to model many wireless channels at bit, byte and packet levels [9]- [11], [l3]- [15], [18]- [20], [26]. The Gilbert model captures channel memory through a two-state Markov chain having a good and a bad state. The probability of the next (good or bad) symbol is dependent on the whether the last received symbol was good or bad. The steady-state probabilities of staying in the bad and good states are respectively expressed as: 7r = __Pgb and 7rg = ______pbg . (A3) M + Pbg pgb + Pbg Clearly, 7n, + 7rg = 1. Higher probabilities of staying in the present state (i.e., p99 and Pbb) indicate the intensity of channel memory. A more appropriate measure to quantify channel memory was proposed by Mushkin and Bar-David in [82], where memory ,u of a Gilbert channel was defined as: 15 It 2 1‘ pgb _ pbg - (AA) It can be easily seen that —1 S a S 1. Moreover, a closer look at above equations reveals that a = 0 => 7n, = Pgb and 7rg = Pbg- (A.5) In other words, when tr = 0 , the probability of getting a good or a bad symbol at any time instance is independent of the last symbol value, that is, the channel behaves as a memory-less channel. In [82], channels with a > 0 and ,u < 0 were referred to the persistent and oscillatory memory channels, respectively. Real-life channels generally have persistent memory. A.3.6 Full-State Markov Chains for Wireless Channels Wireless bit-error processes are generally bursty and have a memory-length of greater than one bit, and therefore these processes cannot be modeled using the Gilbert model. To make such a process comply with the Markov property of (A.2), a Markov chain is defined such that at each time instance the process is characterized by as many bits as the memory-length. At each time instance, a new bit is added to the memory-window and the oldest bit is dropped from the memory-window. As mentioned before, memory-length of a Markov chain is also referred to as its order. For a memory length of k hits, a full-state Markov (FSM) chain [20] corresponds to all the 2k different possible combinations of k consecutive bits. Transition probabilities between states are computed by sliding a k bit memory-window over the data and counting the number of times a bit-pattern [$1,$2,...,:rk] is followed by another bit- 16 pattern [y1,y2,...,yk]. Note that the number of states of an FSM chain increases exponentially with an increase in memory-length — 2k states for a memory-length of k . A.3.7 Long-Range Dependent Processes Long-range-dependent processes belong to a generic class of scaling or self-similar processes [68], [69]. Self-similar processes exhibit similar statistical behavior at different scales — zooming into or out-of a sample path of the process gives a new process realization which is statistically similar to the original. A self-similar process X(t) satisfies the relation: X(t) icHXU/c), (A6) where =1 represents equivalence in finite-dimensional distributions, c is a scaling (compression/dilation) factor and H is known as the Hurst parameter. Self-similar processes are also referred to as H—ss processes. It is not possible to define a characteristic scale for H-ss processes which implies that these processes are scale-invariant. A self- similar process with stationary increments is referred to as an H—sssi process [68]- [70]. Long-range dependent (LRD) processes model stationary increments of a second- order self-similar process. The Hurst parameter of an LRD process is 1/2 < H < 1. Also, the autocovariance r [k] of an LRD process is of the form: 7' [k] N Crk—(2-2H) , (A.7) where N represents asymptotic equivalence and CT is a positive and constant scaling factor. From (A.7) and the constraint on H it can be seen that summing the 17 autocorrelation function results in a divergent power series [71], Zk|r[k]| = 00. Thus all samples of an LRD process depend heavily on previous samples, thereby resulting in occurrence of clusters of similar values. For the present binary process, this observation simply implies long bursts of zeros and ones. An important property of LRD processes is that they can be equivalently characterized in terms of the behavior of the aggregated process: km (A8) 1 . a. 2 X121, z=(k—1)m+1 X(m) [k] ____ where m is the aggregation level. For an LRD process (and in general for all second- order self-similar processes), var{X(m)[k]} = m2H—2 var {X[k]}. Thus for an LRD process, a log-log plot of var {Xflm [16]} as a function of m is strictly linear with a slope of 2H — 2 [70]. This plot, generally known as the variance-time diagram, can be used to ascertain the presence of LRD in the data and can also render an estimate of H . The power spectral density of an LRD random process is the Fourier transform of the autocorrelation of (A.7), and has been shown to be [70]: I —-C 2' “’2 00 1 C 1-2H 0 (A9) (w)— H sma- Z I2H+1~ lel asw—r , i=—oo Iw + 27ri where w is a frequency and C H is a constant. Note that the spectral density is proportional to hull—2H for frequencies close to the origin. A log-log plot of the power spectral density as a function of the frequency has a slope of 1 - 2H , which can be used to estimateH . 18 A.3.8 The Multifractal Wavelet Model The multifractal wavelet model (MWM) was proposed in [61]- [63] to analyze and model LRD network data. The MWM has shown promise in modeling various LRD network phenomena [61]- [63]. The MWM relies on the premise that network data is inherently non—negative and generally spiky. Both these properties are clearly true for wireless bit-error data. Moreover, the scaling properties of wireless bit-errors can be adequately characterized by wavelet-based analysis. The MWM employs the Haar wavelet family and applies a constraint that the input training data are always non-negative. For the Haar wavelet, the scaling and wavelet coefficients are computed recursively as _ 1 _ 1 (A.10) U j,k - WW j+1,2k + U j+1,2k+1) and Wj,k - 7—2—(Uj+1,2k _ Uj+1,2k+1)r where U j, k and WjJc respectively represent the scaling and wavelet coefficients at time k and scale/level j. With the Haar scaling function, the scaling coefficients are simply averaged versions of the input signal and thus, due to the non-negative nature of the data, the scaling coefficients are always non-negative, U j, k 2 0. Rearranging (A. 10) yields 1 1 (A.11) Ur+l,2k = $01ch + WM) and Ur+1.2k+1 = :EIUM - Wm)- In the first equation of (All), to keep the next level’s scaling coefficients (U j+1,2k ’s) non-negative, negative wavelet coefficients are constrained as IerkI S U j,k- Similarly, to keep the U 341,21,“ ’5 non-negative, the positive wavelet coefficients are constrained as Wj,k s U j, k . Combining these two constraints gives a non-negativity constraint that 19 [WM 3 UM. (A.12) The above constraint simply ensures that once the inverse transform is taken, the resultant process is always non-negative. Alternatively, the constraint can be implemented as er = Aj,kUj,k: (M3) where Aj, k is a random variable defined over the interval [— 1, 1] . In order to train the MWM to match the wireless bit-error traces, two random variables need to be captured. The first random variable is the scaling coefficient at the coarsest scale U jO’kO . The second set of random variables is the AN, ’8 at each level which in turn yield the wavelet coefficients (A.13) at that level. Once a general sense of probability distribution is ascertained for these random variables, the expectation- maximization algorithm [76], [77] can be used to fit that distribution to the actual dataset. The training and synthesis algorithm is detailed in [61]. The complexity of synthesizing a length N trace using the MWM is 0(N). A.3.9 Performance Evaluation Measure Entropy is a measure of the average number of bits required to represent all outcomes of a probability distribution. The Kullback-Leibler divergence quantifies the difference in the entropies of two probability distributions [78]. In [20] we proposed an entropy normalized Kullback-Leibler (ENK) divergence measure to quantify the accuracy of a channel model. The ENK divergence quantifies the source-coding-like overhead incurred 20 by employing a model instead of the actual source. For two probabilities distributions [2 (X) and q (X) defined over a common alphabet \II , the ENK divergence is defined as: v Z p(X)10g2 [IN—X] (A.14) (X ENK(P(X)II‘1(X)) = 36:13 p(X)log2(P(X)) , XE\II Q where the numerator and denominator respectively represent the Kullback-Leibler divergence and entropy functions. The ENK divergence inherits basic properties of the Kullback-Leibler divergence: (a) non-negativity, ENK (pllq) 2 0 , (b) non-symmetry, ENK (pllq) 2: ENK (qllp) , and (c) ENK (pllq) = 0 (it p = q. Small values of ENK divergence indicate that a model accurately approximates the actual source. We would expect the ENK between two actual traces to be a very small value as the traces are realized by the same random source. Therefore we employ the ENK divergence between two 802.11b traces as a performance evaluation reference for the ENK divergence between an actual trace and a trace artificially generated by a model. The ENK divergence relies on the fact that an appropriate random variable X is being used to characterize the underlying source. We employ two random variables for performance evaluation of all the models in this thesis: (i) burst-length of good bits I , where I takes positive integer values; (ii) burst-length of bad/corrupted bits B , where B also takes positive integer values. Throughout this thesis, we refer to I and B as good-bursts and bad-bursts random variables, respectively. 21 CHAPTER A.4 EMPIRICAL ANALYSIS AND ACCURATE MODELING OF WIRELESS CHANNELS In this chapter, we first describe the wireless trace collection experiment. We then evaluate the correlation in the bit-error traces collected at 2, 5.5 and 11 Mbps. We observe that the correlation at 2 and 5.5 Mbps exhibit a decaying trend, but the 11 Mbps traces have high correlation even at large lags. Due to their manageable correlation, we use Markov chains to model the 2 and 5.5 Mbps bit-errors. We show that full-state Markov (F SM) chains provide highly accurate models of the 2 and 5.5 Mbps bit-errors. Moreover, we show that FSM chains have unused states which can be ignored to reduce the complexity of the F SM-based channel modeling paradigm. Unlike the bit-errors at 2 and 5.5 Mbps, the 11 Mbps bit-error process requires a model that can capture long-memory. We evaluate the 11 Mbps bit-errors using scaling, variance-time and periodogram analyses. These evaluations substantiate the presence of long-memory or long-range dependence (LRD) in 11 Mbps bit-errors. Consequently, we employ a multifractal wavelet model (MWM) to characterize the 11 Mbps bit-errors. We show that the MWM captures second-order statistics of the 11 Mbps bit-errors much more accurately than Markov chains. A.4.1 Wireless Trace Collection For this study, five wireless receivers were used to simultaneously collect error traces on an 802.1 lb LAN. The receivers were placed at different locations in a room, while the 22 access point (AP) was placed in a room across a hallway from the receivers to simulate a realistic home/classroom/office setting as shown in Figure 2. The receivers’ MAC layer device drivers were modified to pass corrupted packets to higher layers. The receivers were Linux clients using DLink DWL-650 wireless cards with the open source linux—wlan-ng device drivers [72]. To capture packets at high transmission rates, packet dissectors were implemented inside the device drivers. These packet dissectors ensured that only packets pertinent to our wireless experiment are processed, while all other packets are simply dropped. Each experiment comprised one million packets with a payload of 1, 000 bytes each, i.e., each trace had approximately 1 GB of data. A wired sender was used to send multicast packets with a predetermined payload on the wireless LAN; multicasting disabled MAC layer retransmissions. The sender used different transmission rates ranging from 4 Kbps to 1 Mbps for each experiment. At the physical layer, the auto rate selection feature of the AF was disabled and for each experiment the AP was forced to transmit at a fixed data rate. Each trace collection experiment was repeated multiple times at 2, 5.5 and 11 Mbps physical layer data rates and at different times of day. 23 Room 1 w. tn) 802.11b A / i RoomZ “4‘ i-j % Receiver 0 modified linux-wlan-ng drivers Sender 7 3 ill % Receiver 4 Figure 2. Set up for collection of wireless bit-error traces. Table l. Packet-Level Statistics at 2, 5.5 and 11 Mbps Data rate Average Packet Min Packet Max Packet Error Rate Error Rate Error Rate 2 Mbps 5.97% 0.75% 14.31% 5.5 Mbps 9.79% 0.61% 22.74% 11 Mbps 39.5% 10.99% 77.83% l Table 1 provides some statistics of the traces collected for this study. The packet error rate is computed as pkt error rate = (pkts received with one or more errors)/ (total received pkts). As expected, the average packet error rate increases with an increase in the physical layer data rate. In particular, the average packet error rate increases from approximately 10% at 5.5 Mbps to almost 40% at 11 Mbps. Thus traditional higher layer protocols that drop all corrupted packets (e.g., 802.11 MAC, UDP, TCP etc.) experience profound losses at 11 Mbps, and consequently there is room for considerable improvement. Since the wireless receivers were placed at different locations, the receivers experienced different 24 packet error rates. The minimum and maximum error rates in Table l outline that the receivers were experiencing both good and bad link conditions. In our initial experiments all wireless receivers maintained Line of Sight (LoS) with the access point (AP). The AP was forced to transmit at 2, 5.5 and 11 Mbps for each trace. It was observed that with clear LoS, the error-rate (at all bitrates) was extremely low. Such excellent performance deemed further LoS study inconsequential. Hence, we positioned the receivers in separate rooms to simulate a more realistic business/classroom/home-network wireless setup as shown in Figure 2. A.4.2 Empirical Analysis of 802.1b Bit-Errors To maintain focus, throughout Chapters A.4 and A5 we show results for two traces at each physical layer data rate. These traces are collected at the same receiver under similar conditions. The results for the remaining traces and receivers are similar. A.4.2.1 Autocorrelation Analysis The sample autocorrelations of 2 Mbps, 5.5 Mbps and 11 Mbps bit-error traces are illustrated in Figure 3. Clearly, the correlation at 11 Mbps is higher than that at 2 and 5.5 Mbps. Let us first concentrate on the autocorrelation of 2 and 5.5 Mbps traces. It is clear that the autocorrelation at both data rates is a decaying function, i.e., the level of temporal dependence is decreasing with time. From the examples provided in Figure 3, we assume that the memory-length is determined by the lag beyond which the normalized correlation is less than 0.15 , an empirically-determined threshold. We observed that in some traces the correlation does not drop significantly below 0.15 , even for very large lags. However, in general the bit-errors exhibited rapidly decaying correlation as in Figure 3. 25 n A: v.w 1 1'0 2b i) 40 Figure 3. Autocorrelation of bit-error traces. Extensive performance evaluation suggests that correlation of less than 15% does not play a significant role in the error process characteristics. Based on the threshold, the memory-lengths of the 5.5 Mbps traces of Figure 3 are 12 and 14 respectively. The correlation of both 2 Mbps traces drops below 0.15 at the lag of 16. Hence, we use memory-length 14 and 16 as the maximum order of the 5.5 and 2 Mbps Markov chains respectively. Since the memory-lengths of the 2 and 5.5 Mbps bit- error processes are not very large as compared to the 11 Mbps process, high-order Markov chains can appropriately model these processes. Contrary to the correlations at 2 and 5.5 Mbps, Figure 3 clearly shows that the 11 Mbps bit-error process has high correlation even at large lags. This is reminiscent of long-range dependence since a low-order memory-length cannot be identified for the 11 Mbps bit-error process. Consequently, Markov models cannot be used to model 11 Mbps bit-errors. 26 A.4.2.2 Preliminary Empirical Analysis of FSM Chains In accordance with the discussion in Section A.3.4, we represent the bit-error traces as a binary series {r(i)}:=1, where :r (i) = 1 represents a bit-error and l is the length of the series. Also, for a memory-length of k , a full-state Markov (FSM) chain has states corresponding to all possible 2k combinations of k consecutive bits. The complexity (i.e., number of states) of the FSM chains increases exponentially with an increase in memory-length. Previous studies employed low-order Markov chains [8], [14]—[17]. However, due to the present interest in capturing high-order behavior, we provide analysis and modeling with high-order FSM chains. For efficient and accurate representation of the transition probability data and to reduce the complexity we examined the FSM transition probability matrices for bit- pattems that never occur in the collected traces. We refer to such bit-patterns as the unused states. These states result in all-zero columns in the transition probability matrix. An all-zero column implies that the probability of jumping to that state from any state is zero. While other methods for judicious selection of Markov states exist [67], used states provide a simple and effective method of minimizing the model complexity. The percentage of unused states for each order is shown in Figure 4. It can be observed that the number of unused states grows as the order of the Markov chain is increased. For example, in case of a 212 state model, at 2 and 5.5 Mbps approximately 80% and 30% states are never used. We lay special emphasis on this observation since the total number of states directly corresponds to the complexity of the model. All following FSM results will employ the used states only. Here we recognize that the number of unused states will decrease as the channel is observed for a significantly long 27 + 5.5 Mbps I WI ,‘9' 2Mbps__l \I CC 4;: a: 9 or 9 (D O r to c? percentage of unused states A O .3 O r O " " 1024 16384 number of states (logscale) Figure 4. Percentage of unused F SM states at 2 and 5.5 Mbps. 4096 period of time, i.e., number of unused states is inversely proportional to the trace length. However, and as substantiated by the FSM performance evaluation in later sections, FSM chains perform quite reasonably without the unused states thereby implying that the unused states do not play a significant role in overall channel characterization. A.4.2.3 Long-Range Dependence in 11 Mbps Bit-Errors The autocorrelation analysis in Section A.4.2.1 provided initial indications that the 11 Mbps bit-errors are LRD in nature. This section substantiates this preliminary notion of LRD by analyzing the 11 Mbps bit-error process in further detail. A.4.2.3.1 LRD Evaluation by Observing Energy at Different Scales Since LRD processes typically demonstrate second-order self-similarity, zooming out from a sample path of the process should yield a path similar to the original in second- order statistics. As shown by (A8), in order to determine scaling in a process, we can define an aggregate process by dividing a bit-error trace of length I into non-overlapping blocks of length m and averaging over each block. The resultant aggregate sample path 28 mm ' Xm 1 ‘ I ‘ - 5’; Ion” WMM MNVIIiII/VQIIWW X<4> IAN/EMA 4;- .iitviekmnwm 2°° M “iii 1%”le NW 0 JRVWW “Maw w [YIMWW 0100 Figure 5. Aggregates of the 11 Mbps energy process at different time scales. averages m points from the actual sample path. Due to the {0,1} representation of the bit-errors, an m level aggregate process represents the normalized energy of bit—errors in non-overlapping windows of size 177.. We henceforth use the terms aggregate process and energy process synonymously. Aggregation smoothes high variances in the sample path and provides an on-average zoomed-out version of the actual sample path. Thus energy processes at different aggregation levels outline the impact of aggregation on the short-term second moment of the process. Figure 5 outlines three aggregate processes. The top figure is a process sample path X(1)[k] outlining the unnormalized energy (i.e., the total number of errors) observed in each packet (packet transmission time=l second). The second figure is a level-4 aggregate of the first sample path which depicts the average energy observed in four packets. Thus, the first point in this level-4 aggregate sample path is 29 X(4) [1] = fix“) [1] + X(1)[2] + X(1)[3] + x(1)[4]). Similarly, the remaining two figures are aggregates at levels 8 and 16. Each aggregate path is zooming out of the actual sample path and no statistically differentiating features are revealed by simple observation. Thus it can be inferred that the decrease in variability with increased smoothing is very slow. This slow-varying decay is further highlighted by the analysis of second-order statistics in the following section. A.4.2.3.2 LRD Evaluation using Variance-Time Diagrams Recall that for an LRD process, the variance var {X(m)} of the aggregate process is equal to m2H_2 var {Km}. Variance-time diagrams plot the logscale variance of the aggregate process as a function of the logscale aggregation level. Second-order self- similarity is implied if the logscale decay in the variance is strictly linear, that is, the change in variance is directly proportional to the aggregation level. For an LRD process, the Hurst parameter H can then be estimated by fitting a least-squares line through the plot. A stationary second-order self-similar process is said to be long-range dependent if 1/2 0.1 V: 8 16 32 64 1.28 I 56 ”5 Z G.%% 8 1‘6 3‘2 67‘ 128 :32“ 5 2 number of states number of states (a) good-bursts (b) bad-bursts Figure 8. Performances of varying order FSM chains for the 5.5 Mbps MAC layer bit- error process. A d; , r——v—— U‘ FSM 0'035 FSM 07+ ........ 2 Mbpsibit-error traces -------- 2 M s bit-error traces o.o< . .306)- 9 U) ‘3 0 5r 30.02% ID I o 0.4 » U o 02 g, 8 § 0'35 §0015 LU 0.2» “J 0 1 0.01 - > s ”I 4 8 1‘6 3‘2 64 128 2‘56 SiZ 1024 Gwsl 4 8 1‘6 3‘2 64 128 2‘56 52 1024 number of states number of states (a) good-bursts (b) bad-bursts Figure 9. Performances of varying order FSM chains for the 2 Mbps MAC layer bit-error process. A.4.3.3 Bit-Error Modeling at 11 Mbps In earlier sections, we revealed the LRD nature of the 11 Mbps bit-errors using autocorrelation and scaling analyses. Based on the results from the last two sections, one can conjecture that if high-order FSM simulations are performed, it might be possible to identify an appropriate Markov process of an order lower than what is outlined by the autocorrelation. However, ascertaining such a model order might require simulations with 35 high-order F SM chains, which is computationally infeasible. In this section we show that a multifractal wavelet model (MWM) captures the LRD characteristics of the 11 Mbps bit-error process quite accurately. Although Markov models cannot capture the LRD nature of bit-errors at 11 Mbps, we use Markov chains as a performance reference when evaluating the performance of the MWM. A.4.3.3.l The Multifractal Wavelet Model We used the MWM toolbox [73] to train an MWM. An actual 11 Mbps trace (i.e., a bit-error sequence of zeros and ones) was used for MWM training. Various simulations were performed with ,6 , point-mass and hybrid fl/point-mass probability distributions for the Aflc random variables and Gaussian and log-normal distributions for the U10: k0 random variable. We observed that the performance of the MWM was quite insensitive to the choice of the probability distribution chosen to capture the MWM random variables. For brevity we only report results for the 5 distribution. A.4.3.3.2 ENK-based Performance Evaluation A cautionary note is in place before we proceed with ENK-based performance evaluation of the MWM at 11 Mbps. Due to its reliance on entropy, the ENK divergence compares the skew of the probability distributions, but does not place much emphasis on the second-order statistics (e.g., energy, variance etc.) of the distributions. The MWM (or for that matter any model of LRD data), on the other hand, is designed to capture scaling phenomena (and the consequent second-order statistics) of an LRD random process. Thus for an LRD process, comparison only using ENK of good- and bad-burst distributions can be misleading. Thus ENK divergence by itself cannot render an appropriate measure to completely quantify the MWM performance. In addition to ENK, it is imperative that 36 Table 2. Performance of MWM and FSM for the 11 Mbps Bit-Error Process good-bursts bad-bursts ENK(trace1||trace2) 0.0586 0.00032 ENK(trace1||MWM) 0127 0.093 ENK (tracelll FSM (16)) 0-174 0002 ENK(trace1||FSM(4096)) 0.091 0.00094 ENK (trace2|| MWM) 0-143 0.096 ENK (trace2lIFSM (16)) 0-189 0002 ENK(trace2||FSM(4096)) 0.088 0.0017 second-order statistics of the random process be compared with the model. We perform such second-order performance evaluation in the subsequent sections. The ENK-based performances of the MWM and FSM chains are tabulated in Table 2, where FSM (2:) represents an FSM chain with :1: states. The good-bursts ENK overhead of the MWM is lesser than the l6-state FSM chain, while the bad-bursts overhead is more than the 16-state FSM chain. The MWM ENK overhead is slightly worse than the 4096- state FSM chain. For instance, for the first actual trace the MWM’s ENK good-bursts divergence is 0.127 — 0.091 = 0.036 more than the 4096-state FSM. For the same example, the bad-bursts ENK overhead of MWM is 0.093 — 0.00094 = 0.09206 more than the 4096-state FSM. 37 9 & o & 0,35 0.35 E‘ i: :5 0.3 E 0'3 :0 :0 @025 30.25 ‘3 02 .3 0.2 3 3 .0015 .0015 g) 0.1 g 0.1 0.05 0.05 o o 1 - - 0 10 9003) burst lggigth 5° 0 10 900? burst lgggth ‘0 50 (a) good-bursts (b) bad-bursts Figure 10. Probability mass functions for good- and bad-bursts random variables derived from an 11 Mbps trace. (Only the probabilities of small bursts are shown here.) . mpslllllllllllllllfllllllllmtlulll #0101 Mill MWM 0 0.6 16 state FSM 200000 400000 0 6000 10000 15000 (a) aggregation level=8 (b) aggregation level=256 Figure 11. Energy processes of actual and synthesized bit-error traces. The slightly superior performance of the FSM chains is due to the fact that the FSM model is extremely apt at capturing the short-term correlation structure of the random process. This short-term dependence is because of small bursts of good and bad bits. Such small bursts are quite prevalent even in an LRD process such as the present 11 Mbps bit-error process. To substantiate this claim, we show the small burst probabilities of the good- and bad-bursts random variables in Figure 10. Note that burst-lengths of l, 2, 3, 4 and 5 constitute 78.35% and 98.03% of the probability space of the good- and bad-bursts random variables, respectively. This small burst behavior is very adequately 38 characterized by an F SM chain. Since the skew of both probability distributions is dictated by these highly probable small bursts, the ENK overhead of the FSM chains is quite low, although FSM chains cannot capture the long-term process correlation. The skew-oriented bias of the ENK measure masks the long-term correlation properties of a random process, which is exhibited in the spread (i.e., the variance) of the probability distribution. We henceforth focus solely on second-order analysis of the models under consideration. A.4.3.3.3 Performance in Capturing Energy at Different Scales We first consider energy in non-overlapping windows of the bit-error traces. As mentioned previously, the definition of energy (as given in (A.8) and explained in prior sections) outlines the second moment of the random process in short-term windows. Two examples of an energy process derived from an actual source and energy processes synthesized using the MWM, the 16-state FSM chain and the 4096-state FSM chain are illustrated in Figure 11. It can be observed that the FSM chains project overly pessimistic energy estimates (i.e., very high error rates), whereas the MWM in general has less energy per window than the actual 11 Mbps bit-error process. By simple observation, it can be deduced that the MWM captures the energy characteristics of the 11 Mbps bit- error process better than the Markov chains. In the next section, we compare the aggregated variance-time behavior of the FSM chains and the MWM. A.4.3.3.4 Performance in Capturing the Variance-Time Characteristics In this section, we evaluate the accuracy of the MWM and the FSM chains in modeling the decay of aggregated variances. Figure 12 shows the variance-time behavior of FSM chains. Clearly, the FSM chains can capture the short-term correlations of the 39 random process with outstanding accuracy as shown in the top-left corner of Figure 12. However, the performance degrades sharply as the dependence (i.e., linear decay of the variance) persists at higher scales. Not surprisingly, more and more correlation is captured as we increase the memory-length of FSM chains. Thus if the complexity of an FSM chain that captures all the scales present in the data can be afforded then such an FSM chain can render a highly accurate model. Unfortunately, in practical LRD processes the correlation typically persists at very high scales. In such a case, a model that is designed to capture the correlation structure at different scales (e.g., the MWM) is more suitable than FSM chains. This observation is outlined in Figure 13 (a), which shows that the MWM can capture the decay of variance of the 11 Mbps quite accurately within an additive constant. The phenomenon of a model’s inability to capture the exact variance values is well-known in LRD literature. It has been diagnosed that this problem arises because of non-stationarities introduced by jumps in the mean and slow decaying trends [74]. (The jumps in the mean of the 11 Mbps bit-error process can be easily observed in Figure 11.) Teverovsky and Taqqu [74] 2H—2 proposed to eliminate this problem by fitting the fimction 01+ 02m to the variance-time diagram of the LRD process. The corrective factors, Cl and Cg , can then be added to the variances produced by a model. In the present problem, the corrective factors were Cl = 0 and 02 = 3.71 . Variance-time diagram of the MWM with the corrective factors is given in Figure 13 (b). Clearly, the corrected MWM captures the decay in the variance quite accurately. Thus we deduce that MWM is an effective model of the long-range dependence present in the 11 Mbps bit-error process. 40 .N (n log(Var