STATISTICAL AND LEARNING ALGORITHMS FOR THE DESIGN,
ANALYSIS, MEASUREMENT, AND MODELING OF NETWORKING
AND SECURITY SYSTEMS
By
Muhammad Shahzad

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
Computer Science—Doctor of Philosophy
2015

ABSTRACT
The goal of this thesis is to develop statistical and learning algorithms for the design,
analysis, measurement, and modeling of networking and security systems, with a specific focus
on RFID systems, network performance metrics, user security, and software security. Next,
I give a brief overview of these four areas of focus.
Radio frequency identification (RFID) systems are widely used in supply chain and inventory management. Existing RFID systems are primarily used to identify the RFID tags
present in a tag population. While identifying individual tags is a useful operation, it is
usually very time consuming and is not always desired or required. For example, if the objective is to determine whether any of the tags are missing (e.g., to detect a theft), then first
identifying all tags and then determining if any tags are missing is a very slow process. In
this thesis, I present novel statistical algorithms to enable new applications in RFID systems,
such as counting the number of tags in a population and detecting missing tags, while using
the existing RFID infrastructure that is already deployed in industry.
With the growth in number and significance of emerging applications that require
extremely low latencies, network operators face an increasing need to perform latency
measurement on a per-flow basis between any two observation points for network monitoring
and troubleshooting. Per-flow latency measurement can be used reactively by network operators to perform tasks such as detecting and localizing delay spikes in a network, isolating
offending flows that are responsible for causing delay bursts, and rerouting them through
other paths. It can also be used proactively by network operators to monitor latencies
between observation points for locating bottleneck links and replacing them with higher
capacity links. In this thesis, I present a novel per-flow latency measurement scheme that
requires neither probe packets nor time stamping.
With the rich functionalities and enhanced computing capabilities available on mobile
computing devices with touch screens, users not only store sensitive information (such as
credit card numbers) but also use privacy sensitive applications (such as online banking) on
these devices, which makes them hot targets for hackers and thieves. In this thesis, I present
a gesture based user authentication scheme for the secure unlocking of touch screen devices.
Unlike existing authentication schemes for touch screen devices, which use what the user inputs
as the authentication secret, our scheme authenticates users mainly based on how they input.
Even if attackers see what gesture a user performs, through shoulder surfing or smudge
attacks, they cannot reproduce the behavior of the user performing that gesture.
Software systems inherently contain vulnerabilities that have been exploited in the past,
resulting in significant revenue losses. The study of vulnerability life cycles can help in the
development, deployment, and maintenance of software systems. It can also help in designing
future security policies and conducting audits of past incidents. In this thesis, I present an
exploratory measurement study of a large software vulnerability data set containing 46310
vulnerabilities disclosed from 1988 to 2011. Our exploratory analysis uncovers several statistically significant findings along several dimensions, including phases in the life cycle of
vulnerabilities and the evolution of vulnerabilities over the years. These findings have important
implications for software development and deployment.

To my parents and my beautiful wife,
for their support and encouragement.

ACKNOWLEDGEMENTS
Working towards a Ph.D. has been a deeply enriching and rewarding experience. Looking
back, many people have helped shape my journey. I would like to extend my thanks to them.
• First and foremost, my advisor, Prof. Alex X. Liu. My work would not have been possible without his constant guidance, his unwavering encouragement, his many insights,
and his exceptional resourcefulness. And most importantly, his friendship. I have been
very fortunate to have an advisor who has also been a close friend. For all of this, Alex,
thank you.
• I would also like to thank the rest of my thesis committee, Profs. Eric Torng, Guoliang
Xing, and Subir Biswas, for their encouragement and insightful comments during my
qualifier and comprehensive exams.
• I would also like to thank Dr. Arjmand Samuel. I learnt a lot from him during my
summer internships at Microsoft Research. My collaboration with him has been one of
the most fruitful and fun engagements I have experienced.
• I would also like to thank Drs. Henrik Lundgren and Ioannis Pefkianakis. I really
enjoyed working with them during my summer internship at Technicolor Research.
• Throughout my Ph.D., I was supported by various NSF research grants. Thanks NSF!
• I would also like to thank Michigan State University, and specifically the Department of
Computer Science and Engineering, for providing me financial support to attend various
conferences during my Ph.D.
• Many thanks to my colleagues in the Systems and Security Lab at Michigan State University. In particular, I would like to thank Muhammad Zubair Shafiq, Momina Tabish,
Kamran Ali, Jamal Afridi, Ann Wang, Ali Munir, Faraz Ahmed, and Fei Chen for
numerous insightful discussions and collaborations on various projects.
• I must say that I owe my great time at Michigan State University to all of my fabulous
friends. It is simply not feasible to list all of them here. I would like to thank them all
for their friendship and support.
• I am also very thankful to Drs. Syed Ali Khayam and Muddassar Farooq, who advised
me before my Ph.D. and encouraged me to pursue a Ph.D.
• Finally, I do not know how I can thank my family enough: my wife and my parents,
from whom I realized that kindness and devotion are endless, and my beautiful daughter,
who has been the best and most persistent source of happiness and joy for me ever since she
came into my world!

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

1 Introduction
1.1 Contributions
1.1.1 RFID Estimation [103, 106]
1.1.2 RFID Identification [104, 108]
1.1.3 RFID Missing Tags [109]
1.1.4 Per-flow Latency Measurement [107]
1.1.5 User Security [110]
1.1.6 Software Security [111]
1.2 Published Material

2 RFID Estimation
2.1 Introduction
2.1.1 Motivation and Problem Statement
2.1.2 Proposed Approach
2.1.3 Advantages of ART over Prior Art
2.2 Related Work
2.3 ART – Scheme Overview
2.3.1 Communication Protocol Overview
2.3.2 Estimation Scheme Overview
2.3.3 Formal Development: Overview and Assumptions
2.4 ART – Estimation Algorithm
2.5 ART – Parameter Tuning
2.5.1 Persistence Probability p
2.5.2 Number of Rounds n
2.5.3 Optimal Frame Size f
2.5.3.1 Summary of steps to calculate p, n, and f_op
2.5.4 Obtaining Population Upper Bound t_m
2.6 ART – Practical Considerations
2.6.1 Unbounded Tag Population Size
2.6.2 ART with Multiple Readers
2.7 ART – Analysis
2.7.1 Independence of Estimation Time from Tag Population Size
2.7.2 Computational Complexity
2.7.3 Analytical Comparison of Estimators
2.8 Performance Evaluation
2.8.1 Estimation Time
2.8.2 Actual Reliability
2.9 Conclusion

3 RFID Identification
3.1 Introduction
3.1.1 Background and Problem Statement
3.1.2 Summary and Limitations of Prior Art
3.1.3 System Model
3.1.4 Proposed Approach
3.1.4.1 Population Size Estimation
3.1.4.2 Finding Optimal Level
3.1.4.3 Population Size Re-estimation
3.1.4.4 Finding Hopping Destination
3.1.4.5 Population Distribution Conversion
3.2 Related Work
3.2.1 Nondeterministic Identification Protocols
3.2.2 Deterministic Identification Protocols
3.2.3 Hybrid Identification Protocols
3.3 Optimal Tree Hopping
3.3.1 Average Number of Queries
3.3.2 Calculating Optimal Hopping Level
3.3.3 Maximum Number of Queries
3.4 Minimizing Identification Time
3.5 Discussion
3.5.1 Virtual Conversion of Population Distributions
3.5.2 Reliable Tag Identification
3.5.3 Continuous Scanning
3.5.4 Multiple Readers
3.6 Performance Comparison
3.6.1 Reader Side Comparison
3.6.1.1 Normalized Reader Queries
3.6.1.2 Identification Speed
3.6.2 Tag Side Comparison
3.6.2.1 Normalized Tag Responses
3.6.2.2 Tag Response Fairness
3.6.2.3 Normalized Collisions
3.6.2.4 Normalized Empty Reads
3.7 Conclusion

4 RFID Missing Tags
4.1 Introduction
4.1.1 Background & Motivation
4.1.2 Summary & Limitations of Prior Art
4.1.3 Problem Statement & Proposed Approach
4.1.4 Technical Challenges & Solutions
4.1.5 Key Novelty & Advantages over Prior Art
4.2 Related Work
4.2.1 Probabilistic Protocols
4.2.2 Deterministic Protocols
4.3 System Model
4.3.1 Architecture
4.3.2 C1G2 Compliance
4.3.3 Communication Channel
4.3.4 Formal Development Assumption
4.4 Protocol Description
4.5 Parameter Optimization
4.5.1 Estimating Number of Unexpected Tags
4.5.2 False Positive Probability
4.5.3 Achieving Required Reliability
4.5.4 Minimizing Execution Time
4.5.5 Handling Large Frame Sizes
4.5.6 Expected Detection Time
4.6 Performance Evaluation
4.6.1 Impact of Number of Missing Tags
4.6.2 Impact of Number of Unexpected Tags
4.6.3 Impact of Deviation from Threshold
4.6.4 Comparison with Tag ID Collection Protocol
4.7 Conclusions

5 Per-flow Latency Measurement
5.1 Introduction
5.1.1 Motivation
5.1.2 Problem Statement
5.1.3 Limitations of Prior Art
5.1.4 Proposed Approach
5.1.4.1 Recording Phase
5.1.4.2 Querying Phase
5.1.5 Technical Challenges and Solutions
5.1.6 Advantages of COLATE over Prior Art
5.2 Related Work
5.3 COLATE – Recording Phase
5.3.1 Noisy Accumulation of Time Stamps
5.3.2 Analysis of Noisy Accumulation
5.4 COLATE – Querying Phase
5.4.1 Estimating Latency Average
5.4.2 Estimating Latency Standard Deviation
5.4.2.1 Denoising Counter Subvectors
5.4.2.2 Statistical Simulations
5.4.2.3 Steps of Estimating Standard Deviation
5.5 COLATE – Reliability
5.5.1 Individual Reliability Requirements
5.5.2 Reliability Centered Parameter Selection
5.5.3 Flexibility in Parameter Selection
5.6 Performance Evaluation
5.6.1 Network Traces
5.6.2 COLATE Accuracy
5.6.2.1 Average Latency
5.6.2.2 Standard Deviation
5.6.3 RAM and Storage Size
5.6.4 Comparison with RLI
5.6.5 Comparison with Count-Min Sketch
5.7 Conclusion

6 User Security
6.1 Introduction
6.1.1 Motivation
6.1.2 Proposed Approach
6.1.3 Technical Challenges and Solutions
6.1.4 Threat Model
6.1.5 Key Contributions
6.2 Related Work
6.2.1 Gesture Based Authentication on Phones
6.2.2 Phone Usage Based Authentication
6.2.3 Keystrokes Based Authentication
6.2.4 Gait Based Authentication
6.3 Data Collection and Analysis
6.3.1 Data Collection
6.3.2 Data Analysis
6.4 GEAT Overview
6.5 Noise Removal
6.6 Feature Extraction & Selection
6.6.1 Stroke Based Features
6.6.1.1 Extraction
6.6.1.2 Selection
6.6.2 Sub-stroke Based Features
6.6.2.1 Stroke Segmentation and Feature Extraction
6.6.2.2 Sub-stroke Time Duration
6.6.2.3 Sub-stroke Selection at Appropriate Resolutions
6.7 Classifier Training
6.7.1 Partitioning the Training Sample
6.7.2 Training the SVDE Classifiers
6.7.3 Classifying the Test Samples
6.8 Ranking and Classification
6.9 Experimental Results
6.9.1 Accuracy Evaluation
6.9.1.1 Single Behavior Results
6.9.1.2 Multiple Behaviors
6.9.1.3 Individual Gestures
6.9.2 Impact of Training Samples Size
6.9.3 Determining Threshold for cv
6.9.4 Real-world Evaluation
6.9.4.1 Non-shoulder Surfing Attack
6.9.4.2 Shoulder Surfing Attack
6.9.5 Comparison with Existing Schemes
6.10 Conclusions

7 Software Security
7.1 Introduction
7.2 Preliminaries
7.2.1 Terminology and Notations
7.2.2 Data Set
7.2.2.1 Data Aggregation
7.2.2.2 Selection of Vendors and Products
7.3 General Vulnerability Analysis
7.3.1 Vulnerability Disclosure Trend
7.3.2 Evolution of CVSS-Vector Metrics
7.3.3 General Trend of CVSS Score for Short-listed Vendors
7.3.4 Evolution of Types of Vulnerabilities
7.4 Exploitation Behavior
7.4.1 Evolution of Exploitation
7.4.2 Exploitation of Types of Vulnerability
7.4.3 Exploitation Trend for Vendors and Products
7.4.4 Exploitation Behavior: CVSS Scores
7.4.5 Interesting Exploitation Rules
7.5 Patching Behavior
7.5.1 Evolution of Patching Behavior
7.5.2 Patching of Types of Vulnerabilities
7.5.3 Patching Trend for Vendors and Products
7.5.4 Patching Behavior: CVSS Scores
7.5.5 Interesting Patch Rules
7.6 Patching vs. Exploitation
7.6.1 Patching vs. Exploitation: Over the Years
7.6.2 Patching vs. Exploitation: Vendors and Products
7.6.3 Patching vs. Exploitation: CVSS Scores
7.7 Implications
7.7.1 Software Design
7.7.2 Code Development Practices
7.7.3 Customer Assessment of Vendors and Products
7.8 Related Work
7.8.1 Large Scale Vulnerability Analysis
7.8.2 Studies on Disclosure and Patching
7.8.3 Modeling and Classification
7.9 Conclusion

8 Conclusion

BIBLIOGRAPHY

LIST OF TABLES
Table 2.1 Values of f_op, n, and p for different values of α, β and tag population size
Table 2.2 Utm for different population sizes and accuracy requirements
Table 3.1 Comparison with Prior C1G2 Compliant Protocols (TH/Prior Art)
Table 5.1 Summary of network traces
Table 5.2 Average number of regular packets after which RLI inserts a probe packet
Table 6.1 AUC for filtered and unfiltered gestures
Table 6.2 Comparison of GEAT with [75]
Table 7.1 Results of vulnerability clustering

LIST OF FIGURES
Figure 2.1 Average run size of 0s and 1s vs. tag population size t. (f = 16)
Figure 2.2 Expectation of ART estimator
Figure 2.3 Variances of ART estimator
Figure 2.4 Equation (2.24) as a function of p
Figure 2.5 Total estimation time vs. frame size
Figure 2.6 Expected value of actual reliability vs. t_m/t
Figure 2.7 Experimentally observed values of ratio t_m/t
Figure 2.8 Variance of different estimators versus RFID tag population size
Figure 2.9 Estimation time vs. tag population size of ART and existing schemes
Figure 2.10 Estimation time vs. required reliability for ART and existing schemes
Figure 2.11 Estimation time vs. confidence interval for ART and existing schemes
Figure 2.12 Actual reliability achieved by ART for three different requirements
Figure 3.1 Identifying a population of 9 tags using TW and TH.
Figure 3.2 Impact of dynamic adjustment of γ_op on different types of populations.
Figure 3.3 Norm. E[Q] vs. pop. size ∀γ
Figure 3.4 E[Q]: TH vs. TW
Figure 3.5 Max. # queries: TH vs. TW
Figure 3.6 E[Q] of Reliable TH
Figure 3.7 Crossover points obtained using E[Q] and E[T]
Figure 3.8 Normalized expected identification time vs. population size
Figure 3.9 Distributions of populations with binary trees with MSBs and LSBs.
Figure 3.10 Last level l = b = 4 of the binary trees made with MSBs and LSBs.
Figure 3.11 Normalized queries of TH and existing protocols
Figure 3.12 Identification speed of TH and existing protocols
Figure 3.13 Normalized responses of TH and existing protocols
Figure 3.14 Response fairness of TH and existing protocols
Figure 3.15 Normalized collisions of TH and existing protocols
Figure 3.16 Normalized empty reads of TH and existing protocols
Figure 3.17 Distribution of tag responses of TH and existing protocols
Figure 4.1 P_fp
Figure 4.2 Sd vs. n
Figure 4.3 f vs. p
Figure 4.4 n vs. p
Figure 4.5 Actual reliability vs. missing tags
Figure 4.6 Detection time vs. missing tags
Figure 4.7 Actual reliability vs. number of unexpected tags
Figure 4.8 Detection time vs. number of unexpected tags
Figure 4.9 Effect of difference between m and T
Figure 5.1 Counter vector and subvectors
Figure 5.2 CDF of observed C_f/E[C_f]
Figure 5.3 Permanent storage vs. RAM . . . . . . . . . . . . . . . . . . . . . . . 133
Figure 5.4 Threshold T vs. RAM size . . . . . . . . . . . . . . . . . . . . . . . . 133
Figure 5.5 Flow sizes CDFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Figure 5.6 CDF of observed β in average estimate (1-S, 1-R) . . . . . . . . . . . 137
Figure 5.7 CDF of observed β in average estimate (multiple S,R) . . . . . . . . . 138
Figure 5.8 CDFs of relative errors in STD . . . . . . . . . . . . . . . . . . . . . 138
Figure 5.9 Rel. error in STD vs. # reps

. . . . . . . . . . . . . . . . . . . . . . 138

Figure 5.10 Storage bits per packet . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Figure 5.11 Comparison of delay estimates . . . . . . . . . . . . . . . . . . . . . . 139
Figure 6.1 GEAT implemented on Windows Phone 7 . . . . . . . . . . . . . . . 145
Figure 6.2 The 10 gestures that GEAT uses . . . . . . . . . . . . . . . . . . . . 148
Figure 6.3 Velocity magnitudes of gesture 4 . . . . . . . . . . . . . . . . . . . . . 153

xv

Figure 6.4 Device acceleration of gesture 4 . . . . . . . . . . . . . . . . . . . . . 153
Figure 6.5 Distributions of stroke time . . . . . . . . . . . . . . . . . . . . . . . 155
Figure 6.6 Dists. of inter-stroke time . . . . . . . . . . . . . . . . . . . . . . . . 155
Figure 6.7 Distributions of disp. mag. . . . . . . . . . . . . . . . . . . . . . . . . 155
Figure 6.8 Distributions of disp. dir.

. . . . . . . . . . . . . . . . . . . . . . . . 155

Figure 6.9 Velocity direction of gesture 10 . . . . . . . . . . . . . . . . . . . . . 156
Figure 6.10 Unfiltered and filtered time series . . . . . . . . . . . . . . . . . . . . 159
Figure 6.11 Dendrograms for feature values with one and two behaviors . . . . . . 161
Figure 6.12 cv vs. time periods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Figure 6.13 Consistency factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Figure 6.14 Parameter selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Figure 6.15 EERs with and without accelerometer and FNR at FPR < 0.1% . . . 171
Figure 6.16 EER under different scenarios . . . . . . . . . . . . . . . . . . . . . . 172
Figure 6.17 Avg. FPR vs. TPR for all gestures . . . . . . . . . . . . . . . . . . . 172
Figure 6.18 Effect of system parameters on EER . . . . . . . . . . . . . . . . . . 174
Figure 6.19 Real world results of GEAT . . . . . . . . . . . . . . . . . . . . . . . 175
Figure 7.1 Vulnerability trends in the data set . . . . . . . . . . . . . . . . . . . 183
Figure 7.2 # of vulnerabilities for each vendor (in descending order . . . . . . . 184
Figure 7.3 Evolution of vulnerability clusters over the years . . . . . . . . . . . . 186
Figure 7.4 Yearly change in exploitation behavior for different ted ranges . . . . 188
Figure 7.5 Exploitation trend in clusters . . . . . . . . . . . . . . . . . . . . . . 188
Figure 7.6 Exploited vulnerabilities for vendors relative to disclosure dates . . . 189
Figure 7.7 Exploited vulnerabilities for products relative to disclosure dates . . . 189
Figure 7.8 Exploited vulnerabilities for different CVSS scores . . . . . . . . . . . 192
Figure 7.9 Yearly change in the patching behavior for different tpd ranges . . . . 193
Figure 7.10 Patching trend in clusters . . . . . . . . . . . . . . . . . . . . . . . . 193
Figure 7.11 Patched vulnerabilities for vendors relative to disclosure dates . . . . 194

xvi

Figure 7.12 Patched vulnerabilities for products relative to disclosure dates . . . . 194
Figure 7.13 Patched vulnerabilities for different CVSS scores . . . . . . . . . . . . 196
Figure 7.14 Yearly change in patching vs. exploitation trend for tpe . . . . . . . . 198
Figure 7.15 Patched vulnerabilities for vendors relative to exploit dates . . . . . . 198
Figure 7.16 Patched vulnerabilities for products relative to exploit dates . . . . . 199
Figure 7.17 Patched vulns. relative to exploited vulns.: CVSS . . . . . . . . . . . 199

1 Introduction

In this thesis I present my work on measurement, modeling, design, and analysis of networking and security systems. For networking, I present my work on probabilistic network
measurements in both wireless as well as wired networks. For wireless networks, I focus on
the modeling, design, and analysis of probabilistic measurement schemes for radio frequency
identification (RFID) systems. More specifically, I present my work on designing statistical
algorithms for estimating the number of tags in a population of RFID tags, for optimizing
the standardized RFID identification protocol, and for detecting missing tags from a population of RFID tags. The key distinction of my work compared to prior art is that my
schemes are compliant with the EPCGlobal Class 1 Generation 2 (C1G2) RFID standard.
It is critical for RFID schemes to be compliant with the C1G2 standard because commercially available off-the-shelf RFID equipment follows this standard. A scheme that
does not comply with the C1G2 standard cannot be deployed on existing installations of
RFID systems because it requires custom hardware, which is expensive. For wired networks,
I focus on the modeling, design, and analysis of probabilistic schemes for measuring fundamental network performance metrics such as latency. More specifically, I present my work
on designing statistical algorithms to measure latency of any given flow between any pair of
observation points in a given network. The key distinction of my work compared to prior art
is that my schemes do not use probe packets, which change the behavior of network traffic
and thus skew the measurement results. For security, I present my work on the design of
user security systems and the measurement of software security. For user security systems,
I focus on designing learning algorithms for user authentication schemes for smart phones.
For software security, I focus on characterizing trends in life cycles of software vulnerabilities
by studying large vulnerability databases.

1.1 Contributions

This thesis takes an in-depth look at the following research problems.

1.1.1 RFID Estimation [103, 106]

We address the fundamental problem of estimating RFID tag population size, which is
needed in many applications such as tag identification, warehouse monitoring, and privacy
sensitive RFID systems. We propose a new scheme for estimating tag population size called
Average Run based Tag estimation (ART). The technique is based on the average run length of 1s in the bit string received using the standardized framed slotted Aloha protocol.
ART is significantly faster than prior schemes. For example, given a required confidence
interval of 0.1% and a required reliability of 99.9%, ART is consistently 7 times faster than
the fastest existing schemes (UPE and EZB) for any tag population size. Furthermore,
ART’s estimation time is provably independent of the tag population size. ART works
with multiple readers with overlapping regions and can estimate sizes of arbitrarily large tag
populations. ART is easy to deploy because it requires modification neither to tags nor to
the communication protocol between tags and readers. ART only needs to be implemented
on readers as a software module.

1.1.2 RFID Identification [104, 108]

Identifying RFID tags in a given tag population is the most fundamental operation in RFID
systems. While the Tree Walking (TW) protocol has become the industrial standard for
identifying RFID tags, little is known about the mathematical nature of this protocol and
only some ad-hoc heuristics exist for optimizing it. In this thesis, first, we analytically model
the TW protocol, and then using that model, propose the Tree Hopping (TH) protocol that
optimizes TW both theoretically and practically. The key novelty of TH is to formulate
tag identification as an optimization problem and find the optimal solution that ensures the
minimal average number of queries or identification time as per the requirement. With this
solid theoretical underpinning, for different tag population sizes ranging from 100 to 100K
tags, TH significantly outperforms the best prior tag identification protocols on the metrics
of the total number of queries per tag, the total identification time per tag, and the average
number of responses per tag by an average of 40%, 59%, and 67%, respectively, when tag
IDs are non-uniformly distributed in the ID space, and of 50%, 10%, and 30%, respectively,
when tag IDs are uniformly distributed.

1.1.3 RFID Missing Tags [109]

RFID systems have been deployed to detect missing products by affixing them with cheap
passive RFID tags and monitoring them with RFID readers. Existing missing tag detection
protocols require the tag population to contain only those tags whose IDs are already known
to the reader. However, in reality, tag populations often contain tags with unknown IDs,
called unexpected tags, which cause unexpected false positives, i.e., missing tags
are detected as present because of them. We take the first step towards addressing the problem of detecting
the missing tags from a population that contains unexpected tags. Our protocol, RUN,
mitigates the adverse effects of unexpected false positives by executing multiple frames with
different seeds. It minimizes the missing tag detection time by first estimating the number of
unexpected tags and then using it along with the false positive probability to obtain optimal
frame sizes and number of times Aloha frames should be executed. RUN works with multiple
readers with overlapping regions. It is easy to deploy because it is implemented on readers
as a software module and does not require modifications to tags or to the communication
protocol between tags and readers. We implemented RUN along with four major missing
tag detection protocols and the fastest tag ID collection protocol and compared them side-by-side. Our experimental results show that RUN always achieves the required reliability
whereas the best existing protocol achieves a maximum reliability of only 67%.

1.1.4 Per-flow Latency Measurement [107]

With the growth in number and significance of the emerging applications that require extremely low latencies, network operators are facing increasing need to perform latency measurement on per-flow basis for network monitoring and troubleshooting. In this thesis, we
propose COLATE, the first per-flow latency measurement scheme that requires neither probe
packets nor time stamping. Given a set of observation points, COLATE records packet
timing information at each point so that later for any two points, it can accurately estimate
the average and standard deviation of the latencies experienced by the packets of any flow
in passing the two points. The key idea is that when recording packet timing information,
COLATE purposely allows noise to be introduced for minimizing storage space, and when
querying the latency of a target flow, COLATE uses statistical techniques to denoise and
obtain an accurate latency estimate. COLATE is designed to be efficiently implementable
on network middleboxes. In terms of processing overhead, COLATE performs only one hash
and one memory update per packet. In terms of storage space, COLATE uses less than 0.1
bit per packet, which means that, on a backbone link with about half a million packets per
second, using a 256GB drive, COLATE can accumulate time stamps of packets traversing
the link for over 1.5 years. We evaluated COLATE using three real traffic traces that include
a backbone traffic trace, an enterprise network traffic trace, and a data center traffic trace.
Results show that COLATE always achieves the required reliability for any given confidence
interval.

1.1.5 User Security [110]

With the rich functionalities and enhanced computing capabilities available on mobile computing devices with touch screens, users not only store sensitive information (such as credit
card numbers) but also use privacy sensitive applications (such as online banking) on these
devices, which makes them attractive targets for hackers and thieves. To protect private information, such devices typically lock themselves after a few minutes of inactivity and display a
password/PIN/pattern screen when reactivated. Password/PIN/pattern based schemes
are inherently vulnerable to shoulder surfing attacks and smudge attacks. Furthermore,
passwords/PINs/patterns are inconvenient for users to enter frequently. In this thesis, we
propose GEAT, a gesture based user authentication scheme for the secure unlocking of touch
screen devices. Unlike existing authentication schemes for touch screen devices, which use
what a user inputs as the authentication secret, GEAT authenticates users mainly based on
how they input, using distinguishing features such as finger velocity, device acceleration,
and stroke time. Even if attackers see what gesture a user performs, they cannot reproduce
the behavior of the user doing gestures through shoulder surfing or smudge attacks. We
implemented GEAT on Samsung Focus running Windows, collected 15009 gesture samples
from 50 volunteers, and conducted real-world experiments to evaluate GEAT’s performance.
Experimental results show that our scheme achieves an average equal error rate of 0.5% with
3 gestures using only 25 training samples.

1.1.6 Software Security [111]

Software systems inherently contain vulnerabilities that have been exploited in the past resulting in significant revenue losses. The study of vulnerability life cycles can help in the
development, deployment, and maintenance of software systems. It can also help in designing future security policies and conducting audits of past incidents. Furthermore, such an
analysis can help customers to assess the security risks associated with software products of
different vendors. In this thesis, we present an exploratory measurement study of a large
software vulnerability data set containing 46310 vulnerabilities disclosed from 1988 to 2011.
We investigate vulnerabilities along the following seven dimensions: (1) phases in the life cycle of
vulnerabilities, (2) evolution of vulnerabilities over the years, (3) functionality of vulnerabilities, (4) access requirement for exploitation of vulnerabilities, (5) risk level of vulnerabilities,
(6) software vendors, and (7) software products. Our exploratory analysis uncovers several
statistically significant findings that have important implications for software development
and deployment.

1.2 Published Material

The chapters of this dissertation are based in part on the following publications.
• Muhammad Shahzad and Alex X. Liu. “Expecting the Unexpected: Fast and Reliable
Detection of Missing RFID Tags in the Wild”, IEEE INFOCOM, 2015.
• Muhammad Shahzad and Alex X. Liu. “Noise Can Help: Accurate and Efficient Per-flow Latency Measurement without Packet Probing and Time Stamping”, ACM SIGMETRICS, 2014.
• Muhammad Shahzad, Alex X. Liu, and Arjmand Samuel. “Secure Unlocking of Mobile
Touch Screen Devices by Simple Gestures – You can see it but you can not do it”, ACM
MobiCom, 2013.
• Muhammad Shahzad and Alex X. Liu. “Probabilistic Optimal Tree Hopping for RFID
Identification”, ACM SIGMETRICS, 2013.
• Muhammad Shahzad and Alex X. Liu. “Every Bit Counts - Fast and Scalable RFID
Estimation”, ACM MobiCom, 2012.
• Muhammad Shahzad, Muhammad Zubair Shafiq, and Alex X. Liu. “A Large Scale
Exploratory Analysis of Software Vulnerability Life Cycles”, ICSE, 2012.
• Muhammad Shahzad and Alex Liu. “Probabilistic Optimal Tree Hopping for RFID
Identification”, IEEE/ACM Transactions on Networking (ToN), 2014.
• Muhammad Shahzad and Alex Liu. “Fast and Accurate Estimation of RFID Tags”,
IEEE/ACM Transactions on Networking (ToN), 2013.

2 RFID Estimation

2.1 Introduction

2.1.1 Motivation and Problem Statement

RFID systems are widely used in various applications such as object tracking [85], 3D positioning [123], indoor localization [86], supply chain management [67], inventory control, and
access control [46, 84] because the cost of commercial RFID tags is negligible compared to
the value of the products to which they are attached (e.g., as low as 5 cents per tag [93]).
An RFID system consists of tags and readers. A tag is a microchip with an antenna in a
compact package that has limited computing power and communication range. There are
two types of tags: (1) passive tags, which are powered up by harvesting the radio frequency
energy from readers (as they do not have their own power sources) and have communication
range often less than 20 feet; (2) active tags, which have their own power sources and have
relatively longer communication range. A reader has a dedicated power source with significant computing power. It transmits a query to a set of tags and the tags respond over a
shared wireless medium.
This chapter concerns the fundamental problem of estimating the size of a given tag population. This is needed in many applications such as tag identification, privacy sensitive
RFID systems, and warehouse monitoring. In tag identification protocols, which read the
ID stored in each tag, population size is estimated at the start to guide the identification process [105]. For example, for tag identification protocols that are based on the framed slotted
Aloha protocol (standardized in EPCGlobal Class-1 Generation-2 (C1G2) RFID standard
[55] and implemented in commercial RFID systems), tag estimation is often used to calculate
the optimal frame size. In privacy sensitive RFID systems, such as those used in parks for
continuously monitoring the number of visitors in different areas of a park to plan the guided
trips efficiently, readers may not have the permission to identify human individuals. In warehouses with RFID-based monitoring systems, managers often need a quick estimate of the
number of products left in stock for various purposes such as the detection of employee theft.
Note that although tag population size can be accurately measured by tag identification,
doing so is too slow.
We formally define the tag estimation problem as: given a tag population of unknown
size t, a confidence interval β ∈ (0, 1], and a required reliability α ∈ [0, 1), a set of readers
needs to collaboratively compute the estimated number of tags t˜ so that P {|t˜ − t| ≤ βt} ≥ α.
When the number of readers is one, we call this problem single-reader estimation; otherwise,
we call this problem multi-reader estimation. A tag estimation scheme should satisfy the
following three requirements:
1. Reliability: The actual reliability should always be greater than or equal to the required reliability. The reliability α given as input is called the required reliability. The
reliability that an estimation scheme achieves is called its actual reliability.
2. Scalability: The estimation time needs to be scalable to large population sizes because
in many applications, the number of passive tags can be very large due to their low
cost, easy disposability, and powerless operation.
3. Deployability: The estimation scheme needs to be compliant with the C1G2 standard
and should not require any changes to tags.
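The reliability requirement can be illustrated with a minimal Python sketch that measures the actual reliability of a set of estimates against the condition P {|t˜ − t| ≤ βt} ≥ α. The function names and the sample estimates are hypothetical and serve only to illustrate the definition; in practice the true size t is unknown, so such a check is possible only in simulation.

```python
def actual_reliability(estimates, t, beta):
    """Fraction of estimates that satisfy |estimate - t| <= beta * t."""
    if not estimates:
        raise ValueError("need at least one estimate")
    hits = sum(1 for est in estimates if abs(est - t) <= beta * t)
    return hits / len(estimates)


def meets_requirement(estimates, t, beta, alpha):
    """True if the empirical (actual) reliability is at least the required reliability alpha."""
    return actual_reliability(estimates, t, beta) >= alpha


# Hypothetical estimates for a population of t = 1000 tags (illustrative values only).
estimates = [990, 1004, 1012, 998, 1060]
print(actual_reliability(estimates, t=1000, beta=0.05))   # 4 of the 5 estimates are within 50 tags
```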

2.1.2 Proposed Approach

In this chapter, we propose a new scheme called Average Run based Tag estimation (ART),
which satisfies all of the above three requirements. The communication protocol used by
ART is the standardized framed slotted Aloha protocol, in which a reader first broadcasts
a value f to the tags in its vicinity where f represents the number of time slots present in
a forthcoming frame. Then each tag randomly picks a time slot in the frame and replies
during that slot. Thus, the reader gets a binary sequence of 0s and 1s by representing a slot
with no tag replies as 0 and a slot with one or more tag replies as 1. The key idea of ART is
to estimate tag population size based on the average run size of 1s in the binary sequence.
We show that the average run size of 1s in a frame monotonically increases with the
size of the tag population. Thus, the average run size of 1s is an indicator of tag population
size.

2.1.3 Advantages of ART over Prior Art

ART is advantageous in terms of speed and deployability. For speed, ART is faster than all
prior schemes. For example, given a confidence interval of 0.1% and the required reliability
of 99.9%, ART is consistently 7 times faster than the fastest existing schemes (i.e., UPE [61]
and EZB [62]) for any tag population size. The reason behind ART being faster than prior
schemes is that the new estimator that we propose in this chapter, namely the average run
size of 1s, has significantly smaller variance compared to the estimators used in prior schemes
(such as the total number of 0s [61, 62] and the location of the first 1 in the binary sequence
[53]), as we analytically show in Section 2.7.3. An estimator with small variance is faster
because the Aloha frames need to be repeated fewer times to achieve the required reliability.
Furthermore, the estimation time of ART is provably independent of tag population sizes.
In contrast, as tag volume increases, the estimation time of some prior schemes (e.g., FNEB
[53]) increases.
For deployability, ART requires modification neither to the tags nor to the communication
protocol between tags and readers. ART only needs to be implemented on the reader side
as a software module without any hardware modifications. ART also does not demand
any impractical system parameters beyond the C1G2 standard. In contrast, some prior
schemes require modification to tags and some demand unrealistic system parameters. For
example, the scheme in [90] requires each tag to store thousands of hash functions, which is
not practical to implement on passive tags and is not compliant with the C1G2 standard.
As another example, the scheme in [53] uses increasingly large frame sizes as population
size increases (e.g., the frame size required by the scheme in [53] is greater than half of tag
population size), which soon exceeds the maximum limit allowed by the C1G2 standard.

2.2 Related Work

The first tag estimation scheme, called Unified Probabilistic Estimator (UPE), was proposed
by Kodialam and Nandagopal in 2006 [61]. UPE uses the framed slotted Aloha protocol
and performs estimation based on either the number of empty slots or that of collision slots
in a frame. Besides using an estimator with larger variance than that of ART, UPE requires the
differentiation among empty, single, and collision slots, which takes significantly more time
than differentiating between empty and non-empty slots. According to C1G2, a reader
requires 300µs to detect an empty slot, 1500µs to detect a collision, and 3000µs to complete a
successful read. In [62], Kodialam et al. proposed an improved framed slotted Aloha protocol
based estimation scheme called Enhanced Zero Based (EZB) estimator, which performs
estimation based on the total number of 0s in a frame. While UPE estimates population
size in each round and averages the estimated sizes when all rounds are finished, EZB only
records the total number of 0s in each frame and at the end of all rounds, EZB first averages
the recorded values and then uses it to do estimation.
In [90], Qian et al. proposed an estimation scheme called Lottery Frame (LoF). Compared
to UPE and EZB, LoF is faster; but, it is impractical to implement as it requires each tag to
store a large number (i.e., the number of bits in a tag ID times the number of frames, which
can be in the scale of thousands) of unique hash functions. LoF needs to modify both tags
and the communication protocol between readers and tags, which makes it non-compliant
with C1G2. Han et al. proposed a tag estimation scheme called First Non Empty Based
(FNEB) estimator, which is based on the size of the first run of 0s in a frame [53]. FNEB
is based on an assumption that frame size can be arbitrarily large, which does not hold in
practice. Li et al. proposed an estimation scheme called Maximum Likelihood Estimator
(MLE) for active tags with the goal of minimizing power consumption of active tags [72]. In
[101], Shah and Wong proposed a multi-reader tag estimation scheme which is based on an
unrealistic assumption that any tag covered by multiple readers only replies to one reader.
In [127], Zanella proposed Collision Set Estimator (CSE) that utilizes maximum likelihood
estimation to estimate the number of tags in a population. CSE does not take accuracy
requirements (α and β) as input and, therefore, cannot guarantee an arbitrary required
reliability.

2.3 ART — Scheme Overview

2.3.1 Communication Protocol Overview

ART uses the framed slotted Aloha protocol specified in C1G2 as its MAC layer communication protocol. In this protocol, the reader first tells tags the frame size f and a random seed
number R. Later in the chapter, we will see how a simple use of seed number R will make
it straightforward to extend our estimation scheme to use multiple readers with overlapping
regions. Each tag within the transmission range of the reader then uses f , R, and its ID to
select a slot in the frame by evaluating a hash function h(f, R, ID) whose result is in [1, f ]
following a uniform distribution. Each tag has a counter initialized with the slot number it
chose to reply. After each slot, the reader first transmits an end of slot signal and then each
tag decrements its counter by one. In any given slot, all the tags whose counters are equal
to 1 respond to the reader. In essence, each tag picks a random slot from 1 to f following a
uniform distribution. If no tag replies in a slot, it is called an empty slot; if exactly one tag
replies, it is called a singleton slot; and if two or more tags reply, it is called a collision slot.
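To make the slot selection concrete, the following minimal Python sketch simulates one frame. SHA-256 is used here only as a stand-in for the hash h(f, R, ID), and the names slot_for_tag, run_frame, and to_bits are hypothetical; none of them is taken from C1G2 or from the tag hardware.

```python
import hashlib
from collections import Counter


def slot_for_tag(f, seed, tag_id):
    """Stand-in for h(f, R, ID): map a tag to a slot in [1, f].

    SHA-256 is an assumption of this sketch; it is not what real tags compute.
    """
    digest = hashlib.sha256(f"{f}:{seed}:{tag_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % f + 1


def run_frame(f, seed, tag_ids):
    """Return the outcome of each of the f slots as 'empty', 'singleton', or 'collision'."""
    replies = Counter(slot_for_tag(f, seed, tag_id) for tag_id in tag_ids)
    outcome = []
    for slot in range(1, f + 1):
        n = replies.get(slot, 0)
        if n == 0:
            outcome.append("empty")
        elif n == 1:
            outcome.append("singleton")
        else:
            outcome.append("collision")
    return outcome


def to_bits(outcome):
    """Binary view of a frame: 0 for an empty slot, 1 for a singleton or collision slot."""
    return "".join("0" if s == "empty" else "1" for s in outcome)


frame = run_frame(f=16, seed=7, tag_ids=range(10))
print(frame)
print(to_bits(frame))
```

Changing the seed changes the slot chosen by each tag, which is what allows several frames to be treated as separate samples.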

2.3.2 Estimation Scheme Overview

At the end of a frame, the reader obtains a sequence of 0s and 1s by representing an empty
slot with 0 and a singleton or collision slot with 1. In this binary sequence, a run is a
subsequence where all bits in this subsequence are 0s (or 1s) but the bits before and after
the subsequence are 1s (or 0s), if they exist. For example, 011100 has 3 runs: 0, 111, and
00.
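The definition of a run can be sketched in a few lines of Python. The helper names are hypothetical, and returning None when a symbol never occurs is a choice made for this sketch only.

```python
from itertools import groupby


def runs(bits):
    """Split a binary string into its maximal runs, e.g. '011100' -> ['0', '111', '00']."""
    return ["".join(group) for _, group in groupby(bits)]


def average_run_size(bits, b="1"):
    """Average length of the runs of symbol b, or None if b never occurs."""
    lengths = [len(r) for r in runs(bits) if r[0] == b]
    if not lengths:
        return None
    return sum(lengths) / len(lengths)


print(runs("011100"))                    # ['0', '111', '00']
print(average_run_size("011100", "1"))   # 3.0
print(average_run_size("011100", "0"))   # 1.5
```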

[Plot: average size of runs (y-axis, 0 to 20) of 1s and of 0s vs. number of tags t (x-axis, 0 to 150)]
Figure 2.1 Average run size of 0s and 1s vs. tag population size t. (f = 16)

ART uses the average run size of 1s to estimate tag population size. The intuition is that as
tag population size increases, the average run size of 1s increases (and that of 0s decreases).
We illustrate this intuition using the simulation results in Figure 2.1, which shows that the
average run size of 1s increases as tag population size increases from 0 to 160. Each marker
in this figure is the average over 100 simulation runs. The lines above and below each marker show
the standard deviation across those runs. This figure shows that given a tag population
size and a frame size, there is a distinct expected value of the average run size of 1s. The
expected value of the average run size of 1s is a monotonic function of the number of tags,
which means that a unique inverse of this function exists. Thus, given the observed average
run size of 1s, using the inverse function, we can get the estimated value t˜ of tag population
size t. Similar to other tag estimation schemes, ART also uses multiple frames obtained from
multiple rounds of the framed slotted Aloha protocol to reduce its estimation variance and
therefore increase its estimation reliability. Because different seed values are used for different frames,
the same tag generally chooses a different slot to respond in each frame.
To scale to large tag population sizes, ART uses a persistence probability p by which a tag
decides whether it should reply to the reader in a given frame. The persistence probability
was first introduced in [61]. To avoid making any modification to tags, this probability is
implemented by “virtually” extending frame size 1/p times, i.e., the reader announces a
frame size of f /p but terminates the frame after the first f slots. According to C1G2, the
reader can terminate a frame at any point. By adjusting p, ART is able to estimate tag
populations of large sizes.
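The virtual frame extension can be sketched as follows. This is a minimal simulation under the assumption that f/p is an integer and that each tag picks its slot uniformly at random; the function name and the parameter values are hypothetical.

```python
import random


def participating_tags(t, f, p, rng):
    """Count tags that reply when the reader announces f/p slots but stops after f.

    Each tag picks one slot uniformly in [1, f/p]; only tags whose slot falls in
    the first f slots get to reply before the reader terminates the frame.
    """
    announced = round(f / p)          # frame size broadcast by the reader
    replied = 0
    for _ in range(t):
        slot = rng.randint(1, announced)
        if slot <= f:
            replied += 1
    return replied


rng = random.Random(2015)             # fixed seed so that the sketch is repeatable
t, f, p = 10_000, 100, 0.1
count = participating_tags(t, f, p, rng)
print(count / t)                      # fraction of tags that replied, close to p
```

No change to the tags is needed: from the point of view of a tag, the frame simply has f/p slots.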

2.3.3 Formal Development: Overview and Assumptions

To formally develop an estimator, we first need to derive the equation for the expected value
of average run size of 1s as a function of frame size f , tag population size t, and persistence
probability p. We then use the inverse of this function to get the estimated value t˜ from the
observed value of the average run size of 1s. To achieve the required reliability in minimum
estimation time, we optimize f , p, and the number of rounds n so that the total number of
slots (f + l) × n is minimized while satisfying P {|t˜ − t| ≤ βt} ≥ α. Here l is a constant
that represents the C1G2 specified mandatory time delay in terms of number of empty slots
between the end of a frame and the start of next frame. Typically, this delay is about 1ms
(i.e., l ≈ 3.33 empty slots) [55, 100].
To make the formal development tractable, we assume that instead of picking a single slot
to reply at the start of frame of size f , a tag independently decides to reply in each slot of
the frame with probability 1/f regardless of its decision about previous or forthcoming slots.
Vogt first used this assumption for the analysis of framed slotted Aloha protocol for RFID
and justified its use by recognizing that this problem belongs to a class of problems known
as “occupancy problems”, which deal with the allocation of balls to urns [121]. Ever since,
the use of this assumption has been a norm in the formal analysis of all Aloha based RFID
protocols [121, 37, 129, 61, 62, 90, 53, 72, 101, 102].
The implication of this assumption is that when a tag independently chooses a slot to
reply, it can end up choosing more than one slot in the same frame or even not choosing
any at all, which is not in accordance with C1G2 standard that requires a tag to pick exactly
one slot in a frame. However, even with the independence assumption, the expected number
of slots that a tag chooses in a frame is still one. As we draw our estimate from a large
number of frames to achieve required reliability, we can expect to observe this expected
number. Therefore, the analysis with the assumption of independence is asymptotically the
same as that without the independence assumption. Bordenave et al. further explained in
detail why this independence assumption in analyzing Aloha based protocols provides results
just as accurate as if all the analysis was done without this assumption [33]. Note that this
independence assumption is made only to make the formal development tractable. In all the
simulations we have presented in this chapter, a tag chooses exactly one slot at the start of
frame.

2.4 ART — Estimation Algorithm

We first focus on the single-reader version of ART. In Section 2.6.2, we will present a
method to extend ART to handle multiple readers with overlapping regions.
For ART, in each round of the Aloha protocol, we calculate the average run size of b, where b is 0 or 1. For
example, the average run size of 1 in frame 01110011 (which has two runs of 1, i.e., 111 and
11) is (3 + 2)/2 = 2.5. After n rounds, we obtain n average run sizes of b and then calculate
the average of these n values. This final value is then substituted for the expected value of
the average run size of b in a frame to estimate the tag population size.
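The per-round averaging can be sketched in Python for b = 1. The simulation below lets every tag pick exactly one slot uniformly at random and draws all frames from one seeded generator; skipping frames that contain no 1s is an assumption of this sketch, and all function names are hypothetical.

```python
import random
from itertools import groupby


def frame_bits(t, f, rng):
    """One frame: every tag picks exactly one of f slots; 1 marks a non-empty slot."""
    occupied = [False] * f
    for _ in range(t):
        occupied[rng.randrange(f)] = True
    return [1 if o else 0 for o in occupied]


def average_run_size_of_ones(bits):
    """Average length of the runs of 1s in a frame, or None if the frame has no 1s."""
    lengths = [sum(1 for _ in group) for bit, group in groupby(bits) if bit == 1]
    return sum(lengths) / len(lengths) if lengths else None


def observed_average(t, f, n, seed=0):
    """Average, over n rounds, of the per-frame average run size of 1s.

    Frames without any 1s are skipped, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        x = average_run_size_of_ones(frame_bits(t, f, rng))
        if x is not None:
            values.append(x)
    return sum(values) / len(values) if values else None


print(average_run_size_of_ones([0, 1, 1, 1, 0, 0, 1, 1]))  # 2.5, the frame 01110011 from the text
print(observed_average(t=20, f=16, n=100))
```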
The probability that a slot in a frame is b, where b = 0 or 1, can be calculated using
Lemma 1.
Lemma 1. Let t be the actual tag population size, f be the frame size, p be the persistence
probability (i.e., the probability that a tag participates in a frame), and qb be the probability
that a slot in a frame is b. Thus:

\[
q_b =
\begin{cases}
\left(1 - \frac{p}{f}\right)^{t} & \text{if } b = 0 \\
1 - \left(1 - \frac{p}{f}\right)^{t} & \text{if } b = 1
\end{cases}
\tag{2.1}
\]

Proof. The probability that a tag chooses a given slot in a frame is p/f . The probability
that it does not choose that slot is 1 − p/f . The probability that none of the tags choose that
slot is (1 − p/f )^t , which is the value of q0 . As the tags choose the slots independently, qb is
the same for each slot of the frame. The probability that a slot is chosen by at least one tag
is 1 − q0 , which is the value of q1 .
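Lemma 1 can be checked numerically with a short Python sketch. The simulation assumes that every tag picks exactly one slot in a virtual frame of f/p slots (with f/p an integer) and that the reader observes only the first f slots; the function names and parameter values are hypothetical.

```python
import random


def q(b, t, f, p):
    """Probability from Lemma 1 that a slot is b (0 = empty, 1 = non-empty)."""
    q0 = (1 - p / f) ** t
    return q0 if b == 0 else 1 - q0


def empirical_q0(t, f, p, frames, seed=0):
    """Fraction of empty slots over many simulated frames.

    Each tag picks exactly one slot in a virtual frame of f/p slots and the
    reader observes only the first f slots (f/p is assumed to be an integer).
    """
    rng = random.Random(seed)
    announced = round(f / p)
    empty = 0
    for _ in range(frames):
        occupied = [False] * f
        for _ in range(t):
            slot = rng.randrange(announced)
            if slot < f:
                occupied[slot] = True
        empty += occupied.count(False)
    return empty / (frames * f)


t, f, p = 40, 16, 0.5
print(q(0, t, f, p), empirical_q0(t, f, p, frames=2000))   # the two values should be close
```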
Let Xb be the random variable representing the average run size of b in a frame. Next, we
calculate the expectation and variance of Xb . The expectation of Xb will be used to estimate
the tag population size and the variance of Xb will be used to calculate the values of p, n,
and f that ensure that the actual reliability is greater than or equal to the required reliability and
the estimation time is minimum. Let Yb be the random variable representing the number
of times b occurs in a frame and Rb be the random variable representing the number of
runs of b in a frame. By definition, Xb = Yb /Rb holds for any frame. Next, we first calculate
E[Yb ], Var(Yb ), E[Rb ], Var(Rb ), and Cov(Yb , Rb ) in Lemmas 2 and 3. Then, we use them to
calculate E[Xb ] and Var(Xb ) in Theorem 4. Using Equation (2.12) in Theorem 4, replacing
E[Xb ] by the observed average run size of b from n frames, we obtain an equation with only
one unknown t. Finally, we use Brent’s method to obtain the numerical solution of this
equation. The result is the estimated tag population size t˜. Since ART uses Xb to estimate
the tag population size, we call Xb the estimator of ART.
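The inversion step can be illustrated with a rough Python sketch. It does not use Equation (2.12); as a stand-in it uses the first-order ratio E[Y1]/E[R1] built from Lemma 1 and Lemma 2, and it replaces Brent's method with plain bisection to stay within the standard library. Both substitutions, and all function names, are assumptions of this sketch.

```python
def approx_expected_run_size(t, f, p=1.0):
    """First-order stand-in E[Y1]/E[R1] for the expected average run size of 1s.

    This is NOT Equation (2.12); it only combines q1 from Lemma 1 with
    E[Y1] = f*q1 and E[R1] = q1*(q1 + f*(1 - q1)) from Lemma 2.
    """
    q1 = 1 - (1 - p / f) ** t
    return f / (q1 + f * (1 - q1))


def estimate_tags(observed, f, p=1.0, t_max=1e6, iterations=200):
    """Invert the monotonic stand-in by bisection (used here in place of Brent's method)."""
    lo, hi = 0.0, t_max
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if approx_expected_run_size(mid, f, p) < observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = approx_expected_run_size(50, f=16)
print(x, estimate_tags(x, f=16))   # the second value recovers t = 50 up to numerical error
```

The observed value must lie between 1 and f for the inversion to be meaningful, because the stand-in function ranges from 1 (no tags) towards f (all slots occupied).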
Lemma 2. Let Yb be the random variable representing the number of times b occurs in a
frame and Rb be the random variable representing the number of runs of b in a frame. Given
tag population size t, frame size f , and persistence probability p, we have:

\begin{align}
E[Y_b] &= f q_b \tag{2.2} \\
\mathrm{Var}(Y_b) &= f q_b (1 - q_b) \tag{2.3} \\
E[R_b] &= q_b \{ q_b + f (1 - q_b) \} \tag{2.4} \\
\mathrm{Var}(R_b) &= f (q_b - 4q_b^2 + 6q_b^3 - 3q_b^4) + (3q_b^2 - 8q_b^3 + 5q_b^4) \tag{2.5}
\end{align}

Proof. Each slot i of frame f has probability qb of being b. Therefore, Yb ∼ Binom(f, qb ).
Using general formula for expectation and variance of a binomial random variable, E[Yb ] and
Var(Yb ) are given by Equations (2.2) and (2.3).
Let γ1 , γ2 , . . . , γf represent the sequence of binary random variables representing the
value of each slot in a frame of size f . Since each tag randomly and independently picks
a slot in the frame, all γi are identically distributed. Furthermore, P {γi = b} = qb . Let
b̄ = 1 − b and let Ii be the indicator random variable whose value is 1 if a run of b begins at
γi :

\[
I_i =
\begin{cases}
1 & \text{if } (\gamma_i = b,\ i = 1) \lor (\gamma_i = b \land \gamma_{i-1} = \bar{b},\ i > 1) \\
0 & \text{otherwise}
\end{cases}
\]

Thus,

\[
R_b = \sum_{i=1}^{f} I_i .
\]

Because

\[
E[I_i] =
\begin{cases}
P\{\gamma_i = b\} = q_b & \text{if } i = 1 \\
P\{\gamma_{i-1} = \bar{b},\ \gamma_i = b\} = q_b (1 - q_b) & \text{if } i > 1
\end{cases}
\]

we get

\[
E[R_b] = \sum_{i=1}^{f} E[I_i] = q_b + \sum_{i=2}^{f} q_b (1 - q_b) = q_b \{ q_b + f (1 - q_b) \}
\]

As Rb is the sum of f random variables, some of which are correlated, we use the general
expression for the variance of a sum of correlated random variables to obtain the variance of Rb .

\[
\mathrm{Var}(R_b) = \mathrm{Var}\Big(\sum_{i=1}^{f} I_i\Big) = \sum_{i=1}^{f} \mathrm{Var}(I_i) + 2 \sum_{j=2}^{f} \sum_{\forall i<j} \mathrm{Cov}(I_i, I_j)
\]

Here we used the fact that the frame size is always greater than 1 during the estimation
process whenever the information about runs is used. As Ii ∼ Bernoulli(E[Ii ]), its variance is
that of a Bernoulli random variable, given by

\[
\mathrm{Var}(I_i) = E[I_i] (1 - E[I_i]) \tag{2.6}
\]

Note that Ii and Ij are dependent on each other if and only if i = j − 1 because Ij−1 and Ij
cannot both be 1 in the same frame. Other than that, ∀i < j − 1, Ii and Ij are independent.
Thus,

\[
\mathrm{Cov}(I_i, I_j) =
\begin{cases}
0 & \text{if } i < j - 1 \\
-E[I_i] E[I_j] = -E[I_i]\, q_b (1 - q_b) & \text{if } i = j - 1
\end{cases}
\]

Hence we have:
\begin{align*}
\mathrm{Var}(R_b) &= \mathrm{Var}(I_1) + \sum_{j=2}^{f} \mathrm{Var}(I_j) + 2\,\mathrm{Cov}(I_1, I_2) + 2 \sum_{j=3}^{f} \mathrm{Cov}(I_{j-1}, I_j) \\
&= q_b(1 - q_b) + (f - 1) q_b(1 - q_b)\{1 - q_b(1 - q_b)\} - 2q_b^2(1 - q_b) - 2(f - 2) q_b^2 (1 - q_b)^2 \\
&= f(q_b - 4q_b^2 + 6q_b^3 - 3q_b^4) + (3q_b^2 - 8q_b^3 + 5q_b^4)
\end{align*}
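As a sanity check on Lemma 2, the closed forms in Equations (2.2) to (2.5) can be compared against exact moments obtained by enumerating every possible frame. The sketch below is illustrative only: the function names are hypothetical, and it assumes, as in the proof, that slots are independent and each equals b with probability q_b. Enumeration is exponential in f, so it is practical only for small frames.

```python
from itertools import product


def count_runs(frame, b):
    """Count maximal runs of symbol b in a sequence of 0/1 slots."""
    return sum(1 for i, s in enumerate(frame)
               if s == b and (i == 0 or frame[i - 1] != b))


def exact_moments(f, q_b, b=1):
    """Exact E[Y_b], Var(Y_b), E[R_b], Var(R_b) by enumerating all 2^f frames."""
    e_y = e_y2 = e_r = e_r2 = 0.0
    for frame in product((0, 1), repeat=f):
        y = frame.count(b)                      # number of slots equal to b
        r = count_runs(frame, b)                # number of runs of b
        prob = q_b ** y * (1 - q_b) ** (f - y)  # independent slots
        e_y += prob * y
        e_y2 += prob * y * y
        e_r += prob * r
        e_r2 += prob * r * r
    return e_y, e_y2 - e_y ** 2, e_r, e_r2 - e_r ** 2


def lemma2_moments(f, q_b):
    """Closed forms of Equations (2.2)-(2.5)."""
    q = q_b
    e_y = f * q
    var_y = f * q * (1 - q)
    e_r = q * (q + f * (1 - q))
    var_r = (f * (q - 4 * q**2 + 6 * q**3 - 3 * q**4)
             + (3 * q**2 - 8 * q**3 + 5 * q**4))
    return e_y, var_y, e_r, var_r
```

For a small frame, for example f = 6 and q_b = 0.3, the two functions should agree up to floating-point rounding.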
Lemma 3. Given tag population size t, frame size f , and persistence probability p, we have:
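\[ \mathrm{Cov}(Y_b, R_b) = \sum_{y=0}^{f} \sum_{r=0}^{\lceil f/2 \rceil} y\, r\, q_b^{y} (1 - q_b)^{f-y}\, \xi\{f, y, r\} - E[Y_b]E[R_b] \tag{2.7} \]
where
\[ \xi\{f, y, r\} = \begin{cases}
\binom{y-1}{r-1} \left\{ \binom{f-y-1}{r} + 2\binom{f-y-1}{r-1} + \binom{f-y-1}{r-2} \right\} & \text{if } r > 1 \wedge 0 < y < f \wedge r \le y \wedge r \le f - y - 1 \\
\binom{y-1}{r-1} \left\{ \binom{f-y-1}{r} + 2\binom{f-y-1}{r-1} \right\} & \text{if } r = 1 \wedge 0 < y < f \wedge r \le y \wedge r \le f - y - 1 \\
1 & \text{if } r = 1 \wedge y = f \\
1 & \text{if } r = 0 \wedge y = 0 \\
0 & \text{otherwise}
\end{cases} \]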

Proof. By definition, we have

\[ \mathrm{Cov}(Y_b, R_b) = \sum_{y=0}^{f} \sum_{r=0}^{f} y\, r\, P\{Y_b = y, R_b = r\} - E[Y_b]E[R_b] \tag{2.8} \]

Here P {Yb = y, Rb = r} represents the probability that exactly y out of f slots in the frame
are b and at the same time the number of runs of b is r. This probability is difficult to
evaluate directly, but conditioning on Yb simplifies the task.
\[ P\{Y_b = y, R_b = r\} = P\{R_b = r \mid Y_b = y\} \times P\{Y_b = y\} \tag{2.9} \]
As $Y_b \sim \mathrm{Binom}(f, q_b)$, we have:
\[ P\{Y_b = y\} = \binom{f}{y} q_b^{y} (1 - q_b)^{f-y} \tag{2.10} \]

Now we calculate $P\{R_b = r \mid Y_b = y\}$, i.e., the probability of having $r$ runs of $b$ in a frame
of size $f$ given that $y$ out of $f$ slots are $b$. As tags choose the slots independently, every
arrangement of $y$ occurrences of $b$ among the $f$ slots is equally likely. Therefore, we determine
the total number of ways, denoted by $\xi\{f, y, r\}$, in which $y$ occurrences of $b$ and $f - y$
occurrences of $\bar{b}$ can be arranged such that the number of runs of $b$ is $r$. We treat this as an
ordered partition problem. First, we separate all the $y$ occurrences of $b$ from the frame and
make $r$ partitions of these $y$ occurrences. Then, we create an appropriate number of partitions
of the $f - y$ occurrences of $\bar{b}$ such that the partitions of $\bar{b}$ can be interleaved between
consecutive partitions of $b$. For $r$ partitions of $b$, there are 4 possible ways to partition $\bar{b}$.
1. The frame starts with $b$ and ends with $b$, implying that there are $r - 1$ partitions of $\bar{b}$,
each interleaved between adjacent partitions of $b$.
2. The frame starts with $b$ and ends with $\bar{b}$, implying that there are $r$ partitions of $\bar{b}$.
3. The frame starts with $\bar{b}$ and ends with $b$, implying that there are $r$ partitions of $\bar{b}$.
4. The frame starts with $\bar{b}$ and ends with $\bar{b}$, implying that there are $r + 1$ partitions of $\bar{b}$.
We can make $r$ partitions of $y$ occurrences of $b$ in $\binom{y-1}{r-1}$ ways and $r$ partitions of $f - y$
occurrences of $\bar{b}$ in $\binom{f-y-1}{r-1}$ ways. Similarly, we can make $r + 1$ partitions of $f - y$ occurrences
of $\bar{b}$ in $\binom{f-y-1}{r}$ ways and $r - 1$ partitions of $f - y$ occurrences of $\bar{b}$ in $\binom{f-y-1}{r-2}$ ways. The
equation of $\xi\{f, y, r\}$ in the lemma statement follows from this discussion. The total number
of ways in which $y$ occurrences of $b$ can be arranged among $f$ slots is $\binom{f}{y}$. Thus, we get
\[ P\{R_b = r \mid Y_b = y\} = \frac{\xi\{f, y, r\}}{\binom{f}{y}} \tag{2.11} \]

Substituting values from Eqs. (2.10) and (2.11) in (2.9) and (2.8) gives Eq. (2.7).
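The counting argument behind ξ{f, y, r} can be spot-checked by brute force. The sketch below transcribes the cases of Lemma 3 as they are stated and counts matching frames by enumeration. The function names are hypothetical, and the comparison is meant only for parameter values that satisfy the conditions of the first two cases or of the two boundary cases.

```python
from itertools import product
from math import comb


def xi_lemma3(f, y, r):
    """xi{f, y, r} following the cases of Lemma 3 as stated."""
    if 0 < y < f and 1 <= r <= y and r <= f - y - 1:
        ways_b = comb(y - 1, r - 1)  # r partitions of the y occurrences of b
        n = f - y - 1
        if r > 1:
            return ways_b * (comb(n, r) + 2 * comb(n, r - 1) + comb(n, r - 2))
        return ways_b * (comb(n, r) + 2 * comb(n, r - 1))
    if (r == 1 and y == f) or (r == 0 and y == 0):
        return 1
    return 0


def xi_brute_force(f, y, r, b=1):
    """Count frames of size f with y occurrences of b forming exactly r runs."""
    total = 0
    for frame in product((0, 1), repeat=f):
        runs = sum(1 for i, s in enumerate(frame)
                   if s == b and (i == 0 or frame[i - 1] != b))
        if frame.count(b) == y and runs == r:
            total += 1
    return total
```

For instance, with f = 7, y = 3, and r = 2 both functions return 20: there are 2 ways to split three occurrences of b into two runs and 10 ways to place the four remaining slots around them.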
Theorem 4. Given tag population size t, frame size f , and persistence probability p, we
have:
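\[ E[X_b] = \frac{E[Y_b]}{E[R_b]} - \frac{\mathrm{Cov}(Y_b, R_b)}{E^2[R_b]} + \frac{E[Y_b]}{E^3[R_b]} \mathrm{Var}(R_b) \tag{2.12} \]

\[ \mathrm{Var}(X_b) = \frac{\mathrm{Var}(Y_b)}{E^2[R_b]} - \frac{2E[Y_b]}{E^3[R_b]} \mathrm{Cov}(Y_b, R_b) + \frac{E^2[Y_b]}{E^4[R_b]} \mathrm{Var}(R_b) \tag{2.13} \]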

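Proof. Let $g(Y_b, R_b) = X_b = Y_b / R_b$. The Taylor series expansion of $g$ around $(\theta_1, \theta_2)$ is:
\[ g(Y_b, R_b) = \sum_{j=0}^{\infty} \frac{1}{j!} \left[ (Y_b - \theta_1) \frac{\partial}{\partial Y_b'} + (R_b - \theta_2) \frac{\partial}{\partial R_b'} \right]^{j} g(Y_b', R_b') \Big|_{Y_b' = \theta_1,\ R_b' = \theta_2} \]
According to the Bienaymé-Chebyshev inequality, $Y_b$ and $R_b$ concentrate around their means, so we
take $\theta_1 = E[Y_b]$ and $\theta_2 = E[R_b]$. Therefore, we get the following expansion of the Taylor series
of $g(Y_b, R_b)$, where all partial derivatives are evaluated at $(\theta_1, \theta_2)$:
\begin{align*}
g(Y_b, R_b) ={}& g(\theta_1, \theta_2) + (Y_b - \theta_1) \frac{\partial g}{\partial Y_b} + (R_b - \theta_2) \frac{\partial g}{\partial R_b} \\
&+ \frac{1}{2} \left[ (Y_b - \theta_1)^2 \frac{\partial^2 g}{\partial Y_b^2} + 2(Y_b - \theta_1)(R_b - \theta_2) \frac{\partial^2 g}{\partial Y_b \partial R_b} + (R_b - \theta_2)^2 \frac{\partial^2 g}{\partial R_b^2} \right] + O(j^{-1})
\end{align*}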

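Taking the expectation of both sides, we get

\[ E[g(Y_b, R_b)] \approx \frac{1}{2} \left[ \mathrm{Var}(Y_b) \frac{\partial^2 g}{\partial Y_b^2} + 2\,\mathrm{Cov}(Y_b, R_b) \frac{\partial^2 g}{\partial Y_b \partial R_b} + \mathrm{Var}(R_b) \frac{\partial^2 g}{\partial R_b^2} \right] + g(\theta_1, \theta_2) \tag{2.14} \]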

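Evaluating the partial derivatives of $g$ as required in Equation (2.14), we get
\[ \frac{\partial^2 g(Y_b, R_b)}{\partial Y_b^2} \Big|_{Y_b = \theta_1,\ R_b = \theta_2} = 0, \qquad \frac{\partial^2 g(Y_b, R_b)}{\partial Y_b \partial R_b} \Big|_{Y_b = \theta_1,\ R_b = \theta_2} = -\frac{1}{\theta_2^2}, \qquad \frac{\partial^2 g(Y_b, R_b)}{\partial R_b^2} \Big|_{Y_b = \theta_1,\ R_b = \theta_2} = \frac{2\theta_1}{\theta_2^3} \]
Putting these values in Equation (2.14) and using $\theta_1 = E[Y_b]$ and $\theta_2 = E[R_b]$, we get
Equation (2.12). The variance can be calculated as follows:
\[ \mathrm{Var}\big( g(Y_b, R_b) \big) = E\Big[ \big( g(Y_b, R_b) - E[g(Y_b, R_b)] \big)^2 \Big] \tag{2.15} \]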

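Considering that $E[g(Y_b, R_b)]$ is being squared in the expression above, we use the first order
Taylor series expansion to get the value of $E[g(Y_b, R_b)]$ and substitute it in Equation (2.15).
\begin{align*}
E[g(Y_b, R_b)] &= E\left[ (Y_b - \theta_1) \frac{\partial g}{\partial Y_b} + (R_b - \theta_2) \frac{\partial g}{\partial R_b} + g(\theta_1, \theta_2) + O(j^{-1}) \right] \\
&= (0) \frac{\partial g}{\partial Y_b} + (0) \frac{\partial g}{\partial R_b} + g(\theta_1, \theta_2) + O(j^{-1}) \approx g(\theta_1, \theta_2)
\end{align*}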

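Substituting the value of $E[g(Y_b, R_b)]$ and using the first order Taylor series expansion of
$g(Y_b, R_b)$ in (2.15), we get
\begin{align*}
\mathrm{Var}\big( g(Y_b, R_b) \big) &= E\left[ \left( (Y_b - \theta_1) \frac{\partial g}{\partial Y_b} + (R_b - \theta_2) \frac{\partial g}{\partial R_b} + O(j^{-1}) \right)^2 \right] \\
&\approx \mathrm{Var}(Y_b) \left( \frac{\partial g}{\partial Y_b} \right)^2 + 2\,\mathrm{Cov}(Y_b, R_b) \frac{\partial g}{\partial Y_b} \frac{\partial g}{\partial R_b} + \mathrm{Var}(R_b) \left( \frac{\partial g}{\partial R_b} \right)^2 \tag{2.16}
\end{align*}
Evaluating the partial derivatives of $g$ as required in the equation above, we get
\[ \frac{\partial g(Y_b, R_b)}{\partial Y_b} \Big|_{Y_b = \theta_1,\ R_b = \theta_2} = \frac{1}{\theta_2}, \qquad \frac{\partial g(Y_b, R_b)}{\partial R_b} \Big|_{Y_b = \theta_1,\ R_b = \theta_2} = -\frac{\theta_1}{\theta_2^2} \]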
Putting these values in Equation (2.16) and using θ1 = E[Yb ] and θ2 = E[Rb ], we get
Equation (2.13).
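Equations (2.12) and (2.13) translate directly into a small helper that maps the moments of Y_b and R_b to the approximate mean and variance of X_b = Y_b / R_b. The function name and argument names below are hypothetical, and the sketch assumes the caller supplies the five moments (for example from Lemmas 2 and 3).

```python
def ratio_moments(e_y, var_y, e_r, var_r, cov_yr):
    """Approximate E[Y/R] and Var(Y/R) per Equations (2.12) and (2.13)."""
    if e_r == 0:
        raise ValueError("E[R] must be non-zero for the expansion to be defined")
    # Second order expansion of the mean, Equation (2.12).
    mean = e_y / e_r - cov_yr / e_r**2 + e_y * var_r / e_r**3
    # First order expansion of the variance, Equation (2.13).
    var = (var_y / e_r**2
           - 2 * e_y * cov_yr / e_r**3
           + e_y**2 * var_r / e_r**4)
    return mean, var
```

With purely illustrative inputs E[Y] = 2, Var(Y) = 1, E[R] = 4, Var(R) = 1, and zero covariance, the helper gives a mean of 0.53125 and a variance of 0.078125.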
Figures 2.2 and 2.3 show the expectation and variance of X1 calculated using Equations
(2.12) and (2.13), respectively, with f = 16 and p = 1. The dots in these figures represent the
corresponding values obtained through 100 repetitions of simulation for each tag population
size. These figures show that the values given by Equations (2.12) and (2.13) track the
simulation results very well, which provides empirical evidence that the assumption
“instead of picking a single slot to reply at the start of frame of size f , a tag independently
decides to reply in each slot of the frame with probability 1/f regardless of its decision about
previous or forthcoming slots” holds in practice.

2.5

ART — Parameter Tuning

To minimize estimation time while achieving the required reliability, next, we obtain values of
persistence probability p, number of rounds n, and frame size f . As we have three unknowns,
we require three equations that can be solved simultaneously. We derive these three equations
using the following three conditions: (1) the confidence interval should be symmetric around t,
i.e., $|\tilde{t} - t| \le \beta t$, (2) actual reliability is greater than or equal to the required reliability, i.e.,
$P\{|\tilde{t} - t| \le \beta t\} \ge \alpha$, and (3) estimation time is minimized. We use the first condition to
calculate p, the second condition to calculate n, and the last condition to calculate f .

Figure 2.2 Expectation of ART estimator (axes: number of tags t vs. average run size; curves: ART, simulations)

Figure 2.3 Variances of ART estimator (axes: number of tags t vs. variance; curves: ART, simulations)
Although both X0 and X1 can be used to estimate the tag population size, we choose X1
for ART because the tag population size estimation calculated from X1 has smaller variance
compared to X0 as we show in Section 2.7.3. It is worth noting that X0 and X1 are not
equivalent estimators. The average run size of 0s cannot be inferred from the average run
size of 1s, and vice versa. For example, 1100011 and 1100110 have the same average run
size of 1s, but they have different average run size of 0s. Fundamentally, X0 and X1 are not
equivalent estimators because for any slot, the probability of it being 0 and that of it being
1 are different.
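The example above can be reproduced with a few lines of code. The sketch below computes the average run size of a given symbol in a frame written as a string of 0s and 1s. The function name is hypothetical, and returning 0.0 for a frame with no runs of the symbol is an assumption made only to keep the helper total.

```python
from itertools import groupby


def average_run_size(frame, b):
    """Average length of the maximal runs of symbol b in a frame string."""
    runs = [len(list(group)) for symbol, group in groupby(frame) if symbol == b]
    # Assumption: a frame with no runs of b reports an average run size of 0.
    return sum(runs) / len(runs) if runs else 0.0


for frame in ("1100011", "1100110"):
    print(frame, average_run_size(frame, "1"), average_run_size(frame, "0"))
```

Both frames have two runs of 1s of length 2 each, while the runs of 0s differ (one run of length 3 versus runs of lengths 2 and 1).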

2.5.1

Persistence Probability p

We express the confidence interval requirement $|\tilde{t} - t| \le \beta t$ as
\[ (1 - \beta)t \le \tilde{t} \le (1 + \beta)t \tag{2.17} \]

Recall from Lemma 1 that we use $q_1$ to denote the probability that a slot in a frame is 1
when the number of tags in the population is $t$ and the persistence probability is $p$. Let
$q_1^{+}$ and $q_1^{-}$ denote the probabilities that a slot in a frame is 1 when the number of tags in
the population is $(1 + \beta)t$ and $(1 - \beta)t$, respectively, and the persistence probability is $p$.
Let $\tilde{q}_1$ represent the estimate of $q_1$. Therefore, we have
\[ q_1^{+} = 1 - \Big(1 - \frac{p}{f}\Big)^{(1+\beta)t} \ \Rightarrow\ (1 + \beta)t = \frac{\ln\{1 - q_1^{+}\}}{\ln\{1 - \frac{p}{f}\}} \tag{2.18} \]
\[ q_1^{-} = 1 - \Big(1 - \frac{p}{f}\Big)^{(1-\beta)t} \ \Rightarrow\ (1 - \beta)t = \frac{\ln\{1 - q_1^{-}\}}{\ln\{1 - \frac{p}{f}\}} \tag{2.19} \]
\[ \tilde{q}_1 = 1 - \Big(1 - \frac{p}{f}\Big)^{\tilde{t}} \ \Rightarrow\ \tilde{t} = \frac{\ln\{1 - \tilde{q}_1\}}{\ln\{1 - \frac{p}{f}\}} \tag{2.20} \]

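Substituting values of $(1 + \beta)t$, $(1 - \beta)t$, and $\tilde{t}$ from Equations (2.18), (2.19), and (2.20),
respectively, into Expression (2.17), we get
\[ \frac{\ln\{1 - q_1^{-}\}}{\ln\{1 - \frac{p}{f}\}} \le \frac{\ln\{1 - \tilde{q}_1\}}{\ln\{1 - \frac{p}{f}\}} \le \frac{\ln\{1 - q_1^{+}\}}{\ln\{1 - \frac{p}{f}\}} \]
As $\ln\{1 - \frac{p}{f}\} < 0$, multiplying through by it reverses the inequalities; thus,
\[ \ln\{1 - q_1^{+}\} \le \ln\{1 - \tilde{q}_1\} \le \ln\{1 - q_1^{-}\} \]
Exponentiating and rearranging, the confidence interval requirement becomes
\[ q_1^{-} \le \tilde{q}_1 \le q_1^{+} \]
As $E[X_1]$ and $\mathrm{Var}(X_1)$ are functions of $q_1$, denoting $E[X_1]$ by $\mu\{q_1\}$, $\mathrm{Var}(X_1)$ by $\sigma^2\{q_1\}$,
and the observed average value of $X_1$ from the $n$ frames by $\tilde{X}_1$, we have $\tilde{q}_1 = \mu^{-1}\{\tilde{X}_1\}$.
Using $\mu^{-1}\{\tilde{X}_1\}$ to substitute $\tilde{q}_1$ in the expression above, we get
\[ q_1^{-} \le \mu^{-1}\{\tilde{X}_1\} \le q_1^{+} \ \Rightarrow\ \mu\{q_1^{-}\} \le \tilde{X}_1 \le \mu\{q_1^{+}\} \]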
Because averaging n independent repetitions of the same experiment reduces the variance by a
factor of n, running n rounds and getting n frames reduces the variance of $\tilde{X}_1$ to
$\sigma^2\{q_1\}/n$ and its standard deviation to $\sigma\{q_1\}/\sqrt{n}$. Let $Z$ denote
$\frac{\tilde{X}_1 - \mu\{q_1\}}{\sigma\{q_1\}/\sqrt{n}}$. Thus, the expression above becomes
\[ \frac{\mu\{q_1^{-}\} - \mu\{q_1\}}{\sigma\{q_1\}/\sqrt{n}} \le Z \le \frac{\mu\{q_1^{+}\} - \mu\{q_1\}}{\sigma\{q_1\}/\sqrt{n}} \tag{2.21} \]

By the central limit theorem, Z approximates a standard normal random variable. The
area under the standard normal curve gives the success probability, which is the required
reliability in our context. For the confidence interval to be symmetric on both the upper
and lower sides of the population size as per the first of the three conditions, the absolute
value of the upper and lower limits of Z should be equal. Let k represent the absolute value
of these upper and lower limits. Thus, we can represent Z as follows:
−k ≤ Z ≤ k

(2.22)

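From Expressions (2.21) and (2.22), we get
\[ \frac{\mu\{q_1^{-}\} - \mu\{q_1\}}{\sigma\{q_1\}/\sqrt{n}} = -k, \qquad \frac{\mu\{q_1^{+}\} - \mu\{q_1\}}{\sigma\{q_1\}/\sqrt{n}} = k \tag{2.23} \]
As the absolute values of the right hand sides (R.H.S.) of both equations above are k, adding
the two equations gives
\[ 2\mu\{q_1\} - \mu\{q_1^{+}\} - \mu\{q_1^{-}\} = 0 \tag{2.24} \]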

The equation above gives the condition that needs to be satisfied to make the confidence
interval symmetric around the tag population size. Figure 2.4 plots the value of left hand
side (L.H.S) of this equation as a function of p for three different values of β. We can see
that it is a well behaved function of p and thus, there exists a unique value of p that makes
it equal to zero. Furthermore, we also observe that all the curves cross the zero line at the
same point which gives us a hint that the solution to the equation above is independent of
β. Next we solve this equation.
Figure 2.4 Equation (2.24) as a function of p (axes: persistence probability p vs. function value; curves: β = 0.05, β = 0.04, β = 0.03)

Applying the first order Taylor series expansion on $\mu\{q_1\}$, we get $\mu\{q_1\} = E[Y_1]/E[R_1]$.
Using the expressions of $E[Y_1]$ and $E[R_1]$ from Equations (2.2) and (2.4) respectively, we
can express $\mu\{q_1\}$, $\mu\{q_1^{+}\}$, and $\mu\{q_1^{-}\}$ as follows:
\[ \mu\{q_1\} = \frac{f q_1}{q_1 \{ q_1 + f(1 - q_1) \}} \]
\[ \mu\{q_1^{+}\} = \frac{f q_1^{+}}{q_1^{+} \{ q_1^{+} + f(1 - q_1^{+}) \}} \]
\[ \mu\{q_1^{-}\} = \frac{f q_1^{-}}{q_1^{-} \{ q_1^{-} + f(1 - q_1^{-}) \}} \]
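Substituting these expressions in Equation (2.24), we get
\[ \frac{2}{q_1 + f(1 - q_1)} - \frac{1}{q_1^{+} + f(1 - q_1^{+})} - \frac{1}{q_1^{-} + f(1 - q_1^{-})} = 0 \]
Substituting the values of $q_1$, $q_1^{+}$, and $q_1^{-}$ from Equations (2.1), (2.18), and (2.19) respectively,
into the equation above, and to simplify the presentation, using $\eta = (1 - \frac{p}{f})^{t}$, we get
\[ \frac{2}{1 - \eta + f\eta} - \frac{1}{1 - \eta^{1+\beta} + f\eta^{1+\beta}} - \frac{1}{1 - \eta^{1-\beta} + f\eta^{1-\beta}} = 0 \]
Next, we do algebraic simplification of the expression above.
\begin{align*}
&-\{1 - \eta + f\eta\} \{ (1 - \eta^{1+\beta} + f\eta^{1+\beta}) + (1 - \eta^{1-\beta} + f\eta^{1-\beta}) \} \\
&\quad + 2 \{1 - \eta^{1+\beta} + f\eta^{1+\beta}\} \{1 - \eta^{1-\beta} + f\eta^{1-\beta}\} = 0
\end{align*}
Dividing the equation above by $\eta^{1-\beta}$, we get
\begin{align*}
&-\{1 - \eta + f\eta\} \{ 2\eta^{\beta-1} - \eta^{2\beta} + f\eta^{2\beta} - 1 + f \} \\
&\quad + 2 \{1 - \eta^{1+\beta} + f\eta^{1+\beta}\} \{ \eta^{\beta-1} - 1 + f \} = 0
\end{align*}
Simplifying the equation above, we get
\begin{align*}
& (f - 1) + \eta^{2\beta} \{ -1 + f + 2f\eta - \eta - f^2\eta \} + 2\eta^{\beta} \{ \eta(f - 1)^2 + 1 - f \} - \eta \{ 1 - 2f + f^2 \} = 0 \\
\Rightarrow\ & (f - 1) + \eta^{2\beta} \{ (f - 1) - \eta(f - 1)^2 \} + 2\eta^{\beta} \{ \eta(f - 1)^2 - (f - 1) \} - \eta(f - 1)^2 = 0
\end{align*}
Dividing the equation above by $f - 1$ and simplifying, we get
\[ \{ 1 - \eta(f - 1) \} (1 - \eta^{\beta})^2 = 0 \]
In the equation above, $1 - \eta(f - 1) = 0$ or $1 - \eta^{\beta} = 0$ (or both). The value of $1 - \eta^{\beta}$ equals
zero only when $\beta = 0$, but we know from our problem statement that $\beta \in (0, 1]$, i.e., $\beta \ne 0$.
Therefore, $1 - \eta(f - 1) = 0$. Putting back $\eta = (1 - \frac{p}{f})^{t}$ and solving $1 - (f - 1)(1 - \frac{p}{f})^{t} = 0$
for $p$, we get
\[ p = f \left\{ 1 - \Big( \frac{1}{f - 1} \Big)^{\frac{1}{t}} \right\} \tag{2.25} \]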

Note that this equation does not involve β, which shows that indeed the solution to Equation
(2.24) is independent of β as we had intuitively inferred from Figure 2.4.
Equation (2.25) is the first of the three equations that we will solve simultaneously. This
equation requires the value of the actual tag population size t, which we do not know. Fortunately,
we can calculate an upper bound, tm , on the actual tag population size and use that in
Equation (2.25) instead of t. We will describe a method to obtain tm in Section 2.5.4,
and also determine how close tm has to be to t to ensure that ART achieves the required
reliability.
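Equation (2.25) is straightforward to evaluate numerically. The following sketch (the function name is hypothetical) computes p for a given frame size f and a population size, or upper bound, t.

```python
def persistence_probability(f, t):
    """Equation (2.25): p = f * (1 - (1 / (f - 1)) ** (1 / t))."""
    if f <= 1 or t <= 0:
        raise ValueError("requires frame size f > 1 and population size t > 0")
    return f * (1.0 - (1.0 / (f - 1)) ** (1.0 / t))


for t in (10**2, 10**4, 10**6):
    print(t, f"{persistence_probability(15, t):.2E}")
```

With f = 15, the printed values agree with the p column of Table 2.1 for the rows where fop = 15 (approximately 3.91E-01, 3.96E-03, and 3.96E-05).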


2.5.2

Number of Rounds n

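Using the persistence probability calculated in Equation (2.25), the two equations in (2.23)
hold. From them, we get
\[ \left\{ \frac{k \sigma\{q_1\}}{\mu\{q_1^{+}\} - \mu\{q_1\}} \right\}^2 = n = \left\{ \frac{-k \sigma\{q_1\}}{\mu\{q_1^{-}\} - \mu\{q_1\}} \right\}^2 \tag{2.26} \]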

Let $\Phi$ be the cumulative distribution function of a standard normal distribution and $\mathrm{erf}\{.\}$
be the standard error function. Then we get
\[ P\{-k \le Z \le k\} = \Phi(k) - \Phi(-k) = \mathrm{erf}\left\{ \frac{k}{\sqrt{2}} \right\} \tag{2.27} \]

P {−k ≤ Z ≤ k} gives the success probability in terms of the area under the standard normal
curve between −k and +k. As per the second of the three conditions, this area should be at
least equal to the required reliability α i.e.,
P {−k ≤ Z ≤ k} = α

(2.28)

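From Equations (2.27) and (2.28), we get
\[ k = \sqrt{2}\, \mathrm{erf}^{-1}\{\alpha\} \tag{2.29} \]
From Equations (2.26) and (2.29), we get
\[ \left\{ \frac{\sqrt{2}\, \mathrm{erf}^{-1}\{\alpha\} \times \sigma\{q_1\}}{\mu\{q_1^{+}\} - \mu\{q_1\}} \right\}^2 = n = \left\{ \frac{-\sqrt{2}\, \mathrm{erf}^{-1}\{\alpha\} \times \sigma\{q_1\}}{\mu\{q_1^{-}\} - \mu\{q_1\}} \right\}^2 \tag{2.30} \]
Equation (2.30) is the second of the three equations that we will solve simultaneously.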
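The constant k in Equation (2.29) can be computed without an explicit inverse error function, because $\sqrt{2}\,\mathrm{erf}^{-1}\{\alpha\}$ is the standard normal quantile at $(1 + \alpha)/2$. The sketch below uses Python's standard library for this. The function names are hypothetical, the inputs sigma, mu_plus, and mu stand for $\sigma\{q_1\}$, $\mu\{q_1^{+}\}$, and $\mu\{q_1\}$ and must be supplied by the caller, and rounding n up to an integer is an assumption.

```python
from math import ceil, erf, sqrt
from statistics import NormalDist


def confidence_multiplier(alpha):
    """k with P{-k <= Z <= k} = alpha for a standard normal Z (Equation 2.29)."""
    # sqrt(2) * erfinv(alpha) equals the normal quantile at (1 + alpha) / 2.
    return NormalDist().inv_cdf((1.0 + alpha) / 2.0)


def rounds_needed(alpha, sigma, mu_plus, mu):
    """n from the left-hand expression of Equation (2.30), rounded up (assumption)."""
    k = confidence_multiplier(alpha)
    return ceil((k * sigma / (mu_plus - mu)) ** 2)


k = confidence_multiplier(0.95)
print(round(k, 4), round(erf(k / sqrt(2)), 4))  # k and the reliability it yields
```

For α = 95% this gives k ≈ 1.96, and substituting k back into Equation (2.27) recovers the reliability of 0.95.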

2.5.3

Optimal Frame Size f

As per the third of the three conditions, the total estimation time should be minimized. The
total estimation time is directly proportional to the total number of slots, (f + l) × n, which is
a convex function of f as seen from Figure 2.5. This means that an optimal frame size fop
exists and can be obtained by differentiating (f + l) × n with respect to f and setting the
derivative to zero, as shown below:
\[ \frac{d}{df} \{ (f + l) \times n \} = 0 \tag{2.31} \]

Figure 2.5 Total estimation time vs. frame size (axes: frame size f vs. (f + 3) × n)

Figure 2.6 Expected value of actual reliability vs. tm/t (axes: tm/t vs. expected reliability; markers: L_tm and U_tm)

Equation (2.31) is the third of the three equations that we will solve simultaneously.
Required reliability α and confidence interval β are given constants and tm is calculated
using the method proposed in Section 2.5.4. Thus, $p$, $q_1$, $q_1^{+}$, and $q_1^{-}$ are all functions
of f . Consequently, n is a function of f and, therefore, (f + l) × n is also a function of f
with only one unknown, i.e., f . The numerical solution of Equation (2.31) gives the optimal
value of frame size, represented by fop .
To numerically solve Equation (2.31), we substitute the value of n from Equation (2.30) in
Equation (2.31). As both expressions for n given in Equation (2.30) have the same value when
p is calculated using Equation (2.25), either of them can be used to calculate n. Substituting
n in Equation (2.31) by the L.H.S. of the expression for n in Equation (2.30), we get
\[ \big\{ \mu\{q_1^{+}\} - \mu\{q_1\} \big\} \left\{ \sigma\{q_1\} + 2(f + l) \frac{\partial \sigma\{q_1\}}{\partial f} \right\} - 2(f + l)\, \sigma\{q_1\} \left\{ \frac{\partial \mu\{q_1^{+}\}}{\partial f} - \frac{\partial \mu\{q_1\}}{\partial f} \right\} = 0 \tag{2.32} \]
where $\frac{\partial \mu\{.\}}{\partial f}$ and $\frac{\partial \sigma\{.\}}{\partial f}$ are obtained through the differentiation of the expressions for $E[X_b]$ and
$\mathrm{Var}(X_b)$ in Equations (2.12) and (2.13), respectively. We solve Equation (2.32) numerically
to obtain fop .

2.5.3.1

Summary of steps to calculate p, n, and fop

First, we calculate the value of tm , as explained in the next Section 2.5.4. Second, we
numerically solve Equation (2.32) to obtain fop. Third, we put this value of fop along with
tm in Equation (2.25) to obtain the value of p. Last, we put the resulting value of p along
with fop in Equation (2.30) and obtain the value of n. Note that although Equation (2.25)
does not involve α and β, p still depends on them because it is a function of f and the
optimal value of f depends on α and β.
Table 2.1 shows the values of p, n, and fop for different accuracy requirements and tag
population sizes calculated using the steps described above. We observe from this table
that for a given tag population size, as the value of α increases and/or β decreases, the
value of n increases to fulfill the more stringent accuracy requirements. We also observe
from this table that for a given (α, β) pair, the values of fop and n are the same for all
tag population sizes, which shows that total number of slots, (fop + l) × n, depends only
on the accuracy requirements and is independent of tag population size. We will formally
prove the independence of estimation time from tag population size in Section 2.7.1. We
further observe that as the tag population size increases, the value of p decreases to reduce
the number of tags participating in a frame to keep the value of fop and n independent of
tag population size.

Table 2.1 Values of fop , n, and p for different values of α, β and tag population size

Accuracy               |  Tag population size 10^2  |  Tag population size 10^4  |  Tag population size 10^6
Requirement            |  fop  n         p          |  fop  n         p          |  fop  n         p
α = 60.0%, β = 40.0%   |  12   1.00E+00  2.84E-01   |  12   1.00E+00  2.88E-03   |  12   1.00E+00  2.88E-05
α = 70.0%, β = 30.0%   |  14   2.00E+00  3.55E-01   |  14   2.00E+00  3.59E-03   |  14   2.00E+00  3.59E-05
α = 80.0%, β = 20.0%   |  15   4.00E+00  3.91E-01   |  15   4.00E+00  3.96E-03   |  15   4.00E+00  3.96E-05
α = 90.0%, β = 10.0%   |  15   2.50E+01  3.91E-01   |  15   2.50E+01  3.96E-03   |  15   2.50E+01  3.96E-05
α = 95.0%, β = 5.00%   |  15   1.43E+02  3.91E-01   |  15   1.43E+02  3.96E-03   |  15   1.43E+02  3.96E-05
α = 99.0%, β = 1.00%   |  15   6.24E+03  3.91E-01   |  15   6.24E+03  3.96E-03   |  15   6.24E+03  3.96E-05
α = 99.9%, β = 0.10%   |  15   1.02E+06  3.91E-01   |  15   1.02E+06  3.96E-03   |  15   1.02E+06  3.96E-05

2.5.4

Obtaining Population Upper Bound tm

So far we have assumed the knowledge of an upper bound tm on tag population size t.
We now present a fast scheme to obtain tm based on Flajolet and Martin’s probabilistic
counting algorithm [47]. Before calculating system parameters p, n, and fop , the reader
uses this scheme to obtain tm . In this scheme, the reader keeps issuing single-slot frames,
where the persistence probability p decreases geometrically starting from p = 1
(i.e., $p = \frac{1}{2^{i-1}}$ in the ith frame), until the reader gets an empty slot. Suppose the empty
slot occurred in the ith frame, then $t_m = 1.2897 \times 2^{i-2}$ is an upper bound on t [90, 47].
According to [47], tm asymptotically approaches t when, instead of using a single value of the
first empty slot from one experiment, we use the average of the values of the first empty slot
from a large number of experiments.
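The scheme above can be mimicked in a short simulation. This is a sketch under stated assumptions rather than the reader's actual protocol logic: each tag is modeled as replying independently with probability p in a single-slot frame, the function names are hypothetical, and the random generator is passed in so that runs are repeatable.

```python
import random


def first_empty_frame(t, rng):
    """Index i of the first empty single-slot frame when p = 1 / 2^(i-1) in frame i."""
    i = 1
    while True:
        p = 1.0 / 2 ** (i - 1)
        # The slot is empty only if none of the t tags decides to reply.
        if not any(rng.random() < p for _ in range(t)):
            return i
        i += 1


def upper_bound(i):
    """t_m = 1.2897 * 2^(i-2), from the index i of the first empty frame."""
    return 1.2897 * 2 ** (i - 2)


rng = random.Random(1)
i = first_empty_frame(1000, rng)
print(i, upper_bound(i))
```

Because p = 1 in the first frame, every tag replies there, so for a non-empty population the first empty slot can occur no earlier than the second frame.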
Next, we determine how close the upper bound tm has to be to the actual tag population
size to ensure that ART achieves the required reliability and examine whether tm obtained
using $t_m = 1.2897 \times 2^{i-2}$ lies close enough to t. We derive an expression to calculate the
expected value of actual reliability, denoted by $\tilde{\alpha}$, as a function of tm given that the required
reliability α, confidence interval β, and the actual tag population size t are known.
Equation (2.30) is obtained using the condition that actual reliability should be greater
than or equal to the required reliability. Therefore, we use this equation to derive an expression
for the expected value of actual reliability. In Equation (2.30), we calculate n using $q_1$, $q_1^{+}$,
and $q_1^{-}$, which are obtained from Equations (2.1), (2.18) and (2.19), respectively, by putting
$t = t_m$, $f = f_{op}$, and $p = f_{op} \left\{ 1 - \big( \frac{1}{f_{op} - 1} \big)^{\frac{1}{t_m}} \right\}$. This gives us:
\[ q_1 = 1 - \frac{1}{f_{op} - 1}, \qquad q_1^{\pm} = 1 - \Big( \frac{1}{f_{op} - 1} \Big)^{1 \pm \beta} \tag{2.33} \]

As the number of tags in the population is t and not tm , when the reader executes the
frames, the actual values of $q_1$, $q_1^{+}$, and $q_1^{-}$, represented by $\hat{q}_1$, $\hat{q}_1^{+}$, and $\hat{q}_1^{-}$, respectively,
follow the equations below.
\[ \hat{q}_1 = 1 - \Big( \frac{1}{f_{op} - 1} \Big)^{\frac{t}{t_m}}, \qquad \hat{q}_1^{\pm} = 1 - \Big( \frac{1}{f_{op} - 1} \Big)^{\frac{t}{t_m}(1 \pm \beta)} \tag{2.34} \]

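Let $\tilde{\alpha}$ represent the expected value of actual reliability in n rounds when the population
contains t tags and the calculated upper bound is tm . Then the following equality holds.
\[ \left\{ \frac{\sqrt{2}\, \mathrm{erf}^{-1}\{\tilde{\alpha}\} \times \sigma\{\hat{q}_1\}}{\mu\{\hat{q}_1^{+}\} - \mu\{\hat{q}_1\}} \right\}^2 = n = \left\{ \frac{-\sqrt{2}\, \mathrm{erf}^{-1}\{\tilde{\alpha}\} \times \sigma\{\hat{q}_1\}}{\mu\{\hat{q}_1^{-}\} - \mu\{\hat{q}_1\}} \right\}^2 \]
Substituting the value of n from Eq. (2.30) into the equation above and solving for $\tilde{\alpha}$, we get
\begin{align*}
\tilde{\alpha} &= \mathrm{erf}\left\{ \mathrm{erf}^{-1}\{\alpha\} \times \frac{\sigma\{q_1\}}{\sigma\{\hat{q}_1\}} \times \frac{\mu\{\hat{q}_1^{+}\} - \mu\{\hat{q}_1\}}{\mu\{q_1^{+}\} - \mu\{q_1\}} \right\} \\
&= \mathrm{erf}\left\{ \mathrm{erf}^{-1}\{\alpha\} \times \frac{\sigma\{q_1\}}{\sigma\{\hat{q}_1\}} \times \frac{\mu\{\hat{q}_1^{-}\} - \mu\{\hat{q}_1\}}{\mu\{q_1^{-}\} - \mu\{q_1\}} \right\} \tag{2.35}
\end{align*}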

The expected actual reliability $\tilde{\alpha}$ is a unimodal function of $t_m/t$ (rising to a single maximum
and then falling) and is equal to α for two values of $t_m/t$, represented by $L_{t_m}$ and $U_{t_m}$.
Figure 2.6 plots the expected value of actual reliability $\tilde{\alpha}$ as a function of $t_m/t$ using
Equation (2.35) with α = 95% and β = 5%. The dashed horizontal line in the figure marks the
required reliability α = 95%. The actual reliability will be greater than or equal to the
required reliability as long as the value of $t_m/t$ satisfies the following condition:
\[ L_{t_m} \le \frac{t_m}{t} \le U_{t_m} \tag{2.36} \]
The values of $L_{t_m}$ and $U_{t_m}$ can be obtained by using $\tilde{\alpha} = \alpha$ in Equation (2.35), solving
it for tm , and dividing the result by the tag population size t. This results in two values of
$t_m/t$ because $\tilde{\alpha}$ is a unimodal function of $t_m/t$ and its maximum is greater than α. The value
of $L_{t_m}$ is always equal to 1 and the value of $U_{t_m}$ is calculated by the numerical solution of
Equation (2.35) using $\tilde{\alpha} = \alpha$.
The value of Utm depends on the required reliability α and confidence interval β. Table
2.2 tabulates the values of Utm for different population sizes and accuracy requirements. We
observe from Table 2.2 that the value of Utm is independent of tag population size. This
is because Utm depends on q1 for a given α and β (according to Equation 2.35) and q1 is
independent of tag population size as we will discuss in Section 2.7.1. We also observe that
Utm decreases with increasing accuracy requirements. This makes intuitive sense because the
higher the required accuracy, the lesser the error in the upper bound tm that can be tolerated.

Figure 2.7 Experimentally observed values of ratio tm/t (axes: tag population size t vs. tm/t)

We see from Table 2.2 that even for very high accuracy requirements of α = 99.99% and
β = 0.01%, the value of tm calculated as $t_m = 1.2897 \times 2^{i-2}$ can be up to 1.64 × t.

Table 2.2 Utm for different population sizes and accuracy requirements

Accuracy                 |  Tag Population Size
Requirement              |  10^3   10^4   10^5   10^6
α = 90.00%, β = 10.0%    |  1.83   1.83   1.83   1.83
α = 95.00%, β = 5.00%    |  1.71   1.71   1.71   1.71
α = 99.00%, β = 1.00%    |  1.66   1.66   1.66   1.66
α = 99.90%, β = 0.10%    |  1.64   1.64   1.64   1.64
α = 99.99%, β = 0.01%    |  1.64   1.64   1.64   1.64


From simulations, we have observed that the value of tm calculated as $t_m = 1.2897 \times 2^{i-2}$
always lies within t and 1.64 × t. This is seen in Figure 2.7, where we plot the observed values
of $t_m/t$ obtained through 100 runs of simulations using $t_m = 1.2897 \times 2^{i-2}$ for different values
of tag population size. Within each simulation run, we obtained 10 values of i, averaged
them, and replaced i with that average in the equation $t_m = 1.2897 \times 2^{i-2}$ to obtain $t_m/t$.

2.6

ART — Practical Considerations

In this section, we describe how ART estimates sizes of arbitrarily large tag populations. We
also present the method that ART employs to enable the use of multiple RFID readers for
estimating the size of a given RFID tag population.


2.6.1

Unbounded Tag Population Size

For a given value of frame size f , Theorem 5 calculates the upper bound tM on the number of
tags that ART can estimate. This upper bound exists because for tag population sizes larger
than tM , the system parameters take on values that cannot be implemented practically.
After Theorem 5, we describe how we extend ART to estimate sizes of arbitrarily large
populations.
Theorem 5. For a given frame size f > 1, the maximum number of tags tM that ART can
estimate is:
\[ t_M = -\frac{\ln\{f - 1\}}{\ln\left\{ 1 - \frac{1}{2^{15}} \right\}} \tag{2.37} \]

Proof. In theory, we can increase the estimation scope of ART to any population size by
decreasing the value of p according to Equation (2.25). In practice, however, f /p has a
maximum value of $2^{15} - 1$. Recall that in ART, the reader announces a virtual frame size of
f /p (although it terminates the frame after the first f slots) and each tag uses the result of a
hash function h to select a slot in the range [1, f /p]. The number of bits to store the result
of the hash function is specified to be 15 in the C1G2 standard. Thus, the maximum value
of f /p can be $2^{15} - 1$, i.e.,
\[ p > \frac{f}{2^{15}} \]
Substituting the value of p from Equation (2.25) into the equation above, we get
\[ f \left\{ 1 - \Big( \frac{1}{f - 1} \Big)^{\frac{1}{t}} \right\} > \frac{f}{2^{15}} \]
Rearranging the expression above and solving for t, we get
\[ t < -\frac{\ln\{f - 1\}}{\ln\left\{ 1 - \frac{1}{2^{15}} \right\}} = t_M \]

As an example, with f = 15, tM is just 86,475. Practically, ART achieves the required
reliability only for tag populations smaller than tM . If the population size is larger than tM , ART
requires $p \le \frac{f}{2^{15}}$, which is practically not possible with C1G2 RFID tags. This limitation
exists with all the existing estimation schemes but has never been addressed before.
Next, we present a strategy to estimate the sizes of arbitrarily large tag populations. The
key idea is to first divide the entire population into smaller sub-populations of roughly equal
sizes and then estimate the size of each sub-population independently. At the end, adding
the estimated sizes of all sub-populations gives the estimate of the number of tags in the entire
population. The size of any sub-population should not require $\frac{f}{p} \ge 2^{15}$.
We first calculate the number of sub-populations that ART should divide a given tag
population into and then present a strategy to perform this division virtually (i.e., requiring
no manual division of tags). The maximum number of tags that a sub-population can have
is given by Equation (2.37). Therefore, the minimum number of sub-populations that the
entire tag population should be divided into is $\lceil t_m / t_M \rceil$, where tm is calculated as explained in
Section 2.5.4.
To divide the tag population into sub-populations, we use the SELECT command standardized in the C1G2 standard. The ID of a tag is stored in its memory at a specific memory
address. The tag can retrieve any bits stored in its memory by specifying an appropriate
address range. Using the SELECT command, a reader can broadcast an address range and a
bit mask that specifies which tags should participate in an Aloha frame. Each tag compares
the bit mask with the bits in the specified address range in its memory and participates in
the frame only if the bit mask matches the specified bits in its memory. To divide the whole
population into sub-populations of roughly equal sizes, we leverage the fact that in large
populations, the expected number of tags whose IDs have the least significant bit (LSB) of
0 is approximately the same as the expected number of tags whose IDs have the LSB of 1.
Similarly, the expected number of tags whose IDs have the two LSBs of 00 is approximately
the same as the expected number of tags whose IDs have the two LSBs of 01, 10, or 11, and
so on. Therefore, a reader can divide the tag population into $2^z$ groups of roughly equal
sizes by specifying appropriate masks for the address range corresponding to the z LSBs of
tag IDs. The value of z is given by $\lceil \log_2 (t_m / t_M) \rceil$.

To summarize, a reader first obtains the value of the upper bound tm . Second, it calculates
the values of n and fop . Third, it calculates the value of tM using Equation (2.37). Fourth,
it calculates $z = \lceil \log_2 (t_m / t_M) \rceil$. Fifth, it executes $2^z$ independent estimation rounds for
required reliability α and confidence interval β, where in each round it uses the SELECT command
with a unique z-bit mask for the z LSBs of the tag IDs. In each independent estimation
round, it uses $p = f_{op} \left\{ 1 - \big( \frac{1}{f_{op} - 1} \big)^{\frac{1}{t_m / 2^z}} \right\}$. Finally, it adds up all $2^z$ estimates to obtain
the estimate of the total number of tags in the population.
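The steps summarized above can be sketched as follows. The helper names are hypothetical. Rounding z up to the next integer is an assumption about how the logarithm is turned into a bit count, and the mask test is only a simplified stand-in for what the SELECT command achieves, not an implementation of the C1G2 command itself.

```python
import math

HASH_BITS = 15  # bits of the hash result in the C1G2 standard, per Theorem 5


def max_population(f):
    """Equation (2.37): t_M = -ln(f - 1) / ln(1 - 1 / 2^15)."""
    return -math.log(f - 1) / math.log(1.0 - 1.0 / 2 ** HASH_BITS)


def mask_bits(t_m, t_M):
    """Number z of LSBs to mask so each of the 2^z groups has at most t_M tags."""
    # Assumption: round log2(t_m / t_M) up, and use z = 0 when t_m <= t_M.
    return max(0, math.ceil(math.log2(t_m / t_M)))


def sub_population_p(f_op, t_m, z):
    """Persistence probability for one sub-population with upper bound t_m / 2^z."""
    return f_op * (1.0 - (1.0 / (f_op - 1)) ** (2 ** z / t_m))


def participates(tag_id, mask, z):
    """True if the z least significant bits of tag_id equal mask."""
    return (tag_id & ((1 << z) - 1)) == mask


t_M = max_population(15)
z = mask_bits(200_000, t_M)
print(math.floor(t_M), z, sub_population_p(15, 200_000, z))
```

With f = 15 the computed bound is 86,475, matching the example given after Theorem 5. For an illustrative upper bound of 200,000 tags the sketch selects z = 2, i.e., four sub-populations of roughly 50,000 tags each.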

2.6.2

ART with Multiple Readers

We next discuss how to obtain tm and t˜ using multiple readers with overlapping coverage. To
obtain tm using multiple readers, we can let each reader obtain the tm value on its own and
then sum them up as the final overall tm because of two reasons. First, our requirement on
tm is only a rough upper bound with an error tolerance of over 1.64 × t. Second, deployment
of multiple readers in practice often requires site surveys to ensure minimal overlapping
between readers.
To obtain t˜ using multiple readers, we adapt the approach proposed by Kodialam et al. in
[62], which uses a central controller for all readers. ART parameters β, α, tm , p, n, and fop
have the same value across all readers. When a reader transmits seed Ri in its ith frame, it
does not generate Ri on its own, rather it uses the ith seed Ri issued by the central controller.
That is, each reader generates the same sequence of n seeds. In the ith frames from different
readers, because all readers use the same seed Ri , the slot number that a given tag chooses
is the same (i.e., h(f, Ri , ID)) in the frame of each reader covering this tag. Once a reader
has completed its frame, it sends the frame to the central controller. The controller applies
the logical OR on all the ith frames from all readers, and gets a single ith frame as if using
a single reader. ART uses the n frames computed by logical OR to estimate the population
size.
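The controller's merging step can be illustrated with a minimal sketch. It assumes each reader reports its ith frame as a list of 0/1 slot values of equal length. The function name is hypothetical.

```python
def combine_frames(frames):
    """Slot-wise logical OR of the i-th frames reported by all readers."""
    if not frames:
        raise ValueError("at least one frame is required")
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all readers must report frames of the same size")
    # A slot is 1 in the merged frame if any reader observed it as 1.
    return [int(any(slots)) for slots in zip(*frames)]


print(combine_frames([[1, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]]))
```

Because all readers use the same seed, a tag covered by several readers occupies the same slot in each of their frames, so the OR neither duplicates nor loses that tag's reply.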


2.7

ART — Analysis

In this section, first we prove that the estimation time of ART is independent of the tag
population size. Second, we briefly discuss the computational complexity of ART. Last, we
perform an analytical comparison of ART with existing schemes to mathematically justify
the faster speed of ART compared to existing schemes.

2.7.1

Independence of Estimation Time from Tag Population Size

There are three inputs to ART: confidence interval β, required reliability α, and a population
of t tags where t is unknown. The total number of slots of ART, (fop + l) × n, actually
does not depend on t. Intuitively, the larger t is, the smaller p is according to Equation
(2.25). Although t plays an important role in computing p, n, and f individually, in formula
(fop + l) × n the impact of t eventually gets canceled out. Next, we prove this independence.
From Equation (2.30), we observe that the value of n depends on α, β, µ, σ and from
Equation (2.32), we observe that the value of fop depends upon β, µ, σ. Thus, the total
number of slots (fop +l)×n depends on α, β, µ, σ. The values of α and β are given constants
and µ and σ are functions of q1 , as seen from Equations (2.12) and (2.13). To prove that
(fop + l) × n is independent of t, we have to prove that q1 is independent of t. From Equation
(2.1), we have $q_1 = 1 - (1 - \frac{p}{f})^{t}$. As we do not know the value of t but know tm , we
use $q_1 = 1 - (1 - \frac{p}{f})^{t_m}$. Substituting the value of p using $t = t_m$ from Equation (2.25) into
this expression of $q_1$, we get
\[ q_1 = 1 - \left[ 1 - \frac{1}{f} \times f \left\{ 1 - \Big( \frac{1}{f - 1} \Big)^{\frac{1}{t_m}} \right\} \right]^{t_m} = \frac{f - 2}{f - 1} \tag{2.38} \]

Thus the value of q1 that we use to calculate µ and σ and consequently fop and n is
independent of tag population size t or the upper bound on tag population size tm . Therefore,
fop and n depend only on α and β regardless of the value of t or tm . The upper bound
on tag population size tm only affects the value of p. For ART to achieve the required
reliability, this upper bound has to satisfy the condition Ltm ≤ tm ≤ Utm . If tm > t × Utm ,
the required reliability will not be achieved because the value of p will become so small that
not enough tags will participate in the frames. Regardless, the value of (fop + l) × n
stays the same. We have seen from Figure 2.7 that for all practical purposes, the value of
tm satisfies the requirement Ltm ≤ tm ≤ Utm when calculated using the method proposed
in Section 2.5.4.
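
The cancellation in Equation (2.38) can be checked numerically. The following is a minimal sketch, assuming that Equation (2.25) has the form implied by the substitution in Equation (2.38), namely p = f(1 − (1/(f − 1))^(1/tm)); the function name `q1_for` is ours and is only illustrative.

```python
def q1_for(f: int, tm: int) -> float:
    """q1 = 1 - (1 - p/f)^tm, with p from the assumed form of Eq. (2.25)."""
    # Assumption: form of Equation (2.25) read off the substitution in Eq. (2.38).
    p = f * (1.0 - (1.0 / (f - 1)) ** (1.0 / tm))
    return 1.0 - (1.0 - p / f) ** tm


f = 16
for tm in (100, 10_000, 1_000_000):
    # The two printed values should agree up to floating-point error,
    # regardless of the upper bound tm.
    print(tm, q1_for(f, tm), (f - 2) / (f - 1))
```

Under this assumption, changing tm changes p but leaves q1 at (f − 2)/(f − 1), which is why fop and n do not depend on the population size.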

2.7.2

Computational Complexity

The two most computationally intensive tasks in ART are the numerical solutions of Equation (2.12) to obtain the estimate t̃ and of Equation (2.32) to calculate fop . Fortunately,
these two equations need to be solved numerically only once during the estimation process:
Equation (2.32) before executing the frames and Equation (2.12) after executing the frames.
Consequently, the runtime complexity of ART is no larger than that of a standardized Aloha
protocol. Almost all existing schemes involve numerical solutions of equations to obtain the
estimate t̃. Therefore, the off-line computational complexity of ART is comparable to that
of existing estimation schemes.

2.7.3

Analytical Comparison of Estimators

Next, we show that the ART estimator, namely the average run size of 1s, has less variance
than many other framed slotted Aloha based estimators, namely (1) the size of the first run
of 0s (used by FNEB [53]), (2) the average run size of 0s, (3) the total number of 0s (used
by UPE [61] and EZB [62]), (4) the total number of 1s, (5) the total number of runs of 0s,
and (6) the total number of runs of 1s. The higher the variance of an estimator, the more
rounds n are needed to improve reliability, and more rounds mean a longer estimation time.
Figure 2.8 shows the analytical plots of the variances of the ART estimator and the above
six estimators with frame size f = 16 versus tag population sizes. This figure shows that
the variance of the ART estimator is significantly lower than that of all other estimators. Runs of 1s
and runs of 0s have smaller variance compared to ART for very small tag population sizes.
This observation, however, is insignificant because both these quantities are non-monotonic
functions of tag population size and therefore cannot be used alone for estimation. The
variances of these estimators are calculated as follows. The variance of the total number
of 0s and 1s is calculated using Equation (2.3). The variance of the size of the first run is
calculated using Equation (3) in [102] by setting i = 1. The variance of the number of runs
of 0s and that of 1s is calculated using Equation (2.5). We emphasize that the plots in Figure
2.8 are not based on experimental results; instead, they are based on analytical formulas.

[Figure 2.8: plot of variance versus number of tags t for seven estimators: size of first run of 0s, avg. run size of 0s, total 0s, total 1s, runs of 1s, runs of 0s, and avg. run size of 1s.]

Figure 2.8 Variance of different estimators versus RFID tag population size

2.8

Performance Evaluation

We numerically evaluated in Matlab our ART scheme as well as four prior RFID estimation
schemes: UPE [61], EZB [62], FNEB [53], and MLE [72]. We did not evaluate LoF [90]
because it is non-compliant with C1G2, nor CSE [127] because it does not take accuracy
requirements as input. The estimation times for ART reported in this section include the
time required to obtain the value of tm . To ensure compliance with the C1G2 standard, in
all our simulations, each tag picks exactly one slot at the start of each frame as soon as the
reader broadcasts the frame size.
We first conduct a side-by-side comparison of estimation time between ART and
the four prior schemes. Then, we conduct experiments to show that ART indeed achieves
the required reliability.

2.8.1

Estimation Time

The results in Figures 2.9, 2.10, and 2.11 show that the estimation time of ART is significantly
smaller than all prior schemes. Note that in Figures 2.10 and 2.11, the plots for FNEB are
out of the range of the vertical axes, and the plots of UPE and EZB are almost overlapping.
We make three main observations from Figures 2.9 (a), (b), and (c), which show the
estimation time needed by each scheme with population sizes of up to one million tags for
different configurations of α and confidence interval β. First, we observe that ART is faster
than all four prior schemes in all these configurations. For α = 99.9% and β = 0.1%, ART
is 7 times faster than the fastest prior estimation schemes, which are UPE [61] and EZB
[62]. For α = 99% and β = 1%, ART is 1.96 times faster than UPE and EZB. For α = 95%
and β = 5%, ART is 1.68 times faster than UPE and EZB. Second, we observe that ART,
UPE, EZB, and MLE perform estimation in constant time, which is attributable to the use of
persistence probabilities. Third, we observe that FNEB, whose estimator is the size of the
first run of 0s, is the slowest. This concurs with our analysis in Figure 2.8, where
we show that FNEB has the largest variance. The larger the variance of an estimator, the
more rounds of execution are needed to achieve the required reliability, and the longer the
estimation time.
We make three main observations from Figures 2.10 (a), (b), and (c), which show the
estimation time of each scheme for 5000 tags with the required reliability α varying from
90% to 99.9% for different configurations of confidence interval β. First, we observe that
ART is faster than all four prior estimation schemes in all these configurations. Second,
the difference between the estimation time of ART and those of prior schemes increases as
the required reliability increases. For example, for β = 5% and α = 95%, ART is 1.68
times faster than UPE and EZB while for β = 0.1% and α = 99.9%, it is 7 times faster.
This shows that ART becomes more and more advantageous over existing schemes when
the required reliability increases. Third, for all schemes, the estimation time increases as
the required reliability increases because more rounds are needed to achieve the
required reliability. We further observe that ART’s estimation time increases at the lowest
rate as the required reliability increases because its estimator has the smallest variance.
We make two main observations from Figures 2.11 (a), (b), and (c), which show the
estimation time of each scheme for 5000 tags with the confidence interval β varying from
0.1% to 10% for different configurations of α. First, we observe that ART is faster than all
four prior estimation schemes in all these configurations. Second, for all schemes, the estimation time
decreases as the confidence interval increases because fewer rounds are needed to
achieve the required reliability.

[Figure 2.9: three panels, (a) α = 99.9%, β = 0.1%; (b) α = 99%, β = 1%; (c) α = 95%, β = 5%. Each panel plots estimation time (sec) versus number of tags t for FNEB, MLE, EZB, UPE, and ART.]

Figure 2.9 Estimation time vs. tag population size of ART and existing schemes

[Figure 2.10: three panels, (a) β = 0.1%; (b) β = 1%; (c) β = 5%. Each panel plots estimation time (sec) versus required reliability α for FNEB, MLE, EZB, UPE, and ART.]

Figure 2.10 Estimation time vs. required reliability for ART and existing schemes

[Figure 2.11: three panels, (a) α = 99.9%; (b) α = 99%; (c) α = 95%. Each panel plots estimation time (sec) versus confidence interval β for FNEB, MLE, EZB, UPE, and ART.]

Figure 2.11 Estimation time vs. confidence interval for ART and existing schemes

[Figure 2.12: three panels, (a) α = 99.9%, β = 0.1%; (b) α = 99%, β = 1%; (c) α = 95%, β = 5%. Each panel plots the actual reliability (AR) achieved by ART versus number of tags t.]

Figure 2.12 Actual reliability achieved by ART for three different requirements

2.8.2

Actual Reliability

The subfigures in Figure 2.12 show the actual reliability of ART versus the number of tags
for different configurations of required reliability α and confidence interval β. We observe
that ART always achieves the required reliability. These figures show several ups and downs
in the plotted values. These fluctuations are not caused by noise; they are visible
because of the magnification level of the vertical axis in these figures.

2.9

Conclusion

The key technical novelty of this chapter is in proposing the new estimator, the average
run size of 1s, for estimating RFID tag populations of arbitrarily large sizes. Using
analytical plots, we show that our estimator has much smaller variance compared to other
estimators including those used in prior work. It is this smaller variance that makes our
scheme faster than the previous ones. The key technical depth of this chapter is in the
mathematical development of the estimation theory using this estimator. ART can estimate
arbitrarily large tag populations with arbitrarily high accuracy. It works with single as well
as multiple readers. Our experimental results show that ART is significantly faster than all
prior RFID estimation schemes. We have shown, both theoretically and experimentally, that
the estimation time of ART is independent of the tag population size.

3

RFID Identification

3.1

Introduction

3.1.1

Background and Problem Statement

As the cost of commercial RFID tags, which is as low as 5 cents per tag [93], has become
negligible compared to the prices of the products to which they are attached, RFID systems
are being increasingly used in various applications such as supply chain management [67],
indoor localization [86], 3D positioning [123], object tracking [85], inventory control, electronic toll collection, and access control [46, 84]. For example, Walmart has started to use
RFID tags to track jeans and underwear for better inventory control. Large warehouses,
such as those of Amazon with sizes up to 1 million ft² [22], or distribution centers with sizes
up to 3 million ft² [1], contain hundreds of thousands of items. RFID systems can make the
inventory management and tracking in these large warehouses and distribution centers much
easier and error free. An RFID system consists of tags and readers. A tag is a microchip
combined with an antenna in a compact package that has limited computing power and
communication range. There are two types of tags: (1) passive tags, which do not have their
own power source, are powered up by harvesting the radio frequency energy from readers,
and have communication ranges often less than 20 feet; (2) active tags, which come with
their own power sources and have relatively longer communication ranges. A reader has a
dedicated power source with significant computing power. RFID systems mostly work in a
query-response fashion where a reader transmits queries to a set of tags and the tags respond
with their IDs over a shared wireless medium.
This chapter addresses the fundamental RFID tag identification problem, namely reading
all IDs of a given set of tags, which is needed in almost all RFID systems. Because tags
respond over a shared wireless medium, tag identification protocols are also called collision
arbitration, tag singulation, or tag anti-collision protocols. Tag identification protocols need
to be scalable as the number of tags that need to be identified could be as large as tens of
thousands with the increasing adoption of RFID tags. An RFID system with a large number
of tags may require multiple readers with overlapping regions. In this chapter, we first focus
on the single reader version of the tag identification problem and then extend our solution
to the multiple reader problem.

[Figure 3.1: two panels showing the binary tree over the tag ID space with nodes labeled (l, p) and the order in which nodes are visited; legend: Tag, Successful Read, Empty Read, Collision, Skipped Node. (a) Nodes visited using TW protocol; (b) Nodes visited using TH protocol.]

Figure 3.1 Identifying a population of 9 tags using TW and TH.

3.1.2

Summary and Limitations of Prior Art

The industrial standard, EPCGlobal Class 1 Generation 2 (C1G2) RFID [55], adopted two
tag identification protocols, namely framed slotted Aloha and Tree Walking (TW). In framed
slotted Aloha, a reader first broadcasts a value f to the tags in its vicinity where f represents
the number of time slots present in a forthcoming frame. Then each tag whose inventory
bit is 0 randomly picks a time slot in the frame and replies during that slot. Each C1G2
compliant tag has an inventory bit, which is initialized to be 0. In any slot, if exactly one tag
responds, the reader successfully gets the ID of that tag and issues a command to the tag to
change its inventory bit to 1. The key limitation of framed slotted Aloha is that it cannot
identify large tag populations due to the finite possible size of f. Qian et al. have shown that
framed slotted Aloha is most efficient when f is equal to the number of tags [89]. Therefore,
although theoretically any arbitrarily large tag population can be identified by indefinitely
increasing the frame size, practically this is infeasible because during the entire identification
process, Aloha based protocols require all tags, including those that have been identified, to
stay powered up and listen to all the messages from the reader in order to maintain the value
of the inventory bit. This results in high instability because any intermittent loss of power
at a tag will set its inventory bit back to 0, leading the tag to contend in the subsequent
frame. The instability of Aloha based protocols has formally been proven by Rosenkrantz
and Towsley in [94].
TW is a fundamental multiple access protocol, which was first invented by the U.S. Army for
testing soldiers for syphilis during World War II [44]. TW was proposed as an RFID tag
identification protocol by Law et al. in [66]. In TW, a reader first queries 0 and all the tags
whose IDs start with 0 respond. If the result of the query is a successful read (i.e., exactly one
tag responds) or an empty read (i.e., no tag responds), the reader queries 1 and all the tags
whose IDs start with 1 respond. If the result of the query is a collision, the reader generates
two new query strings by appending a 0 and a 1 at the end of the previous query string and
queries the tags with these new query strings. All the tags whose IDs start with the new
query string respond. This process continues until all the tags have been identified. This
identification process is essentially a partial Depth First Traversal (DFT) on the complete
binary tree over the tag ID space, and the actual traversal forms a binary tree where the leaf
nodes represent successful or empty reads and the internal nodes represent collisions. Nodes
on level l correspond to the lth most significant bit of the tag IDs. Figure 3.1(a) shows the tree
walking process for identifying 9 tags over a tag ID space of size $2^4$. Here a successful
read node is one that an identification protocol visits and there is exactly one tag in the
subtree rooted at this node, an empty read node is one that an identification protocol visits
and there is no tag in the subtree rooted at this node, and a collision node is one that an
identification protocol visits and there is more than one tag in the subtree rooted at this
node. The key limitation of TW based protocols is that they visit a large number of collision
nodes in the binary tree, which makes the identification process slow. Although several
heuristics have been proposed to reduce the number of visits to collision nodes [87, 83],
these heuristic-based methods are not guaranteed to minimize such futile visits. Prior
Aloha-TW hybrid protocols also have this limitation.
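
The TW procedure described above can be sketched in a few lines. This is a minimal simulation, assuming distinct b-bit tag IDs and an error-free channel; the function name `tree_walk` and the example IDs are ours and do not correspond to the population in Figure 3.1.

```python
def tree_walk(tag_ids, b):
    """Return (IDs in the order they are read, number of reader queries)."""
    bit_strings = [format(t, f"0{b}b") for t in tag_ids]
    identified, queries = [], 0
    stack = ["1", "0"]  # LIFO stack, so the prefix "0" is queried first
    while stack:
        prefix = stack.pop()
        queries += 1
        responders = [s for s in bit_strings if s.startswith(prefix)]
        if len(responders) == 1:
            # Successful read: exactly one tag matches the query string.
            identified.append(int(responders[0], 2))
        elif len(responders) > 1:
            # Collision: extend the query string by one bit on each side.
            stack.append(prefix + "1")
            stack.append(prefix + "0")
        # Empty read: nothing to do, move on to the next pending prefix.
    return identified, queries


ids, n_queries = tree_walk([0b0000, 0b0001], b=4)
print(ids, n_queries)
```

In this toy run, two tags that differ only in their last bit force the reader through three collision nodes before either tag is read, which illustrates why collision visits dominate the cost of TW.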

3.1.3

System Model

As most commercially available tags and readers already comply with the C1G2 standard, we
do not assume changes to either tags or their physical protocol. We assume that readers can
be reprogrammed to adopt new tag identification software. For reliable tag identification,
we are given the probability of successful query-response communication between the reader
and a tag.

3.1.4

Proposed Approach

To address the fundamental limitations that lie in the heuristic nature of prior TW based
protocols, we propose a new approach to tag identification called Tree Hopping (TH). The
key novel idea of TH is to formulate the tag identification problem as an optimization
problem and find the optimal solution that ensures either minimal expected number of
queries (i.e., nodes visited on the binary tree) or minimal expected identification time, as
per the requirement. In TH, we first quickly estimate the tag population size. Second, based
on the estimated tag population size, we calculate the optimal level to start tree traversal so
that the expected number of queries or expected identification time is minimal, hop directly
to the leftmost node on that level, and then perform DFT on the subtree rooted at that
node. Third, after that subtree is traversed, we re-estimate the size of remaining unidentified
tag population, re-calculate the new optimal level, hop directly to the new optimal node, and
perform DFT on the subtree rooted at that node. Hopping to optimal nodes in this manner
skips a large number of collision nodes. This process continues until all the tags have been
identified. Figure 3.1(b) shows the nodes traversed by TH for the same population of 9 tags
as in Figure 3.1(a). Here a skipped node is one that TW visits but TH does not. We can
see that TH traverses 11 nodes to identify these 9 tags. In comparison, TW traverses 16
nodes as shown in Figure 3.1(a). This difference scales significantly as tag population size
increases.
3.1.4.1

Population Size Estimation

TH first uses a framed slotted Aloha based method to quickly estimate the tag population
size. For this, TH requires each tag to respond to the reader with a probability q. As C1G2
compliant tags do not support this probabilistic responding, we implement this by “virtually”
extending the frame size 1/q times. To estimate the tag population size, the reader announces
a frame size of 1/q but terminates it after the first slot. To terminate a frame, the reader issues
a SELECT command, specified in the C1G2 standard, with its position, target, and action
parameters set to 0. This command “resets” all tags and they go into a state where they
expect a new frame to start. For further details on frame termination, see Section 6 of [55].
The reader issues several single-slot frames while reducing q geometrically
(i.e., $q = 1/2^{i-1}$ in the ith frame) until the reader gets an empty slot. Suppose the empty slot
occurred in the ith frame; TH then estimates the tag population size to be $1.2897 \times 2^{i-2}$ based
on Flajolet and Martin’s algorithm used in databases [47, 90].
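
The estimation loop above can be simulated as follows. This is only a sketch: it models each tag as responding independently with probability q in a single-slot frame, and the function name `estimate_population` is hypothetical.

```python
import random


def estimate_population(num_tags: int, rng: random.Random) -> float:
    """Issue single-slot frames with q = 1/2^(i-1) until a slot is empty."""
    i = 0
    while True:
        i += 1
        q = 1.0 / 2 ** (i - 1)
        # The slot is empty when no tag decides to respond in this frame.
        empty = not any(rng.random() < q for _ in range(num_tags))
        if empty:
            # Estimate from the text: 1.2897 x 2^(i-2).
            return 1.2897 * 2 ** (i - 2)


rng = random.Random(7)
print([estimate_population(1000, rng) for _ in range(5)])
```

A single run yields a coarse, power-of-two-spaced estimate, which is consistent with the text using it only as a quick starting point that is refined by re-estimation later.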
3.1.4.2

Finding Optimal Level

To determine the optimal level γop that TH directly hops to, we first calculate the expected
number of nodes that TH will visit or expected identification time that TH will take if it
starts DFTs from nodes on any given level γ. Let b be the number of bits in each tag ID
(which is 64 for C1G2 compliant tags), then, we have 1 ≤ γ ≤ b. If γ is small, more collision
nodes will be visited while if it is large, more empty read nodes will be visited. Our objective
is to calculate an optimal level γop that will, depending on the requirement, result in either
the smallest number of nodes visited or the smallest identification time. To find γop for
minimizing number of queries, we first derive the expression for calculating the expected
number of nodes visited by TH if TH directly hops to level γ. Then we calculate the value
of γ which minimizes this expression. This value of γ is the value of optimal level γop. We
present the technical details of finding γop in Section 3.3. In Section 3.4, we derive the
expression for calculating the expected identification time of TH if TH directly hops to level
γ. We use this expression to calculate γop when we need to minimize the identification time
instead of number of queries.
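
The trade-off between collision nodes (small γ) and empty read nodes (large γ) can be seen by brute force on a small example. The sketch below is a simplification: it starts a DFT from every node on a fixed level γ, without re-estimation and without the sibling-skipping rule used later in the chapter. The function name `count_queries` and the tag set are illustrative assumptions.

```python
def count_queries(tag_ids, b, gamma):
    """Count nodes visited when a DFT is started from every node on level gamma."""
    queries = 0
    stack = [(gamma, p) for p in range(2 ** gamma)]
    while stack:
        level, p = stack.pop()
        queries += 1
        # Tags in the subtree of node (level, p) share the same top `level` bits.
        covered = sum(1 for t in tag_ids if t >> (b - level) == p)
        if covered > 1:  # collision: descend to both children
            stack.append((level + 1, 2 * p))
            stack.append((level + 1, 2 * p + 1))
    return queries


tags = [0, 3, 5, 6, 9, 10, 12, 15]  # 8 evenly spread IDs in a 4-bit space
for gamma in range(1, 5):
    print(gamma, count_queries(tags, b=4, gamma=gamma))
```

For this evenly spread population the count is lowest at γ = 3: starting higher in the tree adds collision visits, while starting at the leaves adds empty reads.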
3.1.4.3

Population Size Re-estimation

If the tags that we want to identify are uniformly distributed in the ID space $[0, 2^b - 1]$,
then performing DFTs from each node on level γop will result in the minimum number of nodes
visited. However, in reality, the tags may not be uniformly distributed. In such cases, each
time the DFT of a subtree is finished, TH needs to re-estimate the total tag population
size to find the next optimal level and the hopping destination node. TH performs the re-estimation as follows. Let z be the first tag population size estimated using the Aloha based
method, x be the number of tags that have been identified, and s be the size of the tag
ID space covered by the nodes visited. Naturally, z − x is an estimate of the remaining
tag population size; however, we cannot use this estimate to calculate the next optimal level
because the remaining ID space may not form a complete binary tree. Instead, based
on the node density in the remaining ID space, TH extrapolates the total tag population
size to be $\frac{z-x}{2^b-s} \times 2^b$ and uses it to find the next hopping destination node. Note that if tags
are uniformly distributed, we have $\frac{z-x}{2^b-s} \times 2^b = z$.
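
As a small worked example of the extrapolation, the helper below evaluates (z − x)/(2^b − s) × 2^b directly. The function name `extrapolate_total` and the numbers are illustrative assumptions, not values from the experiments.

```python
def extrapolate_total(z: float, x: int, s: int, b: int) -> float:
    """Extrapolated total population: density of the uncovered space times 2^b."""
    remaining_space = 2 ** b - s
    if remaining_space <= 0:
        raise ValueError("the whole ID space has already been covered")
    return (z - x) / remaining_space * 2 ** b


# Uniform case: half of a 4-bit space covered, half of the 8 tags identified.
print(extrapolate_total(z=8, x=4, s=8, b=4))   # density unchanged, so 8.0
# Skewed case: 8 of 9 tags found after covering 12 of the 16 IDs.
print(extrapolate_total(z=9, x=8, s=12, b=4))  # sparser remainder, so 4.0
```

When the identified fraction matches the covered fraction, the extrapolated total equals z; when the remaining space is sparser, the extrapolated total drops, which moves the optimal level closer to the root.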

3.1.4.4

Finding Hopping Destination

Each time after a DFT is done and the new optimal level is recalculated, TH needs to find
the next node to hop to, which may not be the leftmost node on the optimal level. Consider
the example shown in Figure 3.1(b). Assuming a uniform distribution, the optimal level to
start the DFT is 3. In this chapter, we use (l, p) to denote the pth node on level l. TH
performs DFTs on the subtrees of nodes (3, 0) to (3, 5) and identifies 8 out of 9 tags. Based
on the number of remaining tags after the last DFT, which is 1, the optimal level for the
next hop is changed from 3 to 1. However, if TH starts the DFT from the leftmost node
on level 1, which is (1, 0), it will result in identifying all tags in its subtree again which is
wasteful. Similarly, if TH starts the DFT from the second leftmost node on level 1, which
is (1, 1), it will visit the subtree of (2, 2), which is wasteful as all the tags in the subtree of
(2, 2) have already been identified. Similarly, if there had been a third leftmost node on the
new optimal level and if TH starts the DFT from that third leftmost node, it will not visit
the subtree of (2, 3), resulting in tag (4, 13) not being identified. To avoid both scenarios,
i.e., some subtrees being traversed multiple times and some subtrees with tags not being
traversed, after the optimal level is recalculated, TH hops to the root of the largest subtree
that can contain the next tag to be identified but does not contain any previously identified
tag. The level at which this root is located cannot be smaller than the new optimal level.
For the example in Figure 3.1(b), after the subtree rooted at node (2, 2) has been traversed,
the recalculated optimal level is 1 and the next node that TH hops to is (2, 3).
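
One way to express this rule in code is sketched below. It assumes that DFTs proceed left to right, so the covered part of the ID space is always the prefix [0, s); the function name `hop_destination` is hypothetical.

```python
def hop_destination(s: int, b: int, gamma_op: int):
    """Return (level, position) of the largest subtree that starts at ID s.

    s        -- number of leading IDs already covered (assumed to be a prefix)
    b        -- number of bits in a tag ID
    gamma_op -- recalculated optimal level; the result is never above it
    """
    if s >= 2 ** b:
        return None  # the whole ID space has been traversed
    for level in range(max(1, gamma_op), b + 1):
        block = 2 ** (b - level)  # number of IDs under one node on this level
        if s % block == 0:
            # The subtree of (level, s // block) covers [s, s + block),
            # so it contains no previously covered ID.
            return (level, s // block)


# Example from the text: b = 4, nodes (3, 0) to (3, 5) cover IDs 0..11.
print(hop_destination(s=12, b=4, gamma_op=1))
```

For the example in Figure 3.1(b), level 1 is rejected because node (1, 1) overlaps the already traversed subtree of (2, 2), and the search settles on node (2, 3), matching the text.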
Our experimental results in Figure 3.2 show that when the tags are not uniformly distributed in the ID space, our technique of dynamically adjusting γop according to the leftover
population size significantly reduces the total number of queries and the average number of
responses per tag. The two curves “TH w re-estimation-Seq” and “TH w/o re-estimation-Seq” show the total number of queries needed, respectively, with and without the dynamic
adjustment of γop for non-uniformly distributed tag IDs. For example, for 10K tags, this
dynamic level adjustment reduces the total number of queries by 31.5%. Our experimental
results in Figure 3.2 also show that when the tags are uniformly distributed in the ID space,
there is no need to dynamically adjust γop . The two curves “TH w re-estimation-Uni” and
“TH w/o re-estimation-Uni” show the total number of queries needed, respectively, with and
without the dynamic adjustment for uniformly distributed tag IDs. These two curves are
similar because for uniformly distributed tag IDs, γop does not usually change after each
DFT and thus the benefit of dynamically adjusting γop is relatively small. Our experimental
results in Figure 3.2 further show that the performance of TH on non-uniformly distributed
populations is asymptotically the same as its performance on uniformly distributed populations when it uses the technique of dynamically adjusting γop according to the leftover
population size. The curve “TH w re-estimation-Seq” approaches the curves “TH w re-estimation-Uni” and “TH w/o re-estimation-Uni” as the tag population size increases.

[Figure 3.2: plot of # Queries / # Tags versus # Tags for four curves: TH w/o re-estimation-Seq, TH w re-estimation-Seq, TH w/o re-estimation-Uni, and TH w re-estimation-Uni.]

Figure 3.2 Impact of dynamic adjustment of γop on different types of populations.

3.1.4.5

Population Distribution Conversion

Although dynamically adjusting γop for a non-uniformly distributed population reduces the
number of queries, the number of queries is still not as low as it would have been had the
population been uniformly distributed. Furthermore, the extent of reduction depends on
the distribution and size of the population. In Section 3.5.1, we present a simple technique
that TH uses to virtually convert almost any non-uniformly distributed population into
a near-uniformly distributed population. The key idea is that instead of comparing the query
strings, transmitted by the reader, with the starting bits of the tag ID, each tag compares
the query string with the ending bits of its ID. The resulting binary tree has all the tags
near-uniformly distributed in the ID space. We will show that this can be implemented without any modifications to the physical communication protocol and the tags. This technique,
combined with the dynamic level adjustment, enables TH to identify any non-uniformly
distributed population in almost the same number of queries or time as for a uniformly distributed population of the same size. In what follows, we first assume that tags compare the
query string with the starting bits of their IDs, as in the TW protocol, until Section 3.5.1 where we
explain this technique in detail.
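
The effect of matching on ending bits can be illustrated on sequential IDs. The sketch below models the technique as placing each tag in the tree by its bit-reversed ID; this modeling choice, the helper `reverse_bits`, and the toy parameters are our assumptions, and the actual mechanism is the one described in Section 3.5.1.

```python
def reverse_bits(tag_id: int, b: int) -> int:
    """Reverse the b-bit binary representation of tag_id."""
    return int(format(tag_id, f"0{b}b")[::-1], 2)


b, level = 6, 3
ids = list(range(8))  # sequential IDs 000000 .. 000111

# Level-3 node each tag falls under when matching on starting bits.
leading = {t >> (b - level) for t in ids}
# Level-3 node each tag falls under when matching on ending bits.
trailing = {reverse_bits(t, b) >> (b - level) for t in ids}

print(sorted(leading), sorted(trailing))
```

With starting bits, all eight sequential tags fall under a single level-3 node; with ending bits, they spread over all eight level-3 nodes, which is the near-uniform layout that TH favors.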

3.2

Related Work

We review existing identification protocols, which can be classified as nondeterministic, deterministic, or hybrid.

3.2.1

Nondeterministic Identification Protocols

Existing such protocols are either based on framed slotted Aloha [129] or Binary Splitting
(BS) [35]. As we discussed above, Aloha based protocols only work for small tag populations.
In BS [35], the identification process starts with the reader asking the tags to respond. If
more than one tag responds, BS divides and subdivides the population into smaller groups
until each group has only one or no tag. This process of random subdivision incurs a lot of
collisions. Furthermore, BS requires the tags to perform operations that are not supported by
the C1G2 standard. ABS is a BS based protocol that is designed for continuous identification
of tags [82].

3.2.2

Deterministic Identification Protocols

There are 3 such protocols: (1) the basic TW protocol [66], (2) the Adaptive Tree Walking
(ATW) protocol [115], and (3) the TW-based Smart Trend Traversal (STT) protocol [87].
ATW is an optimized version of TW that always starts DFTs from the level of log z, where
z is the size of tag population. This is the traditional wisdom for optimizing TW. The key
limitation of ATW is that it is optimal only when all tag IDs are evenly spaced in the ID
space; however, this is often not true in real-world applications. In contrast, during the
identification process, our TH protocol adaptively chooses the optimal level to hop to based
on distribution of IDs. STT improves TW using some ad-hoc heuristics to select prefixes
for next queries based upon the type of response to previous queries. It assumes that the
number of tags identified in the past k queries is the same as the number of tags that will
be identified in the next k queries. This may not be true in reality.

3.2.3

Hybrid Identification Protocols

Hybrid protocols combine features from nondeterministic and deterministic protocols. There
are two major such protocols: Multi slotted scheme with Assigned Slots (MAS) [83] and
Adaptively Splitting-based Arbitration Protocol (ASAP) [89]. MAS is a TW-based protocol
in which each tag that matches the reader’s query picks up one of the f time slots to respond.
For large populations, due to the finite practical size of f , for queries corresponding to higher
levels in the binary tree, the response in each of the f slots is most likely a collision, which
increases the identification time. ASAP divides and subdivides the tag population until
the size of each subset is below a certain threshold and then applies Aloha on each subset.
For this, ASAP requires tags to pick slots using a geometric distribution, which makes
it non-compliant with the C1G2 standard. Furthermore, subdividing the population before
identification is in itself very time consuming.

3.3

Optimal Tree Hopping

After quick population size estimation using Flajolet and Martin’s algorithm [47], TH needs
to find the optimal level to hop to. First, we derive an expression to calculate the expected
number of queries (i.e., the number of nodes that TH will visit) if it starts DFTs from
the nodes on level γ, assuming that tags are uniformly distributed in the ID space. The
expression to calculate the expected identification time will be derived in Section 3.4. Second,
as the derived expression is too complex to calculate the optimal value of γ that minimizes
the expected number of queries by simply differentiating the expression with respect to γ,
we present a numerical method to calculate the optimal level γop. If tags are not uniformly
distributed, each time the DFT on a node is completed, as stated in Section 3.1.4.3,
TH re-estimates the total population size based on the initial estimate and the number of
tags that have been identified, re-calculates the new optimal level, and finds the hopping
destination node.

3.3.1

Average Number of Queries

Let random variable Q denote the total number of nodes that TH visits to identify all tags.
Note that each node visit corresponds to one reader query. We next calculate E[Q]. Let
I(l, p) be an indicator random variable whose value is 1 if and only if node (l, p) is visited.
Thus, Q is the sum of I(l, p) for all l and all p.

\[
Q = \sum_{l=1}^{b} \sum_{p=0}^{2^{l}-1} I(l, p) \tag{3.1}
\]

Let P {(l, p)} be the probability that TH visits node (l, p). Thus, E[Q] can be expressed as
follows:
\[
E[Q] = \sum_{l=1}^{b} \sum_{p=0}^{2^{l}-1} P\{(l, p)\} \tag{3.2}
\]

Next, we focus on expressing P {(l, p)} using variable γ, where γ denotes the level that TH
hops to. Recall that TH skips all nodes on levels from 1 to γ − 1 and performs DFT on each
of the 2^γ nodes on level γ, where 1 ≤ γ ≤ b. Note that the root node of the whole binary
tree is always meaningless to visit as it corresponds to a query of length 0. Here P {(l, p)}
is calculated differently depending on whether node (l, p) is the left child of its parent or the
right. Let Pl {(l, p)} and Pr {(l, p)} denote the probability of visiting (l, p) when (l, p) is the
left and right child of its parent, respectively. If the estimated total number of tags z is zero,
then Pl {(l, p)} = Pr {(l, p)} = 0 for all l and p. Below we assume z > 0. As TH skips all
nodes from levels 1 to γ − 1, we have
Pl {(l, p)} = Pr {(l, p)} = 0 if 1 ≤ l < γ

(3.3)

As TH performs DFT from each node on level γ, it visits each node on this level. Thus, we
have
Pl {(l, p)} = Pr {(l, p)} = 1 if l = γ

(3.4)

For each remaining level γ < l ≤ b, when (l, p) is the left child of its parent, Pl {(l, p)} is
equal to the probability that the parent of (l, p) is a collision node. When (l, p) is the right
child of its parent, if the parent is a collision node and (l, p − 1) is an empty read node, then
(l, p) will also be a collision node. Thus, instead of visiting (l, p), TH should directly hop
to the left child of (l, p). Therefore, Pr {(l, p)} is equal to the probability that the parent of
(l, p) is a collision node and (l, p − 1) is not an empty read node.
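
The visiting rules above can be checked on a concrete population with a small simulation. The following Python sketch is illustrative only: the function name `th_queries` and the representation of the population as a collection of distinct integer IDs are assumptions made for this example, the channel is assumed to be reliable, and the loop over all level-γ nodes is practical only for small b.

```python
from bisect import bisect_left


def th_queries(ids, b, gamma):
    """Count the queries sent when a DFT is started at every node on level gamma.

    ids   -- iterable of distinct integer tag IDs in [0, 2**b)
    b     -- ID length in bits
    gamma -- hopping level, 1 <= gamma <= b
    """
    tags = sorted(ids)

    def count(lo, hi):
        # Number of tags whose ID lies in [lo, hi).
        return bisect_left(tags, hi) - bisect_left(tags, lo)

    def below_collision(lo, size):
        # The node covering [lo, lo + size) is known to be a collision node.
        half = size // 2
        left = count(lo, lo + half)
        right = count(lo + half, lo + size)
        visited = 1  # the left child is always queried
        if left >= 2:
            visited += below_collision(lo, half)
        if left == 0:
            # The left child is empty, so the right child must be a collision
            # node: skip it and hop directly to its children.
            visited += below_collision(lo + half, half)
        else:
            visited += 1  # query the right child
            if right >= 2:
                visited += below_collision(lo + half, half)
        return visited

    size = 2 ** (b - gamma)
    total = 0
    for p in range(2 ** gamma):
        total += 1  # every node on level gamma is queried
        if count(p * size, (p + 1) * size) >= 2:
            total += below_collision(p * size, size)
    return total
```

For instance, with b = 3, γ = 1, and tags {2, 3}, the sketch counts the two level-1 nodes, the empty node 00, and the leaves 010 and 011; node 01 is skipped because it is inferred to be a collision node.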
Let k denote the number of tags covered by the parent of node (l, p) (i.e., the number
of tags that are in the subtree rooted at the parent of (l, p)). Let m = 2^(b−l+1) denote the
maximum number of tags that the parent of (l, p) can cover and n = 2^b denote the maximum
number of tags that can be accommodated in the whole ID space. The probability that the
parent of (l, p) covers k of z tags follows a hypergeometric distribution:
$$P\{\#tags = k\} = \frac{\binom{m}{k}\binom{n-m}{z-k}}{\binom{n}{z}} \tag{3.5}$$

Let Pe be the probability that the parent of (l, p) is an empty read. Thus,
$$P_e = P\{\#tags = 0\} = \frac{\binom{n-m}{z}}{\binom{n}{z}} \tag{3.6}$$

Let Ps be the probability that the parent of (l, p) is a successful read. Thus,
$$P_s = P\{\#tags = 1\} = \frac{m\binom{n-m}{z-1}}{\binom{n}{z}} \tag{3.7}$$

Let Pc be the probability that the parent of (l, p) is a collision node. Thus,
$$P_c = 1 - (P_e + P_s) = 1 - \frac{\binom{n-m}{z}}{\binom{n}{z}} - \frac{m\binom{n-m}{z-1}}{\binom{n}{z}} \tag{3.8}$$

Next we calculate Pl {(l, p)} and Pr {(l, p)} for γ < l ≤ b for the following three cases:
n − m < z − 1, n − m = z − 1, and n − m > z − 1. Note that n − m is the size of the ID
space that is not covered by the parent of (l, p), and z − k is the remaining number of tags
that are not covered by the parent of (l, p). Thus, z − k ≤ n − m.
Case 1 n − m < z − 1. In this case, z − k ≤ n − m < z − 1, which means k ≥ 2. Thus, as
the parent of (l, p) covers at least two tags, it must be a collision node, i.e., Pc = 1. Thus, if
(l, p) is the left child of its parent, TH for sure visits it:
Pl {(l, p)} = 1

(3.9)

If (l, p) is the right child of its parent, TH visits it if and only if node (l, p − 1), which is the
left sibling of (l, p), is not an empty read. If (l, p − 1) is an empty read, as its parent is a
collision node, (l, p) must also be a collision node, which means that TH will directly visit
the left child of (l, p) instead of (l, p). The size of the ID space covered by (l, p − 1) is m/2. If
n − m/2 ≤ z − 1, then node (l, p − 1) covers at least one tag, which means that (l, p − 1) is
not an empty read and TH for sure visits (l, p), i.e., Pr {(l, p)} = 1. If n − m/2 > z − 1, then
the probability that TH visits (l, p) is equal to the probability that (l, p − 1) is not an empty
read, which is $1 - \binom{n-m/2}{z} / \binom{n}{z}$ based on Equation (3.6). Finally, we have
$$P_r\{(l, p)\} = \begin{cases} 1 - \dfrac{\binom{n-m/2}{z}}{\binom{n}{z}} & \text{if } n - \frac{m}{2} > z - 1 \\ 1 & \text{if } n - \frac{m}{2} \le z - 1 \end{cases} \tag{3.10}$$

Figure 3.3 Norm. E[Q] vs. pop. size ∀γ (x-axis: # Tags; y-axis: Expected # queries / # Tags; one curve per γ = 1, ..., 10, with γopt marked)
Figure 3.4 E[Q]: TH vs. TW (x-axis: # Tags; y-axis: Expected # queries)

Case 2 n − m = z − 1. In this case, z − k ≤ n − m = z − 1, which means k ≥ 1. As
the parent of (l, p) covers k ≥ 1 tags, the probability of the parent of (l, p) being an empty
read is 0 and the probability of the parent of (l, p) being a successful read is
$m\binom{n-m}{z-1} / \binom{n}{z} = m\binom{z-1}{z-1} / \binom{n}{z} = m / \binom{n}{z}$ based on Equation (3.7). If (l, p) is the left child of its parent, then
TH visits it if and only if the parent of (l, p) is a collision node. Thus, the probability of
visiting (l, p) is equal to the probability of the parent of (l, p) being a collision node, which
is equal to 1 − Pe − Ps . Thus, we have
$$P_l\{(l, p)\} = 1 - P_e - P_s = 1 - \frac{m}{\binom{n}{z}} \tag{3.11}$$

If (l, p) is the right child of its parent, then TH visits it if and only if both the parent of (l, p)
is a collision node and (l, p − 1) is not an empty read. The probability that the parent of
(l, p) is a collision node is 1 − m/ nz as calculated above. Given that the parent of (l, p) is a
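collision node, the probability that (l, p − 1) is an empty read is
$\left(\binom{n-m/2}{z} - \frac{m}{2}\right) / \left(\binom{n}{z} - m\right)$. Multiplying the probability that the parent is a collision node by the probability that (l, p − 1) is not an empty read gives
$$P_r\{(l, p)\} = \left(1 - \frac{m}{\binom{n}{z}}\right) \cdot \left(1 - \frac{\binom{n-m/2}{z} - \frac{m}{2}}{\binom{n}{z} - m}\right) \tag{3.12}$$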

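Case 3 n − m > z − 1. In this case, k ≥ 0. Similar to the calculations above, as per
Equations (3.6) and (3.7), we have:
$$P_l\{(l, p)\} = 1 - P_e - P_s = 1 - \frac{\binom{n-m}{z} + m\binom{n-m}{z-1}}{\binom{n}{z}} \tag{3.13}$$
$$P_r\{(l, p)\} = \left(1 - \frac{\binom{n-m}{z} + m\binom{n-m}{z-1}}{\binom{n}{z}}\right) \times \left(1 - \frac{\binom{n-m/2}{z} - \left[\binom{n-m}{z} + \frac{m}{2}\binom{n-m}{z-1}\right]}{\binom{n}{z} - \left[\binom{n-m}{z} + m\binom{n-m}{z-1}\right]}\right) \tag{3.14}$$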

Finally, Equations (3.3) through (3.14) completely define the probabilities Pl {(l, p)} and
Pr {(l, p)}. Note that as tags are uniformly distributed, the probability of visiting node (l, p)
is independent of the horizontal position p.
The expected number of queries can now be calculated using Theorem 6.
Theorem 6. For a population of z tags uniformly distributed in the ID space, where each
tag has an ID of b bits, if TH hops to level γ to perform DFT from each node on this level,
the expected number of queries for identifying all z tags is:
$$E[Q] = 2^{\gamma} + \sum_{l=\gamma+1}^{b} 2^{l-1} \left[ P_l\{(l, p)\} + P_r\{(l, p)\} \right] \tag{3.15}$$

Proof. First, on level γ, all the 2^γ nodes are visited by TH. Second, on any level l where
γ + 1 ≤ l ≤ b, the probabilities of left and right nodes being visited are Pl {(l, p)} and
Pr {(l, p)} respectively. As there are 2^(l−1) pairs of left and right nodes on level l, the expected
number of nodes visited by TH on level l is 2^(l−1) [Pl {(l, p)} + Pr {(l, p)}].
When γ = 1, Equation (3.15) is also the analytical model for calculating expected number
of queries of TW protocol.
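
Theorem 6 can be evaluated exactly for small ID lengths. The sketch below is one possible way to do so and is not taken from the original work: the function names are assumptions, and the three cases of Equations (3.9) through (3.14) are folded into a single expression by relying on the fact that Python's `math.comb(n, k)` returns 0 when k > n. With that convention, the right-child probability equals Pc minus the probability that the parent is a collision node and the left sibling is empty.

```python
from fractions import Fraction
from math import comb


def visit_probabilities(l, b, z):
    """Return (P_l, P_r) for a node on level l > gamma, given z >= 1 tags."""
    n = 2 ** b              # size of the whole ID space
    m = 2 ** (b - l + 1)    # ID-space size covered by the parent of (l, p)
    half = m // 2           # ID-space size covered by the left sibling (l, p - 1)
    total = comb(n, z)
    p_empty = Fraction(comb(n - m, z), total)            # Eq. (3.6)
    p_single = Fraction(m * comb(n - m, z - 1), total)   # Eq. (3.7)
    p_collision = 1 - p_empty - p_single                 # Eq. (3.8)
    # Probability that the parent is a collision node AND (l, p - 1) is empty.
    left_empty_and_collision = Fraction(
        comb(n - half, z) - comb(n - m, z) - half * comb(n - m, z - 1), total
    )
    return p_collision, p_collision - left_empty_and_collision


def expected_queries(z, b, gamma):
    """E[Q] from Eq. (3.15), returned as an exact Fraction."""
    if not (1 <= z <= 2 ** b and 1 <= gamma <= b):
        raise ValueError("need 1 <= z <= 2**b and 1 <= gamma <= b")
    result = Fraction(2 ** gamma)
    for l in range(gamma + 1, b + 1):
        p_left, p_right = visit_probabilities(l, b, z)
        result += 2 ** (l - 1) * (p_left + p_right)
    return result
```

As a sanity check, a single tag never causes a collision, so `expected_queries(1, b, gamma)` reduces to 2^γ. Exact rational arithmetic keeps the example free of rounding issues but becomes slow for long IDs and large populations.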

3.3.2 Calculating Optimal Hopping Level

Equation (3.15) shows that E[Q] is a function of γ as n = 2^b, m = 2^(b−l+1), and b is given.
For any given z, we want to find the optimal level γ = γop so that E[Q] is minimal. The
conventional approach to finding the optimal variable value that minimizes a given function
is to differentiate the function with respect to that variable, equate the resulting expression
to zero, and solve the equation to obtain the optimal variable value. However, it is very
difficult, if not impossible, to use this approach to find the optimal level because Equation
(3.15) for calculating E[Q] is too complex.
Next, we present a numerical method to find the optimal level. First, we define normalized
E[Q] as the ratio of E[Q] to tag population size. Figure 3.3 shows the plots of normalized
E[Q] vs. the number of tags for different γ values ranging from 1 to b (here we used b = 10
for illustration). From this figure, we observe that for any tag population size, there is a
unique optimal value of γ. For example, for a population of 600 tags, γop = 9. Second,
we define crossover points as follows: for a given ID length b, the crossover points are the
tag population sizes c_0 = 0, c_1, c_2, ..., c_(b+1) = 2^b such that for any tag population size in
[c_i, c_(i+1)) (0 ≤ i ≤ b), γop = i. These crossover points are essentially the x-coordinates of
the intersection points of the normalized E[Q] curves of consecutive values of γ in Figure
3.3. Thus, the value of c_i can be obtained by putting z = c_i and numerically solving
E[Q, γ = i − 1] = E[Q, γ = i] for c_i using the bisection method. Once c_i is calculated for
each 1 ≤ i ≤ b, γop for a given z can be obtained by simply identifying the unique interval
[c_i, c_(i+1)) in which z lies and then using γop = i. The solid line in Figure 3.3 is plotted using
the values of γop obtained using the proposed strategy. As the values of c_i depend only on b,
calculating them is a one-time cost.
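
For small b, the optimal level can also be found by evaluating Equation (3.15) for every candidate level and taking the minimum. The sketch below uses this direct enumeration rather than the bisection over crossover points described above; it is a simpler substitute meant only for illustration. The function names are assumptions, and the compact form of E[Q] again relies on `math.comb(n, k)` returning 0 when k > n.

```python
from fractions import Fraction
from math import comb


def expected_queries(z, b, gamma):
    """E[Q] of Eq. (3.15) for z >= 1 tags with b-bit IDs, as an exact Fraction."""
    n = 2 ** b
    total = comb(n, z)
    result = Fraction(2 ** gamma)
    for l in range(gamma + 1, b + 1):
        m = 2 ** (b - l + 1)  # ID-space size covered by the parent
        not_collision = comb(n - m, z) + m * comb(n - m, z - 1)
        left_empty = (comb(n - m // 2, z) - comb(n - m, z)
                      - (m // 2) * comb(n - m, z - 1))
        # P_l + P_r = 2 * P_c - P(parent collides and left sibling is empty)
        pair = Fraction(2 * (total - not_collision) - left_empty, total)
        result += 2 ** (l - 1) * pair
    return result


def optimal_level(z, b):
    """Smallest level in 1..b that minimizes E[Q] for a population of z tags."""
    return min(range(1, b + 1), key=lambda gamma: expected_queries(z, b, gamma))
```

Because the result depends only on z and b, the levels could be tabulated once per ID length, in the same spirit as the one-time computation of the crossover points.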
We next conduct an analytical comparison between the expected number of queries for TH
and that for TW. Figure 3.4 shows the expected number of queries for TH, which is calculated
using Equation (3.15) using γ = γop , and that for TW, which is calculated using Equation
(3.15) using γ = 1, for 64 bit tag IDs. We observe that TH significantly outperforms TW
for the expected number of queries. For example, for a population of 10K tags, the expected
number of queries for TH is only 54% of that for TW. We will present detailed experimental
comparison between TH and other protocols in Section 6.9.

3.3.3 Maximum Number of Queries

Although the primary goal of our TH protocol is to minimize the average number of queries,
next, we analyze the maximum number of queries of TH and analytically show that it is still
smaller than that of TW. The maximum number of queries that TH may need to identify z
tags with b-bit IDs is shown in Theorem 7.
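Theorem 7. Let V denote the number of queries that TH may need to identify a population
of z ≥ 2 tags with b-bit IDs using γ = γop . We have
$$V \le z(b - \gamma_{op} + 1) - 2^{\gamma_{op}} + 2\theta_0 - \theta_1 (b - \gamma_{op} - 1) \tag{3.16}$$
where
$$\theta_0 = 2^{\gamma_{op}} - \left\lceil \frac{z}{2^{b-\gamma_{op}}} \right\rceil$$
$$\theta_1 = \left\lceil \frac{z}{2^{b-\gamma_{op}}} \right\rceil - \left\lceil \frac{z-1}{2^{b-\gamma_{op}}} \left(1 - \left\lfloor \frac{\gamma_{op}}{b} \right\rfloor\right) \right\rceil$$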

Proof. Let V_TW denote the number of queries that TW may need to identify z ≥ 2 tags
with b-bit IDs. The upper bound of V_TW is given as follows (proven in [66]):
$$V_{TW} \le z\left(b + 1 - \log \frac{z}{2}\right) - 1 \tag{3.17}$$
Because z ≥ 2, we have V_TW ≤ z(b + 1) − 1.
When z tags are uniformly distributed in the ID space, TH essentially performs TW on all
subtrees rooted at nodes on level γop. Let θ0 and θ1 denote the number of subtrees covering
0 and 1 tags, respectively. For these θ0 + θ1 subtrees, TH only visits the roots, which are at
level γop. Let α denote the number of remaining subtrees (i.e., α = 2^γop − θ0 − θ1 ) and T_i
denote a subtree covering z_i ≥ 2 tags. For each subtree T_i , the maximum number of nodes
that TH visits is z_i (b − γop + 1) − 1. Summing over all 2^γop subtrees, we have
$$V \le \sum_{i=0}^{\alpha-1} \left[ z_i (b - \gamma_{op} + 1) - 1 \right] + \theta_0 + \theta_1 = z(b - \gamma_{op} + 1) - 2^{\gamma_{op}} + 2\theta_0 - \theta_1 (b - \gamma_{op} - 1) \tag{3.18}$$

The right hand side (RHS) of Equation (3.18) is maximized when θ0 is maximized and θ1
is minimized, which happens when all z tag IDs are contiguous and they start from the left
most leaf of a subtree at level γop . In this case, the number of subtrees with tags is
$\lceil z / 2^{b-\gamma_{op}} \rceil$ and therefore $\theta_0 = 2^{\gamma_{op}} - \lceil z / 2^{b-\gamma_{op}} \rceil$. Furthermore in this case, when γop ≤ b − 1, there is
at most one subtree at level γop that has exactly one tag, i.e., $\theta_1 = \lceil z / 2^{b-\gamma_{op}} \rceil - \lceil (z-1) / 2^{b-\gamma_{op}} \rceil$;
when γop = b, θ1 equals z. Combining the two cases of γop ≤ b − 1 and γop = b, we have
$$\theta_1 = \left\lceil \frac{z}{2^{b-\gamma_{op}}} \right\rceil - \left\lceil \frac{z-1}{2^{b-\gamma_{op}}} \left(1 - \left\lfloor \frac{\gamma_{op}}{b} \right\rfloor\right) \right\rceil.$$

The proof above gives us the insight that TH requires fewer queries when the tag IDs are
distributed more uniformly in the ID space. Intuitively, this makes sense because the more
the tag IDs are distributed uniformly, the fewer the number of collisions encountered by TH.
Experimentally, our results shown in Figures 3.11(a) and 3.11(b) in Section 6.9 also confirm
this insight: for the same number of tags, the number of queries needed by TH when tags
are uniformly distributed is less than that when tags are non-uniformly distributed.
We now conduct an analytical comparison between the maximum number of queries for
TH and that for TW. Figure 3.5 shows the maximum number of queries for TH, which is
calculated using the RHS of Equation (3.16), and that for TW, which is calculated using the
RHS of Equation (3.17), for 64 bit tag IDs. We observe that TH again outperforms TW for
the maximum number of queries, although slightly. For example, for a population of 10K
tags, the maximum number of queries for TH is 93% of that for TW.
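
The bound of Theorem 7 is straightforward to evaluate. The following sketch is illustrative: the function name is an assumption, the rounding in θ0 and θ1 is taken to be a ceiling (the number of level-γop subtrees touched by z contiguous IDs), and θ1 is computed from the two cases stated in the proof (γop ≤ b − 1 and γop = b) rather than from the combined closed form.

```python
def max_queries_bound(z, b, gamma):
    """Right-hand side of Eq. (3.16) for z >= 2 tags, b-bit IDs, hopping level gamma."""
    if z < 2 or not 1 <= gamma <= b:
        raise ValueError("need z >= 2 and 1 <= gamma <= b")
    span = 2 ** (b - gamma)      # number of IDs covered by one level-gamma subtree
    occupied = -(-z // span)     # ceil(z / span): subtrees that hold tags
    theta0 = 2 ** gamma - occupied
    if gamma == b:
        theta1 = z               # every occupied leaf holds exactly one tag
    else:
        theta1 = occupied - (-(-(z - 1) // span))  # 1 only if one tag spills over
    return z * (b - gamma + 1) - 2 ** gamma + 2 * theta0 - theta1 * (b - gamma - 1)
```

For z = 5, b = 4, and γ = 2, the five contiguous IDs fill one subtree (bound 4 · 3 − 1 = 11), leave one tag in the next subtree (1 query), and leave two subtrees empty (2 queries), for a total of 14.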

Figure 3.5 Max. # queries: TH vs. TW (x-axis: # Tags; y-axis: Max # queries)
Figure 3.6 E[Q] of Reliable TH, with and without optimization (x-axis: # Tags; y-axis: Expected # queries)

3.4 Minimizing Identification Time

The optimal value of γ calculated using the expression for E[Q] in Equation (3.15) and
applying the numerical method proposed in Section 3.3.2 minimizes the average number
of queries, but does not minimize the average identification time because the durations of
successful read, empty read, and collision are different. Next, we derive an expression for
expected identification time as a function of γ. We can then use the numerical method of
Section 3.3.2 to calculate the optimal value of γ that will minimize the average identification
time.
Let random variable T denote the total identification time that TH takes to identify all
tags. Next, we calculate E[T ]. Let ts , tc , and te denote the time durations of successful
read, collision, and empty read, respectively. Let random variables Qs , Qc , and Qe denote
the number of queries resulting in successful reads, collisions, and empty reads, respectively.
Thus, T can be expressed as follows:
T = Qs × ts + Qc × tc + Qe × te

(3.19)

Applying expectation operator on both sides of the equation above, the expected value of
total identification time, E[T ], can be expressed as follows:
E[T ] = E[Qs ] × ts + E[Qc ] × tc + E[Qe ] × te

(3.20)

Next, we derive expressions for E[Qs ], E[Qc ], and E[Qe ]. Let Ix (l, p) be an indicator
random variable whose value is 1 if and only if node (l, p) is visited and the response type
is x, where x ∈ {s:successful read, c:collision, e:empty read}. Thus, Qx is the sum of
Ix (l, p) for all l and all p, where x ∈ {s, c, e}.
$$Q_x = \sum_{l=1}^{b} \sum_{p=0}^{2^l - 1} I_x(l, p) \tag{3.21}$$

The probability that TH visits node (l, p) is P {(l, p)}. Let P {x|(l, p)} be the probability
that given that TH visits node (l, p), the response type for the node is x, where x ∈ {s, c, e}.
Thus, E[Qx ] can be expressed as follows:
$$E[Q_x] = \sum_{l=1}^{b} \sum_{p=0}^{2^l - 1} P\{(l, p)\} \times P\{x | (l, p)\} \tag{3.22}$$

Recall that P {(l, p)} has already been completely defined in Equations (3.3) through
(3.14). Next, we derive expressions for P {x|(l, p)}. Let k denote the number of tags covered
by the node (l, p). Let m = 2^(b−l) denote the maximum number of tags node (l, p) can cover. Recall that
n = 2^b denotes the maximum number of tags that can be accommodated in the whole ID space.
The probability that node (l, p) covers k of z tags follows a hypergeometric distribution:

$$P\{\#tags = k\} = \frac{\binom{m}{k}\binom{n-m}{z-k}}{\binom{n}{z}} \tag{3.23}$$

The probabilities P {s|(l, p)} and P {e|(l, p)} can be calculated using k = 1 and k = 0,
respectively, in Equation (3.23).
$$P\{s | (l, p)\} = \begin{cases} \dfrac{m\binom{n-m}{z-1}}{\binom{n}{z}} & \text{if } n - m \ge z - 1 \\ 0 & \text{if } n - m < z - 1 \end{cases} \tag{3.24}$$
$$P\{e | (l, p)\} = \begin{cases} \dfrac{\binom{n-m}{z}}{\binom{n}{z}} & \text{if } n - m > z - 1 \\ 0 & \text{if } n - m \le z - 1 \end{cases} \tag{3.25}$$

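Probability P {c|(l, p)} can be calculated as follows.
$$P\{c | (l, p)\} = 1 - \left(P\{e | (l, p)\} + P\{s | (l, p)\}\right) = \begin{cases} 1 - \dfrac{\binom{n-m}{z}}{\binom{n}{z}} - \dfrac{m\binom{n-m}{z-1}}{\binom{n}{z}} & \text{if } n - m > z - 1 \\ 1 - \dfrac{m}{\binom{n}{z}} & \text{if } n - m = z - 1 \\ 1 & \text{if } n - m < z - 1 \end{cases} \tag{3.26}$$
In the last case, both P {e|(l, p)} and P {s|(l, p)} are 0 because node (l, p) must cover at least two tags, so it is a collision node with probability 1.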
The expected identification time of TH can now be calculated using Theorem 8.
Theorem 8. For a population of z tags uniformly distributed in the ID space, where each
tag has an ID of b bits, if TH hops to level γ to perform DFT from each node on this level,
the expected identification time for identifying all z tags is:

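$$E[T] = 2^{\gamma} \left[ t_c + (t_s - t_c) P\{s | (\gamma, p)\} + (t_e - t_c) P\{e | (\gamma, p)\} \right] + \sum_{l=\gamma+1}^{b} \left[ t_c + (t_s - t_c) P\{s | (l, p)\} + (t_e - t_c) P\{e | (l, p)\} \right] \times 2^{l-1} \left[ P_l\{(l, p)\} + P_r\{(l, p)\} \right] \tag{3.27}$$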

Proof. Equation (3.27) is obtained in three steps. First, substitute the values of P {s|(l, p)},
P {e|(l, p)}, and P {c|(l, p)} from Equations (3.24), (3.25), and (3.26) into Equation (3.22)
to obtain values of E[Qs ], E[Qe ], and E[Qc ], respectively, and further substitute these values
of E[Qs ], E[Qe ], and E[Qc ] into Equation (3.20). Second, use P {(l, p)} = 0 for 1 ≤ l < γ
as per Equation (3.3) and use P {(l, p)} = 1 for l = γ as per Equation (3.4). Third, for
any level l > γ, use P {(l, p)} = Pl {(l, p)} for each node on this level that is left child of its
parent and use P {(l, p)} = Pr {(l, p)} for each node on this level that is right child of its
parent. Note that there are 2^(l−1) pairs of left and right nodes on level l.
When γ = 1, Equation (3.27) is also the analytical model for calculating expected identification time of TW protocol. Note that Equation (3.27) is a generalized form of Equation
(3.15). It reduces to Equation (3.15) if the time durations of successful read, collision, and
empty read are equal to unit time.
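
Equation (3.27) can be evaluated in the same way as Equation (3.15). The sketch below is a minimal illustration under stated assumptions: the function name and argument order are hypothetical, the cases of Equations (3.24) and (3.25) are handled implicitly because `math.comb(n, k)` returns 0 when k > n, and the visit probabilities of Equations (3.9) through (3.14) are folded into one expression in the same way.

```python
from fractions import Fraction
from math import comb


def expected_time(z, b, gamma, t_s, t_c, t_e):
    """E[T] of Eq. (3.27) for z >= 1 tags; t_s, t_c, t_e are the durations of a
    successful read, a collision, and an empty read."""
    n = 2 ** b
    total = comb(n, z)

    def time_per_visit(l):
        m = 2 ** (b - l)  # ID-space size covered by node (l, p) itself
        p_s = Fraction(m * comb(n - m, z - 1), total)   # Eq. (3.24)
        p_e = Fraction(comb(n - m, z), total)           # Eq. (3.25)
        return t_c + (t_s - t_c) * p_s + (t_e - t_c) * p_e

    result = 2 ** gamma * time_per_visit(gamma)
    for l in range(gamma + 1, b + 1):
        m = 2 ** (b - l + 1)  # ID-space size covered by the parent of (l, p)
        not_collision = comb(n - m, z) + m * comb(n - m, z - 1)
        left_empty = (comb(n - m // 2, z) - comb(n - m, z)
                      - (m // 2) * comb(n - m, z - 1))
        # Expected number of visited nodes on level l: 2^(l-1) * (P_l + P_r).
        visits = 2 ** (l - 1) * Fraction(2 * (total - not_collision) - left_empty, total)
        result += time_per_visit(l) * visits
    return result
```

Setting all three durations to 1 recovers the expected number of queries, which gives a quick consistency check against Equation (3.15).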

Figure 3.7 Crossover points obtained using E[Q] and E[T ] (x-axis: Optimal level # (b=10); y-axis: Crossover point)
Figure 3.8 Normalized expected identification time vs. population size (x-axis: # Tags; y-axis: Expected time (ms)/# Tags)

According to [53] and [55], the values of ts , te , and tc are 3 ms, 0.3 ms, and 1.5 ms, respectively.
Figure 3.7 plots the values of crossover points obtained using expression of E[Q] from Theorem 6 and expression of E[T ] from Theorem 8 (we used b = 10 for illustration). We observe
from the figure that the values of crossover points obtained using the expression for E[Q]
are comparatively larger than those obtained using the expression for E[T ]. The reason is
that to minimize identification time instead of number of queries, TH starts the DFTs at
levels with comparatively larger values of l, which results in a reduction in the number of collisions at the expense of a slightly increased number of empty reads. The overall identification
time is reduced because empty reads are five times faster than collisions and the amount
of identification time increased by the increased number of empty reads is smaller than the
amount of identification time reduced by the reduced number of collisions. Figure 3.8 shows
the normalized expected identification times for the two cases i.e., when the crossover points
are calculated using E[Q] and E[T ] (again we used b = 10 for illustration). We observe that
for several population sizes, the normalized expected time calculated using E[Q] is greater
than that calculated using E[T ].

3.5 Discussion

3.5.1 Virtual Conversion of Population Distributions

To virtually convert a non-uniformly distributed population into a uniformly distributed
population, we leverage the fact that in large populations, the expected number of tags
whose IDs have the least significant bit (LSB) of 0 is approximately the same as the expected
number of tags whose IDs have the LSB of 1. Similarly, the expected number of tags whose
IDs have the two LSBs of 00 is approximately the same as the expected number of tags
whose IDs have the two LSBs of 01, 10, or 11, and so on. Therefore, if we construct a
binary tree in which level l corresponds to the lth LSB instead of the lth most significant bit (MSB),
then each node of level l is expected to cover z/2^l tags: a property of uniformly distributed
populations. To illustrate, consider an example where there are 8 tags in a population, each
with a unique 4-bit ID in the range [0, 7]. Figure 3.9(a) shows the binary tree constructed
in the conventional way in which level l corresponds to lth MSB. This population is clearly
non-uniformly distributed in the ID space and TH will have to frequently perform dynamic
adjustments to the optimal value of γ and the number of queries will be large compared
to the number of queries for a uniformly distributed population of the same size. Figure
3.9(b) shows the binary tree constructed in the proposed way where level l corresponds to
lth LSB. Note from the figure that the 8 tags are now uniformly placed in the entire ID
space. On binary trees that resemble the one in Figure 3.9(b), TH will require very
few dynamic adjustments and the number of queries will be approximately the same as for a
uniformly distributed population of the same size.
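
The effect of building the tree from LSBs can be seen by counting how many tags fall under each node of a given level. The sketch below is illustrative only; the function names are assumptions, and IDs are modeled as integers with b-bit binary representations.

```python
from collections import Counter


def node_index(tag_id, b, level, use_lsb):
    """Horizontal position p of the level-`level` node that covers tag_id."""
    bits = format(tag_id, "0{}b".format(b))
    if use_lsb:
        bits = bits[::-1]  # level l now corresponds to the l-th LSB
    return int(bits[:level], 2)


def tags_per_node(ids, b, level, use_lsb):
    """Number of tags covered by each of the 2**level nodes on the given level."""
    counts = Counter(node_index(i, b, level, use_lsb) for i in ids)
    return [counts[p] for p in range(2 ** level)]
```

For the 8 tags with 4-bit IDs in [0, 7], `tags_per_node(range(8), 4, 2, False)` gives `[4, 4, 0, 0]` for the MSB tree, whereas `tags_per_node(range(8), 4, 2, True)` gives `[2, 2, 2, 2]` for the LSB tree, i.e., z/2^l = 2 tags per node.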

(a) Binary tree where level l corresponds to lth MSB of the tag ID
(b) Binary tree where level l corresponds to lth LSB of the tag ID
Figure 3.9 Distributions of populations with binary trees with MSBs and LSBs.

(a1) MSB distribution of block of IDs
(a2) LSB distribution of block of IDs
(b1) MSB distribution of two blocks of IDs
(b2) LSB distribution of two blocks of IDs
(c1) MSB distribution of randomly chosen IDs
(c2) LSB distribution of randomly chosen IDs
Figure 3.10 Last level l = b = 4 of the binary trees made with MSBs and LSBs.

Figures 3.10(a1) through 3.10(c2) show three other populations where the circles on the
left side of the dashed vertical line represent level l = b = 4 of the binary tree in which
level l corresponds to the lth MSB of the tag ID, and the circles on the right side of the dashed
vertical line represent level l = b = 4 of the binary tree in which level l corresponds to the
lth LSB of the tag ID. The population in Figures 3.10(a1) and 3.10(a2) consists of 8 tags
with consecutive IDs in the range [4, 11]. We can see that if the binary tree is built using
the conventional method where the lth level corresponds to the lth MSB of the tag ID, then the
resulting population is not uniformly distributed in the binary tree. However, if the binary
tree is built using our proposed modification where the lth level corresponds to the lth LSB of the
tag ID, then the resulting population is closer to a uniform distribution. Similarly, the
population in Figures 3.10(b1) and 3.10(b2) consists of two blocks, each containing 3 IDs.
We make the same observation that the IDs are comparatively more uniformly distributed in
the binary tree made with LSBs compared to the one made with MSBs. In a scenario where
a population is already uniformly distributed in the ID space, our proposed modification
does not affect it and the uniformity is maintained in the tree made with LSBs. This is
shown in Figures 3.10(c1) and 3.10(c2).
Next we leverage these observations to propose a simple modification in TH that reduces
the number of queries and identification times of TH for non-uniformly distributed populations to approximately the same values as for uniformly distributed populations. When
the reader transmits a query string, the tag compares it with its LSBs instead of MSBs to
decide whether or not it will respond to the query. If the result of the query is a collision,
the reader generates two new query strings by prepending a 0 and a 1 to the
previous query string and queries the tags with these new query strings. All the tags whose
IDs end with the new query string respond.
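
The LSB-based matching described above can be sketched in a few lines. This is an illustration only: the function names are assumptions, query strings are modeled as Python strings of '0' and '1', and the C1G2 SELECT mechanics are not modeled.

```python
def responders(ids, query, b):
    """IDs whose b-bit binary representation ends with the query string."""
    return [i for i in ids if format(i, "0{}b".format(b)).endswith(query)]


def child_queries(query):
    """The two query strings generated after a collision on `query`."""
    return ["0" + query, "1" + query]
```

For the block of IDs [4, 11] with b = 4, the query "0" is answered by tags 4, 6, 8, and 10; after this collision, the reader issues "00" and "10", and the query "00" is answered only by tags 4 and 8.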
This modification does not require any changes to the tags and works with the C1G2
compliant tags. To make a tag compare the query string with the LSBs of its ID, we use
the SELECT command standardized in the C1G2 standard. The ID of a tag is stored in its
memory at a specific memory address. A tag can retrieve any bits stored in its memory by
specifying an appropriate address range. Using the SELECT command, a reader broadcasts
an address range and a bit mask. Each tag compares the bit mask with the bits in the
specified address range in its memory and responds back only if the bit mask matches the
specified bits in its memory. In TH, the bit mask contains the query string of length l, where
1 ≤ l ≤ b, and the address range that the reader broadcasts is of the l LSBs of tag IDs.

3.5.2 Reliable Tag Identification

So far we have assumed that the communication channel between the reader and tags is
reliable, which means that each tag can receive the query from the reader and the reader
can receive either the response if only one tag responds or the collision if more than one tag
responds. However, this assumption often does not hold in reality because the wireless communication medium is inherently unreliable. There are two existing schemes for making tag
identification reliable. Backes et al. proposed the scheme of letting each tag store the IDs of
several other tags [31]. When the reader queries a tag, the tag transmits back its own ID as
well as the IDs of other tags stored in it. When identification completes, the reader compares
the set of IDs of tags that responded with the union of sets of IDs of other tags reported by
each responding tag. If the sets are not equal, the whole process is repeated again to ensure
that the missed tags are identified. This scheme has two weaknesses. First, this scheme does
not comply with the C1G2 standard. Second, it assumes that the tag population remains
static for the lifetime of tags as each tag is hard coded with some other tags’ IDs. The
second scheme is to run an identification protocol on the same population several times until
the probability of missing a tag falls below a threshold [51, 56]. These works estimate the probability
of missing a tag based upon the number of tags that were identified in some runs of the
protocol but not in others.

Table 3.1 Comparison with Prior C1G2 Compliant Protocols (TH/Prior Art)
TH versions: THQ, THT

                              Prior Nondeter.        Prior Deterministic                  Prior Hybrid
                              Protocol (=Aloha)      Protocols                            (=MAS)
          Metric              Max    Min    Mean     Best prior   Max    Min    Mean      Max    Min    Mean
Uniform   #queries/tag        0.24   0.10   0.18     ATW-f        0.51   0.50   0.50      0.39   0.38   0.39
          query time/tag      0.84   0.71   0.76     ATW-c        0.92   0.89   0.90      0.81   0.78   0.79
          #responses/tag      0.85   0.59   0.69     ATW-c        0.85   0.67   0.70      0.64   0.24   0.38
          response fairness   1.15   1.10   1.13     TW           1.12   1.07   1.11      1.12   1.07   1.10
Non-Uni   #queries/tag        0.26   0.10   0.18     ATW-f        0.75   0.33   0.60      0.40   0.18   0.29
          query time/tag      0.40   0.12   0.24     ATW-f        0.60   0.19   0.41      0.21   0.09   0.15
          #responses/tag      0.63   0.11   0.32     ATW-c        0.87   0.11   0.33      0.46   0.08   0.22
          response fairness   1.38   1.25   1.35     ATW-c        1.03   1.00   1.02      1.05   0.95   1.02

While we can use the C1G2 compliant scheme proposed in [51, 56] to make TH reliable,
i.e., repeatedly run TH until the required reliability is achieved, we observe that in this
scheme, the leaf nodes in the binary tree are queried multiple times. This is wasteful of
time for the nodes that the reader successfully reads. To eliminate such waste, we propose
to query each node multiple times, instead of querying the whole binary tree multiple times.
We define the reliability of successfully reading a tag to be the probability that both the tag
receives the query from the reader and the reader receives the response from the tag. For this,
we calculate the maximum number of times the reader should transmit a query, which is
denoted by β. Let g and u be the given and required reliability of successfully reading a tag,
respectively. Thus, the probability of successfully identifying a tag is 1 − (1 − g)^β. Equating
it to u gives:
$$\beta = \log_{(1-g)} (1 - u) \tag{3.28}$$
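
As a quick numerical illustration of Equation (3.28), the sketch below computes the number of transmissions for given reliabilities. The function name is an assumption, and rounding β up to the next integer is also an assumption made here because a query can only be transmitted a whole number of times.

```python
import math


def transmissions_needed(g, u):
    """Smallest integer beta with 1 - (1 - g)**beta >= u, following Eq. (3.28)."""
    if not (0 < g < 1 and 0 < u < 1):
        raise ValueError("g and u must lie strictly between 0 and 1")
    # log base (1 - g) of (1 - u), rounded up to a whole number of transmissions
    return math.ceil(math.log(1 - u) / math.log(1 - g))
```

For example, with a per-query reliability of g = 0.8 and a required reliability of u = 0.99, three transmissions suffice because 1 − 0.2^3 = 0.992, whereas two transmissions give only 0.96.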

Our scheme of reliable tag identification works as follows: for each non-terminal node in
the binary tree that TH needs to visit, TH transmits a query corresponding to that node
β times; corresponding to each terminal node, TH keeps transmitting the query until either
that query has been transmitted β times or the reader successfully receives the tag ID.
The optimization of stopping the transmission of the query corresponding to a terminal
node upon a successful read significantly reduces the total number of queries. Figure 3.6 plots
the expected number of queries per tag for the reliable TH protocol with and without this
optimization. For example, for a population of 50000 tags, the number of queries per tag
is reduced by 24%.

3.5.3 Continuous Scanning

In some applications, the tag population may change over time (i.e., tags leave and join the
population dynamically). We adapt the continuous scanning strategy proposed by Myung
et al. in [82]. In the first scanning of the whole tag population, TH records the queries that
resulted in successful or empty reads. If the tag population does not change, by performing
DFTs on the subtrees rooted at successful and empty read nodes of the previous scan, TH
experiences no collision. If some new tags join the population, some of the successful read
nodes of the previous scan can now turn into collision nodes and some empty read nodes can
turn into successful or collision nodes. If some old tags leave the population, some successful
read nodes will become empty read nodes. If any of the new empty read nodes happens to
be a sibling of another empty read node, then TH discards these two nodes from the record
and stores the location of their parent because the parent is also an empty read node. This
strategy works well when tag population size remains static or increases. However, when the
tag population decreases, the best choice is to re-execute TH for the subsequent scan.
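
The merging of sibling empty read nodes can be sketched as follows. This is an illustration under assumptions, not the recorded data structure of TH: nodes are represented by their query strings with the first character naming the first tree level, the function name is hypothetical, merging is applied repeatedly so that merged parents can themselves be merged, and level-1 nodes are never merged because the root is not a queryable node.

```python
def merge_empty_siblings(empty_queries):
    """Replace every pair of sibling empty-read nodes by their parent node."""
    nodes = set(empty_queries)
    changed = True
    while changed:
        changed = False
        for q in sorted(nodes, key=len, reverse=True):
            if q not in nodes or len(q) <= 1:
                continue  # already merged, or on level 1 (parent would be the root)
            sibling = q[:-1] + ("1" if q[-1] == "0" else "0")
            if sibling in nodes:
                nodes -= {q, sibling}
                nodes.add(q[:-1])  # the parent is also an empty read node
                changed = True
    return nodes
```

For example, the recorded empty nodes 000, 001, 01, and 11 collapse to 0 and 11: nodes 000 and 001 merge into 00, which in turn merges with 01.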

3.5.4 Multiple Readers

An application with a large number of RFID tags requires multiple readers with overlapping
regions because a single reader cannot cover all tags due to the short communication range
of tags (usually less than 20 feet). The use of multiple readers introduces several new types
of collisions such as reader-reader collisions and reader-tag collisions. Such collisions can be
handled by reader scheduling protocols such as those proposed in [122, 36, 131, 116]. TH is
compatible with all of these reader scheduling protocols.

3.6 Performance Comparison

We implemented two versions of TH: (1) THQ , in which γop is obtained using E[Q] and
the query string is matched with MSBs of tag IDs, and (2) THT , in which γop is obtained
using E[T ] and the query string is matched with LSBs of tag IDs to virtually convert the
population distribution into a near-uniform distribution. We also implemented all the 8
prior tag identification protocols in Matlab, namely the 3 nondeterministic protocols (Aloha
[129], BS [35], and ABS [82]), the 3 deterministic protocols (TW [66], ATW [115], and STT
[87]), and the 2 hybrid protocols (MAS [83] and ASAP [89]). As ATW starts DFTs from the
level of log z which may not be a whole number, we present results for ATW by both ceiling
and flooring the values of log z and representing them with ATW-c and ATW-f respectively.
In terms of implementation complexity, TH and all the 8 prior protocols are implemented
in a similar number of lines of code. We performed extensive testing, both manually and
automatically, to ensure the correctness of each protocol implementation.
We performed a side-by-side comparison of TH with these protocols, although this comparison is not completely fair to TH for two reasons. First, 3 of these 8 protocols (i.e., BS, ABS, and ASAP)
require modifications to tags and thus do not work with standard C1G2 tags, whereas TH is
fully compliant with C1G2. Second, for the framed slotted Aloha, to its best advantage, we
choose the frame size to be the ideal size, which is equal to the tag population size, disregarding the practical limitations on the frame sizes. We choose tag ID length to be the C1G2
standard 64 bits. We performed the comparison for both the uniform case (where the tag
population is uniformly distributed in the ID space) and the non-uniform case (where the tag
population is not uniformly distributed in the ID space). For the uniform case, we range tag
population sizes from 100 to 100, 000 to evaluate the scalability of these protocols. For the
non-uniform case, we distribute tag populations in blocks where each block is a continuous
sequence of tag IDs. We range block sizes from 5 to 1000. Our motivation for simulating
non-uniform distribution in blocks is that in some applications, such as supply chains, tag
IDs often come in such blocks when they are manufactured. For each tag population size,
we run each protocol 100 times and report the mean. We compare TH with prior protocols
from both reader and tag perspectives.
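The block-structured populations described above can be illustrated with a short sketch. The generator below, its function and parameter names, and the random placement of block starts are assumptions made for illustration; it is not the Matlab code used in our experiments. The sketch also counts how many distinct k-bit prefixes (MSBs) and suffixes (LSBs) a population occupies, which hints at why matching the query string with LSBs makes a block-structured population look nearly uniform.

```python
import random

ID_BITS = 64  # tag ID length used in the comparison


def block_population(num_tags, block_size, seed=0):
    """Return num_tags distinct IDs made of runs of consecutive IDs (blocks).

    Block starts are drawn uniformly at random (an assumption of this sketch).
    """
    rng = random.Random(seed)
    ids = set()
    while len(ids) < num_tags:
        start = rng.randrange(0, 2**ID_BITS - block_size)
        for offset in range(block_size):
            if len(ids) == num_tags:
                break
            ids.add(start + offset)
    return sorted(ids)


def distinct_msb_groups(ids, k):
    """Number of distinct k-bit prefixes among the IDs."""
    return len({i >> (ID_BITS - k) for i in ids})


def distinct_lsb_groups(ids, k):
    """Number of distinct k-bit suffixes among the IDs."""
    return len({i & ((1 << k) - 1) for i in ids})


tags = block_population(num_tags=5000, block_size=100)
k = 10
# Consecutive IDs share their MSBs but cycle through all LSB patterns,
# so the population spreads over far more LSB groups than MSB groups.
print(distinct_msb_groups(tags, k), distinct_lsb_groups(tags, k))
```

With consecutive IDs, the tags of one block fall into the same subtree when queries match MSBs, whereas they spread across many subtrees when queries match LSBs.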

3.6.1

Reader Side Comparison

For the reader side, we compared TH with the 8 prior protocols based on the following
two metrics: (1) normalized reader queries and (2) identification speed. Normalized reader
queries is the ratio of the number of queries that the reader transmits to identify a tag
population to the number of tags in the population. Similarly, identification speed
is the total time that the reader takes to identify a tag population divided by the number of
tags in that population.
In general, more queries imply more identification time. However, identification time
is not strictly in proportion to the number of queries because different queries may take
different amounts of time.
For each metric, in Table 3.1, we show the value of TH divided by that for the best prior
C1G2 compliant protocol for this metric in the corresponding category of nondeterministic,
deterministic, or hybrid. Note that the only prior C1G2 compliant nondeterministic tag
identification protocol is the framed slotted Aloha and the only prior C1G2 compliant hybrid
tag identification protocol is MAS. There are 3 prior C1G2 compliant deterministic tag
identification protocols: TW, ATW, and STT. We report min, max, and mean for these
ratios for tag populations ranging from 100 to 100,000.
For the two metrics defined above, the absolute performance of TH and all 8 prior tag
identification protocols is shown in Figures 3.11(a) to 3.12(b), for both uniform and non-uniform distributions. Note that for non-uniform distributions, we fix the tag population
size to be 5000 and range the block size from 2 to 1000.
3.6.1.1

Normalized Reader Queries

THQ reduces the normalized reader queries of the best prior C1G2 compliant nondeterministic, deterministic, and hybrid tag identification protocols by an average of 82%, 50%, and
61%, respectively, for uniformly distributed tag populations. THT reduces the normalized
reader queries of the best prior C1G2 compliant nondeterministic, deterministic, and hybrid
tag identification protocols by an average of 82%, 40%, and 71%, respectively, for non-uniformly distributed tag populations. Figures 3.11(a) and 3.11(b) show the normalized
reader queries of all protocols for uniformly and non-uniformly distributed populations, respectively. Based on these two figures, we make the following four observations from the
perspective of normalized reader queries for both uniform and non-uniform distributions.
First, normalized queries of THT are slightly greater than those of THQ for uniformly distributed tag populations. This is because, to minimize identification time, THT starts DFTs
at levels closer to the leaf nodes compared to THQ, which results in more empty reads and
fewer collisions. The increase in the number of empty reads is slightly greater than the decrease in
the number of collisions. Matching the query string with LSBs in THT does not bring much advantage because the population is already uniformly distributed. Second, for non-uniformly
distributed tag populations, normalized queries of THT are, on average, 18% fewer than those
of THQ. This significant improvement is a result of the virtual conversion of non-uniformly
distributed populations into uniformly distributed populations as proposed in Section 3.5.1.
Third, among all the 8 prior protocols, the traditional ATW protocol turns out to be the
best. Fourth, the framed slotted Aloha in the C1G2 standard performs the worst even when
we disregard the practical limitations on the frame sizes. Although BS is the best among
the 3 prior nondeterministic tag identification protocols, it is not compliant with C1G2.
Similarly, although ASAP is the best among the 2 prior hybrid tag identification protocols,
it is not compliant with C1G2.
3.6.1.2

Identification Speed

THQ improves the identification speed of the best prior C1G2 compliant nondeterministic,
deterministic, and hybrid tag identification protocols by an average of 24%, 10%, and 21%, respectively, for uniformly distributed tag populations. THT improves the identification speed of
the best prior C1G2 compliant nondeterministic, deterministic, and hybrid tag identification
protocols by an average of 76%, 59%, and 85%, respectively, for non-uniformly distributed
tag populations. Figures 3.12(a) and 3.12(b) show the identification speed of all protocols
for uniformly and non-uniformly distributed tag populations, respectively. Based on these
two figures, we make the following four observations from the perspective of identification
speed. First, the normalized identification times of THT are slightly smaller than those
of THQ for uniformly distributed tag populations. This improvement is the result of using
E[T] to calculate γop instead of using E[Q]. Second, the normalized identification times of
THT are, on average, 36% smaller than those of THQ for non-uniformly distributed tag populations. This significant improvement is a result of the virtual conversion of non-uniformly
distributed populations into uniformly distributed populations as proposed in Section 3.5.1.
Third, among all 8 prior protocols, the traditional ATW protocol turns out to be the best for
both uniform and non-uniform distributions. Fourth, although framed slotted Aloha is the
worst in terms of normalized reader queries, its identification speed is not the worst. This is
because in our experiments we allow it to use unrealistically large frame sizes, which leads
to many empty slots, and an empty read is much faster than a successful read or a collision.

[Figure 3.11: # Queries / # Tags versus (a) # Tags (uniform distribution) and (b) block size (non-uniform distribution), for Aloha, BS, ABS, TW, ATW-c, ATW-f, STT, MAS, ASAP, THQ, and THT.]
Figure 3.11 Normalized queries of TH and existing protocols

[Figure 3.12: Time (ms) / # Tags versus (a) # Tags (uniform distribution) and (b) block size (non-uniform distribution), for the same protocols.]
Figure 3.12 Identification speed of TH and existing protocols

[Figure 3.13: # Tag responses / # Tags versus (a) # Tags (uniform distribution) and (b) block size (non-uniform distribution), for the same protocols.]
Figure 3.13 Normalized responses of TH and existing protocols

[Figure 3.14: Fairness versus (a) # Tags (uniform distribution) and (b) block size (non-uniform distribution), for the same protocols.]
Figure 3.14 Response fairness of TH and existing protocols

[Figure 3.15: # Collisions / # Tags versus (a) # Tags (uniform distribution) and (b) block size (non-uniform distribution), for the same protocols.]
Figure 3.15 Normalized collisions of TH and existing protocols

[Figure 3.16: # Empty reads / # Tags versus (a) # Tags (uniform distribution) and (b) block size (non-uniform distribution), for the same protocols.]
Figure 3.16 Normalized empty reads of TH and existing protocols

3.6.2

Tag Side Comparison

On the tag side, we compare TH with the 8 prior protocols based on the following four
metrics: (1) normalized tag responses, (2) response fairness, (3) normalized collisions, and
(4) normalized empty reads. Normalized tag responses is the ratio of the total number of responses of
all tags during the identification process to the number of tags in the population. Response
fairness is Jain’s fairness index [57], given by

$$\frac{\left(\sum_{i=1}^{z} x_i\right)^2}{z \cdot \sum_{i=1}^{z} x_i^2}$$

where x_i is the total number of responses by tag i and z is the number of tags. Normalized collisions is the ratio of the total number of collisions during
the identification process to the number of tags in the population. Normalized empty reads
is the ratio of the total number of empty reads during the identification process to the number
of tags in the population.
The first two metrics are important for active tags because active tags are powered by
batteries. A smaller number of normalized tag responses means lower power consumption for
active tags. Response fairness measures the variance in the number of responses per tag.
Less fairness results in the batteries of some tags depleting more quickly than those of
others. In large scale tag deployments, it is often nontrivial to identify tags with depleted
batteries and replace them. Using an absolutely fair tag identification protocol, the batteries
of all tags deplete at the same time and therefore all can be replaced at the same time. We
use Jain’s fairness metric defined in [57]. For z tags, the fairness value is in the range
[1/z, 1]. The higher this fairness value is, the fairer the protocol is. The last two metrics
are important for understanding these identification protocols.
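As a small illustration of the fairness metric, the following sketch computes Jain's fairness index from a list of per-tag response counts. The function name and the two example inputs are hypothetical and chosen only to show the two ends of the range [1/z, 1].

```python
def jain_fairness(responses):
    """Jain's fairness index: (sum of x_i)^2 / (z * sum of x_i^2)."""
    z = len(responses)
    total = sum(responses)
    sum_sq = sum(x * x for x in responses)
    if z == 0 or sum_sq == 0:
        raise ValueError("need at least one tag with a nonzero response count")
    return total * total / (z * sum_sq)


# Hypothetical response counts for z = 4 tags.
equal = jain_fairness([3, 3, 3, 3])     # every tag replies equally often
skewed = jain_fairness([12, 0, 0, 0])   # a single tag does all the replying
print(equal, skewed)
```

When all tags respond equally often the index reaches its maximum of 1, and when a single tag accounts for all responses it drops to 1/z.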
For normalized tag responses and response fairness, in Table 3.1, we show the value of TH
divided by that for the best prior C1G2 compliant protocol in the corresponding category
of nondeterministic, deterministic, or hybrid. The absolute performance of TH and all prior
8 tag identification protocols is shown in Figures 3.13(a) to 3.14(b), for both uniform and
non-uniform distributions.
3.6.2.1

Normalized Tag Responses

THQ reduces the normalized tag responses of the best prior C1G2 compliant nondeterministic, deterministic, and hybrid tag identification protocols by an average of 31%, 30%, and
62%, respectively, for uniformly distributed tag populations. THT reduces the normalized tag
responses of the best prior C1G2 compliant nondeterministic, deterministic, and hybrid tag
identification protocols by an average of 68%, 67%, and 78%, respectively, for non-uniformly
distributed tag populations. Figures 3.13(a) and 3.13(b) show the normalized tag responses
of all protocols for uniformly and non-uniformly distributed tag populations, respectively.
We make the following four observations from these two figures. First, the normalized tag responses of THT are, on average, 57% fewer than those of THQ for non-uniformly distributed
tag populations. Second, the normalized tag responses of BS, ABS, TW, MAS, and ASAP
increase with increasing tag population size. Third, for non-uniformly distributed tag populations, the normalized tag responses of nondeterministic protocols are not affected by the
block size because their performance is independent of tag ID distribution. In contrast, the
normalized tag responses of deterministic protocols slightly increase with increasing block
size. Fourth, among all 8 prior protocols, Aloha has the smallest number of normalized tag
responses. This is because of the unrestricted frame sizes that we used for Aloha. With
large frame sizes, tags experience fewer collisions and thus reply fewer times.

3.6.2.2

Tag Response Fairness

THQ improves the tag response fairness of the best prior C1G2 compliant nondeterministic, deterministic, and hybrid tag identification protocols by an average of 13%, 11%, and
10%, respectively, for uniformly distributed tag populations. THT improves the tag response
fairness of the best prior C1G2 compliant nondeterministic, deterministic, and hybrid tag
identification protocols by an average of 35%, 2%, and 2%, respectively, for non-uniformly
distributed tag populations. Figures 3.14(a) and 3.14(b) show the tag response fairness of
all protocols for uniformly and non-uniformly distributed tag populations, respectively. We
observe that among all 8 prior protocols, ASAP and ATW are the best for uniformly and
non-uniformly distributed populations, respectively. We observe that THT achieves slightly
better fairness than THQ.

[Figure 3.17: distribution of # Tag responses per protocol (TH, ASAP, ABS, MAS, TW, BS, Aloha, ATW, STT) for (a) uniform distribution and (b) non-uniform distribution.]
Figure 3.17 Distribution of tag responses of TH and existing protocols

Figures 3.17(a) and 3.17(b) show the distribution of the number of tag responses for each
protocol for uniformly and non-uniformly distributed tag populations, respectively. For any
protocol, the wider the horizontal span of its distribution is, the larger the range of the
number of responses per tag it has. We observe that TH has the smallest range among all
protocols for the number of responses per tag.

3.6.2.3

Normalized Collisions

Both THQ and THT incur fewer collisions than all 8 prior protocols for uniformly and non-uniformly distributed tag populations. Figures 3.15(a) and 3.15(b) show the
normalized collisions for all protocols for uniformly and non-uniformly distributed tag populations, respectively. From these figures we make the following three observations. First, THT
incurs fewer collisions compared to THQ, which is one of the reasons behind the faster identification speed of THT. Second, Aloha incurs the smallest number of normalized collisions
among all 8 prior protocols because of the unrestricted frame sizes that we used for
it. Third, TW mostly incurs the largest number of normalized collisions for both types of
populations.
3.6.2.4

Normalized Empty Reads

For uniformly distributed tag populations, THQ incurs a smaller number of empty reads
than all 8 prior protocols. For non-uniformly distributed tag populations, THT incurs a
smaller number of empty reads than all 8 prior protocols. Figures 3.16(a) and 3.16(b) show
the normalized empty reads of all protocols for uniformly and non-uniformly distributed tag
populations, respectively. From these figures, we observe that although the two prior C1G2
compliant protocols, TW and MAS, have fewer empty reads compared to THQ for large
block sizes, they have a much larger number of collisions compared to THQ, which makes
their overall identification time much larger than that of THQ. Note that the slightly larger number
of empty reads for THQ for large block sizes is immaterial because an empty read takes
one-fifth of the time of a collision and one-tenth of the time of a successful read.
Therefore, reducing the number of collisions is more important than reducing the number
of empty reads. We also observe that THT has a greater number of empty reads compared to
THQ, which is the cost of decreasing the collisions. As a collision takes 5 times as long as
an empty read, this slight increase in the number of empty reads is not of much significance.
Note that the collisions and empty reads shown in Figures 3.15(a) and 3.16(a), respectively,
are consistent with the reader queries shown in Figure 3.11(a) as well as the identification
speed shown in Figure 3.12(a). Similarly, the collisions and empty reads shown in Figures
3.15(b) and 3.16(b), respectively, are consistent with the reader queries shown in Figure
3.11(b) as well as the identification speed shown in Figure 3.12(b). For example, Figure
3.15(a) shows that TW has more collisions than Aloha, but Figure 3.11(a) shows that Aloha has
more queries than TW. This is because Aloha has many more empty reads than TW, as
shown in Figure 3.16(a). Although Aloha has more queries than TW, Figure 3.12(a) also
shows that Aloha requires less identification time than TW. This is because an empty read
is 5 times faster than a collision for a reader.
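The relationship between query counts and identification time can be made concrete with a small sketch. The relative durations below follow the ratios stated above (an empty read as the unit, a collision 5 units, a successful read 10 units); the per-protocol counts are hypothetical numbers chosen for illustration and are not measurements from our experiments.

```python
# Relative slot durations taken from the ratios stated in the text:
# an empty read is the unit, a collision takes 5 units, a successful read 10.
T_EMPTY, T_COLLISION, T_SUCCESS = 1, 5, 10


def total_queries(empty, collisions, successes):
    """Every query ends as an empty read, a collision, or a successful read."""
    return empty + collisions + successes


def relative_time(empty, collisions, successes):
    """Identification time in units of one empty read."""
    return empty * T_EMPTY + collisions * T_COLLISION + successes * T_SUCCESS


# Hypothetical counts for identifying 1000 tags (NOT measured values).
protocol_a = dict(empty=3000, collisions=500, successes=1000)   # many empty reads
protocol_b = dict(empty=200, collisions=1500, successes=1000)   # many collisions

for name, counts in (("A", protocol_a), ("B", protocol_b)):
    print(name, total_queries(**counts), relative_time(**counts))
```

In this example, protocol A issues more queries than protocol B yet finishes sooner, which mirrors the Aloha versus TW behavior noted above.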
A common observation that we make from the plots of all the metrics of TH for uniformly
distributed populations is that these plots have ups and downs and are not monotonic. This
is because when the number of tags increases, the starting level from where TH performs
the first DFT increases, which has an effect on all these metrics. These ups and downs are
also observed in the analytical plot in Figure 3.8.

3.7

Conclusion

The technical novelty of this chapter lies in that it represents the first effort to formulate
the Tree Walking process mathematically and propose a method to minimize the expected
number of queries and expected identification time. The significance of this chapter in terms
of impact lies in that the Tree Walking protocol is a fundamental multiple access protocol
and has been standardized as an RFID tag identification protocol. Besides static optimality,
our Tree Hopping protocol dynamically chooses a new optimal level after each subtree is traversed. We presented a method to make our protocol work with non-uniformly distributed
populations and achieve performance similar to what it achieves with uniformly distributed populations. We also presented methods to make our protocol reliable, to continuously scan tag
populations that are dynamically changing, and to work with multiple readers with overlapping regions. Another key contribution of this chapter is that we conducted a comprehensive
side-by-side comparison of two variants of our protocol with eight major prior tag identification protocols that we implemented. Our experimental results show that our protocol
significantly outperforms all prior tag identification protocols, even those that are not C1G2
compliant, for metrics such as the number of reader queries per tag, the identification speed,
and the number of responses per tag.

4

RFID Missing Tags

4.1

Introduction

4.1.1

Background & Motivation

Shoplifting, employee theft, and vendor fraud have become major causes of lost capital for
retailers [113]. In 2011 alone, retailers lost an estimated 34.5 billion dollars due to these
causes [12]. Because they do not require a line of sight and their tags are cheap (e.g., 5
cents per tag [93]), radio frequency identification (RFID) systems have been deployed for
monitoring products by affixing cheap passive RFID tags to them and using RFID readers,
which are given the IDs of the tags that are being monitored, to detect any missing tags. A
tag is a microchip with an antenna in a compact package that has limited computing power
and communication range. There are two types of tags: (1) passive tags, which power up
by harvesting the radio frequency energy from readers (as they do not have their own power
sources) and often have a communication range of less than 20 feet; (2) active tags, which have
their own power sources and a relatively longer communication range. A reader has a
dedicated power source with significant computing power. It transmits queries to a set of
tags and the tags respond over a shared wireless medium. In this chapter, we deal with both
passive and active RFID tags.

4.1.2

Summary & Limitations of Prior Art

There are two types of missing tag detection protocols: probabilistic [114, 76] and deterministic [71, 128, 73]. The probabilistic protocols are faster but only report the event that some
tags are missing, without pinpointing exactly which ones. The deterministic protocols return
IDs of all the missing tags but are comparatively slower. Both approaches have their merits.
In fact, they are complementary to each other, and should be used together. For example,
a probabilistic protocol should be used to detect a missing tag event and once detected, a
deterministic protocol should be invoked to identify which tags are missing. Several probabilistic protocols such as TRP [114] and EMTD [76] and deterministic protocols such as IIP
[71], MTI [128], and SFMTI [73] have been proposed.
There are two key limitations of existing protocols. The first limitation is that all existing
protocols assume a perfect environment with no unexpected tags, which is not a realistic
assumption. In reality, tag populations often contain unexpected tags whose IDs are unknown. Here we give three examples. First, in airports where an airline
company uses RFID readers to monitor the baggage of its passengers, the tags on other airlines’
baggage, which are in the vicinity of this airline’s readers, also respond to the queries of this
airline’s readers. Second, in a large warehouse rented to multiple tenants,
one tenant’s RFID readers receive responses from tags of other tenants. Third, in a retail store that uses RFID readers to monitor only expensive merchandise, the
readers receive responses from tags of inexpensive merchandise as well. Similar scenarios
exist in other settings such as hospitals and malls. Existing protocols cannot handle the
presence of unexpected tags because unexpected tags fill up slots in Aloha frames, resulting
in unexpected false positives.
The second major limitation of existing protocols is that, except for TRP, none of them is
compliant with the EPCglobal Class 1 Generation 2 (C1G2) RFID standard [55]. These
protocols require the manufacturers to put random bit sequences in tags for calculating
specialized hash functions. They also require the tags to be able to receive and interpret
“pre-vector” and/or “post-vector” frames to select slots in frames. Such functionalities are
not provisioned in the C1G2 standard because tags, especially the passive ones, do not have
enough computational power. It is important for an RFID protocol to be compliant with the
C1G2 standard because the cheap commercially available off-the-shelf (COTS) tags follow
the C1G2 standard. A protocol that is not compliant with the C1G2 standard will require
home-brewed tags, which will not only cost more but will also work only in limited settings.
For example, if an airline uses a protocol and tags that are non-compliant with the C1G2
standard, it may be able to track its baggage at its home airport but not at the airports in
the rest of the world, which support only C1G2 compliant tags.

4.1.3

Problem Statement & Proposed Approach

Now we formally define the missing tag detection problem. Let E represent the set of IDs of
the expected tags, i.e., the tags that are expected to be present in a population and need to
be monitored. Let an unknown number of tags, m, out of these |E| tags be missing, where
0 ≤ m ≤ |E|. Let Ep be the set of IDs of the remaining |E| − m tags that are actually present
in the population. Let U be the set of IDs of all the unexpected tags in the population that
do not need to be monitored. We neither know exactly which IDs belong to sets Ep and U nor
do we know their sizes, but we do know that Ep ⊆ E. Let T be a threshold on the number of
missing tags. Our objective is to design a missing tag detection protocol with which a set of
readers can quickly detect a missing tag event with a probability ≥ α whenever the number
of missing tags m is greater than or equal to the threshold T, where α is called the required
reliability and lies in the range 0 ≤ α < 1. Additionally, a missing tag detection protocol
should work in single-reader as well as multiple-reader environments, and should be compliant with
the C1G2 standard.
For the problem of detecting missing tags in the presence of unexpected tags, there are
three seemingly obvious solutions based on previous work. The first solution is to repeatedly
execute a tag collection protocol to collect IDs of all tags and compare them with the IDs
in set E to detect if any tags are missing. This solution works; however, it is too slow. For
example, our experimental results show that even the fastest existing tag collection protocol
TH [105] is 14.3 times slower than our scheme. The second solution is to first execute a tag
collection protocol to get the IDs of unexpected tags and then repeatedly execute an existing
missing tag detection protocol. This solution has two limitations. First, it is slow because
the missing tag detection protocol will have to monitor the unexpected tags in addition to
the expected tags. Second, the missing tag detection protocol will report a missing tag event
even when only unexpected tags go missing, which is not desired. Furthermore,
neither of these two solutions can be used in settings where readers are not allowed to read the IDs
of tags in set U due to privacy reasons. An example of such a setting is the aforementioned
multi-tenant warehouse, where one tenant may not permit readers of other tenants to read
the IDs of its tags. The third solution is to repeatedly execute a tag estimation protocol
and look for a net change in the population size. The limitation of this solution is that
if some expected tags go missing but an equal or greater number of unexpected tags join
the population, the estimation protocol cannot detect the missing tag event. Furthermore,
missing tag detection protocols are much faster compared to estimation protocols due to the
knowledge of set E [114, 71, 73].
In this chapter, we propose a new protocol called RFID monitoring protocol with unexpected tags (RUN), the first protocol that can achieve the required reliability in detecting a missing tag event when unexpected tags might be present in the population. RUN uses
the framed slotted Aloha protocol specified in the C1G2 standard as its MAC layer communication protocol. In the Aloha protocol, the reader first tells the tags a frame size f and a random
seed number R. Each tag within the transmission range of the reader then uses f, R, and
its ID to select a slot in the frame by evaluating a hash function h(f, R, ID) whose result
is uniformly distributed in [1, f]. Each tag has a counter initialized with the number of the slot
in which it chose to reply. After each slot, the reader first transmits an end-of-slot signal and then
each tag decrements its counter by one. In any given slot, all the tags whose counters equal
1 respond with a random sequence called RN16. If no tag replies in a slot, it is called an
empty slot. If one or more tags reply in a slot, it is called a nonempty slot. As per the C1G2
standard, tags do not transmit their IDs unless the reader specifically asks them to do so.
In RUN, the reader checks if a slot is empty or nonempty using the RN16 sequence and never
asks tags to transmit their IDs. This preserves privacy in settings where a reader is not
allowed to read IDs of tags in set U.
To detect if any tags are missing, RUN executes multiple Aloha frames with different
seeds. In each frame, each tag uses the seed for that frame to select its slot. As RUN already
knows the IDs of all tags in set E, it pre-computes which tags in E will select which slots
in each frame. Thus, it knows which slots in the frames should be nonempty if all the tags
in E are present in the population. When a reader executes a frame, RUN compares the
response in each slot of that frame with the corresponding slot in the pre-computed frame.
If it finds that a particular pre-computed slot was nonempty but the corresponding slot in
the executed frame is empty, it stops and declares that some tags are missing. To minimize
the effect of unexpected false positives and consequently the detection time, RUN estimates
the size of U implicitly without running an extra estimation phase and uses this estimate
to calculate optimal values of system parameters. RUN works in single-reader as well as
multiple-reader environments.
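The comparison of pre-computed and executed frames described above can be sketched as follows. This is a minimal simulation under stated assumptions, not an implementation of RUN: SHA-256 stands in for the hash function h(f, R, ID), the tag IDs, frame size, and number of frames are hypothetical, and the choice of optimal parameters is omitted. The reader is modeled as learning only whether each slot is empty or nonempty.

```python
import hashlib


def slot(f, seed, tag_id):
    """Stand-in for h(f, R, ID): a slot number in [1, f] (SHA-256 is an assumption)."""
    digest = hashlib.sha256(f"{f}:{seed}:{tag_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % f + 1


def nonempty_slots(f, seed, tag_ids):
    """Set of slots in which at least one of the given tags replies."""
    return {slot(f, seed, t) for t in tag_ids}


def missing_event_detected(expected, present, f, seeds):
    """Return True as soon as a slot expected to be nonempty is observed empty."""
    for seed in seeds:
        precomputed = nonempty_slots(f, seed, expected)  # derived from set E alone
        observed = nonempty_slots(f, seed, present)      # what the reader hears
        if precomputed - observed:
            return True
    return False


E = set(range(1000, 1100))     # expected tags (hypothetical IDs)
U = set(range(5000, 5050))     # unexpected tags (hypothetical IDs)
missing = {1003, 1042}

all_present = missing_event_detected(E, E | U, f=256, seeds=range(20))
some_missing = missing_event_detected(E, (E - missing) | U, f=256, seeds=range(20))
print(all_present, some_missing)
```

When no expected tag is missing, every pre-computed nonempty slot is also nonempty in the executed frame, so no event is declared. When tags are missing, a single frame may fail to reveal it because another tag can occupy the same slot, which is why several frames with different seeds are executed.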

4.1.4

Technical Challenges & Solutions

There are three key technical challenges in detecting a missing tag event. The first technical
challenge is to handle the presence of unexpected tags. Due to the presence of such tags, it
is possible that a particular slot that RUN expected to be nonempty due to a specific tag
in E actually turns out to be nonempty even though that specific tag in E was missing. To
address this challenge, RUN executes multiple frames with different seeds, which reduces the
effects of such unexpected false positives. We calculate the false positive probability due to
tags in Ep ∪ U and use it to calculate optimal values of frame sizes and the number of times
the frames should be executed to mitigate the effects of false positives.
The second technical challenge is to estimate the number of unexpected tags |U| in the
population, which is required to calculate the optimal values of system parameters. To
address this challenge, RUN first pre-computes which slots in each frame the tags in E will
not select. Second, it executes the frames and sees how many of such slots turn out to be
nonempty. The number of such slots that are nonempty in the executed frames is a function
of |U| but is independent of |E| because we know from the pre-computed frames that the
tags in E never select these slots. Thus, by observing the number of slots that are empty in the
pre-computed frames and nonempty in the executed frames, RUN estimates |U|. Note that
RUN does not carry out a separate estimation phase to estimate the size of U. It obtains
the estimate while executing the Aloha frames for detecting a missing tag event and thus
does not incur any extra time cost.
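One way to turn this observation into a number is sketched below. It relies on a simple occupancy argument that is an assumption of this sketch and may differ from the estimator derived later in this chapter: if each unexpected tag picks one of f slots uniformly at random, a slot that no tag in E selects is nonempty with probability 1 − (1 − 1/f)^|U|, so the observed fraction of such slots that are nonempty can be inverted to estimate |U|. SHA-256 again stands in for h(f, R, ID), and all IDs and parameter values are hypothetical.

```python
import hashlib
import math


def slot(f, seed, tag_id):
    """Stand-in for h(f, R, ID): a slot number in [1, f] (SHA-256 is an assumption)."""
    digest = hashlib.sha256(f"{f}:{seed}:{tag_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % f + 1


def estimate_unexpected(expected, present, f, seeds):
    """Estimate |U| from slots that are empty in the pre-computed frames."""
    free = 0   # slots that no tag in E selects
    hit = 0    # how many of those slots are nonempty in the executed frames
    for seed in seeds:
        precomputed = {slot(f, seed, t) for t in expected}
        observed = {slot(f, seed, t) for t in present}
        for s in range(1, f + 1):
            if s not in precomputed:
                free += 1
                hit += s in observed
    if free == 0 or hit == free:
        raise ValueError("frames are too small to estimate |U|")
    # Invert hit/free = 1 - (1 - 1/f)^|U| for |U|.
    return math.log(1 - hit / free) / math.log(1 - 1 / f)


E = set(range(1000, 1100))     # expected tags (hypothetical IDs)
U = set(range(5000, 5200))     # 200 unexpected tags (hypothetical IDs)
estimate = estimate_unexpected(E, E | U, f=512, seeds=range(50))
print(round(estimate))
```

Because only slots that no tag in E selects are counted, the estimate does not depend on how many expected tags are actually present.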
The third technical challenge is to achieve the required reliability in the smallest possible
time. To address this challenge, we use the false positive probability to derive a “reliability
condition”, which, if satisfied by the system parameters, guarantees that RUN will achieve
the required reliability. These values of system parameters ensure with probability α that
there will be at least one slot in all the frames that is nonempty in the pre-computed frames
and empty in the executed frames when m ≥ T . To minimize RUN’s execution time, we
express the time in terms of the system parameters and minimize it under the constraint
that the system parameters satisfy the reliability condition.

4.1.5

Key Novelty & Advantages over Prior Art

The key novelty of this chapter is twofold. First, we identify the problem of detecting missing
tags in the practical scenario where unexpected tags are present. Second, we propose RUN
for detecting missing tags in the presence of unexpected tags. RUN has two key advantages
over prior art. First, it achieves the required reliability in the presence of unexpected tags,
whereas none of the existing protocols achieves the required reliability. We have extensively
evaluated and compared RUN with four state-of-the-art missing tag detection protocols
(TRP [114], IIP [71], MTI [128], and SFMTI [73]) in a variety of scenarios for a large range
of tag population sizes. Among existing protocols, SFMTI achieves the highest reliability of
67%, whereas RUN achieves arbitrarily high reliability as per the requirement. Second, it is
compliant with the C1G2 standard, whereas existing protocols, except TRP, are not.

4.2

Related Work

Several probabilistic [114, 76] and deterministic [71, 128, 73] missing tag detection protocols
have been proposed. The common and major drawback of all of these protocols is that none
of them handles unexpected tags; they all assume that the readers already know the IDs of all
tags that can be present in the population. Next, we review the existing probabilistic and
deterministic protocols.

4.2.1

Probabilistic Protocols

The objective of the probabilistic protocols is to detect if any tags in the population are
missing. Tan et al. proposed the first probabilistic protocol called TRP [114]. TRP pre-computes
slots in a frame and compares them with the executed slots to detect missing tags.
The difference with RUN, however, lies in that TRP does not consider false positives from
unexpected tags. Furthermore, for large populations, TRP requires a frame size that exceeds
the C1G2 specified upper limit of 2^15, which is not possible in practical RFID systems.
Among existing protocols, TRP is the only one that is compliant with the C1G2 standard as
long as the frame size is below 2^15. Luo et al. proposed another probabilistic protocol called
EMTD [76]. This protocol is non-compliant with the C1G2 standard because it assumes
the RFID tags to be intelligent with enough computing power to implement a hash ring
and calculate hashes using that ring. None of the existing probabilistic protocols has been
designed to work in a multiple-reader environment.

4.2.2

Deterministic Protocols

The objective of the deterministic protocols is to identify exactly which tags are missing
from a population. Li et al. proposed a suite of protocols in [71] out of which IIP performs
the best. IIP is non-compliant with the C1G2 standard for the following three reasons. First,
it requires tags to interpret pre-vector frames and reply to the reader queries as described
in those frames. Second, it requires frame sizes greater than 2^15 for large populations.
Last, it requires manufacturers to insert a ring of random bits in tag memory at the time
of manufacturing. IIP does not handle multiple readers either. Zhang et al. proposed a
deterministic protocol called MTI [128], which is essentially a tag collection protocol that
first collects IDs of all tags and then checks which tags are missing. MTI cannot be used
to achieve an arbitrary desired accuracy because the authors do not provide a framework
to calculate system parameters. Liu et al. proposed a deterministic protocol called SFMTI
[73]. SFMTI is non-compliant with the C1G2 standard because it requires tags to interpret
non-standardized vectors before and after selecting a slot in a frame.

4.3

System Model

4.3.1

Architecture

For detecting missing tags, RUN uses a central controller connected with a set of readers that
cover the area where the tags in set E are located. The use of a central controller ensures that
all readers use consistent values of frame sizes and seeds when executing frames, which helps
in efficiently aggregating and processing information returned by the readers. The readers
use the standardized frame slotted Aloha protocol to communicate with tags and never ask
the tags to transmit their IDs. The use of multiple readers with overlapping coverage regions
introduces the following two problems: (1) scheduling the readers such that no two readers with
overlapping regions transmit at the same time, and (2) mitigating the effect of some tags
responding to multiple readers due to overlap in the coverage region of those readers. For the
first problem, the controller uses one of the several existing reader scheduling protocols [116]
to avoid reader-reader collisions. For the second problem, we propose a solution in Section 4.4.

4.3.2

C1G2 Compliance

RUN does not require any modifications to tags or readers. It only requires the readers
to receive the frame size, persistence probability, and seed number from the controller and
communicate the responses in the frames back to the controller. Persistence probability p
is the probability with which a tag decides whether it will participate in a frame or not
before selecting a slot in that frame. Later in the chapter, we will show how we use p to
handle frame sizes that exceed the C1G2 specified upper limit of 2^15. Such large frame sizes
are required when the size of tag population is large, required reliability α is high, or the
threshold T is small. As the C1G2 standard does not specify the use of p, COTS tags do
not support it. To avoid making any modifications to tags, in RUN, the reader implements
p by announcing a frame size of f /p but terminating the frame after the first f slots, which
can be done as per the C1G2 standard.
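
The effect of announcing a frame of size f/p and stopping after the first f slots can be illustrated with a short simulation. This is a minimal sketch, assuming each tag picks its slot uniformly at random from the announced frame; the function name `chosen_slot`, the rounding of f/p, and the parameter values are hypothetical choices made for illustration only.

```python
import random

def chosen_slot(f, p, rng):
    """Slot picked by one tag, or None if it falls outside the executed slots.

    The reader announces a frame of size f/p (rounded here, an assumption)
    but only executes the first f slots.
    """
    announced = round(f / p)
    slot = rng.randrange(announced)   # uniform choice over the announced frame
    return slot if slot < f else None

rng = random.Random(1)                # fixed seed so the run is repeatable
f, p, trials = 100, 0.25, 20000
hits = sum(chosen_slot(f, p, rng) is not None for _ in range(trials))
fraction = hits / trials              # empirical participation rate
print(f"empirical participation rate: {fraction:.3f} (target p = {p})")
```

Under these assumptions the fraction of tags that land in an executed slot should be close to p, which is the behavior the persistence probability is meant to produce without any change to the tags.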

4.3.3

Communication Channel

We assume that the communication channel between readers and tags is reliable, i.e., tags
correctly receive queries from the readers and the readers correctly detect transmission of
RN16 sequence in a slot if one or more tags in the population transmit in that slot. If the
channel is unreliable, the solution proposed in [105] can be easily adapted for use with RUN.

4.3.4

Formal Development Assumption

To make the formal development tractable, we assume that instead of picking a single slot
to transmit at the start of the ith frame of size f , a tag independently decides to transmit
in each slot of the frame with probability 1/f regardless of its decision about previous or
forthcoming slots. Vogt first used this assumption for the analysis of Aloha protocol for
RFID and justified its use by recognizing that this problem belongs to a class of problems
called occupancy problem, which deals with the allocation of balls to urns [121]. Ever since,
the use of this assumption has become a norm in the formal analysis of all Aloha based
RFID protocols [102, 121, 129].
The implication of this assumption is that a tag can end up choosing more than one slot
in the same frame or even not choosing any at all, which is not in accordance with the C1G2
standard that requires a tag to pick exactly one slot in a frame. However, this assumption
does not create any problems because the expected number of slots that a tag chooses in
a frame is still one. The analysis with this assumption is, therefore, asymptotically the
same as that without this assumption. Bordenave et al. further explained in detail why
this independence assumption in analyzing Aloha based protocols provides results just as
accurate as if all the analysis was done without this assumption [33]. This independence
assumption is made only to make the formal development tractable. In all our simulations,
a tag chooses exactly one slot at the start of a frame.
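
The claim that the expected number of slots chosen by a tag is still one can be checked in one line. The indicator variable $Y_j$ below is a hypothetical symbol introduced only for this check (it is 1 when the tag transmits in slot $j$ of a frame of size $f$, which happens with probability $1/f$ under the independence assumption):

$$E\left[\sum_{j=1}^{f} Y_j\right] = \sum_{j=1}^{f} P\{Y_j = 1\} = f \times \frac{1}{f} = 1$$

In addition, the probability that a given slot is not chosen by any of $N$ tags is $(1 - 1/f)^{N}$ both under this assumption and when every tag picks exactly one slot uniformly at random, so the per-slot occupancy probabilities agree; only the dependence between slots differs.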

4.4

Protocol Description

To detect if any of the tags in set E is missing from the population, in RUN, the central
controller executes up to n Aloha frames using the RFID readers. There are 6 steps involved
in executing each frame. First, before executing any frame i, the controller calculates the
optimal values of frame size fi and persistence probability pi , and generates a random seed
number Ri . Second, as the controller knows the IDs in set E, it pre-computes which tag
in E will choose which slot in the ith frame. Thus, it knows which slots of the executed
ith frame should be nonempty if all the tags in E were present and a single reader covered
the entire population. It represents the nonempty slots in the pre-computed frame with 1s
and all other slots with 0s. Third, it provides each reader with the parameters fi , pi , and
Ri and asks each of them to execute the ith frame using these parameters. The motivation
behind using the same values of fi , pi , and Ri across all readers for the ith frame is to enable
RUN to work with multiple readers with overlapping regions. As all readers use the same
values of fi , pi , and Ri in the ith frame, the slot number that a particular tag chooses in
the ith frame of each reader covering this tag is the same, i.e., h(fi /pi , Ri , ID) evaluated by
the tag results in the same value for each reader. Fourth, each reader executes the frame on
its turn as per the reader scheduling protocol and sends the responses in the frame back to
the controller. Fifth, when the controller receives the ith frame of each reader, it applies the
logical OR operator to all the received ith frames and obtains a resultant ORed frame. This
resultant ORed frame is the same as the frame that a single reader covering all the tags
would have received. Sixth,
the controller compares all the slots in the pre-computed ith frame with the corresponding
slots in the resultant ORed ith frame. If there is any slot that is 1 in the pre-computed
frame but 0 in the resultant ORed frame, the controller detects this as a missing tag event
because such a slot implies that all tags in E that mapped to this slot in the pre-computed
frame are absent from the population. At this point, the controller stops the protocol and
does not execute the remaining n − i frames. If the controller does not detect a missing tag
event even after each reader has executed n frames, it declares that the number of missing
tags m is less than the threshold T .
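
The per-frame steps above can be sketched in a few lines. This is an illustrative simulation, not the implementation used in the evaluation: the hash `slot_of` is a hypothetical stand-in for the tag-side function h(f/p, R, ID), the readers are simulated by lists of the tag IDs they cover, and the rounding of f/p is an assumption.

```python
import hashlib

def slot_of(tag_id, seed, announced_size):
    """Hypothetical stand-in for the tag-side hash h(f/p, R, ID)."""
    digest = hashlib.sha256(f"{seed}:{tag_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % announced_size

def build_frame(tag_ids, f, p, seed):
    """Return the first f slots of a frame announced with size f/p (1 = nonempty)."""
    announced = round(f / p)
    frame = [0] * f
    for tag_id in tag_ids:
        slot = slot_of(tag_id, seed, announced)
        if slot < f:  # slots beyond the first f are never executed
            frame[slot] = 1
    return frame

def missing_tag_event(expected_ids, tags_per_reader, f, p, seed):
    """Pre-compute the frame for E, OR the readers' frames, and compare."""
    precomputed = build_frame(expected_ids, f, p, seed)
    ored = [0] * f
    for present_ids in tags_per_reader:
        executed = build_frame(present_ids, f, p, seed)  # simulated reader
        ored = [a | b for a, b in zip(ored, executed)]
    # A slot that is 1 in the pre-computed frame but 0 in the ORed frame
    # means every expected tag mapped to it is absent.
    return any(pre == 1 and got == 0 for pre, got in zip(precomputed, ored))

expected = [f"E{i}" for i in range(50)]
unexpected = [f"U{i}" for i in range(30)]
reader_1 = expected[:30] + unexpected[:10]   # overlapping coverage
reader_2 = expected[20:] + unexpected[10:]
print(missing_tag_event(expected, [reader_1, reader_2], 128, 1.0, seed=7))
```

Because all simulated readers use the same frame size, persistence probability, and seed, a tag covered by two readers sets the same slot in both frames, so the OR leaves the result unaffected by the overlap.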

4.5

Parameter Optimization

Recall from the previous section that before executing any frame i, the controller calculates
the optimal values of frame size fi and persistence probability pi . For this, the controller
first estimates the value of |U| at the start of the ith frame, represented by |Ũ_i|, based on
the responses from the tag population in the previous i − 1 frames. Details about estimating
the value of |U| will be given in Section 4.5.1. Then, using this estimate along with the
values of |E|, α, and T , the controller calculates the optimal values of the frame size fi
and persistence probability pi such that RUN achieves the required reliability in the shortest
time. Before asking the readers to execute the ith frame, the controller also recalculates
the maximum number of frames that it should execute, represented by ni . As the controller
executes more and more frames, i.e., as i increases, the estimate |Ũ_i| asymptotically becomes
equal to |U|. Consequently, fi , pi , and ni asymptotically become equal to constants f , p,
and n, respectively. When the estimate of |U| does not change by more than 1% in 10
consecutive frames, the controller considers the estimate to be close enough to |U|. At this
point, the controller calculates the values of fi , pi , and ni and puts f = fi , p = pi , and
n = ni , and uses these fixed values of f and p to execute subsequent frames until the total
number of frames executed since the first frame becomes equal to n. Note that the controller
executes n frames only if it does not detect any missing tag event in any frame. Otherwise,
it terminates the protocol as soon as it detects a missing tag event. For the first frame, i.e.,
when i = 1, the controller uses f1 = 2 × |E|, p1 = 1, and n1 = ∞. The choices of the values
of f1 , p1 , and n1 are arbitrary and do not really matter because as the controller executes
more frames, the frame size, the persistence probability, and the number of frames converge
to constants f , p, and n, respectively.
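
One possible reading of the stopping rule for the estimate is sketched below. It assumes that "does not change by more than 1% in 10 consecutive frames" means that each of the last 10 successive estimates differs from its predecessor by at most 1% in relative terms; the text does not spell this out, and the function name `has_converged` is hypothetical.

```python
def has_converged(estimates, window=10, rel_tol=0.01):
    """True if the last `window` frame-to-frame changes are all within rel_tol."""
    if len(estimates) < window + 1:
        return False                      # not enough frames observed yet
    recent = estimates[-(window + 1):]
    return all(abs(cur - prev) <= rel_tol * abs(prev)
               for prev, cur in zip(recent, recent[1:]))

print(has_converged([500.0] * 11))            # stable estimates
print(has_converged([500.0] * 10 + [600.0]))  # last estimate jumps by 20%
```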
In the rest of this section, we will derive equations that the controller uses at the start of each
frame to calculate the optimal values of frame size f , number of times the frames should
be repeated n, and persistence probability p to minimize the execution time of RUN while
ensuring that its actual reliability is no less than the required reliability. We have dropped
the subscript i from these parameters to make the presentation simple. To calculate these
optimal values, the controller requires the estimate of |U|. Next, we will first present a
method to obtain this estimate at the start of any frame i based on the responses from
the tag population in the previous i − 1 frames. Second, using the estimate of |U|, we will
derive an expression for the false positive probability, i.e., the probability that a missing
tag is detected as present. Third, we will use the expression for false positive probability
in conjunction with the required reliability α and threshold T to obtain an equation with
three unknowns f , p, and n. To ensure that the actual reliability is greater than or equal
to the required reliability, the controller must use the values of f , p, and n that satisfy this
equation. We call this equation the reliability condition. Fourth, we will derive an expression
for the total execution time of RUN and minimize it with respect to n to get an expression
involving p and n. The controller simultaneously solves this expression with the reliability
condition using p = 1 to obtain the optimal values of f and n. Last, we will show how to
bring the value of f within the limit when the optimal value of the frame size exceeds the C1G2
specified upper limit of 2^15. We will also calculate the expected number of slots RUN takes
to detect the first missing tag event. Next, we describe these five steps in detail.

4.5.1

Estimating Number of Unexpected Tags

In this section, we present a method to estimate the number of unexpected tags in the
population at the start of any frame i. Although a lot of work has been done by the research
community to estimate the number of tags present in an RFID tag population [61, 62, 102,
39], there is no work on estimating the size of some subset of RFID tag population. In our
case, that subset is the set of unexpected tags in the population.
Recall from Section 4.4 that in any frame i, the slots that are 0 in the ith pre-computed
frame are the slots that only the tags in set U can select when the reader executes the ith
frame. This is because we have prior knowledge that the tags in set E will select only those
slots in the ith executed frame that are 1 in the ith pre-computed frame. The intuition behind
our estimation method is that as the number of unexpected tags in a population increases,
the number of slots that are 0 in a pre-computed frame but are 1 in the corresponding
executed resultant ORed frame also increases. The number of such slots in any given frame
is a function of |U| and can, therefore, be used to estimate the value of |U|.
Next, we derive an expression that relates the number of slots that are 0 in a pre-computed
frame but are 1 in the corresponding executed resultant frame with the value of |U|. We
will use this expression to obtain the estimate of |U|. Let the size of the ith frame be fi and
let ki out of these fi slots be 1s in the pre-computed frames. Let j be the j th 0 slot in the
pre-computed frame. Thus, 1 ≤ j ≤ fi − ki . Let Xij be an indicator random variable for the
event that the j th 0 slot in the ith pre-computed frame turns out to be 1 in the ith executed
resultant frame. The expected value of Xij is given by
$$E[X_{ij}] = P\{X_{ij} = 1\} = 1 - \left(1 - \frac{p_i}{f_i}\right)^{|U|} \approx 1 - e^{-\frac{p_i}{f_i}|U|}$$

Let $N_i^{01}$ be a random variable representing the number of slots that are 0 in the ith
pre-computed frame but 1 in the ith executed resultant frame. Thus,
$N_i^{01} = \sum_{j=1}^{f_i-k_i} X_{ij}$. As $X_{i1}, X_{i2}, \ldots, X_{i(f_i-k_i)}$ form a set of identically
distributed random variables, $E[N_i^{01}]$ is given by
$$E[N_i^{01}] = E\left[\sum_{j=1}^{f_i-k_i} X_{ij}\right] = (f_i - k_i) \times E[X_{ij}] = (f_i - k_i) \times \left(1 - e^{-\frac{p_i}{f_i}|U|}\right) \tag{4.1}$$

Let $\tilde{N}_i^{01}$ represent the observed value of the number of slots that were 0 in the ith
pre-computed frame but 1 in the corresponding executed resultant frame. Replacing $E[N_i^{01}]$ in
the equation above with $\tilde{N}_i^{01}$ and solving for |U| gives an estimate of |U|. This estimate
is obtained by utilizing the information from the ith frame only. While this estimate may
not be accurate, if we use the information from a large number of frames, the estimate will
become more accurate. Specifically, we leverage the well known statistical result that the
variance in the observed value of a random variable reduces by x times if we take the average
of x observations of that random variable. Therefore, to obtain the estimate |Ũ_i| of |U| at
the start of the ith frame, we obtain an estimate from each of the previous i − 1 frames and
take their average. Solving Equation (4.1) for |U| and averaging over past i − 1 frames, the
formal expression for |Ũ_i| becomes
$$|\tilde{U}_i| = -\frac{1}{i-1} \sum_{l=1}^{i-1} \frac{f_l}{p_l} \ln\left\{1 - \frac{\tilde{N}_l^{01}}{f_l - k_l}\right\} \tag{4.2}$$

Finally, note that the controller obtains this estimate without executing any additional
frames. It gets this estimate from the frames it was already executing to detect missing tag
events.
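
Equation (4.2) translates directly into a small averaging routine. The sketch below is illustrative only: the function name `estimate_unexpected` and the tuple layout of `history` are hypothetical, and the routine does not handle the corner case where every 0 slot of a pre-computed frame turns out nonempty (the logarithm is then undefined).

```python
import math

def estimate_unexpected(history):
    """Average the per-frame estimates of |U| as in Equation (4.2).

    history: list of (f, k, p, n01) tuples for the previous frames, where
      f   = frame size,
      k   = number of 1 slots in the pre-computed frame,
      p   = persistence probability,
      n01 = observed number of slots that were 0 in the pre-computed frame
            but 1 in the executed resultant frame.
    """
    total = 0.0
    for f, k, p, n01 in history:
        total += -(f / p) * math.log(1.0 - n01 / (f - k))
    return total / len(history)

# Feed the estimator the expected count from Equation (4.1) for |U| = 500.
f, k, p, true_u = 2000, 600, 1.0, 500
n01 = (f - k) * (1 - math.exp(-p * true_u / f))
print(estimate_unexpected([(f, k, p, n01)] * 3))
```

When the observed counts equal their expected values, the routine recovers |U|; with real observations the per-frame estimates fluctuate, which is why they are averaged over the previous frames.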

4.5.2

False Positive Probability

A false positive occurs when all the slots that a particular missing tag maps to in the n pre-computed frames turn out to be nonempty when the frames are executed because some other
tags in the population also selected those slots. Lemma 9 gives the expression to calculate
the false positive probability.
Lemma 9. Let m out of |E| tags be missing, and let there be |U| unexpected tags in the
population. With persistence probability p, frame size f , and number of frames n, the false
positive probability, Pf p , is given by:
$$P_{fp} = \left[1 - p\left(1 - \frac{p}{f}\right)^{|U|+|E|-m}\right]^{n} \tag{4.3}$$

Proof. The total number of tags in the population is |U| + |E| − m. Consider an arbitrary
tag in E that is missing from the population. As this tag participates in each pre-computed
frame with probability p, it is possible that it does not participate in one or more of the n
pre-computed frames. Let Z be the random variable for the number of pre-computed frames
in which this missing tag participates. Let q be the probability that a slot that this missing
tag maps to in a pre-computed frame is selected by one or more of the tags present in the
population in the executed frame. Therefore,
$$P_{fp} = \sum_{z=0}^{n} P\{Z = z\} \times q^{z} \tag{4.4}$$

As a missing tag participates in each pre-computed frame with probability p and there
are n pre-computed frames, the number of pre-computed frames in which the missing tag
participates follows a binomial distribution, i.e., Z ∼ Binom(n, p). When a frame is executed,
the probability that at least one tag in the population chooses the same slot to which the missing
tag maps in the pre-computed frame is $1 - (1 - \frac{p}{f})^{|U|+|E|-m}$, which is the value of q. Therefore,
Equation (4.4) becomes
$$P_{fp} = \sum_{z=0}^{n} \binom{n}{z} p^{z} (1-p)^{n-z} \left[1 - \left(1 - \frac{p}{f}\right)^{|U|+|E|-m}\right]^{z}$$
The binomial theorem states that $\sum_{z=0}^{n} \binom{n}{z} x^{z} y^{n-z} = (x + y)^{n}$. Substituting
$x = p \times \{1 - (1 - \frac{p}{f})^{|U|+|E|-m}\}$ and $y = 1 - p$, we get Equation (4.3).

Figure 4.1 shows the theoretically calculated false positive probability from Equation (4.3)
represented by the solid line and experimentally observed values of false positive probability
represented by the dots. To obtain this figure, we use |E| = 100, |U| = 500, f = 300,
p = 1, and n = 2. Each dot represents the false positive probability calculated from 100
runs of simulation. We observe that the theoretically calculated values match perfectly with
experimentally observed values, showing that our independence assumption that we stated
in Section 4.3.4 does not cause the theoretical analysis to deviate from practically observed
values. We also observe that as the number of missing tags increases, the false positive
probability decreases. This means that it is hardest for RUN to detect a missing tag event
when m = T and becomes easier as m increases beyond T . Thus, we will use m = T in all
further analytical development, because if RUN is able to detect a missing tag event with
probability α when m = T , it will be able to detect a missing tag event with probability
greater than α when m > T .
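
The closed form in Equation (4.3) can be checked numerically against the binomial sum from which it is derived. The sketch below is a plain evaluation of the two expressions; the function names are hypothetical, and the parameter values in the last lines mirror those quoted for Figure 4.1.

```python
import math

def false_positive_prob(E, U, m, f, p, n):
    """Closed form of Equation (4.3)."""
    slot_free = (1 - p / f) ** (U + E - m)   # no present tag picks the slot
    return (1 - p * slot_free) ** n

def false_positive_prob_sum(E, U, m, f, p, n):
    """Binomial sum obtained by expanding Equation (4.4)."""
    q = 1 - (1 - p / f) ** (U + E - m)
    return sum(math.comb(n, z) * p**z * (1 - p)**(n - z) * q**z
               for z in range(n + 1))

print(false_positive_prob(100, 500, 20, 300, 0.7, 5))
print(false_positive_prob_sum(100, 500, 20, 300, 0.7, 5))
# With |E| = 100, |U| = 500, f = 300, p = 1, n = 2, more missing tags
# give a smaller false positive probability.
print(false_positive_prob(100, 500, 10, 300, 1.0, 2),
      false_positive_prob(100, 500, 50, 300, 1.0, 2))
```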

4.5.3

Achieving Required Reliability

The following theorem gives the reliability condition that the values of f , p, and n need to satisfy
in order for RUN to be able to achieve the required reliability.
Theorem 10. Given a set E with expected IDs, set U with unexpected IDs, threshold T , and
required reliability α, RUN will achieve the required reliability if the values of f , p, and n
satisfy the reliability condition given below.

$$f = \frac{p\,(T - |E| - |U|)}{\ln\left\{\dfrac{1 - (1-\alpha)^{\frac{1}{nT}}}{p}\right\}} \tag{4.5}$$

Proof. Probability that RUN detects at least one of the missing tags is $1 - P_{fp}^{T}$. In the worst
case, this probability should at least be equal to α, i.e., $1 - P_{fp}^{T} = \alpha$. Substituting the R.H.S
of Equation (4.3) for $P_{fp}$ gives
$$1 - \alpha = \left[1 - p\left(1 - \frac{p}{f}\right)^{|U|+|E|-T}\right]^{nT} \approx \left[1 - p\,e^{-\frac{p}{f}(|U|+|E|-T)}\right]^{nT}$$

Rearranging the equation above gives Equation (4.5).
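
As a sanity check on the rearrangement, the sketch below evaluates Equation (4.5) and substitutes the result back into the approximate expression from the proof. The function names are hypothetical, the frame size is kept as a real number (how it is rounded in practice is not specified here), and the example values are chosen only for illustration.

```python
import math

def frame_size(E, U, T, alpha, p, n):
    """Frame size from the reliability condition, Equation (4.5)."""
    root = (1 - alpha) ** (1.0 / (n * T))
    # The logarithm is negative when (1 - root) < p, which makes f positive.
    return p * (T - E - U) / math.log((1 - root) / p)

def reliability_approx(E, U, T, f, p, n):
    """Detection probability using the exponential approximation in the proof."""
    return 1 - (1 - p * math.exp(-(p / f) * (U + E - T))) ** (n * T)

f = frame_size(E=1000, U=10000, T=200, alpha=0.99, p=1.0, n=3)
print(f, reliability_approx(1000, 10000, 200, f, 1.0, 3))
```

Under the exponential approximation, the reliability computed from the returned frame size equals the required α, as expected from the derivation.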

4.5.4

Minimizing Execution Time

The following theorem gives the condition that the values of p and n need to satisfy to make
the execution time of RUN minimum under the constraint that it achieves the required
reliability.

Figure 4.1 P_fp [plot: false positive probability vs. no. of missing tags; theoretical curve and simulation points]

Figure 4.2 Sd vs. n [plot: total slots vs. no. of frames]

Theorem 11. Given a threshold T and required reliability α, the execution time of RUN is
minimum under the constraint that it achieves the required reliability if the values of p and
n satisfy the following equation:
$$p = \left\{1 - (1-\alpha)^{\frac{1}{nT}}\right\} (1-\alpha)^{\dfrac{(1-\alpha)^{\frac{1}{nT}}}{nT\left(-1 + (1-\alpha)^{\frac{1}{nT}}\right)}} \tag{4.6}$$

Figure 4.3 f vs. p [plot: frame size vs. persistence probability]

Figure 4.4 n vs. p [plot: no. of frames vs. persistence probability]

Proof. Execution time is directly proportional to the total number of slots required to detect
the missing tag event because the duration of each slot is the same, typically 300µs for Philips
I-Code RFID reader [100]. Let Sd represent the total number of slots. Thus, Sd = f × n.
To ensure that RUN achieves the required reliability, we use the value of f from Equation
(4.5). Thus,
$$S_d = \frac{p\,n\,(T - |E| - |U|)}{\ln\left\{\dfrac{1 - (1-\alpha)^{\frac{1}{nT}}}{p}\right\}} \tag{4.7}$$

Figure 4.2 plots Sd as a function of n. We observe that Sd is a convex function of n.
Therefore, an optimum value of n, represented by nop , exists that minimizes the total number
of slots Sd . To find the optimal value of n, we differentiate Equation (4.7) with respect to n and
equate the resulting expression to 0, which gives Equation (4.6).
At the start of each frame, the controller replaces |U| with its estimate, puts p = 1 in
Equation (4.6), and solves it numerically using Brent’s method to obtain the optimal value
of number of frames nop . Then it puts n = nop and p = 1 in Equation (4.5) to get the
optimal value of frame size fop. When the controller calculates fop and nop like this at the
start of each frame, the execution time of RUN is minimized. At the same time, as the
reliability condition is satisfied, the protocol achieves the required reliability.
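
The numerical step can be sketched as follows. This is not the controller's implementation: plain bisection is swapped in for Brent's method so that the example needs only the standard library, n is treated as a real number (the text does not say how it is rounded), and the function names are hypothetical. The solver works on the logarithm of Equation (4.6), written in terms of a = (1 − α)^(1/(nT)).

```python
import math

def total_slots(E, U, T, alpha, p, n):
    """Total number of slots S_d from Equation (4.7)."""
    a = (1 - alpha) ** (1.0 / (n * T))
    return p * n * (T - E - U) / math.log((1 - a) / p)

def optimal_frames(alpha, T, p=1.0, tol=1e-12):
    """Solve Equation (4.6) for n by bisection on a = (1 - alpha)^(1/(nT))."""
    def residual(a):
        # log of Equation (4.6): ln((1 - a) / p) - a ln(a) / (1 - a) = 0
        return math.log((1 - a) / p) - a * math.log(a) / (1 - a)
    lo, hi = 0.25, 1 - 1e-12   # residual(lo) > 0 > residual(hi) for p in (0, 1]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2
    return math.log(1 - alpha) / (T * math.log(a))

n_op = optimal_frames(alpha=0.99, T=1)
print(n_op, total_slots(100, 500, 1, 0.99, 1.0, n_op))
```

For p = 1, Equation (4.6) is satisfied when (1 − α)^(1/(nT)) = 1/2, which gives a quick check on the value returned by the solver.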

4.5.5

Handling Large Frame Sizes

For large populations, high required reliability, and/or small threshold, it is possible for the
value of fop to exceed the C1G2 specified upper limit of 2^15 . Next, we describe how we use
p to bring the frame size within limits. Bringing the frame size within limits comes at the cost
of an increased number of slots, greater than the minimum value of Sd that would have been
achieved if the controller could use fop > 2^15 .
When we decrease the value of p, the number of tags that participate in a frame decreases.
Therefore, intuitively, the required value of f should also decrease. Figure 4.3 confirms this
intuition. This figure shows the plot of frame size vs. persistence probability, obtained using
Equations (4.5) and (4.6). We can see that when p decreases, f decreases. Participation
by fewer tags means that participation by the tags belonging to both the sets E and U
decreases. This increases the chances that a given missing tag will not map to any slot in
a given pre-computed frame, which means that chances of detecting its absence decrease.
Therefore, the overall uncertainty in detection of missing tags increases. To reduce this uncertainty, intuitively, the value of n should increase when p decreases to achieve the required
reliability. Figure 4.4 confirms this intuition. This figure shows the plot of number of frames
vs. persistence probability, obtained using Equations (4.5) and (4.6). We observe that when
p decreases, n increases.
We use these two observations to reduce the value of f whenever fop > 2^15 . When
fop > 2^15 , the controller uses f = fmax = 2^15 in Equation (4.5), which leaves two unknowns,
p and n, in the resulting equation. The controller solves the resulting equation simultaneously
with Equation (4.6) to get new values of p and n. The new value of p is less than 1 and the
new value of n is greater than nop because fmax < fop . Putting f = fmax in Equation (4.5)
and solving for n, we get
$$n = \frac{\ln\{1 - \alpha\}}{T \ln\left\{1 - p\,e^{\frac{p}{f_{\max}}(T - |E| - |U|)}\right\}} \tag{4.8}$$

Replacing n in Equation (4.6) with the R.H.S of the equation above, and simplifying, we get
$$\frac{p^{2}\,(T - |E| - |U|)}{f_{\max}\left(e^{\frac{p}{f_{\max}}(|E|+|U|-T)} - p\right)} = \ln\left\{1 - p\,e^{\frac{p}{f_{\max}}(T - |E| - |U|)}\right\}$$

The numerical solution of the equation above gives the new value of p, which the controller
puts in Equation (4.8) to get the new value of n. The controller uses these new values of
n and p along with f = fmax to pre-compute the ith frame. Although the total number of
slots Sd = fmax × n > fop × nop , this is still the smallest under the constraints that the
required reliability is achieved and the frame size does not exceed fmax .
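
The two-step computation for a capped frame size can be sketched as follows. This is an illustration under stated assumptions rather than the controller's actual code: bisection stands in for whichever numerical solver is used, n is treated as a real number, the bracket [1e-9, 1] for p is a choice made here, and the function name `capped_parameters` is hypothetical.

```python
import math

F_MAX = 2 ** 15  # C1G2 upper limit on the frame size

def capped_parameters(E, U, T, alpha, f_max=F_MAX, tol=1e-12):
    """Return (p, n) for f = f_max: solve the equation for p, then Equation (4.8)."""
    M = E + U - T

    def residual(p):
        # Left side minus right side of the equation obtained by
        # substituting Equation (4.8) into Equation (4.6).
        x = p * M / f_max
        lhs = -p * p * M / (f_max * (math.exp(x) - p))
        rhs = math.log(1 - p * math.exp(-x))
        return lhs - rhs

    lo, hi = 1e-9, 1.0
    if residual(hi) >= 0:
        raise ValueError("no root with p < 1; the frame size cap is not binding")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    p = (lo + hi) / 2
    n = math.log(1 - alpha) / (T * math.log(1 - p * math.exp(-p * M / f_max)))
    return p, n

p_new, n_new = capped_parameters(E=20000, U=10000, T=1, alpha=0.99)
print(p_new, n_new)
```

For this example population the returned p is below 1 and the returned n is larger than the value obtained with p = 1, in line with the behavior described above.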

4.5.6

Expected Detection Time

The values of f and n that we calculate as described in the sections above ensure that in
executing n frames, RUN will detect a missing tag event with probability greater than or
equal to α if number of missing tags is greater than or equal to T . However, in many cases,
the first missing tag event is detected before all n frames are executed. We calculate the
expected value of the number of slots that RUN takes to detect the first missing tag event.
For this, we calculate the probability that a missing tag event is detected in a given slot and
use it to calculate the expected value.
Lemma 12. Given a set E with expected IDs, set U with unexpected IDs, and threshold T ,
when controller executes RUN with persistence probability p and frame size f , the probability
g that a missing tag event is detected in any slot is given by the following equation.
$$g = \left\{1 - \left(1 - \frac{p}{f}\right)^{T}\right\} \times \left(1 - \frac{p}{f}\right)^{|U|+|E|-T} \tag{4.9}$$

Proof. Probability that a missing tag event is detected in a given slot is the product of the
probability that at least one missing tag maps to this slot in the pre-computed frame and the
probability that no tag in the population selects that slot in the executed frame. Considering
the scenario where it is hardest for RUN to detect a missing tag event i.e., when m = T ,
probability that at least one of the missing tags maps to the given slot in the pre-computed
frame is $1 - (1 - \frac{p}{f})^{T}$. The probability that none of the tags present in the population selects
that slot is $(1 - \frac{p}{f})^{|U|+|E|-T}$. The product of these two probabilities gives the expression
for g in Equation (4.9).
The following theorem gives the expected value of the number of slots that RUN takes to
detect the first missing tag event.
Theorem 13. Let D be the random variable for the slot number when the first missing tag
event is detected. Given that the probability of detecting a missing tag event in a slot is g,
as calculated in Lemma 12, frame size is f , and number of frames is n, we get

$$E[D] = \frac{1 - (1-g)^{fn} - f n g (1-g)^{fn}}{g} \tag{4.10}$$
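
Equation (4.10) can be compared numerically with the finite sum of i × g(1 − g)^(i−1) over the f × n slots. The sketch below does this for one illustrative parameter set; the function names are hypothetical, and g is computed from Equation (4.9).

```python
def detection_prob_per_slot(E, U, T, f, p):
    """Per-slot detection probability g from Equation (4.9)."""
    r = 1 - p / f
    return (1 - r ** T) * r ** (U + E - T)

def expected_detection_slot(g, f, n):
    """Closed form of Equation (4.10)."""
    total = f * n
    tail = (1 - g) ** total
    return (1 - tail - total * g * tail) / g

def expected_detection_slot_sum(g, f, n):
    """Direct evaluation of the sum of i * g * (1 - g)^(i - 1) over f * n slots."""
    return sum(i * g * (1 - g) ** (i - 1) for i in range(1, f * n + 1))

g = detection_prob_per_slot(E=100, U=500, T=10, f=300, p=1.0)
print(expected_detection_slot(g, 300, 2), expected_detection_slot_sum(g, 300, 2))
```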

Proof. The random variable D follows geometric distribution with parameter g, i.e.,
$P\{D = i\} = (1 - g)^{i-1} g$. The expected value, thus, becomes
$$E[D] = \sum_{i=1}^{S_d} i\,P\{D = i\} = \sum_{i=1}^{f \times n} i\,g(1-g)^{i-1} = \frac{1 - (1-g)^{fn} - f n g (1-g)^{fn}}{g}$$

4.6

Performance Evaluation

We implemented RUN in Matlab. Although none of the existing protocols handles the
presence of unexpected tags and, except for TRP, none of them is C1G2 compliant, we still
implemented four prior state-of-the-art missing tag detection protocols in Matlab, namely
TRP [114], IIP [71], MTI [128], and SFMTI [73], and compared their performance with RUN.
We calculated parameter values for these protocols by following the instructions in their
respective papers. We also implemented the fastest existing tag collection protocol TH
[105]. We chose a tag ID length of 64 bits as specified in the C1G2 standard. Note that the
distributions of the IDs of expected, unexpected, and missing tags do not matter because
RUN is independent of ID distributions.
We first evaluate the actual reliability of RUN and the existing protocols for multiple
values of required reliability, keeping the unexpected tag population size fixed and changing
the number of missing tags. We also show the time taken by each protocol to detect the
first missing tag event. Second, we evaluate the actual reliability of RUN and the existing
protocols for multiple values of required reliability by keeping the number of missing tags
fixed and changing the unexpected tag population size. We again show the time taken by
each protocol to detect the first missing tag event. Third, we study the actual reliability
achieved by each protocol when the number of tags missing from the population is different
from the value of threshold T . Last, we compare the detection times of our protocols with
the fastest tag collection protocol TH.

4.6.1

Impact of Number of Missing Tags

RUN is the only protocol that achieves the required reliability in the presence of unexpected
tags for any number of missing tags. Figures 4.5(a) and 4.5(b) show the actual reliability
achieved by RUN and all existing protocols for α = 0.9 and 0.99, respectively. These figures
are plotted using |E| = 1000, |U| = 10000 and m is varied from 50 to 900. The actual
reliabilities are obtained using 100 runs of each protocol for each value of m. None of the
existing protocols achieves the required reliability because none of them is designed to handle
unexpected tags. Among the existing protocols, SFMTI has the highest actual reliability of
up to 0.67.
RUN is the fastest protocol that achieves the required reliability compared to the existing
protocols. Figures 4.6(a) and 4.6(b) show the average times each protocol took to either
detect the first missing tag event if it finds a missing tag or to complete execution if it
does not find a missing tag. From these figures, MTI seems to have smaller detection time
compared to RUN, but when we observe these figures in conjunction with Figures 4.5(a) and
4.5(b), we see that the actual reliability of MTI is close to 0, far lower than the required
reliability. This shows that for majority of times, MTI completed execution without detecting
any missing tags due to the unexpected tags.

Figure 4.5 Actual reliability vs. missing tags: (a) α = 0.90, (b) α = 0.99 [plots: actual reliability vs. no. of missing tags for RUN, SFMTI, IIP, TRP, and MTI]

Figure 4.6 Detection time vs. missing tags: (a) α = 0.90, (b) α = 0.99 [plots: no. of slots vs. no. of missing tags for SFMTI, TRP, IIP, RUN, and MTI]

4.6.2

Impact of Number of Unexpected Tags

RUN is the only protocol that achieves the required reliability in the presence of unexpected
tags while existing protocols achieve the required reliability only when there are no unexpected
tags in the population. Figures 4.7(a) and 4.7(b) show the actual reliability obtained by RUN
and the existing protocols for α = 0.9, and 0.99, respectively. These figures are plotted using
|E| = 1000, m = 200, and |U| is varied from 0 to 10000. RUN always achieves the required
reliability whereas the existing protocols achieve the required reliability only when |U| is
close to zero.

Figure 4.7 Actual reliability vs. number of unexpected tags: (a) α = 0.90, (b) α = 0.99 [plots: actual reliability vs. no. of unexpected tags for RUN, SFMTI, IIP, TRP, and MTI]

Among the protocols that achieve the required reliability, RUN is the fastest, even when
there are no unexpected tags in the population. Figures 4.8(a) and
4.8(b) show the average time each protocol took to either detect the first missing tag event
or complete execution without detecting any missing tags. From these figures, MTI again
seems to have a smaller detection time compared to RUN when the number of unexpected tags in
the population is large, but when we analyze these figures in conjunction with Figures 4.7(a)
and 4.7(b), we see that the actual reliability of MTI is close to 0 when the number of unexpected
tags in the population is large. Figures 4.7(a) and 4.7(b) show that SFMTI achieves the
required reliability for up to 5000 unexpected tags, but Figures 4.8(a) and 4.8(b) show
that its execution time is 5 times that of RUN.

4.6.3 Impact of Deviation from Threshold

The actual reliability of RUN exceeds the required reliability when the number of missing tags
in the population exceeds the threshold T. This is seen in Figure 4.9, which plots the actual
reliabilities of all protocols when the number of missing tags is larger or smaller than
T. This figure is made using |E| = 1000, |U| = 10000, T = 200, α = 0.99, and m is varied
from 50 to 900. The actual reliability of RUN is less than the required reliability only when
the number of missing tags is less than T, but this is insignificant because we are interested
in detecting the missing tags only if the number of missing tags in a population exceeds the
threshold T.

Figure 4.8 Detection time vs. number of unexpected tags
[Plot omitted. Panels: (a) α = 0.90, (b) α = 0.99. x-axis: No. of unexpected tags; y-axis: No. of slots (×10^3); curves: SFMTI, TRP, IIP, RUN, MTI.]

Figure 4.9 Effect of difference between m and T
[Plot omitted. x-axis: No. of missing tags; y-axis: Actual Reliability; curves: RUN, SFMTI, IIP, TRP, MTI.]

4.6.4 Comparison with Tag ID Collection Protocol

RUN is faster than the fastest tag ID collection protocol, TH, in all practical scenarios. For
example, for |E| = 1000, |U| = 10000, and T = m = 200, RUN is 14.3 times faster than TH
for α = 0.99. As the threshold T decreases and/or the required reliability increases, the detection
time of RUN increases. Therefore, there exists a value of T and/or α for a given |E| and |U|
for which the tag ID collection protocol is faster than RUN. For example, for |E| = 1000,
|U| = 10000, and T = 200, TH is faster than RUN when the required α is greater than 0.99999.
Such high values of α are seldom required. Similarly, for |E| = 1000, |U| = 10000, and
α = 0.99, TH is faster than RUN for T < 0.001. In all practical scenarios, the threshold
cannot be less than 1. Therefore, in practice, RUN is always faster than the tag ID collection
protocols.

4.7 Conclusions

The key technical contribution of this chapter is in proposing a protocol to detect missing
tag events in the presence of unexpected tags. This chapter represents the first effort on
addressing this important and practical problem. The key technical depth of this chapter
is in the mathematical development of the theory that RUN is based upon. The solid
theoretical underpinning ensures that the actual reliability of RUN is greater than or equal
to the required reliability. We have proposed a technique that our protocol uses to handle
large frame sizes to ensure compliance with the C1G2 standard. We have also proposed a
method to implicitly estimate the size of the unexpected tag population without requiring an
explicit estimation phase. We implemented RUN and conducted side-by-side comparisons
with four major missing tag detection protocols even though the existing protocols do not
handle the presence of unexpected tags. Our protocol significantly outperforms all prior
protocols in terms of actual reliability as well as detection time.

5 Per-flow Latency Measurement

5.1 Introduction

5.1.1 Motivation

Although traditionally throughput has been the primary focus of network engineers, nowadays latency has seen growing importance because a wide variety of emerging applications
and architectures require extremely low (in microseconds) and stable (low jitter) latencies.
First, many emerging applications, such as financial trading applications [79], storage applications utilizing Fiber Channel over Ethernet [8], and high performance computing applications in data center networks [21], demand low latency. A small increase in latency
may cause violations of service level agreements and result in significant revenue losses. For
example, a one-millisecond advantage in financial trading applications can be worth $100
million a year for major brokerage firms [79]. Second, many emerging architectures, such
as content delivery networks (CDNs) and mission-critical data center networks, demand low
latency. CDN providers are mostly evaluated and ranked by content publishers based on
latency. Companies such as Cedexis [5] and Turbobytes [18] constantly evaluate and rank
CDN providers mostly based on latency. A one-millisecond disadvantage could put one CDN
provider behind others and result in loss of business with content publishers. Similarly, the
transit providers are primarily evaluated and ranked by CDN providers based on latency.
For data centers running mission-critical applications, latency guarantee is a key requirement
for the underlying networks. Low latency data center networks have become the primary
focus of many data center network solution providers such as Sidera [13].
In managing networks with stringent latency demands, operators often need to measure
the latency between two observation points for a particular flow. An observation point is
either a port in a middlebox (such as a router or a switch) or a network card in an end host.
Per-flow latency measurement can be used reactively by network operators to perform tasks
such as detecting and localizing delay spikes in a network, isolating offending flows that are
responsible for causing delay bursts, and rerouting them through other paths. It can also be
used proactively by network operators to continuously monitor latencies between observation
points in order to locate bottleneck links and replace them with higher capacity links.
Existing routers and switches provide little help for latency measurement and monitoring. SNMP counters measure the number of packets passing through a port. NetFlow
measures basic statistics, such as the numbers of packets and bytes, of a flow. Neither provides any measurement of latency. Network operators often rely on injecting probe packets
to measure end-to-end delays and then use tomographic techniques to infer link and hop
properties [40, 45]. However, to achieve latency measurement with extremely high accuracy, the required number of probe packets will be extremely large; consequently, the probe
packets will consume too much bandwidth and the measured latency does not reflect the
real latency without probe packets. Although some specialized latency monitoring devices
are commercially available, they are too costly to be widely deployed. For example, London, Singapore, and Tokyo stock exchanges use latency monitoring devices manufactured by
Corvil [19, 14, 17] costing around USD 180,000 for a 2 × 10Gbps box [7].

5.1.2 Problem Statement

This chapter addresses the fundamental problem of per-flow latency measurement: for any
flow that passed through any two observation points, measure (or say estimate) the average
and standard deviation of the latencies experienced by the packets of that flow in passing
through the two observation points. Formally, given a confidence interval β ∈ (0, 1] and
a required reliability α ∈ [0, 1), for any flow f that passed through any two observation
points S and R, obtain estimates $\tilde{\mu}_f$ of the average $\mu_f$ and $\tilde{\sigma}_f$ of the standard deviation
$\sigma_f$ of the latencies experienced by the packets of f in passing through S and R so that
$P\{|\tilde{\mu}_f - \mu_f| \le \beta\mu_f\} \ge \alpha$ and $P\{|\tilde{\sigma}_f - \sigma_f| \le \beta\sigma_f\} \ge \alpha$.
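As a concrete reading of this requirement, the following minimal sketch computes the fraction of repeated estimates that fall within the relative error bound β and compares it with α. It is an illustration only: the function names and the sample estimates are hypothetical and are not part of COLATE.

```python
def empirical_reliability(estimates, true_value, beta):
    """Fraction of estimates whose relative error is at most beta."""
    hits = sum(1 for e in estimates if abs(e - true_value) <= beta * true_value)
    return hits / len(estimates)


def meets_requirement(estimates, true_value, beta, alpha):
    """True if the observed reliability is at least the required reliability alpha."""
    return empirical_reliability(estimates, true_value, beta) >= alpha


# Hypothetical example: a true mean latency of 100 microseconds and five estimates.
estimates_us = [98.0, 101.5, 104.0, 95.5, 111.0]
print(empirical_reliability(estimates_us, 100.0, beta=0.05))
print(meets_requirement(estimates_us, 100.0, beta=0.05, alpha=0.9))
```

In this example four of the five estimates lie within 5% of the true value, so the observed reliability is 0.8 and a required reliability of 0.9 is not met.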
An accurate per-flow latency measurement scheme should further satisfy the following two
requirements. (1) No packet probing: As probe packets may use up a significant portion of
network bandwidth, the latency measured with the insertion of probe packets may significantly deviate from the real latency. Thus, the estimates obtained with probe packets may
not suffice for microsecond level accuracy. (2) No time stamping: First, IP headers do not
have a time stamp field and the TCP time stamp option is meant for measuring end-to-end
latencies. Embedding time stamps at observation points requires modifications to packet
header formats, which further requires modifications to the data forwarding paths of existing routers and middleboxes. Furthermore, the added packet header fields may consume a
significant portion of network bandwidth. Second, the process of attaching time stamps to
each packet takes a non-negligible amount of time at observation points. Thus, the latency
measured with time stamping may significantly deviate from the real latency.

5.1.3 Limitations of Prior Art

To the best of our knowledge, there are only two per-flow latency measurement schemes,
namely RLI [68] and MAPLE [69]. However, neither of them satisfies both requirements
because RLI uses packet probing and MAPLE attaches time stamps to every packet. Other
than RLI and MAPLE, the closest work is LDA [63], which performs aggregate latency
measurement, i.e., given any two observation points, measure (or say estimate) the average
and standard deviation of the latencies experienced by all the packets that passed through
the two observation points, regardless of the flow that each packet belongs to. Aggregate
latency measurement is useful; however, it does not provide fine grained per-flow latency
information. Here is an important fact: for all flows that passed through two arbitrary
observation points S and R, the latencies experienced by the packets of different flows in
passing through S and R can be quite different. First, there may be multiple paths from
S to R and different flows may be routed via different paths. Second, at each intermediate
middlebox along a path from S to R, packets of different flows may take different amounts
of processing time due to mechanisms such as QoS. As aggregate latency measurement does
not reflect the latency of every flow, it falls short in engineering latency sensitive networks.
On one hand, when the aggregate latency between two observation points appears normal,
the latency experienced by an individual flow may be wildly abnormal. On the other hand,
when the aggregate latency between two observation points appears abnormal, aggregate
latency measurement does not provide operators the per-flow latency information needed to
identify the flows being hurt.

5.1.4 Proposed Approach

In this chapter, we propose COLATE, a Counter based Per-flow Latency Estimation scheme.
The key idea of COLATE is that it records timing information of packets at each observation
point and purposely allows noise to be introduced in the recorded timing information for
minimizing storage space. When querying the latency of a target flow, COLATE statistically
denoises the recorded information to obtain an accurate latency estimate. COLATE has two
phases: recording phase and querying phase. Next, we give an overview of these two phases.
5.1.4.1 Recording Phase

In this phase, at each observation point, COLATE records the timing information of each
packet arriving at or departing from that point using a vector of counters in RAM, which we
call counter vector. For each flow with a unique ID, COLATE maps it to a unique subset of
these counters, which we call counter subvector. The ID of a flow can be any flow identifier
such as the standard five tuple (i.e., source IP, destination IP, source port, destination port,
and protocol type). To make the mapping unique and memoryless (i.e., using no memory to
keep track of the mapping), COLATE maps each flow to a random subvector such that the
probability of two different flows being mapped to the same subvector is practically zero. A
counter may belong to multiple counter subvectors. Figure 5.1 shows an example counter
vector and its three counter subvectors, from which we see that counters 5 and 8 belong
to multiple counter subvectors. For each arriving or departing packet at the observation
point, COLATE executes two simple steps: (1) randomly maps the packet to a counter
in the counter subvector of the flow that the packet belongs to; (2) adds the current time
to that counter. Before any counter overflows, COLATE dumps the counter vector to
permanent storage (such as a solid state drive (SSD)) and resets the counters to zero. We call
a dumped counter vector a counter epoch, which has two attributes: the time stamp of the
first recorded packet and the time stamp of the last recorded packet. The recording module
can be implemented in hardware to keep up with wire speed. For the hashing function, we
can use hardware hash implementations such as those proposed in [54, 91]. For each counter,
we can store its less significant bits as a counter in SRAM and the more significant bits as a
counter in DRAM – when the counter in SRAM overflows, we increment the corresponding
counter in DRAM.
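The split of one logical counter into SRAM and DRAM parts can be sketched as follows. This is only an illustration of the carry step described above: the class name, the bit width, and the use of plain Python integers in place of real memory are assumptions.

```python
class SplitCounter:
    """One logical counter kept as low-order bits (SRAM) and high-order bits (DRAM)."""

    def __init__(self, sram_bits):
        self.sram_bits = sram_bits
        self.low = 0    # less significant bits, held in SRAM
        self.high = 0   # more significant bits, held in DRAM

    def add(self, value):
        """Add a time stamp; carry into the DRAM part when the SRAM part overflows."""
        total = self.low + value
        self.high += total >> self.sram_bits
        self.low = total & ((1 << self.sram_bits) - 1)

    def value(self):
        """Reassemble the full counter value from both parts."""
        return (self.high << self.sram_bits) | self.low


counter = SplitCounter(sram_bits=8)
counter.add(200)
counter.add(100)   # the 8-bit SRAM part overflows here and carries into DRAM
print(counter.high, counter.low, counter.value())
```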

Figure 5.1 Counter vector and subvectors
[Diagram omitted. It shows a counter vector of 10 counters and the counter subvectors of flows f1, f2, and f3.]

5.1.4.2 Querying Phase

In this phase, given a latency measurement query of a flow f , which contains the flow ID, the
starting and ending time of the flow, and two observation points that the flow passed through
within the time frame, COLATE first finds all the counter epochs whose time frame overlaps
with the starting and ending time of the flow f at each of the two given observation points.
Second, for each counter epoch, COLATE applies statistical techniques to estimate the sum
of time stamps contributed only by flow f from each counter in the counter subvector of f .
COLATE uses these extracted values to estimate the average and standard deviation of the
latencies experienced by the packets of flow f in passing through the two observation points.
COLATE requires the clocks of different observation points to be accurately synchronized;
otherwise, the measured latencies will contain a constant offset. This synchronization can be
simply achieved by the standard time synchronization protocol IEEE 1588, which provides
microsecond level time synchronization [10].
Key Intuition: The intuition behind mapping a flow to multiple counters (in a counter
subvector), instead of a single counter, is to mitigate counter overflow for elephant flows.
The motivation behind sharing counters among multiple flows, instead of allocating unique
counters for each individual flow, is to save memory. Due to the sharing of counters among
multiple flows, the counter subvector of a flow f contains not only the timing information of
the packets in f but also that of the packets in other flows. The intuition behind allowing
this “mixing” is that later in the querying phase, we can extract the timing information of
only the packets in the flow f using statistical techniques by treating the mixed-in timing
information of the packets from other flows as noise.
Deployability: COLATE is designed to be efficiently implementable on network middleboxes (such as routers and switches) from both processing overhead and storage space perspectives. In terms of processing overhead, COLATE performs only one hash and one memory update per packet. On a traditional memory architecture, one memory update requires
two memory accesses (i.e., one read and one write); however, in modern memory architectures
used in high speed routers (such as the smart memory architecture developed by Huawei [77]
and the bandwidth engine developed by MoSys [80]), where each memory location has built-in circuitry for handling updates on site, one memory update (such as incrementing by up to
a 64-bit number) requires only one memory access. In terms of storage space, COLATE uses
less than 0.1 bit per packet. To get an idea of the length of time for which COLATE can
accumulate time stamps using commodity permanent storage devices, consider the 10GigE
backbone link in San Jose monitored by CAIDA. At the time of writing this chapter, an
interactive tool at CAIDA’s website reported that on average approximately 0.484 million
packets traversed this link per second between Nov. 01 and Nov. 29, 2013 [3]. With 0.1
bits per packet, on a commodity 256GB SSD, COLATE can accumulate time stamps of
packets traversing this link for about 1.5 years, which means that a network operator can
measure the average and standard deviation of the latencies of flows that are up to 1.5 years old. This gives not only
enough time to identify and debug any problems, but also enough information to study other
aspects related to packet delays such as diurnal patterns in flow latencies. On a 12TB SSD,
as recently showcased by OCZ at CES-2012 [20], COLATE can accumulate time stamps of
packets traversing this link for more than 69 years.
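The retention figures above can be reproduced with a short calculation. The sketch below is a back-of-the-envelope check, not part of COLATE; it assumes that "GB" and "TB" denote binary units (2^30 and 2^40 bytes), a 365-day year, and exactly 0.1 bits per packet. It also uses the 12.8 bits per packet quoted for MAPLE elsewhere in this chapter.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365            # assumption: non-leap year


def retention_days(capacity_bytes, packets_per_second, bits_per_packet):
    """Days of traffic whose per-packet records fit in the given storage capacity."""
    bytes_per_second = packets_per_second * bits_per_packet / 8
    return capacity_bytes / bytes_per_second / SECONDS_PER_DAY


PACKET_RATE = 0.484e6          # packets per second reported for the San Jose link
GIB, TIB = 2**30, 2**40        # assumption: binary units for "GB" and "TB"

colate_256g = retention_days(256 * GIB, PACKET_RATE, 0.1)
colate_12t = retention_days(12 * TIB, PACKET_RATE, 0.1)
maple_256g = retention_days(256 * GIB, PACKET_RATE, 12.8)

print(f"COLATE, 256GB SSD: {colate_256g / DAYS_PER_YEAR:.2f} years")
print(f"COLATE, 12TB SSD:  {colate_12t / DAYS_PER_YEAR:.1f} years")
print(f"MAPLE,  256GB SSD: {maple_256g:.1f} days")
```

Under these assumptions the calculation gives roughly 1.4 years and 69 years for COLATE on the two drives, and roughly 4.1 days for MAPLE on the 256GB drive, consistent with the rounded figures in the text.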
Packet Losses: COLATE, as described above, assumes no packet losses. However, to
handle packet losses in COLATE, we can easily adapt the strategy proposed in [63] for
handling packet losses in the aggregate latency measurement scheme LDA. According to
this strategy, instead of maintaining a single counter vector, COLATE can maintain a set
of counter vectors at each observation point, where each counter vector has a sampling
probability and a packet counter associated with it. The sum of the sampling probabilities
of all counter vectors is 1. The sampling probability of a counter vector is the fraction of
packets whose time stamps COLATE will add to this counter vector. The packet counter of
a counter vector keeps track of the number of packets whose time stamps have been added
to this counter vector. In the recording phase, when a packet arrives at or departs from
an observation point, COLATE first uses a hash function to map the packet to a counter
vector such that in the long run, the fraction of packets that it maps to any counter vector
is equal to its sampling probability. Then, COLATE adds the time stamp of the packet to
this vector as described earlier. Note that all observation points use the same hash function
to guarantee that the same packet is mapped to the same counter vector at all observation
points. In the querying phase, for a target flow between two observation points, COLATE
compares the packet counter of each counter vector at one observation point with that of the
corresponding counter vector at the other observation point. Then, COLATE selects those
two counter vectors at the two observation points that have the highest sampling probability
associated with them and equal values of packet counters. After this, COLATE follows the
procedure of the querying phase described earlier. For simplicity, in the rest of this chapter
in describing COLATE, we assume no packet losses.
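The loss-handling strategy can be sketched in two small functions: one that maps a packet to a counter vector through a shared hash, and one that selects the vector to use at query time. This is a minimal sketch under stated assumptions. SHA-256 stands in for whatever hash the observation points share, the packet is represented by a byte string of its invariant content, and the function names are hypothetical.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate


def select_vector(packet_bytes, sampling_probs):
    """Map a packet to a counter-vector index; the same packet always maps the same way."""
    digest = hashlib.sha256(packet_bytes).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64      # approximately uniform in [0, 1)
    cumulative = list(accumulate(sampling_probs))
    # The min() guards against floating-point round-off in the last cumulative sum.
    return min(bisect_right(cumulative, u), len(sampling_probs) - 1)


def pick_lossless_vector(sampling_probs, counts_s, counts_r):
    """Index of the highest-probability vector whose packet counters match, or None."""
    candidates = [i for i in range(len(sampling_probs)) if counts_s[i] == counts_r[i]]
    return max(candidates, key=lambda i: sampling_probs[i], default=None)


probs = [0.5, 0.3, 0.2]                                # sampling probabilities sum to 1
print(select_vector(b"example-packet", probs))
print(pick_lossless_vector(probs, counts_s=[10, 6, 4], counts_r=[9, 6, 4]))
```

In the second call the first vector lost a packet between the two points, so the vector with sampling probability 0.3 is selected.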

5.1.5 Technical Challenges and Solutions

The first challenge is to denoise the recorded information to extract the sum of the time
stamps of the packets in the target flow from the counter subvector of the flow. To address
this challenge, we first show that for each counter in the counter subvector of the target flow,
the value contributed by the time stamps from the packets in the target flow and that from
the packets in other flows can both be modeled with binomial distributions. We then derive
an expression to calculate the expected value of each counter in the counter subvector of the
target flow. From this expression, we estimate the sum of the time stamps of all the packets
of the target flow. Using this estimate in conjunction with maximum likelihood estimation,
we extract the sum of the time stamps of the packets in the target flow from each counter
in the counter subvector.
The second challenge is to calculate the sum of the squares of the latencies of each packet
in a target flow, which is needed for calculating the standard deviation of packet latencies.
To address this challenge, we first use the time stamp sum extracted from counter subvectors
to construct a virtual deviation vector and then use this vector to estimate this sum of the
squares of the latencies.

5.1.6 Advantages of COLATE over Prior Art

COLATE advances the state of the art in per-flow latency measurement on the following fronts: reliability, passiveness, scalability, memory, efficiency, and flexibility. For
reliability, COLATE takes the required reliability and confidence interval specified by network operators as input whereas existing schemes do not. For passiveness, COLATE neither
sends probe packets nor attaches time stamps to packets. For scalability, COLATE maintains
only one counter vector at each observation point regardless of how many other observation
points are sending and receiving packets from it; in contrast, LDA and RLI have to maintain
separate vectors of counters for each pair of sender and receiver. For memory, COLATE uses
less than 0.1 bit of storage space per packet, which is an improvement of over 128 times compared
to the 12.8 bits of storage space per packet used by MAPLE. Due to this, on a commodity
256GB SSD, where COLATE can accumulate time stamps of packets traversing the San
Jose backbone link for about one and a half years, MAPLE can accumulate the time stamps
for only 4.1 days. For efficiency, COLATE performs only 1 hash and 1 memory update per
packet whereas MAPLE uses 9 hashes and 12 memory accesses per packet. For flexibility,
COLATE allows different observation points to allocate different amounts of RAM based on
their available resources whereas LDA requires both the sender and receiver to allocate the
same amount of RAM.

5.2 Related Work

To the best of our knowledge, there are only two per-flow latency measurement schemes,
namely MAPLE [69] and RLI [68]. In MAPLE, for any packet passing through two observation points S and R, S attaches a time stamp to the packet and R calculates the latency
of the packet from S to R by subtracting the time stamp from its current time. To reduce space for storing latency values of all packets, MAPLE maps the calculated latency of
each packet to the closest value in a set of predetermined latency values. Thus, instead of
storing the latency of every packet, MAPLE only stores these predetermined latency values
and for each predetermined latency value MAPLE uses a Bloom filter to store all packets
mapped to it. To query the latency of a given packet, MAPLE first finds the predetermined
latency value that the packet was mapped to by querying Bloom filters and uses that value
as the estimated latency of the packet. Compared with COLATE, MAPLE falls short from
a few perspectives. First, MAPLE is based on a strong assumption that packet latencies
between two observation points are tightly clustered around a set of predetermined latency
values. We have not found any theoretical or empirical validation of this assumption in prior
literature. Second, MAPLE requires attaching time stamps to packets and thus has the
limitations we pointed out in Section 5.1.2 for time stamping based latency measurement
schemes. Furthermore, time stamping individual packets can consume up to 10% of the
available bandwidth [63]. Third, MAPLE has large memory overhead (12.8 bits/packet at
each observation point) and large processing overhead (9 hash computations and 12 memory
accesses per packet: 9 for hash functions, 2 for updating counters, and 1 for determining the
cluster a packet’s latency belongs to), whereas COLATE uses 0.1 bits/packet and performs
1 hash and 1 memory update per packet.
In RLI, for a flow passing through two observation points S and R, S inserts probe packets
with time stamps into the flow and R calculates the latency of each probe packet similarly to
MAPLE. To calculate the latency of the regular packets between two probe packets whose
latency has been calculated as l1 and l2 , R simply uses the straight line equation to calculate
the latency of these regular packets based on their arrival time and the latency of the two
probe packets. Compared with COLATE, RLI has the following limitations. First, RLI is
based on the strong assumption that packet latency between S and R increases or decreases
linearly in the time interval between receiving any two probe packets at R. When the time
interval between two probe packets is extremely small, this assumption may practically hold
but the extremely small time interval implies that the number of probe packets is extremely
large. For example, to achieve an accuracy of only 81%, RLI inserts, on average, 1 probe
packet every 4.78 regular packets. Furthermore, as we mentioned in Section 5.1.2, latency
measured with a large number of probe packets may significantly deviate from the real latency
when there are no probe packets. When the time interval between two probe packets is large,
this assumption does not make intuitive sense and may not hold in practice. Similarly, we
have not found any theoretical or empirical validation of this assumption in prior literature.
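The straight-line assumption discussed above amounts to linear interpolation between the two surrounding probe packets. The sketch below illustrates that idea only and is not RLI's actual implementation; the function name and the numeric example are hypothetical.

```python
def interpolate_latency(t, t1, l1, t2, l2):
    """Latency at arrival time t on the straight line through (t1, l1) and (t2, l2)."""
    if t2 == t1:
        # Degenerate case: both probes arrived at the same instant.
        return (l1 + l2) / 2
    return l1 + (l2 - l1) * (t - t1) / (t2 - t1)


# Hypothetical example: probes arrive at times 0 and 4 with latencies 10 and 18.
# A regular packet arriving at time 1 is assigned a latency one quarter of the way up.
print(interpolate_latency(1.0, t1=0.0, l1=10.0, t2=4.0, l2=18.0))
```

Any latency variation between the two probes that does not follow this line, such as a short delay spike, is invisible to the interpolation, which is why the assumption weakens as the probe interval grows.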
The other latency measurement schemes are LDA [63] and FineComb [70], which provide
aggregate, not per-flow, latency measurement between a sender and a receiver. In LDA, both
the sender and the receiver maintain several counter vectors where each element is a pair of
counters: time stamp counter for accumulating packet time stamps and packet counter for
counting the number of arriving/departing packets. Each vector has a sampling probability
and a sampling function. For each arriving or departing packet, LDA first maps the packet
to a counter vector such that in the long run, the fraction of packets that LDA maps to
any counter vector is equal to its sampling probability. It then randomly maps the packet
to a counter pair in this counter vector and adds the time stamp of the packet to the time
stamp counter and increments the packet counter by one. To obtain the aggregate latency
estimate between the sender and receiver, for each counter pair in each vector, LDA checks
whether they have the same packet counter value and selects all counter pairs that have the
same packet counter value for both the sender and receiver. Then, LDA can easily calculate
the total number of successfully delivered packets and the sum of their time stamps at both
sides. Finally, to obtain the aggregate average latency between the sender and receiver, it
subtracts the sum of time stamps at the sender side from that at the receiver side and divides
the result by the total number of successfully delivered packets.

5.3 COLATE – Recording Phase

In this section, we present the recording phase and the statistical modeling of COLATE.

5.3.1 Noisy Accumulation of Time Stamps

At each observation point X, COLATE maintains a vector $C_X$ of n counters where each
counter $C_X[i]$ (1 ≤ i ≤ n) has b bits with initial value 0. When a packet arrives at or
departs from an observation point X, COLATE extracts its flow ID f, chooses a random
number j from a uniform distribution in the range [1, m] where m << n, calculates the
hash function H(f, j) whose output is uniformly distributed in range [1, n], and adds the
time stamp of this packet (i.e., the current time at observation point X) to the counter
$C_X[H(f, j)]$. Thus, the time stamps of all packets in flow f will be uniformly distributed
to m counters: $C_X[H(f, 1)], C_X[H(f, 2)], \cdots, C_X[H(f, m)]$. These m counters constitute
the counter subvector of flow f, which is denoted by $S_X^f$ where $S_X^f[j] = C_X[H(f, j)]$ for
j ∈ [1, m]. In this recording phase, for each packet, COLATE performs one memory update
to update the value of $C_X[H(f, j)]$. For different flows, the probability that COLATE maps
them to the same counter subvector is $\frac{m!}{n^m}$ ($= 2.4 \times 10^{-62}$ for n = 10000 and m = 20, for
example), which is practically zero. Note that an observation point is a port of a middlebox,
not a middlebox itself, because the arriving and departing time of a packet at a middlebox
are different due to the non-negligible packet processing time within the middlebox.
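The recording step can be sketched in a few lines. This is a minimal software illustration under stated assumptions, not the hardware design: SHA-256 stands in for the hash function H, indices are 0-based, counters are unbounded Python integers rather than b-bit counters, epoch dumping is omitted, and the class and function names are hypothetical.

```python
import hashlib
import random
from math import factorial


class CounterVector:
    """Sketch of the recording step at one observation point."""

    def __init__(self, n, m, seed=None):
        self.n, self.m = n, m
        self.counters = [0] * n
        self._rng = random.Random(seed)

    def index(self, flow_id, j):
        """H(f, j): hash the flow ID and j to a counter index in [0, n)."""
        digest = hashlib.sha256(f"{flow_id}|{j}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.n

    def record(self, flow_id, timestamp):
        """Add a packet's time stamp to a randomly chosen counter of its flow's subvector."""
        j = self._rng.randrange(self.m)
        self.counters[self.index(flow_id, j)] += timestamp

    def subvector(self, flow_id):
        """Current values of the m counters that make up the flow's subvector."""
        return [self.counters[self.index(flow_id, j)] for j in range(self.m)]


def collision_probability(n, m):
    """m! / n^m: probability quoted above that two flows share a counter subvector."""
    return factorial(m) / n**m


vector = CounterVector(n=10_000, m=20, seed=1)
for ts in range(1, 1001):
    vector.record("10.0.0.1:80->10.0.0.2:5000/tcp", ts)
print(vector.subvector("10.0.0.1:80->10.0.0.2:5000/tcp"))
print(collision_probability(10_000, 20))
```

Because the mapping is recomputed from the flow ID on every packet, no per-flow state is stored, which is the memoryless property described earlier.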

5.3.2 Analysis of Noisy Accumulation

First, by Lemma 14, we show that in any counter epoch, on average, a flow contributes the
same amount to each counter in its counter subvector. We further show that the amount
contributed by a flow to each counter in its counter subvector can be modeled by a binomial
distribution. Second, we derive expressions for the expected value and variance of a counter
in counter vector in Theorem 15. In Section 5.4, we use the expression of expected value
to estimate the average and standard deviation of packet latencies for any given flow. In
Section 5.5, we use the expression of variance to determine the parameter values that can
ensure that the actual reliability achieved by COLATE is no less than the required reliability.
Lemma 14. Let $C_f$ be the random variable representing the sum of time stamps contributed
by flow f to a counter $S_X^f[j]$, 1 ≤ j ≤ m, in its counter subvector of length m at observation
point X. Let $p_f$ be the number of packets in flow f that contributed time stamps to the
current counter epoch $C_X$. Let $T_f$ be an independent random variable representing the time
stamp contributed by each packet of flow f to $C_X$. Then, we have $E[C_f] = \frac{p_f \times E[T_f]}{m}$.

Proof. Let $P_{f,j}$ be a random variable representing the number of packets in flow f that
contributed time stamps to counter $S_X^f[j]$. Therefore, $E[C_f] = E\left[\sum_{k=1}^{P_{f,j}} T_f\right]$. Applying
Wald's Lemma, we get $E[C_f] = E[P_{f,j}] \times E[T_f]$. As the output of hash function H is
uniformly distributed in range [1, n], its output is also uniformly distributed in [1, m] for the
packets in flow f. Thus, the probability that COLATE adds the time stamp of a packet of the
flow f to counter $S_X^f[j]$ is 1/m. The random variable $P_{f,j}$ follows the binomial distribution,
i.e., $P_{f,j} \sim \mathrm{Binom}(p_f, 1/m)$. Therefore, $E[C_f] = \frac{p_f}{m} \times E[T_f]$.

Figure 5.2 plots the CDF of the ratio of the observed values of Cf from simulations of our
ICSI traffic trace to the E[Cf ] from Lemma 14. We observe from this figure that there is
a steep rise in the CDF when the value of this ratio is around 1. We also observe from the
simulations that the mean and median of the ratio are both equal to 1. This empirically
establishes the result in Lemma 14.
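The same check can be repeated on synthetic data with a small Monte Carlo simulation. The sketch below is not the ICSI trace experiment: it assumes time stamps drawn uniformly around a chosen mean, and the function name and parameter values are hypothetical.

```python
import random


def simulate_counter_share(p_f, m, mean_ts, trials, seed=0):
    """Average amount that one flow contributes to a single counter of its subvector."""
    rng = random.Random(seed)
    total = 0.0
    for _trial in range(trials):
        for _packet in range(p_f):
            # Assumed time stamp distribution: uniform with the given mean.
            ts = rng.uniform(0.5 * mean_ts, 1.5 * mean_ts)
            # Each packet picks one of the m counters uniformly; track counter 0.
            if rng.randrange(m) == 0:
                total += ts
    return total / trials


p_f, m, mean_ts = 200, 20, 100.0
observed = simulate_counter_share(p_f, m, mean_ts, trials=500)
expected = p_f * mean_ts / m            # Lemma 14: E[C_f] = p_f * E[T_f] / m
print(observed / expected)              # the ratio should be close to 1
```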

Figure 5.2 CDF of observed $C_f / E[C_f]$
[Plot omitted. x-axis: Ratio of observed $C_f$ to $E[C_f]$; y-axis: CDF.]
Lemma 14 shows that the sum of time stamps of all packets of flow f is, on average, equally divided among all counters in its counter subvector, which conforms to a binomial distribution.
Thus, we approximate the distribution of $C_f$ with a binomial distribution as
$C_f \sim \mathrm{Binom}(t_{f,X}, 1/m)$, where $t_{f,X}$ represents the sum of all time stamps contributed by flow
f to the counters in its counter subvector at observation point X. Let $C_r$ be the random
variable representing the sum of time stamps contributed by packets of all flows other than
f to counter $S_X^f[j]$. Using similar reasoning as for $C_f$, we approximate the distribution of
$C_r$ with a binomial distribution. The probability that a packet of a flow $\bar{f} \neq f$ contributes
a time stamp to counter $S_X^f[j]$ is the product of the probability that H maps the packet to
$S_X^f[j]$ given that $S_X^f[j] \in S_X^{\bar{f}}$, which is 1/m, and the probability that counter $S_X^f[j]$ is in the
counter subvector of $\bar{f}$, which is denoted by $P\{S_X^f[j] \in S_X^{\bar{f}}\}$ and calculated as

$$P\{S_X^f[j] \in S_X^{\bar{f}}\} = 1 - \binom{m}{0}\left(\frac{1}{n}\right)^{0}\left(1 - \frac{1}{n}\right)^{m} = 1 - \left(1 - \frac{m}{n} + \frac{m(m-1)}{n^2 \times 2!} - \dots\right) \approx \frac{m}{n} \quad \because m \ll n \tag{5.1}$$

Thus, the probability that a packet of a flow $\bar{f} \neq f$ contributes a time stamp to counter
$S_X^f[j]$ is $\frac{1}{m} \times \frac{m}{n} = \frac{1}{n}$. Thus, $C_r \sim \mathrm{Binom}(t_X - t_{f,X}, 1/n)$.
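The quality of the approximation in Equation (5.1) can be checked numerically. The sketch below evaluates the exact expression and the approximation m/n for the example values n = 10000 and m = 20 used earlier in this section; the function name is hypothetical.

```python
def prob_in_subvector(n, m):
    """Exact probability that a given counter is hit by at least one of m uniform hashes."""
    return 1 - (1 - 1 / n) ** m


n, m = 10_000, 20
exact = prob_in_subvector(n, m)
approx = m / n                               # first-order approximation, valid for m << n
relative_error = abs(exact - approx) / approx

print(exact, approx, relative_error)
print(exact / m, 1 / n)                      # per-packet probability of landing on one counter
```

For these values the relative error of the approximation is on the order of 0.1%, which supports dropping the higher-order terms of the expansion.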
Theorem 15. Let C be the random variable representing the value of a counter $S_X^f[j]$,
1 ≤ j ≤ m, in the counter subvector of a flow f. Let $t_{f,X}$ be the sum of all time stamps
contributed by packets of flow f to all counters in its counter subvector and $t_X$ be the sum
of all time stamps contributed by packets of all flows in the counter epoch at observation
point X. Let $C_f \sim \mathrm{Binom}(t_{f,X}, 1/m)$ represent the sum of time stamps contributed by
packets of flow f to $S_X^f[j]$ and let $C_r \sim \mathrm{Binom}(t_X - t_{f,X}, 1/n)$ represent the sum of time
stamps contributed by packets of all flows other than f to counter $S_X^f[j]$. Let $C_f$ and $C_r$ be
independent of each other. The expected value and variance of C are calculated as follows

$$E[C] = \frac{t_{f,X}}{m} + \frac{t_X - t_{f,X}}{n} \tag{5.2}$$

$$\mathrm{Var}(C) = \frac{t_{f,X}}{m}\left(1 - \frac{1}{m}\right) + \frac{t_X - t_{f,X}}{n}\left(1 - \frac{1}{n}\right) \tag{5.3}$$

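Proof. As $C = C_f + C_r$, we get $E[C] = E[C_f] + E[C_r] = \frac{t_{f,X}}{m} + \frac{t_X - t_{f,X}}{n}$. As $C_f$ and $C_r$ are assumed to be independent, $\mathrm{Var}(C) = \mathrm{Var}(C_f) + \mathrm{Var}(C_r)$. As the variance of $\mathrm{Binom}(v, w)$ is $vw(1 - w)$, we get $\mathrm{Var}(C_f) = \frac{t_{f,X}}{m}\left(1 - \frac{1}{m}\right)$ and $\mathrm{Var}(C_r) = \frac{t_X - t_{f,X}}{n}\left(1 - \frac{1}{n}\right)$.
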

In Theorem 15, we assumed Cf and Cr to be independent of each other, which is true
whenever tf,X << tX . In practice tf,X indeed is much smaller than tX because tf,X is the
sum of the time stamps added by a single flow in the counter epoch while tX is the sum
of the time stamps added by all flows (in the order of tens and hundreds of thousands).
Furthermore, in Theorem 15, we approximate the distributions of Cf and Cr with binomial
distributions because when there are a large number of packets that contribute time stamps
to the counters in the counter vector, we can approximate each time stamp to be smeared
over the counters. This approximation makes the formal development of variance of C and
its subsequent use in calculating parameters for COLATE tractable. If the exact equation
for variance of C is desired, it can be obtained as follows. Consider any packet with time
stamp $t_p$ not belonging to flow $f$. This packet has a probability $\frac{1}{n}$ of being mapped to a given
counter in the counter subvector of the flow $f$. Thus, the time stamp for each such packet
will contribute a variance of $t_p^2 \left(\frac{1}{n}\right)\left(1 - \frac{1}{n}\right)$ to the overall variance of $C$. In Theorem 15, $t_{f,X}/m$
models the average sum of the time stamps contributed by flow $f$, and $(t_X - t_{f,X})/n$ models
the average noise contributed by all other flows to counter $S_X^f[j]$.

5.4 COLATE – Querying Phase

In this section, we present the methods that COLATE uses to estimate the average and
standard deviation of the latencies of the packets of a flow in passing any two points.

5.4.1 Estimating Latency Average

For a flow $f$ passing through observation point $S$ and then observation point $R$, we want to calculate $\tilde{\mu}_f$, the estimate of average latency $\mu_f$ of flow $f$. For a packet $z$ in flow $f$, let $u_{f,S}[z]$ be its time stamp at observation point $S$ and $u_{f,R}[z]$ be its time stamp at observation point $R$. The delay experienced by this packet in traveling from $S$ to $R$ is thus $u_{f,R}[z] - u_{f,S}[z]$. Let $t_{f,S}$ and $t_{f,R}$ be the sums of all time stamps of the packets in flow $f$ at $S$ and $R$, respectively. Recall that $p_f$ denotes the number of packets in $f$. Thus,

\[
\mu_f = \frac{1}{p_f}\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)
= \frac{1}{p_f}\left(\sum_{\forall z} u_{f,R}[z] - \sum_{\forall z} u_{f,S}[z]\right)
= \frac{1}{p_f}\left(t_{f,R} - t_{f,S}\right)
\]

Note that the value of $p_f$ can be measured by tools such as NetFlow available on Cisco routers or by schemes proposed in [74, 64]. Thus, to estimate the value of $\mu_f$, we need to obtain the estimated values $\tilde{t}_{f,S}$ and $\tilde{t}_{f,R}$ for $t_{f,S}$ and $t_{f,R}$, respectively. Then, we can calculate

\[
\tilde{\mu}_f = \frac{1}{p_f}\left(\tilde{t}_{f,R} - \tilde{t}_{f,S}\right)
\tag{5.4}
\]

Theorem 16 shows how to obtain $\tilde{t}_{f,X}$ at $X$.
Theorem 16. Given a counter epoch $C_X$ of length $n$ at observation point $X$ where each counter subvector is of length $m$, let $t_X$ denote the sum of all counters in $C_X$. The estimate $\tilde{t}_{f,X}$ of the sum of all time stamps of the packets in flow $f$ is calculated as follows:

\[
\tilde{t}_{f,X} = \frac{1}{n - m}\left(n \sum_{j=1}^{m} S_X^f[j] - m\, t_X\right)
\tag{5.5}
\]

Proof. Given $C_X$ and flow $f$, we can easily obtain the values of every counter in the counter subvector $S_X^f$ of $f$. Thus, we can calculate $E[C]$ as $E[C] = \frac{1}{m}\sum_{j=1}^{m} S_X^f[j]$. Substituting $E[C]$ in Equation (5.2) by $\frac{1}{m}\sum_{j=1}^{m} S_X^f[j]$, replacing $t_{f,X}$ by $\tilde{t}_{f,X}$, and solving for $\tilde{t}_{f,X}$, gives Equation (5.5).
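
As an illustration of Equations (5.4) and (5.5), the following is a minimal Python sketch, not the Matlab implementation used in this chapter. The function names and argument layout are hypothetical; the sketch assumes the caller has already read the $m$ counters of the flow's subvector and the sum $t_X$ of all $n$ counters from a stored counter epoch.

```python
from typing import Sequence


def estimate_flow_timestamp_sum(subvector: Sequence[float], t_x: float, n: int) -> float:
    """Equation (5.5): estimate of the sum of time stamps of one flow at point X.

    subvector: the m counter values in the flow's counter subvector
    t_x:       sum of all n counters in the counter epoch
    n:         total number of counters in the counter epoch
    """
    m = len(subvector)
    if not 0 < m < n:
        raise ValueError("need 0 < m < n")
    return (n * sum(subvector) - m * t_x) / (n - m)


def estimate_mean_latency(sub_s: Sequence[float], t_s: float, n_s: int,
                          sub_r: Sequence[float], t_r: float, n_r: int,
                          p_f: int) -> float:
    """Equation (5.4): (t~_{f,R} - t~_{f,S}) / p_f, with p_f the packet count of the flow."""
    t_f_s = estimate_flow_timestamp_sum(sub_s, t_s, n_s)
    t_f_r = estimate_flow_timestamp_sum(sub_r, t_r, n_r)
    return (t_f_r - t_f_s) / p_f
```

For instance, with $n = 100$, $m = 4$, $t_X = 10400$, and every subvector counter equal to its expected value of 200 from Equation (5.2), the first function recovers $t_{f,X} = 400$.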

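5.4.2 Estimating Latency Standard Deviation

Let $D_f$ be the random variable representing the latency experienced by a packet in flow $f$. The standard deviation of $D_f$ can be calculated by $\tilde{\sigma}_f = \sqrt{\mathrm{Var}(D_f)}$, where $\mathrm{Var}(D_f)$ can be calculated as follows:

\[
\begin{aligned}
\mathrm{Var}(D_f) &= E[D_f^2] - E^2[D_f] \\
&= \frac{1}{p_f}\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2 - \left\{\frac{1}{p_f}\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)\right\}^2 \\
&= \frac{1}{p_f}\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2 - \left\{\frac{1}{p_f}\left(t_{f,R} - t_{f,S}\right)\right\}^2
\end{aligned}
\tag{5.6}
\]
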

We can calculate the second term $\left\{\frac{1}{p_f}(t_{f,R} - t_{f,S})\right\}^2$ in Equation (5.6) based on Theorem 16. Now the key challenge is to calculate the first term $\frac{1}{p_f}\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2$ in Equation (5.6). Our solution to this challenge is based on the statistical technique proposed by Alon et al. in [25]. The main idea is to introduce a random variable $G_z$ where the value $g_z$ that this random variable takes on is either $+1$ or $-1$ with equal probability. Before adding the time stamp of a packet $z$ to a counter, if we randomly multiply the time stamp with $g_z$ and then add it to the counter, then we will get Equation (5.7).

\[
E\left[\left(\sum_{\forall z}\left(g_z u_{f,R}[z] - g_z u_{f,S}[z]\right)\right)^2\right] = \sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2
\tag{5.7}
\]


This can be proven as follows:

\[
E\left[\left(\sum_{\forall z} g_z\left(u_{f,R}[z] - u_{f,S}[z]\right)\right)^2\right]
= E\left[\sum_{\forall z} g_z^2\left(u_{f,R}[z] - u_{f,S}[z]\right)^2 + \sum_{\forall z \neq z'} g_z g_{z'}\left(u_{f,R}[z] - u_{f,S}[z]\right)\left(u_{f,R}[z'] - u_{f,S}[z']\right)\right]
\]

Using the well known result that the expectation of a sum of random variables is the sum of their individual expectations and that $g_z^2 = 1$, we get:

\[
= \sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2 + \sum_{\forall z \neq z'}\left(u_{f,R}[z] - u_{f,S}[z]\right)\left(u_{f,R}[z'] - u_{f,S}[z']\right) \times E[G_z G_{z'}]
\]

Note that $\{G_1, G_2, G_3, \dots\}$ is a set of independent and identically distributed random variables. So, $E[G_z G_{z'}] = E[G_z] \times E[G_{z'}]$. As $E[G_z] = 0$ for all values of $z$, this implies $E[G_z G_{z'}] = 0$. Thus, the second term in the equation above is 0, which proves Equation (5.7).
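
The identity in Equation (5.7) can be checked numerically on a toy flow. The following Python sketch is only an illustration under assumed inputs: the three latency values are made up, and the expectation is computed exactly by enumerating all $2^k$ equally likely sign vectors instead of sampling them.

```python
from itertools import product
from typing import Sequence


def expected_signed_square(latencies: Sequence[float]) -> float:
    """Exact E[(sum_z g_z * d_z)^2] over all sign vectors g in {+1, -1}^k.

    d_z stands for the per-packet latency u_{f,R}[z] - u_{f,S}[z].
    """
    k = len(latencies)
    total = 0.0
    for signs in product((1, -1), repeat=k):
        signed_sum = sum(g * d for g, d in zip(signs, latencies))
        total += signed_sum * signed_sum
    return total / 2 ** k


# Hypothetical per-packet latencies of a three-packet flow.
latencies = [3.0, 1.0, 2.0]
lhs = expected_signed_square(latencies)   # left-hand side of Equation (5.7)
rhs = sum(d * d for d in latencies)       # right-hand side of Equation (5.7)
```

Both sides evaluate to 14 for this toy input because the cross terms cancel when averaged over the sign vectors.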
We now present our method for calculating the first term in Equation (5.6). First, for each
counter in the counter subvector of f , whose value is contributed by the time stamps of the
packets in flow f and those of the packets in other flows, we use Theorem 17 to extract an
estimate of the value that is contributed by only the time stamps of the packets in f . In other
words, we eliminate the noise introduced by the packets of flows other than f from the counter
subvector of f . Second, we statistically simulate the process of multiplying the time stamp of
a packet in f with the random variable Gz and then adding the multiplication result to the
corresponding counter in the counter subvector of f . By repeating this process a statistically
sufficient number of times, we obtain an accurate estimate of $\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2$ based
on Theorem 18. Note that this simulation does not require any changes to the recording
phase. Next, we present our denoising solution and statistical simulation process.
5.4.2.1 Denoising Counter Subvectors

Our denoising solution is based on Theorem 17. The numerical solution of Equation (5.8)
gives us the estimate of the value that is contributed by only the time stamps of the packets
in f for each counter in f ’s subvector.
Theorem 17. Let $w_{f,X}[j]$ ($1 \le j \le m$) be the sum of the time stamps of flow $f$'s packets that are mapped to counter $S_X^f[j]$ at observation point $X$. Let $\tilde{t}_{f,X}$ be the estimate of the sum of all time stamps contributed by packets of flow $f$ to all counters in $f$'s counter subvector and $t_X$ be the sum of all time stamps contributed by packets of all flows in the counter epoch at observation point $X$. The maximum likelihood estimate $\tilde{w}_{f,X}[j]$ of $w_{f,X}[j]$ satisfies the following equation:

\[
\ln\{n - 1\} = \psi^{(0)}\left\{t_X - \tilde{t}_{f,X} - S_X^f[j] + \tilde{w}_{f,X}[j] + 1\right\} - \psi^{(0)}\left\{S_X^f[j] - \tilde{w}_{f,X}[j] + 1\right\}
\tag{5.8}
\]

where $\psi^{(0)}\{\cdot\}$ is the 0th order polygamma function.


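Proof. The maximum likelihood estimate of $w_{f,X}[j]$ is the value of $w_{f,X}[j]$ that maximizes the probability that the counter $S_X^f[j]$ takes the observed value:

\[
\arg\max_{w_{f,X}[j]} P\left\{C = S_X^f[j] \,\middle|\, w_{f,X}[j]\right\}
\]

This value of $w_{f,X}[j]$ can be obtained by differentiating $P\{C = S_X^f[j] \mid w_{f,X}[j]\}$ w.r.t. $w_{f,X}[j]$ and equating to 0:

\[
\frac{d}{d(w_{f,X}[j])} P\left\{C = S_X^f[j] \,\middle|\, w_{f,X}[j]\right\} = 0
\]

As $C = C_f + C_r$ and $C_f = w_{f,X}[j]$, the L.H.S. becomes

\[
\frac{d}{d(w_{f,X}[j])} P\left\{C_r = S_X^f[j] - w_{f,X}[j]\right\}
\]

For simplicity, let $\xi = S_X^f[j] - w_{f,X}[j]$ and $\tau = t_X - t_{f,X}$. As $C_r$ is a binomial random variable, this derivative further becomes

\[
\frac{d}{d(w_{f,X}[j])}\left[\binom{\tau}{\xi}\left(\frac{1}{n}\right)^{\xi}\left(1 - \frac{1}{n}\right)^{\tau - \xi}\right]
= \binom{\tau}{\xi}\left(\frac{1}{n}\right)^{\xi}\left(1 - \frac{1}{n}\right)^{\tau - \xi}
\times \left\{\psi^{(0)}\{\xi + 1\} - \psi^{(0)}\{\tau - \xi + 1\} + \ln\left(1 - \frac{1}{n}\right) - \ln\left(\frac{1}{n}\right)\right\}
\tag{5.9}
\]

Due to space limitations, we have skipped the intermediate derivation steps, which use the following identity:

\[
\frac{d}{dw}\binom{v}{w} = \binom{v}{w}\left\{\psi^{(0)}\{v - w + 1\} - \psi^{(0)}\{w + 1\}\right\}
\]

By replacing $t_{f,X}$ with $\tilde{t}_{f,X}$, which is calculated using Theorem 16, in $\tau = t_X - t_{f,X}$ and further in the R.H.S. of Equation (5.9) and equating it to zero, we obtain the maximum likelihood estimate $\tilde{w}_{f,X}[j]$ of $w_{f,X}[j]$. As $\binom{\tau}{\xi}\left(\frac{1}{n}\right)^{\xi}\left(1 - \frac{1}{n}\right)^{\tau - \xi}$ is $P\{C = S_X^f[j] \mid w_{f,X}[j]\}$, which is not equal to zero, we have

\[
\psi^{(0)}\{\xi + 1\} - \psi^{(0)}\{\tau - \xi + 1\} + \ln\left(1 - \frac{1}{n}\right) - \ln\left(\frac{1}{n}\right) = 0
\]

Simplifying the $\ln\{\cdot\}$ terms results in Equation (5.8).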
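
Equation (5.8) has no closed-form solution, so it must be solved numerically. The following Python sketch shows one possible way to do so; the text above does not prescribe a particular solver, so the bisection search, the hand-written `digamma` helper (recurrence plus a standard asymptotic series), and all function names are assumptions made for illustration only.

```python
import math


def digamma(x: float) -> float:
    """psi^(0)(x) for x > 0, via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:           # shift the argument up until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def denoise_counter(s_j: float, t_x: float, t_f_est: float, n: int, iters: int = 200) -> float:
    """Solve Equation (5.8) for w~_{f,X}[j] by bisection.

    s_j:     observed value of counter S_X^f[j]
    t_x:     sum of all counters in the counter epoch
    t_f_est: estimate t~_{f,X} from Equation (5.5)
    n:       total number of counters
    """
    if n < 2:
        raise ValueError("need n >= 2")
    tau = t_x - t_f_est
    target = math.log(n - 1)

    def g(w: float) -> float:
        # R.H.S. minus L.H.S. of Equation (5.8); increasing in w.
        return digamma(tau - s_j + w + 1.0) - digamma(s_j - w + 1.0) - target

    lo, hi = max(0.0, s_j - tau), s_j
    if g(lo) >= 0.0:         # root lies at or below the feasible range: clamp
        return lo
    if g(hi) <= 0.0:         # root lies at or above the feasible range: clamp
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For large arguments $\psi^{(0)}\{x\} \approx \ln x$, so the solution is close to $S_X^f[j] - (t_X - \tilde{t}_{f,X})/n$, i.e., the observed counter value minus the expected noise from Equation (5.2).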

5.4.2.2 Statistical Simulations

We have obtained the $m$ values extracted using Theorem 17 from $f$'s counter subvector at observation point $X$. For flow $f$ that passes observation point $X$, each unique permutation of the $m$ distinct integers from 1 to $m$, denoted by vector $Q$, defines a unique deviation vector $v_{f,X}$ of size $m/2$ as follows. To ensure that $m/2$ is an integer, we choose $m$ to be an even number.

\[
v_{f,X}[l] = \tilde{w}_{f,X}\big[Q[2l]\big] - \tilde{w}_{f,X}\big[Q[2l - 1]\big], \qquad 1 \le l \le m/2
\tag{5.10}
\]

Each unique permutation of the $m$ distinct integers from 1 to $m$ is essentially a unique simulation of the aforementioned statistical process of multiplying half the time stamps with $G_z = +1$ and the other half with $G_z = -1$. Theorem 18 gives us the way to estimate $\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2$, which is needed for calculating $\mathrm{Var}(D_f)$ based on Equation (5.6).

Theorem 18. Given any two observation points $S$ and $R$, for any permutation $Q$ of the $m$ distinct integers from 1 to $m$, let $v_{f,S}$ and $v_{f,R}$ be the corresponding deviation vectors of flow $f$ at observation points $S$ and $R$, respectively. The following equation holds:

\[
\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2 = E\left[\sum_{l=1}^{m/2}\left(v_{f,R}[l] - v_{f,S}[l]\right)^2\right]
\tag{5.11}
\]


Proof. Let $Y_l^S$ be the set of all the time stamps contributed by the packets of flow $f$ to counters $S_S^f[Q[2l]]$ and $S_S^f[Q[2l - 1]]$. Similarly, let $Y_l^R$ be the set of all the time stamps contributed by the packets of flow $f$ to counters $S_R^f[Q[2l]]$ and $S_R^f[Q[2l - 1]]$. Let $y_l^S[i]$ be the $i$-th element of $Y_l^S$, where $1 \le i \le |Y_l^S|$. Similarly, let $y_l^R[i]$ be the $i$-th element of $Y_l^R$, where $1 \le i \le |Y_l^R|$. Starting from the R.H.S. of Equation (5.11), we have:

\[
\begin{aligned}
E\left[\sum_{l=1}^{m/2}\left(v_{f,R}[l] - v_{f,S}[l]\right)^2\right]
&= \sum_{l=1}^{m/2} E\left[\left(v_{f,R}[l] - v_{f,S}[l]\right)^2\right] \\
&= \sum_{l=1}^{m/2} E\left[\left(\sum_{i=1}^{|Y_l^R|} g_i\, y_l^R[i] - \sum_{i=1}^{|Y_l^S|} g_i\, y_l^S[i]\right)^2\right] \\
&= \sum_{l=1}^{m/2} E\left[\left(\sum_{i=1}^{|Y_l^R|}\left(g_i\, y_l^R[i] - g_i\, y_l^S[i]\right)\right)^2\right] \qquad \because\ |Y_l^R| = |Y_l^S| \\
&= \sum_{l=1}^{m/2}\sum_{i=1}^{|Y_l^R|}\left(y_l^R[i] - y_l^S[i]\right)^2, \quad \text{using Equation (5.7)} \\
&= \sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2
\end{aligned}
\]

The last equality follows from the fact that each $y_l^X[i]$ is actually the value of some time stamp $u_{f,X}[z]$ at observation point $X$.
For any permutation $Q$ of the $m$ distinct integers from 1 to $m$, we calculate $v_{f,R}[l]$ and $v_{f,S}[l]$ for each $1 \le l \le m/2$ based on Equation (5.10), and then calculate $\sum_{l=1}^{m/2}\left(v_{f,R}[l] - v_{f,S}[l]\right)^2$, which is one estimate of $\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2$ according to Theorem 18. As $\sum_{l=1}^{m/2}\left(v_{f,R}[l] - v_{f,S}[l]\right)^2$ is a random variable, its variance can be reduced by $\gamma$ times if we repeat the above process $\gamma$ times using a different random permutation $Q$ each time and use the average of the $\gamma$ values of $\sum_{l=1}^{m/2}\left(v_{f,R}[l] - v_{f,S}[l]\right)^2$ as the estimate of $\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2$. As the R.H.S. of Equation (5.11) is an expected value, to get an accurate estimate of $\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2$, we need to calculate $\sum_{l=1}^{m/2}\left(v_{f,R}[l] - v_{f,S}[l]\right)^2$ for a statistically sufficient number of unique permutations of the $m$ distinct integers from 1 to $m$. The process of calculating $\sum_{l=1}^{m/2}\left(v_{f,R}[l] - v_{f,S}[l]\right)^2$ for different permutations $Q$ is essentially simulating the aforementioned random statistical process of multiplying the time stamp of each packet with random variable $G_z$ that takes the value of $+1$ and $-1$ with equal probability without having to perform this process in the recording phase. We name this process of calculating $\sum_{l=1}^{m/2}\left(v_{f,R}[l] - v_{f,S}[l]\right)^2$ using different permutations $Q$ as virtual repetitions. The number of distinct ways in which we can repeat this process is $\prod_{i=1}^{m/2}\binom{m - 2(i-1)}{2}$, which is large enough for us to obtain any required reliability $\alpha$ for estimating the standard deviation. For example, when $m = 20$, $\prod_{i=1}^{m/2}\binom{m - 2(i-1)}{2} = 2.38 \times 10^{15}$.
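
The count of distinct virtual repetitions can be reproduced with a few lines of Python. This is only an arithmetic check of the product formula; the function name is hypothetical.

```python
import math


def num_virtual_repetitions(m: int) -> int:
    """Product over i = 1..m/2 of C(m - 2(i - 1), 2), for an even subvector length m."""
    if m <= 0 or m % 2 != 0:
        raise ValueError("m must be a positive even integer")
    count = 1
    for i in range(1, m // 2 + 1):
        count *= math.comb(m - 2 * (i - 1), 2)
    return count


repetitions_for_m20 = num_virtual_repetitions(20)
```

The product telescopes to $m!/2^{m/2}$, which for $m = 20$ is 2375880867360000, i.e., approximately $2.38 \times 10^{15}$.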

5.4.2.3 Steps of Estimating Standard Deviation

To summarize, COLATE performs the following six steps to estimate the standard deviation of the latencies that the packets in flow $f$ experienced in traversing from observation point $S$ to observation point $R$. (1) Obtain the number of packets in flow $f$, denoted by $p_f$, using NetFlow or the schemes proposed in [74, 64]. (2) Obtain the estimates of $t_{f,S}$ and $t_{f,R}$, which are the sums of the time stamps of all packets in flow $f$ at observation points $S$ and $R$, respectively, using Theorem 16. (3) Extract the values of $w_{f,S}[j]$ and $w_{f,R}[j]$, which are the sums of the time stamps of flow $f$'s packets that are mapped to counter $S_S^f[j]$ at observation point $S$ and to counter $S_R^f[j]$ at observation point $R$, respectively, for all $1 \le j \le m$, using Theorem 17. (4) Randomly choose $\gamma$ permutations of the $m$ distinct integers from 1 to $m$. For each permutation $Q$, first calculate $v_{f,R}[l]$ and $v_{f,S}[l]$ for all $1 \le l \le m/2$ using Equation (5.10) and then calculate $\sum_{l=1}^{m/2}\left(v_{f,R}[l] - v_{f,S}[l]\right)^2$. (5) Calculate the average of the $\gamma$ values of $\sum_{l=1}^{m/2}\left(v_{f,R}[l] - v_{f,S}[l]\right)^2$, which is the estimated value of $\sum_{\forall z}\left(u_{f,R}[z] - u_{f,S}[z]\right)^2$. (6) Estimate the standard deviation using Equation (5.6).
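
Steps (4) to (6) can be sketched in Python as follows. This is a minimal illustration rather than the Matlab implementation evaluated in this chapter: the function names are hypothetical, the denoised values from step (3) and the estimates from steps (1) and (2) are assumed to be given, indices are 0-based, and clamping a negative variance estimate to zero is an added safeguard that the steps above do not mention.

```python
import math
import random
from typing import Optional, Sequence


def deviation_vector(w: Sequence[float], perm: Sequence[int]) -> list:
    """Equation (5.10) with 0-based indices: v[l] = w[Q[2l + 1]] - w[Q[2l]]."""
    return [w[perm[2 * l + 1]] - w[perm[2 * l]] for l in range(len(w) // 2)]


def estimate_sum_squared_latency(w_s: Sequence[float], w_r: Sequence[float], gamma: int,
                                 rng: Optional[random.Random] = None) -> float:
    """Steps (4) and (5): average sum_l (v_R[l] - v_S[l])^2 over gamma random permutations."""
    m = len(w_s)
    if m != len(w_r) or m == 0 or m % 2 != 0:
        raise ValueError("both subvectors need the same positive even length m")
    rng = rng or random.Random()
    total = 0.0
    for _ in range(gamma):
        perm = list(range(m))
        rng.shuffle(perm)                 # the same permutation Q is used at S and R
        v_s = deviation_vector(w_s, perm)
        v_r = deviation_vector(w_r, perm)
        total += sum((a - b) ** 2 for a, b in zip(v_r, v_s))
    return total / gamma


def estimate_latency_std(w_s: Sequence[float], w_r: Sequence[float],
                         t_f_s: float, t_f_r: float, p_f: int, gamma: int,
                         rng: Optional[random.Random] = None) -> float:
    """Step (6): plug the averaged estimate into Equation (5.6)."""
    sum_sq = estimate_sum_squared_latency(w_s, w_r, gamma, rng)
    variance = sum_sq / p_f - ((t_f_r - t_f_s) / p_f) ** 2
    return math.sqrt(max(variance, 0.0))  # clamp: sampling noise can push the estimate below 0
```

Passing a seeded `random.Random` instance makes a run of virtual repetitions reproducible, which is convenient when comparing different values of $\gamma$.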

5.5 COLATE – Reliability

COLATE has four parameters: (1) the total number of counters denoted by n, (2) the
number of counters in each counter subvector denoted by m, (3) the number of bits in each
counter denoted by b, and (4) the vector threshold denoted by T . Note that when the sum
of all n counters in a counter vector reaches T , COLATE dumps the counter vector into
permanent storage as a counter epoch and then resets all counter values to be zero. In this
section, we present solutions to find the values for these parameters so that our estimated
average latency achieves the required reliability α ∈ [0, 1) for the given confidence interval
β ∈ (0, 1]. Note that for standard deviation, we have already presented a method in Section 5.4 that can achieve arbitrarily high required reliability. Recall $\tilde{\mu}_f = \frac{1}{p_f}(\tilde{t}_{f,R} - \tilde{t}_{f,S})$ (in Equation (5.4)), which shows that the estimate $\tilde{\mu}_f$ depends on two other estimates $\tilde{t}_{f,S}$ and $\tilde{t}_{f,R}$. Next, we find the confidence interval $B$ and required reliability $A$ that the estimate $\tilde{t}_{f,X}$ for $t_{f,X}$ at each observation point $X$ must satisfy so that the estimate $\tilde{\mu}_f$ for $\mu_f$ satisfies the confidence interval $\beta$ and required reliability $\alpha$. That is, we want to find the values of $B$ and $A$ so that if for every observation point $X$ we have $P\{|\tilde{t}_{f,X} - t_{f,X}| \le B t_{f,X}\} \ge A$, then we will have $P\{|\tilde{\mu}_f - \mu_f| \le \beta\mu_f\} \ge \alpha$. After we find the values for $B$ and $A$, we present a solution to calculate the optimal values of the four parameters $n$, $m$, $b$, and $T$.

5.5.1 Individual Reliability Requirements

Individual Required Reliability: The maximum fraction of estimated values $\tilde{\mu}_f$ that can violate the requirement of $|\tilde{\mu}_f - \mu_f| \le \beta\mu_f$, while the overall estimate still satisfies the required reliability $\alpha$, is $1 - \alpha$. Thus, the maximum fraction of estimates $\tilde{t}_{f,X}$ at either of the observation points $S$ and $R$ that can violate the requirement $|\tilde{t}_{f,X} - t_{f,X}| \le B t_{f,X}$ must be no greater than $(1 - \alpha)/2$. Thus,

\[
A = 1 - (1 - \alpha)/2 = (1 + \alpha)/2
\tag{5.12}
\]

Individual Confidence Interval: The estimate $\tilde{\mu}_f$ obtained by COLATE needs to satisfy the requirement of $|\tilde{\mu}_f - \mu_f| \le \beta\mu_f$ with probability of at least $\alpha$. As $\tilde{\mu}_f = \frac{1}{p_f}(\tilde{t}_{f,R} - \tilde{t}_{f,S})$ and $\mu_f = \frac{1}{p_f}(t_{f,R} - t_{f,S})$, the confidence interval requirement $|\tilde{\mu}_f - \mu_f| \le \beta\mu_f$ becomes:

\[
\left|(\tilde{t}_{f,R} - t_{f,R}) - (\tilde{t}_{f,S} - t_{f,S})\right| = \left|(\tilde{t}_{f,R} - \tilde{t}_{f,S}) - (t_{f,R} - t_{f,S})\right| \le \beta(t_{f,R} - t_{f,S})
\]

The largest value of $\left|(\tilde{t}_{f,R} - t_{f,R}) - (\tilde{t}_{f,S} - t_{f,S})\right|$ is $B t_{f,R} + B t_{f,S}$, which must be no greater than $\beta(t_{f,R} - t_{f,S})$.

\[
B t_{f,R} + B t_{f,S} \le \beta(t_{f,R} - t_{f,S})
\]

Thus, we get

\[
B \le \beta\, \frac{t_{f,R} - t_{f,S}}{t_{f,R} + t_{f,S}}
\tag{5.13}
\]

To determine $B$ for a given network, we conduct measurement of $(t_{f,R} - t_{f,S})/(t_{f,R} + t_{f,S})$ on the network to find the appropriate value so that Equation (5.13) statistically holds.
5.5.2 Reliability Centered Parameter Selection

As we have four unknown parameters (i.e., n, m, b, and T ), we need at least four equations
so that we can calculate the values of these parameters by solving the four equations. Next
we develop these four equations.
Equation 1: Let M be the total number of bits of the RAM that an observation point
can allocate for storing the counter vector, which requires n × b bits. Thus,
\[
n \times b = M
\tag{5.14}
\]

Equation 2: Based on Lemma 14, the sum of all the time stamps of all packets of all
flows is equally divided among all n counters on average. Thus, the value of each counter
on average can go up to T /n. Thus, the number of bits in each counter, which is b, needs to
satisfy the following equation.
\[
b = \log_2\left(\frac{T}{n}\right) + 1
\tag{5.15}
\]

Note that we add 1 in the R.H.S of this equation to double the capacity of each counter to
avoid overflows.
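
As a quick arithmetic illustration of Equation (5.15), the following Python sketch computes $b$ for given $T$ and $n$. The function name is hypothetical, and rounding up to the next whole bit is an assumption, since Equation (5.15) does not state how a fractional result is handled.

```python
import math


def counter_bits(threshold: float, n: int) -> int:
    """Equation (5.15), rounded up to a whole number of bits (rounding is an assumption)."""
    if threshold <= 0 or n <= 0:
        raise ValueError("threshold and n must be positive")
    return math.ceil(math.log2(threshold / n) + 1)


# Example values quoted in Section 5.6.2.1: T = 8 x 10^10 and n = 455334.
example_bits = counter_bits(8e10, 455334)
```

With these example values the sketch yields 19 bits, which agrees with the value of $b$ reported alongside them in Section 5.6.2.1.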
Equation 3: As the expected value of a counter in a counter subvector, which is specified
in Equation (5.2), should never exceed the maximum capacity of the counter, we have
\[
\frac{t_{f,X}}{m} + \frac{T - t_{f,X}}{n} \le 2^b - 1
\]

Let $t^{\max}_{f,X}$ be the maximum value of $t_{f,X}$ for all flows on a network. Thus, the value of $b$ should satisfy the following equation:

\[
\frac{t^{\max}_{f,X}}{m} + \frac{T - t^{\max}_{f,X}}{n} = 2^b - 1
\tag{5.16}
\]

Here $t^{\max}_{f,X}$ can be obtained by some measurement on the sum of the time stamps of all packets on a per-flow basis on the given network.
Equation 4: To achieve the required reliability, $P\{|\tilde{t}_{f,X} - t_{f,X}| \le B t_{f,X}\}$ should at least be equal to its lower bound $A$, i.e.,

\[
P\left\{(1 - B)\, t_{f,X} \le \tilde{t}_{f,X} \le (1 + B)\, t_{f,X}\right\} = A
\tag{5.17}
\]

Based on Equation (5.2), we can represent $E[C]$ as a function of $t_{f,X}$; denoting this function by $g$, we have $E[C] = g\{t_{f,X}\}$. Thus, $t_{f,X} = g^{-1}\{E[C]\}$. Let $\tilde{C}$ be the observed value of $E[C]$. Then, we have $\tilde{t}_{f,X} = g^{-1}\{\tilde{C}\}$. Equation (5.17) becomes

\[
\begin{aligned}
A &= P\left\{(1 - B)\, t_{f,X} \le g^{-1}\{\tilde{C}\} \le (1 + B)\, t_{f,X}\right\} \\
&= P\left\{g\{(1 - B)\, t_{f,X}\} \le \tilde{C} \le g\{(1 + B)\, t_{f,X}\}\right\}
\end{aligned}
\tag{5.18}
\]

Similarly, based on Equation (5.3), we can represent the standard deviation of $C$ as a function of $t_{f,X}$; denoting this function by $h$, we have $\mathrm{Var}(C) = h^2\{t_{f,X}\}$. Based on the fact that the variance of a random variable reduces by $m$ times if the random event is repeated $m$ times, by observing the values of $C$ from $m$ counters, the variance of $C$ becomes $\frac{h^2\{t_{f,X}\}}{m}$ and the standard deviation of $C$ becomes $\frac{h\{t_{f,X}\}}{\sqrt{m}}$. Let $Z$ denote $\frac{\tilde{C} - g\{t_{f,X}\}}{h\{t_{f,X}\}/\sqrt{m}}$. Thus, Equation (5.18) becomes

\[
P\left\{\frac{g\{(1 - B)\, t_{f,X}\} - g\{t_{f,X}\}}{h\{t_{f,X}\}/\sqrt{m}} \le Z \le \frac{g\{(1 + B)\, t_{f,X}\} - g\{t_{f,X}\}}{h\{t_{f,X}\}/\sqrt{m}}\right\} = A
\tag{5.19}
\]

By the central limit theorem, Z approximates a standard normal random variable. The
area under the standard normal curve gives the success probability, which is the required
reliability in our context. As our confidence interval requirement is symmetric on both the
upper and lower sides of tf,X , we can represent the required reliability A in terms of a
constant k as follows:
P {−k ≤ Z ≤ k} = A

(5.20)

Let $\Phi$ be the cumulative distribution function (CDF) of a standard normal distribution and $\mathrm{erf}\{\cdot\}$ be the standard error function. We get

\[
P\{-k \le Z \le k\} = \Phi(k) - \Phi(-k) = \mathrm{erf}\left\{\frac{k}{\sqrt{2}}\right\}
\tag{5.21}
\]

From Equations (5.20) and (5.21), we get

\[
k = \sqrt{2}\, \mathrm{erf}^{-1}\{A\}
\]
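
The constant $k$ can be computed with the Python standard library, which has no inverse error function but does provide the inverse normal CDF. The sketch below relies on the equivalent form $\Phi(k) = (1 + A)/2$ obtained from $\Phi(k) - \Phi(-k) = A$; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def individual_reliability(alpha: float) -> float:
    """Equation (5.12): A = (1 + alpha) / 2."""
    return (1.0 + alpha) / 2.0


def z_half_width(a: float) -> float:
    """k with P{-k <= Z <= k} = a for standard normal Z, i.e. k = sqrt(2) * erfinv(a)."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    return NormalDist().inv_cdf((1.0 + a) / 2.0)


# Example: required reliability alpha = 0.95 for the average latency estimate.
a_example = individual_reliability(0.95)   # 0.975
k_example = z_half_width(a_example)
```

Evaluating `math.erf(k_example / math.sqrt(2))` returns `a_example` up to floating-point error, which confirms that the two forms of $k$ agree.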

We observe that the absolute values of the upper and lower bounds of $Z$ in Equation (5.19) are the same. Thus, equating the lower bound with $-k$ or the upper bound with $k$ results in the following equation:

\[
B^2 t_{f,X}^2 = \frac{k^2 n^2 m}{n - m}\left\{\frac{t_{f,X}}{m}\left(1 - \frac{1}{m}\right) + \frac{T - t_{f,X}}{n}\left(1 - \frac{1}{n}\right)\right\}
\tag{5.22}
\]

By rearranging Equation (5.22), we get

\[
B^2 = \frac{1}{t_{f,X}}\, \frac{k^2 n^2 m}{n - m}\left\{\frac{1}{m}\left(1 - \frac{1}{m}\right) - \frac{1}{n}\left(1 - \frac{1}{n}\right)\right\}
+ \frac{1}{t_{f,X}^2}\, \frac{k^2 n^2 m}{n - m}\, \frac{1}{n}\left(1 - \frac{1}{n}\right) T
\]

This equation shows that $B$ decreases as $t_{f,X}$ increases when other parameters are fixed. This makes intuitive sense because the more packets of flow $f$ pass observation point $X$ (i.e., the larger $t_{f,X}$ is), the more timing information we can obtain from the packets, and the smaller the confidence interval that can be achieved. Thus, we should use the statistically minimum observable value of $t_{f,X}$, denoted $t^{\min}_{f,X}$, for the given network, in Equation (5.22). Here $t^{\min}_{f,X}$ can be obtained by some measurement on the sum of the time stamps of all packets on a per-flow basis on the given network. The parameter values obtained using $t^{\min}_{f,X}$ in Equation (5.22) will ensure that the estimates for all flows whose sum of time stamps is $\ge t^{\min}_{f,X}$ satisfy $P\{|\tilde{t}_{f,X} - t_{f,X}| \le B t_{f,X}\} \ge A$. By replacing $t_{f,X}$ by $t^{\min}_{f,X}$ in Equation (5.22), we get

\[
B^2 \left(t^{\min}_{f,X}\right)^2 = \frac{k^2 n^2 m}{n - m}\left\{\frac{t^{\min}_{f,X}}{m}\left(1 - \frac{1}{m}\right) + \frac{T - t^{\min}_{f,X}}{n}\left(1 - \frac{1}{n}\right)\right\}
\tag{5.23}
\]

Solving Equations: COLATE takes $M$, $\alpha$, $\beta$, $t^{\min}_{f,X}$, and $t^{\max}_{f,X}$ as input. The values of required reliability $\alpha$ and confidence interval $\beta$ are provided by network operators. The value of RAM space $M$ depends on the amount of RAM available at an observation point. For $t^{\min}_{f,X}$ and $t^{\max}_{f,X}$, network operators can obtain them by measurements on targeted flows in the given network. With the values of $M$, $\alpha$, $\beta$, $t^{\min}_{f,X}$, and $t^{\max}_{f,X}$, COLATE simultaneously solves the four equations (i.e., (5.14), (5.15), (5.16), and (5.23)) to obtain the appropriate values of the four parameters $n$, $m$, $b$, and $T$. To simultaneously solve these equations, we express $m$, $b$, and $T$ in terms of $n$ using Equations (5.14), (5.15), and (5.16) and replace them in Equation (5.23). This results in an expression with only one unknown parameter $n$. We numerically solve this expression to obtain $n$ and then the other three unknown parameters.

Figure 5.3 Permanent storage vs. RAM (x-axis: M (KB), log scale; y-axis: permanent storage (KB))

Figure 5.4 Threshold T vs. RAM size (x-axis: M (KB), log scale; y-axis: T)

RAM Space vs. Storage Space: From the simultaneous solution of these four equations, we have an interesting observation that COLATE requires a smaller amount of permanent storage space for storing counter epochs when it is allocated more RAM for storing
counter vectors. Figure 5.3 plots an example graph of the permanent storage size vs. RAM
size for COLATE.
While this observation seems surprising, it makes intuitive sense because the sum of the
two maximum values of two $b$-bit numbers is $2 \times 2^b = 2^{b+1}$ whereas the maximum value of a
$2b$-bit number is $2^{2b} \gg 2^{b+1}$. As we increase the total number of bits in a counter vector,
i.e., M, the counter value threshold T increases as shown in Figure 5.4. This implies that
the frequency of writing the counter vector into permanent storage is reduced, and although
each counter epoch takes more space, the overall required storage space is reduced as shown
in Figure 5.3. The M value at the knee of the curve in Figure 5.3 represents the best tradeoff
point between RAM space and permanent storage space.

5.5.3 Flexibility in Parameter Selection

If we only measure average per-flow latency, each observation point can choose its own values
for the parameters of n, m, b, and T based on its available resources and traffic condition
without global coordination among observation points. If we also need to measure standard
deviation, then all observation points need to use the same value of m because Theorem 18
requires that the two sets YlS and YlR contain the time stamps from the same set of packets
at the two observation points, which is possible only when the value of m is the same at
both observation points. The remaining three parameters can still be chosen independently
at each observation point.

5.6 Performance Evaluation

We implemented COLATE in Matlab. We also implemented RLI [68] in Matlab for comparison purposes. We did not implement LDA [63] and MAPLE [69] because LDA cannot
provide per-flow latency measurement and MAPLE requires attaching time stamps to every
packet, i.e., MAPLE is not a latency estimation scheme but a storage scheme. In this section,
we present our evaluation results of COLATE in comparison with RLI. We first give details
of the three network traces that we used. Second, we evaluate the accuracy of COLATE
as well as the impact of RAM space on permanent storage space used by COLATE. Last,
we compare COLATE with RLI and with Count-Min (CM) sketch [43], a summarizing data
structure for queries on data streams.

5.6.1 Network Traces

To evaluate COLATE, we need real packet traces with high-resolution time stamps collected
simultaneously from at least two observation points. Unfortunately, no such traces are
publicly available. Thus, we resort to three real network traces where each is collected at
a single observation point at a time. These traces include CHIC [4], ICSI [88], and DC [32]. CHIC is a backbone header trace, published by CAIDA, which includes the arrival
times of packets at a 10GigE link interface. We used traces generated from 5 minutes of
packet capture. Note that the authors of RLI and MAPLE also evaluated their schemes on
the header trace of this backbone link collected by CAIDA. ICSI is an enterprise network
traffic trace, collected at a medium-sized site, which includes the arrival times of packets
on an ethernet link for a duration of over 41.1 hours. ICSI is available in the form of 41
trace files collected at 17 different ports in an enterprise network. DC is a data center traffic
trace, collected at a university data center, which includes the arrival times of packets on an
ethernet link for a duration of a little more than an hour. DC is available in the form of 20
trace files collected at the same port. Figure 5.5 shows the CDFs of sizes of flows in each
trace. We observe that the traces contain both mice flows as well as elephant flows. Table
5.1 reports the total duration, number of packets, number of flows, and average data rate of
each trace.

Figure 5.5 Flow sizes CDFs (x-axis: flow size in packets, log scale; y-axis: CDF; curves: CHIC, ICSI, DC)

As these network traces contain only the arrival time stamps of packets, we adopt the
simulation strategy used by RLI and MAPLE, which simulates the traversal of packets in
each trace through a queue to get a departure time stamp for each packet and uses random
early detection (RED) [48] as the queue management strategy because RED is most popular.
As modern routers typically use a queue size that can hold 0.5 seconds of traffic at their
maximum line rates, we also use the same queue size. For the remaining parameters of RED
queue management strategy, we use $min_{th} = 0.475 \times$ queue size, $max_{th} = 0.95 \times$ queue size, $w_q = 0.002$, and $max_p = \frac{1}{50}$ as per the guidelines in [48].

Table 5.1 Summary of network traces

Trace   Duration     # pkts   # flows   Mbps
CHIC    5 mins       37.3M    3.01M     411
ICSI    41.1 hours   46.9M    0.387M    1.31
DC      1.08 hours   19.9M    0.439M    49.6

5.6.2 COLATE Accuracy

Now we evaluate the accuracy of the average and standard deviation of the latencies of the flows in the three
network traces estimated by COLATE.

5.6.2.1 Average Latency

We evaluated COLATE for both the scenario of only two observation points (where one
sender sends and one receiver receives) and that of more than two observation points (where
multiple senders send and multiple receivers receive). For the scenario with more than two
observation points, we choose three observation points forming a triangle topology where
everyone sends to and receives from everyone else. We choose three observation points and
the triangle topology for the sake of simplicity as the number of senders and receivers does
not affect the accuracy of COLATE. For a triangle topology, there are 6 unidirectional links.
We used the largest 6 of the 41 trace files from ICSI data set, each trace file representing
the traffic on one of the 6 links. This choice is arbitrary.
We performed our evaluation for three different accuracy requirements: low (α = 0.90,
β = 0.10), medium (α = 0.95, β = 0.05), and high (α = 0.99, β = 0.01). For each of these
three accuracy requirements, we evaluated COLATE using three values of available RAM:
small (M = 1MB), medium (M = 10MB), and large (M = 100MB). We obtained the values of $t^{\min}_{f,X}$ and $t^{\max}_{f,X}$ from simple measurement of the network traces. We calculated the values of the parameters $b$, $m$, $n$, and $T$ using the method described in Section 5.5. For example, for M = 1MB, α = 0.95, and β = 0.05, typical values of these parameters are $b = 19$, $m = 20$, $n = 455334$, and $T = 8 \times 10^{10}$.
Our results show that COLATE always achieves the required reliability. Figures 5.6(a), 5.6(b), and 5.6(c) show the CDFs of the observed values of β, i.e., $|\tilde{\mu}_f - \mu_f|/\mu_f$, for the average latency estimated by COLATE for the three traces under the scenario of one sender and one receiver using the high, medium, and low accuracy requirements, respectively. Figures 5.7(a), 5.7(b), and 5.7(c) show the CDFs of the observed values of β, i.e., $|\tilde{\mu}_f - \mu_f|/\mu_f$, for the average latency estimated by COLATE for the six links under the scenario of multiple senders and receivers using the high, medium, and low accuracy requirements, respectively. The horizontal line in each of these figures shows the required reliability. We see that every CDF plot crosses the horizontal line at an observed value of β that is smaller than the required confidence interval. This shows that COLATE always achieves the required reliability. Due to lack of space, we only show plots for M = 10MB. Observations from M = 1MB and M = 100MB are the same.

Figure 5.6 CDF of observed β in average estimate (1-S, 1-R). Panels: (a) α = 0.99, β = 0.01; (b) α = 0.95, β = 0.05; (c) α = 0.9, β = 0.1 (x-axis: observed β; y-axis: CDF; curves: CHIC, ICSI, DC)

5.6.2.2

Standard Deviation

Our results show that the relative error in the standard deviation estimates of over 91% of
flows is less than 0.05 with only 1000 virtual repetitions. Relative error is defined as
(actual value − estimated value)/actual value. Figure 5.8 plots the CDFs of the relative errors in standard deviation estimated by COLATE for each of the three traces.
Our results also show that the percentage of flows for which the relative error is less than
0.05 increases with the number of virtual repetitions. Figure 5.9 plots this
percentage versus the number of virtual repetitions for the three traces. With 10^5 iterations,
this percentage reaches 98%. Although 10^5 iterations may take some time depending on the
available computing power, this is not an issue because the estimation of standard deviation is an
offline process and does not have to keep up with high line rates.

Figure 5.7 CDF of observed β in average estimate (multiple S,R). Panels: (a) α = 0.99, β = 0.01; (b) α = 0.95, β = 0.05; (c) α = 0.9, β = 0.1. x-axis: observed β; y-axis: CDF.

Figure 5.8 CDFs of relative errors in STD. x-axis: relative error; y-axis: CDF; traces: CHIC, ICSI, DC.

Figure 5.9 Rel. error in STD vs. # reps. x-axis: number of virtual repetitions (10^1 to 10^5); y-axis: percentage of flows with < 5% relative error.
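The accuracy metric used in this subsection can be illustrated with a short sketch. It computes the relative error per flow and the share of flows whose error falls below 0.05. The function names and the sample values are hypothetical, and taking the absolute value of the error is our assumption so that the threshold applies to over- and under-estimates alike.

```python
def relative_error(actual, estimated):
    """Relative error as defined in the text, taken in absolute value (assumption)."""
    return abs(actual - estimated) / actual


def fraction_within(actuals, estimates, threshold=0.05):
    """Fraction of flows whose relative error is below the threshold."""
    errors = [relative_error(a, e) for a, e in zip(actuals, estimates)]
    return sum(err < threshold for err in errors) / len(errors)


# Hypothetical per-flow standard deviations (illustrative values only).
actual_std = [10.0, 20.0, 40.0, 80.0]
estimated_std = [10.2, 19.5, 43.0, 80.0]
share = fraction_within(actual_std, estimated_std)
```

In this toy input, three of the four flows have a relative error below 0.05, so `share` is 0.75.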

5.6.3

RAM and Storage Size

Our results show that COLATE uses less than 0.1 bit of permanent storage per packet. Figure
5.10 shows the bar graph of the number of bits per packet used by each of the three traces
for all three values of M, when α = 0.99 and β = 0.01. This small size of storage required
per packet results in a very low frequency of transferring the counter vectors from RAM
to permanent storage. For example, for the ICSI network trace, COLATE transfers a counter
vector to SSD every 24.6 hours when M = 1MB, α = 0.95, and β = 0.05. Transferring 1MB
of content from RAM to SSD once a day is trivial for modern devices. The number of bits
per packet in permanent storage decreases with the increase in RAM size M, which confirms
our analysis based on Figures 5.3 and 5.4. However, this decrease is hard to observe from
Figure 5.10 because the difference is small. Nevertheless, this difference becomes significant
for longer time durations (on the order of, say, days and weeks).

Figure 5.10 Storage bits per packet. x-axis: trace (CHIC, ICSI, DC); y-axis: bits/packet; bars: M = 1, M = 10, M = 100.

Figure 5.11 Comparison of delay estimates. x-axis: observed β; y-axis: CDF; curves for CHIC, ICSI, and DC under COLATE, RLI with PIR 1/10 to 1/2, and RLI with PIR 1/300 to 1/10.

5.6.4

Comparison with RLI

Our results show that COLATE always achieves higher accuracy than RLI. RLI requires two
inputs, namely the lower and upper limits of the Probe packet Injection Rate (PIR). The
authors of RLI used the lower limit as 1 probe packet per 300 regular packets and the upper
limit as 1 probe packet per 10 regular packets in [68]. We first evaluated RLI using this
pair of PIR values. Because the accuracy of RLI increases as PIR increases, to improve the
estimation accuracy of RLI, although at the cost of larger bandwidth usage, we also evaluated
RLI using much higher PIRs – 1 probe packet per 10 regular packets as the lower limit and
1 probe packet per 2 regular packets as the upper limit. Figure 5.11 plots the CDFs of the
observed value of β in the average latency estimated by RLI in the three traces for these two
configurations of PIRs. For comparison, we also plot the CDF of the observed value of β in
the estimates obtained using COLATE for high accuracy requirement (α = 0.99, β = 0.01).
Note that the observed value of β is essentially the relative error in the estimated values of
average latency. Figure 5.11 shows that the relative error of COLATE is much smaller than
that of RLI. This relative error can be made arbitrarily small by specifying smaller values of
β and larger values of α. This figure further shows that the relative error of RLI is smaller
when PIR is larger. With the PIR proposed by the authors, on average, only 77% of flows have
a relative error less than 5%. At this rate, on average, RLI inserts one probe packet after 21.66
regular packets. With our high PIR configuration, on average, only 81% of flows have a relative
error less than 5%, but at this rate, on average, RLI inserts one probe packet every 4.78
regular packets in the three traces. Table 5.2 shows the average number of regular packets
after which RLI inserts a probe packet for each of the three traces. In contrast, COLATE
does not insert any probe packet at the cost of a small amount of memory at observation
points.

Table 5.2 Average number of regular packets after which RLI inserts a probe packet

Trace    # reg. pkts (PIR 1:300 to 1:10)    # reg. pkts (PIR 1:10 to 1:2)
CHIC     18.66                              10.0
ICSI     17.19                              2.97
DC       245.7                              9.06

5.6.5

Comparison with Count-Min Sketch

Count-Min (CM) sketches can theoretically be used for latency measurement but, practically,
there is a fundamental limitation. Let $t_{f_i}$ represent the sum of time stamps of all packets
in flow $f_i$ and let there be $j$ flows in total whose time stamps need to be stored for latency
measurements. We can use a CM-sketch to store time stamps of multiple flows and obtain the
estimate $\tilde{t}_{f_i}$ of the sum of time stamps of any flow $f_i$ as per the method described in [43]. The
estimate $\tilde{t}_{f_i}$ obtained through CM-sketch can satisfy the condition

$$\tilde{t}_{f_i} \le t_{f_i} + \beta \times \sum_{\forall j} t_{f_j}$$

with probability α. This is problematic because we require the estimate to satisfy the
condition $\tilde{t}_{f_i} \le t_{f_i} + \beta \times t_{f_i}$ with probability α. Therefore, to achieve the required reliability
using CM-sketch, we need to ensure that $\sum_{\forall j} t_{f_j} \le t_{f_i}$, which is possible if and only if
we do not add time stamps of any flow other than $f_i$ to the CM-sketch. Consequently, we
need to maintain a CM-sketch for each flow, which results in large memory requirements.
For example, for α = 0.95 and β = 0.05, CM-sketch requires 165 counters per flow and 3
hash functions and 3 memory accesses per packet. Assuming the same counter size of 19
bits as for COLATE in this scenario, CM-sketch requires 165 × 19 = 3135 bits per flow. In
comparison, COLATE requires 0.1 bit per packet. The memory requirement of CM-sketch
matches that of COLATE only if each flow has at least 31350 packets, which is impractical
as seen in Figure 5.5. The number of memory accesses and the number of hash functions
per packet for CM-sketch are always greater than those for COLATE.
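The figures quoted above can be checked with a short calculation. The sketch below assumes the usual Count-Min sizing rule (width ⌈e/β⌉ and depth ⌈ln(1/(1 − α))⌉); this section does not state the rule explicitly, but it reproduces the 165 counters and 3 hash functions given in the text. The function and variable names are hypothetical.

```python
import math
from fractions import Fraction


def cm_sketch_dimensions(alpha, beta):
    """Assumed standard sizing: width = ceil(e / beta), depth = ceil(ln(1 / (1 - alpha)))."""
    width = math.ceil(math.e / beta)
    depth = math.ceil(math.log(1.0 / (1.0 - alpha)))
    return width, depth


width, depth = cm_sketch_dimensions(alpha=0.95, beta=0.05)
counters_per_flow = width * depth          # 55 counters per row times 3 rows
bits_per_flow = counters_per_flow * 19     # 19-bit counters, as stated in the text
colate_bits_per_packet = Fraction(1, 10)   # 0.1 bit per packet, as stated in the text

# Flow length at which a per-flow CM-sketch costs the same as COLATE.
break_even_packets = bits_per_flow / colate_bits_per_packet
```

Under these assumptions the depth equals the number of hash functions per packet, and the break-even flow length works out to the 31350 packets quoted in the text.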

5.7

Conclusion

The key contribution of this chapter is in proposing an accurate and efficient per-flow latency
measurement scheme without packet probing and time stamping. The key novelty of this
work is that we purposely allow noise to be introduced in recording packet timing information for minimizing storage space and use statistical techniques to denoise the recorded
information to obtain accurate latency estimates when latency of a target flow is queried.
The key technical depth of this chapter is in the mathematical development of the estimation theory that our scheme is based upon. Our theoretical analysis and experimental
results show that our scheme always achieves the required reliability. Our scheme has a
much smaller processing overhead in terms of number of hash computations and memory
updates compared to existing schemes, which further require sending probe packets or attaching time stamps to every packet. Our scheme is scalable in that the amount of memory
required at each observation point is only dependent on the number of packets and not on
the number of sending and receiving observation points. The memory requirement is so low
that a commodity storage device can store time stamps of several years’ worth of flows.

6

User Security

6.1

Introduction

6.1.1

Motivation

Touch screens have revolutionized and dominated the user input technologies for mobile
computing devices (such as smart phones and tablets) because of high flexibility and good
usability. Mobile devices equipped with touch screens have become prevalent in our lives
with increasingly rich functionalities, enhanced computing power, and more storage capacity.
Many applications (such as email and banking) that we used to run on desktop computers
are now also being widely run on such devices. These devices therefore often contain privacy
sensitive information such as personal photos, email, credit card numbers, passwords, corporate data, and even business secrets. Losing a smart phone with such private information
could be a nightmare for the owner. Numerous cases of celebrities losing their phones with
private photos and secret information have been reported in the news [2]. Recently, security
firm Symantec conducted a real-life experiment in five major cities in North America by
leaving 50 smart phones in streets without any password/PIN protection [15]. The results
showed that 96% of finders accessed the phone with 86% of them going through personal information, 83% reading corporate information, 60% accessing social networking and personal
emails, 50% running remote admin, and 43% accessing online bank accounts.
Safeguarding the private information on such mobile devices with touch screens therefore
becomes crucial. The widely adopted solution is that a device locks itself after a few minutes
of inactivity and prompts a password/PIN/pattern screen when reactivated. For example,
iPhones use a 4-digit PIN and Android phones use a geometric pattern on a grid of points,
where both the PIN and the pattern are secrets that users should configure on their phones.
These password/PIN/pattern based unlocking schemes have three major weaknesses. First,
they are susceptible to shoulder surfing attacks. Mobile devices are often used in public settings (such as subway stations, schools, and cafeterias) where shoulder surfing often happens
either purposely or inadvertently, and passwords/PINs/patterns are easy to spy on [117, 96].
Second, they are susceptible to smudge attacks, where imposters extract sensitive information from recent user input by using the smudges left by fingers on touch screens. Recent
studies have shown that finger smudges (i.e., oily residues) of a legitimate user left on touch
screens can be used to infer password/PIN/pattern [30]. Third, passwords/PINs/patterns
are inconvenient for users to input frequently, so many people disable them leaving their
devices vulnerable.

6.1.2

Proposed Approach

In this chapter, we propose GEAT, a gesture based authentication scheme for the secure
unlocking of touch screen devices. A gesture is a brief interaction of a user’s fingers with the
touch screen such as swiping or pinching with fingers. Figure 6.1 shows two simple gestures
on smart phones. Rather than authenticating users based on what they input (such as
a password/PIN/pattern), which is inherently subject to shoulder surfing and smudge
attacks, GEAT authenticates users mainly based on how they input. Specifically, GEAT
first asks a user to perform a gesture on the touch screen about 15 to 25 times to obtain
training samples, then extracts and selects behavior features from those sample gestures, and
finally builds models that can classify each gesture input as legitimate or illegitimate using
machine learning techniques. The key insight behind GEAT is that people have consistent
and distinguishing behavior of performing gestures on touch screens. We implemented GEAT
on Samsung Focus, a Windows based phone, as seen in Figure 6.1 and evaluated it using
15009 gesture samples that we collected from 50 volunteers. Experimental results show that
GEAT achieves an average Equal Error Rate (EER) of 0.5% with 3 gestures using only 25
training samples.

Figure 6.1 GEAT implemented on Windows Phone 7

Compared to current secure unlocking schemes for touch screen devices, GEAT is significantly more difficult to compromise because it is nearly impossible for an imposter to
reproduce the behavior of others doing gestures through shoulder surfing or smudge attacks.
Unlike password/PIN/pattern based authentication schemes, GEAT allows users to securely
unlock their touch screen devices even when imposters are spying on them. GEAT actually
displays the gesture that the user needs to perform on the screen for unlocking. Compared
with biometrics (such as fingerprint, face, iris, hand, and ear) based authentication schemes,
GEAT has two key advantages on touch screen devices. First, GEAT is secure against
smudge attacks whereas some biometrics, such as fingerprint, are subject to such attacks as
they can be copied. Second, GEAT does not require additional hardware for touch screen
devices whereas biometrics based authentication schemes often require special hardware such
as a fingerprint reader.
For practical deployment, we propose to use password/PIN/pattern based authentication schemes to help GEAT to obtain the training samples from a user. In the first few
days of using a device with GEAT enabled, in each unlocking, the device first prompts the
user to do a gesture and then prompts with the password/PIN/pattern login screen. If the
user successfully logs in based on his password/PIN/pattern input, then the information
that GEAT recorded while the user performed the gesture is stored as a training sample;
otherwise, that gesture is discarded. Of course, if the user prefers not to set up a password/PIN/pattern, then the password/PIN/pattern login screen will not be prompted and
the gesture input will be automatically stored as a training sample. During these few days
of training data gathering, users should specially guard their password/PIN/pattern input
from shoulder surfing and smudge attacks. In reality, even if an imposter compromises the
device by shoulder surfing or smudge attacks on the password/PIN/pattern input, the private information stored on the device during the initial few days of using a new device is
typically minimal. Plus, the user can easily shorten this training period to be less than a
day by unlocking his device more frequently. We only need to obtain about 15 to 25 training
samples for each gesture. After the training phase, the password/PIN/pattern based unlocking scheme is automatically disabled and GEAT is automatically enabled. It is possible
that a user’s behavior of doing the gesture evolves over time. Such evolution can be handled
by adapting the scheme proposed by Monrose et al. [81].

6.1.3

Technical Challenges and Solutions

The first challenge is to choose features that can model how a gesture is performed. In this
work, we extract the following seven types of features: velocity magnitude, device acceleration, stroke time, inter-stroke time, stroke displacement magnitude, stroke displacement
direction, and velocity direction. The first five feature types capture the dynamics of performing gestures while the remaining two capture the static shapes of gestures. (1) Velocity
Magnitude: the speed of finger motion at different time instants. (2) Device Acceleration:
the acceleration of touch screen device movement along the three perpendicular axes of the
device. (3) Stroke Time: the time duration that the user takes to complete each stroke. (4)
Inter-stroke Time: the time duration between the starting times of two consecutive strokes
for multi-finger gestures. (5) Stroke Displacement Magnitude: the Euclidean distance between the centers of the bounding boxes of two strokes for multi-finger gestures, where the
bounding box of a stroke is the smallest rectangle that completely contains that stroke. (6)
Stroke Displacement Direction: the direction of the line connecting the centers of the bounding boxes of two strokes for multi-finger gestures. (7) Velocity Direction: the direction of
finger motion at different time instants.
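To make the geometric features above concrete, the following minimal sketch computes the bounding-box center of a stroke, the stroke displacement magnitude and direction between two strokes, and the velocity magnitude and direction between consecutive touch points. The function names, the sample coordinates, and the use of `atan2` for direction (which returns angles in (−π, π]) are assumptions for illustration, not the GEAT implementation; the 18ms spacing between touch points is the value reported in Section 6.3.1.

```python
import math


def bounding_box_center(stroke):
    """Center of the smallest axis-aligned rectangle containing all points of a stroke."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return ((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0)


def stroke_displacement(stroke_a, stroke_b):
    """Magnitude and direction (radians) of the line joining two bounding-box centers."""
    (ax, ay), (bx, by) = bounding_box_center(stroke_a), bounding_box_center(stroke_b)
    dx, dy = bx - ax, by - ay
    return math.hypot(dx, dy), math.atan2(dy, dx)


def velocities(stroke, dt):
    """Velocity magnitude and direction between consecutive touch points dt seconds apart."""
    result = []
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        dx, dy = x1 - x0, y1 - y0
        result.append((math.hypot(dx, dy) / dt, math.atan2(dy, dx)))
    return result


# Two hypothetical strokes of a two-finger gesture, as (x, y) pixel coordinates.
stroke_1 = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
stroke_2 = [(30.0, 40.0), (33.0, 44.0), (36.0, 48.0)]
magnitude, direction = stroke_displacement(stroke_1, stroke_2)
speeds = velocities(stroke_1, dt=0.018)
```

Here the two bounding-box centers are (3, 4) and (33, 44), so the displacement magnitude is 50 pixels.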
The second challenge is to segment each stroke into sub-strokes for a user so that the
user has consistent and distinguishing behavior for the sub-strokes. It is challenging to
determine the number of sub-strokes that a stroke should be segmented into, the starting
point of each sub-stroke, and the time duration of each sub-stroke. On one hand, if the
time duration of a sub-stroke is too short, then the user may not have consistent behavior
for that sub-stroke when performing each gesture. On the other hand, if the time duration
of a sub-stroke is too long, then the distinctive information from the features is averaged
out too much to be useful for authentication. The time duration of different sub-strokes
should not be all equal because at different locations of a gesture a user may have consistent
behaviors that last different amounts of time. In this work, we propose an algorithm that
automatically segments each stroke into sub-strokes of appropriate time duration where for
each sub-stroke the user has consistent and distinguishing behavior. We use coefficient of
variation to quantify consistency.
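As a small illustration of the consistency measure, the coefficient of variation is the ratio of the standard deviation to the mean, so a lower value indicates more consistent behavior. The sketch below uses the population standard deviation, which is our assumption (the text does not say whether the population or sample form is used), and the sample values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Ratio of the standard deviation to the mean of a list of positive feature values."""
    return statistics.pstdev(values) / statistics.mean(values)


# Hypothetical speeds (pixels/sec) of one sub-stroke across five training samples.
consistent = [1000.0, 1020.0, 980.0, 1010.0, 990.0]
inconsistent = [400.0, 1500.0, 900.0, 2000.0, 200.0]
cv_consistent = coefficient_of_variation(consistent)
cv_inconsistent = coefficient_of_variation(inconsistent)
```

Both lists have the same mean of 1000, but the first yields a coefficient of variation of about 0.014 while the second is well above 0.5, so only the first would count as consistent under a small threshold.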
The third challenge is to learn multiple behaviors from the training samples of a gesture
because people exhibit different behaviors when they perform the same gesture in different
postures such as sitting and lying down. In this work, we distinguish the training samples
that a user made under different postures by making the least number of minimum variance
partitions, where the coefficient of variation for each partition is below a threshold, so that
each partition represents a distinct behavior.
The fourth challenge is to remove the high frequency noise in the time series of coordinate
values of touch points. This noise is introduced due to the limited touch resolution of
capacitive touch screens. In this work, we pass each time series of coordinate values through
a low pass filter to remove high frequency noise.
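The text does not specify which low pass filter is used at this point. As one simple possibility, the sketch below applies a centered moving average, which attenuates rapid sample-to-sample jitter while preserving the slower finger motion. The function name, window size, and coordinates are hypothetical.

```python
def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical x-coordinates of touch points with alternating jitter.
x_coords = [10.0, 12.0, 10.0, 12.0, 10.0]
x_smooth = moving_average(x_coords)
```

The smoothed series has the same length as the input but a much smaller peak-to-peak swing, which is the effect a low pass filter is meant to achieve on the coordinate time series.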

The fifth challenge is to design effective gestures. Not all gestures are equally effective
for authentication purposes. In our study, we designed 39 simple gestures that are easy
to perform and collected data from our volunteers for these gestures. After comprehensive
evaluation and comparison, we finally chose the 10 most effective gestures shown in Figure 6.2.
The number of unconnected arrows in each gesture represents the number of fingers a user
should use to perform the gesture. Accordingly we can categorize gestures into single-finger
gestures and multi-finger gestures.

Figure 6.2 The 10 gestures that GEAT uses (gestures are numbered 1 to 10)

The sixth challenge is to identify gestures for a given user that result in low false positive
and false negative rates. In our scheme, we first ask a user to provide training samples for as
many gestures from our 10 gestures as possible. For each gesture, we develop models of user
behaviors. We then perform elastic deformations on the training gestures so that they stop
representing the legitimate user’s behavior. We classify these deformed samples and calculate
the EER for a given user for each gesture and rank the gestures based on their EERs. Then
we use the top n gestures for authentication using majority voting, where n is selected by
the user. Although the larger n is, the higher the accuracy of GEAT, for practical purposes such as
unlocking smart phone screens, n = 1 (or 3 at most) gives high enough accuracy.
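The ranking and voting steps can be sketched as follows. The helper names and EER values are hypothetical, and requiring a strict majority (so that a tie with an even n rejects the user) is our assumption; the text only states that majority voting is used.

```python
def top_n_gestures(eer_by_gesture, n):
    """Return the n gesture ids with the lowest EER (a lower EER is better)."""
    return sorted(eer_by_gesture, key=eer_by_gesture.get)[:n]


def majority_vote(decisions):
    """Accept the user if more than half of the per-gesture decisions are 'legitimate'."""
    return sum(decisions) > len(decisions) / 2


# Hypothetical per-gesture EERs for one user (illustrative values only).
eers = {1: 0.04, 4: 0.01, 6: 0.03, 7: 0.02, 10: 0.05}
chosen = top_n_gestures(eers, n=3)
accepted = majority_vote([True, True, False])
rejected = majority_vote([False, True, False])
```

With these values the three best-ranked gestures are 4, 7, and 6, and a user who passes two of the three gesture classifiers is accepted.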

6.1.4

Threat Model

During the training phase of a GEAT enabled touch screen device, we assume imposters
cannot have physical access to it. After the training phase, we assume imposters have the
following three capabilities. First, imposters have physical access to the device. Such physical
access can be gained in ways such as thieves stealing a device, finders finding a lost device,
and roommates temporarily holding a device when the owner is taking a shower. Second,
imposters can launch shoulder surfing attacks by spying on the owner when he performs gestures.
Third, imposters have the necessary equipment and technologies to launch smudge attacks.

6.1.5

Key Contributions

In this chapter, we make the following five key contributions.
1. We proposed, implemented, and evaluated a gesture based authentication scheme for
the secure unlocking of touch screen devices.
2. We identified a set of effective features that capture the behavioral information of
performing gestures on touch screens.
3. We proposed an algorithm that automatically segments each stroke into sub-strokes of
different time durations where for each sub-stroke the user has consistent and distinguishing behavior.
4. We proposed an algorithm to extract multiple behaviors from the training samples of
a given gesture.
5. We collected a comprehensive data set containing 15009 training samples from 50 users
and evaluated the performance of GEAT on this data set.

6.2

Related Work

6.2.1

Gesture Based Authentication on Phones

In parallel with our work, Luca et al. proposed to use the timing of drawing the password
pattern on Android based touch screen phones for authentication [75]. Their work has
the following two major technical limitations compared to our work. First, unlike ours, their
scheme has low accuracy. They feed the time series of raw coordinates of the touch points
of a gesture to the dynamic time warping signal processing algorithm. They do not extract
any behavioral features from the user’s gestures. Their scheme achieves an accuracy of 55%; in
comparison, ours achieves an accuracy of 99.5%. Second, unlike ours, their scheme cannot handle
the multiple behaviors of doing the same gesture for the same user.
Also in parallel with our work, Sae-Bae et al. proposed to use the timing of performing five-finger gestures on multi-touch capable devices for authentication [95]. Their work
has the following four major technical limitations compared to our work. First, their scheme
requires users to use all five fingers of a hand to perform the gestures, which is very inconvenient on small touch screens of smart phones. Second, they also feed the time series of raw
coordinates of the touch points to the dynamic time warping signal processing algorithm
and do not extract any behavioral features from the user’s gestures. Third, they cannot handle
the multiple behaviors of doing the same gesture for the same user. Fourth, they have not
evaluated their scheme in real world attack scenarios such as resilience to shoulder surfing.

6.2.2

Phone Usage Based Authentication

Another type of authentication scheme leverages the behavior in using several features on
the smart phones such as making calls, sending text messages, and using the camera [112, 42].
Such schemes were primarily developed for continuously monitoring smart phone users for
their authenticity. These schemes take a significant amount of time (often more than a day)
to determine the legitimacy of the user and are not suitable for instantaneous authentication,
which is the focus of this chapter.

6.2.3

Keystrokes Based Authentication

Some work has been done to authenticate users based on their typing behavior [126, 81].
Such schemes have mostly been proposed for devices with physical keyboards and have low
accuracy [60]. It is inherently difficult to model typing behavior on touch screens because
most people use the same finger(s) for typing all keys on the keyboard displayed on a screen.
Zheng et al. [130] reported the only work in this direction in a technical report, where they
did a preliminary study to check the feasibility of using tapping behavior for authentication.

6.2.4

Gait Based Authentication

Some schemes have been proposed that utilize the accelerometer in smart phones to authenticate
users based upon their gaits [78, 52, 65]. Such schemes have low true positive rates because
gaits of people are different on different types of surfaces such as grass, road, snow, wet
surface, and slippery surface.

6.3

Data Collection and Analysis

In this section, we first describe our data collection process for gesture samples from our
volunteers. Second, we extract the seven types of features from our data and validate our
hypothesis that people have consistent and distinguishing behaviors of performing gestures
on touch screens.

6.3.1

Data Collection

We developed a gesture collection program on Samsung Focus, a Windows based phone.
During the process of a user performing a gesture, our program records the coordinates of
each touch point and the accelerometer values and time stamps associated with each touch
point. The duration between consecutive touch points provided by the Windows API on
our device is 18ms. To track movement of multiple fingers, our program ascribes each touch
point to its corresponding finger.
We recruited 50 volunteers with ages ranging from 19 to 55 and jobs ranging from students
and faculty to corporate employees. We gave a phone to each volunteer for a period of 7 to
10 days and asked them to perform gestures over this period. Our data collection process
consisted of two phases. In the first phase, we chose 20 of the volunteers to collect data for
the 39 gestures that we designed and each volunteer performed each gesture at least 30
times. We conducted experiments to evaluate the classification accuracy of each gesture. An
interesting finding is that different gestures have different average classification accuracies.
We finally chose the 10 gestures that have the highest average classification accuracies and
discarded the remaining 29 gestures. These 10 gestures are shown in Figure 6.2. In the
second phase, we collected data on these 10 gestures from the remaining 30 volunteers,
where each volunteer performed each gesture at least 30 times. Finally, we obtained a
total of 15009 samples for these 10 gestures. The whole data collection took about 5 months.

6.3.2

Data Analysis

We extract the following seven types of features from each gesture sample: velocity magnitude, device acceleration, stroke time, inter-stroke time, stroke displacement magnitude,
stroke displacement direction, and velocity direction.
• Velocity and Acceleration Magnitude: From our data set, we observe that people
have consistent and distinguishing patterns of velocity magnitudes and device accelerations
along its three perpendicular axes while doing gestures. For example, Figure 6.3(a) shows
the time series of velocity magnitudes of two samples of gesture 4 in Figure 6.2 performed
by a volunteer. Figure 6.3(b) shows the same for another volunteer. Similarly Figures 6.4(a)
and 6.4(b) show the time series of acceleration along the x-axis in two samples of gesture 4
by two volunteers. We observe that the samples from the same user are similar and at the same
time different from samples from another user.

Figure 6.3 Velocity magnitudes of gesture 4. Panels: (a) Volunteer 1; (b) Volunteer 2. x-axis: normalized time; y-axis: velocity magnitude (pixels/sec); curves: Sample 1, Sample 2.

Figure 6.4 Device acceleration of gesture 4. Panels: (a) Volunteer 1; (b) Volunteer 2. x-axis: normalized time; y-axis: acceleration; curves: Sample 1, Sample 2.

To quantify the similarity between any two time series, $f_1$ with $m_1$ values and $f_2$ with
$m_2$ values, where $m_1 \le m_2$, we calculate the root mean squared (RMS) value of the time
series obtained by subtracting the normalized values of $f_1$ from the normalized values of $f_2$.
Normalized time series $\hat{f}_i$ of a time series $f_i$ is calculated as below, where $f_i[q]$ is the $q$-th
value in $f_i$.

$$\hat{f}_i[q] = \frac{f_i[q] - \min(f_i)}{\max(f_i) - \min(f_i)}, \quad \forall q \in [1, m_i] \tag{6.1}$$

Normalizing the time series brings all its values in the range of [0, 1]. We do not use metrics
such as correlation to measure similarity between two time series because their values are
not bounded.

To subtract one time series from the other, the number of elements in the two needs to
be equal; however, this often does not hold. Thus, before subtracting, we re-sample $f_2$ at a
sampling rate of $m_1/m_2$ to make $f_2$ and $f_1$ equal in number of elements. The RMS value of
a time series $f$ containing $N$ elements, represented by $P_f$, is calculated as:

$$P_f = \sqrt{\frac{1}{N} \sum_{m=1}^{N} f^2[m]} \tag{6.2}$$

Normalizing the two time series before subtracting them to obtain f ensures that each value
in f lies in the range of [−1, 1] and consequently the RMS value lies in the range of [0, 1]. An
RMS value closer to 0 implies that the two time series are highly alike while an RMS value
closer to 1 implies that the two time series are very different. For example, the RMS value
between the two time series from the volunteer in Figure 6.3(a) is 0.119 and that between the
two time series of the volunteer in Figure 6.3(b) is 0.087, whereas the RMS value between
a time series in Figure 6.3(a) and another in Figure 6.3(b) is 0.347. Similarly, the RMS
values between the two time series of each volunteer in Figures 6.4(a) and 6.4(b) are 0.159
and 0.144, respectively, whereas the RMS value between one time series in Figure 6.4(a) and
another in Figure 6.4(b) is 0.362.
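The similarity computation described above can be sketched as follows: min-max normalize both series as in Equation (6.1), re-sample the longer one to the length of the shorter one, subtract, and take the RMS as in Equation (6.2). The function names are hypothetical. The use of linear interpolation for re-sampling and the choice to normalize before re-sampling are our assumptions, since the text does not fix either detail. The sketch also assumes each series has at least two elements and is not constant.

```python
import math


def normalize(series):
    """Min-max normalization of Equation (6.1): maps values into [0, 1]."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]


def resample(series, length):
    """Linearly interpolate a series onto `length` evenly spaced points (assumed method)."""
    if length == 1:
        return [series[0]]
    step = (len(series) - 1) / (length - 1)
    out = []
    for k in range(length):
        pos = k * step
        i = min(int(pos), len(series) - 2)
        frac = pos - i
        out.append(series[i] * (1 - frac) + series[i + 1] * frac)
    return out


def rms_difference(f1, f2):
    """RMS of the difference between two normalized series, as in Equation (6.2)."""
    if len(f1) > len(f2):
        f1, f2 = f2, f1  # ensure f1 is the shorter series
    a, b = normalize(f1), resample(normalize(f2), len(f1))
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(a, b)) / len(a))


same_shape = rms_difference([0.0, 1.0, 2.0], [0.0, 5.0, 10.0, 15.0, 20.0])
opposite_shape = rms_difference([0.0, 1.0, 2.0], [2.0, 1.0, 0.0])
```

Two series with the same shape but different scales and lengths give an RMS value of 0, while a rising and a falling ramp give a value close to 1, matching the interpretation of the RMS value given in the text.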
• Stroke Time, Inter-stroke Time, and Stroke Displacement Magnitude: From
our data set, we observe that people take consistent and distinguishing amounts of time to
complete each stroke in a gesture. For multi-finger gestures, people have consistent and
distinguishing time durations between the starting times of two consecutive strokes in a
gesture and have consistent and distinguishing magnitudes of displacement between the
centers of any two strokes. The distributions of stroke times of different users are centered at
different means and the overlap is usually small, which becomes insignificant when the feature
is used with other features. The same is the case for inter-stroke times and stroke displacement
magnitudes. Figures 6.5, 6.6, and 6.7 plot the distribution of stroke time of gesture 4, inter-stroke time of gesture 6, and stroke displacement magnitude of gesture 7, respectively, for
different volunteers. The figures show that the distributions for different users overlap only
slightly and are centered at different means.

Figure 6.5 Distributions of stroke time. x-axis: time (sec); y-axis: relative frequency; volunteers: V1, V2.

Figure 6.6 Dists. of inter-stroke time. x-axis: time (sec); y-axis: relative frequency; volunteers: V1, V2.

Figure 6.7 Distributions of disp. mag. x-axis: distance; y-axis: relative frequency; volunteers: V1, V2.

Figure 6.8 Distributions of disp. dir. x-axis: phase (rads); y-axis: relative frequency; volunteers: V1, V2, V3.

• Stroke Displacement and Velocity Directions: From our data set, we observe that
people have consistent, but not always distinguishing, patterns of velocity and stroke displacement directions because different people may produce gestures of similar shapes. For
example, Figure 6.8 plots the distributions of the displacement direction of gesture 1 for
three volunteers. Figure 6.9 shows the time series of velocity directions of gesture 10 for
three volunteers. Volunteers V1 and V2 produced similar shapes of gesture 1 as well as
gesture 10, so they have overlapping distributions and time series. Volunteer V3 produced
shapes of the two gestures different from the corresponding shapes produced by volunteers
V1 and V2, and thus has a non-overlapping distribution and time series.

Figure 6.9 Velocity direction of gesture 10. x-axis: normalized time; y-axis: velocity direction (0 to 2π); volunteers: V1, V2, V3.

6.4

GEAT Overview

To authenticate a user based on his behavior of performing a gesture, GEAT needs to have
a model of the legitimate user’s behaviors of performing that gesture. Given the training
samples of the gesture performed by the legitimate user, GEAT builds this model using
Support Vector Distribution Estimation (SVDE) in the following five steps.
The first step is noise removal, where GEAT passes the time series of touch point coordinates in each gesture sample through a filter to remove high frequency noise.
The second step is feature extraction, where GEAT extracts the values of the seven types
of features from the gesture samples and concatenates these values to form a feature vector.
To extract feature values of velocity magnitude, velocity direction, and device accelerations,
GEAT segments each stroke in a gesture sample into sub-strokes at multiple time resolutions
and extracts values from these sub-strokes. We call these three types of features sub-stroke
based features. For the remaining four types of features, GEAT extracts values from the
entire strokes in each gesture. We call these four types of features stroke based features.
The third step is feature selection. For each feature element, GEAT first partitions all
its N values, where N is the total number of training samples, into the least number of
minimum variance partitions, where the coefficient of variation for each partition is below a
threshold. If the number of minimum variance partitions is less than or equal to the number
of postures in which the legitimate user provided the training samples, then we select this
feature element; otherwise, we discard it. For this purpose, ideally the user should inform
GEAT of the number of postures in which he performed the training gestures. However, if the
user does not provide this information, the classification accuracy of GEAT decreases, but
only very slightly, as shown in our experimental results in Section 6.9.
The fourth step is classifier training. GEAT first partitions all N feature vectors into
the minimum number of groups so that within each group, all feature vectors belong to the
same minimum variance partition for any feature element. We call each group a consistent
training group. Then, for each group of feature vectors, GEAT builds a model in the form
of an ensemble of SVDE classifiers trained using these vectors. Note that we do not use
any gestures from imposters in training GEAT because in the real-world deployment of
authentication systems, training samples are typically available only from the legitimate
user.
The fifth step is gesture ranking. For each gesture, GEAT repeats the above four steps
and then ranks the gestures based on their EERs. A user can pick 1 ≤ n ≤ 10 gestures to
be used in each user authentication. Although a larger n yields higher accuracy, for
practical purposes such as unlocking smart phone screens, n = 1 (or 3 at most)
gives high enough accuracy. To calculate the EER of a gesture, GEAT needs the true
positive rates (TPR) and false positive rates (FPR) for that gesture. TPRs for each gesture
are calculated using 10-fold cross validation on the legitimate user’s samples of the gesture.
To calculate FPRs, GEAT needs imposter samples, which are not available in real world
deployment at the time of training. Therefore, GEAT generates synthetic imposter samples
by elastically deforming the samples of the legitimate user using cubic B-splines and calculates
the FPRs using these synthetic imposter samples. Note that the synthetic imposter samples
are used only in ranking gestures; the performance evaluation of GEAT that we present
in Section 6.9 is done entirely on real world imposter samples. These synthetic imposter
samples are not used in classifier training either.
When a user tries to log in on a touch screen device with GEAT enabled, the device displays
the n top ranked gestures for the user to perform. The authentication process behind the
scenes then works as follows. First, for each gesture, GEAT extracts the values of all the feature
elements selected earlier by the corresponding classification model for this gesture. Second,
GEAT feeds the feature vector consisting of these values to the ensemble of SVDE classifiers of
each consistent training group and gets a classification decision. If the classification decision
of any ensemble is positive, which means that the gesture has almost the same behavior
as one of the consistent training groups that we identified from the training samples of the
legitimate user, then GEAT accepts that gesture input to be legitimate. Third, after GEAT
makes the decision for each of the n gestures, GEAT makes the final decision on whether to
accept the user as legitimate based on the majority voting on the n decisions.

6.5 Noise Removal

The time series of x and y coordinates of the touch points of each stroke contain high
frequency noise as we can see from the time series of x coordinates for a sample gesture in
Figure 6.10(a). There are two major contributors to this noise. First, the touch resolution
of capacitive touch screens is limited. Second, because capacitive touch screens determine
the coordinates of each touch point by calculating the coordinates of the centroid of the area
on the screen touched by a finger, when a finger moves on the screen, its contact area varies
and the centroid changes at each time instant, resulting in high frequency noise. Such noise
should be removed because it affects velocity magnitude and direction values.
We remove such high frequency noise by passing the time series of x and y coordinates of
touch points through a low pass filter. We consider frequencies above 20Hz as high frequencies
because the time series of touch points contain most of their energy in frequencies lower than
20Hz, as we can see from the magnitude of the Fourier transform of this time series in Figure
6.10(b). In this work, we use a simple moving average (SMA) filter, which is the unweighted
mean of previous α data points. We choose the value of α to be 10. Figure 6.10(c) shows
the time series of Figure 6.10(a) after passing through the SMA filter. We can see that the
filtered time series is much smoother compared to the unfiltered time series. Figure 6.10(d)
shows the magnitude of the Fourier transform of the filtered time series. We observe from this
figure that the magnitudes of frequency components above 20Hz are negligible.

Figure 6.10 Unfiltered and filtered time series: (a) Unfiltered and (c) Filtered (x-coordinate vs. time); (b) FFT of unfiltered and (d) FFT of filtered (FFT magnitude vs. frequency)
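
The SMA filter described above can be illustrated with a minimal Python sketch. It is not the GEAT implementation: the function name is hypothetical, the window is assumed to contain the current point and the α − 1 points before it, and the first few outputs are assumed to average over however many points are available so far.

```python
def simple_moving_average(series, alpha=10):
    """Unweighted mean over a sliding window of at most `alpha` points.

    Assumption: the window holds the current point and the alpha - 1 points
    before it; early outputs average over the points seen so far.
    """
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(series):
        running_sum += value
        if i >= alpha:
            # Drop the point that just left the window.
            running_sum -= series[i - alpha]
        smoothed.append(running_sum / min(i + 1, alpha))
    return smoothed
```

The same function would be applied separately to the x and the y coordinate series of each stroke.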

6.6 Feature Extraction & Selection
In this section, we describe the feature extraction and selection process in GEAT. We categorize the seven types of features into stroke based features, which include stroke time,
inter-stroke time, stroke displacement magnitude, and stroke displacement direction, and
sub-stroke based features, which include velocity magnitude, velocity direction, and device
acceleration.

6.6.1 Stroke Based Features

6.6.1.1 Extraction

To extract the stroke time of each stroke, we calculate the time duration between the time of
the first touch point and that of the last touch point of the stroke. To extract the inter-stroke
time between two consecutive strokes in a gesture, we calculate the time duration between
the time of the first touch point of the first stroke and that of the second stroke. To extract
the stroke displacement magnitude between any two strokes in a gesture, we calculate the
Euclidean distance between the centers of the two bounding boxes of the two strokes. To
extract stroke displacement direction between any two strokes in a gesture, we calculate
the arc-tangent of the ratio of the magnitudes of the vertical component and the horizontal
component of the stroke displacement vector directed from the center of one bounding box to
the center of the other bounding box. We calculate inter-stroke time and stroke displacement
magnitude and direction from all pairs of strokes in a gesture.
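
As an illustration of the four stroke based features, the following Python sketch computes them for strokes given as lists of (time, x, y) touch points. The data layout and function names are assumptions made for this example. The direction is computed with `atan2` on the absolute components so that a zero horizontal component does not cause a division by zero.

```python
import math

def stroke_time(stroke):
    """Duration between the first and the last touch point of a stroke."""
    return stroke[-1][0] - stroke[0][0]

def inter_stroke_time(first, second):
    """Time between the first touch points of two strokes."""
    return second[0][0] - first[0][0]

def bounding_box_center(stroke):
    xs = [x for _, x, _ in stroke]
    ys = [y for _, _, y in stroke]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)

def stroke_displacement(first, second):
    """Magnitude and direction of the vector between bounding box centers."""
    (x1, y1), (x2, y2) = bounding_box_center(first), bounding_box_center(second)
    dx, dy = x2 - x1, y2 - y1
    magnitude = math.hypot(dx, dy)
    # Arc-tangent of |vertical| / |horizontal|, written with atan2 for safety.
    direction = math.atan2(abs(dy), abs(dx))
    return magnitude, direction
```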

6.6.1.2 Selection

Given N training samples, for each feature element, we first partition all its N values into the
least number of minimum variance partitions (MVPs) where the coefficient of variation (cv)
for each partition is below a threshold. Let $P_k$ and $Q_k$ represent two different partitionings
of N values, each containing k partitions. Let $\sigma_i^2(P_k)$ and $\sigma_i^2(Q_k)$ represent the variance of
values in partition i (1 ≤ i ≤ k) of partitioning $P_k$ and $Q_k$, respectively. Partitioning $P_k$
is the MVP if for any $Q_k$, $\max_i \sigma_i^2(P_k) \le \max_i \sigma_i^2(Q_k)$. We empirically determined the
threshold of the cv to be 0.1. The detailed empirical evaluation of this threshold is given in
Section 6.9.
To find the least number of MVPs, we start with one partition and increase the number of
partitions until the cv of every partition is below the threshold. To obtain MVPs, we use agglomerative hierarchical clustering with Ward’s method [58]. Ward’s method allows us to make any number
of partitions by cutting the dendrogram built by agglomerative hierarchical clustering at an
appropriate level. Figure 6.11 shows dendrograms made through hierarchical clustering with
Ward’s method from the values of stroke time of two volunteers for gesture 5. A dendrogram
visually illustrates the presence and arrangement of clusters in data. The dendrogram in
Figure 6.11(a) is for a volunteer who performed gestures in two postures, sitting and lying
down. The dendrogram in Figure 6.11(b) is for a volunteer who performed gestures in one
posture. We make two MVPs for Figure 6.11(a) and one for Figure 6.11(b).

Figure 6.11 Dendrograms for feature values with one and two behaviors: (a) two behaviors; (b) one behavior

After we find the least number of MVPs, where the cv for each partition is below the
threshold, we decide whether to select this feature element. If the number of partitions in
these MVPs is less than or equal to the number of postures in which the training samples
are performed, then we select this feature element; otherwise, we discard it. We ask the user
to enter the number of postures in which he performed training samples. If the user does
not provide this input, we assume the number of postures to be 1.
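
The selection procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the GEAT code: the function names are hypothetical, Ward's method is written out directly for one-dimensional feature values (merging the pair of clusters whose union increases the within-cluster sum of squares the least), and the cv is computed with the population standard deviation.

```python
from statistics import mean, pstdev

def ward_partitions(values, k):
    """Merge clusters agglomeratively with Ward's criterion until k remain."""
    clusters = [[v] for v in values]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(a) * len(b) / (len(a) + len(b)) * (mean(a) - mean(b)) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def coefficient_of_variation(xs):
    sd = pstdev(xs)
    if sd == 0:
        return 0.0
    m = mean(xs)
    return sd / abs(m) if m else float("inf")

def least_mvps(values, threshold=0.1):
    """Smallest number of partitions whose cv values are all below threshold."""
    for k in range(1, len(values) + 1):
        parts = ward_partitions(values, k)
        if all(coefficient_of_variation(p) < threshold for p in parts):
            return parts
    return [[v] for v in values]

def select_feature_element(values, postures=1, threshold=0.1):
    """Keep the feature element if it needs no more MVPs than postures."""
    return len(least_mvps(values, threshold)) <= postures
```

For example, stroke times clustered around two distinct values need two MVPs, so the feature element is kept only when the user reports at least two postures.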

6.6.2 Sub-stroke Based Features

Sub-stroke based features include velocity magnitude, velocity direction, and device acceleration. To extract values for these features, GEAT needs to segment each stroke into
sub-strokes for two major reasons. First, at different segments of a stroke, the finger
often has different moving speed and direction. Second, at different segments of a stroke,
the device often has different acceleration. If we measure the feature values from the entire
stroke, we will only utilize the information measured at the starting and ending points of the
stroke, and will thus miss the distinguishing information of velocity magnitude, velocity
direction, and device acceleration at different segments of the stroke.
Our goal is to segment a stroke into sub-strokes so that the velocity magnitude, velocity
direction, and device acceleration information measured at each sub-stroke characterizes the
distinguishing behaviors of the user who made the stroke. There are three key technical
challenges to this goal. The first technical challenge is how we should segment N stroke
samples of different time durations assuming that we are given an appropriate time duration
as the segmentation guideline. The second technical challenge is how to find the appropriate
time duration as the segmentation guideline. The third technical challenge is how to select
sub-strokes whose velocity magnitude, velocity direction, and device acceleration information
will be included in the feature vector used by GEAT for training. Next, we present our
solutions to these three technical challenges.

6.6.2.1 Stroke Segmentation and Feature Extraction

Given N strokes performed by one user and the appropriate time duration p as the segmentation guideline, we need to segment each stroke into the same number of segments so that
for each stroke we obtain the same number of feature elements. However, because different strokes have different time durations, segmenting each stroke into sub-strokes of time
duration p will not give us the same number of segments for different strokes. To address
this issue, we first calculate $\lceil t/p \rceil$ for each stroke where t is the time duration of the stroke.
From the resulting N values, we use the most frequent value, denoted s, to be the number
of sub-strokes that each stroke should be segmented into. Finally, we segment each stroke
into s sub-strokes where each sub-stroke within a stroke has the same time duration.
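
A short Python sketch of this step, assuming stroke durations and p are given in milliseconds. The function name is hypothetical, and ties between equally frequent values are broken in favor of the value seen first, which the text does not specify.

```python
import math
from collections import Counter

def number_of_substrokes(durations_ms, p_ms):
    """Most frequent value of ceil(t / p) over the N stroke durations."""
    counts = Counter(math.ceil(t / p_ms) for t in durations_ms)
    # most_common keeps first-seen order among equal counts (an assumption
    # about tie-breaking made for this sketch).
    return counts.most_common(1)[0][0]
```

With durations of 200, 210, 250, 190, and 205 ms and p = 60 ms, the per-stroke values are 4, 4, 5, 4, and 4, so every stroke is segmented into s = 4 sub-strokes of equal duration within that stroke.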
After segmenting all strokes into sub-strokes, we extract velocity magnitude, velocity direction, and device acceleration from each sub-stroke. To calculate velocity magnitude and
direction, we first obtain the coordinates of the starting and ending points of the sub-stroke.
The starting and ending points of a sub-stroke, which is segmented from a stroke based on
time duration, often do not lie exactly on touch points reported by the touch screen device.
For any end point that lies between two consecutive touch points reported by the touch
screen device, we calculate its coordinates by interpolating between these two touch points.
Let $(x_i, y_i)$ be the coordinates of a touch point with time stamp $t_i$ and $(x_{i+1}, y_{i+1})$ be the
coordinates of the adjacent touch point with time stamp $t_{i+1}$. Suppose the time stamp of
an end point is $t$ where $t_i < t < t_{i+1}$. Then, we calculate the coordinates $(x, y)$ of this end
point based on the straight line between $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ as follows:

$$x = \frac{t - t_i}{t_{i+1} - t_i} \times (x_{i+1} - x_i) + x_i \tag{6.3}$$

$$y = \frac{t - t_i}{t_{i+1} - t_i} \times (y_{i+1} - y_i) + y_i \tag{6.4}$$

We extract the velocity magnitude of each sub-stroke by calculating the Euclidean distance
between the starting and ending points of the sub-stroke divided by the time duration of
the sub-stroke. We extract the velocity direction of each sub-stroke by calculating the arctangent of the ratio of the magnitudes of the vertical and horizontal components of the
velocity vector directed from the starting point to the ending point of the sub-stroke. We
extract the device acceleration during each sub-stroke by averaging the device acceleration
values reported by the touch screen device at each touch point in that sub-stroke in all three
directions.
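
The interpolation of Equations (6.3) and (6.4) and the velocity features can be sketched in Python as below. The sketch assumes a stroke is a list of (time, x, y) touch points with strictly increasing time stamps; the function names are hypothetical, and device acceleration averaging is omitted for brevity.

```python
import math

def interpolate(points, t):
    """Linearly interpolate (x, y) at time t, following Eq. (6.3) and (6.4)."""
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return (frac * (x1 - x0) + x0, frac * (y1 - y0) + y0)
    raise ValueError("t lies outside the stroke")

def substroke_velocities(points, s):
    """Velocity magnitude and direction for s equal-duration sub-strokes."""
    t_start, t_end = points[0][0], points[-1][0]
    width = (t_end - t_start) / s
    features = []
    for j in range(s):
        ta = t_start + j * width
        # Pin the last boundary to t_end to avoid floating point overshoot.
        tb = t_end if j == s - 1 else t_start + (j + 1) * width
        xa, ya = interpolate(points, ta)
        xb, yb = interpolate(points, tb)
        dx, dy = xb - xa, yb - ya
        magnitude = math.hypot(dx, dy) / width
        direction = math.atan2(abs(dy), abs(dx))
        features.append((magnitude, direction))
    return features
```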

6.6.2.2 Sub-stroke Time Duration

Next, we investigate how to find the appropriate sub-stroke time duration. On one hand,
when the sub-stroke time duration is too small, the behavior information extracted from each
sub-stroke of the same user may become inconsistent because when feature values become
instantaneous, they are unlikely to be consistent for the same user. For example, Figure
6.12 shows the cv of the velocity magnitude values extracted from the first sub-stroke
of all samples of a gesture performed by a random volunteer in our data set as we
vary the sub-stroke time duration from 5ms to 100ms. We observe that the cv is too large
to be usable when the sub-stroke time duration is too small and that the cv decreases as we
increase the sub-stroke time duration. On the other hand, when the sub-stroke time duration
is too large, the behavior information extracted from each sub-stroke of different users may
become similar because all unique dynamics of individual users are too averaged out to be
distinguishable. For example, treating all the samples of a gesture performed by all our
volunteers as if they are all performed by the same person, Figure 6.13 shows that when
the sub-stroke time duration is 80ms, over 60% of feature elements of velocity magnitude
are consistent, which means that they do not have any distinguishing power among different
users. It is therefore challenging to trade off between consistency and distinguishability in
choosing the appropriate time duration for sub-strokes.

Figure 6.12 cv vs. time periods (coefficient of variation vs. sub-stroke time duration in ms)

Figure 6.13 Consistency factor (consistency factor vs. sub-stroke time duration in ms, for Volunteer 1, Volunteer 2, and Combined)

Next, we present the way that we achieve this tradeoff and find the appropriate time
duration for sub-strokes. We first define a metric called consistency factor. Given a set of
samples of the same stroke, which are segmented using time duration p as the guideline,
let s be the number of sub-strokes and c be the number of sub-strokes that have consistent
behavior for a particular feature. We define the consistency factor of this set of samples
under time duration p to be $c/s$. For simplicity, we use combined consistency factor to mean
the consistency factor of the set of all samples of the same stroke from all volunteers, and
individual consistency factor to mean the consistency factor of the set of all samples of
the same stroke from the same volunteer. Figure 6.13 shows the combined consistency
factor plot and two individual consistency factor plots of an example gesture. We have two
important observations from this figure. First, the individual consistency factors mostly
keep increasing as we increase sub-stroke time duration p. Second, the combined consistency
factor has a significant dip when p is in the range from 30ms to 60ms. We conducted
similar measurements for other strokes from other gestures for velocity magnitude, velocity
direction, and device acceleration and made the same two observations. This means that
when sub-stroke time duration is between 30ms and 60ms, people have distinguishing behavior
for the features of velocity magnitude, velocity direction, and device acceleration. Therefore,
we choose time duration p to be between 30ms and 60ms.
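
As a small illustration, the consistency factor c/s can be computed as in the Python sketch below. It assumes that a sub-stroke counts as consistent when the cv of its feature values across samples is below the threshold; that reading, the function name, and the input layout are assumptions of this sketch.

```python
from statistics import mean, pstdev

def consistency_factor(substroke_values, threshold=0.1):
    """Fraction c/s of sub-strokes whose feature values have cv < threshold.

    substroke_values holds one list per sub-stroke; each inner list contains
    the feature values of that sub-stroke across all samples in the set.
    """
    def cv(xs):
        sd = pstdev(xs)
        if sd == 0:
            return 0.0
        m = mean(xs)
        return sd / abs(m) if m else float("inf")

    consistent = sum(1 for xs in substroke_values if cv(xs) < threshold)
    return consistent / len(substroke_values)
```

Passing the samples of one volunteer gives an individual consistency factor; passing the pooled samples of all volunteers gives the combined consistency factor.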

6.6.2.3 Sub-stroke Selection at Appropriate Resolutions

So far we have assumed that all sub-strokes segmented from a stroke have the same time
duration. However, in reality, people have consistent and distinguishing behavior for sub-strokes of different time durations. Next, we discuss how we find such sub-strokes of different
durations. For each type of sub-stroke based features, we represent the entire time duration
of a stroke as a line with the initial color of white. Given a set of samples of a stroke
performed by one user under b postures, we first segment the stroke with the time duration
p = 60ms and the number of MVPs k = 1. For each resulting sub-stroke, we measure cv
of the feature values extracted from the sub-stroke. If it is lower than the threshold, then
we choose this sub-stroke with k MVPs as a feature element and color this sub-stroke in
the line as black. After this round of segmentation, if any white sub-stroke is left, we move
to the next round of segmentation on the entire stroke with p = 55ms and the number of
MVPs k still being 1. In this round, for any sub-stroke whose color is completely white, we
measure its cv; if it is lower than the threshold, then we choose this sub-stroke with k MVPs
as a feature element and color this sub-stroke in the line as black. We continue this process,
decrementing the time duration p by 5ms in each round until either there is no white region
of length greater than or equal to 30ms left in the line or p is decremented to 30ms. If p is
decremented to 30ms but there are still white regions of length greater than or equal to 30ms,
we increase k by 1, reset p to be 60ms, and repeat the above process again. The last possible
round is the one with k = b and p = 30ms. The process also terminates whenever there is
no white region of length greater than or equal to 30ms.
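
The round-based coloring procedure can be sketched in Python as follows. This is a simplified illustration, not the GEAT code: a single stroke duration `total_ms` stands in for the set of samples, and `is_consistent(start, end, k)` is a hypothetical callback that reports whether the feature values of the sub-stroke between `start` and `end` have cv below the threshold when partitioned into k MVPs.

```python
import math

def white_regions(selected, total_ms):
    """Return the uncolored (white) gaps left on the line [0, total_ms]."""
    regions, cursor = [], 0.0
    for start, end in sorted(selected):
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_ms:
        regions.append((cursor, total_ms))
    return regions

def select_substrokes(total_ms, postures, is_consistent,
                      p_max=60, p_min=30, step=5):
    """Pick consistent sub-strokes at multiple time resolutions."""
    chosen = []  # (start_ms, end_ms, k) of sub-strokes colored black
    for k in range(1, postures + 1):
        for p in range(p_max, p_min - 1, -step):
            s = math.ceil(total_ms / p)
            width = total_ms / s
            for j in range(s):
                start, end = j * width, (j + 1) * width
                # Only completely white sub-strokes are candidates.
                is_white = all(end <= a or b <= start for a, b, _ in chosen)
                if is_white and is_consistent(start, end, k):
                    chosen.append((start, end, k))
            gaps = white_regions([(a, b) for a, b, _ in chosen], total_ms)
            if all(b - a < p_min for a, b in gaps):
                return chosen
    return chosen
```

The outer loop over k and the inner loop over p mirror the order of rounds in the text: p runs from 60ms down to 30ms for k = 1 before k is increased and p is reset.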

6.7 Classifier Training

In this section, we explain the internal details of GEAT on training its classifiers. After
feature extraction and selection, we obtain one feature vector for each training sample of a
gesture. For a single-finger gesture, the feature vector contains the values of the selected
feature elements such as stroke time and the velocity magnitude, velocity direction, and
device acceleration from selected sub-strokes. For a multi-finger gesture, the feature vector
additionally contains the selected feature elements such as inter-stroke time, displacement
magnitude, and direction between all pairs of strokes.

6.7.1 Partitioning the Training Sample

Before we use these N feature vectors to train our classifiers, we partition them into consistent training groups so that the user has consistent behavior within each group for every
feature element. Recall that for each feature element, we have already partitioned the N
feature vectors into the least number of MVPs. For different feature elements, we may have
partitioned the N feature vectors differently. Thus, we partition the N feature vectors into
the least number of consistent training groups so that for each feature element, all feature
vectors within a training group belong to one minimum variance partition. If the number of
feature vectors in a resulting consistent training group is below a threshold, then it is not
used to train classifiers.
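
One way to form the consistent training groups is to group feature vectors by the tuple of MVP labels they received across all selected feature elements, as in the Python sketch below. The function name, the label layout, and the `min_size` value are assumptions of this sketch; the text does not state the group size threshold.

```python
from collections import defaultdict

def consistent_training_groups(partition_labels, min_size=3):
    """Group sample indices that share the same MVP for every feature element.

    partition_labels[i] is a tuple giving, for training sample i, the index
    of the MVP it falls into for each selected feature element.
    """
    groups = defaultdict(list)
    for index, labels in enumerate(partition_labels):
        groups[tuple(labels)].append(index)
    # Groups with too few feature vectors are not used for training.
    return [members for members in groups.values() if len(members) >= min_size]
```

Two vectors land in the same group only if they agree on every label, so grouping by identical label tuples yields the fewest groups that satisfy the requirement.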

6.7.2 Training the SVDE Classifiers

In real world deployment of authentication schemes, training samples are often all from the
legitimate user. When training data is only from one class (i.e., the legitimate user in our
scenario) while test samples can come from two classes (i.e., both the legitimate user and
imposters), Support Vector Distribution Estimation (SVDE) with the Radial Basis Function
(RBF) kernel is effective and efficient [97, 59]. We use the open source implementation of
SVDE in libSVM [38].
We build an ensemble of classifiers for each consistent training group. First, for each
feature element, we normalize its N values to be in the range of [0, 1]; otherwise feature
elements with larger values will dominate the classifier training. Second, we empirically find
the appropriate values for γ, a parameter for RBF kernel, and ν, a parameter for SVDE,
by performing a grid search on the ranges $2^{-17} \le \gamma \le 2^{0}$ and $2^{-10} \le \nu \le 2^{0}$ with 10-fold
cross validation on each training group. As the training samples are only from one class (i.e.,
the legitimate user), cross validation during grid search only measures the true positive rate
(TPR). Figure 6.14(a) plots a surface of TPR resulting from cross validation during the grid
search for a training group of a gesture for one volunteer. We see that TPR values are different
for different parameter values and there is a region where the TPR values are particularly
high. The downside of selecting parameter values with higher TPR is that it increases
the false positive rate (FPR). While selecting parameter values with lower TPR decreases
the FPR, it is inconvenient for the legitimate user if he cannot successfully authenticate in
several attempts. Therefore, we need to trade off between usability and security in selecting
parameter values. We choose the highest value of TPR such that 1−TPR equals FPR, which
results in the lowest EER. To calculate FPRs, GEAT needs imposter samples, which are not
available in real world deployment at the time of training. Therefore, GEAT generates
synthetic imposter samples by elastically deforming the samples of the legitimate user using
cubic B-splines and calculates the FPRs using these synthetic imposter samples. Note that
these synthetic imposter samples are not used in classifier training.

Figure 6.14 Parameter selection: (a) TPR surface plot (TP rate over gamma and nu); (b) 95% TPR contour (nu vs. gamma)

Once we decide on TPR, we obtain the coordinates of the points on the contour of that
TPR from the surface formed by the grid search. Figure 6.14(b) shows the 95% TPR contour
on the surface in Figure 6.14(a). From the points on this contour, we randomly select z (say
z = 10) points, where each point provides us with the parameter values of γ and ν. For
each of the z pairs of parameter values of γ and ν, GEAT trains an SVDE classifier on a
consistent training group. Thus, for each consistent training group, we get an ensemble of z
classifiers for modeling the behavior of the legitimate user. This ensemble can now be used
to classify any test sample. The decision of this ensemble of classifiers for a test sample is
based on the majority voting on the decision of the z classifiers in the ensemble. A larger
value of z increases the probability of achieving the TPR at which the contour was made;
however, the computation required to perform authentication also increases. Therefore, we
need to trade off between classification reliability and efficiency in choosing the value of z.
We choose z = 10 in our experiments.
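
The choice of the z parameter pairs can be illustrated with the Python sketch below. It makes several assumptions that the text does not fix: the grid uses integer exponents of two, a grid point is treated as lying on the contour when its cross-validated TPR is within a tolerance `tol` of the target, and `tpr_of(gamma, nu)` is a hypothetical callback returning the cross-validated TPR for a parameter pair. No SVDE training is performed here.

```python
import random

def power_of_two_grid(lowest_exponent, highest_exponent=0):
    """Grid values 2^lowest, ..., 2^highest (integer exponents assumed)."""
    return [2.0 ** e for e in range(lowest_exponent, highest_exponent + 1)]

def contour_points(tpr_of, target_tpr, tol=0.02):
    """Grid points whose cross-validated TPR is close to the target TPR."""
    return [(gamma, nu)
            for gamma in power_of_two_grid(-17)
            for nu in power_of_two_grid(-10)
            if abs(tpr_of(gamma, nu) - target_tpr) <= tol]

def pick_ensemble_parameters(tpr_of, target_tpr, z=10, tol=0.02, seed=0):
    """Randomly select up to z (gamma, nu) pairs from the TPR contour."""
    points = contour_points(tpr_of, target_tpr, tol)
    rng = random.Random(seed)
    return rng.sample(points, min(z, len(points)))
```

Each selected pair would then be used to train one SVDE classifier on the consistent training group, giving an ensemble of z classifiers.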

6.7.3 Classifying the Test Samples

Given a test sample of a gesture on a touch screen device, we first extract values from this
test sample for the selected feature elements of the legitimate user of this device and form
a feature vector. Then, we feed this feature vector to all ensembles of classifiers. If any
ensemble of classifiers accepts this feature vector as legitimate, which means that this test
sample gesture is similar to one of the identified behaviors of the legitimate user, we accept
this test sample as legitimate and skip the remaining ensembles of classifiers. If no ensemble
accepts this test sample as legitimate, then this test sample is deemed as illegitimate.

6.8 Ranking and Classification

For each gesture, GEAT repeats the above three steps given in Sections 6.5, 6.6, and 6.7 and
then ranks the gestures based on their EERs. The user chooses the value of n, the number
of gestures with lowest EERs that the user needs to do in each authentication attempt.
Although a larger n yields higher accuracy, for practical purposes, n = 1 (or 3 at
most) gives high enough accuracy.
When a user tries to unlock, the device displays the n top ranked gestures for the user to
perform. GEAT classifies each gesture input as discussed in Section 6.7.3, and uses majority
voting on the n decisions to make the final decision about the legitimacy of the user.
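
The three levels of decision making (classifiers within an ensemble, ensembles within a gesture, and gestures within an authentication attempt) can be sketched in Python as follows. Classifiers are modeled as plain callables that return True for an accepted feature vector; this stand-in, the function names, and the strict-majority rule for ties are assumptions of the sketch.

```python
def ensemble_accepts(classifiers, feature_vector):
    """Majority vote over the z classifiers of one consistent training group."""
    votes = sum(1 for classify in classifiers if classify(feature_vector))
    return votes > len(classifiers) / 2

def gesture_is_legitimate(ensembles, feature_vector):
    """Accept the gesture if any ensemble accepts it.

    any() stops at the first accepting ensemble, so the remaining
    ensembles are skipped.
    """
    return any(ensemble_accepts(e, feature_vector) for e in ensembles)

def user_is_legitimate(ensembles_per_gesture, feature_vectors):
    """Majority vote over the decisions for the n gestures."""
    decisions = [gesture_is_legitimate(ensembles, vector)
                 for ensembles, vector in zip(ensembles_per_gesture, feature_vectors)]
    return sum(decisions) > len(decisions) / 2
```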

6.9 Experimental Results

In this section, we present the results from our evaluation of GEAT. First, we report EERs
from Matlab simulations on gestures in our data set. Second, we study the impact of the
number of training samples on the EER of GEAT. Third, we study the impact of the threshold of cv on the EER of GEAT and justify our choice of using 0.1 as the threshold. Fourth,
we report the results from real world evaluation of GEAT implemented on Windows smart
phones. Last, we compare the performance of GEAT with the scheme proposed in [75]. We
report our results in terms of equal error rates (EER), true positive rates (TPR), false negative rates (FNR), and false positive rates (FPR). EER is the error rate when the classifier
parameters are selected such that FNR equals FPR.
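
For readers unfamiliar with the metric, the Python sketch below shows one generic way to estimate an EER from classifier scores by sweeping an acceptance threshold and taking the operating point where FNR and FPR are closest. It is only an illustration of the definition: GEAT reaches its operating point by selecting classifier parameters rather than by thresholding scores, and the function name and score-based setup are assumptions.

```python
def equal_error_rate(genuine_scores, imposter_scores):
    """Approximate EER: error rate at the threshold where FNR is closest to FPR.

    A sample is accepted when its score is at least the threshold.
    """
    best_gap, best_rate = None, None
    for threshold in sorted(set(genuine_scores) | set(imposter_scores)):
        fnr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        fpr = sum(s >= threshold for s in imposter_scores) / len(imposter_scores)
        gap = abs(fnr - fpr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (fnr + fpr) / 2
    return best_rate
```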

6.9.1 Accuracy Evaluation

First, we present our error rates when the number of postures b is equal to 1, which means
that GEAT only looks for a single consistent behavior among all training samples. Second,
we present the error rates of GEAT when b > 1, which means that GEAT looks for multiple
consistent behaviors in training samples. We present these error rates for n = 1 and n = 3
where n is the number of gestures that the user needs to do for authentication. Recall that
GEAT allows a user to choose the n top ranked gestures. Third, we present the average
error rates for each of the 10 gestures. We calculated the average error rates by treating
each volunteer as a legitimate user once and treating the remaining as imposters for the
current legitimate user. To train SVDE classifiers on the legitimate user for a given gesture,
we used a set of 15 samples of that gesture from that legitimate user. For testing, we used
the remaining samples from the legitimate user and 5 randomly chosen samples of that gesture
from each imposter. We repeated this process of training and testing on the samples of the
given gesture 10 times, each time choosing a different set of training samples. We did
not use imposter samples in training.
For the training samples of a gesture performed by a user, ideally, we would like to know
the number of postures b in which the user performed the gesture. Knowing the value of b
helps us to achieve higher classification accuracy. However, in real deployment, the value of
b may not be available. In such scenarios, our classification accuracy is still very
high. Next, we first present the evaluation results if we do not know the value of b. In such
cases, we treat all training samples to be from the same posture by setting b = 1. Then, we
present the evaluation results if we know the value of b.

6.9.1.1 Single Behavior Results

In this case, we assume b = 1. Figure 6.15(a) plots the cumulative distribution functions
(CDFs) of the EERs of GEAT with and without accelerometers, and the FNR of GEAT
when FPR is less than 0.1%, for n = 1. Similarly, Figure 6.15(b) shows the corresponding
plots for n = 3. We make the following two observations when device acceleration features are
used in training and testing. First, the average EER of users in our data set for n = 1 and
n = 3 is 4.8% and 1.7%, respectively. Second, over 80% of users have their EERs less than
4.9% and 3.4% for n = 1 and n = 3, respectively. We make the following two observations when
device acceleration features are not available. First, the average EER of users in our data
set for n = 1 and n = 3 is 6.8% and 3.7%, respectively. That is, EER increases by 2 percentage points for
both n = 1 and n = 3 when accelerometers are not available. This shows that even when
accelerometers are not available, GEAT still has high classification accuracy. Second, over
80% of users have their EERs less than 6.7% and 5.2% for n = 1 and n = 3, respectively.
We also observe that the average FNR is less than 14.4% and 9.2% for n = 1 and n = 3,
respectively, when FPR is taken to be negligibly small (i.e., FPR < 0.1%). These CDFs show
that if the parameters of the classifiers in GEAT are selected such that the legitimate user
is rejected only once in 10 attempts, i.e., for TPR ≈ 90%, an imposter will almost never be
accepted, i.e., FPR ≈ 0%.

Figure 6.15 EERs with and without accelerometer and FNR at FPR < 0.1%: (a) n = 1; (b) n = 3 (CDF vs. EER; series: w/ accelerometer, wo/ accelerometer, FNR@FPR<0.1%)

6.9.1.2 Multiple Behaviors

Among our volunteers, we requested ten volunteers to do each of the 10 gestures in 2 postures
(i.e., sitting and lying down). In this case, b = 2. Figure 6.16(a) shows the EER for these
ten volunteers for b = 1, 2, and 3. We see that the EER is minimum when b = 2 for these ten
volunteers because these volunteers provided training samples of gestures in two postures.
Figure 6.16(a) shows that the use of b < 2 results in a larger EER because it renders most of
the sub-strokes inconsistent, which leaves less consistent information to train the classifiers.
Figure 6.16(a) shows that the use of b > 2 results in a larger EER as well because dividing
the training samples made under b postures into more than b consistent training groups
reduces the training samples in each group, resulting in increased EER.

Figure 6.16 EER under different scenarios: (a) EER w.r.t. b (EER vs. volunteers 1 to 10, for b = 1, 2, and 3); (b) Shoulder surfing (EER vs. volunteers 1 to 10, for n = 1 and n = 3)

Figure 6.17 Avg. FPR vs. TPR for all gestures: (a) Gestures 1 to 5; (b) Gestures 6 to 10 (FPR vs. TPR, one curve per gesture)

6.9.1.3 Individual Gestures

The FPR of each gesture averaged over all users is always below 5% for a TPR of 90% and
decreases with the decrease in TPR. Figures 6.17(a) and 6.17(b) show the plots of FPRs
vs. TPRs for each of the 10 gestures, averaged over all users. Table 6.1 shows AUC, the
area under the receiver operating characteristic (ROC) curve, of all gestures for both filtered
and unfiltered samples. Unfiltered samples are the samples before the noise is removed.
We see that the AUC values are greater than 0.95 for most gestures. Note that an ideal
classification scheme that never misclassifies any samples has AUC=1. We also see from
Table 6.1 that AUC values for unfiltered gestures are slightly lower compared to AUC values
for filtered gestures showing that filtering before feature extraction improves classification
accuracy. We have presented both FPR and TPR for all gestures individually only to show
how individual gestures perform. In real world implementation, a user will only perform the n
top ranked gestures, resulting in much lower FPR at much higher TPR as shown by the
small values of EER in Figure 6.15(b).

Table 6.1 AUC for filtered and unfiltered gestures

             G1    G2    G3    G4    G5    G6    G7    G8    G9    G10
Filtered     0.94  0.96  0.95  0.95  0.95  0.96  0.96  0.96  0.96  0.96
Unfiltered   0.92  0.95  0.94  0.93  0.94  0.95  0.94  0.95  0.95  0.94

6.9.2 Impact of Training Samples Size

The EER decreases with the increase in the number of training samples. Figure 6.18(a) plots
the EERs averaged over all users for n = 1 and n = 3 for the increasing number of training
samples. For n = 1 and n = 3, average EER falls to 3.2% and 0.5%, respectively, with just
25 training samples. An EER of 0.5% means TPR=99.5% and FPR=0.5%, which are very
good results for classification schemes. A user can achieve these rates by providing only 25
training samples for each gesture. Providing more training samples over time further lowers
the EER.
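
To make the relationship between EER, TPR, and FPR concrete, the following is a minimal sketch of how an EER can be read off a set of classifier scores. It is not the GEAT implementation: the score lists are made-up illustrative values, the function names are hypothetical, and we assume that a sample is accepted when its score is at or above the threshold.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FPR, FNR) when samples with score >= threshold are accepted."""
    fpr = sum(s >= threshold for s in impostor) / len(impostor)
    fnr = sum(s < threshold for s in genuine) / len(genuine)
    return fpr, fnr


def equal_error_rate(genuine, impostor):
    """Scan observed scores as thresholds; return (EER, threshold) where FPR and FNR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        fpr, fnr = error_rates(genuine, impostor, t)
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, (fpr + fnr) / 2, t)
    return best[1], best[2]


# Illustrative scores only (not measured data).
genuine = [0.9, 0.8, 0.75, 0.6, 0.95]   # legitimate user
impostor = [0.1, 0.3, 0.65, 0.2, 0.4]   # imposters
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.1%} at threshold {threshold}; TPR = {1 - eer:.1%}, FPR = {eer:.1%}")
```

At the EER operating point the false positive rate equals the false negative rate, so TPR = 1 - EER, which is why an EER of 0.5% corresponds to TPR = 99.5% and FPR = 0.5%.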

Figure 6.18 Effect of system parameters on EER: (a) average EER vs. number of training samples; (b) average EER vs. Tcv (curves for 1 gesture and 3 gestures in both panels)

6.9.3

Determining Threshold for cv

The average EER is a convex function in terms of the threshold of cv, denoted by Tcv . On
one hand, if Tcv is too small, then it is difficult to find sub-strokes in which the user has
consistent behavior, which gives us less information for classifier training. On the other hand,
if Tcv is too large, then the feature elements with less consistent behavior will be selected,
which adds noise in the user behavior models. Figure 6.18(b) shows the average EER for
n = 1 and n = 3. We see that the average EER is the smallest for Tcv = 0.1.

6.9.4

Real-world Evaluation

We evaluated GEAT on two sets of 10 volunteers each in real-world settings by implementing
it on Samsung Focus running Windows. We used the first set to evaluate GEAT’s resilience
to attacks by imposters that have not observed the legitimate users while doing the gestures.
We used the second set to evaluate GEAT’s resilience to shoulder surfing attack, where
imposters have observed the legitimate users while doing the gestures.
6.9.4.1

Non-shoulder Surfing Attack

In this case, our implementation requests the user to provide training samples for all gestures
and trains GEAT on those samples. We asked each volunteer in the first set to provide at
least 15 training samples for each gesture. GEAT also asks the user to select a value of n.

We used n = 1 and 3 in our experiments. Once trained, we asked the legitimate user to do
his n top ranked gestures ten times and recorded the authentication decisions to calculate
TPR. After this, we randomly picked 5 out of 9 remaining volunteers to act as imposters
and did not show them how the legitimate user does the gestures. We asked each imposter
to do the same top n ranked gestures, and recorded the authentication decisions to calculate
FPR. We repeated this process for each volunteer by asking him to act as the legitimate user
once. Furthermore, we repeated this entire process for all ten volunteers five times on five
different days. The average (TPR, FPR) over all volunteers for n = 1 and n = 3 turned out
to be (94.6%, 4.02%) and (98.2%, 1.1%), respectively. Figures 6.19(a) and 6.19(b) show the
bar plots of TPR and FPR of each of the 10 volunteers for n = 1 and 3, respectively.

Figure 6.19 Real world results of GEAT: (a) TPR for n = 1, 3; (b) FPR for n = 1, 3 (bar plots per volunteer)
6.9.4.2

Shoulder Surfing Attack

For this scenario, we made a video of a legitimate user doing all gestures on the touch screen
of our Samsung Focus phone and showed this video to each of the 10 volunteers in the
second set. The volunteers were allowed to watch the video as many times as they wanted
and were then asked to perform each gesture ten times. The average FPR over all 10
volunteers turned out to be 0% for n = 1 as well as n = 3 when we set the TPR at 80%.
The average EER over all volunteers for n = 1 and n = 3 turned out to be only 2.1% and
0.7%, respectively. These results show that GEAT is very resilient to shoulder surfing attacks.
Figure 6.16(b) shows the bar plots of EER for the 10 volunteers in the second set for n = 1, 3.

6.9.5

Comparison with Existing Schemes

We compared the performance of GEAT with the only work in this direction reported in
[75] where Luca et al. used the following four gestures: swipe left with one finger, swipe
down with one finger, swipe down with two fingers, and swipe diagonally up from bottom
left of the screen to top right. The highest FPR, when TPR= 93%, that they achieved is
43%, which is far higher than our average FPR of 4.77% at TPR of 95.23%. For a fair
comparison, we also collected data for these 4 gestures from 45 volunteers and calculated the
value of FPR at the TPRs reported in [75]. Table 6.2 reports the FPR achieved by GEAT
and the scheme in [75]. We see that the FPRs of GEAT on these gestures are at least 4.66
times lower than the corresponding FPRs in [75] for the TPRs used in [75]. We do not use
these 4 gestures because their average EERs are larger compared to the average EERs of the
10 gestures we have proposed.
Table 6.2 Comparison of GEAT with [75]

Gesture                 TPR (%)   FPR (%), Luca et al. [75]   FPR (%), GEAT
Swipe left              85.11     48                          5.12
Swipe down–1 finger     95.71     50                          10.71
Swipe down–2 fingers    89.58     63                          8.12
Swipe diagonal          90.71     43                          8.01

6.10

Conclusions

In this chapter, we propose a gesture based user authentication scheme for the secure unlocking of touch screen devices. Compared with existing password/PIN/pattern based
schemes, GEAT improves both the security and usability of such devices because it is not
vulnerable to shoulder surfing attacks and smudge attacks and at the same time gestures
are easier to input than passwords and PINs. Our scheme GEAT builds single-class classifiers using only training samples from legitimate users. We identified seven types of features

(namely velocity magnitude, device acceleration, stroke time, inter-stroke time, stroke displacement magnitude, stroke displacement direction, and velocity direction). We proposed
algorithms to model multiple behaviors of a user in performing each gesture. We implemented GEAT on real smart phones and conducted real-world experiments. Experimental
results show that GEAT achieves an average equal error rate of 0.5% with 3 gestures using
only 25 training samples.


7

Software Security

7.1

Introduction

In computer software, a vulnerability is a loophole in the software code that enables an
attacker to circumvent the deployed security measures [99]. Each software vulnerability
has a life cycle that consists of distinct phases characterized by the events of its discovery,
disclosure, exploitation, and patching. Each phase has a certain level of risk associated with
it. The first phase of the life cycle of a vulnerability starts when it is discovered by the
vendor, a hacker, or any third-party software analyst. The security risk associated with a
vulnerability is particularly high if it is first discovered by hackers. The next phase starts with
the public disclosure of the vulnerability, which can again be done by the vendor, a hacker,
or any third-party software analyst. After disclosure, the information about a vulnerability
is freely available to everyone; therefore, the level of security risk increases further because
the hacker community is active in developing and releasing zero-day exploits [27]. The aim
of the vendor is to release a patch for the vulnerability as soon as possible. It is noteworthy
that many users of the affected software do not instantly install the patch released to fix
the vulnerability. The life cycle of a vulnerability ends when all users of the software install
the patch to fix the vulnerability. A vulnerability can be exploited by hackers at any time
during its entire life cycle.
The exploratory analysis of vulnerability life cycles can uncover interesting patterns for
vendors and software products that are helpful in the following ways: First, a thorough analysis


is helpful in the deployment of best practices in the software development processes. Second,
such analysis is useful to develop the security policies that can handle future attacks and
threats more effectively. Third, an exploratory analysis provides insights about the previous
security incidents that are helpful in their audit. Finally, it also helps customers to assess
the security risks associated with the software products of a particular vendor.
To the best of our knowledge, no previous work has been done to analyze the evolution
of life cycle of different types of vulnerabilities for different software products and vendors.
The only work in this direction was reported by Frei et al. [49, 50]. In [49], Frei et al. studied
the performance of the software industry as a whole but did not characterize the behavior
of individual vendors. In [50], the authors only compared the vulnerability handling process
of two vendors and based their analysis on a small data set. Some researchers have focused
on the modeling of vulnerability discovery process [27, 24, 92]. The goal of such work is to
estimate the number of vulnerabilities in new software products. Another direction of work
aims to study the changes in the patching behavior of vendors in response to vulnerability
disclosures and the existence of competitors [29, 28]. These studies analyze only small
vulnerability data sets and do not cover the behavior of individual vendors.
In this chapter, we make the following three contributions. (1) We have aggregated a large software vulnerability data set from three vulnerability repositories: (a) National Vulnerability
Database (NVD) [11], (b) Open Source Vulnerability Database (OSVDB) [16], and (c) the
vulnerability data collected by Frei et al. (FVDB) [49]. Our aggregated software vulnerability data set contains 46310 vulnerabilities from 1988 to 2011. (2) We have comprehensively
analyzed software vulnerabilities along the seven dimensions mentioned in the abstract. Our
observations are supported by statistical tests for significance. (3) To systematically analyze
patterns in our vulnerability data set, we have utilized association rule mining to extract
rules that represent exploitation behavior of hackers and the patching behavior of vendors.
The rest of the chapter is organized as follows: Section 7.2 explains the terminology and notations
used in the chapter and provides details about our vulnerability collection process and the


aggregated data set. In Section 7.3, we analyze the evolution of vulnerability disclosure rates,
access methodology for vulnerability exploitation, impact of the exploitation, risk associated
with vulnerabilities and evolution of different types of vulnerabilities. In Sections 7.4 and
7.5, we study the exploitation and patching behavior of hackers and vendors respectively. In
Section 7.6, we cross examine the exploitation behavior of hackers and the patching behavior
of vendors. In Section 7.7, we present the implications of our work followed by the related
work and conclusion.

7.2

Preliminaries

In this section, we first explain the terms and notations used in the rest of the chapter and then
present the data set used for analysis.

7.2.1

Terminology and Notations

Vendor is an entity (an individual, a group of individuals, or an organization) that develops
a software product and is responsible to keep it secure. An ideal vendor would discover and
patch all the vulnerabilities in its products before they are exploited.
Hacker is an entity that releases exploits for the vulnerabilities in the software products.
Independent organization is an entity that independently discovers and discloses
vulnerabilities as well as their corresponding exploits and patches but is not involved in the
development of patches or exploits.
Disclosure Date (td ) refers to the date when information about a vulnerability is made
publicly available after establishing that the vulnerability poses a potential risk.
Patch Date (tp ) is the date when a vendor provides a solution (i.e., a patch) for a vulnerability
to neutralize the threat posed by it. We consider only those patches that are released by
the corresponding vendor.
Exploit Date (te ) is the earliest date when a vulnerability is exploited. An exploit can be
in the form of an automatic script, a virus, a tool, or any such thing that can breach the
security of software.
Exploit – Disclosure (ted ) is the duration (in days) between the date an exploit for a
given vulnerability was provided by hackers and the date the vulnerability was disclosed.
Patch – Disclosure (tpd ) is the duration (in days) between the date a patch for a
vulnerability was released by the vendor and the date the vulnerability was disclosed.
Patch – Exploit (tpe ) represents the duration (in days) between the dates of availability
of a patch and an exploit for a given vulnerability.
Risk Score is assigned to a vulnerability by Common Vulnerability Scoring System (CVSS)
[9] and establishes the magnitude of risk associated with that vulnerability. We divide
vulnerabilities into three categories of low, medium, and high risk severity based on their
CVSS scores.
Access Vector (AV ∈ {Local, Adjacent Network, Network}) indicates if local or network
access to the hardware is required to exploit the vulnerability.
Access Complexity (AC ∈ {Low, Medium, High}) is a measure of the complexity of the
attack required to exploit the vulnerability.
Integrity Impact (Ii ∈ {None, Partial, Complete}) measures the potential impact of a
successfully exploited vulnerability on the integrity of the system. Integrity refers to the
trustworthiness of information.
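
The three durations defined above can be illustrated with a short sketch. It is not the code used in this study. The function names and example dates are hypothetical, and the sign conventions tpd = tp - td and tpe = tp - te are assumptions; only ted = te - td is implied by the text, where a negative ted denotes an exploit released before disclosure.

```python
from datetime import date


def day_diff(later, earlier):
    """Signed number of days from `earlier` to `later`; None if either date is unknown."""
    if later is None or earlier is None:
        return None
    return (later - earlier).days


def life_cycle_durations(disclosure, exploit=None, patch=None):
    """Compute ted, tpd, and tpe (in days) for one vulnerability."""
    return {
        "ted": day_diff(exploit, disclosure),  # exploit date minus disclosure date
        "tpd": day_diff(patch, disclosure),    # patch date minus disclosure date (assumed sign)
        "tpe": day_diff(patch, exploit),       # patch date minus exploit date (assumed sign)
    }


# Hypothetical vulnerability: exploited on the disclosure day, patched two weeks later.
durations = life_cycle_durations(
    disclosure=date(2010, 3, 10), exploit=date(2010, 3, 10), patch=date(2010, 3, 24)
)
print(durations)
```

A duration is left undefined (None) when one of its two dates is missing, which mirrors the fact that exploit and patch dates are not available for every vulnerability.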

7.2.2

Data Set

In this section, we provide details of our data aggregation process and the basic statistics
of the data. We provide details about the selection criteria of vendors and products for our
study. We have collected vulnerability information from three sources: (1) NVD [11], (2)
OSVDB [16], and (3) FVDB [49].


7.2.2.1

Data Aggregation

NVD and FVDB identify each vulnerability with Common Vulnerability and Exposures
Identifier (CVE-ID) [6]. OSVDB also provides CVE-IDs of about 70% of vulnerabilities. We
leverage the CVE-IDs to aggregate the vulnerability data from the three sources. We take
CVSS scores, CVSS vectors, vendor and product names, text description, and disclosure
dates from NVD. From OSVDB and FVDB, we take disclosure dates, exploit dates, and
patch dates.
The total number of vulnerabilities in our aggregate data set is 46310, and the numbers
of vulnerabilities for which disclosure dates, patch dates, and exploit dates are available are
46310, 9667, and 15456, respectively. We do not have exploit dates and patch dates for all
the vulnerabilities in our aggregate data set. Due to the sheer size of the data set, it is not
feasible to find them manually. To systematically conduct our study, we divide our aggregate
data set into the following three subsets:
ED-subset consists of 15456 vulnerabilities and contains those vulnerabilities for which both
exploit and disclosure dates are known.
PD-subset consists of 9667 vulnerabilities and contains those vulnerabilities for which we have both patch and disclosure dates.
PE-subset consists of 1424 vulnerabilities and contains those vulnerabilities for which both patch and
exploit dates are known.
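
The aggregation and the three subsets can be sketched as follows. This is only an illustration under stated assumptions, not the actual aggregation pipeline: the record layout, the placeholder CVE-IDs, the rule of keeping only vulnerabilities listed in NVD, and the rule of keeping the earliest reported date when sources disagree are all hypothetical choices.

```python
from datetime import date

FIELDS = ("disclosure", "exploit", "patch")


def aggregate(nvd, osvdb, fvdb):
    """Merge per-source records keyed by CVE-ID into one record per vulnerability."""
    merged = {cve: {f: rec.get(f) for f in FIELDS} for cve, rec in nvd.items()}
    for source in (osvdb, fvdb):
        for cve, rec in source.items():
            if cve not in merged:
                continue  # assumption: keep only vulnerabilities that NVD lists
            for f in FIELDS:
                value = rec.get(f)
                # assumption: when sources disagree, keep the earliest date
                if value is not None and (merged[cve][f] is None or value < merged[cve][f]):
                    merged[cve][f] = value
    return merged


def subset(merged, first, second):
    """CVE-IDs for which both of the named dates are known."""
    return {c for c, r in merged.items() if r[first] is not None and r[second] is not None}


# Placeholder records (hypothetical IDs and dates).
nvd = {"CVE-A": {"disclosure": date(2009, 1, 5)},
       "CVE-B": {"disclosure": date(2010, 6, 1)},
       "CVE-C": {"disclosure": date(2011, 2, 2)}}
osvdb = {"CVE-A": {"exploit": date(2009, 1, 5)}, "CVE-B": {"patch": date(2010, 6, 20)}}
fvdb = {"CVE-A": {"exploit": date(2009, 1, 3), "patch": date(2009, 2, 1)},
        "CVE-X": {"exploit": date(2008, 7, 7)}}

merged = aggregate(nvd, osvdb, fvdb)
ed_subset = subset(merged, "exploit", "disclosure")
pd_subset = subset(merged, "patch", "disclosure")
pe_subset = subset(merged, "patch", "exploit")
```

Because the subsets are defined by which dates are available, a vulnerability can belong to more than one subset, and the PE-subset is necessarily contained in both of the other two.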
7.2.2.2

Selection of Vendors and Products

The aggregate data set contains vulnerabilities from more than 11 thousand vendors and
over 17 thousand software products. Figure 7.2 plots the number of vulnerabilities of each
vendor in descending order. It can be seen that over 95% of the vendors have fewer than 10
vulnerabilities. Therefore, to make statistically sound observations, we focus our attention
only on the top 8 vendors each of which has at least 500 vulnerabilities. For our study, we
select Microsoft, Apple, Sun, Oracle, Linux¹, Mozilla, Red Hat, and Google. We also study
popular software products of these vendors that include Internet Explorer, Safari, Firefox,
Chrome, Windows, Mac OS X, Solaris, and several Linux based operating systems.

¹ Linux is not a vendor. It only represents the vulnerabilities in the Linux kernel.

Figure 7.1 Vulnerability trends in the data set: (a) Vulnerability disclosure trend (monthly and cumulative disclosures per year); (b) Access Vector Evolution (Local Access, Adjacent Network, Network); (c) Access Complexity Evolution (Low, Medium, High); (d) Integrity Impact Evolution (None, Partial, Complete); (e) Boxplots of the CVSS scores of selected vendors; (f) Difference between intra-cluster dissimilarity of consecutive clusters vs. number of clusters

7.3

General Vulnerability Analysis

In this section, we study the trends in vulnerability disclosure and CVSS-vector metrics
(i.e., access vector, access complexity, and integrity impact) over the past 2 decades. We
also categorize the vulnerabilities into groups and study their evolution.

Figure 7.2 # of vulnerabilities for each vendor (in descending order); x-axis: percentage of vendors (total: 12482), y-axis: number of vulnerabilities (log scale)

7.3.1

Vulnerability Disclosure Trend

The rate of vulnerability disclosures grew exponentially from 1997 until 2006,
as can be seen in Figure 7.1(a). The vertical lines in the figure show the number
of vulnerabilities disclosed every month since January 1990 and the dashed line shows the
cumulative number of vulnerabilities. The number of vulnerability disclosures has not been
increasing since 2006. In fact, on average, the number of vulnerabilities disclosed every
month has been decreasing since 2008 despite the ever increasing use of software products.

7.3.2

Evolution of CVSS-Vector Metrics

Figures 7.1(b) to 7.1(d) show the evolution of three metrics of CVSS-vector. For each metric,
we have calculated the percentage of vulnerabilities corresponding to each of its three values
for every month since January 1990. We observe from Figure 7.1(b) that the percentage
of remotely exploitable vulnerabilities has been increasing since 1998. The fact that most
computer systems are connected to the Internet has made it possible for hackers to exploit these
systems remotely. Figure 7.1(c) shows the change in access complexity of vulnerabilities over
the years. We observe that the percentage of low complexity vulnerabilities has decreased
over time indicating that the hackers have to use more sophisticated techniques to exploit
new vulnerabilities. From Figure 7.1(d), we also observe a reduction in the percentage of
vulnerabilities having complete integrity impact.


7.3.3

General Trend of CVSS Score for Short-listed Vendors

Recall from Section 7.2 that every vulnerability has an associated risk quantified by CVSS
score. Figure 7.1(e) shows the box plots of CVSS scores for vulnerabilities in the products
of the selected vendors. We note that CVSS scores of most vulnerabilities in our study lie in
medium to high range. The median CVSS scores for closed-source vendors are greater than
the median scores for open-source vendors.

7.3.4

Evolution of Types of Vulnerabilities

To determine the prevalent types of vulnerabilities and to study their evolution, we utilize
unsupervised k-means clustering to group different types of vulnerabilities. We leverage
the text information provided by NVD and OSVDB for each vulnerability to cluster them
into groups of distinct types. We extracted the keywords from the text description of each
vulnerability that characterize its functionality and used them as features to cluster all the
vulnerabilities into groups. Some example keywords include denial, service, buffer, and injection.
We had a total of 608 relevant keywords.
It is well known that the k-means clustering algorithm is well suited for large data sets with
a large number of attributes. To set an appropriate value of k in the k-means algorithm, we used
Euclidean distance as the intra-cluster dissimilarity metric due to the binary nature of the
attributes [119]. Figure 7.1(f) shows the difference in the intra-cluster dissimilarity between
consecutive clusters. It can be seen that the distance decreases as the number of clusters
increases for lower values of k. The bar above any value x in Figure 7.1(f) represents the
difference between intra-cluster distances of x and x + 1 clusters. Note that increasing the
number of clusters to 8 increases the intra-cluster distance (the bar above 6 is smaller than
that above 7). Therefore, the optimum value of k is 7. For statistical rigor, we repeated
k-means clustering algorithm 20 times with different seeds for each value of k. The coefficient
of variation in each case was less than 0.05 which shows the statistical significance of results.
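
The quantity plotted in Figure 7.1(f) can be illustrated with a small pure-Python sketch. It is not the clustering code used in this study. The toy keyword vectors are hypothetical, and three choices are assumptions made only for illustration: the intra-cluster dissimilarity is taken as the summed Euclidean distance of points to their centroid, centroids are seeded from distinct points, and the best result over several seeds is used for each k.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_cost(points, k, seed, max_iters=50):
    """Run Lloyd's k-means once; return the summed distance of points to their centroid."""
    rng = random.Random(seed)
    # Assumption: seed centroids from distinct points to avoid duplicate centroids.
    distinct = sorted(set(map(tuple, points)))
    centroids = [list(p) for p in rng.sample(distinct, k)]
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(distance(p, c) for c in centroids) for p in points)


def consecutive_differences(points, max_k, seeds=range(5)):
    """Map x to cost(x clusters) - cost(x + 1 clusters), using the best of several seeds."""
    cost = {k: min(kmeans_cost(points, k, s) for s in seeds) for k in range(1, max_k + 1)}
    return {k: cost[k] - cost[k + 1] for k in range(1, max_k)}


# Toy binary keyword vectors (hypothetical): three obvious groups of four points each.
points = [(1, 1, 0, 0, 0, 0)] * 4 + [(0, 0, 1, 1, 0, 0)] * 4 + [(0, 0, 0, 0, 1, 1)] * 4
diffs = consecutive_differences(points, max_k=3)
print(diffs)
```

Each entry of `diffs` corresponds to one bar of the kind described above: the value at x is the intra-cluster distance with x clusters minus that with x + 1 clusters.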
We analyzed the centroids of clusters to determine their dominant keywords. Table 7.1

tabulates dominant keywords for each centroid. From the observed keywords, we label
the vulnerability clusters as PHP vulnerabilities (PHP), executable code (EXE), denial of
service (DoS), buffer overflow (BO), SQL injection (SQL), cross-site scripting (XSS), and
miscellaneous vulnerabilities (Misc).
Figure 7.3 Evolution of vulnerability clusters over the years (number of vulnerabilities of each type, PHP, EXE, DoS, BO, SQL, and XSS, per year from 1999 to 2011)
Figure 7.3 shows the number of vulnerabilities belonging to each cluster disclosed since
1999.
Only BO, DoS, and EXE vulnerabilities were prevalent till 2001. These types of vulnerabilities constitute a major portion of software vulnerabilities even today which indicates
that the vendors have not been able to devise effective strategies to limit these types of
vulnerabilities. Since 2002, we observe an increase in the XSS vulnerabilities, which peak in
2006. PHP vulnerabilities were prevalent in 2006 and 2007 and SQL vulnerabilities became
dominant since 2005. These trends highlight the shift in focus of hackers to exploit new
services as they become popular.
In the sections that follow, we present the behavior of hackers and vendors towards vulnerabilities.

Table 7.1 Results of vulnerability clustering

C#   Keywords                                      Label   Size
1    php, parameter, execute, file, code, url      PHP     8.32%
2    –                                             MISC    36.6%
3    execute, code                                 EXE     7.25%
4    service, denial                               DoS     14.2%
5    buffer, execute, code, overflow               BO      10.2%
6    injection, sql, execute, commands             SQL     11.2%
7    cross, scripting, site, script, html, inject  XSS     12.3%

7.4

Exploitation Behavior

In this section, we study the behavior of hackers in releasing exploits for vulnerabilities. For
this, we analyze trends in ted values of vulnerabilities. The analysis presented in this section
is done on ED-subset. We study three ranges of ted values.
ted < 0 shows that an exploit for a given vulnerability was released before its public disclosure. The vulnerabilities falling in this range represent a big threat to the security of
end-users as the vendor could be oblivious to them. A total of 2.8% of software vulnerabilities fall into this range.
ted = 0 refers to the case when an exploit for a given vulnerability was released on the
day it was disclosed. The exploits corresponding to such vulnerabilities are called zero-day
exploits. In our ED-subset, a total of 88.2% of vulnerabilities have zero-day exploits.
ted > 0 means that the exploit for a vulnerability was released after its public disclosure.
The vulnerabilities for which ted > 0 represent the case where a vulnerability is disclosed by
the vendor or an independent organization and the hackers used this information to release
an exploit one or more days later. 9.7% of vulnerabilities fall in this range. To do a more detailed
analysis, we subdivide this range into three parts: (1) 0 < ted ≤ 7 gives us the percentage
of exploits released within a week of disclosure, (2) 7 < ted ≤ 30 gives us the percentage of
exploits released after a week and within a month of disclosure, and (3) ted > 30 gives us
the percentage of exploits released a month after the disclosure.
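
The five ranges of ted described above can be expressed as a simple mapping. The following is a minimal sketch; the function name and the range labels are hypothetical.

```python
def ted_range(ted):
    """Map a ted value (in days) to one of the five ranges used in this section."""
    if ted < 0:
        return "ted < 0"         # exploit released before disclosure
    if ted == 0:
        return "ted = 0"         # zero-day exploit
    if ted <= 7:
        return "0 < ted <= 7"    # within a week of disclosure
    if ted <= 30:
        return "7 < ted <= 30"   # after a week and within a month
    return "ted > 30"            # more than a month after disclosure


# Illustrative ted values only (not taken from the data set).
for value in (-12, 0, 3, 7, 8, 30, 31):
    print(value, "->", ted_range(value))
```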


7.4.1

Evolution of Exploitation

To extract and construe the dominant trends, we first divided the vulnerabilities in the ED-subset into groups where each group contains vulnerabilities disclosed in one distinct year.
Then we subdivided the vulnerabilities in each group into five subgroups corresponding to
the five ranges of ted . We then calculated the percentage of vulnerabilities in each subgroup
(called the percentage size of the subgroup) in its respective group and plotted the results in
Figure 7.4 in the form of stacked bars where each bar corresponds to the group of vulnerabilities disclosed each year and each block in every bar represents the percentage size of the
corresponding subgroup in its respective group. The number inside each block is the value
of the percentage size of the corresponding subgroup. The number at the top of each bar
represents the total number of vulnerabilities in the corresponding group. All figures in the rest
of the chapter have been made using similar methodology.

Figure 7.4 Yearly change in exploitation behavior for different ted ranges (stacked bars per disclosure year, '98 to '11; legend: < 0 days, 0 day, +7 days, +30 days, > 30 days)

Figure 7.5 Exploitation trend in clusters (stacked bars per vulnerability type: PHP, EXE, DoS, BO, SQL, XSS)
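
The grouping behind these stacked bars (vulnerabilities grouped by disclosure year, then split into subgroups by ted range) can be sketched as follows. This is an illustration rather than the plotting code used for the figures; the function name, the record layout, and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def percentage_sizes(records):
    """For (group, subgroup) pairs, return each group's total and its subgroup percentages."""
    counts = defaultdict(Counter)
    for group, subgroup in records:
        counts[group][subgroup] += 1
    result = {}
    for group, sub_counts in counts.items():
        total = sum(sub_counts.values())
        result[group] = {
            "total": total,  # the number shown at the top of a bar
            "percent": {s: 100.0 * n / total for s, n in sub_counts.items()},  # block sizes
        }
    return result


# Made-up records: (disclosure year, ted range label).
records = (
    [(2005, "ted = 0")] * 8 + [(2005, "ted < 0")] + [(2005, "ted > 30")]
    + [(2006, "ted = 0")] * 3 + [(2006, "0 < ted <= 7")]
)
sizes = percentage_sizes(records)
print(sizes[2005]["total"], sizes[2005]["percent"])
```

The same function applies unchanged when the groups are vulnerability types, vendors, or products instead of years.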
It can be seen from Figure 7.4 that the majority of vulnerabilities have always been exploited
on their disclosure dates (having ted = 0). Till 2004, the percentage size of the subgroup of
ted < 0 was non-negligible which shows that the hackers were finding a significant number
of vulnerabilities themselves and exploiting them. At the same time we observe a decrease
in the percentage size of the subgroup of ted = 0. This does not mean that hackers were
getting sluggish because we also observe a significant increase in the total number of exploited
vulnerabilities. Since 2004, although we observe a decrease in the percentage size of the

subgroup of ted < 0, an increasing trend in the percentage size of subgroup of ted = 0 still
shows that the hackers are becoming more and more active.

7.4.2

Exploitation of Types of Vulnerability

We now see the exploitation of different types of vulnerabilities. Figure 7.5 has been made
in the same way as Figure 7.4 except that now the groups are the types of vulnerabilities. It
can be seen that over 80% of vulnerabilities of each type (except BO and EXE) are exploited
on or before the day of disclosure. In case of BO and EXE, a significant percentage of
vulnerabilities is exploited several weeks after the disclosure. According to our data set, 79%
of BO and EXE pose high risk and only 7% have high access complexity, so intuitively, they
should attract more attention from hackers. The total number of exploited vulnerabilities
of these two types is large, which supports this intuition.

7.4.3

Exploitation Trend for Vendors and Products

We study the behavior of hackers in exploiting the vulnerabilities for different vendors and
their respective products. Figures 7.6 and 7.7 show the exploit data for the selected vendors
and products respectively. These figures have been made for vendors and products in the
same way as Figure 7.5 was made for vulnerability types.

Figure 7.6 Exploited vulnerabilities for vendors relative to disclosure dates

Figure 7.7 Exploited vulnerabilities for products relative to disclosure dates

Let us first compare the vulnerability exploitation in open vs. closed-source vendors. In
comparison to closed-source vendors, for open-source vendors (e.g., Linux and Red Hat), a
comparatively smaller percentage of vulnerabilities is exploited till the day of disclosure
while a larger percentage of vulnerabilities is exploited before the disclosure. To draw a
statistically significant conclusion from these two conflicting observations, we do statistical
hypothesis testing.
As our samples for open-source and closed-source vendors contain a large number of data
points, the most appropriate statistical test for this scenario (and all the subsequent scenarios) is the standard one-tailed t-test. The t-test is considered to be the most appropriate when the number of data points in the samples is large (typically > 50) regardless
of the distributions they come from.
To remove any bias in testing, we state the null hypothesis as: the mean value of ted for
open-source vendors, µted (O), is equal to the mean value for closed source vendors, µted (C).
The alternative hypothesis is: µted (C) is greater than µted (O). We apply the right tailed
t-test to the null hypothesis. If the null hypothesis is rejected, it would be statistically sound
to claim that the average time to exploit a vulnerability in closed-source software is larger
compared to open-source software. We give a general equation for hypothesis testing that
will be used for all the subsequent tests:
H0 : µA(X) = µB(Y)
H1 : µA(X) > µB(Y)                    (7.1)

where X = C represents closed-source vendors, Y = O represents open-source vendors,
and A = B = ted represents that the data points of ted are being considered. We do the
hypothesis testing at a 95% confidence level, i.e., α = 0.05. Our test resulted in a p-value
of 0.003 which is much smaller than α, thus we reject H0 to accept H1 . Therefore, it is
statistically sound to state that the exploitation of vulnerabilities in closed-source software
is slower compared to open-source software.
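
The right-tailed test of Equation (7.1) can be sketched in a few lines. This is a hedged illustration, not the exact procedure used in this study: we assume the unequal-variance (Welch) form of the test statistic, and, because the samples are large, we approximate the p-value with the standard normal distribution instead of the t distribution. The function name and both samples are synthetic.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def right_tailed_test(x, y):
    """Test H0: mean(x) = mean(y) against H1: mean(x) > mean(y); return (t, p)."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    t = (mean(x) - mean(y)) / standard_error
    # Assumption: large-sample normal approximation to the t distribution.
    p = 1.0 - NormalDist().cdf(t)
    return t, p


# Synthetic ted samples (in days), 100 points each; not taken from the data set.
sample_y = [float(i % 10) for i in range(100)]   # mean 4.5
sample_x = [v + 2.0 for v in sample_y]           # same spread, mean 6.5

alpha = 0.05
t_stat, p_value = right_tailed_test(sample_x, sample_y)
print(f"t = {t_stat:.3f}, p = {p_value:.2e}, reject H0: {p_value < alpha}")
```

Swapping the two samples tests the opposite alternative; in that case the p-value is close to 1 and H0 is not rejected, which is why the assignment of X and Y matters in each test reported in this chapter.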


Figure 7.6 shows that hackers release most exploits till the disclosure dates for Microsoft
and Apple. This is primarily because hackers find it more rewarding to exploit these products
due to their wider market capitalization. For the selected products, we see a similar trend
in Figure 7.7 as for vendors in Figure 7.6 except for Windows. The percentage of exploited
vulnerabilities for Windows till disclosure date is lower than that for OS X but at the same
time the percentage of exploited vulnerabilities for Windows before disclosure is greater than
that for OS X. In fact, the mean value of ted for Windows is negative while that for OS X is
positive. The t-test with X = “OS X”, Y = “Windows”, and A = B = ted yields p = 0.031,
indicating that the exploitation in Windows is quicker compared to OS X.
Among web browsers, Firefox has the smallest percentage of vulnerabilities exploited till
disclosure date compared to Internet Explorer and Safari but at the same time has the
highest percentage of vulnerabilities exploited before the disclosure. The t-test with X =
“Safari” and Y = “Internet Explorer” yields p = 0.05 showing that exploitation in Internet
Explorer is quicker compared to Safari. The t-test with X = “Safari” and Y = “Firefox”
yields p = 0.09, and therefore, fails to reject the null hypothesis.

7.4.4 Exploitation Behavior: CVSS Scores

Recall from Section 7.2 that each vulnerability is assigned a CVSS score depending upon the
level of risk associated with it. Based on CVSS scores, we divide vulnerabilities into three
categories. Low: 0 ≤ CVSS Score < 4; Medium: 4 ≤ CVSS Score < 7; High: 7 ≤ CVSS
Score ≤ 10. Figure 7.8 has been generated in the same way as Figure 7.6 except that we
plotted the vulnerabilities belonging to low, medium, and high categories separately. The
white lines with round markers represent the percentage of total vulnerabilities belonging to
low, medium, or high categories.
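
The three categories can be written as a small function. This is a minimal sketch of the thresholds stated above; the function name is hypothetical, and rejecting scores outside the 0 to 10 range is an added assumption.

```python
def cvss_category(score):
    """Map a CVSS score to the Low/Medium/High category defined in the text."""
    if not 0 <= score <= 10:
        raise ValueError(f"CVSS score out of range: {score}")
    if score < 4:
        return "Low"      # 0 <= score < 4
    if score < 7:
        return "Medium"   # 4 <= score < 7
    return "High"         # 7 <= score <= 10
```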
It is intuitive to think that hackers would be less interested in exploiting low risk vulnerabilities because such vulnerabilities usually cause less damage. This is exactly what the
markers for low risk vulnerabilities show in Figure 7.8. The bars in Figure 7.8 show that the
percentage of medium risk vulnerabilities for which exploits are released till the disclosure
date is greater than that for high risk vulnerabilities for all closed-source vendors and some
open-source vendors.

Figure 7.8 Exploited vulnerabilities for different CVSS scores
[Bar chart. Y-axis: percentage of exploited vulnerabilities. X-axis: low (L), medium (M), and high (H) CVSS categories for Microsoft, Apple, Sun, Oracle, Linux, Mozilla, and Redhat. Legend: < 0 days, 0 day, +7 days, +30 days, > 30 days.]

7.4.5 Interesting Exploitation Rules

Now we present some interesting association rules about the exploitation behavior in the
products of the short-listed vendors. We used the implementation of the Apriori association rule
mining algorithm in WEKA to extract the rules with confidence greater than 95% [23, 124].
For association rule mining, we used the following seven attributes of each vulnerability: Vendor
Name <vnd>, Product Name <prd>, Vulnerability Type <typ>, Severity <sev>, ted, tpd,
and tpe. For the rules presented in this section, we used ted as the class attribute.
We found that in the case of Microsoft, the majority of vulnerabilities, including DoS, XSS, and
BO, are exploited on the day they are disclosed. One such rule obtained from association
rule mining is: vnd=Microsoft typ=XSS sev=H → ted=0-day.
In the case of Apple, the vulnerabilities are exploited on or before their disclosure date. For
example, as shown in the following rule, vulnerabilities in the Safari browser are mostly exploited
on the day of disclosure: vnd=Apple prod=Safari typ=BO sev=H → ted=0-day.
For Solaris, association rules show that high risk vulnerabilities are exploited on the day
of disclosure while medium risk vulnerabilities are mostly exploited within a week after their
disclosure. The latter trend is shown by the following rule: vnd=Sun prod=Solaris sev=M
→ 0<ted≤+1 week.
For Mozilla, we get interesting rules showing that hackers do not exploit a vulnerability
that has already been patched while they quickly exploit those that have not been patched.
Two rules stating this observation are: (1) vnd=Mozilla prod=Firefox typ=BO tpd=0-day
→ ted>+1 month, (2) vnd=Mozilla prod=Firefox typ=BO +1 week<tpd≤+1 month →
ted=0-day.
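
To make the confidence threshold concrete, the following sketch evaluates the support and confidence of one candidate rule over a set of records. It does not reproduce Apriori or the WEKA implementation, which additionally enumerate and prune candidate itemsets; it only shows the quantity that is compared against the 95% threshold. The function name and the records are hypothetical.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    records: list of dicts mapping attribute name to value.
    antecedent, consequent: dicts of attribute=value conditions.
    """
    def matches(record, conditions):
        return all(record.get(key) == value for key, value in conditions.items())

    n_antecedent = sum(matches(r, antecedent) for r in records)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in records)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical records (NOT from the data set); t_ed is already binned.
records = [
    {"vnd": "Microsoft", "typ": "XSS", "sev": "H", "t_ed": "0-day"},
    {"vnd": "Microsoft", "typ": "XSS", "sev": "H", "t_ed": "0-day"},
    {"vnd": "Microsoft", "typ": "XSS", "sev": "H", "t_ed": "<0"},
    {"vnd": "Apple", "typ": "BO", "sev": "H", "t_ed": "0-day"},
]
support, confidence = rule_stats(
    records,
    antecedent={"vnd": "Microsoft", "typ": "XSS", "sev": "H"},
    consequent={"t_ed": "0-day"},
)
```

In this toy example the rule holds for two of the three matching records, so it would fall well below a 95% confidence threshold and would not be reported.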

7.5 Patching Behavior

Now we study the behavior of vendors in providing patches for vulnerabilities in their products. For this, we study the trends in tpd values of vulnerabilities. The analysis presented in
this section is based upon the PD-subset. The three ranges for tpd that we study are described
below.

Figure 7.9 Yearly change in the patching behavior for different tpd ranges
[Bar chart. Y-axis: percentage of patched vulnerabilities. X-axis: years, '98 to '11. Legend: < 0 days, 0 day, +7 days, +30 days, > 30 days.]

Figure 7.10 Patching trend in clusters
[Bar chart. Y-axis: percentage of patched vulnerabilities. X-axis: vulnerability type (EXE, PHP, DoS, BO, SQL, XSS). Legend: < 0 days, 0 day, +7 days, +30 days, > 30 days.]

tpd < 0 shows that the patch for a given vulnerability was released before its public disclosure. A total of 10.1% of vulnerabilities have tpd < 0, which is greater than the corresponding
value for ted < 0. One possible reason is that independent organizations inform the
vendors about the vulnerabilities they discover and give them a reasonable time to release a
patch before disclosing the vulnerabilities.
tpd = 0 means that the patch for a vulnerability was released on the disclosure day. Such
patches provide zero-day protection against exploitation. In our data set, zero-day patches
are provided for 62.2% of the vulnerabilities.
tpd > 0 refers to the case where the patch for a given vulnerability was released after its
public disclosure. In our PD-subset, 27.7% of all the vulnerabilities are patched a day or
more after their disclosures. We further subdivide the range tpd > 0 into the same three
parts as in Section 7.4.
The t-test with A = tpd, B = ted, and X = Y = “aggregate data set” yields p ≈ 0, which
leads us to accept the alternative hypothesis that, compared to hackers, vendors take
more time on average to patch a vulnerability (considering disclosure date as reference).
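
The ranges can be sketched as follows. The sketch is not the code used in the study. It assumes that each quantity is a signed difference in whole days, with signs chosen to match how the text interprets them (for example, tpd < 0 when the patch precedes disclosure), and that a month is taken as 30 days, as the figure legends suggest. The function names and dates are hypothetical.

```python
from datetime import date


def day_difference(later_event, reference):
    """Signed number of days from reference to later_event."""
    return (later_event - reference).days


def time_range(days):
    """Bin a signed day difference into the five ranges used in the figures."""
    if days < 0:
        return "< 0 days"
    if days == 0:
        return "0 day"
    if days <= 7:
        return "+7 days"
    if days <= 30:  # assumption: one month is taken as 30 days
        return "+30 days"
    return "> 30 days"


# Hypothetical dates for a single vulnerability.
disclosed = date(2010, 3, 1)
patched = date(2010, 3, 5)
exploited = date(2010, 2, 27)

t_pd = day_difference(patched, disclosed)    # patch relative to disclosure
t_ed = day_difference(exploited, disclosed)  # exploit relative to disclosure
t_pe = day_difference(patched, exploited)    # patch relative to exploit
```

The percentage of vulnerabilities in each range is then the count of values per bin divided by the size of the corresponding subset.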

7.5.1 Evolution of Patching Behavior

In Figure 7.9 we observe that until 2005, the percentage of vulnerabilities patched on or
before disclosure dates consistently decreased. Keeping in view the fact that independent
organizations inform the vendors about vulnerabilities well before disclosing them, such
poor patching behavior indicates that security was not a major concern for vendors
at that time. However, we see a significant improvement after 2005. Since 2008, vendors
have been providing patches for more than 80% of total vulnerabilities on or before their disclosure
dates. A possible reason for this is that it has become more common not to report
vulnerabilities publicly; rather, the vendors “pay” for vulnerabilities.

Figure 7.11 Patched vulnerabilities for vendors relative to disclosure dates
[Bar chart. Y-axis: percentage of patched vulnerabilities (vendors). X-axis: Microsoft, Apple, Sun, Oracle, Linux, Mozilla, Redhat, Google. Legend: < 0 days, 0 day, +7 days, +30 days, > 30 days.]

Figure 7.12 Patched vulnerabilities for products relative to disclosure dates
[Bar chart. Y-axis: percentage of patched vulnerabilities (products). X-axis labels: Win XP, Win 2000, OS X, OS X Srvr, Solaris, Lnx Krnl, Entp Lnx, RH Lnx, Int Exp, Safari, Firefox, Chrome. Legend: < 0 days, 0 day, +7 days, +30 days, > 30 days.]

7.5.2 Patching of Types of Vulnerabilities

From Figure 7.10, we can note that the vendors are generally slower in patching the PHP
and SQL vulnerabilities. Recall from Section 7.4.2 that hackers tend to quickly exploit these
groups of vulnerabilities. On the other hand, the vendors are quicker in patching the EXE
and BO vulnerabilities because these vulnerabilities are quickly exploited and thus pose a high
security risk.

7.5.3 Patching Trend for Vendors and Products

Here we study the behavior of the selected vendors in patching the vulnerabilities in their
products. Figures 7.11 and 7.12 show the patch data for selected vendors and products.
Closed-source vendors are typically profit-based organizations and have more resources
to secure their products as compared to open-source vendors. Therefore, we expect better
patching behavior from closed-source vendors. Figure 7.11 confirms this intuition as Microsoft, Apple, and Oracle release patches for about 70% or more of all the vulnerabilities on
or before disclosure dates. In comparison, we observe significantly smaller percentages and
quantities of patched vulnerabilities for open-source vendors. Applying the t-test with X = O,
Y = C, and A = B = tpd, we obtained p ≈ 0, which statistically supports the observation
that open-source vendors are slower in patching as compared to closed-source vendors.
We see a similar trend for the selected products in Figure 7.12 as for vendors in Figure
7.11. We also see that over 85% of the vulnerabilities in Windows are patched on or before the
disclosure dates. If we compare Figure 7.12 with Figure 7.7, we observe that the percentage
of zero-day patches for Windows is greater than the percentage of zero-day exploits.
Among web browsers, Figure 7.12 shows that Google Chrome is the fastest patched web
browser, followed by Apple’s Safari. The t-test with Y = Chrome and X = (Internet Explorer,
Safari, Firefox) respectively yields p = (0, 0.024, 0), confirming that our observation about
Chrome from Figure 7.12 is statistically significant. The t-test with Y = Safari and X = (Internet
Explorer, Firefox) yields p = (0.009, 0.078), confirming that Safari is patched quicker compared to
Internet Explorer, but the test fails to reject the null hypothesis of Safari against Firefox.

7.5.4 Patching Behavior: CVSS Scores

One would expect the vendors to be quicker in patching the medium and high risk vulnerabilities compared to low risk vulnerabilities. This is exactly what we observe in Figure
7.13. Open-source vendors are slower as compared to closed-source vendors for vulnerabilities
belonging to all risk categories.

Figure 7.13 Patched vulnerabilities for different CVSS scores
[Bar chart. Y-axis: percentage of patched vulnerabilities (CVSS scores). X-axis: low (L), medium (M), and high (H) CVSS categories for Microsoft, Apple, Sun, Oracle, Linux, Mozilla, Redhat, and Google. Legend: < 0 days, 0 day, +7 days, +30 days, > 30 days.]

7.5.5 Interesting Patch Rules

We present some association rules about the patching behavior of the vendors, extracted
using tpd as the class attribute.
Microsoft is quicker in patching vulnerabilities in Windows as compared to its remaining
products. The following two rules show this: (1) vnd=Microsoft prod=Windows XP typ=BO
→ tpd=0-day, (2) vnd=Microsoft prod=Internet Explorer typ=BO → tpd>+1 month.
Apple also patches vulnerabilities in its operating systems as soon as they are disclosed.
The following rule highlights this trend: vnd=Apple prod=MAC OS typ=BO → tpd=0-day.
The following rule shows that Apple generally takes about a week to fix DoS vulnerabilities even
if they are exploited on the day they are disclosed: vnd=Apple prod=MAC OS typ=DoS →
0<tpd≤+1 week. Other rules show that Apple takes about a month after disclosure to patch
the EXE and PHP vulnerabilities although they are always exploited before the patch is
released and are prevalent types of vulnerabilities.
Sun is quicker in patching all kinds of vulnerabilities except XSS. Sun fixes DoS vulnerabilities before their disclosure, which is better performance compared to Microsoft and
Apple.
For Mozilla, BO and EXE vulnerabilities are mostly patched by the day of disclosure;
however, SQL vulnerabilities are not patched for months. The following rules state this: (1)
vnd=Mozilla prod=Seamonkey typ=BO sev=M → tpd=0-day, (2) vnd=Mozilla prod=Firefox
typ=SQL sev=H → tpd>+1 month.

7.6 Patching vs. Exploitation

In this section, we compare the quickness of vendors with that of hackers. We study the trends in
tpe values of vulnerabilities present in the PE-subset.
tpe < 0 shows that a vulnerability was patched before its exploitation irrespective of
whether or not it was disclosed. The inherent time-lag between the release of patches by
vendors and their installation by end-users motivates the hackers to write exploits for vulnerabilities even after corresponding patches have been released. In our PE-subset, 31.7%
of all the vulnerabilities fall in this range.
tpe = 0 means that a given vulnerability was exploited on the day its patch was released.
21.8% of the vulnerabilities fall in this range.
tpe > 0 shows that an exploit for a given vulnerability was released before the vendor
patched it. A total of 46.4% of vulnerabilities have tpe > 0. The larger percentage of tpe > 0
compared to tpe < 0 indicates that hackers have generally been quicker in exploiting the
vulnerabilities as compared to vendors in patching. This observation affirms the result of
the first t-test presented in Section 7.5.

7.6.1 Patching vs. Exploitation: Over the Years

From Figure 7.14 we can see the same behavior as observed in Section 7.5.1: the patching
response of vendors was poor until around 2005, and a large percentage of vulnerabilities was
being exploited before being patched. In 2006, the situation was so bad that the patches for
about 38% of the vulnerabilities were released more than a month after their exploitation.
However, after 2007 a significant improvement can be observed in the vendor response. It is
encouraging to see that since 2008, over 70% of all the vulnerabilities have been patched on
or before the release date of their exploits. From the discussion in this section and Sections
7.4.1 and 7.5.1, we can conclude that the security state of the software industry has been
improving for the last 3 years.

Figure 7.14 Yearly change in patching vs. exploitation trend for tpe
[Bar chart. Y-axis: percentage of vulnerabilities for different tpe ranges. X-axis: years, '98 to '11. Legend: < 0 days, 0 day, +7 days, +30 days, > 30 days.]

Figure 7.15 Patched vulnerabilities for vendors relative to exploit dates
[Bar chart. Y-axis: percentage of vulnerabilities for tpe ranges (vendors). X-axis: Microsoft, Apple, Sun, Oracle, Linux, Mozilla, Redhat. Legend: < 0 days, 0 day, +7 days, +30 days, > 30 days.]

7.6.2 Patching vs. Exploitation: Vendors and Products

It can be seen from Figure 7.15 that for all vendors except Oracle and Sun, the percentage size
of the subgroups corresponding to tpe > 0 is greater than that for tpe < 0. The magnitude of
the difference between the percentage sizes of tpe < 0 and tpe > 0 can serve as a measure to
gauge the agility of the vendors in reference to hackers. We can see that among the vendors,
only Oracle and Sun are faster than hackers, whereas hackers are, on average, faster than all
other vendors. From Figure 7.16 we can see that, compared to hackers, Microsoft and Sun
are quicker for Windows and Solaris respectively.
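
The agility measure described above can be sketched as a signed percentage gap. Using the signed difference rather than only its magnitude is an assumption made here so that the sign shows who is faster. The function name and the sample values are hypothetical.

```python
def agility_gap(t_pe_values):
    """Percentage with t_pe < 0 minus percentage with t_pe > 0.

    Positive: patches precede exploits more often than they follow them.
    Negative: exploits precede patches more often.
    """
    total = len(t_pe_values)
    if total == 0:
        raise ValueError("no t_pe values given")
    patched_first = sum(1 for t in t_pe_values if t < 0)
    exploited_first = sum(1 for t in t_pe_values if t > 0)
    return 100.0 * (patched_first - exploited_first) / total


# Hypothetical t_pe values in days for one vendor (NOT from the data set).
gap = agility_gap([-10, -3, 0, 0, 5, -1, 12, -7])
```

Under this convention, a vendor with a positive gap would be described as faster than hackers.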

Figure 7.16 Patched vulnerabilities for products relative to exploit dates
[Bar chart. Y-axis: percentage of vulnerabilities for tpe ranges (products). X-axis labels: Win XP, Win 2000, OS X, OS X Srvr, Solaris, Lnx Krnl, Entp Lnx, RH Lnx, Int Exp, Safari, Firefox. Legend: < 0 days, 0 day, +7 days, +30 days, > 30 days.]

Figure 7.17 Patched vulns. relative to exploited vulns.: CVSS
[Bar chart. Y-axis: percentage of vulnerabilities for tpe ranges (CVSS scores). X-axis: low (L), medium (M), and high (H) CVSS categories for Microsoft, Apple, Sun, Oracle, Linux, Mozilla, and Redhat. Legend: < 0 days, 0 day, +7 days, +30 days, > 30 days.]

7.6.3 Patching vs. Exploitation: CVSS Scores

From Figure 7.17, it can be seen that for Microsoft and Apple, approximately the same
percentage of vulnerabilities belonging to medium and high risk categories are patched before
the release of their exploits. However, the percentage of vulnerabilities for which tpe = 0
is generally greater for medium risk vulnerabilities as compared to high risk vulnerabilities.
It can be seen that closed-source vendors are quicker in patching the medium and high risk
vulnerabilities compared to open-source vendors.

7.7 Implications

Observations from our study have important implications in software design, development,
deployment, and management. We separately discuss them in the following text.

7.7.1 Software Design

The analysis of access requirements, functionality, and risk level of vulnerabilities presented
in Sections 7.3.2, 7.3.4, and 7.3.3, respectively, can reveal inherent flaws in the software design
process for specific products and vendors. For instance, if a particular software series has
an unusually high number of buffer overflow vulnerabilities, then this may reflect a lack of
sanity checks in socket read processes. From our data set, we observed that DoS is the most
exploited vulnerability type in Solaris, accounting for 38.85% of all its vulnerabilities. At
the same time, only 11.7% of vulnerabilities in OS X involve DoS, which shows that Solaris
is more susceptible to DoS attacks compared to OS X. This observation
implies that Solaris developers need to take additional steps to make the design more robust
to DoS attacks.

7.7.2 Code Development Practices

The analysis of vulnerability life cycles during the evolution of a given software can reveal
insights about potential flaws in its code development and testing practices. In particular,
a correlation analysis of count of vulnerabilities across different software and vendors can
highlight important differences in code development practices. For instance, we observe
in Figure 7.11 that the percentage sizes of the subgroups corresponding to tpd > 0 for
open-source vendors (Linux, Redhat) are significantly greater than those of closed-source
vendors (Microsoft, Apple). This observation highlights an important insight into the code
development practices of open-source vendors which typically rely on contributions from a
group of volunteer developers. On the other hand, closed-source vendors have dedicated
resources to fix newly disclosed vulnerabilities as soon as possible. Therefore, open-source
vendors tend to have a slower patch response compared to closed-source vendors.

7.7.3 Customer Assessment of Vendors and Products

The analysis presented in this chapter also has direct implications for product assessment,
certification, and security recommendations to consumers. Several commercial products,
e.g., eEye Digital Security (http://www.eeye.com) and Arellia (http://www.arellia.com/),
can leverage the presented analysis for product recommendation and design of future security policies. For example, given that the exploits of vulnerabilities have already been
released, our measurement analysis showed that Sun releases patches for 96% of the vulnerabilities within a month, whereas Microsoft, Apple, and Linux provide patches for only
69%, 74%, and 65% of vulnerabilities, respectively, in the same time period. Therefore, if the patch response of a vendor is of prime importance to a customer, then the products from Sun should
be preferred. As another example, if a customer’s infrastructure has less tolerance for DoS
attacks, then it is more suitable to deploy Mac OS X, which has the lowest percentage of
DoS vulnerabilities compared to other operating systems. Likewise, if a customer requires
more robustness to buffer overflow attacks, then it is more suitable to deploy Solaris because
BO vulnerabilities account for about 20% of all the vulnerabilities in Windows and Mac but
only 13% in Solaris.

7.8 Related Work

The major focus of the work on large scale analysis of vulnerabilities has been on the development of vulnerability discovery models (VDMs). Some work has also been done to
understand the economic impacts of vulnerability disclosures in software. We briefly describe the work that has been done in these areas in relation to our work.

7.8.1 Large Scale Vulnerability Analysis

The work most relevant to ours was reported in [49], in which the authors presented a large
scale analysis of vulnerabilities keeping in view the discovery, disclosure, exploit, and patch
dates. They analyzed about 14000 vulnerabilities and showed that till 2006, the hackers had
been quicker than vendors. This observation is in accordance with what we presented in
this chapter, but we also show that in the last three years, the response of vendors has been
improving. Their work does not differentiate between vendors and types of vulnerabilities.
In [41], the authors study the life cycle of vulnerabilities from the time software is released
till the time the first vulnerability is discovered. They show that the time till the discovery
of the first vulnerability is a function of the familiarity with the system and the amount of
legacy code. In [125], the authors propose to use semantic templates to help the developers
understand the vulnerabilities and their artifacts. This work only focuses on understanding
the technical details of a disclosed vulnerability and does not study any large scale trend in
vulnerabilities.

7.8.2 Studies on Disclosure and Patching

In [26], the authors have studied the economic aspects of the quickness of vendors in releasing
patches for Internet-based vulnerabilities. In [118], the authors show that on average a vendor
loses 0.6% of the stock price with the disclosure of a vulnerability. In [28], the authors show
that a vendor with more competitors patches the vulnerabilities more quickly. In [29], they
show that the vulnerability disclosure accelerates the patch release. Although their work
is based upon a small data set of just 354 vulnerabilities disclosed till 2003, they make
a similar observation to ours that the closed-source vendors are quicker in patching the disclosed vulnerabilities. These studies, however, do not develop any insight into understanding
individual behaviors of vendors and hackers.
In [98], using a small data set, the authors claim that there is no difference between the
patching behavior of open and closed-source vendors. They make this observation because
they only consider the percentage of patched vulnerabilities as a measure of goodness of a
vendor, which is unreasonable because without analyzing the duration between disclosure
dates and patch dates, one cannot determine how active a vendor is in fixing vulnerabilities
in its products.

7.8.3 Modeling and Classification

The motivation behind the work on VDMs is to enable the prediction of quantity and timing
of vulnerability discoveries in new software. Four notable VDMs have been proposed: (1)
Anderson Thermodynamic Model [27], (2) Rescorla Linear Model [92], (3) Rescorla Exponential
Model [92], and (4) Alhazmi-Malaiya Logistic Model [24]. Another work focused on
modeling the time interval between disclosure date of vulnerabilities and their corresponding exploit, patch, and discovery dates [120]. A recent work extracted various features from
NVD and OSVDB and used SVM to predict whether a recently disclosed vulnerability will
be exploited within a given time or not [34]. Our focus, however, is not prediction but rather
the study of phases of the vulnerability life cycle in reference to different variables along with
several aspects associated with the nature of vulnerabilities.

7.9 Conclusion

In this chapter, we presented a large scale study of various aspects associated with software
vulnerabilities during their life cycle. We aggregated a large software vulnerability data set
containing 46310 vulnerabilities disclosed till 2011. Our study showed that the number of
vulnerabilities being disclosed every year has stopped increasing since 2008. We showed
that the most primitive and most exploited forms of vulnerabilities are DoS, BO, and EXE;
however, the numbers of SQL, XSS, and PHP vulnerabilities have also become significantly large. We also observed that
the percentage of remotely exploitable vulnerabilities has gradually increased to over 80% of
all the vulnerabilities. Since 2008, the vendors have become more agile in patching
the vulnerabilities and the access complexity of vulnerabilities has been increasing. However,
even then, the average time taken by hackers to exploit a vulnerability is smaller than that
taken by the vendor to patch it. Our findings highlight that patching in closed-source software is faster
compared to open-source software and at the same time the exploitation is slower.

8 Conclusion

In this thesis, I presented statistical algorithms for the design, analysis, measurement, and
modeling of RFID systems, network metrics, user authentication, and software security. For
RFID systems, I first presented a new estimator, the average run size of 1s, for estimating
RFID tag population size of arbitrarily large sizes. Using analytical plots, I showed that
our estimator has much smaller variance compared to other estimators, which makes our
scheme faster than the previous ones. Our experimental results show that our estimation
scheme is significantly faster than all prior schemes. Second, I presented our new RFID
identification scheme. It represents the first effort to formulate the Tree Walking process
mathematically and proposes a method to minimize the expected number of queries and
expected identification time. The significance of this work in terms of impact lies in that the
Tree Walking protocol is a fundamental multiple access protocol and has been standardized
as an RFID tag identification protocol. Our experimental results show that TH significantly
outperforms all prior tag identification protocols, even those that are not C1G2 compliant,
for metrics such as the number of reader queries per tag, the identification speed, and the
number of responses per tag. Third, I proposed a protocol to detect missing tag events in
the presence of unexpected tags. It represents the first effort on addressing the important
and practical problem of detecting missing tags in the presence of unexpected tags. We
have proposed a technique that our protocol uses to handle large frame sizes to ensure
compliance with the C1G2 standard. Our experimental results show that our protocols
significantly outperform all prior protocols in terms of actual reliability as well as detection
time even though the existing protocols do not handle the presence of unexpected tags.
Fourth, I proposed an accurate and efficient per-flow latency measurement scheme that
does not require packet probing and time stamping. The key novelty of this work is that we
purposely allow noise to be introduced in recording packet timing information for minimizing
storage space and use statistical techniques to denoise the recorded information to obtain
accurate latency estimates when latency of a target flow is queried. Our theoretical analysis
and experimental results show that our scheme always achieves the required reliability. Our
scheme has a much smaller processing overhead in terms of number of hash computations
and memory updates compared to existing schemes, which further require sending probe
packets or attaching time stamps to every packet. Fifth, I proposed GEAT, a gesture based
user authentication scheme for the secure unlocking of touch screen devices. Compared
with existing password-, PIN-, and pattern-based schemes, GEAT improves both the security
and usability of such devices because it is not vulnerable to shoulder surfing attacks and
smudge attacks and at the same time gestures are easier to input than passwords and PINs.
I also proposed algorithms to model multiple behaviors of a user in performing each gesture.
We implemented GEAT on real smart phones and conducted real-world experiments. Last,
I presented a large scale study of various aspects associated with software vulnerabilities
during their life cycle. Our study showed that the number of vulnerabilities being disclosed
every year has stopped increasing since 2008. We showed that the most primitive and most
exploited form of vulnerabilities are DoS, BO, and EXE; however, SQL, XSS, and PHP have
also become significantly large. Our findings also highlighted that patching of vulnerabilities
in closed-source software is faster compared to open-source software and at the same time
the exploitation is slower.
The vision of this thesis can be extended to many other similar research directions. Within
RFID systems, the theoretical framework of the proposed schemes can be leveraged to enable
other applications such as RFID tag search for product recall, dynamic RFID population
tracking, multi-category RFID estimation, and fair RFID identification for active RFID tags.
For network measurements, the theoretical framework of the proposed scheme for latency
measurement can be extended to measure other network performance metrics such as loss,
throughput, jitter, flow size distributions, quality of service, and quality of experience. For
user authentication, the feature extraction and modeling aspect of the proposed scheme can
be extended to authenticate users with the help of wearable devices and even authenticate
devices themselves in the emerging Internet of Things infrastructure.

BIBLIOGRAPHY

[1] http://en.wikipedia.org/wiki/Distribution_center.
[2] 25 leaked celebrity cell phone pics.
25-leaked-celebrity-cell-phone-pics/.

http://www.holytaco.com/

[3] CAIDA passive network monitors. http://www.caida.org/data/realtime/
passive/.
[4] The CAIDA UCSD anonymized 2011 internet traces. http://www.caida.org/data/
passive/passive_2011_dataset.xml.
[5] Cedexis. http://www.cedexis.com/.
[6] Common Vulnerabilities and Exposures, http://cve.mitre.org/.
[7] Corvil claims to minimize network latency. http://www.pcworld.idg.com.au/
article/196828/corvil_claims_minimize_network_latency/.
[8] Fibre Channel Backbone - 5 (FC-BB-5) REV 2.
[9] Forum for Incident Response and Security Teams, http://www.first.org/cvss.
[10] IEEE 1588 standard for a precision clock synchronization protocol for networked measurement and control systems.
[11] National Vulnerability Database, http://nvd.nist.gov/.
[12] Preliminary national retail security survey findings.
https://nrf.com/news/
national-retail-security-survey-retail-shrinkage-totaled-345-billion-2011.
[13] Sidera. http://www.sidera.net/.
[14] Singapore exchange (sgx) selects corvil for latency management. http://www.corvil.com/
News/Press-Releases/Singapore-Exchange-(SGX)-Selects-Corvil-For-Latenc.aspx.
[15] The symantec smartphone honey stick project. http://www.symantec.com/content/en/us/about/presskits/b-symantec-smartphone-honey-stick-project.en-us.pdf?om_ext_cid=biz_socmed_twitter_facebook_marketwire_linkedin_2012Mar_worldwide_honeystick.
[16] The Open Source Vulnerability Database, http://osvdb.org/.

[17] Tokyo stock exchange select corvil. http://www.corvil.com/News/Press-Releases/
Tokyo-Stock-Exchange-Select-Corvil.aspx.
[18] Turbobytes. http://www.turbobytes.com/.
[19] While london stock exchange selects corvil for low latency network monitoring and analysis solution.
http://low-latency.com/article/%E2%80%A6-while-london-stock-exchange-selects-corvil-low-latency-networkmonitoring-and-analysis-sol.
[20] Z-Drive R4 and R5 PCIe SSD. http://lensfire.in/2012/01/
ocz-launches-new-z-drive-r4-and-r5-pcie-ssd-ces-2012-2012/.
[21] HP expands high-performance computing offering with infiniband solutions from cisco.
http://www.hp.com/hpinfo/newsroom/press/2007/070524xa.html, May 2007.
[22] The amazon warehouses. http://imgur.com/gallery/uHZbW, 2013.
[23] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association
rules. In Proceedings of 20th International Conference on Very Large Data Bases,
pages 487–499, 1994.
[24] Omar H. Alhazmi and Yashwant K. Malaiya. Quantitative vulnerability assessment of
systems software. In Proceedings of Annual Reliability and Maintainability Symposium,
pages 615–620, 2005.
[25] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating
the frequency moments. In Proceedings of ACM STOC, pages 20–29, 1996.
[26] Ross Anderson. Why information security is hard – an economic perspective. In
Proceedings of 17th Annual Computer Security Applications Conference, pages 358–365, 2001.
[27] Ross Anderson. Security in open versus closed systems – the dance of Boltzmann,
Coase and Moore. In Proceedings of Open Source Software: Economics, Law, and
Policy Conference, June 2002.
[28] Ashish Arora, Chris Forman, Anand Nandkumar, and Rahul Telang. Competition and
patching of security vulnerabilities: An empirical analysis. Information Economics and
Policy, 22(2):164–177, 2010.
[29] Ashish Arora, Ramayya Krishnan, Rahul Telang, and Yubao Yang. An empirical
analysis of software vendors patch release behavior: Impact of vulnerability disclosure.
Information Systems Research, 21(1):115–132, 2010.

[30] Adam J. Aviv, Katherine Gibson, Evan Mossop, Matt Blaze, and Jonathan M. Smith.
Smudge attacks on smartphone touch screens. In Proceedings of 4th USENIX conference on Offensive technologies, pages 1–10, 2010.
[31] Michael Backes, Thomas R. Gross, and Guenter Karjoth. Tag identification system,
2008.
[32] Theophilus Benson, Aditya Akella, and David A. Maltz. Network traffic characteristics
of data centers in the wild. In Proceedings of IMC, pages 267–280, 2010.
[33] Charles Bordenave, David McDonald, and Alexandre Proutiere. Performance of random medium access control, an asymptotic approach. In Proceedings of ACM SIGMETRICS, 2008.
[34] Mehran Bozorgi, Lawrence K. Saul, Stefan Savage, and Geoffrey M. Voelker. Beyond
heuristics: Learning to classify vulnerabilities and predict exploits. In Proceedings of
16th International Conference on Knowledge discovery and data mining, pages
105–114, 2010.
[35] John I. Capetanakis. Tree algorithms for packet broadcast channels. IEEE Transactions on Information Theory, 25:505–515, 1979.
[36] Bogdan Carbunar, Murali Krishna Ramanathan, Mehmet Koyuturk, Christoph Hoffmann, and Ananth Grama. Redundant reader elimination in RFID systems. In Proceedings of IEEE Communications Society Conference on SECON, pages 576–580,
2005.
[37] Jae-Ryong Cha and I. Jae-Hyun Kim. Novel anti-collision algorithms for fast object
identification in RFID system. In Proceedings of International Conference on Parallel and Distributed Systems, 2005.
[38] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines.
ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27, 2011.
[39] Binbin Chen, Ziling Zhou, and Haifeng Yu. Understanding RFID counting protocols.
In Proceedings of the 19th annual international conference on Mobile computing &
networking, pages 291–302. ACM, 2013.
[40] Yan Chen, David Bindel, Hanhee Song, and Randy H. Katz. An algebraic approach to
practical and scalable overlay network monitoring. In Proceedings of ACM SIGCOMM,
pages 55–66, 2004.
[41] Sandy Clark, Stefan Frei, Matt Blaze, and Jonathan Smith. Familiarity breeds contempt: The honeymoon effect and the role of legacy code in zero-day vulnerabilities. In
Proceedings of 26th International Annual Computer Security Applications Conference,
pages 251–260, 2010.
[42] Mauro Conti, Irina Zachia-Zlatea, and Bruno Crispo. Mind how you answer me!:
transparently authenticating the user of a smartphone when answering or placing a call.
In Proceedings of ACM Symposium on Information, Computer and Communications
Security, pages 249–259, 2011.
[43] Graham Cormode and S Muthukrishnan. An improved data stream summary: the
count-min sketch and its applications. Journal of Algorithms, 55(1):58–75, 2005.
[44] Robert Dorfman. The detection of defective members of large populations. Annals of
Mathematical Statistics, 14:436–440, 1943.
[45] Nick Duffield. Simple network performance tomography. In Proceedings of ACM IMC,
pages 210–215, 2003.
[46] Klaus Finkenzeller. RFID Handbook: Fundamentals and Applications in Contactless
Smart Cards, Radio Frequency Identification and Near-Field Communication. Wiley,
2010.
[47] Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base
applications. Journal of Computer and System Sciences, 31(2):182–209, 1985.
[48] Sally Floyd and Van Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking, 1(4):397–413, 1993.
[49] Stefan Frei, Martin May, Ulrich Fiedler, and Bernhard Plattner. Large-scale vulnerability analysis. In Proceedings of 2006 SIGCOMM workshop on Large-Scale Attack
Defense, pages 131–138, September 2006.
[50] Stefan Frei, Bernhard Tellenbach, and Bernhard Plattner. 0-day patch exposing vendors (in)security performance. In Proceedings of Black Hat Technical Security Conference, volume 14, 2009.
[51] Karsten Fyhn, Rasmus Melchior Jacobsen, Petar Popovski, and Torben Larsen. Fast
capture – recapture approach for mitigating the problem of missing RFID tags. IEEE
Transactions on Mobile Computing, 11(3):518–528, 2012.
[52] D. Gafurov, K. Helkala, and T. Søndrol. Biometric gait authentication using accelerometer sensor. Journal of computers, 1(7):51–59, 2006.
[53] Hao Han, Bo Sheng, Chiu C. Tan, Qun Li, Weizhen Mao, and Sanglu Lu. Counting
RFID tags efficiently and anonymously. In Proceedings of IEEE International Conference on Computer Communications, 2010.
[54] Nan Hua, Eric Norige, Sailesh Kumar, and Bill Lynch. Non-crypto hardware hash functions for high performance networking ASICs. In Proceedings of ACM/IEEE ANCS,
pages 156–166, 2011.
[55] EPCGlobal Inc. Radio-Frequency Identity Protocols Class-1 Generation-2 UHF RFID
Protocol for Communications at 860 MHz–960 MHz. EPCGlobal Inc, 1.2.0 edition,
2008.
[56] Rasmus Jacobsen, Karsten Fyhn Nielsen, Petar Popovski, and Torben Larsen. Reliable
identification of RFID tags using multiple independent reader sessions. In Proceedings of
IEEE International Conference on RFID, pages 64–71, 2009.
[57] Rajendra K. Jain, Dah-Ming W. Chiu, and William R. Hawe. A quantitative measure of fairness and discrimination for resource allocation in shared computer systems.
Technical report, Digital Equipment Corporation, 1984.
[58] Joe H. Ward, Jr. Hierarchical grouping to optimize an objective function. Journal of
the American statistical association, 58(301):236–244, 1963.
[59] S. Sathiya Keerthi and Chih-Jen Lin. Asymptotic behaviors of support vector machines
with gaussian kernel. Neural computation, 15(7):1667–1689, 2003.
[60] Kevin Killourhy and Roy Maxion. Why did my detector do that?! In Proceedings of
Recent Advances in Intrusion Detection, pages 256–276, 2010.
[61] Murali Kodialam and Thyaga Nandagopal. Fast and reliable estimation schemes in
RFID systems. In Proceedings of 12th International Conference on Mobile Computing and Networking, pages 322–333, 2006.
[62] Murali Kodialam, Thyaga Nandagopal, and Wing Cheong Lau. Anonymous tracking
using RFID tags. In Proceedings of IEEE International Conference on Computer
Communications, 2007.
[63] Ramana Rao Kompella, Kirill Levchenko, Alex C. Snoeren, and George Varghese. Every microsecond counts: tracking fine-grain latencies with a lossy difference aggregator.
In Proceedings of ACM SIGCOMM, pages 255–266, 2009.
[64] Abhishek Kumar, Jun (Jim) Xu, Jia Wang, Oliver Spatscheck, and Li (Erran) Li.
Space-code bloom filter for efficient per-flow traffic measurement. In Proceedings of
IEEE INFOCOM, pages 1762–1773, 2004.
[65] J.R. Kwapisz, G.M. Weiss, and S.A. Moore. Cell phone-based biometric identification.
In Proceedings of IEEE International Conference on Biometrics: Theory Applications and Systems, pages 1–7, 2010.
[66] Ching Law, Kayi Lee, and Kai-Yeung Siu. Efficient memoryless protocol for tag identification. In Proceedings of 4th International Workshop on Discrete Algorithms and
Methods for Mobile Computing and Communications, 2000.

[67] Chun Hee Lee and Chin-Wan Chung. Efficient storage scheme and query processing for supply chain management using RFID. In Proceedings of ACM International
Conference on Management of data, pages 291–302, 2008.
[68] Myungjin Lee, Nick Duffield, and Ramana Rao Kompella. Not all microseconds are
equal: fine-grained per-flow measurements with reference latency interpolation. In
Proceedings of ACM SIGCOMM, pages 27–38, 2010.
[69] Myungjin Lee, Nick Duffield, and Ramana Rao Kompella. A scalable architecture
for maintaining packet latency measurements. In Proceedings of IMC, pages 101–114,
2012.
[70] Myungjin Lee, Sharon Goldberg, Ramana Rao Kompella, and George Varghese. Fine-grained latency and loss measurements in the presence of reordering. In Proceedings of
the ACM SIGMETRICS joint international conference on Measurement and modeling
of computer systems, pages 329–340. ACM, 2011.
[71] Tao Li, Shigang Chen, and Yibei Ling. Identifying the missing tags in a large RFID
system. In Proceedings of MobiHoc, pages 1–10, 2010.
[72] Tao Li, Samuel Wu, Shigang Chen, and Mark Yang. Energy efficient algorithms for
the RFID estimation problem. In Proceedings of IEEE International Conference on
Computer Communications, 2010.
[73] Xiulong Liu, Keqiu Li, Geyong Min, Yanming Shen, A Liu, and Wenyu Qu. Completely
pinpointing the missing RFID tags in a time-efficient way. IEEE Transactions on
Computers, pages 1–11, 2013.
[74] Yi Lu, Andrea Montanari, Balaji Prabhakar, Sarang Dharmapurikar, and Abdul Kabbani. Counter braids: a novel counter architecture for per-flow measurement. In
Proceedings of ACM SIGMETRICS, pages 121–132, 2008.
[75] Alexander De Luca, Alina Hang, Frederik Brudy, Christian Lindner, and Heinrich
Hussmann. Touch me once and I know it’s you!: implicit authentication based on
touch screen patterns. In Proceedings of ACM Annual Conference on Human Factors
in Computing Systems (SIGCHI), pages 987–996, 2012.
[76] Wen Luo, Shigang Chen, Tao Li, and Yan Qiao. Probabilistic missing-tag detection
and energy-time tradeoff in large-scale RFID systems. In Proceedings of MobiHoc,
pages 95–104, 2012.
[77] Bill Lynch and Sailesh Kumar. Smart memory for high performance network packet
forwarding. In Proceedings of Hot Chips Symposium, 2010.

[78] J. Mantyjarvi, M. Lindholm, E. Vildjiounaite, S.M. Makela, and HA Ailisto. Identifying users of portable devices from gait pattern with accelerometers. In Proceedings
of IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages 973–976, 2005.
[79] Richard Martin. Wall street’s quest to process data at the speed of light. Information
Week, 4(21), 2007.
[80] Michael J. Miller. Bandwidth engine serial memory chip breaks 2 billion accesses/sec.
In Proceedings of Hot Chips Symposium, 2011.
[81] Fabian Monrose, Michael K. Reiter, and Susanne Wetzel. Password hardening based
on keystroke dynamics. In Proceedings of ACM CCS, pages 73–82, 1999.
[82] Jihoon Myung and Wonjun Lee. Adaptive splitting protocols for RFID tag collision
arbitration. In Proceedings of 7th ACM International Symposium on Mobile Ad Hoc
Networking and Computing, pages 202–213, 2006.
[83] Vinod Namboodiri and Lixin Gao. Energy-aware tag anticollision protocols for RFID
systems. In Proceedings of 5th IEEE International Conference on Pervasive Computing and Communications, pages 23–36, 2007.
[84] Badri Nath, Franklin Reynolds, and Roy Want. RFID technology and applications.
IEEE Pervasive Computing, 5:22–24, 2006.
[85] Aditya Nemmaluri, Mark D. Corner, and Prashant Shenoy. Sherlock: Automatically
locating objects for humans. In Proceedings of International Conference on Mobile
Systems, Applications, and Services, pages 187–198, 2008.
[86] Lionel M. Ni, Yunhao Liu, Yiu Cho Lau, and Abhishek P. Patil. LANDMARC: Indoor
location sensing using active RFID. Wireless Networks, 10:701–710, 2004.
[87] Lei Pan and Hongyi Wu. Smart trend-traversal: A low delay and energy tag arbitration
protocol for large RFID systems. In Proceedings of 30th IEEE International Conference
on Computer Communications, 2009.
[88] Ruoming Pang, Mark Allman, Mike Bennett, Jason Lee, Vern Paxson, and Brian
Tierney. A first look at modern enterprise traffic. In Proceedings of ACM IMC, pages
15–28, 2005.
[89] Chen Qian, Yunhuai Liu, Hoilun Ngan, and Lionel M. Ni. ASAP: Scalable identification
and counting for contactless RFID systems. In Proceedings of 30th IEEE International
Conference on Distributed Computing Systems, pages 52–61, 2010.

[90] Chen Qian, Hoilun Ngan, and Yunhao Liu. Cardinality estimation for large-scale RFID
systems. In Proceedings of 6th IEEE PerCom, pages 30–39, 2008.
[91] M.V. Ramakrishna, E. Fu, and E. Bahcekapili. Efficient hardware hashing functions
for high performance computers. IEEE Transactions on Computers, 46(12):1378–1381,
1997.
[92] Eric Rescorla. Is finding security holes a good idea? IEEE Security and Privacy,
3(1):14–19, January 2005.
[93] Mark Roberti. A 5-cent breakthrough. RFID Journal, 5(6), 2006.
[94] Walter A. Rosenkrantz and Donald Towsley. On the instability of slotted aloha multiaccess algorithm. IEEE Transactions on Automatic Control, 28(10):994–996, 1983.
[95] Napa Sae-Bae, Kowsar Ahmed, Katherine Isbister, and Nasir Memon. Biometric-rich
gestures: a novel approach to authentication on multi-touch device. In Proceedings of
ACM Annual Conference on Human Factors in Computing Systems (SIGCHI), 2012.
[96] Florian Schaub, Ruben Deyhle, and Michael Weber. Password entry usability and
shoulder surfing susceptibility on different smartphone platforms. In Proceedings of
11th International Conference on Mobile and Ubiquitous Multimedia, 2012.
[97] Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C.
Williamson. Estimating the support of a high-dimensional distribution. Neural computation, 13(7):1443–1471, 2001.
[98] Guido Schryen. A comprehensive and comparative analysis of the patching behavior
of open source and closed source software vendors. In Proceedings of 5th International
Conference on IT Security Incident Management and IT Forensics, pages 153–168,
2009.
[99] E. Eugene Schultz, David S. Brown, and Thomas A. Longstaff. Responding to Computer Security Incidents: Guidelines for Incident Handling. Lawrence Livermore National Laboratory, Livermore, CA, 1990.
[100] Philips Semiconductors. SL2 ICS11 I.Code UID Smart Label IC Functional Specification Datasheet. http://www.advanide.com/datasheets/sl2ics11.pdf, 2004.
[101] Vahid Shah-Mansouri and Vincent W.S. Wong. Anonymous cardinality estimation in
RFID systems with multiple readers. In Proceedings of IEEE Global Communications
Conference, 2009.
[102] Muhammad Shahzad and Alex X. Liu. Every bit counts – fast and scalable RFID
estimation. In Proceedings of 18th International Conference on Mobile Computing
and Networking (Mobicom), pages 365–376, 2012.
[103] Muhammad Shahzad and Alex X Liu. Every bit counts: Fast and scalable RFID
estimation. In ACM International Conference on Mobile Computing and Networking
(MobiCom), 2012.
[104] Muhammad Shahzad and Alex X Liu. Probabilistic optimal tree hopping for RFID
identification. In ACM International Conference on Measurement and Modeling of
Computer Systems (SIGMETRICS), 2013.
[105] Muhammad Shahzad and Alex X. Liu. Probabilistic optimal tree hopping protocol for
RFID identification. In submission, ACM International Conference on Measurement
and Modeling of Computer Systems (SIGMETRICS), 2013.
[106] Muhammad Shahzad and Alex X Liu. Fast and accurate estimation of RFID tags.
IEEE/ACM Transactions on Networking (ToN), 2014.
[107] Muhammad Shahzad and Alex X Liu. Noise can help: Accurate and efficient per-flow
latency measurement without packet probing and time stamping. In ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), 2014.
[108] Muhammad Shahzad and Alex X Liu. Probabilistic optimal tree hopping for RFID
identification. IEEE/ACM Transactions on Networking (ToN), 2014.
[109] Muhammad Shahzad and Alex X Liu. Expecting the unexpected: Fast and reliable
detection of missing RFID tags in the wild. In IEEE International Conference on
Computer Communications (INFOCOM), 2015.
[110] Muhammad Shahzad, Alex X Liu, and Arjmand Samuel. Secure unlocking of mobile
touch screen devices by simple gestures: you can see it but you can not do it. In ACM
International Conference on Mobile Computing and Networking (MobiCom), 2013.
[111] Muhammad Shahzad, Muhammad Zubair Shafiq, and Alex X Liu. A large scale exploratory analysis of software vulnerability life cycles. In International Conference on
Software Engineering (ICSE), 2012.
[112] Muhammad Shahzad, Saira Zahid, and Muddassar Farooq. A hybrid GA-PSO fuzzy
system for user identification on smart phones. In Proceedings of 11th Annual Conference on Genetic and Evolutionary Computation (GECCO), pages 1617–1624, 2009.
[113] Alan D Smith, Amber A Smith, and David L Baker. Inventory management shrinkage and employee anti-theft approaches. International Journal of Electronic Finance,
5(3):209–234, 2011.
[114] Chiu Chiang Tan, Bo Sheng, and Qun Li. How to monitor for missing RFID tags. In
Proceedings of IEEE ICDCS, pages 295–302, 2008.
[115] Andrew S. Tanenbaum. Computer Networks. Prentice-Hall, 2002.
[116] ShaoJie Tang, Jing Yuan, Xiang-Yang Li, Guihai Chen, Yunhao Liu, and JiZhong
Zhao. Raspberry: A stable reader activation scheduling protocol in multi-reader RFID
systems. In Proceedings of IEEE International Conference on Network Protocols,
pages 304–313, 2009.
[117] F. Tari, A. Ozok, and S.H. Holden. A comparison of perceived and real shoulder-surfing
risks between alphanumeric and graphical passwords. In Proceedings of SOUPS, pages
56–66, 2006.
[118] Rahul Telang and Sunil Wattal. An empirical analysis of the impact of software vulnerability announcements on firm stock price. IEEE Transactions on Software Engineering, 33(8):544–557, 2007.
[119] Robert Tibshirani, Guenther Walther, and Trevor Hastie. Estimating the number of
clusters in a data set via the gap statistic. Journal of the Royal Statistical Society:
Series B (Statistical Methodology), 63(2):411–423, 2001.
[120] Géraldine Vache. Vulnerability analysis for a quantitative security evaluation. In
Proceedings of 3rd International Symp. on Empirical Software Engineering and Measurement, 2009.
[121] Harald Vogt. Efficient object identification with passive RFID tags. Pervasive Computing, 2414:98–113, 2002.
[122] James Waldrop, Daniel W. Engels, and Sanjay E. Sarma. Colorwave: A MAC for RFID
reader networks. In Proceedings of IEEE Wireless Communications and Networking,
pages 1701–1704, 2003.
[123] Chong Wang, Hongyi Wu, and Nian-Feng Tzeng. RFID-based 3-d positioning schemes.
In Proceedings of IEEE International Conference on Computer Communications,
pages 1235–1243, 2007.
[124] Ian H. Witten, Eibe Frank, Len Trigg, Mark Hall, Geoffrey Holmes, and Sally Jo
Cunningham. Weka: Practical Machine Learning Tools and Techniques with Java
Implementations. Citeseer, 1999.
[125] Yan Wu, Harvey Siy, and Robin Gandhi. Empirical results on the study of software
vulnerabilities: NIER track. In Proceedings of 33rd International Conference on
Software Engineering, pages 964–967, 2011.
[126] Saira Zahid, Muhammad Shahzad, Syed Ali Khayam, and Muddassar Farooq.
Keystroke-based user identification on smart phones. In Proceedings of 12th International Symposium on Recent Advances in Intrusion Detection (RAID), pages 224–243,
2009.
[127] Andrea Zanella. Estimating collision set size in framed slotted aloha wireless networks
and RFID systems. IEEE Communications Letters, 16(3):300–303, 2012.
[128] Rui Zhang, Yunzhong Liu, Yanchao Zhang, and Jinyuan Sun. Fast identification of
the missing tags in a large RFID system. In Proceedings of IEEE SECON, 2011.
[129] Bin Zhen, Mamoru Kobayashi, and Masashi Shimizu. Framed ALOHA for multiple
RFID objects identification. IEICE Transactions on Communications, 88:991–999,
2005.
[130] Nan Zheng, Kun Bai, Hai Huang, and Haining Wang. You are how you touch: User
verification on smartphones via tapping behaviors. Technical report, College of William
and Mary, 2012.
[131] Zongheng Zhou, Himanshu Gupta, Samir R. Das, and Xianjin Zhu. Slotted scheduled tag access in multi-reader RFID systems. In Proceedings of IEEE International
Conference on Network Protocols, pages 61–70, 2007.
