PRIVACY AND INTEGRITY PRESERVING
COMPUTATION IN DISTRIBUTED SYSTEMS

By
Fei Chen

A DISSERTATION
Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
Computer Science
2011

ABSTRACT
PRIVACY AND INTEGRITY PRESERVING COMPUTATION
IN DISTRIBUTED SYSTEMS
By
Fei Chen
Preserving privacy and integrity of private data has become core requirements for many
distributed systems across diﬀerent parties. In these systems, one party may try to compute
or aggregate useful information from the private data of other parties. However, this party is
not be fully trusted by other parties. Therefore, it is important to design security protocols
for preserving such private data. Furthermore, one party may want to query the useful
information computed from such private data. However, query results may be modiﬁed by
a malicious party. Thus, it is important to design query protocols such that query result
integrity can be veriﬁed.
In this dissertation, we study four important privacy and integrity preserving problems for
diﬀerent distributed systems. For two-tiered sensor networks, where storage nodes serve as
an intermediate tier between sensors and a sink for storing data and processing queries, we
proposed SafeQ, a protocol that prevents compromised storage nodes from gaining information from both sensor collected data and sink issued queries, while it still allows storage nodes
to process queries over encrypted data and the sink to detect compromised storage nodes
when they misbehave. For cloud computing, where a cloud provider hosts the data of an
organization and replies query results to the customers of the organization, we propose novel
privacy and integrity preserving schemes for multi-dimensional range queries such that the
cloud provider can process encoded queries over encoded data without knowing the actual

values, and customers can verify the integrity of query results with high probability. For distributed ﬁrewall policies, we proposed the ﬁrst privacy-preserving protocol for cross-domain
ﬁrewall policy optimization. For any two adjacent ﬁrewalls belonging to two diﬀerent administrative domains, our protocol can identify in each ﬁrewall the rules that can be removed
because of the other ﬁrewall. For network reachability, one of the key factors for capturing
end-to-end network behavior and detecting the violation of security policies, we proposed
the ﬁrst cross-domain privacy-preserving protocol for quantifying network reachability.

ACKNOWLEDGMENTS
I am extremely grateful to my advisor Dr. Alex X. Liu. He is not only an excellent
advisor, an outstanding researcher, but also a great friend. This thesis would not be possible
without his tremendous help. He always tries his best to guide me through every perspective
of my graduate study, and give tremendous support to make me successful in both study
and research. He not only teaches me how to identify problems, how to solve problems,
and how to build systems, but also helps me improve my writing skills, speaking skills, and
communication skills. Numerous times, he helps me set milestones together, and guides me
to make progress steadily. When I meet diﬃcult problems in my research, his encouragement
help me to gain more conﬁdence, and to come up with solutions.
I would like to thank other members in my thesis committee, Dr. Eric Torng, Dr. Richard
Enbody, and Dr. Hayder Radha. Dr. Torng not only guided my thesis, but also gave me
great help on ﬁnding travel funds such that I can present my papers in many conferences. Dr.
Enbody and Dr. Radha gave me many valuable suggestions and feedbacks on my qualifying
report and comprehensive proposal, which helped improve my thesis signiﬁcantly.
I thank my collaborator and friend Bezawada Bruhadeshwar. He makes signiﬁcant contributions to this thesis work. My work with him is both happy and fruitful. He actively
discussed with me the works of privacy preserving ﬁrewall optimization and privacy preserving network qualiﬁcations and provide signiﬁcant help to move the project forward.
I really thank my wife Xiaoshu Wu and my parents for her great love and tremendous
support in every aspect of my life in Michigan State University. They support me to pursue
my own goals with conﬁdence and to overcome obstacles with courage. I extremely grateful
for my wife taking great care of me when I had serious health problems and stayed in hospital.
This thesis is dedicated to my wife and my parents.

iv

TABLE OF CONTENTS
LIST OF TABLES

ix

LIST OF FIGURES

x

1 Introduction
1.1 Privacy and Integrity Preserving Queries for
Sensor Networks . . . . . . . . . . . . . . . . .
1.2 Privacy and Integrity Preserving Queries for
Cloud Computing . . . . . . . . . . . . . . . .
1.3 Privacy Preserving Optimization for Firewall
Policies . . . . . . . . . . . . . . . . . . . . . .
1.4 Privacy Preserving Quantiﬁcation of Network
Reachability . . . . . . . . . . . . . . . . . . .
1.5 Structure of the paper . . . . . . . . . . . . .

1
. . . . . . . . . . . . . . . . .

2

. . . . . . . . . . . . . . . . .

3

. . . . . . . . . . . . . . . . .

4

. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .

4
5

2 Privacy and Integrity Preserving Range Queries in Sensor Networks
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.2 Technical Challenges . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.3 Limitations of Prior Art . . . . . . . . . . . . . . . . . . . . . . .
2.1.4 Our Approach and Key Contributions . . . . . . . . . . . . . . .
2.1.5 Summary of Experimental Results . . . . . . . . . . . . . . . . . .
2.2 Models and Problem Statement . . . . . . . . . . . . . . . . . . . . . . .
2.2.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.2 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3 Privacy for 1-dimensional Data . . . . . . . . . . . . . . . . . . . . . . .
2.3.1 Preﬁx Membership Veriﬁcation . . . . . . . . . . . . . . . . . . .
2.3.2 The Submission Protocol . . . . . . . . . . . . . . . . . . . . . . .
2.3.3 The Query Protocol . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.4 Query Processing . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.4 Integrity for 1-dimensional Data . . . . . . . . . . . . . . . . . . . . . . .
2.4.1 Integrity Scheme Using Merkle Hash Trees . . . . . . . . . . . . .
2.4.2 Integrity Scheme Using Neighborhood Chains . . . . . . . . . . .
2.5 Queries over Multi-dimensional Data . . . . . . . . . . . . . . . . . . . .
2.5.1 Privacy for Multi-dimensional Data . . . . . . . . . . . . . . . . .
2.5.2 Integrity for Multi-dimensional Data . . . . . . . . . . . . . . . .
v

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

7
7
7
8
9
9
10
10
10
12
13
13
14
16
17
18
19
20
23
24
25
26

2.6
2.7
2.8

2.9

SafeQ Optimization . . . . . . . .
Queries in Event-driven Networks
Complexity and Security Analysis
2.8.1 Complexity Analysis . . .
2.8.2 Privacy Analysis . . . . .
2.8.3 Integrity Analysis . . . . .
Experimental Results . . . . . . .
2.9.1 Evaluation Methodology .
2.9.2 Evaluation Setup . . . . .
2.9.3 Evaluation Results . . . .

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.

3 Privacy and Integrity Preserving Range Queries for Cloud Computing
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1.2 Technical Challenges . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1.3 Limitations of Previous Work . . . . . . . . . . . . . . . . . . . . .
3.1.4 Our Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1.5 Key Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.1.6 Summary of Experimental Results . . . . . . . . . . . . . . . . . . .
3.2 Models and Problem Statement . . . . . . . . . . . . . . . . . . . . . . . .
3.2.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2.2 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.2.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.3 Privacy Preserving for 1-dimensional Data . . . . . . . . . . . . . . . . . .
3.3.1 The Order-Preserving Hash-based Function . . . . . . . . . . . . .
3.3.2 The Privacy-Preserving Scheme . . . . . . . . . . . . . . . . . . . .
3.3.3 Optimization of the Order-Preserving Hash-based Function . . . . .
3.3.4 Analysis of Information Leakage . . . . . . . . . . . . . . . . . . . .
3.4 Integrity Preserving for 1-dimensional Data . . . . . . . . . . . . . . . . .
3.4.1 Bit Matrices and Local Bit Matrices . . . . . . . . . . . . . . . . .
3.4.2 The Integrity-Preserving Scheme . . . . . . . . . . . . . . . . . . .
3.5 Finding Optimal Parameters . . . . . . . . . . . . . . . . . . . . . . . . . .
3.5.1 Detection Probability . . . . . . . . . . . . . . . . . . . . . . . . . .
3.5.2 Optimal Bucket Partition . . . . . . . . . . . . . . . . . . . . . . .
3.6 Query Over Multi-dimensional Data . . . . . . . . . . . . . . . . . . . . . .
3.6.1 Privacy for Multi-dimensional Data . . . . . . . . . . . . . . . . . .
3.6.2 Integrity for Multi-dimensional Data . . . . . . . . . . . . . . . . .
3.7 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3.7.1 Evaluation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . .
vi

.
.
.
.
.
.
.
.
.
.

29
32
34
34
35
36
36
36
37
38

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

47
47
47
48
49
49
50
50
51
51
51
52
52
54
56
57
58
59
61
62
64
65
66
68
68
69
72
72

3.7.2
3.7.3

Results for 1-dimensional Data . . . . . . . . . . . . . . . . . . . . .
Results for Multi-dimensional Data . . . . . . . . . . . . . . . . . . .

4 Privacy Preserving Cross-Domain Cooperative Firewall Optimization
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . .
4.1.2 Limitation of Prior Work . . . . . . . . . . . . . . . . . . . . . . . .
4.1.3 Cross-domain Inter-ﬁrewall Optimization . . . . . . . . . . . . . . .
4.1.4 Technical Challenges and Our Approach . . . . . . . . . . . . . . .
4.1.5 Key Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2 System and Threat Models . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.2.2 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.3 Privacy-Preserving Inter-Firewall Redundancy Removal . . . . . . . . . . .
4.3.1 Privacy-Preserving Range Comparison . . . . . . . . . . . . . . . .
4.3.2 Processing Firewall F W1 . . . . . . . . . . . . . . . . . . . . . . . .
4.3.3 Processing Firewall F W2 . . . . . . . . . . . . . . . . . . . . . . . .
4.3.4 Single-Rule Coverage Redundancy Detection . . . . . . . . . . . . .
4.3.5 Multi-Rule Coverage Redundancy Detection . . . . . . . . . . . . .
4.3.6 Identiﬁcation and Removal of Redundant Rules . . . . . . . . . . .
4.4 Firewall Update After Optimization . . . . . . . . . . . . . . . . . . . . . .
4.5 Security and Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . .
4.5.1 Security Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.5.2 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . .
4.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.6.1 Evaluation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.6.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.6.3 Eﬀectiveness and Eﬃciency on Real Policies . . . . . . . . . . . . .
4.6.4 Eﬃciency on Synthetic Policies . . . . . . . . . . . . . . . . . . . .

73
75

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

78
78
78
79
80
81
82
83
83
84
84
85
87
90
93
95
98
100
100
100
102
103
103
104
105
111

5 Privacy Preserving Cross-Domain Network Reachability Quantiﬁcation
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . .
5.1.2 Limitation of Prior Art . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.3 Cross-Domain Quantiﬁcation of Network Reachability . . . . . . . . .
5.1.4 Technical Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.5 Our Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.6 Summary of Experimental Results . . . . . . . . . . . . . . . . . . . .
5.1.7 Key Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . .

113
113
113
114
116
117
118
119
119

vii

5.2

5.3

5.4

5.5
5.6

Problem Statement and Threat Model . . . . . . . . . . .
5.2.1 Problem Statement . . . . . . . . . . . . . . . . . .
5.2.2 Threat Model . . . . . . . . . . . . . . . . . . . . .
Privacy-Preserving Quantiﬁcation of Network Reachability
5.3.1 Privacy-Preserving Range Intersection . . . . . . .
5.3.2 ACL Preprocessing . . . . . . . . . . . . . . . . . .
5.3.3 ACL Encoding and Encryption . . . . . . . . . . .
5.3.4 ACL Comparison . . . . . . . . . . . . . . . . . . .
Security and Complexity Analysis . . . . . . . . . . . . . .
5.4.1 Security Analysis . . . . . . . . . . . . . . . . . . .
5.4.2 Complexity Analysis . . . . . . . . . . . . . . . . .
Optimization . . . . . . . . . . . . . . . . . . . . . . . . .
Experimental Results . . . . . . . . . . . . . . . . . . . . .
5.6.1 Eﬃciency on Real ACLs . . . . . . . . . . . . . . .
5.6.2 Eﬃciency on Synthetic ACLs . . . . . . . . . . . .

6 Related Work
6.1 Secure Multiparty Computation . . . . . . . . . . . . . .
6.2 Privacy and Integrity Preserving in WSNs . . . . . . . .
6.3 Privacy and Integrity Preserving in DAS . . . . . . . . .
6.4 Firewall Redundancy Removal and Collaborative Firewall
6.5 Network Reachability Quantiﬁcation . . . . . . . . . . .

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

119
119
121
121
122
124
126
130
133
133
135
136
137
137
138

145
. . . . . . . . . . . 145
. . . . . . . . . . . 146
. . . . . . . . . . . 147
Enforcement in VPN150
. . . . . . . . . . . 151

7 Conclusions and Future Work

154

APPENDICES
A
Analysis of SafeQ Optimization . . .
∗
B
Properties of fk and Their Proof . .
C
Calculation of Detection Probability
D
Proof of Theorems 3 and 4 . . . . .

156
156
159
160
162

BIBLIOGRAPHY

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

.
.
.
.

164

viii

LIST OF TABLES
2.1

Summary of notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

12

2.2

Complexity analysis of SafeQ . . . . . . . . . . . . . . . . . . . . . . . . . .

35

4.1

Redundancy ratios for 5 real ﬁrewall groups . . . . . . . . . . . . . . . . . . 105

ix

LIST OF FIGURES
2.1

Architecture of two-tired sensor networks . . . . . . . . . . . . . . . . . . . .

11

2.2

The idea of SafeQ for preserving privacy . . . . . . . . . . . . . . . . . . . .

14

2.3

Preﬁx membership veriﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . .

16

2.4

Merkle hash tree for 8 data items . . . . . . . . . . . . . . . . . . . . . . . .

20

2.5

Data integrity veriﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . . .

22

2.6

An example neighborhood chain . . . . . . . . . . . . . . . . . . . . . . . . .

23

2.7

Merkle hash trees for two-dimensional data . . . . . . . . . . . . . . . . . . .

27

2.8

A 2-dimensional neighborhood chain . . . . . . . . . . . . . . . . . . . . . .

28

2.9

An example Bloom ﬁlter . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

31

2.10 Example idle periods and data submissions . . . . . . . . . . . . . . . . . . .

34

2.11 Average power consumption per submission for a sensor (A) . . . . . . . . .

41

2.12 Average power consumption per submission for a sensor (B) . . . . . . . . .

42

2.13 Average power consumption per query response for a storage node (A) . . .

43

2.14 Average power consumption per query response for a storage node (B) . . .

44

2.15 Average space consumption for a storage node (A) . . . . . . . . . . . . . . .

45

2.16 Average space consumption for a storage node (B) . . . . . . . . . . . . . . .

46

3.1

The DAS model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

51

3.2

Basic idea of privacy-preserving scheme . . . . . . . . . . . . . . . . . . . . .

54

3.3

Basic idea of integrity-preserving scheme . . . . . . . . . . . . . . . . . . . .

60

3.4

Example bit matrix and local bit matrices . . . . . . . . . . . . . . . . . . .

61

3.5

The example 2-dimensional bit matrix and local bit matrices . . . . . . . . .

70

3.6

Eﬀectiveness of optimal partition algorithm

. . . . . . . . . . . . . . . . . .

74

3.7

Correctness of integrity-preserving scheme . . . . . . . . . . . . . . . . . . .

74

3.8

Data processing time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

76

x

3.9

Space cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

76

3.10 Query processing time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

77

4.1

Eﬀect of the number of rules on the throughput with frame size 128 bytes [2]

79

4.2

Example inter-ﬁrewall redundant rules . . . . . . . . . . . . . . . . . . . . .

81

4.3

Preﬁx membership veriﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . .

86

4.4

The Conversion of F W1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

88

4.5

The Conversion of F W2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

91

4.6

Comparison of Two Firewalls . . . . . . . . . . . . . . . . . . . . . . . . . .

94

4.7

Three largest rules generated from Figure 4.4(d) . . . . . . . . . . . . . . . .

96

4.8

Identiﬁcation of redundant rules in F W2 . . . . . . . . . . . . . . . . . . . .

99

4.9

Processing F W1 on real ﬁrewalls

. . . . . . . . . . . . . . . . . . . . . . . . 106

4.10 Processing F W2 on real ﬁrewalls

. . . . . . . . . . . . . . . . . . . . . . . . 107

4.11 Comparing two real ﬁrewalls . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.12 Processing F W1 on synthetic ﬁrewalls . . . . . . . . . . . . . . . . . . . . . . 109
4.13 Processing F W2 on synthetic ﬁrewalls . . . . . . . . . . . . . . . . . . . . . . 110
4.14 Comparing two synthetic ﬁrewalls . . . . . . . . . . . . . . . . . . . . . . . . 112
5.1

An example of end-to-end network reachability . . . . . . . . . . . . . . . . . 115

5.2

Three resulting ACLs converted from Figure 5.1

5.3

Privacy-preserving range intersection . . . . . . . . . . . . . . . . . . . . . . 124

5.4

The Conversion of A1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

5.5

The example three adjacent ACLs . . . . . . . . . . . . . . . . . . . . . . . . 126

5.6

Encoding and encryption of ACL A1 . . . . . . . . . . . . . . . . . . . . . . 128

5.7

Encoding and encryption of ACL A3 . . . . . . . . . . . . . . . . . . . . . . 130

5.8

Comparison of ACLs A2 and A3 . . . . . . . . . . . . . . . . . . . . . . . . . 132

5.9

Decryption process of the comparison result . . . . . . . . . . . . . . . . . . 133

. . . . . . . . . . . . . . . 117

5.10 Comp. & comm. costs for processing real ACL Ai (1≤i≤n−1) . . . . . . . . 139
xi

5.11 Comp. & comm. costs for processing real ACL An

. . . . . . . . . . . . . . 140

5.12 Comp. & comm. costs for processing synthetic ACL Ai (1≤i≤n−1) . . . . . 142
5.13 Comp. & comm. costs for processing synthetic ACL An . . . . . . . . . . . . 143
5.14 Comparison time of synthetic ACLs Ai and An . . . . . . . . . . . . . . . . 144

xii

CHAPTER 1
Introduction
For distributed systems across diﬀerent parties, preserving privacy and integrity of private
data has become core requirements in the recent decade. In these systems, one party may not
be fully trusted by other parties, but tries to compute or aggregate useful information from
private data of other parties. Thus, it is very important to design security communication
and storage protocols for preventing the party from revealing such data of other parties.
Otherwise, one party would be reluctant to share its private data. For example, Google
and Facebook collect signiﬁcant amount of personal data every day through the Internet,
while many persons want to preserve the privacy of such data due to the security concern.
Furthermore, one party may want to query the useful information computed from the private
data of other parties. However, query results may be modiﬁed by a malicious party. Thus,
it is very important to design query protocols such that the integrity of query results can be
veriﬁed.
In this dissertation, we study four important, yet under-investigated, privacy and integrity
preserving problems for diﬀerent distributed systems, privacy and integrity preserving range
queries in sensor networks, privacy and integrity preserving range queries for cloud computing, privacy preserving cross-domain cooperative ﬁrewall optimization, and privacy pre-

1

serving cross-domain network reachability quantiﬁcation. Next, for each problem, we ﬁrst
describe the motivation, present the challenges, and propose our solution.

1.1

Privacy and Integrity Preserving Queries for
Sensor Networks

Two-tiered sensor networks, where storage nodes gather data from nearby sensors and answer
queries from the sink, has been widely adopted due to the beneﬁts of power and storage
saving for sensors and the eﬃciency of query processing. However, a compromised storage
node could disclose all sensitive data collected from nearby sensors and return forged data
to the sink. Therefore, on a storage node, the data received from sensors should have been
encrypted and the queries received from the sink should also have been encrypted.
Without decrypting the data and queries, the storage node needs to process encrypted
queries over encrypted data correctly and send encrypted data that satisfy the queries back
to the sink. Moreover, the sink needs to be able to verify the integrity of the query results
received from storage nodes. Seemingly impossible, this problem is very challenging to solve.
It is even more challenging to solve eﬃciently.
To preserve privacy, I proposed a novel technique to encode both data and queries such that
a storage node can correctly process encoded queries over encoded data without knowing
their values. To preserve integrity, I proposed two schemes, one using Merkle hash trees
and another using a new data structure called neighborhood chains, to generate integrity
veriﬁcation information so that a sink can use this information to verify whether the result
of a query contains exactly the data items that satisfy the query. To improve performance,
I proposed an optimization technique using Bloom ﬁlters to reduce the communication cost
between sensors and storage nodes.

2

1.2

Privacy and Integrity Preserving Queries for
Cloud Computing

Outsourced database systems are one of the most important work in cloud computing, where
a cloud provider hosts the private databases of an organization and replies query results to the
customers on behalf of the organization. However, the inclusion of the outsourced database
systems also brings signiﬁcant security and privacy challenges. As cloud providers cannot
be fully trusted and the data of an organization are typically conﬁdential, the organization
always encodes the data before storing them in a cloud to prevent the cloud provider from
revealing the data.
However, it is diﬃcult to process queries over encoded data. Furthermore, since cloud
providers serve as an important role for answering queries from customers, they may misbehave the query results, such as returning forged data for the query or not returning all
data that satisfy the query. I proposed the basic idea of the privacy and integrity preserving
protocol for processing range queries in outsourced databases.
To preserve privacy, I proposed an order preserving hash function to encode the data items
from the data owner and the queries from its customers such that the cloud provider can use
the encoded queries and encoded data items to ﬁnd out the query results without knowing
the actual values. To preserve integrity, I propose the ﬁrst probabilistic integrity preserving
scheme for range queries in outsourced database systems. This scheme allows a customer to
verify the integrity of a query result with a high probability.

3

1.3

Privacy Preserving Optimization for Firewall
Policies

Firewalls have been widely deployed on the Internet for securing private networks by checking
whether to accept or discard packets based on its policy. Optimizing ﬁrewall policies is
crucial for improving network performance. Prior work on ﬁrewall optimization focuses on
either intra-ﬁrewall or inter-ﬁrewall optimization within one administrative domain where
the privacy of ﬁrewall policies is not a concern. I explored inter-ﬁrewall optimization across
administrative domains for the ﬁrst time.
The key technical challenge is that ﬁrewall policies cannot be shared across domains because a ﬁrewall policy contains conﬁdential information and even potential security holes,
which can be exploited by attackers.
In this work, I proposed the ﬁrst cross-domain privacy-preserving cooperative ﬁrewall
policy optimization protocol. Speciﬁcally, for any two adjacent ﬁrewalls belonging to two
diﬀerent administrative domains, the protocol can identify in each ﬁrewall the rules that
can be removed because of the other ﬁrewall. The optimization process involves cooperative
computation between the two ﬁrewalls without any party disclosing its policy to the other.

1.4

Privacy Preserving Quantiﬁcation of Network
Reachability

Network reachability is one of the key factors for capturing end-to-end network behavior
and detecting the violation of security policies. Many approaches were proposed to address
the network reachability problem. The main assumption in all these approaches is that the
reachability restriction information of each network device and other conﬁguration state is

4

known to a central network analyst, who is quantifying the network reachability.
However, in reality, it is common that the network devices along a given path belong to
diﬀerent administrative domains where the reachability restriction information cannot be
shared with others including the network analyst.
In this work, I will try to design the ﬁrst cross-domain privacy-preserving protocol for
quantifying network reachability. The protocol enables the network analyst to accurately
determine the network reachability along a network path through diﬀerent administrative
domains without knowing the the reachability restriction information of the other domains.

1.5

Structure of the paper

In Chapter 2, we present SafeQ, a protocol that prevents attackers from gaining information
from both sensor collected data and sink issued queries. we start with the system model,
threat model, and problem statement. Then, we present our schemes for preserving data
privacy and query result integrity, respectively. We also propose a solution to adapt SafeQ
for event-driven sensor networks. We show that SafeQ excels state-of-the-art scheme in both
privacy and performance.
In Chapter 3, we ﬁrst propose an eﬃcient privacy-preserving scheme that can process multidimensional range queries without false positives. Then, we propose the ﬁrst probabilistic
scheme for verifying the integrity of range query results. This scheme employs a new data
structure, local bit matrices, which enables customers to verify query result integrity with
high probability. We show that our scheme is eﬀectiveness and eﬃciency on both real and
synthetic datasets.
In Chapter 4, we ﬁrst introduce the problem, system model, and threat model. Then, we
present our privacy-preserving protocol for detecting inter-ﬁrewall redundant rules. Finally,
we give security analysis results of our protocol and present our experimental results.

5

In Chapter 5, we ﬁrst propose the cross-domain privacy-preserving protocol to quantify
network reachability across multiple parties. Then, we propose an optimization technique to
reduce computation and communication costs. Finally, we show that our protocol is eﬃcient
and suitable for real applications.
In Chapter 6, we ﬁrst survey related work of secure multiparty computation, one of the
fundamental cryptographic primitives for designing privacy-preserving protocols. Then, we
discuss related work for each speciﬁc problem we investigate. in two parts.
Finally, in Chapter 7, we summarize this dissertation, discuss limitations, and outline
future research directions.

6

CHAPTER 2
Privacy and Integrity Preserving
Range Queries in Sensor Networks
2.1
2.1.1

Introduction
Motivation

Wireless sensor networks have been widely deployed for various applications, such as environment sensing, building safety monitoring, and earthquake predication, etc.. In this work,
we consider a two-tiered sensor network architecture in which storage nodes gather data from
nearby sensors and answer queries from the sink of the network. The storage nodes serve as
an intermediate tier between the sensors and the sink for storing data and processing queries.
Storage nodes bring three main beneﬁts to sensor networks. First, sensors save power by
sending all collected data to their closest storage node instead of sending them to the sink
through long routes. Second, sensors can be memory limited because data are mainly stored
on storage nodes. Third, query processing becomes more eﬃcient because the sink only communicates with storage nodes for queries. The inclusion of storage nodes in sensor networks

7

was ﬁrst introduced in [65] and has been widely adopted [26, 82, 70, 71, 69]. Several products
of storage nodes, such as StarGate [7] and RISE [6], are commercially available.
However, the inclusion of storage nodes also brings signiﬁcant security challenges. As
storage nodes store data received from sensors and serve as an important role for answering
queries, they are more vulnerable to be compromised, especially in a hostile environment.
A compromised storage node imposes signiﬁcant threats to a sensor network. First, the
attacker may obtain sensitive data that has been, or will be, stored in the storage node.
Second, the compromised storage node may return forged data for a query. Third, this
storage node may not include all data items that satisfy the query.
Therefore, we want to design a protocol that prevents attackers from gaining information
from both sensor collected data and sink issued queries, which typically can be modeled as
range queries, and allows the sink to detect compromised storage nodes when they misbehave.
For privacy, compromising a storage node should not allow the attacker to obtain the sensitive
information that has been, and will be, stored in the node, as well as the queries that the
storage node has received, and will receive. Note that we treat the queries from the sink as
conﬁdential because such queries may leak critical information about query issuers’ interests,
which need to be protected especially in military applications. For integrity, the sink needs
to detect whether a query result from a storage node includes forged data items or does not
include all the data that satisfy the query.

2.1.2

Technical Challenges

There are two key challenges in solving the privacy and integrity preserving range query
problem. First, a storage node needs to correctly process encoded queries over encoded data
without knowing their actual values. Second, a sink needs to verify that the result of a query
contains all the data items that satisfy the query and does not contain any forged data.

8

2.1.3

Limitations of Prior Art

Although important, the privacy and integrity preserving range query problem has been
under-investigated. The prior art solution to this problem was proposed by Sheng and Li in
their recent seminal work [69]. We call it S&L scheme. This scheme has two main drawbacks:
(1) it allows attackers to obtain a reasonable estimation on both sensor collected data and
sink issued queries, and (2) the power consumption and storage space for both sensors and
storage nodes grow exponentially with the number of dimensions of collected data.

2.1.4

Our Approach and Key Contributions

In this work, we propose SafeQ, a novel privacy and integrity preserving range query protocol
for two-tiered sensor networks. The ideas of SafeQ are fundamentally diﬀerent from S&L
scheme. To preserve privacy, SafeQ uses a novel technique to encode both data and queries
such that a storage node can correctly process encoded queries over encoded data without
knowing their actual values. To preserve integrity, we propose two schemes, one using Merkle
hash trees and another using a new data structure called neighborhood chains, to generate
integrity veriﬁcation information such that a sink can use this information to verify whether
the result of a query contains exactly the data items that satisfy the query. We also propose
an optimization technique using Bloom ﬁlters to signiﬁcantly reduce the communication cost
between sensors and storage nodes. Furthermore, we propose a solution to adapt SafeQ for
event-driven sensor networks, where a sensor submits data to its nearby storage node only
when a certain event happens and the event may occur infrequently.
SafeQ excels state-of-the-art S&L scheme [69] in two aspects. First, SafeQ provides signiﬁcantly better security and privacy. While prior art allows a compromised storage node to
obtain a reasonable estimation on the value of sensor collected data and sink issued queries,
SafeQ makes such estimation very diﬃcult. Second, SafeQ delivers orders of magnitude bet-

9

ter performance on both power consumption and storage space for multi-dimensional data,
which are most common in practice as most sensors are equipped with multiple sensing
modules such as temperature, humidity, pressure, etc.

2.1.5

Summary of Experimental Results

We performed side-by-side comparison with prior art over a large real-world data set from
Intel Lab [4]. Our results show that the power and space savings of SafeQ over prior art
grow exponentially with the number of dimensions. For power consumption, for threedimensional data, SafeQ consumes 184.9 times less power for sensors and 76.8 times less
power for storage nodes. For space consumption on storage nodes, for three-dimensional
data, SafeQ uses 182.4 times less space. Our experimental results conform with the analysis
that the power and space consumption in S&L scheme grow exponentially with the number
of dimensions, whereas those in SafeQ grow linearly with the number of dimensions times
the number of data items.

2.2
2.2.1

Models and Problem Statement
System Model

We consider two-tired sensor networks as illustrated in Figure 2.1. A two-tired sensor network
consists of three types of nodes: sensors, storage nodes, and a sink. Sensors are inexpensive
sensing devices with limited storage and computing power. They are often massively distributed in a ﬁeld for collecting physical or environmental data, e.g., temperature. Storage
nodes are powerful mobile devices that are equipped with much more storage capacity and
computing power than sensors. Each sensor periodically sends collected data to its nearby
storage node. The sink is the point of contact for users of the sensor network. Each time the

10

sink receives a question from a user, it ﬁrst translates the question into multiple queries and
then disseminates the queries to the corresponding storage nodes, which process the queries
based on their data and return the query results to the sink. The sink uniﬁes the query
results from multiple storage nodes into the ﬁnal answer and sends it back to the user.

Sensor

Sensor
Data
Data
Data

Query
Storage Node

Result

Sink

Data
Sensor

Sensor

For interpretation of the references to color in this and
all other figures, the reader is referred to the electronic
version of this dissertation.
Figure 2.1. Architecture of two-tired sensor networks
For the above network architecture, we assume that all sensor nodes and storage nodes
are loosely synchronized with the sink. With loose synchronization in place, we divide time
into ﬁxed duration intervals and every sensor collects data once per time interval. From a
starting time that all sensors and the sink agree upon, every n time intervals form a time
slot. From the same starting time, after a sensor collects data for n times, it sends a message
that contains a 3-tuple (i, t, {d1 , · · · , dn }), where i is the sensor ID and t is the sequence
number of the time slot in which the n data items {d1 , · · · , dn } are collected by sensor si .
We address privacy and integrity preserving ranges queries for event-driven sensor networks,
where a sensor only submits data to a nearby storage node when a certain event happens,
in Section 2.7. We further assume that the queries from the sink are range queries. A
range query “ﬁnding all the data items, which are collected at time slot t and whose value
is in the range [a, b]” is denoted as {t, [a, b]}. Note that the queries in most sensor network

11

applications can be easily modeled as range queries. For ease of presentation, Table 2.1
shows the notation used in this chapter.
si
ki
t
d1 , · · · , dn
D1 , · · · , D n
H, G, E
F (x)
S([d1 , d2 ])
N
HMACg
QR
VO

A sensor with ID i
The secret key of sensor si
The sequence number of a time slot
n 1-dimensional data items
n z-dimensional data items
Three “magic” functions
The preﬁx family of x
The minimum set of preﬁxes converted from [d1 , d2 ]
A preﬁx numericalization function
An HMAC function with key g
A query result
An integrity veriﬁcation object
Table 2.1. Summary of notation

2.2.2

Threat Model

For a two-tiered sensor network, we assume that the sensors and the sink are trusted but
the storage nodes are not. In a hostile environment, both sensors and storage nodes can be
compromised. If a sensor is compromised, the subsequent collected data of the sensor will
be known to the attacker and the compromised sensor may send forged data to its closest
storage node. It is extremely diﬃcult to prevent such attacks without the use of tamper proof
hardware. However, the data from one sensor constitute a small fraction of the collected data
of the whole sensor network. Therefore, we mainly focus on the scenario where a storage
node is compromised. Compromising a storage node can cause much greater damage to the
sensor network than compromising a sensor. After a storage node is compromised, the large
quantity of data stored on the node will be known to the attacker and upon receiving a
query from the sink, the compromised storage node may return a falsiﬁed result formed by
12

including forged data or excluding legitimate data. Therefore, attackers are more motivated
to compromise storage nodes.

2.2.3

Problem Statement

The fundamental problem for a two-tired sensor network is: how can we design the storage
scheme and the query protocol in a privacy and integrity preserving manner ? A satisfactory
solution to this problem should meet the following two requirements: (1) Data and query
privacy: Data privacy means that a storage node cannot know the actual values of sensor
collected data. This ensures that an attacker cannot understand the data stored on a compromised storage node. Query privacy means that a storage node cannot know the actual
value of sink issued queries. This ensures that an attacker cannot understand, or deduce
useful information from, the queries that a compromised storage node receives. (2) Data
integrity: If a query result that a storage node sends to the sink includes forged data or excludes legitimate data, the query result is guaranteed to be detected by the sink as invalid.
Besides these two hard requirements, a desirable solution should have low power and space
consumption because these mobile devices have limited resources.

2.3

Privacy for 1-dimensional Data

To preserve privacy, it seems natural to have sensors encrypt data and the sink encrypt
queries; however, the key challenge is how a storage node processes encrypted queries over
encrypted data.
The idea of our solution for preserving privacy is illustrated in Figure 2.2. We assume
that each sensor si in a network shares a secret key ki with the sink. For the n data
items d1 , · · · , dn that a sensor si collects in time slot t, si ﬁrst encrypts the data items
using key ki , the results of which are represented as (d1 )k , · · · , (dn )k . Then, si applies a
i
i
13

“magic” function H to the n data items and obtains H(d1 , · · · , dn ). The message that the
sensor sends to its closest storage node includes both the encrypted data and the associative
information H(d1 , · · · , dn ). When the sink wants to perform query {t, [a, b]} on a storage
node, the sink applies another “magic” function G on the range [a, b] and sends {t, G([a, b])}
to the storage node. The storage node processes the query {t, G([a, b])} over encrypted
data (d1 )ki , · · · , (dn )ki collected at time slot t using another “magic” function E. The three
“magic” functions H, G, and E satisfy the following three conditions: (1) A data item
dj (1 ≤ j ≤ n) is in range [a, b] if and only if E(j, H(d1 , · · · , dn ), G([a, b])) is true. This
condition allows the storage node to decide whether (dj )k should be included in the query
i
result. (2) Given H(d1 , · · · , dn ) and (dj )ki , it is computationally infeasible for the storage
node to compute dj . This condition guarantees data privacy. (3) Given G([a, b]), it is
computationally infeasible for the storage node to compute [a, b]. This condition guarantees
query privacy.

Sensor si

Sink

Storage Node
(d1)ki,…,(dn)ki, H(d1,…,dn)

{t, G([a, b])}

dj[a, b] iff E(j, H(d1, …, dn), G([a, b])) is true
Figure 2.2. The idea of SafeQ for preserving privacy

2.3.1

Preﬁx Membership Veriﬁcation

The building block of our privacy preserving scheme is the preﬁx membership veriﬁcation
scheme ﬁrst introduced in [23] and later formalized in [48]. The idea of this scheme is to
convert the veriﬁcation of whether a number is in a range to several veriﬁcations of whether
two numbers are equal. A preﬁx {0, 1}k {∗}w−k with k leading 0s and 1s followed by w − k
∗s is called a k−preﬁx. For example, 1*** is a 1-preﬁx and it denotes the range [1000, 1111].
14

If a value x matches a k−preﬁx (i.e., x is in the range denoted by the preﬁx), the ﬁrst k
bits of x and the k−preﬁx are the same. For example, if x ∈ 1*** (i.e., x ∈ [1000, 1111]),
then the ﬁrst bit of x must be 1. Given a binary number b1 b2 · · · bw of w bits, the preﬁx
family of this number is deﬁned as the set of w + 1 preﬁxes {b1 b2 · · · bw , b1 b2 · · · bw−1 ∗, · · · ,
b1 ∗ · · · ∗, ∗ ∗ ...∗}, where the i-th preﬁx is b1 b2 · · · bw−i+1 ∗ · · · ∗. The preﬁx family of x is
denoted as F (x). For example, the preﬁx family of number 12 is F (12) = F (1100) ={1100,
110*, 11**, 1***, ****}. Preﬁx membership veriﬁcation is based on the fact that for any
number x and preﬁx P , x ∈ P if and only if P ∈ F (x).
To verify whether a number a is in a range [d1 , d2 ], we ﬁrst convert the range [d1 , d2 ] to
a minimum set of preﬁxes, denoted S([d1 , d2 ]), such that the union of the preﬁxes is equal
to [d1 , d2 ]. For example, S([11, 15]) ={1011,11**}. Given a range [d1 , d2 ], where d1 and
d2 are two numbers of w bits, the number of preﬁxes in S([d1 , d2 ]) is at most 2w − 2 [39].
Second, we compute the preﬁx family F (a) for number a. Thus, a ∈ [d1 , d2 ] if and only if
F (a) ∩ S([d1 , d2 ]) = ∅.
To verify whether F (a) ∩ S([d1 , d2 ]) = ∅ using only the operations of verifying whether
two numbers are equal, we convert each preﬁx to a corresponding unique number using a
preﬁx numericalization function. A preﬁx numericalization function N needs to satisfy the
following two properties: (1) for any preﬁx P , N (P ) is a binary string; (2) for any two
preﬁxes P1 and P2 , P1 = P2 if and only if N (P1 ) = N (P2 ). There are many ways to do
preﬁx numericalization. We use the preﬁx numericalization scheme deﬁned in [20]. Given a
preﬁx b1 b2 · · · bk ∗ · · · ∗ of w bits, we ﬁrst insert 1 after bk . The bit 1 represents a separator
between b1 b2 · · · bk and ∗ · · · ∗. Second, we replace every * by 0. Note that if there is no *
in a preﬁx, we add 1 at the end of this preﬁx. For example, 11 ∗ ∗ is converted to 11100.
Given a set of preﬁxes S, we use N (S) to denote the resulting set of numericalized preﬁxes.
Therefore, a ∈ [d1 , d2 ] if and only if N (F (a)) ∩ N (S([d1 , d2 ])) = ∅. Figure 2.3 illustrates
the process of verifying 12 ∈ [11, 15].
15

[11, 15]
⇓ Preﬁx conversion
1011
11**
⇓ Preﬁx numericalization
10111
11100

12 (=1100)
⇓ Preﬁx family construction
1100
11**
****
110*
1***
⇓ Preﬁx numericalization
11001 11100 10000
11010 11000

(a)

(b)
Figure 2.3. Preﬁx membership veriﬁcation

2.3.2

The Submission Protocol

The submission protocol concerns how a sensor sends its data to a storage node. Let
d1 , · · · , dn be data items that sensor si collects at a time slot. Each item dj (1≤j ≤n)
is in the range (d0 , dn+1 ), where d0 and dn+1 denote the lower and upper bounds, respectively, for all possible data items that a sensor may collect. The values of d0 and dn+1 are
known to both sensors and the sink. After collecting n data items, si performs six steps:
1) Sort the n data items in an ascending order.

For simplicity,

we assume

d0 <d1 <d2 <· · · <dn <dn+1. If some data items have the same value, we simply represent
them as one data item annotated with the number of such data items.
2) Convert the n + 1 ranges [d0 , d1 ], [d1 , d2 ], · · · , [dn , dn+1] to their corresponding preﬁx
representation, i.e., compute S([d0 , d1 ]), S([d1 , d2 ]), · · · , S([dn , dn+1]).
3) Numericalize all preﬁxes. That is, compute N (S([d0 , d1 ])), · · · , N (S([dn , dn+1 ])).
4) Compute the keyed-Hash Message Authentication Code (HMAC) of each numericalized
preﬁx using key g, which is known to all sensors and the sink. Examples of HMAC
implementations include HMAC-MD5 and HMAC-SHA1 [46, 66, 29]. An HMAC function
using key g, denoted HMAC g , satisﬁes the one-wayness property (i.e., given HMAC g (x),
16

it is computationally infeasible to compute x and g) and the collision resistance property
(i.e., it is computationally infeasible to ﬁnd two distinct numbers x and y such that
HMAC g (x) = HMAC g (y)). Given a set of numbers S, we use HMAC g (S) to denote
the resulting set of numbers after applying function HMAC g to every number in S. In
summary, this step computes HMAC g (N (S([d0 , d1 ]))), · · · , HMAC g (N (S([dn , dn+1]))).
5) Encrypt every data item with key ki , i.e., compute (d1 )ki , · · · , (dn )ki .
6) Sensor si sends the encrypted data along with HMAC g (N (S([d0 , d1 ]))),

···,

HMAC g (N (S([dn , dn+1 ]))) to its closest storage node.
The above steps show that the aforementioned “magic” function H is deﬁned as follows:
H(d1 , · · · , dn ) = (HMAC g (N (S([d0 , d1 ]))), · · · , HMAC g (N (S([dn , dn+1]))))
Due to the one-wayness and collision resistance properties of the HMAC function, given
H(d1 , · · · , dn ) and the n encrypted data items (d1 )k , · · · , (dn )k , the storage node cannot
i
i
compute the value of any data item.

2.3.3

The Query Protocol

The query protocol concerns how the sink sends a range query to a storage node. When
the sink wants to perform query {t, [a, b]} on a storage node, it performs the following four
steps. Note that any range query [a, b] satisﬁes the condition d0 < a ≤ b < dn+1,
1) Compute preﬁx families F (a) and F (b).
2) Numericalize all preﬁxes, i.e., compute N (F (a)), N (F (b)).
3) Apply HMAC g to each numericalized preﬁx, i.e., compute HMAC g (N (F (a))) and
HMAC g (N (F (b))).
17

4) Send {t, HMAC g (N (F (a))), HMAC g (N (F (b)))} as a query to the storage node.
The above steps show that the aforementioned “magic” function G is deﬁned as follows:
G([a, b]) = (HMAC g (N (F (a))), HMAC g (N (F (b))))
Because of the one-wayness and collision resistance properties of the HMAC function, the
storage node cannot compute a and b from the query that it receives.

2.3.4

Query Processing

Upon receiving query {t, HMAC g (N (F (a))), HMAC g (N (F (b)))}, the storage node processes this query on the n data items (d1 )k , · · · , (dn )k received from each nearby sensor si
i
i
at time slot t based on the following theorem.
Theorem 1. Given n numbers sorted in the ascending order d1 < · · · < dn , where dj ∈
(d0 , dn+1 ) (1 ≤ j ≤ n), and a range [a, b] (d0 < a ≤ b < dn+1), dj ∈ [a, b] if and only if
there exist 1 ≤ n1 ≤ j < n2 ≤ n + 1 such that the following two conditions hold:
(1) HMAC g (N (F (a))) ∩ HMAC g (N (S([dn1 −1 , dn1 ]))) = ∅
(2) HMAC g (N (F (b))) ∩ HMAC g (N (S([dn2 −1 , dn2 ]))) = ∅.
Proof. Note that dj ∈ [a, b] (1 ≤ j ≤ n) if and only if there exist n1 and n2 (1 ≤ n1 ≤ j <
n2 ≤ n + 1) such that a ∈ [dn1 −1 , dn1 ] and b ∈ [dn2 −1 , dn2 ]. Further, a ∈ [dn1 −1 , dn1 ] if and
only if HMAC g (N (F (a))) ∩ HMAC g (N (S([dn1 −1 , dn1 ]))) = ∅ and b ∈ [dn2 −1 , dn2 ] if and
only if HMAC g (N (F (b))) ∩ HMAC g (N (S([dn2 −1 , dn2 ]))) = ∅
Based on Theorem 1, the storage node searches for the smallest n1 and the largest n2
(1 ≤ n1 , n2 ≤ n + 1) such that a ∈ [dn1 −1 , dn1 ] and b ∈ [dn2 −1 , dn2 ]. If n1 < n2 , the data
items dn1 , dn1 +1 , · · · , dn2 −1 are in the range [a, b]; if n1 = n2 , no data item is in the range
[a, b].
18

In fact, there is another privacy preserving scheme. First, sensor si converts each data
value dj to a preﬁx family F (dj ), and then applies the numericalization and hash functions
HMACg (N (F (dj ))). Second, the sink converts a given range query [a, b] to a set of preﬁxes
S([a, b]), and then applies the numericalization and hash functions HMACg (N (S([a, b]))).
Finally, the storage node checks whether HMACg (N (F (dj ))) has a common element with
HMACg (N (S([a, b]))). However, this privacy preserving scheme is not compatible with
the integrity preserving scheme that we will discuss in Section 2.4. Because this privacy
preserving scheme does not allow storage nodes to identify the positions of a and b (from the
range query [a, b]) among d1 , · · · , dn if no data item satisﬁes the query. While our integrity
preserving scheme requires storage nodes to know such information in order to compute
integrity veriﬁcation objects.

2.4

Integrity for 1-dimensional Data

The meaning of data integrity is two-fold in this context. In the result that a storage node
sends to the sink in responding to a query, ﬁrst, the storage node cannot include any data
item that does not satisfy the query; second, the storage node cannot exclude any data item
that satisﬁes the query. To allow the sink to verify the integrity of a query result, the query
response from a storage node to the sink consists of two parts: (1) the query result QR,
which includes all the encrypted data items that satisfy the query; (2) the veriﬁcation object
V O, which includes information for the sink to verify the integrity of QR. To achieve this
purpose, we propose two schemes based on two diﬀerent techniques, Merkle hash trees and
neighborhood chains.

19

2.4.1

Integrity Scheme Using Merkle Hash Trees

Our ﬁrst integrity preserving mechanism is based on Merkle hash trees [58]. Each time a
sensor sends data items to storage nodes, it constructs a Merkle hash tree for the data items.
Figure 2.4 shows a Merkle hash tree constructed for eight data items. Suppose sensor si
wants to send n = 2m encrypted data items (d1 )ki , · · · , (dn )ki to a storage node. Sensor si
ﬁrst builds a Merkle hash tree for the n = 2m data items, which is a complete binary tree.
The n terminal nodes are H1 , · · · , Hn , where Hj = h((dj )k ) for every 1 ≤ j ≤ n. Function
i
h is a one-way hash function such as MD5 [66] or SHA-1 [29]. The value of each nonterminal
node v, whose children are vl and vr , is the hash of the concatenation of vl ’s value and vr ’s
value. For example, in Figure 2.4, H12 = h(H1 |H2 ). Note that if the number of data items
n is not a power of 2, interim hash values that do not have a sibling value to which they
may be concatenated are promoted, without any change, up the tree until a sibling is found.
Note that the resulting Merkle hash tree will not be balanced. For the example Merkle hash
tree in Figure 2.4, if we remove the nodes H6 , H7 , H8 , H78 , and let H58 = H56 = H5 , the
resulting unbalanced tree is the Merkle hash tree for 5 data items.

H18=HMACki (H14|H58)

H14=h(H12|H34)
H12=h(H1|H2)

H1=h((d1)ki)

H18

H14

H58

H12

H34

H56

H78

H1

H2

H3

H4

H5

H6

H7

(d1)ki

(d2)ki

(d3)ki

(d4)ki

(d5)ki

(d6)ki

(d7)ki

Figure 2.4. Merkle hash tree for 8 data items
20

H8
(d8)ki

The Merkle hash tree used in our solution has two special properties that allow the sink to
verify query result integrity. First, the value of the root is computed using a keyed HMAC
function, where the key is ki , the key shared between sensor si and the sink. For example, in
Figure 2.4, H18 = HMAC k (H14 |H58 ). Using a keyed HMAC function gives us the property
i
that only sensor si and the sink can compute the root value. Second, the terminal nodes are
arranged in an ascending order based on the value of each data item dj .
We ﬁrst discuss what a sensor needs to send to its nearest storage node along its data
items. Each time sensor si wants to send n encrypted data items to a storage node, it ﬁrst
computes a Merkle hash tree over the n encrypted data items, and then sends the root value
along with the n encrypted data items to a storage node. Note that among all the nodes in
the Merkle hash tree, only the root is sent from sensor si to the storage node because the
storage node can compute all other nodes in the Merkle hash tree by itself.
Next, we discuss what a storage node needs to send to the sink along a query result, i.e.,
what should be included in a veriﬁcation object. For the storage node that is near to sensor
si , each time it receives a query {t, [a, b]} from the sink, it ﬁrst ﬁnds the data items that
are in the range [a, b]. Second, it computes the Merkle hash tree (except the root) from the
data items. Third, it sends the query result QR and the veriﬁcation object V O to the sink.
Given data items (d1 )ki ,· · · ,(dn )ki in a storage node, where d1 <· · · <dn , and a range [a, b],
where dn1 −1 <a≤dn1 <· · · <dn2 ≤b<dn2 +1 and 1≤n1 − 1<n2 + 1≤n, and the query result
QR = {(dn1 )k , · · · , (dn2 )k }, the storage node should include (dn1 −1 )k and (dn2 +1 )k in
i
i
i
i
the veriﬁcation object V O because (dn1 −1 )k and (dn2 +1 )k ensure that the query result does
i

i

include all data items that satisfy the query as the query result is bounded by them. We
call (dn1 −1 )ki the left bound of the query result and (dn2 +1 )ki the right bound of the query
result. Note that the left bound (dn1 −1 )ki and the right bound (dn2 +1 )ki may not exist. If
a ≤ d1 , the left bound (dn1 −1 )k does not exist; if dn ≤ b, the right bound (dn2 +1 )k does
i
i
not exist. The veriﬁcation object includes zero to two encrypted data items and O(log n)
21

proof nodes in the Merkel hash tree that are needed for the sink to verify the integrity of
the query result. Taking the example in Figure 2.5, suppose a storage node has received 8
data items {(2)k , (5)k , (9)k , (15)k , (20)k , (23)k , (34)k , (40)k } that sensor si collected
i
i
i
i
i
i
i
i
at time t, and the sink wants to perform the query {t, [10, 30]} on the storage node. Using
Theorem 1, the storage node ﬁnds that the query result includes (15)ki , (20)ki , and (23)ki
which satisfy the query. Along with the query result (i.e., the three data items), the storage
node also sends (9)k , (34)k , H12 , H8 , and H18 , which are marked grey in Figure 2.5, to the
i
i
sink as the veriﬁcation object.

H18
H14

H58

H12

H34

H56

H1

H2

H3

H4

H5

(2)ki

(5)ki

(9)ki

(15)ki

(20)ki

H78

H6
(23)ki

H7
(34)ki

H8
(40)ki

Range [10, 30]
Query result

Verification object

Figure 2.5. Data integrity veriﬁcation
Next, we discuss how the sink uses Merkle hash trees to verify query result integrity. Upon
receiving a query result QR = {(dn1 )k , · · · , (dn2 )k } and its veriﬁcation object, the sink
i
i
computes the root value of the Merkle hash tree and then veriﬁes the integrity of the query
result. Query result integrity is preserved if and only if the following four conditions hold: (1)
The data items in the query result do satisfy the query. (2) If the left bound (dn1 −1 )ki exists,
22

verify that dn1 −1 < a and (dn1 −1 )ki is the nearest left neighbor of (dn1 )ki in the Merkle
hash tree; otherwise, verify that (dn1 )ki is the leftmost encrypted data item in the Merkle
hash tree. (3) If the right bound (dn2 +1 )k exists, verify that b < dn2 +1 and (dn2 +1 )k is
i
i
the nearest right neighbor of (dn2 )k in the Merkle hash tree; otherwise, verify that (dn2 )k
i
i
is the rightmost encrypted data item in the Merkle hash tree. (4) The computed root value
is the same as the root value included in V O.
Note that sorting data items is critical in our scheme for ensuring the integrity of query
result. Without this property, it is diﬃcult for a storage node to prove query result integrity
without sending all data items to the sink.

2.4.2

Integrity Scheme Using Neighborhood Chains

We ﬁrst present a new data structure called neighborhood chains and then discuss its use in integrity veriﬁcation. Given n data items d1 , · · · , dn , where d0 < d1 < · · · < dn < dn+1 , we call
the list of n items encrypted using key ki , (d0 |d1 )k , (d1 |d2 )k , · · · , (dn−1 |dn )k , (dn |dn+1 )k ,
i

i

i

i

the neighborhood chain for the n data items. Here “|” denotes concatenation. For any item
(dj−1 |dj )ki in the chain, we call dj the value of the item and (dj |dj+1 )ki the right neighbor
of the item. Figure 2.6 shows the neighborhood chain for the 5 data items 1, 3, 5, 7 and 9.

(d0|1)ki

(1|3)ki

(3|5)ki

(5|7)ki

(7|9)ki

(9|d6)ki

Range [2, 8]
Query result

Verification object

Figure 2.6. An example neighborhood chain
Preserving query result integrity using neighborhood chaining works as follows. After
collecting n data items d1 , · · · , dn , sensor si sends the corresponding neighborhood chain
(d0 |d1 )k , (d1 |d2 )k , · · · , (dn−1|dn )k , (dn |dn+1 )k , instead of (d1 )k , · · · , (dn )k , to a stori
i
i
i
i
i
23

age node.

Given a range query [a, b], the storage node computes QR as usual.

The

corresponding veriﬁcation object V O only consists of the right neighbor of the largest
data item in QR. Note that V O always consists of one item for any query. If QR =
{(dn1 −1 |dn1 )k , · · · , (dn2 −1 |dn2 )k }, then V O = {(dn2 |dn2 +1 )k }; if QR = ∅, suppose
i
i
i
dn2 < a ≤ b < dn2 +1 , then V O = {(dn2 |dn2+1 )ki }.
After the sink receives QR and V O, it veriﬁes the integrity of QR as follows. First, the
sink veriﬁes that every item in QR satisﬁes the query. Assume that the sink wants to perform
the range query [2, 8] over the data items in Figure 2.6. The storage node calculates QR
to be {(1|3)k ,(3|5)k , (5|7)k } and V O to be {(7|9)k }. Second, the sink veriﬁes that the
i
i
i
i
storage node has not excluded any item that satisﬁes the query. Let {(dn1 −1 |dn1 )ki , · · · ,
(dj−1 |dj )ki , · · · , (dn2 −1 |dn2 )ki } be the correct query result and QR be the query result from
the storage node. We consider the following four cases.
1. If there exists n1 < j < n2 such that (dj−1 |dj )k ∈ QR, the sink can detect this error
/
i

because the items in QR do not form a neighborhood chain.
2. If (dn1 −1 |dn1 )k ∈ QR, the sink can detect this error because it knows the existence
/
i
of dn1 from (dn1 |dn1 +1 )k and dn1 satisﬁes the query.
i
3. If (dn2 −1 |dn2 )ki ∈ QR, the sink can detect this error because it knows the existence
/
of dn2 from the item (dn2 |dn2+1 )k in V O and dn2 satisﬁes the query.
i
4. If QR = ∅, the sink can verify this fact because the item (dn2 |dn2 +1 )ki in V O should
satisfy the property dn2 < a ≤ b < dn2 +1 .

2.5

Queries over Multi-dimensional Data

Sensor collected data and sink issued queries are typically multi-dimensional as most sensors
are equipped with multiple sensing modules such as temperature, humidity, pressure, etc.

24

A z-dimensional data item D is a z-tuple (d1 , · · · , dz ) where each dl (1 ≤ l ≤ z) is the
value for the l-th dimension (i.e., attribute). A z-dimensional range query consists of z subqueries [a1 , b1 ], · · · , [az , bz ] where each sub-query [al , bl ] (1 ≤ l ≤ z) is a range over the l-th
dimension.

2.5.1

Privacy for Multi-dimensional Data

We extend our privacy preserving techniques for one-dimensional data to multi-dimensional
data as follows. Let D1 , · · · , Dn denote the n z-dimensional data items that a sensor
si collects at time slot t, where Dj = (d1 , · · · , dz ) (1 ≤ j ≤ n).
j
j

First, si encrypts

these data with its secret key ki . Second, for each dimension l, si applies the “magic”
function H and obtains H(dl , · · · , dl ). At last, si sends the encrypted data items and
n
1
H(d1 , · · · , d1 ), H(d2 , · · · , d2 ), · · · , H(dz , · · · , dz ) to a nearby storage node. For example,
n
n
n
1
1
1
sensor si collects 5 two-dimensional data items (1,11), (3,5), (6,8), (7,1) and (9,4) at time
slot t, it will send the encrypted data items as well as H(1, 3, 6, 7, 9) and H(1, 4, 5, 8, 11)
to a nearby storage node. When the sink wants to perform query {t, ([a1 , b1 ], · · · , [az , bz ])}
on a storage node, the sink applies the “magic” function G on each sub-query [al , bl ] and
sends {t, G([a1 , b1 ]), · · · , G([az , bz ])} to the storage node. The storage node then applies the
“magic” function F to ﬁnd the query result QRl for each sub-query [al , bl ]. Here the three
“magic” functions H, G, and F are the same as the “magic” functions deﬁned in Section 2.3.
Finally, the storage node computes QR = QR1 ∩ · · · ∩ QRz as the query result. Considering
the above example, given a range query ([2,7],[3,8]), the query result QR1 for the sub-query
[2,6] is the encrypted data items of (3,5),(6,8) and the query result QR2 for the sub-query
[3,8] is the encrypted data items of (9,4),(3,5),(6,8). Therefore the query result QR is the
encrypted data items of (3,5),(6,8).

25

2.5.2

Integrity for Multi-dimensional Data

We next present two integrity preserving schemes for multi-dimensional data: one builds a
merkle hash tree for each dimension; another builds a multi-dimensional neighborhood chain.
Integrity Scheme Using Merkle Hash Trees
To preserve integrity, sensor si ﬁrst computes z Merkle hash trees over the n encrypted
data items along z dimensions. In the l-th Merkle hash tree, the n data items are sorted
according to the values of the l-th attribute. Second, si sends the z root values to a storage
node along with the encrypted data items. For a storage node that is near to sensor si ,
each time it receives a query {t, ([a1 , b1 ], · · · , [az , bz ])}, it ﬁrst ﬁnds the query result QRl
′

for each range [al , bl ]. Second, it chooses the query result QRl that contains the smallest
number of encrypted data items among QR1 , · · · , QRn . Third, it computes the Merkle hash
tree in which the data items are sorted according to the l′ -th attribute. Finally, it sends
′

′

QRl and the corresponding veriﬁcation object V O l to the sink. For example, suppose a
sensor si collects 4 two-dimensional data items {(12, 5), (15, 6), (23, 4), (45, 3)} in a time
slot. Sensor si computes a Merkle hash tree along each dimension. Figure 2.7 shows the
two Merkle hash trees. Given a two-dimensional range query {[16, 23], [3, 5]}, the storage
node can ﬁnd the query results QR1 = {(23, 4)k } based on the ﬁrst attribute and QR2 =
i

{(45, 3)ki , (23, 4)ki , (12, 5)ki } based on the second attribute. Since QR1 only contains one
encrypted data item, the storage node sends to the sink the query result QR1 = {(23, 4)ki }
1
1
and the corresponding veriﬁcation object V O 1 = {(15, 6)k , (45, 3)k , H1 , H14}.
i
i

Note that the query result of a multi-dimensional range query may contain data items
that do not satisfy the query. After decryption, the sink can easily prune the query result
by discarding such data items.

26

H142

H141
H121

H11

H122

H341

H21

(12,5)ki (15,6)ki

H31
(23,4)ki

H12

H41
(45,3)ki

H22

H342

H32

(45,3)ki (23,4)ki (12,5)k
i
[3, 5]

[16, 23]
Query result

H42
(15,6)ki

Verification object

Figure 2.7. Merkle hash trees for two-dimensional data
Integrity Scheme Using Neighborhood Chains
The basic idea is for each of the z values in a data item, we ﬁnd its nearest left neighbor along
each dimension and embed this information when we encrypt the item. Such neighborhood
information is used by the sink for integrity veriﬁcation.
We ﬁrst present multi-dimensional neighborhood chains and then discuss its use in integrity
veriﬁcation. Let D1 , · · · , Dn , where Dj = (d1 , · · · , dz ) for each 1 ≤ j ≤ n, denote n zj
j
dimensional data items. We use dl and dl
0
n+1 to denote the lower bound and the upper bound
of any data item along dimension l. We call D0 = (d1 , · · · , dz ) and Dn+1 = (d1 , · · · , dz )
0
0
n+1
n+1
the lower bound and upper bound of the data items. For each dimension 1 ≤ l ≤ z, we
can sort the values of n data items along the l-th dimension together with dl and dl
0
n+1
in an ascending order. For ease of presentation, we assume dl < dl < · · · < dl
0
1
n+1 for
every dimension 1 ≤ l ≤ z. In this sorted list, we call dl
j−1 (1 ≤ j ≤ n + 1) the left
neighboring value of dl . We use Ll (dl ) to denote the left neighboring value of dl along
j
j
j
dimension l. A multi-dimensional neighborhood chain for D1 , · · · , Dn is constructed by
encrypting every item Dj as (L1 (d1 )|d1 , · · · , Lz (dz )|dz )ki , which is denoted as MNC (Dj ).
j
j
j
j
27

We call Dj the value of MNC (Dj ). Note that when multiple data items have the same value
along the l-th dimension, we annotate Ll (dl ) with the number of such items in MNC (Dj ).
j
The list of n + 1 items encrypted with key ki , MNC (D1 ), · · · , MNC (Dn ), MNC (Dn+1 ),
forms a multi-dimensional neighborhood chain. The nice property of a multi-dimensional
neighborhood chain is that all data items form a neighborhood chain along every dimension.
This property allows the sink to verify the integrity of query results. Considering 5 example
2-dimensional data items (1,11), (3,5), (6,8), (7,1), (9,4) with lower bound (0, 0) and upper
bound (15, 15), the corresponding multi-dimensional neighborhood chain encrypted with key
ki is (0|1, 9|11)k , (1|3, 4|5)k , (3|6, 5|8)k , (6|7, 0|1)k , (7|9, 1|4)k and (9|15, 11|15)k . Figure
i

i

i

i

i

i

2.8 illustrates this chain, where each black point denotes an item, the two grey points denote
the lower and upper bounds, the solid arrows illustrate the chain along the X dimension,
and the dashed arrows illustrate the chain along the Y dimension.

(15,15)
(1,11)
Y dimension

Query
(6,8)
(9,4)

(3,5)

(7,1)
(0,0)

X dimension

Figure 2.8. A 2-dimensional neighborhood chain
Next, we discuss the operations carried on sensors, storage nodes, and the sink using
multi-dimensional chaining.
Sensors: After collecting n z-dimensional data items at time slot t, sensor si computes
28

the multi-dimensional chain for the items and sends it to a storage node.
Storage nodes: Given a z-dimensional query ([a1 , b1 ], · · · , [az , bz ]), a storage node ﬁrst
computes QR = QR1 ∩ · · · ∩ QRz . Second, it computes V O = (QRl − QR) ∪ {R([al , bl ])},
where QRl is the smallest set among QR1 , · · · , QRz (i.e., |QRl | ≤ |QRi | for any 1 ≤ i ≤ z)
and R([al , bl ]) is the right bounding item of the range [al , bl ]. Given a multi-dimensional
chain MNC (D1 ), · · · , MNC (Dn ), MNC (Dn+1 ) and a sub-query [al , bl ] along dimension l,
the right bounding item of [al , bl ] is the item MNC (Dj ) where Ll (dl ) ≤ bl < dl . Figure
j
j
2.8 shows a query ([2,6],[3,8]) with a query result QR = {MNC (3, 5), MNC (6, 8)} and
V O = {MNC (7, 1)}.
The Sink: Upon receiving QR and V O, the sink veriﬁes the integrity of QR as follows.
First, it veriﬁes that every item in QR satisﬁes the query. Second, it veriﬁes that the
storage node has not excluded any item that satisﬁes the query based on the following three
properties: (1) The items in QR ∪ V O should form a chain along one dimension, say l. Thus,
if the storage node excludes an item whose value in the l-th dimension is in the middle of
this chain, this chaining property would be violated. (2) The item in QR ∪ V O that has
the smallest value among the l-th dimension, say MNC (Dj ), satisﬁes the condition that
Ll (dl ) < al . Thus, if the storage node excludes the item whose value on the l-th dimension
j
is the beginning of the chain, this property would be violated. (3) There exists only one item
in V O that is the right bounding item of [al , bl ]. Thus, if the storage node excludes the item
whose value on the l-th dimension is the end of the chain, this property would be violated.

2.6

SafeQ Optimization

In this section, we present an optimization technique based on Bloom ﬁlters [15] to reduce
the communication cost between sensors and storage nodes. This cost can be signiﬁcant
because of two reasons. First, in each submission, a sensor needs to convert each range

29

[dj , dj+1], where dj and dj+1 are two numbers of w bits, to 2w − 2 preﬁx numbers in the
worst case. Second, the sensor applies HMAC to each preﬁx number, which results in a 128bit string if we choose HMAC-MD5 or a 160-bit string if we choose HMAC-SHA1. Reducing
communication cost for sensors is important because of power consumption reasons.
Our basic idea is to use a Bloom ﬁlter to represent HMAC g (N (S([d0 , d1 ]))), · · · ,
HMAC g (N (S([dn , dn+1 ]))). Thus, a sensor only needs to send the Bloom ﬁlter instead
of the hashes to a storage node. The number of bits needed to represent the Bloom ﬁlter
is much smaller than that needed to represent the hashes. Next, we discuss the operations
that sensors and storage nodes need to perform in using this optimization technique.
Sensors:

Let A be a bit array of size c representing the Bloom ﬁlter for

HMAC g (N (S([d0 , d1 ]))), · · · , HMAC g (N (S([dn , dn+1]))) that a sensor computes after collecting n data items d1 , · · · , dn assuming d1 ≤ · · · ≤ dn . Let B be an array of c pointers.
For every 0 ≤ j ≤ n and for every number v in HMAC g (N (S([dj , dj+1 ]))), the sensor
applies k hash functions on v, where each hash function hi (1 ≤ i ≤ k) hashes v to an
integer in the range [1, c], and then sets A[hi (v)] to be 1 and appends the index j to the
list that B[hi (v)] points to. In each submission, the sensor sends A and B to its closest
storage node. For example, HMAC g (N (S([10, 11]))) and HMAC g (N (S([11, 15]))) can be
represented as the two arrays in Figure 2.9, where “-” denotes an empty pointer. Note that
N (S([10, 11])) = {10110} and N (S([11, 15])) = {10111, 11100}.
The logical meaning of B is an array of pointers, each pointing to a list of indices from 0
to n. To reduce the space used for storing pointers, we implement B as a concatenation of
all these lists separated by delimiters. For example, we can represent the array B in Figure
2.9 as a list 1, \0, 1, \0, 1, 2, \0, 2, \0, 2, \0, 2, \0, 2, \0, · · · , \0, 2 .
Storage nodes:

Recall that a ∈ [dj , dj+1 ] if and only if HMAC g (N (F (a))) ∩

HMAC g (N (S([dj , dj+1]))) = ∅. If HMAC g (N (F (a))) ∩ HMAC g (N (S([dj , dj+1 ]))) = ∅,
then there exists at least one number v in HMAC g (N (F (a))) such that the following two
30

HMACg(10110)
h1

A
B

0

1

-

h1

h2 h3
0

1

1

HMACg(10111)

0

1

1

0
-

1

h3

h2
1

0

1

-

h1

0

1

2

2

HMACg(11100)
h2
0

1

2

h3
......

1

......
2

2

2
Figure 2.9. An example Bloom ﬁlter
conditions hold: (1) for every 1 ≤ i ≤ k, A[hi (v)] is 1; (2) for every 1 ≤ i ≤ k, index j
is included in the list that B[hi (v)] points to. For example, to verify whether 12 ∈ [11, 15]
using the Bloom ﬁlter in Figure 2.9, a storage node can apply the 3 hash functions to each
number in HMAC g (N (F (12))). For one number HMAC g (11100) in HMAC g (N (F (12))),
the storage node veriﬁes that HMAC g (11100) satisﬁes the above two conditions, therefore
12 ∈ [11, 15].
Although using our optimization technique QR may contain data items that do not satisfy
the query, they can be easily pruned by the sink after decryption. Given a query [a, b], using
this optimization technique, a storage node may ﬁnd multiple ranges that contain a and
multiple ranges that contain b due to false positives of Bloom ﬁlters. In this case, the storage
node uses the ﬁrst range that contains a and the last range that contains b to compute the
query result.
Bloom ﬁlters introduce false positives in the result of a query, i.e., the data items that
do not satisfy the query. We can control the false positive rate by adjusting Bloom ﬁlter
parameters. Let ǫ denote the average false positive rate and wh denote the bit length
of each number in HMAC g (N (S([dj , dj+1 ]))). For simplicity, we assume that each set
HMAC g (N (S([dj , dj+1]))) contains the same number of values, which is denoted as q. The
31

upper bound of the average false positive rate ǫ is shown in Formula 2.1, the derivation of
which is in Appendix A.
ǫ<

1 (n + 2)(n + 3)
(1 − e−k(n+1)q/c )k
k−1
3 (n + 1)

(2.1)

To represent HMAC g (N (S([d0 , d1 ]))), · · · , HMAC g (N (S([dn , dn+1]))), without Bloom
ﬁlters, the total number of bits required is wh (n + 1)q; with Bloom ﬁlters, the total number
of bits required is at most c+2k(n+1)q⌈log2 (n + 1)⌉, the calculation of which is in Appendix
A. Therefore, our optimization technique reduces the communication cost if
wh (n + 1)q > c + 2k(n + 1)q⌈log2 (n + 1)⌉

(2.2)

Based on Formula 2.1 and 2.2, assuming wh = 128 and n ≥ 3, to achieve reduction on
the communication cost of sensors and the small false positive rate of ǫ < 1%, we choose
128
1
c = ln 2 k(n + 1)q and k to be 4 ≤ k < 1.44+2⌈log (n+1)⌉ . Note that only when n ≥ 215 ,
2

which is unlikely to happen, such k does not exist.

2.7

Queries in Event-driven Networks

So far we have assumed that at each time slot, a sensor sends to a storage node the data
that it collected at that time slot. However, this assumption does not hold for event-driven
networks, where a sensor only reports data to a storage node when certain event happens.
If we directly apply our solution here, then the sink cannot verify whether a sensor collected
data at a time slot. The case that a sensor did not submit any data at time slot t and the
case that the storage node discards all the data that the sensor collected at time slot t are
not distinguishable for the sink.

32

We address the above challenge by sensors reporting their idle period to storage node each
time when they submit data after an idle period or when the idle period is longer than a
threshold. Storage nodes can use such idle period reported by sensors to prove to the sink
that a sensor did not submit any data at any time slot in that idle period. Next, we discuss
the operations carried on sensors, storage nodes and the sink.
Sensors: An idle period for a sensor is a time slot interval [t1 , t2 ], which indicates that
the sensor has no data to submit from t1 to t2 , including t1 and t2 . Let γ be the threshold
of a sensor being idle without reporting to a storage node. Suppose the last time that sensor
si submitted data or reported idle period is time slot t1 − 1. At any time slot t ≥ t1 , si acts
based on three cases:
1. t = t1 : In this case, if si has data to submit, then it just submits the data; otherwise
it takes no action.
2. t1 < t < γ + t1 − 1: In this case, if si has data to submit, then it submits data along
with encrypted idle period [t1 , t − 1]k ; otherwise it takes no action. We call [t1 , t − 1]k
i

i

an idle proof.
3. t = γ + t1 − 1: In this case, if si has data to submit, then it submits data along with
the idle proof [t1 , t − 1]k ; otherwise, it submits the idle proof [t1 , t]k .
i

i

Figure 2.10 illustrates some idle periods for sensor si , where each unit in the time axis is a
time slot, a grey unit denotes that si has data to submit at that time slot, and a blank unit
denotes that si has no data to submit at that time slot. According to the second case, at
time slot t2 + 1, si submits data along with the idle proof [t1 , t2 ]k . According to the third
i

case, at time slot t4 , si submits the idle proof [t3 , t4 ]k .
i

Storage nodes: When a storage node receives a query {t, G([a, b])} from the sink, it ﬁrst
checks whether si has submitted data at time slot t. If si has, then the storage node sends
the query result as discussed in Section 2.3. Otherwise, the storage node checks whether si
33

Ȗ

Time axis
…

…
t1

t2

t3
[t1, t2]ki

t4
[t3, t4]ki

Figure 2.10. Example idle periods and data submissions
has submitted an idle proof for an idle period containing time slot t. If true, then it sends
the idle proof to the sink as V O. Otherwise, it replies to the sink saying that it does not have
the idle proof containing time slot t at this moment, but once the right idle proof is received,
it will forward to the sink. The maximum number of time slots that the sink may need to
wait for the right idle proof is γ − 1. Here γ is a system parameter trading oﬀ eﬃciency and
the amount of time that sink may have to wait for verifying data integrity. Smaller γ favors
the sink for integrity veriﬁcation and larger γ favors sensors for power saving because of less
communication cost.
The Sink: Changes on the sink side are minimal. In the case that V O lacks the idle
proof for verifying the integrity of QR, it will defer the veriﬁcation for at most γ − 1 time
slots, during which benign storage nodes are guaranteed to send the needed idle proof.

2.8
2.8.1

Complexity and Security Analysis
Complexity Analysis

Assume that a sensor collects n z-dimensional data items in a time slot, each attribute of
a data item is a wo -bit number, and the HMAC result of each numericalized preﬁx is a
wh number. The computation cost, communication cost, and storage space of SafeQ are
described in the following table. Note that the communication cost denotes the number of

34

bytes sent for each submission or query, and the storage space denotes the number of bytes
stored in a storage node for each submission.

Sensor
Storage
node
Sink

Computation
O(wozn) hash
O(n) encryption

Communication

Space

O(wowh zn)

–

O(woz) hash

O(zn)

O(wo wh zn)

O(woz) hash

O(woz)

–

Table 2.2. Complexity analysis of SafeQ

2.8.2

Privacy Analysis

In a SafeQ protected two-tiered sensor network, compromising a storage node does not allow
the attacker to obtain the actual values of sensor collected data and sink issued queries.
The correctness of this claim is based on the fact that the hash functions and encryption
algorithms used in SafeQ are secure. In the submission protocol, a storage node only receives
encrypted data items and the secure hash values of preﬁxes converted from the data items.
Without knowing the keys used in the encryption and secure hashing, it is computationally
infeasible to compute the actual values of sensor collected data and the corresponding preﬁxes. In the query protocol, a storage node only receives the secure hash values of preﬁxes
converted from a range query. Without knowing the key used in the secure hashing, it is
computationally infeasible to compute the actual values of sink issued queries.
Next, we analyze information leaking if HMACg () does not satisfy the one-wayness property. More formally, given y, where y = HMACg (x) and x is a numericalized preﬁx, suppose
that a storage node takes O(T ) steps to compute x. Recall that the number of HMAC hashes
sent from a sensor is O(wozn). To reveal a data item dj , the storage node needs to reveal
all the numericalized preﬁxes in HMAC g (N (S([dj−1 , dj ]))). Thus, to reveal n data items,
the storage node would take O(woznT ) steps. Here T = 2128 for HMAC.
35

2.8.3

Integrity Analysis

For our scheme using Merkle hash trees, the correctness of this claim is based on the property
that any change of leaf nodes in a Merkle hash tree will change the root value. Recall that
the leaf nodes in a Merkle hash tree are sorted according to their values. In a query response,
the left bound of the query result (if it exists), the query result, and the right bound of the
query result (if it exists) must be consecutive leaf nodes in the Merkle hash tree. If the
storage node includes forged data in the query result or excludes a legitimate data item from
the query result, the root value computed at the sink will be diﬀerent from the root value
computed at the corresponding sensor.
For our scheme using neighborhood chains, the correctness is based on the following three
properties that QR and V O should satisfy for a query. First, items in QR ∪ V O form a
chain. Excluding any item in the middle or changing any item violates the chaining property.
Second, the ﬁrst item in QR ∪ V O contains the value of its left neighbor, which should be
out of the range query on the smaller end. Third, the last item in QR ∪ V O contains the
value of its right neighbor, which should be out of the range query on the larger end.

2.9

Experimental Results

2.9.1

Evaluation Methodology

To compare SafeQ with the state-of-the-art, which is represented by S&L scheme, we implemented both schemes and performed side-by-side comparison on a large real data set. We
measured average power and space consumption for both the submission and query protocols
of both schemes.

36

2.9.2

Evaluation Setup

We implemented both SafeQ and S&L scheme using TOSSIM [8], a widely used wireless
sensor network simulator. We measured the eﬃciency of SafeQ and S&L scheme on 1, 2,
and 3 dimensional data. For better comparison, we conducted our experiments on the same
data set that S&L used in their experiment [69]. The data set was chosen from a large real
data set from Intel Lab [4] and it consists of the temperature, humidity, and voltage data
collected by 44 nodes during 03/01/2004-03/10/2004. Each data attribute follows Gaussian
distribution. Note that S&L only conducted experiments on the temperature data, while we
experimented with both SafeQ and S&L schemes on 1-dimensional data (of temperature),
2-dimensional data (of temperature and humidity) and 3-dimensional data (of temperature,
humidity, and voltage).
In implementing SafeQ, we used HMAC-MD5 [46] with 128-bit keys as the hash function for hashing preﬁx numbers. We used the DES encryption algorithm in implementing
both SafeQ and S&L scheme. In implementing our Bloom ﬁlter optimization technique, we
chose the number of hash functions to be 4 (i.e., k = 4), which guarantees that the false
positive rate induced by the Bloom ﬁlter is less than 1%. In implementing S&L scheme,
we used the parameter values (i.e., VAR p = 0.4 and EN p = 1), which are corresponding
to the minimum false positives of query results in their experiments, for computing optimal
bucket partitions as in [69], and we used HMAC-MD5 with 128-bit keys as the hash function for computing encoding number. For multi-dimensional data, we used their optimal
bucket partition algorithm to partition multi-dimensional data along each dimension. In our
experiments, we experimented with diﬀerent sizes of time slots ranging from 10 minutes to
80 minutes. For each time slot, we generated 1,000 random range queries in the form of
([a1 , b1 ], [a2 , b2 ], [a3 , b3 ]), where a1 , b1 are two random values of temperature, a2 , b2 are two
random values of humidity, and a3 , b3 are two random values of voltage.

37

2.9.3

Evaluation Results

The experimental results from our side-by-side comparison show that SafeQ signiﬁcantly
outperforms S&L scheme for multi-dimensional data in terms of power and space consumption. For the two integrity preserving schemes, the neighborhood chaining technique is better
than Merkle hash tree technique in terms of both power and space consumption. The rationale for us to include the Merkle hash tree based scheme is that Merkle hash trees are the
typical approach to achieving integrity. We use SafeQ-MHT+ and SafeQ-MHT to denote
our schemes using Merkle hash trees with and without Bloom ﬁlters, respectively, and we
use SafeQ-NC+ and SafeQ-NC to denote our schemes using neighborhood chains with and
without Bloom ﬁlters, respectively.
Figures 2.11(a), 2.11(b), and 2.12(a), show the average power consumption of sensors for
3-dimensional, 2-dimensional, and 1-dimensional data, respectively, versus diﬀerent sizes of
time slots. Figures 2.13(a), 2.13(b), and 2.14(a), show the average power consumption of
storage nodes for 3-dimensional, 2-dimensional, and 1-dimensional data, respectively, versus
diﬀerent sizes of time slots. We observe that the power consumption of both sensors and
storage nodes grows linearly with the number of data items, which conﬁrms our complexity
analysis in Section 4.5.2. Note that the number of collected data items is in direct proportion
to the size of time slots. For power consumption, in comparison with S&L scheme, our
experimental results show that for 3-dimensional data, SafeQ-NC+ consumes 184.9 times
less power for sensors and 76.8 times less power for storage nodes; SafeQ-MHT+ consumes
171.4 times less power for sensors and 46.9 times less power for storage nodes; SafeQ-NC
consumes 59.2 times less power for sensors and 76.8 times less power for storage nodes;
SafeQ-MHT consumes 57.9 times less power for sensors and 46.9 times less power for storage
nodes. For 2-dimensional data, SafeQ-NC+ consumes 10.3 times less power for sensors and
9.0 times less power for storage nodes; SafeQ-MHT+ consumes 9.5 times less power for

38

sensors and 5.4 times less power for storage nodes; SafeQ-NC consumes 2.7 times less power
for sensors and 9.0 times less power for storage nodes; SafeQ-MHT consumes 2.6 times less
power for sensors and 5.4 times less power for storage nodes. Our experimental results
conform with the theoretical analysis that the power consumption in S&L scheme grows
exponentially with the number of dimensions, whereas in SafeQ it grows linearly with the
number of dimensions times the number of data items.
Figures 2.12(b) and 2.14(b) show the average power consumption for a 10-minute slot
for a sensor and a storage node, respectively, versus the number of dimensions of the data.
We observe that there are almost linear correlation between the average power consumption
for both sensors and storage nodes and the number of dimensions of the data, which also
conﬁrms our complexity analysis in Section 4.5.2.
Our experimental results also show that SafeQ is comparable to S&L scheme for 1dimensional data in terms of power and space consumption. For power consumption, SafeQNC+ consumes about the same power for sensors and 0.7 times less power for storage nodes;
SafeQ-MHT+ consumes about the same power for sensors and 0.3 times less power for storage nodes; SafeQ-NC consumes 1.0 times more power for sensors and 0.7 times less power
for storage nodes; SafeQ-MHT consumes 1.0 times more power for sensors and 0.3 times less
power for storage nodes. For space consumption on storage nodes, SafeQ-NC+ and SafeQMHT+ consume about the same space, and SafeQ-NC and SafeQ-MHT consume about 1.0
times more space.
Figures 2.15(a), 2.15(b) and 2.16(a) show the average space consumption of storage nodes
for 3, 2 and 1 dimensional data, respectively.

For space consumption on storage nodes, in

comparison with S&L scheme, our experimental results show that for 3-dimensional data,
SafeQ-NC+ consumes 182.4 times less space; SafeQ-MHT+ consumes 169.1 times less space;
SafeQ-NC consumes 58.5 times less space; SafeQ-MHT consumes 57.2 times less space. For
2-dimensional data, SafeQ-NC+ consumes 10.2 times less space; SafeQ-MHT+ consumes 9.4
39

times less space; SafeQ-NC consumes 2.7 times less space; SafeQ-MHT consumes 2.6 times
less space. The results conform with the theoretical analysis that the space consumption in
S&L scheme grows exponentially with the number of dimensions, whereas in SafeQ it grows
linearly with the number of dimensions and the number of data items.
Figure 2.16(b) shows the average space consumption of storage nodes for each data item
versus the number of dimensions of the data item. For each 3-dimensional data item, S&L
consumes about over 104 bytes, while SafeQ-NC+ and SafeQ-MHT+ consume only 40 bytes.

40

Power consumption (mW)

SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

1e+6

1e+4

1e+2

1e+0
0

10

20 30 40 50 60 70 80
Time slot size (minutes)

90

(a) 3-dimensional data

Power consumption (mW)

600
SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

400

200

0
0

10

20 30 40 50 60 70 80
Time slot size (minutes)

90

(b) 2-dimensional data
Figure 2.11. Average power consumption per submission for a sensor (A)

41

Power consumption (mW)

100
SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

80
60
40
20
0
0

10

20 30 40 50 60 70 80
Time slot size (minutes)

90

(a) 1-dimensional data

Power consumption (mW)

1e+4
SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

1e+3

1e+2

1e+1

1e+0

1

2
Number of dimensions

3

(b) for 10 minutes
Figure 2.12. Average power consumption per submission for a sensor (B)

42

Power consumption (mW)

SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

1e+4

1e+3

1e+2

1e+1

1e+0
0

10

20 30 40 50 60 70 80
Time slot size (minutes)

90

(a) 3-dimensional data

Power consumption (mW)

90
SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

80
70
60
50
40
30
20
10
0
0

10

20 30 40 50 60 70 80
Time slot size (minutes)

90

(b) 2-dimensional data
Figure 2.13. Average power consumption per query response for a storage node (A)

43

Power consumption (mW)

16

SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

14
12
10
8
6
4
2
0
0

10

20 30 40 50 60 70 80
Time slot size (minutes)

90

(a) 1-dimensional data

Power consumption (mW)

1e+3
SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

1e+2

1e+1

1e+0

1

2
Number of dimensions

3

(b) for 10 minutes
Figure 2.14. Average power consumption per query response for a storage node (B)

44

Space Consumption (kB)

SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

1e+7

1e+6

1e+5

1e+4

1e+3
0

10

20 30 40 50 60 70 80
Time slot size (minutes)

90

Space Consumption (kB)

(a) 3-dimensional data

SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

1e+5

1e+4

1e+3
0

10

20 30 40 50 60 70 80
Time slot size (minutes)

90

(b) 2-dimensional data
Figure 2.15. Average space consumption for a storage node (A)

45

Space Consumption (kB)

10000
SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

8000

6000

4000

2000
0

10 20 30 40 50 60 70 80
Time slot size (minutes)

90

(a) 1-dimensional data

Space consumption (Bytes)

1e+3
SafeQ−NC+
SafeQ−MHT+
SafeQ−NC
SafeQ−MHT
S&L

1e+2

1e+1

1e+0

1e−1

1

2
Number of dimensions

3

(b) Each data item
Figure 2.16. Average space consumption for a storage node (B)

46

CHAPTER 3
Privacy and Integrity Preserving
Range Queries for Cloud Computing
3.1
3.1.1

Introduction
Motivation

Cloud computing has become a new computing paradigm of internet services, where cloud
providers host numerous hardware, software, and network resources, to store organizations’
data and perform computation over the data on demand of customers’ queries. Cloud computing has three major advantages. First, organizations can instantly open business and
provide products or services to their customers without building and maintaining their computer infrastructure, which signiﬁcantly reduces costs. Second, the data stored in a cloud
are more reliable and can be accessed whenever a customer has internet connection. Third,
cloud providers have powerful computation capacity to process queries, which provides better experience to customers. Many clouds have been successfully built, e.g., Amazon EC2
and S3 [1], Google App Engine [3], and Microsoft Azure [5].

47

The database-as-a-service (DAS) model, ﬁrst introduced by Hacigumus et al. [40], is one
of the most important works in cloud computing. In the DAS model, a cloud provider
hosts the data of an organization and replies query results to the customers on behalf of the
organization. However, the inclusion of the DAS model brings signiﬁcant security and privacy
challenges. As cloud providers cannot be fully trusted and the data of an organization are
typically conﬁdential, the organization needs to encrypt the data before storing it in a cloud
to prevent the cloud provider from revealing the data. However, it is diﬃcult to process
queries over encrypted data. Furthermore, since cloud providers serve as an important role
for answering queries from customers, they may return forged data for the query or may not
return all the data items that satisfy the query.
Therefore, we want to design a protocol for the DAS model that supports multi-dimensional
range queries while preserving the privacy of both data and queries and the integrity of query
results. Range queries are one of the most important queries for various database systems
and have wide applications. For data privacy, cloud providers cannot reveal the organization
data and customer queries. Note that the customer queries also need to be kept conﬁdential
from cloud providers because such queries may leak critical information about query results.
For query result integrity, customers need to detect whether a query result includes forged
data or does not include all the data items that satisfy the query.

3.1.2

Technical Challenges

There are three challenges in solving secure multi-dimensional range queries problem in the
DAS model. First, a cloud provider needs to correctly process range queries over encrypted
data without knowing the values of both data and queries. Second, customers need to verify
whether a query result contains all the data items that satisfy the query and does not contain
any forged data. Third, supporting multi-dimensional range queries is diﬃcult.

48

3.1.3

Limitations of Previous Work

Privacy and integrity preserving range queries have received much attention in database and
security community (e.g., [40, 41, 10, 31, 72, 17, 27, 62, 22]).
Four main techniques have been proposed in the privacy-preserving schemes: bucket partitioning (e.g., [40, 41]), order-preserving hash functions (e.g., [31]), order-preserving encryptions (e.g., [10]), and public-key encryption (e.g., [17, 72]). However, bucket partitioning
leads to false positives in query results, i.e., a query result includes data items that do not
satisfy the query. Existing order-preserving hash functions and order-preserving encryptions
require large amount of shared secret information between an organization and its customers.
The public-key cryptography is too expensive to be applied in realistic applications.
Three main techniques have been proposed in the integrity-preserving schemes: Merkle
hash trees (e.g., [27, 63]), signature aggregation and chaining (e.g., [62, 59]), and spatial
data structures (e.g., [22]). However, Merkle hash trees cannot support multi-dimensional
range queries. Signature aggregation and chaining requires a cloud provider to reply to the
customer the boundary data items of the query that do not satisfy the query. Spatial data
structures are computationally expensive because constructing such structures is complicated. Furthermore, it is not clear how to search query results over such structures in a
privacy preserving manner.

3.1.4

Our Approach

In this work, we propose novel privacy and integrity preserving schemes for the DAS model.
To preserve privacy, we propose an order-preserving hash-based function to encode both data
and queries such that a cloud provider can correctly process encoded queries over encoded
data without knowing their values. To preserve integrity, we present the ﬁrst probabilistic
integrity-preserving scheme for multi-dimensional range queries. In this scheme, we propose

49

a new data structure, called local bit matrices, to encode neighborhood information for each
data item from an organization, such that a customer can verify the integrity of a query
result with a high probability.
Comparing with the state-of-the-art, our schemes achieve both security and eﬃciency. In
terms of security, our schemes not only enable a cloud provider to correctly process queries
over encrypted data, but also leak only the minimum privacy information as we will discuss
in Section 3.3.4. In terms of eﬃciency, our schemes are much more eﬃcient due to the use
of the hash function and symmetric encryption.

3.1.5

Key Contributions

We make three major contributions. First, we propose an eﬃcient privacy-preserving scheme
that can process multi-dimensional range queries without false positives. Second, we propose
the ﬁrst probabilistic scheme for verifying the integrity of range query results. This scheme
employs a new data structure, local bit matrices, which enables customers to verify query
result integrity with high probability. Third, we conduct extensive experiments on real and
synthetic datasets to evaluate the eﬀectiveness and eﬃciency of our scheme.

3.1.6

Summary of Experimental Results

We performed extensive experiments on synthetic datasets and the Adult dataset [32]. Our
experimental results show that our schemes are eﬃcient for preserving privacy and integrity of
multi-dimensional range queries in cloud computing. For a synthetic dataset with one million
1-dimensional data items, the one-time oﬄine data processing time is about 50 minutes, the
space cost is 33MB, and the query processing time is 2 milliseconds. For the Adult dataset
with 45222 3-dimensional data items, the data processing time is 104 seconds, the space cost
is 1.5MB, and the query processing time is 3.5 milliseconds.

50

3.2
3.2.1

Models and Problem Statement
System Model

We consider the database-as-a-service (DAS) model as illustrated in Figure 3.1. The DAS
model consists of three parties: organizations, a cloud provider, and customers. Organizations outsource their private data to a cloud provider. A cloud provider hosts outsourced
data from organizations and processes the customers’ queries on behalf of the organizations.
Customers are the clients (of organizations) that query a cloud provider and retrieve query
results from the outsourced data in the cloud provider.

Query
Customer
Result
Query
Result
Organization

Cloud Provider

Customer

Query
Result
Customer

Figure 3.1. The DAS model

3.2.2

Threat Model

In the DAS model, we assume that organizations and their customers are trusted but the
cloud provider is not. In a hostile environment, both customers and cloud providers may
not be trusted. If a customer is malicious, it may retrieve all the organization’s data and
distribute to other unauthorized users. Such attack is very diﬃcult to be prevented and is
out of the scope of this work. In this work, we mainly focus on the scenario where a cloud
51

provider is not trusted and it may try to reveal organizations’ data and falsify the query
results. In reality, cloud providers and organizations typically belong to diﬀerent parties, i.e.,
diﬀerent companies. The organizations cannot share their private data with untrusted cloud
providers. A malicious cloud provider may try to reveal the private data of organizations,
and return falsiﬁed query results that include forged data or exclude legitimate data. In such
case, a cloud provider can disrupt the business and cause great lost of organizations.
We also assume that there are secure channels between the organization and the cloud
provider, and between the cloud provider and each customer, which could be achieved using
protocols such as SSL.

3.2.3

Problem Statement

The fundamental problem for the DAS model is: how can we design the storage scheme and
the query protocol in a privacy and integrity preserving manner ? A satisfactory solution
to this problem should meet the following three requirements. (1) Data and query privacy:
Data privacy means that a cloud provider cannot reveal any data item from organizations.
Query privacy means that a cloud provider cannot reveal any query from customers. (2) Data
integrity: If a cloud provider returns forged data or does not return all the data items that
satisfy the query, such misbehavior should be detected by the customer. (3) Range Query
Processing: The encoded data from organizations and encoded queries from customers should
allow a cloud provider to correctly process range queries.

3.3

Privacy Preserving for 1-dimensional Data

In this section, we present our privacy-preserving scheme for 1-dimensional data. To preserve
privacy, it is natural to have an organization to encrypt its data items. Let d1 , · · · , dn denote
n data items from the organization, the encryption results can be denoted as (d1 )k , · · · , (dn )k ,
52

where k is the shared secret key between the organization and its customers. However, the key
challenge is how can a cloud provider process queries over encrypted data without knowing
the values of data items.
The basic idea of our scheme is to design an order-preserving hash-based function to
encode the data items from the organization and the queries from its customers such that
the cloud provider can use the encoded queries and encoded data items to ﬁnd out the
query results without knowing the actual values. More formally, let fk () denote the orderpreserving hash-based function, where k is the shared secret key between the organization
and its customers. To compute fk (), the organization and its customers also need to share
the secret information, the domain of the data items [x1 , xN ]. This function fk () satisﬁes
the following property: the condition fk (xi1 ) < fk (xi2 ) holds if x1 ≤ xi1 < xi2 ≤ xN .
To submit n data items d1 , · · · , dn to a cloud provider, the organization ﬁrst encrypts each
data item with the secret key k, i.e., (d1 )k , · · · , (dn )k . Second, the organization applies the
function fk () to each data item, i.e., fk (d1 ), · · · , fk (dn ). Finally, the organization sends the
encrypted data items (d1 )k , · · · , (dn )k as well as the encoded data items fk (d1 ), · · · , fk (dn )
to the cloud provider. To perform a range query [a, b], the customer applies the orderpreserving hash-based function fk () to the lower and upper bounds of the query, i.e., fk (a)
and fk (b), and then sends [fk (a), fk (b)] as the query to the cloud provider. Finally, the
cloud provider can ﬁnd out whether the data item dj (1≤j≤n) satisﬁes the query [a, b] by
checking whether the condition fk (a)≤fk (dj )≤fk (b) holds. Figure 3.2 shows the idea of our
privacy-preserving scheme.
In this section, we ﬁrst present our order-preserving hash-based function and then discuss
its properties. Second, we propose the privacy-preserving scheme by employing this function.
Third, we propose an optimization technique to reduce the size of the results after applying
the hash-based function. Fourth, we analyze the minimum information leakage for any
precise privacy-preserving scheme of range queries and demonstrate that our scheme leaks
53

Organization

Cloud Provider

Customer
[fk(a), fk(b)]

(d1)k,…,(dn)k
fk(d1),…, fk(dn)

dj[a, b] if and only if fk(a)  fk(dj)  fk(b)
Figure 3.2. Basic idea of privacy-preserving scheme
only the minimum information.

3.3.1

The Order-Preserving Hash-based Function

Without loss of generalization, we assume that all possible data items are integers within
domain [x1 , xN ]. The order-preserving hash-based function fk () is in the form
j

fk (xi ) =

hk (xq )

(3.1)

q=1

where xi ∈ [x1 , xN ] and hk () is a keyed hash function, such as keyed HMAC-MD5 and keyed
HMAC-SHA1. The intuition of designing such order-preserving hash-based function is twofold. First, we leverage a normal hash function hk () as the basic building block such that the
one-way property of hk () prevents the cloud provider from revealing the data items. Second,
we consider the result of hk () as a positive integer and then calculate fk (xi ) by summing
the hash results of all values that are less than or equal to xi in the domain [x1 , xN ] such
that if x1 ≤ xi1 < xi2 ≤ xN , fk (xi1 ) is less than fk (xi2 ). In other words, fk () is an
order-preserving function for the values in [x1 , xN ].
More formally, the order-preserving hash-based function fk () satisﬁes two properties.
Order Preserving: Assume that any hk (xq ) (xq ∈ [x1 , xN ]) is a positive integer. The
condition fk (xi1 ) < fk (xi2 ) holds if and only if xi1 < xi2 .
Proof. We ﬁrst prove that if the condition fk (xi1 ) < fk (xi2 ) holds, then xi1 < xi2 . We
prove it by contradiction. If xi1 ≥ xi2 , we have
54

i1

fk (xi1 ) = fk (xi2 ) +

q=i2+1

hk (xq ) ≥ fk (xi2 )

(3.2)

Second, we prove that if the condition xi1 < xi2 holds, then fk (xi1 ) < fk (xi2 ). Similar as
the proof of the property collision resistance, we have
i2

fk (xi2 ) = fk (xi1 ) +

q=i1 +1

hk (xq ) > fk (xi1 )

Collision Resistance: Assume that any hk (xq ) (xq ∈ [x1 , xN ]) is a positive integer. It
is impossible to ﬁnd xi1 and xi2 where xi1 =xi2 such that fk (xi1 )=fk (xi2 ).
Proof. Without loss of generalization, we assume i1 < i2 . Because any hk (xq ) (xq ∈ [x1 , xN ])
is a positive integer, we have
i2

fk (xi2 ) = fk (xi1 ) +

q=i1 +1

hk (xq ) > fk (xi1 )

In fact, the hash-based function fk () can preserve any given arbitrary order of values in
the domain [x1 , xN ] no matter whether the condition x1 < · · · < xN holds. For example,
if the order of 3 data items 3, 5, 7 is deﬁned as 5, 3, 7, then fk (5) < fk (3) < fk (7). This
property allows an organization to revise any data item xi (xi ∈ [x1 , xN ]) arbitrarily while
still preserving the order. We will discuss how to leverage this property to prevent the
statistical analysis of multi-dimensional data in Section 3.6.1.
These two properties and the one-way property of hk () allow the cloud provider to process
the encoded range queries over the encoded data without revealing the values of the data
and queries.

55

3.3.2

The Privacy-Preserving Scheme

The privacy-preserving scheme includes three phases, data submission, query submission,
query processing.
The data submission phase concerns how an organization sends its data to a cloud provider.
Let d1 , · · · , dn denote the data items of an attribute in the private data of the organization.
Recall that [x1 , xN ] is the domain of the attribute and is the shared secret between the
organization and its customers. For simplicity, we assume d1 < d2 < · · · < dn . If some data
items have the same value, the organization can simply represent them as one data item
annotated with the number of items that share this value.
To preserve data privacy, for each dj (1 ≤ j ≤ n), the organization ﬁrst encrypts it with
its secret key k, i.e., (dj )k , and then applies the order-preserving hash-based function, i.e.,
fk (dj ). Finally, the organization sends the encrypted data (d1 )k , · · · , (dn )k as well as the
hash results fk (d1 ), · · · , fk (dn ) to the cloud provider.
The query submission phase concerns how a customer sends a range query to the cloud
provider. When a customer wants to perform a range query [a, b] on the cloud provider, it
ﬁrst applies the order-preserving hash-based function to the lower and upper bounds of the
query, i.e., fk (a) and fk (b). Note that a and b are also two values in [x1 , xN ]. Finally, the
customer sends [fk (a), fk (b)] as a query to the cloud provider.
Upon receiving the query [fk (a), fk (b)], the cloud provider processes this query on the
n data items (d1 )k , · · · , (dn )k by checking which fk (dj ) (1 ≤ j ≤ n) satisﬁes the condition
fk (a) ≤ fk (dj ) ≤ fk (b). Based on the order preserving property of the function fk (),
dj ∈ [a, b] if and only if fk (a) ≤ fk (dj ) ≤ fk (b). Thus, the cloud provider only needs to
return all encrypted data items whose hash values fk () are in the range [fk (a), fk (b)].

56

3.3.3

Optimization of the Order-Preserving Hash-based Function

We propose an optimization technique to reduce the communication cost for sending the
encoded data fk (d1 ), · · · , fk (dn ) to the cloud provider. This cost can be signiﬁcant for two
reasons. First, the result of the hash function hk () is long. For example, the result of the
keyed HMAC-MD5 is 128 bits, and the result of the keyed HMAC-SHA1 is 160 bits. Second,
the result of our order-preserving hash-based function fk () is even longer due to the sum of
multiple hash values of hk (). Let w denote the bit length of hk (). For any xi (xi ∈ [x1 , xN ]),
the condition hk (xi ) ≤ 2w − 1 holds. Thus, we have
N

N

fk (xi ) ≤ fk (xN ) =

hk (xq ) ≤
q=1

(2w − 1) < 2w N

(3.3)

q=1

Therefore, the bit length of fk (xi ) (1 ≤ i ≤ N) is less than log2 2w N = w+log2 N. Assume
that we use the keyed HMAC-MD5 as the hash function hk () and the number of possible
values is one million, i.e., N = 106 . Any fk (xi ) can be expressed by a 128 + log2 106 = 148
bit number.
To reduce the bit length of fk (xi ) (xi ∈ [x1 , xN ]), the idea is to divide every fk (xi ) by a
′

′

value 2w if 2w is less than or equal to any hk (xq ) (xq ∈ [x1 , xN ]). Then, our order-preserving
hash-based function becomes
∗
fk (xi )

=

i
q=1 hk (xq )
′
2w

(3.4)

′
where 2w ≤ hk (xq ) for any xq ∈ [x1 , xN ]. Such division can be easily done by deleting the

right w ′ bits of fk (xi ).
∗
Similar as the analysis of the bit length of fk (xi ), the bit length of any fk (xi ) (xi ∈

[x1 , xN ]) can be reduced to w − w ′ + log2 N by the following calculation.

∗
fk (xi )

≤

∗
fk (xN )

≤

N
w
q=1 (2
′
2w

57

− 1)

′

< (2w−w )N

(3.5)

∗
The order-preserving function fk () also satisﬁes the two properties, order preserving and

collision resistance, the proof of which is in Appendix B.

3.3.4

Analysis of Information Leakage

Given n data items d1 , · · · , dn and a range query [a, b], any precise privacy-preserving scheme
should enable the cloud provider to ﬁnd out all the data items that satisfy the query [a, b]
without revealing the values of the data items from the organization and the query from its
customer. According to this requirement, we have the following theorem.
Theorem 2. Given any precise privacy-preserving scheme, if all possible results of range
queries have been found during the query processing phase, the cloud provider can reveal the
order of the encrypted data items.

Proof. Without loss of generalization, we assume d1 < d2 < · · · < dn . First, we show that
how to reveal the order of three consecutive data items dj−1 , dj , dj+1 . Among all possible
results of range queries, there should be two query results, QR1 = {(dj−1 )k , (dj )k } and
QR2 = {(dj )k , (dj+1)k }. Assume that [a1 , b1 ] and [a2 , b2 ] are the two range queries whose
results are QR1 and QR2 , respectively. Obviously, [a1 , b1 ], [a2 , b2 ], and dj−1 , dj , dj+1 satisfy
two conditions.
1. dj−2 < a1 ≤ dj−1 < dj ≤ b1 < dj+1 .
2. dj−1 < a2 ≤ dj < dj+1 ≤ b2 < dj+2 .
Based on QR1 , the cloud provider knows that (dj−1 )k and (dj )k are two consecutive data
items. Similarly, based on QR2 , the cloud provider knows that (dj )k and (dj+1)k are two
consecutive data items. Based on the common encrypted data item (dj )k in QR1 and QR2 ,
the cloud provider knows that dj is between dj−1 and dj+1 . Thus, the cloud provider knows

58

the order of these three encrypted data items is either dj−1 < dj < dj+1 or dj−1 > dj >
dj+1 . Repeat sorting other three consecutive items. Finally, the cloud provider knows the
order of all the encrypted data items is either d1 < · · · < dn or d1 > · · · > dn .
Theorem 2 describes the minimum information leakage for any precise privacy-preserving
scheme of range queries. That is, the cloud provider will reveal the order of the data items
received from the organization.
We argue that our privacy-preserving scheme achieves the minimum information leakage
for two reasons. First, the cloud provider cannot reveal the values of the data and queries
from the hash results due to the one-way property of hk (). Second, the cloud provider cannot
reveal these values by launching statistical analysis because for any two data items dj1 and
dj2 (1 ≤ j1 < j2 ≤ n), (dj1 )k = (dj2 )k and fk (dj1 ) = fk (dj2 ). Recall that if some data items
have the same value, the organization represents them as one data item with the number of
items that share this value.

3.4

Integrity Preserving for 1-dimensional Data

In this section, we present the ﬁrst probabilistic integrity-preserving scheme for 1-dimensional
data. This scheme allows a customer to verify the integrity of a query result with a high
probability. The meaning of integrity preserving is two-fold. First, a customer can verify
whether the cloud provider forges some data items in the query result. Second, a customer
can verify whether the cloud provider deletes data items that satisﬁes the query.
The basic idea of the integrity-preserving scheme is to encrypt neighborhood information
for each data item such that the neighborhood information of the data items in a query
result can be used to verify the integrity of the query result. More formally, let (M(dj ))k
denote the encrypted neighborhood information for each data item dj (1 ≤ j ≤ n). To
submit n data items d1 , · · · , dn to a cloud provider, the organization not only sends the
59

encrypted data items (d1 )k , · · · , (dn )k and the encoded data items fk (d1 ), · · · , fk (dn ), but
also sends the encrypted neighborhood information (M(d1 ))k , · · · , (M(dn ))k . Upon receiving a query [fk (a), fk (b)] from a customer, the cloud provider ﬁrst ﬁnds out the query
result based on the privacy-preserving scheme. Suppose that the data items dj1 , · · · , dj2
(1 ≤ j1 ≤ j2 ≤ n) satisfy the query. The cloud provider not only replies to the customer
the query result (dj1 )k , · · · , (dj2 )k , but also replies the encrypted neighborhood information
(M(dj1 ))k , · · · , (M(dj2 ))k . For ease of presentation, let QR denote the query result, which
includes all the encrypted data items that satisfy the query, i.e., QR = {(dj1 )k , · · · , (dj2 )k },
and V O denote the veriﬁcation object, which includes the information for the customer to verify the integrity of QR, i.e., V O = {(M(dj1 ))k , · · · , (M(dj2 ))k }. To verify the integrity, the
customer ﬁrst decrypts the query result and veriﬁcation object, i.e., computes dj1 , · · · , dj2
and M(dj1 ), · · · , M(dj2 ). Second, the organization checks whether dj1 , · · · , dj2 satisfy the
query and the overlapping parts of the neighborhood information from every two adjacent
data items exactly match. If so, the customer concludes that the query result includes all
the data items that satisfy the query. Otherwise, the customer concludes that some data
items in the query result are forged or deleted by the cloud provider. Figure 3.3 shows the
basic idea of our integrity-preserving scheme.

Organization

Cloud Provider

Customer
[fk(a), fk(b)]

(d1)k,…,(dn)k
(M(d1))k,…,(M(dn))k
Assume dj1,…,dj2[a, b]

QR = {(dj1)k,…,(dj2)k}
VO = {(M(dj1))k,…, (M(dj2))k}

Figure 3.3. Basic idea of integrity-preserving scheme
Our integrity-preserving scheme can guarantee to detect the misbehavior of forging data
items because the cloud provider cannot insert fake data items into a query result without
knowing the secret key k. This scheme also allows a customer to detect the misbehavior of
60

deleting data items in a query result with a high probability.
Next we present new data structures, called bit matrices and local bit matrices, and then
discuss their usage in integrity veriﬁcation.

3.4.1

Bit Matrices and Local Bit Matrices

To deﬁne bit matrices and local bit matrices, we need to ﬁrst partition the data domain into
multiple non-overlapping buckets. For example in Figure 3.4, we partition domain [1, 15] to
ﬁve buckets, B1 = [1, 3], B2 = [4, 6], B3 = [7, 10], B4 = [11, 12], and B5 = [13, 15]. Second,
we distribute the data items into the corresponding buckets. Third, we assign a bit value 1
to the buckets which includes data items, and assign a bit value 0 to the buckets which does
not include data items. Let B(dj ) denote the bucket that includes dj . A bucket is called the
left nonempty bucket of data item dj if the bucket is the left nearest bucket of B(dj ) that
includes data items. Similarly, a bucket is called the right nonempty bucket of data item dj
if the bucket is the right nearest bucket of B(dj ) that includes data items. For example, for
data item 7 in Figure 3.4, B2 and B5 are the left and right nonempty buckets of data item
7, respectively.

5
1

B1 3

B2

7 9
6

B3

13 14
10

1
1
M= 0
M(5)=2|011|1|1 M(7)=3|1101|2|1
M(9)=3|1101|2|2

B4 12

B5

15

0
1
M(13)=5|101|2|1
M(14)=5|101|2|2

Figure 3.4. Example bit matrix and local bit matrices
Based on the above concepts, we deﬁne bit matrices and local bit matrices as follows. The
bit matrix of all data items, M, is formed by the bit values of all buckets. In Figure 3.4, the
bit matrix of the ﬁve data items is 01101, i.e., M = 01101. The local bit matrix of a data
item dj , M(dj ), consists of four parts: (1) the bucket id of B(dj ); (2) a subset of the bit

61

matrix, which is formed by the bit values from its left nonempty bucket to its right nonempty
bucket; (3) the number of data items in bucket B(dj ); (4) a distinct integer to distinguish
the local bit matrix of dj from other data items in bucket B(dj ). In Figure 3.4, the local
bit matrix of data item 7 is 3|1101|2|1, i.e., M(7) = 3|1101|2|1, where 3 is the bucket id,
1101 is the subset of the bit matrix, 2 is the number of data items in bucket B3 , and 1 is
the integer to distinguish M(7) from M(9). Intuitively, the bit matrix denotes the abstract
information of all the data items, and the local bit matrix of a data item dj denotes the
abstract neighborhood information of dj .
Note that the usage of bucket partition in this work is diﬀerent from that in previous
work (e.g., [40, 41, 10]). They leverage bucket partition to achieve privacy-preserving query
processing. While we use the bit values of buckets for verifying the integrity of query results.

3.4.2

The Integrity-Preserving Scheme

Our integrity-preserving scheme includes four phases, data submission, query submission,
query processing, and query result veriﬁcation.
Let d1 , · · · , dn denote the data items of an attribute from the organization. The organization ﬁrst partitions the data domain to m non-overlapping buckets B1 , · · · , Bm , and
then distributes d1 , · · · , dn to these buckets. The bucket partition is a shared secret between the organization and its customers. Second, the organization computes the local bit
matrix for each data item and then encrypts them with its secret key k, i.e., computes
(M(d1 ))k , · · · , (M(dn ))k . Third, the organization sends to the cloud provider the encrypted
local bit matrices (M(d1 ))k , · · · , (M(dn ))k as well as encrypted data items (d1 )k , · · · , (dn )k
and the encoded data items fk (d1 ), · · · , fk (dn ).
To perform a range query [a, b], a customer sends [fk (a), fk (b)] to the cloud provider.
Upon receiving [fk (a), fk (b)], the cloud provider computes QR as in Section 3.3.2. Here

62

we consider how to compute V O. If QR = {(dj1 )k , · · · , (dj2 )k } (1 ≤ j1 ≤ j2 ≤ n),
V O = {(M(dj1 ))k , · · · , (M(dj2 ))k }; if QR = ∅, which means that there is a data item dj1
(1 ≤ j1 ≤ n) such that dj1 < a ≤ b < dj1 +1 , then V O = {(M(dj1 ))k , (M(dj1 +1 ))k }.
Finally, the cloud provider replies QR and V O.
Upon receiving the query result QR and the veriﬁcation object V O, the customer decrypts
them, and then veriﬁes the integrity of QR as follows. First, the customer veriﬁes whether
each item in QR satisﬁes the query [a, b]. Second, the customer veriﬁes whether the cloud
provider deletes any data item that satisﬁes the query. Let {(dj1 )k , · · · , (dj2 )k } be the
correct query result and Bg1 , · · · , Bgt be the buckets which include at least one data item
in the query result. Let QR be the query result from the cloud provider. Suppose the cloud
provider deletes a data item (dj )k that satisﬁes the query, i.e., (dj )k ∈ {(dj1 )k , · · · , (dj2 )k },
and dj ∈ Bgs (1 ≤ s ≤ t). We consider the following four cases.
Case 1: When QR = ∅, if Bgs ⊆ [a, b], the deletion can be detected for two reasons. First,
if Bgs only includes one data item dj , deleting (dj )k can be detected because the local bit
matrices of data items in Bgs−1 or Bgs+1 show that Bgs should include at least one data
item. Second, if Bgs includes multiple data items, deleting (dj )k can be detected because
the local bit matrices of other data items in Bgs have the number of data items in Bgs .
In Figure 3.4, given a range query [4,11], the correct query result is {(5)k , (7)k , (9)k }, and
the veriﬁcation object is {(M(5))k , (M(7))k , (M(9))k }. Deleting (7)k in B3 can be detected
because based on M(9), the customer knows that B3 includes two data items.
Case 2: When QR = ∅, if Bgs ⊆ [a, b], the deletion cannot be detected because the
customer does not know whether Bgs ∩ [a, b] includes data items. Considering the same
example in Case 1, deleting (5)k cannot be detected because the customer does not know
whether B2 ∩ [4, 11] includes data items.
Case 3: When QR = ∅, if B(dj1 ) ∩ [a, b] = ∅ and B(dj1 +1 ) ∩ [a, b] = ∅, the deletion can be
detected because M(dj1 ) or M(dj1 +1 ) shows that bucket Bgs between B(dj1 ) and B(dj1 +1 )
63

includes data items, and hence, condition dj1 < a ≤ b < dj1 +1 does not hold. In Figure
3.4, given a range query [3,5], the correct query result is {(5)k }. If the cloud provider replies
QR = ∅ and V O = {(M(7))k }, deleting (5)k can be detected because the customer knows
that B2 includes data items based on M(7), and these data items are closer to the query
[3,5] than 7.
Case 4: When QR = ∅, if B(dj1 ) ∩ [a, b] = ∅ or B(dj1 +1 ) ∩ [a, b] = ∅, the deletion cannot
be detected because the customer does not know whether B(dj1 ) ∩ [a, b] or B(dj1 +1 ) ∩ [a, b]
includes data items. In Figure 3.4, given a range query [9,12], the correct query result is
{(9)k }. If the cloud provider replies QR = ∅ and V O = {(M(7))k , (M(13))k }, deleting (9)k
cannot be detected because the customer does not know whether B(3) ∩[9, 12] includes data.
The cloud provider can break integrity veriﬁcation if and only if it can distinguish Cases
2 and 4 from other two cases. Distinguishing these two cases is equivalent to knowing
which data items belong to the same bucket. However, such information cannot be revealed by analyzing the encrypted data items (d1 )k , · · · , (dn )k , the encoded data items
fk (d1 ), · · · , fk (dn ), and the encrypted local bit matrices (M(d1 ))k , · · · , (M(dn ))k . Because
for any two data items dj1 and dj2 (1 ≤ j1 < j2 ≤ n), (dj1 )k = (dj2 )k , fk (dj1 ) = fk (dj2 ),
and (M(dj1 ))k = (M(dj2 ))k . Thus, the cloud provider can only randomly delete the data
items in the query result and hope that this deletion operation will not be detected.

3.5

Finding Optimal Parameters

Our integrity-preserving scheme needs to partition the data domain into multiple nonoverlapping buckets. However, how to partition the domain is still a problem. In this
section, we formalize the problem as an optimization problem and present an algorithm to
solve it. To simplify the problem, we assume that queries from customers follow uniform
distribution, i.e., all queries are equi-probable. Considering the N possible values in [x1 , xN ],
64

there are

N (N +1)
2

possible range queries. Thus, the possibility of any query from customers

2
is equal to N (N +1) .

3.5.1

Detection Probability

We ﬁrst consider the detection probability of randomly deleting data items in a query result
by a cloud provider. This metric is very important to evaluate the eﬀectiveness of our
integrity-preserving scheme. Let B1 , · · · , Bm denotes the multiple non-overlapping buckets,
and [li , hi ] denotes a bucket Bi (1 ≤ i ≤ m). A bucket Bi is called a single-value bucket if
li = hi . Let e(x) denote the frequency of the data item with value x. The probability that
a deletion operation of the cloud provider can be detected is
Pr =

m
i=1

hi
x=li (li − x1 + 1)(xN − hi + 1)e(x)
n
j=1 (dj − x1 + 1)(xN − dj + 1)

(3.6)

The calculation of this probability is in Appendix C. Next, we discuss the two theorems
regarding to P r, the proofs of these two theorems are in Appendix D.
Theorem 3. Given any n data items d1 , · · · , dn , the maximum detection probability of a
deletion operation is
P rmax = 100%

(3.7)

if and only if each data item dj (1 ≤ j ≤ n) forms a single-value bucket, i.e., [dj , dj ].
Theorem 4. Given n data items d1 , · · · , dn , the minimum detection probability of a deletion
operation is
P rmin =

n
j=1 (dj

n
− x1 + 1)(xN − dj + 1)

(3.8)

if and only if there is only one bucket [x1 , xN ].
The intuition behind Theorems 3 and 4 is that the more secret information of data
items the organization shares with its customers, the higher detection probability customers
65

achieve. For Theorem 3, if each data item forms a single-value bucket, the customers know
all the data items before querying the cloud provider. Of course they can detect any deletion
operation and it is meaningless for organizations to outsource their data. For Theorem 4,
recall our privacy preserving scheme in Section 3.3. Customers need to know [x1 , xN ] for
converting a query [a, b] to [fk (a), fk (b)]. Thus, in our context, x1 and xN are the minimum
secret information needed to be shared. Knowing only x1 and xN allows customers to detect
deletion operations with the minimum probability.
If a cloud provider conducts t deletion operations, the probability that at least one of the
t deletion operations can be detected is
P rt = 1 − (1 − P r)t

(3.9)

In Figure 3.4, the probability that a deletion operation can be detected is 60.48%. If the
cloud provider conducts 5 deletion operations over the query results, customers can detect
at least one deletion with 99% probability.

3.5.2

Optimal Bucket Partition

We deﬁne the optimization problem as follows.
Given n data items from the organization and the domain of these items [x1 , xN ], we want
to ﬁnd out the optimal partition with at most m buckets B1 , · · · , Bm such that the detection
probability P r is maximized. More formally, this problem can be deﬁned as follows.
Input: (1) d1 , · · · , dn
(2) [x1 , xN ]
(3) m
Output: B1 , · · · , Bm
Objective: max P r
This problem has the optimal substructure property [25]. Therefore, we can express the
optimal solution of the original problem as the combination of the optimal solutions of
66

two sub-problems. Let H(N, m) denote the problem of optimally partitioning the domain
[x1 , xN ] using at most m buckets. Let δ(i, j) denote the probability contributed by a bucket
[xi , xj ]. We can compute δ(i, j) as follows.
xj

(xi − x1 + 1)(xN − xj + 1) x=xi e(x)
δ(i, j) =
n
j=1 (dj − x1 + 1)(xN − dj + 1)

(3.10)

The optimal problem can be expressed as follows.
H(N, m) = max [H(N − i, m − 1) + δ(N − i + 1, N)]

(3.11)

Algorithm 1: Optimal Bucket Partition
Input: (1) n data items d1 , · · · , dn ;
(2) The domain [x1 , xN ];
(2) m.
Output: B1 , · · · , Bm .
1

Initialize each element in matrices H and P to 0;

2

for i := 2 to N do

3

H[i][2] = max1≤k≤i−1 [δ(1, k) + δ(k + 1, i)];

4

Store the left boundary value of the second bucket in P [i][2];

5
6

for j := 3 to m do
for i := j to N do

7

H[i][j] = maxj−1≤k≤i−1 [H[k][j − 1] + δ(k + 1, i)];

8

Store the left boundary value of the last bucket in P [i][j];

9

Find the maximum value in H and output the corresponding partition in P ;

We use dynamic programming to solve the problem. We ﬁrst solve and store solutions
of the smaller sub-problems. Then, we employ their optimal solutions to solve the larger
67

problems. Finally, we solve the optimal problem of maximizing H(N, m). All intermediate
solutions are stored in an N × m matrix H. The row indices of H are from 1, · · · , N and
the column indices are from 1, · · · , m. Note that H(i, j) = H[i][j]. Both the time and space
complexities of the computation of the matrix H are O(Nm). Along with the optimal value
H(i, j) (1 ≤ i ≤ N, 1 ≤ j ≤ m), we also store the lower bounds of its last bucket for each
sub-problem in another N × m matrix P . Finally, we use the matrix P to reconstruct the
optimal bucket partition in O(m) time. This algorithm is shown in Algorithm 1.

3.6
3.6.1

Query Over Multi-dimensional Data
Privacy for Multi-dimensional Data

Organizations’ data and customers’ queries are typically multi-dimensional. For example, a
medical record typically includes patient’s name, birthday, age, etc. A z-dimensional data
item D is a z-tuple (d1 , · · · , dz ) where each dr (1 ≤ r ≤ z) is the value for the r-th dimension
(i.e., attribute). A z-dimensional range query consists of z sub-queries [a1 , b1 ], · · · , [az , bz ]
where each sub-query [ar , br ] (1 ≤ r ≤ z) is a range over the r-th dimension.
We extend our privacy-preserving scheme for one-dimensional data to multi-dimensional
data as follows. Let D1 , · · · , Dn denote n z-dimensional data items, where Dj = (d1 , · · · , dz )
j
j
(1 ≤ j ≤ n). First, the organization encrypts these data with its secret key k, i.e., computes
(D1 )k , · · · , (Dn )k . Second, for each dimension r, it applies our order-preserving hash-based
function fkr (), i.e., computes fkr (dr ), · · · , fkr (dr ), where kr is the secret key of the ordern
1
preserving hash-based function for the r-th dimension. Last, it sends the encrypted data
items (D1 )k , · · · , (Dn )k , and fk1 (d1 ), · · · , fk1 (d1 ), · · · , fkz (dz ), · · · , fkz (dz ) to the cloud
n
n
1
1
provider. When a customer wants to perform a query ([a1 , b1 ], · · · , [az , bz ]), it applies the
order-preserving hash-based function fkr () on the lower and upper bounds of each sub-query

68

[ar , br ] and sends [fk1 (a1 ), fk1 (b1 )], · · · , [fkz (az ), fkz (bz )] to the cloud provider. The cloud
provider then compares fkr (dr ), · · · , fkr (dr ) with [fkr (ar ), fkr (br )] for each dimension r, to
n
1
ﬁnd out the query result QR. Considering 5 two-dimensional data items (1,11), (3,5), (6,8),
(7,1) and (9,4), given a range query ([2,7],[3,8]), the query result QR is {(3, 5)k , (6, 8)k }.
To prevent the attack of statistical analysis, the data sent from the organization to the
cloud provider should satisfy the following two conditions. First, for any 1 ≤ j1 = j2 ≤ n,
(Dj1 )k = (Dj2 )k . To satisfy this condition, if multiple data items have the same value for
each dimension, the organization can simply represent them as one data item annotated
with the number of these items. Second, along each dimension r, for any 1 ≤ j1 = j2 ≤ n,
fkr (dr ) = fkr (dr ). To satisfy this condition, the organization needs to revise the data items
j
j
1

2

with the same value for the dimension r. Recall the arbitrary order-preserving property of
fkr (). It allows the organization to arbitrarily revise data items while still preserving the
order of these items in the hash results. In our context, if dr = dr , the organization can
j1
j2
concatenate a distinct number for each of them, i.e., dr |0 and dr |1, and then apply the
j1
j2
hash-based function fkr ().

3.6.2

Integrity for Multi-dimensional Data

To preserve the integrity of multi-dimensional data, the organization builds multidimensional local bit matrices. We ﬁrst present the data structures, multi-dimensional bit
matrices and local bit matrices, and then discuss the usage in integrity veriﬁcation for multidimensional data. Considering the example in Figure 3.5(a), we partition the data domain
into 4 × 6 = 24 buckets. Then, we distribute the data items, D1 , · · · , D5 , into the corresponding buckets. We assign a bit value 1 or 0 to each bucket to indicate whether the bucket
includes data items or not. Let B(dj ) denote the bucket that includes dj . A bucket is called
the r-th left nonempty bucket of data item dj (1 ≤ r ≤ z) if the bucket is the left nearest

69

0
0

1D1
0

0
0

0
0

0
0

1D1
0

0
0

0
0

0
0

1D1
0

0
0

0
0

0

0

0

D41

0

0

0

D41

0

0

0

D41

0

0

1
D3

0

0

0

1
D3

0

0

0

1
D3

0

0
0

1D2 0
0
0
(a)

D
1 5
0

0
0

1D2 0
0
0
(b)

D
1 5
0

0
0

1D2 0
0
0
(c)

D
1 5
0

Figure 3.5. The example 2-dimensional bit matrix and local bit matrices
bucket of B(dj ) that includes data items for the r-th dimension. Similarly, a bucket is called
the r-th right nonempty bucket of data item dj if the bucket is the right nearest bucket of
B(dj ) that includes data items for the r-th dimension. In Figure 3.5(a), B(D2 ) is the 1-th
left nonempty bucket of data item D3 .
Based on the above concepts, we deﬁne bit matrices and local bit matrices as follows. The
bit matrix of all data items, M, is formed by the bit values of all buckets. In Figure 3.5(a),
the bit matrix of the ﬁve data items






M =




0
0
0
0
0
0

1
0
0
0
1
0

0
0
0
1
0
0

0
0
1
0
1
0











The local bit matrix of a data item Dj , M(Dj ), consists of four parts: (1) the bucket id
of B(Dj ); (2) a subset of the bit matrix, which is formed by the bit values of the buckets
within a rectangle, which includes its left and right nonempty buckets for each dimension;
(3) the number of data items in bucket B(Dj ); (4) a distinct integer to distinguish the local
bit matrix of Dj from other data items in bucket B(Dj ). In Figure 3.5(b), the local bit

70

matrix of D3 is




0 0 1


M(D3 ) = ID|  0 1 0  |1|1
1 0 1

where ID is the bucket id of B(D3 ).

The integrity-preserving scheme for z-dimensional data (z
that for 1-dimensional data.

>

Here we only show an example.

data items D1 :(d1 , d2 ), · · · , D5 :(d1 , d2 ) in Figure 3.5.
1 1
5 5

1) is similar as
Consider the ﬁve

The organization sends to

the cloud provider the encrypted data items (D1 )k , · · · , (D5 )k , encrypted local bit
matrices (M(D1 ))k , · · · , (M(D5 ))k , and the encoded data items fk1 (d1 ), · · · , fk1 (d1 ),
1
5
fk2 (d2 ), · · · , fk2 (d2 ). Given a range query that includes two data items D2 and D3 in Figure
1
5
3.5(c), the cloud provider replies to the customer the query result QR = {(D2 )k , (D3 )k } and
the veriﬁcation object V O = {(M(D2 ))k , (M(D3 ))k }.
Next, we analyze the detection probability for multi-dimensional data. Let B1 , · · · , Bm de1
z
note the multiple non-overlapping buckets, and ([li , h1 ], · · · , [li , hz ]) denote a z-dimensional
i
i

bucket Bi (1 ≤ i ≤ m). A bucket Bi is called a single-value bucket if for each dimension r
r
(1 ≤ r ≤ z), li = hr . Let [xr , xr r ] denote the domain for each dimension r. Let e(X) de1 N
i

note the frequency of the data item X : (x1 , · · · , xz ). The detection probability of a deletion
operation by cloud providers can be computed as
Pr =

m
i=1

z
r
r
r
r
r=1 (li − x1 + 1)(xNr − hi + 1) X∈Bi e(X)
n
z
r
r
r
r
j=1 r=1 (dj − x1 + 1)(xNr − dj + 1)

(3.12)

Theorems 3 and 4 can also be extended for multi-dimensional data. We have the following
two theorems.
Theorem 5. Given any n z-dimensional data items D1 , · · · , Dn , the maximum detection
probability of a deletion operation is
P rmax = 100%
71

(3.13)

if and only if each data item Dj (1 ≤ j ≤ n) forms a single-value bucket, i.e.,
([d1 , d1 ], · · · , [dz , dz ]).
j j
j j
Theorem 6. Given n z-dimensional data items D1 , · · · , Dn , the minimum detection probability of a deletion operation is
P rmin =

n
j=1

z
r
r=1 (dj

n
− xr + 1)(xr − dr + 1)
1
j
Nr

(3.14)

if and only if there is only one bucket ([x1 , x1 ], · · · , [xz , xz z ]).
1 N1
1 N
The calculation of the detection probability in Equation 3.12 and the proofs of Theorems
5 and 6 are similar to the 1-dimensional case. Finding optimal bucket partition for multidimensional data is an interesting yet diﬃcult problem and will be discussed in further work.

3.7

Evaluation

We evaluated the eﬃciency and eﬀectiveness of our privacy and integrity preserving schemes
for both 1-dimensional and multi-dimensional data. In terms of eﬃciency, we measured the
data processing time for organizations, and the space cost and query processing time for
cloud providers. In terms of eﬀectiveness, we measured whether the experimental detection
probability of deletion operations by cloud providers is consistent with the theoretically
analysis discussed in Sections 3.5.1 and 3.6.2. Our experiments were implemented in Java
1.6.0 and carried out on a PC running Linux with 2 Intel Xeon cores and 16GB of memory.

3.7.1

Evaluation Setup

We conducted our experiments on a real data set, Adult, and ﬁve synthetic datasets. The
Adult dataset is from the UCI Machine Learning Repository [32] and has been widely used
in previous studies. It contains 45222 records. We chose three attributes in this dataset,
72

Age, Education, and Hours-per-week. Note that Education is a categorical attribute and we
mapped each Education value to a distinct integer. The domains of these three attributes
are [17, 90], [1, 16], and [1, 99], respectively. The ﬁve synthetic datasets are generated by randomly choosing 102 , 103, · · · , 106 data items from ﬁve domains [0, 103], [0, 104 ], · · · , [0, 107 ],
respectively. For our order-preserving hash-based function fk (), we used HMAC-MD5 with
128-bit keys as the basic hash function hk (). We used the DES encryption algorithm to
encrypt both data items and local bit matrices.

3.7.2

Results for 1-dimensional Data

We employed the synthetic datasets to evaluate the eﬃciency and eﬀectiveness of our schemes
for 1-dimensional data. For each synthetic dataset, given diﬀerent number of buckets, we
ﬁrst computed the optimal partition and the maximum detection probability, and then we
implemented our schemes using the optimal partition. We also generated 1,000 random
range queries to measure the total processing time for cloud providers and verify query
result integrity for customers. To process the query, we used the binary search algorithm to
ﬁnd out the query result. Let n denote the number of data items in a dataset and m denote
the given number of buckets. According to Theorem 3, if all data items form single-value
buckets, the detection probability is 100%. The rest buckets are empty buckets and the
number of these buckets is n + 1. Thus, the total number of buckets can be computed as
2n + 1. In other words, given m = 2n + 1, the output of our optimal algorithm should be
these 2n + 1 buckets. Based on this observation, we deﬁne partition ratio as m/(2n + 1).
The partition ratio helped us to normalize the results of our optimal partition algorithm for
diﬀerent datasets. Figure 3.6 shows the normalized results for the ﬁve synthetic datasets.
We observed that the detection probability increases with the partition ratio and if partition
ratio is equal to 1, i.e., m = 2n+1, the probability becomes 1, which conﬁrms our discussion.

73

Detection probability (%)

100

80

60
n=1e+2
n=1e+3
n=1e+4
n=1e+5
n=1e+6

40

20

0.2

0.4
0.6
0.8
Partition ratio

1

Experiment detection probability

Figure 3.6. Eﬀectiveness of optimal partition algorithm

100

Experimental results
Theoretical line

80

60

40

20
20

40
60
80
100
Theoretic detection probability

Figure 3.7. Correctness of integrity-preserving scheme

74

To check whether the experimental detection probability is consistent with the theoretical
analysis, for each dataset, we randomly deleted a data item in each query result and then
computed the percentage of query results that were detected by our integrity-preserving
scheme. Note that this percentage is the experimental detection probability. Figure 3.7
shows that the experimental detection probability is close to the theoretical line, which
demonstrates the correctness of our analysis.
Figures 3.8 and 3.9 show the data processing time and space cost for the ﬁve synthetic
datasets, respectively. Note that the horizonal and vertical axes in these ﬁgures are in
logarithmic scales. In Figure 3.8, we observed that the data processing time is less than 300
seconds for 105 data items. Although for one million data items, the data processing time
is about 50 minutes, which is reasonable for real applications because the data processing is
a one-time oﬄine procedure. In Figure 3.9, we observed that the space cost grows linearly
with the number of data items in a dataset. A cloud provider needs 33MB to store one
million data items from an organization.
Figure 3.10 shows the total processing time of 1,000 queries for the ﬁve synthetic datasets.
Processing 1,000 queries over one million data items only takes 2 seconds.

3.7.3

Results for Multi-dimensional Data

We employed the Adult dataset to evaluate the eﬃciency and eﬀectiveness of our schemes
for multi-dimensional data. The experimental results show that the data processing time
for this dataset is 104 seconds, the space cost is 1.5MB, and the total processing time of
1,000 random queries is 3.5 seconds. Due to the absence of the optimal partition algorithm
for multi-dimensional data, we arbitrarily partitioned the Adult dataset to diﬀerent sets of
buckets. The results show that the experimental detection probability is consistent with the
theoretical analysis for multi-dimensional range queries.

75

Data processing time (s)

1e+4
1e+3
1e+2
1e+1
1e+0
1e−1

1e+2

1e+3
1e+4
1e+5
Number of data items

1e+6

Figure 3.8. Data processing time

Space cost (KB)

1e+5
1e+4
1e+3
1e+2
1e+1
1e+0

1e+2

1e+3
1e+4
1e+5
Number of data items
Figure 3.9. Space cost

76

1e+6

Query processing time (s)

2.2
2
1.8
1.6
1.4
1.2
1
0.8
0.6
1e+2

1e+3
1e+4
1e+5
Number of data items

Figure 3.10. Query processing time

77

1e+6

CHAPTER 4
Privacy Preserving Cross-Domain
Cooperative Firewall Optimization
4.1
4.1.1

Introduction
Background and Motivation

Firewalls are critical in securing private networks of businesses, institutions, and home networks. A ﬁrewall is often placed at the entrance between a private network and the external
network so that it can check each incoming or outgoing packet and decide whether to accept
or discard the packet based on its policy. A ﬁrewall policy is usually speciﬁed as a sequence
of rules, called Access Control List (ACL), and each rule has a predicate over multiple packet
header ﬁelds (i.e., source IP, destination IP, source port, destination port, and protocol type)
and a decision (i.e., accept and discard) for the packets that match the predicate. The rules
in a ﬁrewall policy typically follow the ﬁrst-match semantics where the decision for a packet
is the decision of the ﬁrst rule that the packet matches in the policy. Each physical interface
of a router/ﬁrewall is conﬁgured with two ACLs: one for ﬁltering outgoing packets and the

78

other one for ﬁltering incoming packets. In this work, we use ﬁrewalls, ﬁrewall policies, and
ACLs, interchangeably.
The number of rules in a ﬁrewall signiﬁcantly aﬀects its throughput. Figure 4.1 shows the
result of the performance test of iptables conducted by HiPAC [2]. It shows that increasing
the number of rules dramatically reduces the ﬁrewall throughput. Unfortunately, with explosive growth of services deployed on the Internet, ﬁrewall policies are growing rapidly in
size. Thus, optimizing ﬁrewall policies is crucial for improving network performance.

Throughput (Percentage)

100
90
80
70
60
50
40
30
20
10
0
25

50

100 200 400 800 1600 3200 6400
Number of rules

Figure 4.1. Eﬀect of the number of rules on the throughput with frame size 128 bytes [2]

4.1.2

Limitation of Prior Work

Prior work on ﬁrewall optimization focuses on either intra-ﬁrewall optimization [28, 50, 51,
52, 53, 55, 56, 57] or inter-ﬁrewall optimization [11, 81] within one administrative domain
where the privacy of ﬁrewall policies is not a concern. Intra-ﬁrewall optimization means
optimizing a single ﬁrewall. It is achieved by either removing redundant rules [50, 52] or
rewriting rules [28, 51, 53, 55, 56, 57]. Prior work on inter-ﬁrewall optimization requires

79

two ﬁrewall policies without any privacy protection, and thus can only be used within one
administrative domain. However, in reality, it is common that two ﬁrewalls belong to diﬀerent
administrative domains where ﬁrewall policies cannot be shared with each other. Keeping
ﬁrewall policies conﬁdential is important for two reasons. First, a ﬁrewall policy may have
security holes that can be exploited by attackers. Quantitative studies have shown that
most ﬁrewalls are misconﬁgured and have security holes [76]. Second, a ﬁrewall policy often
contains private information, e.g., the IP addresses of servers, which can be used by attackers
to launch more precise and targeted attacks.

4.1.3

Cross-domain Inter-ﬁrewall Optimization

To our best knowledge, no prior work focuses on cross-domain privacy-preserving interﬁrewall optimization. This work represents the ﬁrst step in exploring this unknown space.
Speciﬁcally, we focus on removing inter-ﬁrewall policy redundancies in a privacy-preserving
manner. Consider two adjacent ﬁrewalls 1 and 2 belonging to diﬀerent administrative domains Net1 and Net2 . Let F W1 denote the policy on ﬁrewall 1’s outgoing interface to
ﬁrewall 2 and F W2 denote the policy on ﬁrewall 2’s incoming interface from ﬁrewall 1. For
a rule r in F W2 , if all the packets that match r but do not match any rule above r in F W2
are discarded by F W1 , rule r can be removed because such packets never come to F W2 . We
call rule r an inter-ﬁrewall redundant rule with respect to F W1 . Note that F W1 and F W2
only ﬁlter the traﬃc from F W1 to F W2 ; the traﬃc from ﬁrewall 2’s outgoing interface to
ﬁrewall 1’s incoming interface is guarded by other two separate policies. For simplicity, we
assume that F W1 and F W2 have no intra-ﬁrewall redundancy as such redundancy can be
removed using the proposed solutions [50, 52].
Figure 4.2 illustrates inter-ﬁrewall redundancy, where two adjacent routers belong to different administrative domains CSE and EE. The physical interfaces connecting two routers

80

I1

CSE
subnet

SIP

DIP

I2

SP DP PR Dec

SIP

EE
subnet

DIP

SP DP PR Dec

r1' 1.2.*.* 192.168.*.* *

* TCP

d

r1 1.2.1.* 192.168.1.*

*

25 TCP

a

r2' 2.3.*.* 192.168.*.* *

* TCP

a

r2 1.2.1.* 192.168.*.* 80

* TCP

d

r3 '

*

d

r3

*

a

*

*

*

*

*

*

*

*

FW2: filtering I2’s incoming packets

FW1: filtering I1’s outgoing packets

Figure 4.2. Example inter-ﬁrewall redundant rules
are denoted as I1 and I2 , respectively. The rules of the two ﬁrewall policies F W1 and F W2 ,
that are used to ﬁlter the traﬃc ﬂowing from CSE to EE, are listed in two tables following
the format used in Cisco Access Control Lists. Note that SIP, DIP, SP, DP, PR, and Dec
denote source IP, destination IP, source port, destination port, protocol type, and decision,
′
respectively. Clearly, all the packets that match r1 and r2 in F W2 are discarded by r1 in
′
F W1 . Thus, r1 and r2 of F W2 are inter-ﬁrewall redundant with respect to r1 in F W1 .

4.1.4

Technical Challenges and Our Approach

The key challenge is to design a protocol that allows two adjacent ﬁrewalls to identify the
inter-ﬁrewall redundancy with respect to each other without knowing the policy of the other
ﬁrewall. While intra-ﬁrewall redundancy removal is complex [50, 52], inter-ﬁrewall redundancy removal with the privacy-preserving requirement is even harder. To determine whether
a rule in F W2 is inter-ﬁrewall redundant with respect to F W1 , Net2 certainly needs some
information about F W1 ; yet, Net2 cannot reveal F W1 from such information.
A straightforward solution is to perform a privacy preserving comparison between two rules
from two adjacent ﬁrewalls. Particularly, for each rule r in F W2 , this solution checks whether

81

all possible packets that match rule r in F W2 match a rule r ′ with the discard decision in
F W1 . If rule r ′ exists, r is inter-ﬁrewall redundant with respect to r ′ in F W1 . However,
because ﬁrewalls follow the ﬁrst-match semantics and the rules in a ﬁrewall typically overlap,
this solution is not only incorrect but also incomplete. Incorrect means that wrong redundant
rules could be identiﬁed in F W2 . Suppose this solution identiﬁes r as a redundant rule in
′
F W2 with respect to r2 in F W1 . However, if some packets that match rule r also match rule
′
′
′
r1 (r1 is above r2 ) with the accept decision in F W1 , these packets will pass through F W1 and

then F W2 needs to ﬁlter them with r. In this case, r is actually not redundant. Incomplete
means that a portion of redundant rules could be identiﬁed in F W2 . If all possible packets
that match rule r in F W2 are discarded by not only one rule but multiple rules in F W1 , r is
also redundant. However, the direct comparison solution cannot identify such redundancies.
Our basic idea is as follows. For each rule r in F W2 , we ﬁrst compute a set of compact
predicates representing the set of packets that match r but do not match the rules above r in
F W2 . Then, for each predicate, we check whether all the packets that match the predicate
are discarded by F W1 . If this condition holds for all the predicates computed from rule r,
then rule r is redundant. To eﬃciently compute these predicates, we convert ﬁrewalls to
ﬁrewall decision diagrams [52]. To allow the two ﬁrewalls to detect the redundant rules in
F W2 in a privacy-preserving manner, we develop a protocol so that two ﬁrewalls can detect
such redundant rules without disclosing their policies to each other.

4.1.5

Key Contributions

We make two key contributions. First, we propose a novel privacy-preserving protocol for
detecting inter-ﬁrewall redundant rules in one ﬁrewall with respect to another ﬁrewall. This
work represents the ﬁrst eﬀort along this unexplored direction. Second, we implemented our
protocol and conducted extensive experiments on both real and synthetic ﬁrewall policies.

82

The results on real ﬁrewall policies show that our protocol can remove as many as 49% of
the rules in a ﬁrewall whereas the average is 19.4%. The communication cost is less than a
few hundred KBs. Our protocol incurs no extra online packet processing overhead and the
oﬄine processing time is less than a few hundred seconds.

4.2
4.2.1

System and Threat Models
System Model

A ﬁrewall F W is an ordered list of rules. Each rule has a predicate over d f ields F1 , · · · , Fd
and a decision for the packets that match the predicate. Firewalls usually check ﬁve ﬁelds:
source IP, destination IP, source port, destination port, and protocol type. The length of
these ﬁelds are 32, 32, 16, 16, and 8 bits, respectively. A predicate deﬁnes a set of packets over
the d ﬁelds, and is speciﬁed as F1 ∈T1 ∧· · · ∧Fd ∈Td where each Ti is a subset of Fi ’s domain
D(Fi ). A packet over the d ﬁelds F1 ,· · · ,Fd is a d-tuple (p1 , · · · , pd ) where each pi (1≤i≤d) is
an element of D(Fi ). A packet (p1 , · · · , pd ) matches a rule F1 ∈T1 ∧· · · ∧Fd ∈Td → decision
if and only if the condition p1 ∈T1 ∧· · · ∧pd ∈Td holds. Typical ﬁrewall decisions include
accept, discard, accept with logging, and discard with logging. Without loss of generality,
we only consider accept and discard in this work. We call a rule with the accept decision
an accepting rule and a rule with the discard decision a discarding rule. In a ﬁrewall policy,
a packet may match multiple rules whose decisions are diﬀerent. To resolve these conﬂicts,
ﬁrewalls typically employ a ﬁrst-match semantics where the decision for a packet p is the
decision of the ﬁrst rule that p matches. A matching set of ri , M(ri ), is the set of all
possible packets that match the rule ri [50]. A resolving set of ri , R(ri ), is the set of
packets that match ri but do not match any rule rj above ri (j<i), and R(ri ) is equal to
M(ri )−M(r1 )∪· · · ∪M(ri−1 ) [50].

83

Based on above concepts, we deﬁne inter-ﬁrewall redundant rules. Given two adjacent
ﬁrewalls F W1 and F W2 , where the traﬃc ﬂow is from F W1 to F W2 , a rule r in F W2 is
inter-ﬁrewall redundant with respect to F W1 if and only if all the packets in r’s resolving
set are discarded by F W1 .

4.2.2

Threat Model

We adopt the semi-honest model in [35]. For two adjacent ﬁrewalls, we assume that they
are semi-honest, i.e., each ﬁrewall follows our protocol correctly but each ﬁrewall may try to
reveal the policy of the other ﬁrewall. The semi-honest model is realistic and well adopted
[18, 78]. For example, this model is appropriate for large organizations that have many
independent branches as well as for loosely connected alliances composed by multiple parties.
While we are conﬁdent that all administrative domains follow mandate protocols, we may
not guarantee that no corrupted employees are trying to reveal the private ﬁrewall policies
of other parties. We leave investigation of privacy-preserving ﬁrewall optimization in the
model with malicious participants to future work.

4.3

Privacy-Preserving

Inter-Firewall

Redundancy

Removal
In this section, we present our privacy-preserving protocol for detecting inter-ﬁrewall redundant rules in F W2 with respect to F W1 . To do this, we ﬁrst converts each ﬁrewall
to an equivalent sequence of non-overlapping rules. Because for any non-overlapping rule
nr, the matching set of nr is equal to the resolving set of nr, i.e., M(nr) = R(nr), we
only need to compare non-overlapping rules generated from the two ﬁrewalls for detecting
inter-ﬁrewall redundancy. Second, we divide this problem into two subproblems, single-rule

84

coverage redundancy detection and multi-rule coverage redundancy detection, and then propose our privacy-preserving protocol for solving each subproblem. A rule nr is covered by
′
′
′
′
one or multiple rules nri · · · nri (k ≥ 1) if and only if M(nr)⊆M(nri )∪· · · ∪M(nri ).
1
1
k
k
The ﬁrst subproblem checks whether a non-overlapping rule nr in F W2 is covered by a

non-overlapping discarding rule nr ′ in F W1 , i.e., M(nr)⊆M(nr ′ ). The second subproblem
checks whether a non-overlapping rule nr in F W2 is covered by multiple non-overlapping
′
′
′
′
discarding rules nri · · · nri (k ≥ 2) in F W1 , i.e., M(nr)⊆M(nri )∪· · · ∪M(nri ). Finally,
1
1
k
k
after redundant non-overlapping rules generated from F W2 are identiﬁed, we map them back

to original rules in F W2 and then identify the redundant ones.
The problem of checking whether M(nr)⊆M(nr ′ ) boils down to the problem of checking
whether one range [a, b] in nr is contained by another range [a′ , b′ ] in nr ′ , which further boils
down to the problem of checking whether a ∈ [a′ , b′ ] and b ∈ [a′ , b′ ]. Thus, we ﬁrst describe
the privacy-preserving protocol for comparing a number and a range.

4.3.1

Privacy-Preserving Range Comparison

To check whether a number a from F W2 is in a range [a′ , b′ ] from F W1 , we use a method
similar to the preﬁx membership veriﬁcation scheme in [48]. The basic idea is to convert
the problem of checking whether a ∈ [a′ , b′ ] to the problem of checking whether two sets
converted from a and [a′ , b′ ] have a common element. Our method consists of four steps:
(1) Preﬁx conversion. It converts [a′ , b′ ] to a minimum number of preﬁxes, denoted as
T ([a′ , b′ ]), whose union is [a′ , b′ ]. For example, T ([11, 15])={1011, 11**}.
(2) Preﬁx family construction. It generates all the preﬁxes that contains a including a
itself. This set of preﬁxes is called the preﬁx family of a, denoted as F (a). Let k be the
bit length of a. The preﬁx family F (a) consists of k + 1 preﬁxes where the i-th preﬁx is
obtained by replacing the last i − 1 bits of a by ∗. For example, as the binary representation

85

[11, 15]
⇓ Preﬁx conversion
1011
11**
⇓ Preﬁx numericalization
10111
11100
⇓ Encrypt by Net1
(10111)K1
(11100)K1
⇓ Encrypt by Net2
((10111)K1 )K2
((11100)K1 )K2

12 (=1100)
⇓ Preﬁx family construction
1100
11**
****
110*
1***
⇓ Preﬁx numericalization
11001 11100 10000
11010 11000
⇓ Encrypt by Net2
(11001)K2 (11100)K2 (10000)K2
(11010)K2 (11000)K2
⇓ Encrypt by Net1
((11001)K2 )K1 ((11100)K2 )K1
((10000)K2 )K1 ((11010)K2 )K1
((11000)K2 )K1

(a)

(b)
Figure 4.3. Preﬁx membership veriﬁcation

of 12 is 1100, we have F (12)={1100, 110*, 11**, 1***, ****}. It is not diﬃcult to prove that
a ∈ [a′ , b′ ] if and only if F (a) ∩ T ([a′ , b′ ]) = ∅.
(3) Preﬁx numericalization. It converts the preﬁxes generated in the previous steps to
concrete numbers such that we can encrypt them in the next step. We use the preﬁx
numericalization scheme in [20]. Given a preﬁx b1 b2 · · · bk *· · · * of w bits, we ﬁrst insert 1
after bk . The bit 1 represents a separator between b1 b2 · · · bk and *· · · *. Then we replace
every ∗ by 0. For example, 11** is converted to 11100. If the preﬁx does not contain *s, we
place 1 at the end. For example, 1100 is converted to 11001.
(4) Comparison. It checks whether a ∈ [a′ , b′ ] by checking whether F (a) ∩ T ([a′ , b′ ]) =
∅, which boils down to checking whether two numbers are equal. We use commutative
encryption to do this checking in a privacy-preserving manner. Given a number x and
two encryption keys K1 and K2 , a commutative encryption is a function that satisﬁes the
property ((x)K1 )K2 = ((x)K2 )K1 , i.e., encryption with key K1 ﬁrst and then K2 is equivalent

86

to encryption with key K2 ﬁrst and then K1 . Example commutative encryption algorithms
are the Pohlig-Hellman Exponentiation Cipher [64] and Secure RPC Authentication (SRA)
[67]. Each domain chooses a private key. Let K1 , K2 be the private keys chosen by Net1
and Net2 , respectively. To check whether number v1 from Net1 is equal to number v2 from
Net2 without disclosing the value of each number to the other party, Net1 can ﬁrst encrypt
v1 using key K1 and sends (x)K1 to Net2 ; similarly, Net2 can ﬁrst encrypt v2 using key K2
and sends (x)K2 to Net1 . Then, each party checks whether v1 = v2 by checking whether
((v1 )K1 )K2 = ((v2 )K2 )K1 . Note that ((v1 )K1 )K2 =((v2 )K2 )K1 if and only if v1 = v2 . If
v1 = v2 , neither party can learn anything about the numbers being compared. Figure 4.3
illustrates the process of checking whether 12 from F W2 is in the range [11, 15] from F W1 .

4.3.2

Processing Firewall F W1

To detect redundant rules in F W2 , Net1 converts its ﬁrewall F W1 to a set of non-overlapping
rules. To preserve the privacy of F W1 , Net1 ﬁrst converts each range of a non-overlapping
discarding rules from F W1 to a set of preﬁxes. Second, Net1 and Net2 encrypt these preﬁxes
using commutative encryption. The conversion of F W1 includes nine steps:
(1) Net1 ﬁrst converts F W1 to an equivalent ﬁrewall decision diagram (FDD) [36, 37].
An FDD for a ﬁrewall F W of a sequence of rules r1 , · · · , rn over ﬁelds F1 , · · · , Fd is an
acyclic and directed graph that has ﬁve properties. (a) There is exactly one node that has
no incoming edges. This node is called the root. The nodes that have no outgoing edge are
called terminal nodes. (b) Each node v has a label, denoted F (v). If v is a nonterminal
node, then F (v) ∈ {F1 , · · · , Fd }. If v is a terminal node, then F (v) is a decision. (c) Each
edge e, u→v, is labeled with a non-empty set of integers, denoted I(e), where I(e) is a subset
of the domain of u’s label (i.e., I(e)⊆D(F (u))). (d) The set of all outgoing edges of a node
v, denoted E(v), satisﬁes two conditions: (a) consistency: I(e)∩I(e′ )=∅ for any two distinct

87

r1 ' : F1 [0, 4] F2 [7,15] o d
r2 ' : F1 [5, 7] F2 [5,15] o d
(a)
r3 ' : F1 [5, 7] F2 [0, 8] o a
r4 ' : F1 [0,15] F2 [0,15] o d
FDD construction

F1

[0, 4]

[8, 15]

[5, 7]
[0,6]

F2

d

F2

[7,15][0,4]
d

[9,15]

[5,8]

a

d

F2
[0,15]

d

d

FDD reduction
F1
[5, 7]

[0, 4]
F2

F2
[0,4]

[0,15]
d

[8, 15]

a

Encrypt by Net2
­ ((00100) K ) K , (01001) K ) K ½
1
2
1
2 °
°
® (10000) K1 ) K 2 , ((01011) K1 ) K 2 ¾ (i)
° (01110) K ) K , (11000) K ) K °
1
2
1
2 ¿
¯

Reconstruct nonoverlapping rules
­((00100) K1 ) K 2 ½
27 : ®
¾ {(10000) K1 ) K 2 } o d
¯ (01001) K1 ) K 2 ¿
­ (01011) K ) K ½
1
2 °
­((01011) K1 ) K 2 ½ °
13: ®
¾  ®((01110) K1 ) K 2 ¾ o d
¯ (01110) K1 ) K 2 ¿ ° (11000) )
°
K1 K 2 ¿
¯
45:{(11000) K1 ) K 2 }{(10000) K1 ) K 2 } o d

(b)

F2 (c)
[5,15] [0,15]
d

d

Extract non-overlapping rules
with the discard decision
nr1 ' : F1 [0, 4] F2 [0,15] o d
nr2 ' : F1 [5, 7] F2 [5,15] o d
(d)
nr3 ' : F1 [8,15] F2 [0,15] o d
Convert ranges to prefixes
{00**,0100}{****}o d
{0101,011*}{0101,011*,1***}o d (e)
{1***}{****}o d
Extract and permute
the prefixes
­00**, 0100, ****½
(f)
®0101, 011*, 1***¾
¯
¿
Numericalize the prefixes
­00100, 01001, 10000½
(g)
®01011, 01110, 11000¾
¯
¿
Encrypt by Net1

­(00100) K1 , (01001) K1 , (10000) K1 ½
®(01011) , (01110) , (11000) ¾ (h)
K1
K1
K1 ¿
¯
Figure 4.4. The Conversion of F W1
88

(j)

edges e and e′ in E(v); (b) completeness:

e∈E(v) I(e)=D(F (v)).

(e) A directed path from

the root to a terminal node is called a decision path. No two nodes on a decision path have
the same label. Each path in the FDD corresponds to a non-overlapping rule. A full-length
ordered FDD is an FDD where in each decision path all ﬁelds appear exactly once and in the
same order. For ease of presentation, we use the term “FDD” to denote “full-length ordered
FDD”. An FDD construction algorithm, which converts a ﬁrewall policy to an equivalent
FDD, is presented in [49]. Figure 4.4(b) shows the FDD constructed from Figure 4.4(a).
(2) Net1 reduces the FDD’s size by merging isomorphic subgraphs. An FDD f is reduced if
and only if it satisﬁes two conditions: (a) no two nodes in f are isomorphic; (b) no two nodes
have more than one edge between them. Two nodes v and v ′ in an FDD are isomorphic
if and only if v and v ′ satisfy one of the following two conditions: (a) both v and v ′ are
terminal nodes with identical labels; (b) both v and v ′ are nonterminal nodes and there is
a one-to-one correspondence between the outgoing edges of v and the outgoing edges of v ′
such that every two corresponding edges have identical labels and they both point to the
same node. Figure 4.4(c) shows the FDD reduced from the FDD in Figure 4.4(b).
(3) Net1 extracts non-overlapping discarding rules. Net1 does not extract non-overlapping
accepting rules because the packets accepted by F W1 are passed to F W2 . Note that a nonoverlapping rule from F W2 that is covered by these discarding rules from F W1 is redundant.
Figure 4.4(d) shows the discarding non-overlapping rules extracted from the reduced FDD
in Figure 4.4(c).
(4) Net1 converts each range to a set of preﬁxes. Figure 4.4(e) shows the preﬁxes generated
from Figure 4.4(d).
(5) Net1 unions all these preﬁx sets and then permutes the preﬁxes. Figure 4.4(f) shows
the resulting preﬁx set. Note that the resulting set does not include duplicate preﬁxes.
The beneﬁts are two-fold. In terms of eﬃciency, it avoids encrypting and sending duplicate
preﬁxes for both parties, and hence, signiﬁcantly reduces computation and communication
89

costs. In terms of security, Net2 cannot reconstruct the non-overlapping rules from F W1 ,
because Net2 does not know which preﬁx belongs to which ﬁeld of which rule. However,
Net1 knows such information and it can reconstruct these non-overlapping rules.
(6) Net1 numericalizes and encrypts each preﬁx using K1 , and then sends to Net2 . Figures
4.4(g) and (h) show the numericalized and encrypted preﬁxes, respectively.
(7) Net2 further encrypts these preﬁxes with K2 and sends them back to Net1 as shown
in Figure 4.4(i).
(8) Net1 reconstructs its non-overlapping discarding rules from the double encrypted preﬁxes because Net1 knows which preﬁx belongs to which ﬁeld of which rule.
(9) For each reconstructed non-overlapping rule, Net1 assigns a distinct random index to
it. These indices are used for Net2 to identify the redundant non-overlapping rules from
F W2 . For example, in Figure 4.6(a), Net1 assigns its three rules with three random indices:
27, 13, and 45. The detailed discussion is in Section 4.3.4.

4.3.3

Processing Firewall F W2

In order to compare two ﬁrewalls in a privacy-preserving manner, Net2 and Net1 convert
ﬁrewall F W2 to d sets of double encrypted numbers, where d is the number of ﬁelds. The
conversion of F W2 includes ﬁve steps:
(1) Net2 converts F W2 to an equivalent all-match FDD. All-match FDDs diﬀer from FDDs
on terminal nodes. In an all-match FDD, each terminal node is labeled with a nonempty
set of rule sequence numbers, whereas in an FDD each terminal node is labeled with a decision. For rule ri (1≤i≤n), we call i the sequence number of ri . The set of rule sequence
numbers labeled on a terminal node consists of the sequence numbers of all the rules that
overlaps with the decision path ending with this terminal node. Given a decision path P,
(v1 e1 · · · vd ed vd+1 ), the matching set of P is deﬁned as the set of all packets that satisfy

90

r1 : F1 [0, 2] F2 [7,15] o a
r2 : F1 [0, 5] F2 [7,15] o d
r3 : F1 [6,15] F2 [0, 5] o a (a)
r4 : F1 [0,15] F2 [0,15] o d
Construct all-match FDD

F1
[6, 15]
[3, 5]

[0, 2]
F2

[0,6]

[7,15] [0,6]

[7,15] [0, 5]

(b)
F2

[6,15]

2,4
3,4
4
d
d
a
Extract nonoverlapping rules
nr1 : F1 [0, 2] F2 [0, 6] o d
nr2 : F1 [0, 2] F2 [7,15] o a
(c)
nr3 : F1 [3, 5] F2 [0, 6] o d
nr4 : F1 [3, 5] F2 [7,15] o d
nr5 : F1 [6, 15] F2 [0, 5] o a
nr6 : F1 [6, 15] F2 [6,15] o d
Convert values to
prefix families
nr1 : {F (0), F (2)}{F (0), F (6)}
nr2 : {F (0), F (2)}{F (7), F (15)}
(d)
nr3 : {F (3), F (5)}{F (0), F (6)}
nr4 : {F (3), F (5)}{F (7), F (15)}
nr5 : {F (6), F (15)}{F (0), F (5)}
nr6 : {F (6), F (15)}{F (6), F (15)}
Extract and permute
prefixes for each filed
­0000, 0010, 0011½ ­0000, 0101, 0110½
°0101, 0110, 1111° °0111, 1111, 000* °
°000*, 001*, 010*° °
°
®011*, 111*, 00** ¾, ®010*, 011*, 111* ¾(e)
°01**, 11**, 0*** ° °00**, 01**, 11** °
°
°
°1***, ****
° ¯0***,1***, **** ¿
¯
¿

4
d

1,2,4
a

F2

4
d

Numericalize and
encrypt by Net2
½
½­
­


°(01011) K ° °(01011) K °
2°
2°°
°


°
°
°
°
(f)
®(00100) K 2 ¾, ®(01110) K 2 ¾
°
°°
°


°(11000) K ° °(11000) K °
2°
2°°
°

¿ ¯(10000) K 2 ¿
¯
Encrypt by Net1

½
½­
­


°((01011) K ) K ° °((01011) K ) K °
2
1°
2
1° °
°


°
°°
°
®((00100) K 2 ) K1 ¾, ®((01110) K 2 ) K1 ¾ (g)
°
°°
°


°((11000) K ) K ° °((11000) K ) K °
2
1°
2
1° °
°

¿ ¯((10000) K 2 ) K1 ¿
¯

Figure 4.5. The Conversion of F W2

91

F (v1 )∈I(e1 )∧· · · ∧F (vd )∈I(ed ). We use M(P) to denote the matching set of P. More formally, in an all-match FDD, for any decision path P : (v1 e1 · · · vd ed vd+1 ), if M(P)∩M(ri )=∅,
then M(P)⊆M(ri ) and i∈F (vd+1 ). Figure 4.4(b) shows the all-match FDD generated from
Figure 4.4(a). Considering the terminal node of the fourth path in Figure 4.4(b), its label {2,
4} means that M(P)⊆M(r2 ), M(P)⊆M(r4 ), M(P)∩M(r1 )=∅, and M(P)∩M(r3 )=∅. An
all-match FDD not only represents a ﬁrewall in a non-overlapping fashion but also represents
the overlapping relationship among rules. The reason of converting F W2 to an all-match
FDD is that later Net2 needs the rule sequence numbers to identify inter-ﬁrewall redundant
rules in F W2 .
(2) Net2 extracts all non-overlapping rules from the all-match FDD. Figure 4.5(c) shows
the non-overlapping rules extracted from Figure 4.5(b). For each range [a, b] of a nonoverlapping rule, Net2 generates two preﬁx families F (a) and F (b). Figure 4.5(d) shows the
result from Figure 4.5(c).
(3) For every ﬁeld Fk , Net2 unions all preﬁx families of all the non-overlapping rules into
one preﬁx set and permutes the preﬁxes. Considering the ﬁrst ﬁeld F1 in Figure 4.5(d),
Net2 unions F (0), F (2), F (3), F (5), F (6) and F (15) to the ﬁrst preﬁx set in Figure 4.5(e).
The beneﬁts of this step are similar to those of Step (5) in F W1 ’s conversion. In terms of
eﬃciency, it avoids encrypting and sending the duplicate preﬁxes for each ﬁeld, and hence,
signiﬁcantly reduces the computation and communication costs. In terms of security, Net1
cannot reconstruct the non-overlapping rules of F W2 , because Net1 does not know which
preﬁx belongs to which rule in F W2 . However, Net2 knows such information, which will be
used to identify redundant non-overlapping rules later. Note that, the ordering of the ﬁelds
cannot be permuted because Net1 needs to perform comparison of the preﬁxes with only
those preﬁxes from the corresponding ﬁelds.
(4) Net2 numericalizes and encrypts the preﬁxes using its private K2 , and sends them to
Net1 . Figure 4.5(f) shows the preﬁxes.
92

(5) Net1 further encrypts these preﬁxes using key K1 .

4.3.4

Single-Rule Coverage Redundancy Detection

After processing the two ﬁrewalls, Net1 has a sequence of double encrypted non-overlapping
rules obtained from F W1 and d sets of double encrypted numbers obtained from F W2 . Let
(F1 ∈T1 )∧· · · ∧(Fd ∈Td )→discard denote a double encrypted rule, where Ti is a set of double
encrypted numbers. Let T1 ,· · · ,Td denote the d sets of double encrypted numbers from F W2 .
Figure 4.6(a) shows the double encrypted non-overlapping rules generated from Figure 4.4
and Figure 4.6(b) shows the double encrypted numbers generated from Figure 4.5. For each
ﬁeld Fi (1≤i≤d) and for each number a in Ti , Net1 checks whether there exists a double encrypted rule (F1 ∈T1 )∧· · · ∧(Fd ∈Td )→discard such that a ∈ Ti . If rule ri satisﬁes this condition, then Net1 associates the rule index i with a. As there maybe multiple rules that satisfy
this condition, eventually Net1 associates a set of rule indices with a. If no rule satisﬁes this
condition, Net1 associates an empty set with a. Considering the number ((01011)K2 )K1 ,
only the rule with index 13 contains it because ((01011)K2 )K1 =((01011)K1 )K2 ; thus, Net1
associates ((01011)K2 )K1 with {13}. Finally, Net1 replaces each number in T1 ,· · · ,Td with
its corresponding set of rule indices, and sends them to Net2 .
Upon receiving the sets from Net1 , for each preﬁx family, Net2 ﬁnds the index of the
rule that overlaps with the preﬁx family. For a non-overlapping rule nr from F W2 , if all
its preﬁx families overlap with the same discarding rule nr ′ from F W1 , nr is covered by nr ′
and hence, nr is redundant. For example, in Figure 4.6(d), nr1 is redundant, because F (0),
F (2), F (0) and F (6) overlap with rule 27 from F W1 . Similarly, nr2 is redundant. Note that
F (v):j1 , ..., jk denotes that F (v) overlaps with non-overlapping rules j1 , ..., jk from F W1 .

93

­((00100) K1 ) K 2 ½
{(10000) K1 ) K 2 } o d
27 : ®
(01001) K1 ) K 2 ¾
¯
¿
­ (01011) K ) K ½
1
2 °
­((01011) K1 ) K 2 ½ °
 ®((01110) K1 ) K 2 ¾ o d
13: ®
¾
¯ (01110) K1 ) K 2 ¿ ° (11000) )
°
K1 K 2 ¿
¯
45:{(11000) K1 ) K 2 }{(10000) K1 ) K 2 } o d
(a)

½
­
½­


°((01011) ) °
°((01011) K ) K °
K 2 K1 °
2
1° °
°


°
°
°°
®((00100) K 2 ) K1 ¾, ®((01110) K 2 ) K1 ¾
°
°°
°


°((11000) K ) K ° °((11000) K ) K °
2
1°
2
1° °
°

¯
¿ ¯((10000) K 2 ) K1 ¿
(b)

Compare two reconstructed
firewalls by Net1
½
­

:I ½ ­
:I

° ((01011) K ) K :13 ° ° ((01011) K ) K :13 °
2
1
2
1
°
°
°

:I ° °
:I

°
°
°
(c)
®((00100) K 2 ) K1 : 27¾, ® ((01110) K 2 ) K1 :13 ¾
°
°

:I ° °
:I

°((11000) K ) K : 45° ° ((11000) K ) K :13 °
2
1
2
1
°
:I ° °((10000) K 2 ) K1 : 27,45°

¿¯
¯
¿
Find corresponding prefix families
in FW2 by Net2
nr1 : {F (0) : 27, F (2) : 27}{F (0) : 27, 45,
F (6) :13, 27, 45}
nr2 : {F (0) : 27, F (2) : 27}{F (7) :13, 27, 45, F (15) :13, 27, 45}
F (6) :13, 27, 45} (d)
nr3 : {F (3) : 27, F (5) :13}{F (0) : 27, 45,
nr4 : {F (3) : 27, F (5) :13}{F (7) :13, 27, 45, F (15) :13, 27, 45}
F (5) :13, 27, 45}
nr5 : {F (6) :13, F (15) : 45}{F (0) : 27, 45,
nr6 : {F (6) :13, F (15) : 45}{F (6) :13, 27, 45, F (15) :13, 27, 45}
Figure 4.6. Comparison of Two Firewalls

94

4.3.5

Multi-Rule Coverage Redundancy Detection

To detect multi-rule coverage redundancy, our basic idea is to combine all the non-overlapping
discarding rules from F W1 to a set of new rules such that for any arbitrary rule from F W2 ,
if it is covered by multiple non-overlapping discarding rules from F W1 , it is covered by a rule
′
′
from these new rules. More formally, let nr1 , · · · , nrl denote the non-overlapping discarding
′
′
rules from F W1 and s′ , · · · , s′ denote the set of new rules generated from nr1 , · · · , nrl .
g
1

For any rule nr from F W2 , if a non-overlapping rule nr from F W2 is multi-rule coverage
′
′
′
′
redundant, i.e., M(nr)⊆M(nri )∪· · · ∪M(nri ) where nri · · · nri (k ≥ 2) from F W1 , there
1

is a rule

s′
j

1

k

(1 ≤ j ≤ g) that cover nr, i.e.,

M(nr)⊆M(s′ ).
j

k

Thus, after Net1 computes

the set of new rules from F W1 , we reduce the problem of multi-rule coverage redundancy
detection to single-rule coverage redundancy detection. Then, two parties Net1 and Net2
can cooperatively run our protocol proposed in this section to identify the non-overlapping
single-rule and multi-rule coverage redundant rules from F W2 at the same time. However,
the key question is how to compute the set of new rules s′ , · · · , s′ .
g
1
A straightforward method is to compute all possible rules that are covered by a single or
′
′
multiple non-overlapping discarding rules among nr1 , · · · , nrl . All these rules form the set

of new rules s′ , · · · , s′ . However, this method incurs two major drawbacks. First, the time
g
1
and space complexities of this method can be signiﬁcant because the number of all these
rules could be huge. Second, due to the huge number of these rules, the communication
and computation costs increase signiﬁcantly. The relationship between these costs and the
number of rules is discussed in Section 4.5.2.
Our solution is to compute only the largest rules that are covered by a single or multiple
′
′
non-overlapping discarding rules among nr1 , · · · , nrl . The term largest can be explained

as follows. Without considering the decision, a ﬁrewall rule with d ﬁelds can be denoted
as a hyperrectangle over a d-dimensional space. Then, l non-overlapping discarding rules

95

′
′
nr1 , · · · , nrl are l hyperrectangles over a d-dimensional space. The new rules s′ , · · · , s′
g
1

are also the hyperrectangles. The term largest means that if a hyperrectangle s∗ contains
j
the hyperrectangle s′ but is larger than s′ , s∗ has some parts which are not covered by
j
j
j
′
′
all the l hyperrectangles nr1 , · · · , nrl . For example, the non-overlapping discarding rules
′
′
′
nr1 , nr2 , nr3 in Figure 4.4(d) can be illustrated as three ﬁlled rectangles in Figure 4.7. Figure
′
′
′
4.7 also shows three new rules s′ , s′ , s′ generated from nr1 , nr2 , nr3 , where s′ is illustrated
1 2 3
1
′
′
in the dashed rectangle, s′ is the same as nr2 , and s′ is the same as nr3 . Note that the
2
3

values of two ﬁelds F1 and F2 are integers. Thus, we can combine three ranges [0, 4], [5, 7],
and [8, 15] to a range [0, 15] for the ﬁeld F1 of s′ .
1

F2

s1'

15
nr1'
nr2'
(or s2')

10

nr3'
(or s3')

5

0

5

10

F1
15

Figure 4.7. Three largest rules generated from Figure 4.4(d)
More formally, we can deﬁne a largest rule s′ (1 ≤ j ≤ g) as follows.
j
′
′
1. M(s′ ) ⊆ M(nr1 ) ∪ · · · ∪ M(nrl ).
j

2. For any rule s∗ that M(s∗ ) ⊃ M(s′ ),
j
j
j
′
′
M(s∗ ) ⊂ M(nr1 ) ∪ · · · ∪ M(nrl ).
j
′
′
Given the set of all largest rules s′ , · · · , s′ generated from nr1 , · · · , nrl , for any rule nr, if nr
g
1
′
′
is covered by one or multiple rules in nr1 , · · · , nrl , there exists a largest rule s′ (1 ≤ j ≤ g)
j

which covers nr. We have the following theorem.
96

′
′
Theorem 7. Given the set of all largest rules s′ , · · · , s′ generated from nr1 , · · · , nrl , for
g
1
′
′
any rule nr, if M(nr) ⊆ M(nr1 ) ∪ · · · ∪ M(nrl ), there exists a largest rule s′ (1 ≤ j ≤ g)
j

which satisﬁes the condition M(nr) ⊆ M(s′ ).
j
Proof. If nr is a largest rule, it is included in s′ , · · · , s′ . If nr is not a largest rule, we prove
g
1
it by contradiction. If there exists no largest rule among s′ , · · · , s′ which covers nr, we can
g
1
generate all possible rules which satisfy two conditions: (1) the matching set of each of these
′
′
rules is the superset of M(nr); (2) each of these rules is covered by nr1 , · · · , nrl . Among all

these generated rules, there exists at least a largest rule otherwise the number of these rules
′
′
is inﬁnite. However, M(nr1 ) ∪ · · · ∪ M(nrl ) is a ﬁnite domain.

Next, we discuss how to compute the set of all the largest rules S = {s′ , · · · , s′ } from the
g
1
′
′
non-overlapping discarding rules nr1 , · · · , nrl . Our idea is to ﬁrst compute the largest rules
′
′
for every two rules among nr1 , · · · , nrl . Repeat computing the largest rules for every two

rules in the previous step until the resulting rules do not change. Finally, the resulting rules
is the set of all the largest rules s′ , · · · , s′ . Note that it is trivial to compute the largest
g
1
rules from two rules. This algorithm is shown in Algorithm 2.
For example, to identify the single-rule and multiple-rule coverage redundancy simultaneously, Net1 only needs to perform one more step between Figure 4.4(d) and (e). Net1
computes all the largest rules from the non-overlapping discarding rules in Figure 4.4(d),
which are
s′ : F1 ∈ [0, 15] ∧ F2 ∈ [5, 15] → d
1
s′ : F1 ∈ [0, 4] ∧ F2 ∈ [0, 15] → d
2
s′ : F1 ∈ [8, 15] ∧ F2 ∈ [0, 15] → d
3
Finally, Net2 can identify that nr4 : F1 ∈ [3, 15] ∧ F2 ∈ [7, 15] → d is redundant because nr4
is covered the by the rule s′ .
1
97

Algorithm 2: Computation of the set of largest rules
′
′
Input: l non-overlapping rules nr1 , · · · , nrl .
Output: The set of all the largest rules S
1

′
′
Initialize S to {nr1 , · · · , nrl };

2

while S has been changed do

3

for every two rules s′ , s′ (i = j) in S do
i j

4

remove s′ and s′ from S;
i
j

5

compute the largest rules from s′ and s′ ;
i
j

6

add the largest rules to S ′ ;

7

8

S = S ′;
return S;

4.3.6

Identiﬁcation and Removal of Redundant Rules

After single-rule and multi-rule coverage redundancy detection, Net2 identiﬁes the redundant
non-overlapping rules in F W2 . Next, Net2 needs to identify which original rules are interﬁrewall redundant. As each path in the all-match FDD of F W2 corresponds to a nonoverlapping rule, we call the paths that correspond to the redundant non-overlapping rules
redundant paths and the remaining paths eﬀective paths. For example, in Figure 4.8, the
dashed paths are the redundant paths that correspond to nr1 , nr2 and nr4 in Figure 4.5(c),
respectively. Finally, Net2 identiﬁes redundant rules based on Theorem 8.
Theorem 8. Given ﬁrewall F W2 : r1 ,· · · ,rn with no intra-ﬁrewall redundancy and its allmatch FDD, rule ri is inter-ﬁrewall redundant with respect to F W1 if and only if two conditions hold: (1) there is a redundant path whose terminal node contains sequence number i;
(2) there is no eﬀective path whose terminal node contains i as the smallest element.

98

F1
[6, 15]
[3, 5]

[0, 2]

[0, 6]
4
d

F2

[7, 15][0,6]

1,2,4
a

4
d

F2

[7,15][0, 5]
2,4
d

3,4
d

F2

[6, 15]
4
a

Figure 4.8. Identiﬁcation of redundant rules in F W2
Proof. Let {P1 ,· · · ,Pm } denote all paths in F W2 ’s all-match FDD. According to the theorems in [50, 52], the resolving set of each rule ri (1≤i≤n) in ﬁrewall F W2 satisﬁes the
condition R(ri )=∪k=t M(Pjk ) (1≤jk ≤m), where Pj1 ,· · · ,Pjt are all the paths whose terk=1
minal nodes contain i as the smallest element. Based on the deﬁnition of inter-ﬁrewall
redundant rules in Section 4.2.1, rule ri is inter-ﬁrewall redundant if and only if all the
packets in ∪k=t M(Pjk ) are discarded by F W1 . Thus, each path Pjk (1≤k≤t) is a redunk=1
dant path. In other words, all the paths Pj1 ,· · · ,Pjt whose terminal nodes contain i as the
smallest element are redundant paths.
Considering redundant paths in Figure 4.8, Net2 identiﬁes that r1 and r2 are inter-ﬁrewall
redundant with respect to F W1 .
Theorem 9. The privacy-preserving inter-ﬁrewall redundancy removal protocol is a complete inter-ﬁrewall redundancy removal scheme.
Proof. Suppose that our proposed protocol is not a complete scheme and hence it cannot detect all inter-ﬁrewall redundant rules in F W2 . Assume that rule ri is inter-ﬁrewall redundant
in F W2 but it is not detected by our protocol. According to Theorem 8, R(ri )=∪k=t M(Pj )
k=1
k

(1≤jk ≤m) and Pj1 ,· · · ,Pjt are redundant paths in F W2 ’s all-match FDD. Thus, some paths
in {Pj1 ,· · · ,Pjt } cannot be identiﬁed as redundant paths by our protocol. This conclusion
violates the fact that our protocol can identify all redundant paths in F W2 ’s FDD.

99

4.4

Firewall Update After Optimization

If F W1 or F W2 changes after inter-ﬁrewall optimization, the inter-ﬁrewall redundant rules
identiﬁed by the optimization may not be inter-ﬁrewall redundant anymore. In this section,
we discuss our solution to address ﬁrewall update. There are ﬁve possible cases.
(1) Net1 changes the decisions of some rules from discard to accept in F W1 . In this case,
Net1 needs to notify Net2 that which non-overlapping rules (indices of these rules) from
F W1 are changed. Using this information, Net2 checks if there were any rules in F W2 that
were removed due to these rules, and then adds the aﬀected rules back into F W2 .
(2) Net1 changes the decisions of some rules from accept to discard in F W1 . In this case,
Net2 can run our cooperative optimization protocol again to identify more inter-ﬁrewall
redundant rules in F W2 .
(3) Net2 changes the decisions of some rules in F W2 . In this case, neither party needs to
take actions because the inter-ﬁrewall redundancy detection does not consider the decisions
of the rules in F W2 .
(4) Net1 adds or removes some rules in F W1 . In this case, since the resolving sets of some
rules in F W1 may change, a rule in F W2 that used to be inter-ﬁrewall redundant maybe not
redundant anymore. It is important for Net2 to run our optimization protocol again.
(5) Net2 adds or removes some rules in F W2 . Similar to the fourth case, since the resolving
sets of some rules in F W2 may change, it is important for Net2 to run our protocol again.

4.5
4.5.1

Security and Complexity Analysis
Security Analysis

To analyze the security of our protocol, we ﬁrst describe the commutative encryption and
its properties. Let Key denote a set of private keys and Dom denote a ﬁnite domain. A

100

commutative encryption f is a computable function f :Key×Dom→Dom that satisﬁes the
following four properties. (1) Secrecy: For any x and key K, given (x)K , it is computationally
infeasible to compute K. (2) Commutativity: For any x, K1 , and K2 , we have ((x)K1 )K2 =
((x)K2 )K1 . (3) For any x, y, and K, if x = y, we have (x)K = (y)K . (4) The distribution of
(x)K is indistinguishable from the distribution of x. Note that, for ease of presentation, in
the rest of this section, any x, y, or z is an element of Dom, any K, K1 , or K2 is an element
of Key, and (x)K denotes f (x, K).
In the conversion of F W1 , for each non-overlapping rule nr ′ from F W1 , let VFj (nr ′ ) denote
′
the preﬁx set for the ﬁeld Fj , e.g., in Figure 4.4(e), VF1 (nr1 ) denotes {00**, 0100}. In the

conversion of F W2 , let UFj denote the preﬁx set for the ﬁeld Fj after removing duplicate
preﬁxes, e.g., in Figure 4.5(e), UF1 denotes the ﬁrst preﬁx set. Our cooperative optimization
protocol essentially compares VFj (nr ′ ) and UFj in a privacy-preserving manner. We have
the following theorem.
Theorem 10. If both parties Net1 and Net2 are semi-honest, after comparing two sets
VFj (nr ′ ) and UFj using our protocol, Net1 learns only the size |UFj | and the intersection
VFj (nr ′ ) ∩ UFj , and Net2 learns only the size |VFj (nr ′ )| and the intersection VFj (nr ′ ) ∩ UFj .
Proof. According to the theorems in multi-party secure computation [9, 34], if we can prove
that the distribution of the Net1 ’s view of our protocol cannot be distinguished from a
simulation that uses only VFj (nr ′ ), VFj (nr ′ ) ∩ UFj , and |UFj |, then Net1 cannot learn
anything else except VFj (nr ′ ) ∩ UFj and |UFj |. Note that Net1 ’s view of our protocol is the
information that Net1 gains from F W2 .
Without loss of generality, we only prove that Net1 learns only the size |UFj | and the
intersection VFj (nr ′ ) ∩ UFj . The simulator for Net1 uses key K1 to create a set from
VFj (nr ′ ) and VFj (nr ′ ) ∩ UFj as follows

101

YS = {(x1 )K1 , · · · , (xm )K1 ,
xi ∈VF (nr′ )∩UF
j
j

zm+1 , · · · , zn

}

n−m=|VF (nr′ )−UF |
j
j

where zm+1 , · · · , zn are random values generated by the simulator and they are uniformly
distributed in the ﬁnite domain Dom. According to the theorems in [9], Net1 cannot distinguish the distribution of YS ’s elements from that in

YR = {(x1 )K1 , · · · , (xm )K1 , (xm+1 )K1 , · · · , (xn )K1 }
xi ∈VF (nr′ )∩UF
j
j

xi ∈VF (nr′ )−UF
j
j

The Net1 ’s view of our protocol corresponds to YR . Therefore, the distribution of the Net1 ’s
view of our protocol cannot be distinguished from this simulation.
Next, we analyze the information learned by Net1 and Net2 . After implementing our
protocol, Net1 knows the converted ﬁrewalls of F W1 and F W2 , e.g., Figure 4.4(j) and
Figure 4.5(g), and Net2 knows the comparison result, e.g., Figure 4.6(d). On Net1 side, for
each ﬁeld Fj (1≤j≤d), it knows only |UFj | and VFj (nr ′ )∩UFj , and it cannot reveal the rules
of F W2 for two reasons. First, in VFj (nr ′ )∩UFj , a numericalized preﬁx can be generated
from many diﬀerent numbers. For example, a preﬁx of IP addresses (32 bits) b1 b2 · · · bk ∗· · · ∗
can be generated from 232−k diﬀerent IP addresses. Second, even if Net1 ﬁnds the number
for a preﬁx in VFj (nr ′ )∩UFj , Net1 doesn’t know which rule in F W2 contains that number.
On Net2 side, it only knows that the preﬁx x in F (x) belongs to which non-overlapping
rules in F W1 . But such information is not enough to reveal the rules in F W1 .

4.5.2

Complexity Analysis

Let n1 and n2 be the number of rules in two adjacent ﬁrewalls F W1 and F W2 , respectively, and d be the number of ﬁelds in both ﬁrewalls. For simplicity, we assume that the

102

numbers in diﬀerent ﬁelds have the same length, say w bits. We ﬁrst analyze the computation, space, and communication costs for the conversion of F W1 . Based on the theorem in
[49], the maximum number of non-overlapping rules generated from the FDD is (2n1 −1)d .
Each non-overlapping rule consists of d w-bit intervals and each interval can be converted
to at most 2w−2 preﬁxes. Thus, the maximum number of preﬁxes generated from these
non-overlapping rules is d(2w−2)(2n1 −1)d . Note that the total number of preﬁxes cannot
exceed 2w+1 because Net1 puts all preﬁxes into one set. Thus, the computation cost of
encryption by Net1 is min (d(2w − 2)(2n1 − 1)d , 2w+1 ). Therefore, for the conversion of
F W1 , the computation cost of Net1 is min (O(dwnd), O(nd + 2w )), the space cost of Net1
1
1
is O(dwnd), the communication cost is min (O(dwnd), O(2w )), and the computation cost of
1
1
Net2 is min (O(dwnd), O(2w )). Similarly, for the conversion of F W2 , the computation cost of
1
Net2 is min (O(dwnd), O(nd + 2w d)), the space cost of Net2 is O(dwnd), the communication
2
2
2
cost is min (O(dwnd), O(2w d)), and the computation cost of Net1 is min (O(dwnd), O(2w d)).
2
2

4.6

Experimental Results

We evaluate the eﬀectiveness of our protocol on real ﬁrewalls and evaluate the eﬃciency of
our protocol on both real and synthetic ﬁrewalls. We implemented our protocol using Java
1.6.0. Our experiments were carried out on a PC running Linux with 2 Intel Xeon cores and
16GB of memory.

4.6.1

Evaluation Setup

We conducted experiments over ﬁve groups of two real adjacent ﬁrewalls. Each ﬁrewall
examines ﬁve ﬁelds, source IP, destination IP, source port, destination port, and protocol.
The number of rules ranges from dozens to thousands. In implementing the commutative
encryption, we used the Pohlig-Hellman algorithm [64] with a 1024-bit prime modulus and
103

160-bit encryption keys. To evaluate the eﬀectiveness, we conducted our experiments over
these ﬁve groups of adjacent ﬁrewalls. To evaluate the eﬃciency, for two ﬁrewalls in each
group, we measured the processing time, the comparison time, and the communication cost.
Due to security concerns, it is diﬃcult to obtain a large number of real adjacent ﬁrewalls.
To further evaluate the eﬃciency, we generated a large number of synthetic ﬁrewalls based
on Singh et al. ’s method [74]. The synthetic ﬁrewalls also examine the same ﬁve ﬁelds as
real ﬁrewalls. The number of rules in the synthetic ﬁrewalls ranges from 200 to 2000, and
for each number, we generated 10 synthetic ﬁrewalls. To measure the eﬃciency, we ﬁrst
processed each synthetic ﬁrewall as F W1 and then measured the processing time and communication cost of two parties. Second, we processed each synthetic ﬁrewall as F W2 and
measured the processing time and communication cost. Third, we measured the comparison
time for every two synthetic ﬁrewalls. We did not evaluate the eﬀectiveness of our protocol on synthetic ﬁrewalls because they are generated randomly and independently without
considering whether two ﬁrewalls are adjacent or not.

4.6.2

Methodology

In this section, we deﬁne the metrics to measure the eﬀectiveness of our protocol. Given
our ﬁrewall optimization algorithm A, and two adjacent ﬁrewalls F W1 and F W2 , we use
A(F W1 , F W2 ) to denote a set of inter-ﬁrewall redundant rules in F W2 . Let |F W | denote the number of rules in F W and |A(F W1 , F W2 )| denote the number of inter-ﬁrewall
redundant rules in F W2 .
β(A(F W1 , F W2 )) =

To evaluate the eﬀectiveness, we deﬁne a redundancy ratio

|A(F W1 ,F W2 )|
.
|F W2 |

This ratio β(A(F W1 , F W2 )) measures what percentage

of rules are inter-ﬁrewall redundant in F W2 .

104

4.6.3

Eﬀectiveness and Eﬃciency on Real Policies

Table 4.1 shows the redundancy ratios for 5 real ﬁrewall groups. Column 1 shows the names
of ﬁve real ﬁrewall groups. Columns 2 and 3 show the names of ﬁrewalls F W1 and F W2 ,
respectively. Column 4 shows the number of rules in ﬁrewall F W2 . Figure 5.10 shows the
processing time and communication cost of two parties Net1 and Net2 when processing
F W1 ; Figure 5.11 shows the processing time and communication cost of the two parties
when processing F W2 . Figure 4.11 shows the comparison time of each group.
Group
Econ
Host
Wan
Ath
Comp

F W1
Econ1
Host1
Wan1
Ath1
Comp1

F W2
Econ2
Host2
Wan2
Ath2
Comp2

|F W2 |
129
139
511
1308
3928

redundancy ratio
17.1%
49.6%
1.0%
14.4%
14.7%

Table 4.1. Redundancy ratios for 5 real ﬁrewall groups
Our protocol achieves signiﬁcant redundancy ratio on four real ﬁrewall groups. For 5 real
ﬁrewall groups, our protocol achieves an average redundancy ratio of 19.4%. Particularly, for
the ﬁrewall group Host, our protocol achieves 49.6% redundancy ratio, which implies that
almost half of rules in Host2 are inter-ﬁrewall redundant rules. For ﬁrewall groups Econ,
Ath, and Comp, our protocol achieves 14.4%-17.1% redundancy ratios, which implies that
about 15% of rules in F W2 are redundant in these three groups. Only for one ﬁrewall group
Wan, our protocol achieves 1.0% redundancy ratio. We observed that most adjacent real
ﬁrewalls have many inter-ﬁrewall redundant rules. Thus, our protocol can eﬀectively remove
inter-ﬁrewall redundant rules and signiﬁcantly improve network performance.
Our protocol is eﬃcient for processing and comparing two real ﬁrewalls. When processing
F W1 in the 5 real ﬁrewall groups, the processing time of Net1 is less than 2 seconds and the
processing time of Net2 is less than 1 second. When processing F W2 in those real ﬁrewall

105

1.8
Processing time (s)

1.6

Net1
Net2

1.4
1.2
1
0.8
0.6
0.4
0.2
0

Econ1 Host1 Wan1

Ath1 Comp1

(a) Processing time

Communication cost (KB)

60
50

Net1
Net2

40
30
20
10
0

Econ1 Host1 Wan1

Ath1 Comp1

(b) Communication cost
Figure 4.9. Processing F W1 on real ﬁrewalls

106

Processing time (s)

10
Net1
Net2
8
6
4
2
0

Econ2 Host2 Wan2

Ath2 Comp2

(a) Processing time

Communication cost (KB)

300
Net2
250
200
150
100
50
0

Econ2 Host2 Wan2

Ath2 Comp2

(b) Communication cost
Figure 4.10. Processing F W2 on real ﬁrewalls

107

groups, the processing time of Net1 is less than 4 seconds and the processing time of Net2
is less than 10 seconds. The comparison time of two ﬁrewalls is less than 0.07 seconds. The
total processing time of two parties is less than 15 seconds.

0.07
Searching time (s)

0.06
0.05
0.04
0.03
0.02
0.01
0

Econ Host Wan
Ath Comp
Five groups of real firewalls

Figure 4.11. Comparing two real ﬁrewalls
Our protocol is eﬃcient for the communication cost between two parties. When processing
ﬁrewall F W1 in the 5 real ﬁrewall groups, the communication cost from Net1 to Net2 and
that from Net2 to Net1 are less than 60 KB. Note that the communication cost from Net1
to Net2 and that from Net2 to Net1 are the same because Net1 and Net2 encrypt the
same number of values and the encrypted values have the same length, i.e., 1024 bits in our
experiments. When processing F W2 in those real ﬁrewall groups, the communication cost
from Net2 to Net1 is less than 300 KB. The total communication cost between two parties
is less than 500 KB, which can be sent through the current network (e.g., DSL network)
around 10 seconds.

108

1e+3

Processing time (s)

Net1
Net2
1e+2

1e+1

1e+0

1e−1

200 400 600 800 100012001400160018002000
Number of rules in FW1
(a) Ave. processing time

450
Communication cost (KB)

400

Net1
Net2

350
300
250
200
150
100
50
0

200 400 600 800 100012001400160018002000
Number of rules in FW1
(b) Ave. communication cost
Figure 4.12. Processing F W1 on synthetic ﬁrewalls
109

1e+3

Processing time (s)

Net1
Net2

1e+2

1e+1

1e+0

200 400 600 800 100012001400160018002000
Number of rules in FW2
(a) Ave. processing time

1600
Communication cost (KB)

Net2
1400
1200
1000
800
600
400
200
0

200 400 600 800 100012001400160018002000
Number of rules in FW2
(b) Ave. communication cost
Figure 4.13. Processing F W2 on synthetic ﬁrewalls
110

4.6.4

Eﬃciency on Synthetic Policies

For the synthetic ﬁrewalls, Figure 5.12 and Figure 5.13 show the average processing time and
communication cost of two parties Net1 and Net2 for processing F W1 and F W2 , respectively.
Figure 5.14 shows the average comparison time for every two synthetic ﬁrewalls. Note that
the vertical axis of two ﬁgures 5.12(a) and 5.13(a) are in a logarithmic scale.
Our protocol is eﬃcient for processing and comparing two synthetic ﬁrewalls. When processing the synthetic ﬁrewalls as F W1 , the processing time of Net1 is less than 400 seconds
and the processing time of Net2 is less than 5 seconds. When processing the synthetic ﬁrewalls as F W2 , the processing time of Net1 is less than 400 seconds and the processing time
of Net2 is less than 20 seconds. The comparison time of two synthetic ﬁrewalls is less than
4 seconds.
Our protocol is eﬃcient for the communication cost between two synthetic ﬁrewalls. When
processing the synthetic ﬁrewalls as F W1 , the communication cost from Net1 to Net2 and
that from Net2 to Net1 grow linearly with the number of rules in F W1 , and both costs are less
than 450 KB. Similarly, when processing synthetic ﬁrewalls as F W2 , the communication cost
from Net2 to Net1 grows linearly with the number of rules in F W2 , and the communication
cost from Net2 to Net1 is less than 1600 KB.

111

Searching time (s)

5
4
3
2
1

0
2000
1600
1200
800
400

# of rules in FW2

0

0

400

800

1200

1600

# of rules in FW1

Figure 4.14. Comparing two synthetic ﬁrewalls

112

2000

CHAPTER 5
Privacy Preserving Cross-Domain
Network Reachability Quantiﬁcation
5.1
5.1.1

Introduction
Background and Motivation

Network reachability quantiﬁcation is important for understanding end-to-end network behavior and detecting the violation of security policies. Several critical concerns like router
misconﬁguration, policy violations and service availability can be veriﬁed through an accurate quantiﬁcation. Network reachability for a given network path from the source subnet
to the destination subnet is deﬁned as the set of packets that are allowed by all network
devices on the path. Quantifying network reachability is a diﬃcult and challenging problem
for two reasons. First, various complex mechanisms, such as Access Control Lists (ACLs),
dynamic routing, and network address translation (NAT), have been deployed on network
devices for restricting network reachability. Therefore, to perform an accurate analysis, administrators need to collect all the reachability restriction information from these network

113

devices. Collecting such information could be very diﬃcult due to the privacy and security
concerns. Second, the explosion of the Internet has caused an increase in the complexity and
sophistication of these devices, thus, making reachability analysis computationally expensive
and error-prone.
The current practice of reachability management is still “trial and error” due to the lack
of network reachability analysis and quantiﬁcation tools. Such practice leads to signiﬁcant
number of conﬁguration errors, which has been shown to be the major cause of failure
for Internet services [61]. Industry research also shows that a signiﬁcant percentage of
human eﬀort and monetary resources are employed in maintaining the operational status
of the network [43]. Several critical business applications and sensitive communication are
aﬀected severely due to network outages caused by misconﬁguration errors. These events
place a tremendous amount of pressure on network operators to debug the problems quickly.
Thus, systematic analysis and quantiﬁcation tools of network reachability are needed for
understanding end-to-end network behavior and detecting conﬁguration errors.

5.1.2

Limitation of Prior Art

Performing network reachability analysis is a complex task that involves aggregating and
analyzing the reachability restriction information from all the devices along a given network
path.

The current practice of verifying reachability is to send probing packets. However,

probing has two major drawbacks. First, probing is expensive to quantify network reachability because it needs to generate and send signiﬁcant amount of packets. Second, probing
is inaccurate, e.g., it cannot probe the open ports with no server listening on them. Due
to these drawbacks of probing, many approaches were proposed to address the reachability
problem [12, 42, 44, 54, 75, 77]. The main assumption in all these approaches is that the
reachability restriction information of each network device and other conﬁguration state are

114

known to a central network analyst, who is quantifying the network reachability. However, in
reality, it is common that the network devices along a given path belong to diﬀerent parties
where the reachability restriction information cannot be shared with others including the
network analyst. Figure 5.1 shows a typical scenario of network reachability, where User1
wants to know what packets he can send to User2 through the given path. However, the
network devices deployed along this path belong to three diﬀerent parties, i.e., S1 and FW1
belong to Subnet1 , FW2 , FW3 , and R1 belong to ISP, FW4 and S2 belong to Subnet2 .

ISP

Subnet1

User1

S1

FW1

Firewall

FW2

R1

Router

Subnet2

FW3

FW4

S2

User2

Switches

Figure 5.1. An example of end-to-end network reachability
Keeping the reachability restriction information private is important for two reasons. First,
such information is often misconﬁgured and has security holes that can be exploited by
attackers if it is disclosed. In reality, most ﬁrewall policies have security holes [76]. Disclosing
ACLs allows attackers to analyze and utilize the vulnerabilities of subnets along a given
path. For example, if ACLs along a path from Subnet1 to Subnet2 do not block some
worm traﬃc, attackers can break into Subnet2 from Subnet1 . In practice, neither ISPs nor
private networks disclose their ACLs. Second, the reachability restriction information of a
network device contains private information, e.g., the ACLs of a network device contain the
IP addresses of servers, which can be used by attacker to launch more targeted attacks. If
such information of one device can be shared with other devices, an attacker needs to capture

115

only a single device (or a small subset) to know the security proﬁle of the entire network,
i.e., the sensitive information in the ACLs can be abused for gaining proﬁt or to disrupt
important services across the network. In practice, even within an organization, often no
employees other than the ﬁrewall administrators are allowed to access their ﬁrewall policies.

5.1.3

Cross-Domain Quantiﬁcation of Network Reachability

To our best knowledge, no prior work has been proposed to address the problem of privacypreserving network reachability quantiﬁcation. We proposed the ﬁrst privacy-preserving
protocol for quantifying network reachability for a given network path across multiple parties.
First, for the network devices belonging to each party, we convert the reachability restriction
information of these devices to an access control list (ACL) by leveraging the existing network
reachability quantiﬁcation tool [44]. This tool takes as the input the reachability restriction
information, including ACLs, all possible network transforms (e.g., NAT and PAT), and
protocol states (e.g., connection-oriented and state-less), and outputs an ACL. Note that
ACLs are the most important security component for network devices to ﬁlter the traﬃc.
Considering the example in Figure 5.1, Figure 5.2 shows the three resulting ACLs, A1 , A2 ,
and A3 , for Subnet1 , ISP, and Subnet2 , respectively. For ease of presentation, in the rest
of work, we use “ACL” to denote the resulting ACL converted from multiple ACLs as well
as other reachability restriction information in one party.

Second, we calculate the set of

packets that are accepted by all the resulting ACLs on the given network path in a privacypreserving manner. This calculation requires the comparison of the rules in all the ACLs,
which is complex and error-prone.
Our proposed cross-domain quantiﬁcation approach of network reachability can be very
useful for many applications. Here we give two examples. First, a global view of the network
reachability can help internet service providers (ISPs) to deﬁne better QoS policies. For

116

Subnet1

User1

A1

ISP

A2

Subnet2

A3

User2

Figure 5.2. Three resulting ACLs converted from Figure 5.1
example, the knowledge of the diﬀerent paths through which a particular type of traﬃc is
allowed by the ACLs can help the ISPs to maintain a rated list of the best-quality paths in
case of path failures.

Second, since the network reachability is crucial for many internet

companies, performing a privacy-preserving computation of the network reachability could
become a new business for the ISPs and other parties that involve in this computation.

5.1.4

Technical Challenges

There are three key challenges in the privacy preserving quantiﬁcation of network reachability. (1) It is computationally expensive. An ACL may consist of many rules, and each rule
consists of multiple ﬁelds. Therefore, comparing multiple ACLs with a large number of rules
can be quite expensive, even if only a few ACLs are involved in the process. Furthermore,
the complexity of comparison can be expensive due to overlapping rules resulting in many
comparisons. (2) Communication cost is high as even calculating the intersection of a small
number of ACLs is a tedious process and requires a number of messages to be exchanged
among diﬀerent parties. (3) Protecting the privacy of the ACL rules is crucial. Since a rule
has to be sent to other parties to enable comparison, it is necessary to propose a protocol
that will not reveal the rule but still allows the diﬀerent ACLs to calculate the intersection.

117

5.1.5

Our Approach

In this work, we propose the ﬁrst cross-domain privacy-preserving protocol for quantifying
network reachability. We consider n ACLs (n ≥ 2) in a given network path and each ACL
belongs to a distinct party. Starting with the destination ACL rules, our protocol calculates
the intersection of these rules with the rules of the adjacent ACL. Next, using the results
of this comparison, the adjacent ACL repeats the process with the next adjacent ACL until
the source ACL is reached. At this point, the source ACL obtains the intersection of the
rules from all ACLs along the given path. Brieﬂy, our protocol consists of three phases: ACL
preprocessing, ACL encoding and encryption, and ACL comparison.
In the ﬁrst phase, we transform all the ACLs into an equivalent representation, Firewall
Decision Diagram (FDD) [36], and then extract the non-overlapping rules with accept decisions. In the second phase, to perform privacy preserving comparison, we ﬁrst transform
the rules into a sequence of preﬁx numbers and then encrypt these numbers with secret
keys of diﬀerent parties. This phase enables diﬀerent parties to compute the intersection
of non-overlapping rules in their ACLs without revealing these rules. In the third phase,
the destination ACL computes the intersection of its non-overlapping rules with the rules
from its adjacent ACL, and then the adjacent ACL further repeats this computation with its
adjacent ACL until the source ACL is reached. Note that the comparison result of every two
adjacent ACLs is encrypted with multiple secret keys so that no party can reveal the comparison result independently. Finally, all the ACLs collaboratively decrypt the encrypted
intersection of the non-overlapping rules, but only the ﬁrst party (with the source ACL)
obtains the result. This intersection represents the set of packets that are allowed by all
ACLs on the given path.

118

5.1.6

Summary of Experimental Results

We performed extensive experiments over real and synthetic ACLs. Our experimental results
show that the core operation of our protocol is eﬃcient and suitable for real applications.
The online processing time of an ACL with thousands of rules is less than 25 seconds and
the comparison time of two ACLs is less than 5 seconds. The communication cost between
two ACLs with thousands of rules is less than 2100 KB.

5.1.7

Key Contributions

We make three key contributions. (1) We propose the ﬁrst cross-domain privacy-preserving
protocol to quantify network reachability across multiple parties. Our protocol can accurately compute the intersection of the rules among the ACLs along a given network path
without the need to share these rules across those parties. This is the ﬁrst step towards
privacy-preserving quantiﬁcation of network reachability and it can be extended to other
network metric measurements that are sensitive in nature. (2) We propose an optimization
technique to reduce computation and communication costs. It reduces the number of ACL
encryptions and the number of messages from O(n2 ) to O(n). (3) We conducted extensive
experiments on both real and synthetic ACLs and the result shows that our protocol is
eﬃcient and suitable for real applications.

5.2
5.2.1

Problem Statement and Threat Model
Problem Statement

We focus on quantifying the end-to-end network reachability for a given network path with
multiple network devices belonging to diﬀerent parties. The network devices are connected
with physical interfaces for ﬁltering outgoing packets and incoming packets. A network path
119

is a unidirectional path for transferring packets from the source to the destination. Along the
given network path, there are multiple ACLs and other restriction information for ﬁltering
these packets. Multiple ACLs and other restriction information may belong to the same
party. To convert them to a single ACL for one party, we ﬁrst employ the existing network
reachability approach, Quarnet [44], to convert them to reachability matrices. Second, we
use the query language of Quarnet to obtain a set of packets that can pass through the
given path within the party. Finally, we convert the set of packets to a single ACL. Without
loss of generality, in the rest of work, we use the term “ACL” to denote the resulting ACL
converted from multiple ACLs as well as other reachability restriction information in one
party. Given an ACL A, let M(A) denote the set packets that are accepted by A. Given a
network path with n ACLs A1 , A2 , ..., An for transferring packets from A1 to An , where Ai
belongs to the party Pi (1 ≤ i ≤ n), quantifying the network reachability is computing the
intersection among M(A1 ), · · · , M(An ), i.e., M(A1 ) ∩ M(A2 ) · · · ∩ M(An ).
In our context, we aim to design a privacy preserving protocol which enables the source
ACL A1 to compute the intersection of n ACLs (n ≥ 2), M(A1 )∩M(A2 ) · · ·∩M(An ) without
revealing rules in an ACL Ai (1 ≤ i ≤ n) to any other party Pj (j = i). We make the following
four assumptions. (1) The destination of the network path cannot be an intermediate network
device. In other words, the destination ACL An should ﬁlter the packets to end users but not
to another network device. (2) The source ACL A1 is not allowed to compute the intersection
of a subset of all ACLs along the given network path. Because A1 can easily reveal some rules
in one ACL Ai by three steps. First, compute M(A1 ) ∩ · · · ∩ M(Ai−1 ). Second, compute
M(A1 ) ∩ · · · ∩ M(Ai ). Third, compute M(A1 ) ∩ · · · ∩ M(Ai−1 ) − M(A1 ) ∩ · · · ∩ M(Ai ).
Note that if one party does not want to involve in this process or only wants to provide
part of its ACL rules, the party P1 can still run the protocol to compute network reachability
among the remain ACLs. This requirement is very important especially for the party who
really cares about the security of its private network, e.g., a bank who will not share with
120

other parties the information that what packets can enter into its private network.

5.2.2

Threat Model

We consider the semi-honest model, where each party follows our protocol correctly but it
may try to learn the ACL rules of other parties [35]. For example, the party P1 may use
the intermediate results to reveal the ACL rules of other parties. The semi-honest model is
realistic in our context because a malicious party cannot gain beneﬁts by providing a forged
ACL or not following our protocol.

5.3

Privacy-Preserving

Quantiﬁcation

of

Network

Reachability
To compute the network reachability from A1 to An , our privacy-preserving protocol consists
of three phases, ACL preprocessing, ACL encoding and encryption, and ACL comparison. In
the ﬁrst phase, ACL preprocessing, each party converts its ACL to a sequence of accepting
rules. The union of the matching sets of these accepting rules is equal to the set of packets
that are accepted by the ACL. In the second phase, ACL encoding and encryption, each
party encodes and encrypts each ﬁeld of its accepting rules for preserving the privacy of
its ACL. In the third phase, ACL comparison, all parties compare their ACLs and ﬁnally
the party P1 ﬁnds out the set of packets that are accepted by all ACLs. Particularly, Pn−1
compares the encoded and encrypted accepting rules from An−1 with those from An , and
ﬁnds out the multiple accepting rules whose union is equal to the intersection of M(An ) and
M(An−1 ), M(An ) ∩ M(An−1 ). Then, Pn−2 compares the accepting rules from ACL An−2
with the resulting accepting rules in the ﬁrst step, and ﬁnds out the multiple accepting rules
whose union is equal to M(An ) ∩ M(An−1 ) ∩ M(An−2 ). Repeat this step until P1 ﬁnds out

121

the multiple accepting rules whose union is equal to M(A1 ) ∩ · · · ∩ M(An ). Note that, the
resulting accepting rules of each step are in an encrypted format which prevents any party
from revealing these rules by itself. To reveal the ﬁnal accepting rules, P1 requires all other
parties to decrypt these rules with their private keys and then P1 decrypts these rules.
The basic problem of privacy-preserving network reachability is how to compute the intersection among multiple range rules belonging to diﬀerent parties in a privacy preserving
manner. This problem boils down to the problem of computing intersection of two ranges
[a, b] and [a′ , b′ ], denoted as [a, b] ∩ [a′ , b′ ]. Thus, we ﬁrst describe the privacy-preserving
protocol for computing [a, b] ∩ [a′ , b′ ], and then describe the three phases in our network
reachability protocol.

5.3.1

Privacy-Preserving Range Intersection

To compute the intersection of a range [a, b] from Ai and a range [a′ , b′ ] from Aj , our basic
idea is to check which range among [min, a−1], [a, b], and [b+1, max] includes a′ or b′ , where
min and max are the minimum and maximum numbers, respectively. Thus, the problem
of computing [a, b] ∩ [a′ , b′ ] boils down to the problem of checking whether a number is in
a range, e.g., a′ ∈ [min, a − 1], which can be solved by leveraging the preﬁx membership
veriﬁcation scheme in [48]. The idea of preﬁx membership veriﬁcation is to convert the
problem of checking whether a number is in a range to the problem of checking whether two
sets have common elements. Our scheme consists of six steps:
(1) The party Pi converts range [a, b] to three ranges [min, a−1], [a, b], and [b+1, max],
where min and max are the minimum and maximum numbers of the corresponding ﬁeld’s
domain, respectively. For example, [5,7] is converted to [0,4], [5,7], and [8,15], where 0 and
15 are the minimum and maximum numbers. Note that [min,a−1] and [b+1,max] may not
exist. If a=min, then [min, a−1] does not exist; if b=max, then [b+1,max] does not exist.

122

(2) The party Pi converts each range to a set of preﬁxes, whose union corresponds to the
range. Let S([min, a − 1]), S([a, b]), and S([b + 1, max) denote the resulting preﬁxes for the
three ranges, respectively. For example, S([5, 7])={0101, 011*}, where “*” denotes that this
bit can be 0 or 1.
(3) The party Pj generates the preﬁx families of a and b, denoted as F (a) and F (b). The
preﬁx family F (a) consists of a and all the preﬁxes that contains a. Assuming w is the bit
length of a, F (a) consists of w + 1 preﬁxes where the l-th preﬁx is obtained by replacing
the last l − 1 bits of a by ∗. For example, as the binary representation of 6 is 0110, we
have F (6)={0110, 011*, 01**, 0***, ****}. It is easy to prove that a′ ∈ [a, b] if and only if
F (a′ ) ∩ S([a, b]) = ∅.
(4) Two parties Pi and Pj convert the resulting preﬁxes to numbers so that they can
encrypt them in the next step. We use the preﬁx numericalization scheme in [20]. This
scheme basically inserts 1 before ∗s in a preﬁx and then replaces every ∗ by 0. For example,
01** is converted to 01100. If the preﬁx does not contain ∗s, we place 1 at the end of
the preﬁx. For example, 1100 is converted to 11001. Given a set of preﬁxes S, we use
N (S) to denote the resulting set of numericalized preﬁxes. Thus, a′ ∈ [a, b] if and only if
N (F (a′ )) ∩ N (S([a, b])) = ∅.
(5) Checking whether N (F (a′ ))∩N (S([a, b])) = ∅ is basically checking whether an element
from N (F (a′ )) is equal to an element from N (S([a, b])). We use commutative encryption
(e.g., [64, 67]) to do this checking. Given a number x and two encryption keys Ki and Kj , a
commutative encryption satisﬁes the property ((x)Ki )Kj = ((x)Kj )Ki , i.e., encryption with
Ki and then Kj is equivalent to encryption with Kj and then Ki . For ease of presentation, we
use (x)Kij to denote ((x)Ki )Kj . In our scheme, to check whether N (F (a′ )) ∩ N (S([a, b])) =
∅, Pi ﬁrst encrypts numbers in N (S([a, b])) with its private key Ki , then Pj further encrypts
them by its private key Kj and sends them back to Pi . Let N (S([a, b]))Kij denote the result.
Second, Pj encrypts numbers in N (F (a′ )) with Kj and then Pi encrypts them by Ki . Let
123

[5, 7]
Generate three ranges

[6, 15]

Construct prefix family

[0, 4], [5, 7], [8,15]
Convert ranges to prefixes

­0110, 011*½ ­1111, 111*½
°
°°
°
®01**, 0*** ¾, ®11**, 1*** ¾
°
° °****
°****
¿
¿¯
¯
Numericalize prefixes

­00** ½, ­0101½,{1***}
®0100¾ ®011*¾
¿
¿¯
¯
­01101, 01110½ ­11111, 11110½
°
°°
°
Numericalize the prefixes
®01100, 01000¾, ®11100, 11000¾
°10000
° °10000
°
¯
¿¯
¿
­00100½, ­01011½,{11000}
®01001¾ ®01110¾
Encrypt by P2
¿
¿¯
¯
Encrypt by P1
­(01101) K , (01110) K ½ ­(11111) K , (11110) K ½
2
2°°
2
2°
°
­(00100) K1 ½ ­(01011) K1 ½
,{(11000) K1 } ®(01100) K 2 , (01000) K 2 ¾, ®(11100) K 2 , (11000) K 2 ¾
,®
®(01001) ¾ (01110) ¾
°
° °(10000) K
°(10000) K
K1 ¿ ¯
K1 ¿
¯
2
2
¿
¿¯
¯
Encrypt by P2
Encrypt by P1
­(00100) K12 ½­(01011) K12 ½
­
½­
½
¾®
¾{(11000) K12 } (01101) K 21 (01110) K 21 °°(11111) K 21 (11110) K 21 °
®(01001)
°
K12 ¿¯(01110) K12 ¿
¯
®(01100) K 21 (01000) K 21 ¾®(11100) K 21 (11000) K 21 ¾
°(10000) K
°°(10000) K
°
21
21
¯
¿¯
¿

Figure 5.3. Privacy-preserving range intersection
F (a′ )

Kji

denote the result. Finally, Pi can check whether there is a common element in two

sets N (S([a, b]))Kij and F (a′ )Kji .
Through the previous steps, Pi knows that which range among [min, a − 1], [a, b], and
[b + 1, max] includes a′ or b′ . Based on this information, Pi can compute [a, b] ∩ [a′ , b′ ]. For
example, if a′ ∈ [min, a − 1] and b′ ∈ [a, b], [a, b] ∩ [a′ , b′ ] = [a, b′ ]. Note that a′ and b′ are in
the form of F (a′ )Kji and F (b′ )Kji . Pi cannot reveal a′ and b′ without knowing Pj ’s private
key Kj . Figure 5.3 illustrates the process of computing the intersection of [5, 7] (from A1 )
and [6, 15] (from A2 ).

5.3.2

ACL Preprocessing

In the ACL preprocessing phase, each party Pi (1≤i≤n) computes the set of packets M(Ai )
that are accepted by its ACL Ai . Pi ﬁrst converts its ACL to an equivalent sequence of
124

non-overlapping rules. Non-overlapping rules have an important property, that is, for any
two non-overlapping rules nr and nr ′ , the intersection of the two corresponding matching
sets is empty, i.e., M(nr)∩M(nr ′ )=∅. Thus, any packet p matches one and only one nonoverlapping rule converted from Ai and the decision of this non-overlapping rule is the
decision of Ai for the packet p. Therefore, instead of computing the set M(Ai ), Pi only
needs to retrieve all the non-overlapping accepting rules because the union of the matching
sets of these rules is equal to M(Ai ). The preprocessing of each Ai includes three steps:
(1) Pi converts its ACL Ai to an equivalent acyclic directed graph, called ﬁrewall decision
diagram (FDD) [36]. An FDD construction algorithm is presented in [49]. Figure 5.4(b)
shows the FDD constructed from Figure 5.4(a).

r1 : F1 [0, 4] F2 [7,15] o a
r2 : F1 [5, 7] F2 [5,15] o d
r3 : F1 [5, 7] F2 [0, 8] o a
r4 : F1 [0,15] F2 [0,15] o d
FDD construction

(a)

F1
[0, 4]
[0, 6]

F2

d

[7,15] [0,4]
a

[8, 15]

[5, 7]

a

F2
[5,8]
d

[9,15] [0,15]
d

F2

(b)

d

Extract non-overlapping
accept rules
nr1 : F1 [0, 4] F2 [7,15] o a
nr2 : F1 [5, 7] F2 [0, 4] o a

(c)

Figure 5.4. The Conversion of A1
(2) Pi extracts non-overlapping accepting rules from the FDD. We do not consider nonoverlapping discarding rules because the packets discarded by any ACL Ai cannot pass
through the path. Figure 5.4(c) shows the non-overlapping accepting rules extracted from
the FDD in Figure 5.4(b).
125

P1 (K1)

P2 (K2)

P3 (K3)

A1

A2

A3

nr1(1) : F1 [0, 3] o a
nr2 (1) : F1 [5, 7] o a

nr1( 2) : F1 [0, 2] o a
nr2 ( 2) : F1 [6, 15] o a

nr1(3) : F1 [0, 2] o a
nr2 (3) : F1 [4, 8] o a

Figure 5.5. The example three adjacent ACLs
After ACL preprocessing, the problem of privacy preserving quantiﬁcation of network
reachability boils down to the problem of privacy preserving range intersection, which is in
fact the basic problem in this work. Next, P1 needs to compare its accepting rules from A1
with those from other n−1 ACLs. Without loss of generality, in the next two subsections, we
use a simpliﬁed example in Figure 5.5 to show that how to compute the network reachability
among three ACLs. Each ACL has only one ﬁeld and the domain for the ﬁeld is [0,15]. Note
(i)

(i)

that in Figure 5.5, nr1 and nr2 denote two non-overlapping accepting rules for ACL Ai
(1≤i≤3). Obviously, the network reachability among these three ACLs can be denoted as
two accepting rules F1 ∈[0, 2]→a and F1 ∈[6, 7]→a. Next, we will show that how to compute
these two rules in a privacy preserving manner.

5.3.3

ACL Encoding and Encryption

In the ACL encoding and encryption phase, all parties need to convert their non-overlapping
accepting rules to another format such that they can collaboratively compute the network
reachability while one party cannot reveal the ACL of any other party. To achieve this
purpose, each party ﬁrst encodes its non-overlapping accepting rules and then encrypts
each ﬁeld of these rules. Recall the privacy-preserving range intersection scheme in Section
5.3.1, two parties employ diﬀerent encoding methods, one converts a range [a, b] to a set of
preﬁxes S([a, b]), and another converts a number a′ to its preﬁx family F (a′ ). Thus, in this
126

phase, there are two diﬀerent encoding and encryption methods for diﬀerent ACLs. Assume
that each party Pi (1≤i≤n) has a private key Ki . Each party Pj (1≤j≤n−1) encodes the
non-overlapping accepting rules from Aj by converting each range to a set of preﬁxes and
then encrypts each numericalized preﬁx by other parties Pj , Pj+1 , · · · , Pn . Let H denote
the encoding function used by the party Pj (1≤j≤n−1). Let F1 ∈ [a1 , b1 ] ∧ · · · ∧ Fd ∈
[ad , bd ] denote the predicate of an accepting rule over d ﬁelds. The encoding and encryption
result of this accepting rule for Aj is HKj···n ([a1 , b1 ]) ∧ · · · ∧ HKj···n ([ad , bd ]). The party
Pn encodes the non-overlapping accepting rules from An by converting each range to two
preﬁx families and then encrypts each numericalized preﬁx by itself. Let L denote the
encoding function used by the party Pn . Considering the above accepting rule, the result
is LKn ([a1 , b1 ]) ∧ · · · ∧ LKn ([ad , bd ]). We discuss the procedure of these two encoding and
encryption methods in detail as follows.
Encoding and Encryption of ACL Aj (1 ≤ j ≤ n − 1)
(1) For each non-overlapping accepting rule F1 ∈ [a1 , b1 ] ∧ · · · ∧ Fd ∈[ad , bd ], Pj converts
each range [al , bl ] (1 ≤l≤d) to three ranges [minl , al − 1], [al , bl ], [bl + 1, maxl ], where minl
and maxl are the minimum and maximum values for the l-th ﬁeld, respectively. Figure
5.6(b) shows the ranges generated from Figure 5.6(a).
(2) Pj converts each range to a set of preﬁxes. Figure 5.6(c) shows the preﬁxes generated
from Figure 5.6(b). That is, for the three ranges converted from [al , bl ], compute S([minl , al −
1]), S([al , bl ]), S([bl + 1, maxl ]).
(3) Pj unions all these preﬁx sets and permutes these preﬁxes. Figure 5.6(d) shows the
resulting preﬁx set. This step has two beneﬁts. First, it avoids encrypting and sending
duplicate preﬁxes, and hence, signiﬁcantly reduces the computation and communication
costs for the next two steps. Second, it enhances the security, any other parties except Pj
cannot reconstruct the non-overlapping accepting rules, because it is diﬃcult to correlate
the preﬁxes to their corresponding rules without the knowledge of the original ACL.
127

nr1(1) : F1 [0, 3] o a
nr2 (1) : F1 [5, 7] o a

(a)

Compute complementary ranges
nr1(1) : [0, 3],
nr2 (1) : [0, 4],

[4,15]
[5, 7], [8,15]

(b)

Convert ranges to prefixes
{010*, 1***}
nr1(1) : {00**},
(c)
(1)
nr2 : {00**, 0100},{0101, 011*},{1***}
Extract and permute the prefixes

­00**, 0100, 0101½
(d)
®011*, 1*** ,010* ¾
¯
¿
Numericalize the prefixes

­00100, 01001, 01011½
®01110, 11000, 01010¾
¯
¿
Encrypt by P1

(e)

­(00100) K1 , (01001) K1 , (01011) K1 ½
®(01110) , (11000) , (01010) ¾(f)
K1
K1
K1 ¿
¯
Encrypt by P2 and P3
­(00100) K123 , (01001) K123 , (01011) K123 ½
¾ (g)
®(01110)
K123 , (11000) K123 , (01010) K123 ¿
¯

Reconstruct rules
­(01010) K123 ½
nr1(1) : (00100) K123 , ®
¾
¯(11000) K123 ¿
­(00100) K123 ½ ­(01011) K123 ½
nr2 (1) : ®
¾, (11000) K123
¾, ®
¯(01001) K123 ¿ ¯(01110) K123 ¿

^

`

^

` (h)

Figure 5.6. Encoding and encryption of ACL A1

128

(4) Pj numericalizes and encrypts each preﬁx using Kj . Figure 5.6(e) and 5.6(f) show the
numericalized and encrypted preﬁxes.
(5) Pj sends the resulting preﬁxes to Pj+1 which further encrypts them with its private
key Kj+1 . Then, Pj+1 sends the result preﬁxes to Pj+2 which further encrypts them with
Kj+2 . This process is repeated until Pn encrypts them. Finally, Pn sends to Pj the resulting
preﬁxes that are encrypted n−j+1 times. Figure 5.6(f) shows the result after encrypting by
P1 and Figure 5.6(g) shows the result after encrypting by P2 and P3 .
(6) Pj reconstructs the non-overlapping accepting rules from the multiple encrypted preﬁxes because Pj knows which preﬁx belongs to which ﬁeld of which rule.
Based on the above steps, the encoding and encryption function used by Pj (1≤j≤n−1)
is deﬁned as HKj···n ([al , bl ]) = (N (S(minl , al − 1))Kj···n , N (S(al , bl ))Kj···n , N (S(bl +
1, maxl ))Kj···n ), where [al , bl ] is the range in the l-th ﬁeld of a rule in Aj . Figure 5.8(a)
illustrates the encoding and encryption result of ACL A2 in Figure 5.5. The only diﬀerence
between operations for A1 and A2 is that A1 ’s numericalized preﬁxes are encrypted by all
the three parties while A2 ’s numericalized preﬁxes are only encrypted by P2 and P3 .
Encoding and Encryption of ACL An
(1) For each range [al , bl ] of a non-overlapping rule, Pn generates two preﬁx families F (al )
and F (bl ). Figure 5.7(b) shows the result from Figure 5.7(a).
(2) Pn numericalizes and encrypts the preﬁxes using its private key Kn . Figure 5.7(c)
shows the resulting preﬁxes.
Based on these steps, the encoding and encryption function used by An is deﬁned as
LKn ([al , bl ]) = (N (F (al ))Kn , N (F (bl ))Kn )
where [al , bl ] is the range in the l-th ﬁeld of a rule.

129

nr1(3) : F1 [0, 2] o a
(a)
nr2 (3) : F1 [4, 8] o a
Compute prefix
families
nr1(3) : {F (0), F (2)}
(b)
nr2 (3) : {F (4), F (8)}
Numericalize and

encrypt by P3

nr1(3) : {N F (0) K , N F (2) K }
3
3 (c)
(3)
nr2 : {N F (4) K , N F (8) K }
3
3
Figure 5.7. Encoding and encryption of ACL A3

5.3.4

ACL Comparison

After the ﬁrst two phases, each party Pj (1≤j≤n−1) converts its ACL Aj to a sequence
of encrypted non-overlapping accepting rules and Pn converts its ACL An to a sequence
of encrypted numbers. Figure 5.8(a) shows the encrypted non-overlapping rules converted
from A2 in Figure 5.5 and Figure 5.8(b) shows the encrypted numbers converted from A3 in
Figure 5.5.
In the ACL comparison phase, we need to compare the n sequences of encrypted nonoverlapping accepting rules or encrypted numbers from every two adjacent ACLs. Without
loss of generality, we only present the comparison between An−1 and An . This comparison
includes four steps:
(1) Pn sends the resulting sequence to Pn−1 which further encrypts them with its private key Kn−1 . Let LKn ([a′ , b′ ]) ∧ · · · ∧ LKn ([a′ , b′ ]) denote the encoded and encrypted
1 1
d d
result of the accepting rule nr ′ from An . Pn−1 encrypts it with Kn−1 , i.e., computes
LK

n(n−1)

([a′ , b′ ]) ∧ · · · ∧ LK
1 1

n(n−1)

([a′ , b′ ]). Figure 5.8(c) shows the encrypted result from
d d

Figure 5.8(b).
(2) For each non-overlapping accepting rule nr from An−1 , Pn−1 computes nr∩nr ′ . Let
HK

(n−1)n

([a1 , b1 ]) ∧ · · · ∧ HK

(n−1)n

([ad , bd ]) denote the encoded and encrypted result of
130

nr. To compute nr∩nr ′ , for each ﬁeld l (1≤l≤d), Pn−1 compares HK

(n−1)n

([al , bl ]) with

LKn (n−1) ([a′ , b′ ]) where
l l
HK

(n−1)n

([al , bl ]) = (N (S(minl , al − 1))K

(n−1)n

,

N (S(al , bl ))K
LK

n(n−1)

([a′ , b′ ]) =
l l

, N (S(bl + 1, maxl ))K
)
(n−1)n
(n−1)n
(N (F (a′ ))K
, N (F (b′ ))K
).
l
l
n(n−1)
n(n−1)

According to the privacy-

preserving range intersection, to check whether a′ ∈ [minl , al − 1], Pn−1 checks whether
l
N (S(minl , al − 1))K
∩ N (F (a′ ))K
= ∅. Similarly, Pn−1 checks whether
l
(n−1)n
n(n−1)
a′ ∈ [al , bl ], a′ ∈ [bl + 1, maxl ], and whether b′ ∈ [minl , al −1], b′ ∈ [al , bl ], b′ ∈ [bl + 1, maxl ].
l
l
l
l
l
Based on the above result, Pn−1 computes the intersection between [al , bl ] and [a′ , b′ ], i.e.,
l l
[al , bl ]∩[a′ , b′ ]. Let Tl denote [al , bl ]∩[a′ , b′ ]. For example, if a′ ∈ [al , bl ] and b′ ∈ [bl +1, maxl ],
l l
l l
l
l
the condition al ≤ a′ ≤ bl < b′ holds and hence Tl = [a′ , bl ]. If for any Tl (1 ≤ l ≤ d) the
l
l
l
condition Tl = ∅ holds, then nr ∩ nr ′ = T1 ∧ · · · ∧ Td .
Note that the party Pn−1 cannot reveal a′ and b′ through this comparison because Pn−1
l
l
doesn’t know Pn ’s private key Kn . Thus, if Tl = [a′ , bl ], Pn−1 only knows N (F (a′ ))K
l
l
n(n−1)
, b }.
and bl . We denote the information that Pn−1 knows about Tl as {N (F (a′ ))K
l
n(n−1) l
Figure 5.8(d) shows the result after comparing A2 and A3 . Note that the result may contain
the numbers from An−1 ’s non-overlapping accepting rules, which are not encrypted, e.g., the
number 6 in Figure 5.8(d).
(3) To preserve the privacy of An−1 , the party Pn−1 encodes and encrypts the numbers
from An−1 ’s non-overlapping accepting rules, and then sends the result to Pn . For example,
. Figure 5.8(e) shows the result after
for a number al , Pn−1 computes N (F (al ))K
(n−1)
encoding and encrypting 6.
(4) To facilitate the next comparison with An−2 , Pn−1 sends the comparison result to
Pn and then Pn encrypts the numbers from An−1 ’s non-overlapping accepting rules. Figure
5.8(f) shows the encryption result.
Repeat these four steps to further compare An−2 with the result stored in Pn . After
131

­(00111)

½

K 23 °
­(00010) K 23 ½ °
nr1( 2) : ®
¾, ®(01100) K 23 ¾
¯(00101) K 23 ¿ °(11000) K °
23 ¿ (a)
¯
( 2) ­(00100) K 23 ½ ­(01110) K 23 ½
nr1(3) : {N F (0) K , N F (2) K }
nr2 : ®
3
3 (b)
¾
¾, ®(11000)
K 23 ¿
¯(01010) K 23 ¿ ¯
nr2 (3) : {N F (4) K , N F (8) K }
3
3

Encrypt by P2

nr1(3) : {N F (0) K , N F (2) K }
32
32
(c)
nr2 (3) : {N F (4) K ,N F (8) K }
32
32

Compute intersection
{N F (0) K , N F (2) K }
32
32
(d)
{6,
N F (8) K }
32
Encoding and encrypt

numbers from A2 by P2
{N F (0) K ,N F (2) K }
32
32
(e)
{N F (6) K , N F (8) K }
2

32

Encrypt numbers from A2 by P3
{N F (0) K , N F (2) K }
32
32 (f)
{N F (6) K , N F (8) K }
23

32

Figure 5.8. Comparison of ACLs A2 and A3
all parties ﬁnish the comparison, Pn has the comparison result.

Let {N (F (al ))K1···n ,

N (F (bl ))K1···n } denote the l-th ﬁeld of an encrypted rule in the result. To decrypt this
rule, Pn ﬁrst decrypts it with Kn and sends to Pn−1 . Then, Pn−1 decrypts it with Kn−1
and sends to Pn−2 . Repeat this step until P1 decrypts the rule. Finally, P1 have the
result F1 ∈[a1 , b1 ]∧· · · ∧Fd ∈[ad , bd ]. The comparison result of three ACLs in Figure 5.5 is
{N (F (0))K123 , N (F (2))K123 }, and {N (F (6))K123 , N (F (7))K123 }. Figure 5.9 shows the
decryption process of the comparison result.

132

{N F (0) K
,N F (2) K }
321
321
{N F (6) K
, N F ( 7 ) K }
231

123

Decrypt by P3
{N F (0) K ,N F (2) K }
21
21
{N F (6) K , N F (7) K }
Decrypt by P2
{N F (0) K , N F (2) K }
1
1
{N F (6) K , N F (7) K }
1

21

12

1

Decrypt by P1
{N F (0) , N F (2) }
{N F (6) , N F (7) }
Recover the original
value by P1
[0, 2]
[6, 7]

Figure 5.9. Decryption process of the comparison result

5.4
5.4.1

Security and Complexity Analysis
Security Analysis

The security of our protocol is based on the two important properties of the commutative
encryption. (1) Secrecy: for any x and key K, given (x)K , it is computationally infeasible
to compute K. (2) Indistinguishability: the distribution of (x)K is indistinguishable from
the distribution of x. Based on the ﬁrst property, without knowing Pj ’s secret key Kj , the
party Pi (i = j) cannot decrypt the encrypted numbers from Pi . Furthermore, one party
Pi cannot statistically analyze encrypted numbers from Aj (i = j) because each party Pj
(1 ≤ j ≤ n − 1) unions the encrypted preﬁx numbers into one set before sending them to Pi
for further encryption. Therefore, after the ﬁrst and second phases of our protocol (i.e., ACL
preprocessing and ACL encoding and encryption), the party Pi cannot reveal the ACL of any
other party. Based on the second property, we can prove that after the third phase (i.e., ACL
comparison), the party Pi only learns the limited information of the ACLs of other parties,

133

but such information cannot help it reveal them. Without loss of generality, we consider
the comparison between ACLs An−1 and An . For each non-overlapping rule nr from An−1 ,
let VF ,h (nr) denote the h-th (1 ≤ h ≤ 3) preﬁx set for the ﬁeld Fl (1 ≤ l ≤ d), e.g., in
l
(2)

Figure 5.8(a), VF ,1 (nr1 ) denotes {00010, 00101}. Let N (F (al )) denote one set of preﬁxes
l
for the ﬁeld Fl , e.g., N (F (0)) in Figure 5.8(c). The basic operation of the third phase is
to compare whether two sets from diﬀerent ACLs, e.g., VF ,h (nr) and N (F (al )), have a
l
common element. According to the theorems in multi-party secure computation [9, 34] and
the theorem in [21], we can prove that after the three phases, the party Pn−1 only learns
VF ,h (nr) ∩ N (F (al )) and the size of N (F (al )).
l
To prove this claim, based on the theorems of multi-party secure computation [9, 34],
we only need to prove that the distribution of the Pn−1 ’s view of our protocol cannot be
distinguished from a simulation that uses only VF ,h (nr), VF ,h (nr) ∩ N (F (al )), and the size
l
l
of N (F (al )). The theorem in [21] proves that Pn−1 ’s view of our protocol
YR = {(x1 )Kn−1 , · · · , (xm )Kn−1 , (xm+1 )Kn−1 , · · · , (xt )Kn−1 }
xi ∈VF ,h (nr)∩N (F (al ))
l

xi ∈VF ,h (nr)−N (F (al ))
l

cannot be distinguished from the simulation
YS = {(x1 )Kn−1 , · · · , (xm )Kn−1 ,
xi ∈VF ,h (nr)∩N (F (al ))
l

zm+1 , · · · , zt

}

t−m=|VF ,h (nr)−N (F (al ))|
l

where zm+1 , · · · , zt are random values and uniformly distributed in the domain of encrypted
numbers.
Knowing VF ,h (nr)∩N (F (al )) and the size of N (F (al )), Pn−1 cannot reveal the rules in An
l
for two reasons. First, a numericalized preﬁx in VF ,h (nr) ∩ N (F (al )) can be generated from
l
many numbers. Considering a numericalized preﬁx of 32-bit IP addresses b1 b2 · · · bk 10 · · · 0,
the number of possible IP addresses that can generate such preﬁx is 232−k . Furthermore,
after the comparison, Pn−1 sends to Pn the comparison result which is encrypted with
134

Pn−1 ’s secret key Kn−1 . Without knowing Kn−1 , Pn cannot reveal the comparison result,
and hence, cannot reveal the values from An−1 . Second, the size of N (F (al )) cannot be used
to reveal the rules in An because for any al or bl in the ﬁeld Fl of An , the size of N (F (al ))
or N (F (bl )) is constant.
At the end of our protocol, only P1 knows the intersection of n ACLs, which includes
some information (i.e., numbers) from other ACLs. However, our goal is to preserve the
privacy of ACLs, not the privacy of the intersection result. Knowing such numbers cannot
help P1 to reveal an ACL rule of other parties for two reasons. First, a real ACL typically
consists of hundreds of rules and no one consists of only one rule. Second, P1 does not know
which numbers belong to Aj (2 ≤ j ≤ n), which two numbers form an interval, and which
d intervals form a rule in Aj . The number of possible combinations can be extremely large.
Considering the intersection in Figure 5.9, P1 cannot know which ACL, A2 or A3 , contains
the number 2 or the number 6.

5.4.2

Complexity Analysis

In this section, we analyze the computation, space, and communication costs in our protocol. Let mi be the number of rules in ACL Ai (1 ≤ i ≤ n) and d be the number of
ﬁelds. For ease of presentation, assume that diﬀerent ﬁelds have the same length, i.e., w
bits. We ﬁrst analyze the complexity of processing ACLs A1 , · · · , An−1 and then analyze
the complexity of processing ACL An . The maximum number of non-overlapping rules
generated from the FDD is (2mi −1)d [49]. Each non-overlapping rule consists of d w-bit
intervals, each interval can be converted to at most three ranges, and each range can be
converted to at most 2w−2 preﬁxes [39]. Thus, the maximum number of preﬁxes generated
from these non-overlapping rules is 3d(2w−2)(2mi −1)d . Recall that Pi (1 ≤ i ≤ n − 1)
unions all preﬁxes into one set. Then, the number of preﬁxes cannot exceed 2w+1 . There-

135

fore, for processing Ai (1 ≤ i ≤ n − 1), the computation cost of encryption by Pi , · · · ,
Pn is min (3d(2w − 2)(2mi − 1)d , 2w+1 ) = min (O(dwmd), O(2w )), the space cost of Pi is
i
O(dwmd), and the communication cost is min (O(dwmd), O(2w )). For processing Pn , each
i
i
interval of the non-overlapping rules is converted to two preﬁx families and each preﬁx family includes w + 1 preﬁxes. Thus, the maximum number of preﬁxes converted from An is
2d(w+1)(2mn−1)d . Therefore, for processing An , the computation, space, and communication costs of Pn is O(dwmd ).
n

5.5

Optimization

To reduce the computation and communication costs, we divide the problem of computing
reachability of n ACLs to the problem of computing reachability of two ACLs. Then the
intermediate results are aggregated hierarchically to obtain ﬁnal reachability result. Let Qi
(1 ≤ i ≤ n) denote the set of non-overlapping accepting rules from ACL Ai . In the ACL
encoding and encryption phase, Qi is encrypted n − i + 1 times, i.e., encrypted by Pi , Pi+1 ,
· · · , Pn . Thus, the number of encryptions for Q1 , · · · , Qn is n + (n − 1) + · · · + 1 = O(n2 ).
Similarly, the number of messages in this phase is O(n2 ). To reduce the the number of
encryptions and messages, we ﬁrst divide n ACLs into ⌊n/2⌋ groups. The j-th (1 ≤ j ≤
⌊n/2⌋) group includes two adjacent ACLs A2j−1 and A2j . The last group includes adjacent
ACLs A2⌊n/2⌋−1, · · · , An . For example, 5 ACLs can be divided into two groups {A1 , A2 } and
{A3 , A4 , A5 }. Second, for the ACLs in each group, we run the proposed protocol to compute
the network reachability. The result for each group is actually a new set of non-overlapping
accepting rules. Therefore, we obtain ⌊n/2⌋ sets of non-overlapping accepting rules. Repeat
these two steps until we obtain the network reachability for all n ACLs. Through this
process, there are ⌊n/2⌋ + ⌊n/22 ⌋ + ... + 1 = O(n) groups, and for each group with two
ACLs, the number of ACL encryptions and messages is 2 + 1 = 3. Thus, the number of ACL

136

encryptions and messages is reduced from O(n2 ) to O(n).

5.6

Experimental Results

We evaluated the eﬃciency and eﬀectiveness of our protocol on 10 real ACLs and 100 synthetic ACLs. Both real and synthetic ACLs examine ﬁve ﬁelds, source IP, destination IP,
source port, destination port, and protocol type. For real ACLs, the number of rules ranges
from hundreds to thousands, and the average number of rules is 806. Due to security concerns, it is diﬃcult to obtain many real ACLs. Thus, we generated a large number of
synthetic ACLs based on Singh et al. ’s technique [74]. For synthetic ACLs, the number of
rules ranges from 200 to 2000, and for each number, we generated 10 synthetic ACLs. In
implementing the commutative encryption, we used the Pohlig-Hellman algorithm [64] with
a 1024-bit prime modulus and 160-bit encryption keys. Our experiments were implemented
using Java 1.6.0.
To evaluate the eﬀectiveness, we veriﬁed the correctness of our protocol because we knew
all the ACLs in the experiments. The results show that our protocol is deterministic and
accurate with the given ACLs. Thus, in this section, we focus on the eﬃciency of our protocol.
Recall that processing ACL Ai (1≤i≤n−1) is diﬀerent from processing the last destination
ACL An . Therefore, we evaluate the computation and communication costs of the core
operations of our protocol, processing ACL Ai (1≤i≤n−1), processing the destination ACL
An , and comparing Ai and An . Knowing this performance, we can easily estimate time and
space consumption for a given network path with n ACLs belonging to n parties.

5.6.1

Eﬃciency on Real ACLs

Our protocol is eﬃcient for processing real ACL Ai (1 ≤ i ≤ n − 1). Figure 5.10(a) shows for
processing Ai the computation cost of Pi and the average computation cost of other parties
137

Pi+1 , · · · , Pn . The computation cost of Pi is less than 2 seconds and the computation cost
of Pj (i + 1 ≤ j ≤ n) is less than 1.5 seconds. Note that, for processing Ai , the computation
cost of Pi is one-time oﬄine cost because Pi knows Ai , while the computation cost of Pj
(i + 1 ≤ j ≤ n) is online cost. Figure 5.10(b) shows the average communication cost between
any two adjacent parties Pj and Pj+1 (i ≤ j ≤ n) for processing ACL Ai , which is less than
60 KB. Note that, the computation costs of diﬀerent parties Pj (i + 1 ≤ j ≤ n) are similar
because they encrypt the same number of preﬁxes from Ai . Hence, we only show the average
computation cost of parties Pi+1, · · · , Pn . Similarly, the communication costs between every
two adjacent parties Pj and Pj+1 are the same.
Our protocol is eﬃcient for processing real ACL An . Figure 5.11(a) shows that for processing An , the computation cost of Pn and the average computation cost of other parties.
The computation cost of Pn is less than 10 seconds. The average computation cost of other
parties is less than 6 seconds. Similarly, for processing An , the computation cost of Pn
is one-time oﬄine cost, while the computation costs of other party is online cost. Figure
5.11(b) shows the average communication cost between Pn and Pi (1 ≤ i ≤ n − 1), which is
less than 410 KB.
Our protocol is eﬃcient for real ACL comparison. The comparison time between two
ACLs is less than 1 second, which is much less than the computation cost of processing
ACLs. Because the commutative encryption is more expensive than checking whether two
sets have a common element.

5.6.2

Eﬃciency on Synthetic ACLs

To further evaluate the eﬃciency, we executed our protocol over every 10 synthetic ACLs
with the same number of rules, and then measured the computation and communication costs
of operations on synthetic ACLs A1 to An for diﬀerent parties. Particularly, we measured

138

Computation cost (s)

1.8
1.6

Pi (1 ≤ i ≤ n−1)
Pj (i+1 ≤ j ≤ n)

1.4
1.2
1
0.8
0.6
0.4
0.2
0

A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
Real ACL
(a) Computation cost

Communication cost (KB)

60
50
40
30
20
10
0

A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
Real ACL
(b) Communication cost

Figure 5.10. Comp. & comm. costs for processing real ACL Ai (1≤i≤n−1)

139

Computation cost (s)

10
8

Pn
Pi (1 ≤ i ≤ n−1)

6
4
2
0

A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
Real ACL
(a) Computation cost

Communication cost (KB)

450
400
350
300
250
200
150
100
50
0

A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
Real ACL
(b) Communication cost

Figure 5.11. Comp. & comm. costs for processing real ACL An

140

computation and communication costs for processing each synthetic ACL Ai (1 ≤ i ≤ n−1),
processing synthetic ACL An , and the comparison time for every two ACLs.
For processing synthetic ACL Ai (1 ≤ i ≤ n − 1), Figure 5.12(a) shows the computation
cost of Pi and the average computation cost of parties Pi+1 , · · · , Pn and Figure 5.12(b)
shows the average communication cost between Pj and Pj+1 (i ≤ j ≤ n). The one-time
oﬄine computation cost (i.e., the computation cost of Pi ) is less than 400 seconds, and the
online computation cost (i.e., the average computation cost of other parties Pj ) is less than
5 second. The average communication cost between any two adjacent parties Pj and Pj+1
is less than 450 KB.
For processing synthetic ACL An , Figure 5.13(a) shows the computation cost of Pn and
the average computation cost of other parties and Figure 5.13(b) shows the average communication cost between Pn and Pi (1 ≤ i ≤ n − 1). The one-time oﬄine computation cost
(i.e., the computation cost of Pn ) is less than 550 seconds, and the online computation cost
(i.e., the average computation cost of other parties P1 , · · · , Pn−1 ) is less than 25 seconds.
The average communication cost between Pn and Pi is less than 2100 KB.
Figure 5.14 shows the average comparison time for every two synthetic ACLs. The comparison time between any two synthetic ACLs is less than 5 seconds, which is much more
eﬃcient than processing ACLs.

141

1e+3

Computation cost (s)

Pi (1 ≤ i ≤ n−1)
Pj (i+1 ≤ j ≤ n)
1e+2

1e+1

1e+0

1e−1

200 400 600 800 100012001400160018002000
Number of rules in Ai
(a) Ave. computation cost

450
Communication cost (KB)

400
350
300
250
200
150
100
50
0

200 400 600 800 100012001400160018002000
Number of rules in Ai
(b) Ave. communication cost

Figure 5.12. Comp. & comm. costs for processing synthetic ACL Ai (1≤i≤n−1)
142

1e+3

Computation cost (s)

Pn
Pi (1 ≤ i ≤ n−1)

1e+2

1e+1

1e+0

200 400 600 800 100012001400160018002000
Number of rules in An
(a) Ave. computation cost

Communication cost (KB)

2500

2000

1500

1000

500

0

200 400 600 800 100012001400160018002000
Number of rules in An
(b) Ave. communication cost

Figure 5.13. Comp. & comm. costs for processing synthetic ACL An
143

Searching time (s)

5
4
3
2
1

0
2000
1600
1200
800
400

# of rules in An

0

0

400

800

1200

1600

# of rules in Ai

Figure 5.14. Comparison time of synthetic ACLs Ai and An

144

2000

CHAPTER 6
Related Work
6.1

Secure Multiparty Computation

One of the fundamental cryptographic primitives for designing privacy-preserving protocols
is secure multiparty computation, which was ﬁrst introduced by Yao with the famous “TwoMillionaire Problem” [79]. A secure function evaluation protocol enables two parties, one
with input x and the other with y, to cooperatively compute a function f (x, y) without
disclosing one party’s input to the other. The classical solutions are Yao’s “garbled circuits”
protocol [80] and Goldreich’s protocol [60]. Other related work is privacy preserving set
operations, which enables n parties, each party with its private set si , to collaboratively
compute the intersection of all sets, s1 ∩ · · · ∩ sn , without disclosing more information of
one party’s private set beyond the intersection to other parties [33, 45, 68]. Although we
could apply these solutions to solve the problem of privacy preserving network reachability,
the computation and communication costs of these solutions is prohibitive due to the unnecessary requirement of secure multiparty computation for our problems. Secure multiparty
computation requires every party to know the result, while in our problems only one party
knows the result. This diﬀerence signiﬁcantly aﬀects computation and communication costs.

145

6.2

Privacy and Integrity Preserving in WSNs

Privacy and integrity preserving range queries in wireless sensor networks (WSNs) have
drawn people’s attention recently [69, 73, 84]. Sheng and Li proposed a scheme to preserve
the privacy and integrity of range queries in sensor networks [69]. This scheme uses the
bucket partitioning idea proposed by Hacigumus et al. in [40] for database privacy. The
basic idea is to divide the domain of data values into multiple buckets, the size of which is
computed based on the distribution of data values and the location of sensors. In each time
slot, a sensor collects data items from the environment, places them into buckets, encrypts
them together in each bucket, and then sends each encrypted bucket along with its bucket
ID to a nearby storage node. For each bucket that has no data items, the sensor sends an
encoding number, which can be used by the sink to verify that the bucket is empty, to a
nearby storage node. When the sink wants to perform a range query, it ﬁnds the smallest
set of bucket IDs that contains the range in the query, then sends the set as the query to
storage nodes. Upon receiving the bucket IDs, the storage node returns the corresponding
encrypted data in all those buckets. The sink can then decrypt the encrypted buckets and
verify the integrity using encoding numbers. S&L scheme only considered one-dimensional
data in [69] and it can be extended to handle multi-dimensional data by dividing the domain
of each dimension into multiple buckets.
S&L scheme has two main drawbacks, which are inherited from the bucket partitioning
technique. First, as pointed out in [41], the bucket partitioning technique allows compromised
storage nodes to obtain a reasonable estimation on the actual value of both data items and
queries. In comparison, in SafeQ, such estimations are very diﬃcult. Second, for multidimensional data, the power consumption of both sensors and storage nodes, as well as the
space consumption of storage nodes, increases exponentially with the number of dimensions
due to the exponential increase of the number of buckets. In comparison, in SafeQ, the power

146

and space consumption increases linearly with the number of dimensions times the number
of data items.
Shi et al. proposed an optimized version of S&L’s integrity preserving scheme aiming to
reduce the communication cost between sensors and storage nodes [73, 84]. The basic idea
of their optimization is that each sensor uses a bit map to represent which buckets have
data and broadcasts its bit map to the nearby sensors. Each sensor attaches the bit maps
received from others to its own data items and encrypts them together. The sink veriﬁes
query result integrity for a sensor by examining the bit maps from its nearby sensors. In
our experiments, we did not choose the solutions in [73, 84] for side-by-side comparison for
two reasons. First, the techniques used in [73, 84] are similar to S&L scheme except the
optimization for integrity veriﬁcation. The way they extend S&L scheme to handle multidimensional data is to divide the domain of each dimension into multiple buckets. They
inherit the same weakness of allowing compromised storage nodes to estimate the values
of data items and queries with S&L scheme. Second, their optimization technique allows
a compromised sensor to easily compromise the integrity veriﬁcation functionality of the
network by sending falsiﬁed bit maps to sensors and storage nodes. In contrast, in S&L and
our schemes, a compromised sensor cannot jeopardize the querying and veriﬁcation of data
collected by other sensors.

6.3

Privacy and Integrity Preserving in DAS

Database privacy has been studied extensively in both database and security communities
(e.g., [40, 41, 10, 16, 31, 72, 17]). Based on whether query results include false positives,
i.e., data items that do not satisfy the queries but are included in the query results, we
can classify these schemes into the following two categories: approximate privacy-preserving
schemes [40, 41] and precise privacy-preserving schemes [10, 16, 31, 72, 17]. The approximate

147

privacy-preserving schemes reply query results with false positives but they are more eﬃcient,
while the precise privacy-preserving schemes reply query results without false positives but
they are more expensive. Next, we discuss these two categories of privacy-preserving schemes.
Approximate Privacy-Preserving Schemes
Hacigumus et al. ﬁrst proposed the bucket partitioning idea for querying encrypted data in
the database-as-service model (DAS) [40]. The basic idea is to divide the attribute domains
into multiple buckets and then map bucket ids to random numbers for preserving privacy.
Later, Hore et al. explored the optimal partitioning of buckets [41]. However, as pointed out
in [41], bucket partitioning incurs a tradeoﬀ between privacy and eﬃciency. If the bucket sizes
are large, less privacy information is leaked, but query results include more false positives;
if the bucket sizes are small, more privacy information is leaked, but query results include
less false positives. In contrast, our scheme is a precise privacy-preserving scheme, which
returns query results without false positives. Furthermore, it leaks only the minimum privacy
information for any possible precise privacy-preserving scheme, which will be discussed in
Section 3.3.4.
Precise Privacy-Preserving Schemes
Previous work on order-preserving hash functions [31] and order-preserving encryptions
[10, 16] can be employed for constructing precise privacy-preserving schemes in cloud computing. Fox et al. proposed an order-preserving minimal prefect hash function (OPMPHF)
for a domain with N possible values [31]. Agrawal et al. proposed an order-preserving encryption (OPES) [10]. The basic idea of OPES is to transform data items to diﬀerent values
such that the transformed values preserve the order of the data items without disclosing
the privacy of the data items to cloud providers. Speciﬁcally, this scheme ﬁrst divides the
data domain into multiple buckets, i.e., m buckets, computes a transformation polynomial
function for each bucket, and then applies the corresponding transformation function to the

148

data items in each bucket. However, for z-dimensional data items, the OPMPHF function
requires O(zN log N) shared secret information between the organization and its customers,
and the OPES encryption requires O(zm) shared secret information. In contrast, our orderpreserving hash-based function only requires O(z) shared secret information.
Boneh & Waters proposed a public-key system for supporting conjunctive, subset,
and range queries on encrypted data [17].

Although theoretically it seems possible,

Boneh&Waters’ scheme cannot be used to solve the privacy problem in our context because
it is too computationally expensive for cloud computing. It would require an organization to perform O(zN) encryption for submitting data to a cloud provider, where z is the
number of dimensions and N is the domain size (i.e., the number of all possible values) of
each dimension. Here N could be large and each encryption is expensive due to the use of
public-key cryptography. Shi et al. proposed another public-key system for supporting multidimensional range queries on encrypted data [72]. However, this scheme has two drawbacks
in cloud computing. First, it is not practical to require an organization to stay online after it
outsourced the data to cloud providers. In Shi et al. ’s scheme, for each distinct range query,
a customer needs to retrieve a diﬀerent decryption key from the organization. Second, it is
not eﬃcient to process queries. In order to process a query, a cloud provider needs to scan
all the encrypted data from an organization. In contrast, our privacy-preserving scheme
not only can be computed eﬃciently due to the use of the hash function and symmetric
encryption, but also enables cloud providers to perform binary searches to process queries.
Database integrity has also been explored in prior work [27, 63, 62, 59, 24, 22], independent
of database privacy. Merkle hash trees have been used for the authentication of data items
[58] and they were used for verifying the integrity of range queries in [27, 63]. However, it is
diﬃcult to extend Merkle hash trees for supporting multi-dimensional range queries. Pang
et al. [62] and Narasimha & Tsudik [59] proposed similar schemes for verifying the integrity
of range query results using signature aggregation and chaining. For each data item, Pang
149

et al. computed the signature of the data item by signing the concatenation of the digests
of the data item itself as well as its left and right neighbors [62]. Narasimha & Tsudik
computed the signature by signing the concatenation of the digests of the data item and
its left neighbors along each dimension [59]. However, signature aggregation and chaining
requires a cloud provider to reply to the customer the boundary data items of the query that
do not satisfy the query.
Chen et al. proposed Canonical Range Trees (CRTs) to store the counting information
for multi-dimensional data such that the counting information can be used for integrity
veriﬁcation without leaking boundary data items of the query [22]. However, we argue that
the most important requirement in cloud computing is to preserve data privacy. CRTs
contain a lot of privacy information. Thus, we need to preserve the privacy of CRTs. As we
discussed before, preserving privacy of relational databases is already diﬃcult. Preserving
privacy of CRT trees is much more diﬃcult and no scheme has been proposed. Furthermore,
their scheme requires an organization to send a multi-dimensional CRT with O(n logz N)
overhead to a cloud provider, where n is the number of data items and N is the domain size of
each dimension. Therefore, it incurs too much communication cost between an organization
and a cloud provider.

6.4

Firewall Redundancy Removal and Collaborative
Firewall Enforcement in VPN

Prior work on intra-ﬁrewall redundancy removal aims to detect redundant rules within a
single ﬁrewall [38, 50, 52]. Gupta identiﬁed backward and forward redundant rules in a
ﬁrewall [38]. Later, Liu et al. pointed out that the redundant rules identiﬁed by Gupta
are incomplete, and proposed two methods for detecting all redundant rules [50, 52]. Prior

150

work on inter-ﬁrewall redundancy removal requires the knowledge of two ﬁrewall policies and
therefore is only applicable within one administrative domain [11, 81].
Prior work on collaborative ﬁrewall enforcement in virtual private networks (VPNs) enforces ﬁrewall policies over encrypted VPN tunnels without leaking the privacy of the remote
network’s policy [23, 48]. The problems of collaborative ﬁrewall enforcement in VPNs and
privacy-preserving inter-ﬁrewall optimization are fundamentally diﬀerent. First, their purposes are diﬀerent. The former focuses on enforcing a ﬁrewall policy over VPN tunnels in a
privacy-preserving manner, whereas the latter focuses on removing inter-ﬁrewall redundant
rules without disclosing their policies to each other. Second, their requirements are diﬀerent.
The former preserves the privacy of the remote network’s policy, whereas the latter preserves
the privacy of both policies.

6.5

Network Reachability Quantiﬁcation

The challenges in network reachability include, misconﬁguration of ACLs, changes of routing policies, and link failures, that could prevent accessibility to essential network services.
To estimate reachability, existing approaches analyze ACLs while considering other critical
parameters like dynamic routing policies, packet transforms, and variations in protocol operations [12, 42, 44, 54, 75, 77]. To estimate the bounds on reachability, Xie et al. deﬁned
union and intersection operations over ACLs while taking into account the routing decisions,
packet transforms, and link failures [77]. This approach, however, over approximates and
does not yield exact bounds. Ingols et al. used Binary Decision Diagrams (BDDs) to reduce the complexity of handling ACLs and to estimate reachability more accurately [42].
Matousek et al. described a formal model using Interval Decision Diagrams (IDDs) to analyze network reachability under all possible network failure conditions [54]. However, the
approach is not scalable as it performs an exhaustive evaluation of failure scenarios that may

151

or may not occur. Al-Shaer et al. proposed a more accurate model using BDDs and applied
symbolic model checking techniques on properties speciﬁed in computation tree logic (CTL)
[30], to verify reachability across the network for any given packet [12]. Sung et al. studied
the eﬀect of reachability constraints on class-of-service ﬂows, where the packets are subjected
to an additional constraint based on their class-of-service [75]. Khakpour et al. used Firewall
Decision Diagrams (FDDs) to quantify reachability while considering all possible network
transforms like NAT, PAT as well as protocol states like connection-oriented, state-less and
so on [44]. They also described a query language to enable network operators to execute
reachability queries.
While most solutions operate on static conﬁgurations, some work has been proposed for
estimating reachability in an online manner. Bandhakavi et al. analyzed the network reachability using a simulated routing environment, i.e., they constructed a routing graph which
represents the possible paths that could be taken by routing advertisements under the current
router conﬁgurations [13]. Analyzing the graph can identify violations in security policies
and in verifying reachability. Zhang et al. described a real-time monitoring and veriﬁcation system for network reachability [83]. A monitoring software runs on all the routers
and collects up-to-date ACL and forwarding state information, which enables the network
administrator to determine instantaneous reachability between any source destination pair.
This approach provides an insight into instantaneous reachability as it considers snapshots
of routing tables at the time of analysis.
Several other works have been proposed to reduce the complexity of managing networks
and to verify network conﬁgurations. Casado et al. described a novel architecture for enterprise, Secure Architecture for the Networked Enterprise (SANE), which comprises of a
centralized authentication server that allows authorized users to access services [19]. In
SANE, the ACLs can be speciﬁed in a natural way so as to capture the semantics clearly.
Le et al. used data mining techniques to analyze security policies and to detect possible
152

misconﬁgurations in the policies [47]. They considered the notion of association rule mining
to extract usable safe conﬁgurations of routers and detect anomalies in other routers using the extracted patterns. Benson et al. described complexity metrics to evaluate relative
complexity among alternate network designs [14]. The metrics allow network operators to
compare conﬁgurations with standard conﬁgurations and identify errors.
All these approaches are based on the same assumption, that is, there is a central network
analyst who has the complete knowledge of the network conﬁguration and other critical
information. However, this assumption is not true for a network where network devices
belong to diﬀerent parties whose network conﬁguration cannot be shared with other parties.
Therefore, these approaches cannot quantify network reachability across diﬀerent parties.

153

CHAPTER 7
Conclusions and Future Work
Preserving privacy and integrity of private data has become core requirements in the recent
decade for distributed systems across diﬀerent parties. In this dissertation, we investigate
four important privacy and integrity preserving problems for diﬀerent distributed systems.
For two-tiered sensor networks, we propose SafeQ, a eﬀective and eﬃcient protocol for
handling range queries in a privacy and integrity preserving fashion. SafeQ uses the techniques of preﬁx membership veriﬁcation, Merkle hash trees, and neighborhood chaining. In
terms of security, SafeQ signiﬁcantly strengthens the security of two-tiered sensor networks.
Unlike prior art, SafeQ prevents a compromised storage node from obtaining a reasonable
estimation on the actual values of sensor collected data items and sink issued queries. In
terms of eﬃciency, our results show that SafeQ signiﬁcantly outperforms prior art for multidimensional data in terms of both power consumption and storage space. We also propose
an optimization technique using Bloom ﬁlters to signiﬁcantly reduce the communication cost
between sensors and storage nodes.
For cloud computing, we propose novel privacy and integrity preserving schemes for multidimensional range queries. To preserve privacy, we propose an order-preserving hash-based
function to encode the data from an organization and the queries from its customers such that

154

a cloud provider can process encoded queries over encoded data without knowing the actual
values. To preserve integrity, we propose the ﬁrst probabilistic integrity-preserving scheme
for range queries. This scheme employs a new data structure, local bit matrices, which
allows customers to verify the integrity of query results with high probability. Our future
work will consider database updating and optimal bucket partitioning for multi-dimensional
data, which have not been solved in this paper.
For distributed ﬁrewall policies, we identify an important problem, cross-domain privacypreserving inter-ﬁrewall redundancy detection and propose a novel privacy-preserving protocol for detecting such redundancy. The results on real ﬁrewall policies show that our protocol
can remove as many as 49% of the rules in a ﬁrewall whereas the average is 19.4%.
For network reachability, we address the problem of privacy preserving quantiﬁcation
of network reachability across diﬀerent domains. Protecting the privacy of access control
conﬁguration is important as the information can be easily abused. We propose an eﬃcient
and secure protocol to quantify the network reachability accurately while protecting the
privacy of ACLs. We use the divide-and-conquer strategy to decompose the reachability
computation which results in a magnitude reduction of the computation and communication
costs. Our future work will consider dynamic routing information and topological variations
where links go down or new links get added to the network resulting in new paths for data
propagation. In our protocol, we have considered the routing information partially by taking
snapshots of the routing state and encoding it as an ACL rule. However, for a more ﬁnegrained analysis, the instantaneous forwarding information state along with the local routing
policies and other service level agreements also need to be considered.

155

APPENDICES

A

Analysis of SafeQ Optimization

Let HMAC g (N (S([d0 , d1 ]))), · · · , HMAC g (N (S([dn , dn+1 ]))) be the sets of data items that
a sensor needs to represent in a Bloom ﬁlter. Let [a, b] be the range query over the n data
items d1 , · · · , dn . Let wo denote the bit length of each dj , a, and b. Let wh denote the
bit length of the numbers after hashing in each HMAC g (N (S([dj , dj+1]))). Let k be the
number of hash functions in the Bloom ﬁlter.
Given two arrays A and B representing data in HMAC g (N (S([d0 , d1 ]))), · · · ,
HMAC g (N (S([dn , dn+1 ]))), for any v of wh bits, a storage node searches the corresponding
index for v by applying the k hash functions to v and check whether two conditions hold:
(1) for every 1≤i≤k, A[hi (v)] = 1; (2) for every 1≤i≤k, index j (0 ≤ j ≤ n) is included in
the list that B[hi (v)] points to. Let X (v) denote the index that the storage node ﬁnds for
v: if the index exists (i.e., the above conditions hold), X (v) = j; otherwise, X (v) = null .
Based on the analysis of Bloom ﬁlters [15],

the probability P r(A[h1 (v)]

=

1, · · · , A[hk (v)] = 1) is (1 − e−k(n+1)q/c )k . The probability P r(j ∈ B[h1 (v)], · · · , j ∈
1
B[hk (v)]) is ( n+1 )k . Therefore, we have

P r(X (v) = j) = (1 − e−k(n+1)q/c)k (

1 k
)
n+1

(7.1)

As P r(X (v) = j) is the same for any 0 ≤ j ≤ n, let α denote the probability P r(X (v) = j).
According to our discussion in Section 2.3.1, each of the two sets HMAC g (N (F (a))) and
HMAC g (N (F (b))) includes wo + 1 wh -bit numbers. For HMAC g (N (F (a))), there exists
a range [dn1 −1 , dn1 ] such that a ∈ [dn1 −1 , dn1 ]. Therefore, there exists one number va in
HMAC g (N (F (a))) such that X (va ) = n1 − 1. Let v1 , · · · , vwo denote the rest wo numbers
in HMAC g (N (F (a))) and Y denote the minimum index in {X (v1 ), · · · , X (vwo )}. Without
loss of generality, we assume X (v1 ) is the minimum index. The probability of Y = j1 − 1

156

can be computed as follows:
wo

P r(Y = j1 − 1) = P r(X (v1) = j1 − 1)

P r(X (vi) ≥ j1 − 1 or X (vi ) = null)
i=2
wo

= P r(X (v1) = j1 − 1)
= α[1 − (j1

(1 − P r(X (vi ) < j1 − 1))

i=2
wo −1
− 1)α]

Similarly, for HMAC g (N (F (b))), there exists a range [dn2 −1 , dn2 ] such that b ∈
[dn2 −1 , dn2 ].

Therefore, there exists one number vb in HMAC g (N (F (b))) such that

X (vb ) = n2 − 1. Let v1 , · · · , vwo denote the rest wo numbers in HMAC g (N (F (b))) and
Z denote the maximum index in {X (v1 ), · · · , X (vwo )}. Without loss of generality, we assume X (v1 ) is the maximum index. We have
wo

P r(Z = j2 − 1) = P r(X (v1 ) = j2 − 1)

P r(X (vi ) ≤ j2 − 1 or X (vi ) = null)
i=2
wo

(1 − P r(X (vi ) > j2 − 1))

= P r(X (v1 ) = j2 − 1)
= α[1 − (n − j2

i=2
+ 1)α]wo −1

Given a query {HMAC g (N (F (a))), HMAC g (N (F (b)))}, if the storage node can ﬁnd Y =
j1 − 1 or Z = j2 − 1 where 0 ≤ j1 < n1 ≤ n2 < j2 ≤ n, the query result has false positives.
Therefore, the average false positive rate can be computed as follows:
n+1

n1 −1

n+1

n+1

{

ǫ =

(j − j1 ) − (n2 − n1 )
[ 2
×
n − (n2 − n1 )

n1 =1 n2 =n1 j1 =1 j2 =n2 +1

P r(Y = j1 − 1)P r(Z = j2 − 1)]
n1 −1

+

[
j1 =1

n1 − j1
×
n − (n2 − n1 )

P r(Y = j1 − 1)P r(Z < n2 or Z = null )]
n+1

+

[
j2 =n2 +1

j2 − n2
×
n − (n2 − n1 )

P r(Y > n1 − 2 or Y = null )P r(Z = j2 − 1)]}
157

(7.2)

n −j

j −n

1 1
2 2
Because [1−(j1 −1)α]wo −1 ≤ 1, [1−(n−j2 +1)α]wo −1 ≤ 1, n−(n −n ) ≤ 1, n−(n −n ) ≤ 1,
2 1
2 1
(j2 −j1 )−(n2 −n1 )
and
≤ 1, we derive Formula 2.1 from the following calculation.
n−(n −n )
2

1

n+1

n+1

[n − (n2 − n1 )]α

ǫ <
n1 =1 n2 =n1

=

1 (n + 2)(n + 3)
(1 − e−k(n+1)q/c )k
3 (n + 1)k−1

1
Typically, we choose the value c = ln 2 k(n+1)q ≈ 1.44k(n+1)q to minimize the probability

of false positive for Bloom ﬁlters. Thus, Formula 2.1 becomes
1 1 (n + 2)(n + 3)
ǫ < ( )k
3 2
(n + 1)k−1
Next, we discuss under what condition our optimization technique reduces the communication cost between sensors and storage nodes. To represent data in the n + 1 sets
HMAC g (N (S([d0 , d1 ]))), · · · , HMAC g (N (S([dn , dn+1]))), without Bloom ﬁlters, the total
number of bits required is wh (n + 1)q; with Bloom ﬁlters, the total number of bits required is
at most c + 2k(n+ 1)q⌈log2 (n + 1)⌉. Note that the number of bits for representing array A is
c, the number of bits for representing array B is at most 2k(n + 1)q⌈log2 (n + 1)⌉. Therefore,
we derive Formula 2.2.
wh (n + 1)q > c + 2k(n + 1)q⌈log2 (n + 1)⌉
1
In case that c = ln 2 k(n + 1)q, Formula 2.2 becomes

wh
wh
k≤ 1
≈
1.44 + 2⌈log2 (n + 1)⌉
ln 2 + 2⌈log2 (n + 1)⌉

158

B

∗
Properties of fk and Their Proof

′
∗
Order Preserving: Assume any hk (xq ) ≥ 2w (xq ∈ [x1 , xN ]). The condition fk (xi1 ) <
∗
fk (xi2 ) holds if and only if xi1 < xi2 .
∗
∗
Proof. We ﬁrst prove that if the condition fk (xi1 ) < fk (xi2 ) holds, then xi1 < xi2 . We

prove it by contradiction. If xi1 ≥ xi2 , we have
∗
fk (xi1 )

=

∗
fk (xi2 ) +

i1

hk (xq )
′
2w

q=i2 +1

∗
≥ fk (xi2 )

∗
∗
Second, we prove that if the condition xi1 < xi2 holds, then fk (xi1 ) < fk (xi2 ). Similar

as the proof of the property collision resistance, we have
∗
fk (xi2 )

=

∗
fk (xi1 ) +

i2

hk (xq )
′
2w

q=i1+1

∗
> fk (xi1 )

′

Collision Resistance: Assume any hk (xq ) ≥ 2w (xq ∈ [x1 , xN ]). It is impossible to ﬁnd
∗
∗
xi1 and xi2 where xi1 = xi2 such that fk (xi1 ) = fk (xi2 ).

Proof. Without loss of generalization, we assume i1 < i2 . Hence, we have
∗
fk (xi2 )

=
=

i1
q=1 hk (xq )
′
2w
∗
fk (xi1 ) +

+

i2

i2
q=i1 +1 hk (xq )
′
2w

hk (xq )

w
q=i1 +1 2

′

.

′
h (x )
∗
Because for any hk (xq ), 2w ≤ hk (xq ), then k ′q ≥ 1. Therefore, we have fk (xi2 ) >
2w
∗
fk (xi1 ).

159

C

Calculation of Detection Probability

We ﬁrst compute the number of choices for deleting a data item in all query results that
will be detected by customers. Recall Case 1 in Section 3.4.2. Given a range query [a, b],
deleting a data item in Bi will be detected by the customer if [a, b] is the superset of Bi , i.e.,
Bi ⊆ [a, b]. For ease of presentation, let [li , hi ] denote a bucket Bi . A bucket Bi is called a
single-value bucket if li = hi . For the bucket Bi = [li , hi ], there are li − x1 + 1 distinct values
which are less than or equal to li . Similarly, there are xN − hi + 1 distinct values which are
larger than or equal to hi . Thus, the total number of queries, which are the supersets of
[li , hi ], can be computed as (li − x1 + 1)(xN − hi + 1). Let e(x) denote the frequency of the
data item with value x. The number of data items with diﬀerent values satisfying a query
[a, b] can be computed as

b
x=a e(x).

Thus, for deleting a data item in bucket Bi that will

be detected by customers, the number of choices can be computed as
hi

π(Bi ) = (li − x1 + 1)(xN − hi + 1)

e(x)

(7.3)

x=li

Therefore, for deleting a data item in all buckets B1 , · · · , Bm that will be detected by
customers, the number of choices can be computed as
m

π=

m

hi

(li − x1 + 1)(xN − hi + 1)e(x)

π(Bi ) =
i=1

(7.4)

i=1 x=li

In our context, e(x) is either equal to 0 or 1 because if multiple data items have the
same value, the organization simply represents them as one data item annotated with the
number of items that share this value. If there is no data item satisfying the query [a, b],
b
x=a e(x)

= 0. Similarly, the number of choices for deleting a data item in all query results

can be computed as
∗

xN xN

b

π =

e(x)
a=x1 b=a x=a

160

(7.5)

Let Ij () (1 ≤ j ≤ n) denote an indicator function as
Ij (x) =
We have e(x) =

n
j=1 Ij (x).

π∗

1 if x = dj
0 otherwise

Thus, π ∗ can be transformed to

xN xN

b

n

=

n

xN xN

b

Ij (x) =
a=x1 b=a x=a j=1
n dj xN

=

1=
j=1 a=x1 b=dj

Ij (x)
j=1 a=x1 b=a x=a

n

(dj − x1 + 1)(xN − dj + 1)

(7.6)

j=1

Thus, the probability that a deletion operation of the cloud provider can be detected is
π
Pr = ∗ =
π
=

m
i=1

m
i=1 π(Bi )
π∗
hi
x=li (li − x1 + 1)(xN − hi + 1)e(x)
n
j=1 (dj − x1 + 1)(xN − dj + 1)

161

(7.7)

D

Proof of Theorems 3 and 4

Proof of Theorem 3
Proof. First, we prove that if each data item dj (1 ≤ j ≤ n) forms a single-value bucket, then
P rmax = 100%. Obviously, P rmax = 100% is equivalent to π = π ∗ . Thus, we only need
dj
x=dj

to prove π = π ∗ . For each bucket [dj , dj ] (1 ≤ j ≤ n), because

e(x) = 1, we have

π([dj , dj ]) = (dj − x1 + 1)(xN − dj + 1). For each empty bucket Bi , because

hi
x=li e(x)

= 0,

we have π(Bi ) = 0. Thus, according to Equation 7.4, we have
n

π=

(dj − x1 + 1)(xN − dj + 1) = π ∗

j=1

Second, we prove that if P rmax = 100%, each data item dj (1 ≤ j ≤ n) should form
a single-value bucket. We prove it by contradiction. If a data item dj does not form a
single-value bucket, P r = π/π ∗ < 100%, which is equivalent to π < π ∗ . Assuming that
B(dj ) = [l∗ , h∗ ] is not a single-value bucket, we consider the following two cases.
(1) If B(dj ) includes only one data item dj , i.e.,

h∗
x=l∗ e(x)

= 1, the condition l∗ < dj <

h∗ must hold. We have
π(B(dj )) = (l∗ − x1 + 1)(xN − h∗ + 1)
< (dj − x1 + 1)(xN − dj + 1)
Therefore, π <

n
j=1 (dj

− x1 + 1)(xN − dj + 1) = π ∗ .

(2) If B(dj ) includes n2 −n1 +1 data items, dn1 , dn1 +1 , · · · , dj , · · · , dn2 , (1 < n2 −n1 +1 <
n), the condition l∗ ≤ dn1 < dn2 ≤ h∗ must hold. We have
π(B(dj )) =

(l∗

− x1 + 1)(xN

− h∗

h∗

e(x)

+ 1)
x=l∗

n2

(dj − x1 + 1)(xN − dj + 1)

<
j=n1

Thus, π <

n
j=1 (dj

− x1 + 1)(xN − dj + 1) = π ∗ .
162

Proof of Theorem 4
Proof. Obviously, P rmin is equivalent to π = n. Thus, we only need to prove that π = n if
and only if there is only one bucket [x1 , xN ].
First, we prove that if there is only one bucket [x1 , xN ], then π = n.
xN

π = π(B1 ) = (x1 − x1 + 1)(xN − xN + 1)

f (x) = n
x=x1

Second, we prove that if π = n, then there is only one bucket [x1 , xN ]. We prove it
by contradiction. If there are multiple buckets B1 , · · · , Bm (m ≥ 2), then π > n. Let
[li , hi ] denote a bucket Bi (1 ≤ i ≤ m). All buckets must satisfy the following condition,
x1 = l1 ≤ h1 < l2 ≤ h2 < · · · < lm ≤ hm = xN . Thus, for each bucket Bi (1 ≤ i ≤ m),
hi

π(Bi ) = (li − x1 + 1)(xN − hi + 1)

hi

e(x) >
x=li

Thus, we have
m

π=

m

hi

e(x) = n

π(Bi ) >
i=1

i=1 x=li

163

e(x)
x=li

BIBLIOGRAPHY

BIBLIOGRAPHY
[1] Amazon web services, aws.amazon.com.
[2] Firewall throughput test, http://www.hipac.org/performance tests/results.html.
[3] Google app engine, code.google.com/appengine.
[4] Intel lab data. http://berkeley.intel-research.net/ labdata.
[5] Microsoft azure, www.microsoft.com/azure.
[6] Rise project. http://www.cs.ucr.edu/ rise.
[7] Stargate gateway (spb400). http://www.xbow.com.
[8] Tossim. http://www.cs.berkeley.edu/ pal/research/ tossim.html.
[9] Rakesh Agrawal, Alexandre Evﬁmievski, and Ramakrishnan Srikant. Information sharing across private databases. In Proc. ACM Inte. Conf. on Management of Data (SIGMOD), pages 86–97, 2003.
[10] Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu. Order preserving encryption for numeric data. In Proc. ACM Inte. Conf. on Management of Data
(SIGMOD), pages 563–574, 2004.
[11] Ehab Al-Shaer and Hazem Hamed. Discovery of policy anomalies in distributed ﬁrewalls.
In IEEE INFOCOM’04, pages 2605–2616, March 2004.
[12] Ehab Al-Shaer, Will Marrero, Adel El-Atawy, and Khalid ElBadawi. Network conﬁguration in a box: Towards end-to-end veriﬁcation of network reachability and security.
In Proc. IEEE Inte. Conf. on Network Protocols (ICNP), 2009.
[13] Sruthi Bandhakavi, Sandeep Bhatt, Cat Okita, and Prasad Rao. Analyzing end-toend network reachability. In Proc. IFIP/IEEE Inte. Conf. on Symposium on Integrated
Network Management, 2009.
[14] Theophilus Benson, Aditya Akella, and David Maltz. Unraveling the complexity of
network management. In Proc. USENIX Symposium on Networked Systems Design and
Implementation, 2009.
[15] Burton Bloom. Space/time trade-oﬀs in hash coding with allowable errors. Communications of ACM, 13(7):422–426, 1970.

164

[16] Alexandra Boldyreva, Nathan Chenette, Younho Lee, and Adam O’Neill. Orderpreserving symmetric encryption. In Proc. Inte. Conf. on Advances in Cryptology: the
Theory and Applications of Cryptographic Techniques (EUROCRYPT), pages 224–241,
2009.
[17] Dan Boneh and Brent Waters. Conjunctive, subset, and range queries on encrypted
data. In Proc. Theory of Cryptography Conference (TCC), pages 535–554, 2007.
[18] Justin Brickell and Vitaly Shmatikov. Privacy-preserving graph algorithms in the semihonest model. In Proc. Inte. Conf. on the Theory and Application of Cryptology and
Information Security (ASIACRYPT), pages 236–252, 2010.
[19] Martin Casado, Tal Garﬁnkel, Aditya Akella, Michael J. Freedman, Dan Boneh, Nick
McKeown, and Scott Shenker. Sane: A protection architecture for enterprise networks.
In Proc. Usenix Security Symposium, 2006.
[20] Yeim-Kuan Chang. Fast binary and multiway preﬁx searches for packet forwarding.
Computer Networks, 51(3):588–605, 2007.
[21] Fei Chen, Bezawada Bruhadeshwar, and Alex X. Liu. A cross-domain privacy-preserving
protocol for cooperative ﬁrewall optimization. In Proc. IEEE Conf. on Computer Communications (INFOCOM), 2011.
[22] Hong Chen, Xiaonan Man, Windsor Hsu, Ninghui Li, and Qihua Wang. Access control friendly query veriﬁcation for outsourced data publishing. In Proc. 13th European
Symposium on Research in Computer Security (ESORICS), pages 177–191, 2008.
[23] Jerry Cheng, Hao Yang, Starsky H.Y. Wong, and Songwu Lu. Design and implementation of cross-domain cooperative ﬁrewall. In Proc. IEEE Inte. Conf. on Network
Protocols (ICNP), 2007.
[24] Weiwei Cheng, HweeHwa Pang, and Kian-Lee Tan. Authenticating multi-dimensional
query results in data publishing. In Data and Applications Security 2006, pages 60–73,
2006.
[25] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Cliﬀord Stein. Introduction to Algorithms. MIT Press.
[26] Peter Desnoyers, Deepak Ganesan, Huan Li, and Prashant Shenoy. Presto: A predictive
storage architecture for sensor networks. In Proc. 10th Workshop on Hot Topics in
Operating Systems (HotOS), 2005.

165

[27] Premkumar Devanbu, Michael Gertz, Charles Martel, and Stuart G. Stubblebine. Authentic data publication over the internet. Journal of Computer Security, 11(3):291–314,
2003.
[28] Qunfeng Dong, Suman Banerjee, Jia Wang, Dheeraj Agrawal, and Ashutsh Shukla.
Packet classiﬁers in ternary CAMs can be smaller. In Proc. ACM Sigmetrics, pages
311–322, 2006.
[29] D. Eastlake and P. Jones. Us secure hash algorithm 1 (sha1). RFC 3174, 2001.
[30] E. Allen Emerson. Temporal and modal logic. 1990.
[31] Edward A. Fox, Qi Fan Chen, Amjad M. Daoud, and Lenwood S. Heath. Orderpreserving minimal perfect hash functions and information retrieval. ACM Transactions
on Information Systems, 9:281–308, 1991.
[32] A. Frank and A. Asuncion. UCI machine learning repository, 2010.
[33] Michael Freedman, Kobi Nissim, and Benny Pinkas. Eﬃcient private matching and
set intersection. In Proc. Inte. Conf. on the Theory and Applications of Cryptographic
Techniques (EUROCRYPT), pages 1–19, 2004.
[34] Oded Goldreich. Secure multi-party computations. Working draft. Version 1.4 edition,
2002.
[35] Oded Goldreich. Foundations of Cryptography: Volume II (Basic Applications). Cambridge University Press, 2004.
[36] Mohamed G. Gouda and Alex X. Liu. Firewall design: consistency, completeness
and compactness. In Proc. 24th IEEE Inte. Conf. on Distributed Computing Systems
(ICDCS-04), pages 320–327, March 2004.
[37] Mohamed G. Gouda and Alex X. Liu. Structured ﬁrewall design. Computer Networks
Journal (Elsevier), 51(4):1106–1120, March 2007.
[38] Pankaj Gupta. Algorithms for Routing Lookups and Packet Classiﬁcation. PhD thesis,
Stanford University, 2000.
[39] Pankaj Gupta and Nick McKeown. Algorithms for packet classiﬁcation. IEEE Network,
15(2):24–32, 2001.
[40] Hakan Hacig¨ m¨¸, Bala Iyer, Chen Li, and Sharad Mehrotra. Executing sql over enu us
crypted data in the database-service-provider model. In Proc. ACM Inte. Conf. on
Management of Data (SIGMOD), pages 216–227, 2002.
166

[41] Bijit Hore, Sharad Mehrotra, and Gene Tsudik. A privacy-preserving index for range
queries. In Proc. 30th Inte. Conf. on Very Large Data (VLDB), pages 720–731, 2004.
[42] Kyle Ingols, Richard Lippmann, and Keith Piwowarski. Practical attack graph generation for network defense. In Proc. Annual Computer Security Applications Conf.
(ACSAC), 2006.
[43] Zeus Kerravala. As the value of enterprise networks escalates, so does the need for
conﬁguration management. Enterprise Computing & Networking, The Yankee Group
Report, January 2004.
[44] Amir R. Khakpour and Alex X. Liu. Quantifying and querying network reachability.
In Proc. Inte. Conf. on Distributed Computing Systems (ICDCS), 2010.
[45] Lea Kissner and Dawn Song. Privacy-preserving set operations. In Advances in Cryptology (CRYPTO), pages 241–257, 2005.
[46] Hugo Krawczyk, Mihir Bellare, and Ran Canetti. Hmac: Keyed-hashing for message
authentication. RFC 2104, 1997.
[47] Franck Le, Sihyung Lee, Tina Wong, Hyong S. Kim, and Darrell Newcomb. Detecting
network-wide and router-speciﬁc misconﬁgurations through data mining. 2009.
[48] Alex X. Liu and Fei Chen. Collaborative enforcement of ﬁrewall policies in virtual
private networks. In Proc. Annual ACM SIGACT-SIGOPS Symposium on Principles
of Distributed Computing (PODC), Toronto, Canada, August 2008.
[49] Alex X. Liu and Mohamed G. Gouda. Diverse ﬁrewall design. IEEE Transactions on
Parallel and Distributed Systems (TPDS), 19(8), 2008.
[50] Alex X. Liu and Mohamed G. Gouda. Complete redundancy removal for packet classiﬁers in tcams. IEEE Transactions on Parallel and Distributed Systems (TPDS), in
press.
[51] Alex X. Liu, Chad R. Meiners, and Eric Torng. Tcam razor: A systematic approach towards minimizing packet classiﬁers in tcams. IEEE/ACM Transactions on Networking,
to appear.
[52] Alex X. Liu, Chad R. Meiners, and Yun Zhou. All-match based complete redundancy
removal for packet classiﬁers in TCAMs. In Proc. 27th Annual IEEE Conf. on Computer
Communications (Infocom), April 2008.

167

[53] Alex X. Liu, Eric Torng, and Chad Meiners. Firewall compressor: An algorithm for
minimizing ﬁrewall policies. In Proc. 27th Annual IEEE Conf. on Computer Communications (Infocom), Phoenix, Arizona, April 2008.
[54] Petr Matousek, Jaroslav Rab, Ondrej Rysavy, and Miroslav Sveda. A formal model
for network-wide security analysis. In Proc. IEEE Inte. Conf. and Workshop on the
Engineering of Computer Based Systems, 2008.
[55] Chad R. Meiners, Alex X. Liu, and Eric Torng. TCAM Razor: A systematic approach
towards minimizing packet classiﬁers in TCAMs. In Proc. 15th IEEE Conf. on Network
Protocols (ICNP), pages 266–275, October 2007.
[56] Chad R. Meiners, Alex X. Liu, and Eric Torng. Bit weaving: A non-preﬁx approach to
compressing packet classiﬁers in TCAMs. In Proc. IEEE Conf. on Network Protocols
(ICNP), pages 93–102, October 2009.
[57] Chad R. Meiners, Alex X. Liu, and Eric Torng. Topological transformation approaches
to optimizing tcam-based packet processing systems. In Proc. ACM Inte. Conf. on
Measurement and Modeling of Computer Systems (SIGMETRICS), pages 73–84, August
2009.
[58] Ralph Merkle. Protocols for public key cryptosystems. In Proc. IEEE Symposium on
Security and Privacy, pages 122–134, 1980.
[59] Maithili Narasimha and Gene Tsudik. Authentication of outsourced databases using
signature aggregation and chaining. In Proc. Inte. Conf. on Database Systems for Advanced Applications (DASFAA), 2006.
[60] Silvio Micali Oded Goldreich and Avi Wigderson. How to play any mental game. In
Proc. nineteenth anual ACM Conf. on Theory of computing, May 1987.
[61] David Oppenheimer, Archana Ganapathi, and David A. Patterson. Why do internet
services fail, and what can be done about it? In Proc. 4th USENIX Symposium on
Internet Technologies and Systems (USITS), March 2003.
[62] HweeHwa Pang, Arpit Jain, Krithi Ramamritham, and Kian-Lee Tan. Verifying completeness of relational query results in data publishing. In Proc. ACM Inte. Conf. on
Management of Data (SIGMOD), pages 407–418, 2005.
[63] HweeHwa Pang and Kian-Lee Tan. Authenticating query results in edge computing. In
Proc. 20th Inte. Conf. on Data Engineering, pages 560–571, 2004.

168

[64] Stephen C. Pohlig and Martin E. Hellman. An improved algorithm for computing
logarithms over gf(p) and its cryptographic signiﬁcance. IEEE Transactions Information
and System Security, IT-24:106–110, 1978.
[65] Sylvia Ratnasamy, Brad Karp, Scott Shenker, Deborah Estrin, Ramesh Govindan,
Li Yin, and Fang Yu. Data-centric storage in sensornets with ght, a geographic hash
table. Mobile Networks and Applications, 8(4):427–442, 2003.
[66] R. Rivest. The md5 message-digest algorithm. RFC 1321, 1992.
[67] David K. Hess David R. Saﬀord and Douglas Lee Schales. Secure RPC authentication
(SRA) for TELNET and FTP. Technical report, 1993.
[68] Yingpeng Sang and Hong Shen. Eﬃcient and secure protocols for privacy-preserving set
operations. ACM Transactions on Infomation and System Security, 13:9:1–9:35, 2009.
[69] Bo Sheng and Qun Li. Veriﬁable privacy-preserving range query in two-tiered sensor
networks. In Proc. IEEE Inte. Conf. on Computer Communications (INFOCOM), pages
46–50, 2008.
[70] Bo Sheng, Qun Li, and Weizhen Mao. Data storage placement in sensor networks.
In Proc. 7th ACM Inte. Symposium on Mobile Ad Hoc Networking and Computing
(MobiHoc), pages 344–355, 2006.
[71] Bo Sheng, Chiu C. Tan, Qun Li, and Weizhen Mao. An approximation algorithm for
data storage placement in sensor networks. In Proc. Inte. Conf. on Wireless Algorithms,
Systems and Applications (WASA), pages 71–78, 2007.
[72] Elaine Shi, John Bethencourt, T-H. Hubert Chan, Dawn Song, and Adrian Perrig.
Multi-dimensional range query over encrypted data. In Proc. IEEE Symposium on
Security and Privacy (S&P), pages 350–364, 2007.
[73] Jing Shi, Rui Zhang, and Yanchao Zhang. Secure range queries in tiered sensor networks.
In Proc. IEEE Inte. Conf. on Computer Communications (INFOCOM), 2009.
[74] Sumeet Singh, Florin Baboescu, George Varghese, and Jia Wang. Packet classiﬁcation
using multidimensional cutting. In Proc. ACM SIGCOMM, pages 213–224, 2003.
[75] Yu-Wei Eric Sung, Carsten Lund, Mark Lyn, Sanjay Rao, and Subhabrata Sen. Modeling and understanding end-to-end class of service policies in operational networks. In
Proc. SIGCOMM, pages 219–230, 2009.
[76] Avishai Wool. A quantitative study of ﬁrewall conﬁguration errors. IEEE Computer,
37(6):62–67, 2004.
169

[77] Geoﬀery G. Xie, Jibin Khan, David A. Maltz, Hui Zhang, Albert Greenberg, G´
ısli
Hj´lmt´sson, and Jennifer Rexford. On static reachability analysis of ip networks. In
a y
Proc. Annual Joint Conference of the IEEE Computer and Communication Societies
(INFOCOM), 2005.
[78] Zhiqiang Yang, Sheng Zhong, and Rebecca N. Wright. Privacy-preserving classiﬁcation
of customer data without loss of accuracy. In Proc. Inte. Conf. on Data Mining (SIAM),
2005.
[79] Andrew C. Yao. Protocols for secure computations. In Proc. 23rd IEEE Symposium on
the Foundations of Computer Science (FOCS), pages 160–164, 1982.
[80] Andrew C. Yao. How to generate and exchange secrets. In Proc. 27th IEEE Symposium
on Fundations of Computer Science, 1986.
[81] Lihua Yuan, Hao Chen, Jianning Mai, Chen-Nee Chuah, Zhendong Su, and Prasant
Mohapatra. Fireman: a toolkit for ﬁrewall modeling and analysis. In IEEE Symposium
on Security and Privacy, May 2006.
[82] Demetrios Zeinalipour-yazti, Song Lin, Vana Kalogeraki, Dimitrios Gunopulos, and
Walid A. Najjar. Microhash: An eﬃcient index structure for ﬂash-based sensor devices.
In Proc. 4th USENIX Conf. on File and Storage Technologies (FAST), pages 31–44,
2005.
[83] Bo Zhang, T. S. Eugene Ng, and Guohui Wang. Reachability monitoring and veriﬁcation
in enterprise networks. In Proc. SIGCOMM, 2008.
[84] Rui Zhang, Jing Shi, and Yanchao Zhang. Secure multidimensional range queries in
sensor networks. In Proc. ACM Inte. Symposium on Mobile Ad Hoc Networking and
Computing (MobiHoc), 2009.

170