DENIAL OF FIREWALLING: ATTACK AND DEFENSE

By Joshua Hulst

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Computer Science and Engineering

2011

ABSTRACT

DENIAL OF FIREWALLING: ATTACK AND DEFENSE

By Joshua Hulst

Firewalls are critical security devices handling all traffic in and out of a network. When under heavy load of both malicious and legitimate traffic, firewalls may be overloaded and start discarding or permitting packets without checking firewall rules, which can cause huge revenue losses or security breaches. In this paper, we study Denial of Firewalling attacks, where attackers use well-crafted traffic to effectively overwhelm a firewall. We first investigate firewall implementation characteristics that can be exploited for such attacks while treating the firewall as a black box. We conducted our studies on a testbed with three popular firewall devices. Second, given a remote firewall, we propose methods for attackers to infer the implementation of the firewall. We develop firewall fingerprinting techniques based on firewall decisions on a sequence of TCP packets with unusual flags and on machine learning techniques for inferring the firewall implementation. Next, we present methods that attackers can use to generate traffic that can effectively overload an identified remote firewall. We show that some firewalls can be easily overloaded by a small volume of carefully crafted traffic. Finally, we discuss methods for defending against such attacks.

ACKNOWLEDGMENTS

The author would like to thank Amir R. Khakpour for the collaboration and assistance he provided for this paper as well as Alex X. Liu for his supervision and guidance. Thanks also to AT&T Research Labs for providing the testbed hardware and Jia Wang, Zihui Ge, and Dan Pei for their guidance and help.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction
  1.1 Motivation
  1.2 Technical Challenges
  1.3 Our Approach
  1.4 Key Contributions
2 Background
  2.1 Firewall Policies
  2.2 Caching and Statefulness
  2.3 Packet Classification Solutions
3 Overview
  3.1 Roadmap
  3.2 Measurement Environment
4 Firewall Characteristics
  4.1 Methodology Basics
  4.2 Packet Classification Algorithm
    4.2.1 Using Sequential Search
    4.2.2 Sensitivity to Traffic Load
    4.2.3 Average PPT
  4.3 Caching and Statefulness
  4.4 Impact of Packet Protocol and Payload Size
5 Firewall Inference
  5.1 Firewall inference using TCP Probe Packets
  5.2 Firewall inference using Packet Processing Time
6 Firewall Attack Design
7 Proposed Defense Mechanisms
  7.1 Adding delay to packets
  7.2 Run Parallel Firewalls
8 Related Work
9 Conclusion
BIBLIOGRAPHY

LIST OF TABLES

2.1 An example firewall policy
4.1 Cache effectiveness based on local and remote measurements
5.1 Stateful Firewall
5.2 Stateless Firewalls
5.3 Accuracy and Misclassification for Firewall Type Labels
5.4 Accuracy and Misclassification for Statefulness Labels
5.5 Accuracy and Misclassification for Firewall Type and Statefulness Labels

LIST OF FIGURES

3.1 The testbed (For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis)
4.1 The PPT for probe packets that match against a rule in the firewall policy
4.2 The STD of the PPT for probe packets that match against a rule in the firewall policy
4.3 The remote PPT for probe packets with different packet payload sizes
6.1 The PPT of probe packets when the firewall is under attack with packets with no payload
6.2 The PPT of probe packets when the firewall is under attack with packets with payload

CHAPTER 1 Introduction

1.1 Motivation

Firewalls serve as the first line of defense for most networks and have been widely deployed on enterprise and Internet Service Provider (ISP) networks. Because a firewall often guards the entrance to a network, the security and reliability of a firewall are crucial to the security of the network, making firewalls a common target for attacks. In addition to specially crafted attacks, firewalls often process large amounts of legitimate or illegitimate traffic, including flash crowds and Denial of Service (DoS) attacks. When a firewall is unable to handle the amount of traffic it is asked to process, it can either drop or accept packets without processing them through the ruleset [12, 20]. If a firewall drops legitimate traffic, services behind the firewall can become unreachable, causing downtime and network disconnection. The Yankee Group estimates that the revenue losses per hour of downtime for the media, banking, and brokerage industries are 1.2, 2.6, and 4.5 million dollars, respectively [18]. Firewall products that allow illegitimate traffic expose the networks they protect to extra risk and allow attackers access to resources they should not be able to reach.

Our interest in this paper is to study ways of identifying and attacking firewalls. We term this attack Denial of Firewalling in the sense that we work to overload the firewall and force it to either drop legitimate traffic or allow illegitimate traffic. We investigate the firewall as a black box, assuming no knowledge of the internal algorithms or methods used, and try to infer identifying characteristics of the black box. These characteristics are first used to identify a firewall implementation using remote measurement techniques.
After identifying a firewall, we use the characteristics of the implementation to propose attacks tailored to that specific implementation, carried out by generating network traffic. After proposing methods to fingerprint and attack a firewall, we create methods to defend against identification.

1.2 Technical Challenges

The above objectives present significant technical difficulty. First, identifying firewall characteristics remotely is difficult when treating the firewall as a black box. Firewall implementations are often kept as trade secrets and can vary significantly between manufacturers. Second, there are relatively few methods for remotely measuring firewall characteristics. Our threat model assumes no administrative access to the firewall under attack, preventing some comprehensive measurements. Third, even after identifying characteristics of a firewall, exploiting these characteristics is difficult. Firewall manufacturers work to make secure firewalls and take every precaution to eliminate attacks. Finally, preventing identification of firewalls is challenging. The characteristics we propose for identification are implementation details which are not easily changed in the firewall. Our defense methods must obscure identification while still maintaining performance and correctness.

1.3 Our Approach

For our study, we build a testbed consisting of three popular firewalls, two computers hosting eight virtual machines for generating traffic to the firewalls, and one computer for sending probe packets and measuring the firewall processing time of the probe packets. We work to address the above technical challenges as follows. We identify and measure two methods of firewall identification. The first is the packet processing time associated with specific packets which we generate. The differences in the time it takes a firewall to process certain packets are shown to be a strong indicator of implementation. Second, we show that different firewalls treat packet sequences differently. We generate sequences of TCP packets with specific flags set and watch how different firewalls treat the packets. After identifying the characteristics, we create tests to gather packet processing times for different types of packets (i.e., TCP and UDP), differing payload sizes, and different types of firewall implementations such as stateful and stateless. We use the test data along with machine learning techniques to form a method of identifying firewalls. After identifying a firewall, we construct attacks specific to that firewall based on the processing time required for specific packets. We then attack the firewall and measure the effect on the packet processing time of normal packets, CPU load, and the amount of dropped legitimate traffic. Finally, we discuss methods of deterring our proposed fingerprinting methods while maintaining firewall correctness and performance.

1.4 Key Contributions

We made four key contributions in this paper. First, we identified firewall implementation characteristics that one can evaluate for black box firewalls. Second, we proposed methods for inferring the black box implementation of a target remote firewall. Third, we identified attack strategies for overloading an identified remote firewall. Finally, we proposed fingerprinting defense mechanisms.

The organization of the rest of the paper is as follows. We provide background information on firewalls in Chapter 2.
In Chapter 3, we present an overview of the different steps of our study on DoF attacks and describe our testbed specifications. Chapter 4 presents the results of our experiments for identifying firewall implementation characteristics. We then present methods for inferring the implementation of a remote firewall in Chapter 5. We examine the effectiveness of our designed attacks on different firewalls in Chapter 6. In Chapter 7, we propose defense mechanisms to protect firewalls from fingerprinting attacks. We give an overview of related work in Chapter 8. We conclude our studies and present future work in Chapter 9.

CHAPTER 2 Background

2.1 Firewall Policies

Firewall policies define rules that specify, for each incoming or outgoing packet, whether to accept or discard it. Each rule in the policy specifies a predicate over five different fields: source and destination port, source and destination IP address, and IP protocol. Typically, firewall policies do not check the source port field. The rules in a firewall policy may overlap and even conflict. To resolve conflicts, firewalls follow the first-match semantic, i.e., the decision of the first rule that a packet matches is the decision of the firewall for the packet. An example firewall policy is in Table 2.1.

Rule  Src IP       Dest IP      Src Port  Dest Port  Protocol  Action
r1    1.2.3.0/24   *            *         *          TCP       discard
r2    *            1.2.3.0/28   *         80         *         accept
r3    *            *            *         *          *         discard

Table 2.1. An example firewall policy

2.2 Caching and Statefulness

One method of increasing firewall performance is to cache rules or flows based on temporal locality. Rule caching stores the four-tuple of source IP, destination IP, destination port, and protocol type for which a firewall has performed a full lookup on its policy. The decision associated with each entry is stored in the cache. When a firewall with rule caching receives a packet, it first checks whether the four-tuple header of the packet is in its cache; if found, the decision for the packet can be made without checking the packet against the main firewall rules; if not found, the firewall checks the packet against its policy and then caches the four-tuple of the packet with the decision. Flow caching stores the five-tuple, which includes the source port field in addition to the four fields used in rule caching. The lookup process for flow caching is similar to that for rule caching. The purpose of flow caching is to have fine-grained access control beyond firewall policies. For example, to protect against SYN flooding attacks, some firewall products stop accepting new SYN packets with new source ports when they see too many open flows from a specific source address with different source ports.

Commercial firewalls often support both stateful and stateless modes. A stateful firewall tracks TCP sessions in a state table by examining the TCP flags of incoming TCP packets. This ensures that the packets in a TCP session follow the correct order, including a proper handshake and tear-down. The firewall drops any packet with an illegitimate flag. After a correct handshake, an entry is made in the state table. The packets that match the session entries bypass the firewall. Once a session goes through the correct termination procedure, its table entry is removed.

2.3 Packet Classification Solutions

The process of checking a packet against a firewall policy is called packet classification. Packet classification solutions fall into two main categories: software based solutions and Ternary Content Addressable Memory (TCAM) based solutions.
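To make the first-match semantic concrete before surveying these solutions, the following is a minimal Python sketch of a sequential, first-match lookup over the rules of Table 2.1. The encoding is illustrative (the source port field is omitted since every example rule ignores it); it is not the implementation of any firewall studied here.

```python
import ipaddress

# Illustrative encoding of Table 2.1; "*" matches anything.
RULES = [
    # (src net,     dst net,      dst port, protocol, action)
    ("1.2.3.0/24",  "*",          "*",      "TCP",    "discard"),  # r1
    ("*",           "1.2.3.0/28", 80,       "*",      "accept"),   # r2
    ("*",           "*",          "*",      "*",      "discard"),  # r3 (default)
]

def ip_in(net, ip):
    return net == "*" or ipaddress.ip_address(ip) in ipaddress.ip_network(net)

def classify(src_ip, dst_ip, dst_port, proto):
    """Sequential search: the decision of the FIRST matching rule wins."""
    for src_net, dst_net, r_port, r_proto, action in RULES:
        if (ip_in(src_net, src_ip) and ip_in(dst_net, dst_ip)
                and r_port in ("*", dst_port) and r_proto in ("*", proto)):
            return action

# r1 and r2 overlap: this packet also satisfies r2, but r1 comes first and discards it.
print(classify("1.2.3.4", "1.2.3.5", 80, "TCP"))  # -> discard
print(classify("9.9.9.9", "1.2.3.5", 80, "UDP"))  # -> accept (r2)
```

The cost of such a lookup grows with the index of the matching rule, which is exactly the property that the sequential-search test of Section 4.2.1 probes for.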
Software based packet classification solutions include the simple sequential search algorithm and algorithms based on more complex data structures (e.g., [6, 16, 17, 22, 24]). The sequential search algorithm compares a packet with each rule in a firewall policy sequentially until a match is found. Complex data-structure-based packet classification algorithms include Recursive Flow Classification (RFC) [16], Aggregated Bit-Vector [6], Tuple Space [24], HiCuts [17], and HyperCuts [22]. For TCAM based packet classification, firewall rules are stored in a special memory chip; for any given packet, the hardware circuit of the chip compares the packet with every stored rule in parallel and returns the decision of the first rule that matches the packet. TCAM based packet classification is widely used in high performance routers and firewalls because the lookup is done in constant time.

CHAPTER 3 Overview

3.1 Roadmap

To study DoF attacks and defenses, we design a testbed with three popular firewalls for conducting extensive experiments and performance measurements. Of the three firewalls tested, two are software firewalls while the other is hardware based. A software firewall is implemented fully in software and may reside on a multipurpose machine as one of many services being provided. Typically, software firewalls are highly configurable and offer more customization and services than their hardware counterparts. Hardware firewalls are specifically tailored for packet classification. Generally, they are more limited in capabilities than software firewalls but, being purpose built, are usually very fast at classification. Our measurements are mostly based on probe packet processing time taken on remote hosts before and after a firewall.

In our initial experiment we study firewall characteristics induced by their implementation. We examine firewall packet classification algorithms to understand whether or not they use sequential search for packet filtering. We then measure the sensitivity of firewall packet classification algorithms to firewall background traffic load. We continue our studies by inspecting the firewall caching techniques and quantifying their caching effectiveness. We finish by examining probe packet payload size to understand whether it has an impact on the firewall packet processing time (PPT).

The second experiment is to determine if an attacker can infer the implementation of a firewall remotely by sending probe packets through the firewall. The firewall implementation inference process is studied from two perspectives. First, we try to find a signature for each firewall based on the decisions on a sequence of TCP packets with an unusual set of flags. The results show that the three firewalls, especially if they are in stateful mode, discard TCP packet sequences with unusual TCP flags. As administrators rarely define policies on TCP flags, the obtained signature usually has a close association with the firewall implementation. An attacker can use this signature to infer the firewall implementation remotely with high confidence. Second, as a complementary method, we use the PPT of a sequence of probe packets to train a classification model for each firewall and use it accordingly to infer the implementation of a target firewall. Note that in the attacking scenario, the attacker needs to build simple testbeds including all speculated firewall brands to acquire signatures and the classification model.
He then needs to (1) compromise a host inside the network, (2) use security scanner tools such as nmap [9] to find the packets that can go through the target firewall and reach the compromised host, and (3) generate and send probe packets to measure their PPT. An attacker can also use other monitoring tools (e.g., traceroute) to learn the number of hops and the extra delay between the probe packet sender and receiver, and thereby create more accurate models for firewall implementation inference.

The third experiment evaluates which attack is more effective for a given firewall. We compare probe packet processing time when the firewall is under random DoF attacks and under customized DoF attacks designed based on the firewall's characteristics. The results indicate that some firewalls can be easily overloaded while other firewalls are fairly resilient to customized DoF attacks. Note that as it is not feasible to measure PPT for discarded packets in commercial firewalls, all the measurements in this paper are based on accepted packets. We finally propose some techniques to protect firewalls from DoF attacks by adding dummy rules for TCP flags or adding hybrid queuing mechanisms to randomize the pattern of the probe PPT. Indeed, firewall vendors can also use these results and modify their implementations so that their products are more resilient against DoF attacks.

Figure 3.1. The testbed (For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis)

3.2 Measurement Environment

Figure 3.1 shows the testbed topology for our testing of three different firewalls. Firewalls FW1 and FW2 are software firewalls, each running on a Linux machine with SMP kernel 2.6. Each firewall machine has 2 quad-core Intel Xeon 2.66GHz CPUs and 16GB of RAM. FW3 is a hardware firewall that runs on a routing engine board with an 850MHz processor, 1536MB DRAM, and 256MB compact flash. Each firewall is configured with the same policy comprising 375 rules. The first 374 rules are set to accept traffic, with the final rule discarding all traffic that is not specified previously. The firewall policy is chosen from real-life firewall policies used in a university campus network. The rules are defined over four packet header fields: source IP, destination IP, destination port number, and protocol. As with most real-life firewall policies, only a few rules overlap. Moreover, there is no rule hidden by another rule (i.e., there is no rule with lower index that completely covers a rule with higher index). Furthermore, the firewalls are only configured for packet filtering; other services such as VPN or NAT are disabled.

In addition to the firewalls, the testbed has two machines, VM1 and VM2, running VMware ESX 3.5.0 on similar machines with 2 quad-core Intel Xeon 2.66GHz CPUs and 16GB of RAM. Each VMware instance has four Linux virtual machines connected to each other by virtual switches. These virtual switches are connected directly (without an intermediary switch) to each firewall (FW1, FW2, and FW3). The virtual machines on VM1 and VM2 are used to place background traffic load on the firewalls by sending a substantial number of packets to different interfaces of the firewall. The traffic is generated by Mausezahn (aka mz) [2], an open-source traffic generator. Using both VM1 and VM2, we are able to sustain a traffic rate of up to 300Mbps.
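As an illustration of the style of traffic involved, the sketch below crafts the same kind of minimum-size, header-only packets in Python with Scapy. The addresses are placeholders, and Scapy's per-packet overhead means a native generator such as mz is still needed to reach the rates quoted above; this is a sketch of the idea, not of our actual generator.

```python
from scapy.all import IP, UDP, RandShort, send

# Placeholder: an address that the target firewall's policy discards,
# so the background load exercises classification but never passes through.
BLOCKED_DST = "198.51.100.9"

def background_load(n_packets):
    """Header-only UDP packets: no payload maximizes packets per second
    for a given bit rate, which is what stresses the classifier."""
    for _ in range(n_packets):
        pkt = IP(dst=BLOCKED_DST) / UDP(sport=RandShort(), dport=RandShort())
        send(pkt, verbose=False)

background_load(10000)
```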
Based on the design of experiments and attacks, the generated traffic can be accepted or discarded by the firewall to which it is sent. To put maximum load on the firewalls, the generated traffic has no packet payload. This maximizes the number of packets that a firewall needs to process. If packets have payloads, firewall throughput will increase, but the traffic packet rate (i.e., packets per second) will decrease. As mentioned, the virtual switches are directly connected to the firewalls. This is to separate the generated traffic for each firewall and make the firewall experiments independent from each other.

The last portion of the testbed is the Probe Machine & Traffic Analyzer (PMTA): a Linux machine with dual quad-core Intel Xeon 2.66GHz CPUs and 16GB of RAM. We send probe packets from the PMTA directly (i.e., with no switch in between) to each firewall using an open-source packet generator, hping2 [1]. If the probe packets are accepted by the target firewall, they are routed back to the PMTA through another interface (as shown in Figure 3.1). In order to measure firewall packet processing time, we use packet trace time-stamps. We use tcpdump [3] to dump packets with time-stamps at microsecond resolution. For the software firewalls (FW1 and FW2), we can analyze the packet traces and calculate the PPT based on the difference of packet trace time-stamps of the outgoing and incoming interfaces. However, the hardware firewall (FW3) does not support tcpdump or any traffic monitoring (i.e., packet dumping) feature. Therefore, since we cannot measure the packet processing time locally on that firewall, the probe packets are forwarded to the PMTA and we calculate the time-stamp difference of the packet traces on the PMTA. The time-stamp differences calculated on the PMTA comprise the firewall PPT plus the probe packet round trip time (RTT), which in turn reduces the accuracy of the measured firewall PPT.

CHAPTER 4 Firewall Characteristics

To study firewall characteristics, we first give an overview of methodology basics, such as how the probe packets are sent and how the PPT is measured by the PMTA. We then show the results for different firewall features: the packet classification algorithm, statefulness and caching, and the impact of packet protocol and payload size.

4.1 Methodology Basics

The probe packets are sent by the PMTA in four modes as follows:

• TCP Fix: A sequence of TCP packets with the same packet header.

• TCP Vary: A sequence of TCP packets with the same packet header except the source port, which is chosen randomly for each probe packet.

• UDP Fix: A sequence of UDP packets with the same packet header.

• UDP Vary: A sequence of UDP packets with the same packet header except the source port, which is chosen randomly for each probe packet.

We conduct two sets of experiments, with and without background traffic load in the testbed. The first set of experiments is performed under no background traffic load, i.e., the probe packets are the only packets that are transmitted in the testbed during the experiments. In contrast, the second set of experiments is performed under background traffic load. In this case, all virtual machines send dummy packets with no payload to the target firewall as the background traffic. The headers of the dummy packets are chosen such that they will be discarded by the rule configured in the firewall, i.e., these dummy packets never pass through the firewall. The dummy packet rate varies from 870,000 packets to 1,875,000 packets per second.
Since the packets have no payload, the dummy traffic varies from 250Mbps to 300Mbps. Because the firewalls are installed on powerful machines, they are not under any type of resource constraint in terms of CPU and memory when under the background traffic load. This indicates that the experimental results for the firewalls under the background traffic load are not affected by hardware resource constraints.

We use two methods for measuring PPT: (1) Local measurements are based on packet traces collected from the incoming and outgoing interfaces of the firewall. (2) Remote measurements are based on the packet traces collected from the PMTA's incoming and outgoing interfaces. The local measurements of PPT are more accurate than the remote measurements, but they require (1) local access to the firewalls and (2) firewall interfaces that support packet analyzers which dump packets passing through them. In contrast, the remote measurement of PPT includes the packet transmission time, reducing its accuracy. Because FW3 does not support any packet analyzers, we use local measurement for FW1 and FW2, and remote measurement for all three firewalls so that FW1, FW2, and FW3 can be compared.

4.2 Packet Classification Algorithm

Identifying the exact packet classification algorithm that a firewall uses is very difficult if we treat the firewall as a black box. However, we can design experiments to test (1) whether a firewall adopts a sequential search based algorithm for packet classification, (2) whether the performance of a firewall is sensitive to its traffic load, and (3) how a firewall performs in terms of the PPT.

4.2.1 Using Sequential Search

To test if a firewall uses sequential search for packet classification, we generate a sequence of probe packets where each packet matches exactly one of the rules in the firewall policy. We then measure the PPT for the probe packets. If the PPT increases linearly as we progress further down the rule list, it is likely that the firewall uses a sequential-search-based approach for packet classification. If the PPT exhibits a different change pattern or a lack of change (i.e., remains flat), the packet classification algorithm used by the firewall is not sequential-search-based and could be any of the other algorithms described in Section 2.3.

We repeat this test 10 times and compute the median value of the PPT. The median value is preferred over the mean because it is less sensitive to outliers, which can be caused by the variability of network congestion and interface packet buffering, especially when the firewalls are under load. Figure 4.1 shows the median value of the PPT. Figures 4.1(a) and (b) show the median PPT measurements with and without background traffic load for FW1. Using the remote measurement method, we observe that the median PPT increases as the rule index increases when there is no background traffic load. A similar increasing trend is also observed in the median PPT under background traffic load when the local measurement method is used. The slopes of the regression lines for the PPT of FW1 using remote measurement in Figure 4.1(a) for the with-load and no-load curves are 0.1176 and 0.1645, respectively. Similarly, the corresponding slopes for the curves in Figure 4.1(b) are 0.1411 and -0.0317, respectively. This observation implies that FW1 is likely to use a sequential-search-based packet classification algorithm.
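The check just applied to FW1 amounts to fitting a regression line to median PPT against matched-rule index. A minimal sketch of that computation follows, assuming the per-rule medians have already been measured; the 0.05 cutoff is an illustrative threshold, not a value from our experiments.

```python
import numpy as np

def looks_sequential(median_ppt_us, min_slope=0.05):
    """median_ppt_us[i]: median PPT (microseconds) of probes matching rule i.
    A clearly positive slope suggests sequential search; a flat line suggests
    a different classification scheme (e.g., TCAM-based parallel lookup)."""
    idx = np.arange(len(median_ppt_us))
    slope, _intercept = np.polyfit(idx, median_ppt_us, 1)
    return slope > min_slope, slope

# Synthetic example shaped like FW1 under load (~0.14 us per rule index):
ppt = 100 + 0.14 * np.arange(375) + np.random.normal(0, 2, 375)
print(looks_sequential(ppt))
```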
The very small negative slope of the median PPT using local measurement under no background traffic load may indicate that FW1 uses some type of rule pre-fetching or caching, yet as the slope is very small the effect is not significant. The results for FW2 (shown in Figures 4.1(c) and (d)) suggest a similar sequential-search-based classification algorithm, especially when the firewall is under load.

Figure 4.1. The PPT for probe packets that match against a rule in the firewall policy ((a) remote and (b) local measurement on FW1; (c) remote and (d) local measurement on FW2; (e) remote measurement on FW3)

The slopes of the regression lines for the PPT curves with and without background traffic load in Figure 4.1(c) are 0.1339 and 0.0208, respectively. Similarly, the corresponding slopes for the PPT curves in Figure 4.1(d) are 0.3809 and -0.0073, respectively. By comparing the graphs for remote measurement (i.e., Figures 4.1(a) and (c)) with their corresponding graphs for local measurement (i.e., Figures 4.1(b) and (d)), we can also observe that FW1 and FW2 have considerably different transmission delays, especially when FW1 is under load. Since the experiment environment is the same for FW1 and FW2, such a difference seems to be due to different queuing implementations in FW1 and FW2; however, because we do not have access to either firewall's source code, it is difficult to confirm.

We have different observations on the median PPT for FW3. The slopes of the regression lines for the PPT curves with and without background traffic load in Figure 4.1(e) are 0.0033 and 0.0082, respectively. The fairly flat regression lines imply that FW3 likely uses some technique other than a sequential search based algorithm for packet classification. As FW3 is a hardware firewall, we believe it uses TCAM-based packet classification, which performs a parallel exhaustive search.

4.2.2 Sensitivity to Traffic Load

Using the same experimental settings, we also evaluated the sensitivity of firewall performance to traffic load. We observe that, among all firewalls, FW1 is the most sensitive and FW3 the least sensitive to traffic load. Considering all remote measurements shown in Figures 4.1(a), (c), and (e), the median PPT when the firewalls are under background traffic load is 4.6034, 2.7385, and 0.9874 times the median PPT when the firewalls are under no background traffic load for FW1, FW2, and FW3, respectively. We observe that the PPT curves for FW1 and FW2 have sharper slopes when the firewalls are under traffic load. This implies that the packet classification mechanism, including the packet classification algorithm and any caching scheme, depends on the current traffic and load on the firewall.

We also find that the traffic load on the firewall has an impact on the variance and dispersion of the PPT of probe packets, which directly relate to the stability of the firewall and firewall packet reordering. Figure 4.2 shows the standard deviation (STD) of the PPT for probe packets.
We observe in Figure 4.2(a) that on average the STD of the PPT for FW2 is 52.4749 times larger than that for FW1 in local measurements. To show the relation between the STD of the PPT of probe packets in local and remote measurement, let $S_i^L$ and $S_i^R$ denote the vectors of PPTs obtained in local and remote measurements for the $i$-th firewall, respectively. Let $T$ denote the transmission delay from the PMTA to the firewall and from the firewall to the PMTA. Therefore, $S_i^R = S_i^L + T$. The STD of the PPT for local measurement can be calculated from the STD of the PPT for remote measurement as follows:

$$STD(S_i^R) = STD(S_i^L + T) = \sqrt{STD(S_i^L)^2 + STD(T)^2 + 2\,COV(S_i^L, T)}$$

As the PPT and the transmission time are independent, $COV(S_i^L, T) = 0$. Also, the STD of the transmission time can be represented by a constant $c$. Hence,

$$STD(S_i^L) = \sqrt{STD(S_i^R)^2 - c^2}$$

Note that the transmission time and its standard deviation can differ based on the load on the firewalls and the way the firewalls handle queuing and packet forwarding. With that in mind, Figure 4.2(b) shows the STD of the firewall PPT calculated from the remote measurements. We observe that, on average, the STD of the PPT on FW2 is 1.7910 and 33.3 times larger than those on FW1 and FW3, respectively. In conclusion, the hardware firewall (FW3) shows less sensitivity to the traffic load and seems to be more stable in terms of performance under different network traffic loads.

4.2.3 Average PPT

In general, Figure 4.1 shows that FW3 yields the lowest PPT regardless of the background traffic load on the firewall. The average PPTs without background traffic load on FW1, FW2, and FW3 are 151.7891, 77.5470, and 60.3360 microseconds in the remote measurements (shown in Figures 4.1(a), (c), and (e)), respectively. Similarly, the corresponding figures with traffic load on FW1, FW2, and FW3 are 672.8522, 98.7970, and 57.8777 microseconds, respectively. However, in the local measurements (shown in Figures 4.1(b) and (d)), FW1 and FW2 have similar average PPTs (50.3710 and 49.7796 microseconds, respectively) when there is no background traffic load on the firewalls. On the other hand, when there is background traffic load on the firewall, the average PPTs for FW1 and FW2 are 126.7352 and 92.8078 microseconds, respectively. This result again indicates the high sensitivity of FW1 to traffic load. Overall, FW3 outperforms FW1 and FW2 with the lowest average PPT, the smallest STD of PPT, and the least sensitivity to traffic load.

4.3 Caching and Statefulness

As explained, modern firewalls often use caching mechanisms for rule and flow caching to reduce the performance overhead of packet classification. To identify whether a firewall uses caching and how effective the caching is, we define cache effectiveness (C) as the ratio of the PPT for the first probe packet in a sequence of probe packets, whose header is not in the cache table, to the median PPT of the rest of the probe packets in the same sequence, whose headers are supposedly in the cache. If C > 1, the firewall uses caching effectively. If C ≃ 1, the firewall either does not use caching, or the caching that the firewall uses is not effective. In our experiments, we measure a firewall's C value as follows. For each experiment we send 10 probe packets, measure the PPT for each probe packet, and calculate the C value accordingly. The C value reported in this paper is the median of the C values over 10 repeated experiments.
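Computing C from a measured probe sequence is mechanical; a minimal sketch under the definition above:

```python
from statistics import median

def cache_effectiveness(ppt_us):
    """ppt_us: PPTs of one probe sequence in send order (e.g., 10 packets).
    C = PPT of the first (uncached) probe over the median PPT of the rest,
    whose headers should already sit in the cache table."""
    return ppt_us[0] / median(ppt_us[1:])

def reported_C(repeated_sequences):
    # As in our experiments: report the median C over repeated runs.
    return median(cache_effectiveness(seq) for seq in repeated_sequences)
```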
To determine the effectiveness of a firewall's caching in the stateless and stateful modes, we conduct experiments using the four probe packet modes: TCP Fix, TCP Vary, UDP Fix, and UDP Vary. If the firewall has effective caching for TCP Fix and UDP Fix probe packets, it means that the firewall caches all 5 packet header fields in its cache table, i.e., it performs flow caching. However, if a firewall has effective caching for TCP Vary and UDP Vary probe packets, it means the firewall caches only 4 packet header fields (excluding the source port) in its cache table, i.e., it performs rule caching.

Table 4.1 presents the C values calculated based on local and remote measurements for experiments on the three firewalls FW1, FW2, and FW3. In the local measurements, the results show that FW1 performs very effective flow caching on UDP packets, as the cache effectiveness values for UDP Fix are significantly more than 1 (7.4931 < C < 16.75). However, FW1's flow caching on TCP packets is not as effective, since the cache effectiveness values for TCP Fix vary around 2 (1.9038 < C < 2.3411). In addition, the results imply that FW1 does not support rule caching, because the cache effectiveness values for UDP Vary and TCP Vary are around 1. However, there is one exception: the cache effectiveness for TCP Vary probe packets when FW1 is under background traffic load and configured in stateless mode is 3.2020. This could be an indication of some caching mechanism that is enabled when FW1 is under load.

For FW2, the cache effectiveness values for the UDP Fix and TCP Fix experiments range from 5.4214 to 9.8167, while the corresponding figures for the UDP Vary and TCP Vary experiments range from 2.8588 to 4.6833. Because the cache effectiveness values for FW2 are much larger than 1 for all experiments, it seems that FW2 uses rule caching. In addition, because the cache effectiveness values for the UDP Fix and TCP Fix experiments are larger than those for the UDP Vary and TCP Vary experiments, it seems that FW2 uses separate flow caching and rule caching mechanisms. Comparing the cache effectiveness results for FW1 and FW2, the flow caching mechanism on FW1 is more effective on UDP packets, whereas the flow caching mechanism on FW2 is more effective on TCP flows.

In the remote measurements, the transmission delay is much larger than the actual PPT. Therefore, the cache effectiveness values calculated using the remote measurements are not as expressive as those from the local measurements. However, there is one exception: FW3 has a cache effectiveness larger than 2 for UDP Fix probe packets when FW3 is in the stateful mode. This result indicates that FW3 performs flow caching in this particular case. In addition, we find a unique feature of FW3. When FW3 is in stateful mode, once the PMTA sends the first TCP SYN packet, FW3 does not accept any other TCP packet with the same packet header for a while, until it receives a TCP ACK packet from the packet destination. Thus, calculating C for TCP Fix for FW3 in stateful mode is not applicable.

Another observation we can make from the results is that in most cases, the cache effectiveness values when a firewall has no background traffic load are slightly higher than those when it has background traffic load. One possible explanation is that when a firewall is under load, the cache table has a large number of entries. This results in a longer search time (and PPT) for the rest of the probe packets.
This makes the cache less effective compared to the no-load experiments, where the firewall's cache table has a small number of entries.

4.4 Impact of Packet Protocol and Payload Size

Firewalls usually employ queuing management techniques to improve their PPT. Such techniques can be made aware of the protocol and payload size of packets. In order to evaluate the impact of packet protocol and payload size, we configure all three firewalls in the stateless mode and repeat the same set of experiments while varying the packet payload size. Figure 4.3 shows the median PPT results for packet payload sizes of 0, 500, 1000, and 1400 bytes. We have three main observations from the results.

First, Figures 4.3(a), (b), (d) and (e) show that the software firewalls FW1 and FW2 have different PPTs in the TCP Fix, TCP Vary, UDP Fix, and UDP Vary experiments. We observe that the PPTs are smaller for TCP Fix and UDP Fix probe packets than for TCP Vary and UDP Vary probe packets. This can be a result of effective flow caching on FW1 and FW2. More specifically, this observation on FW2 seems to indicate that flow caching is more effective than rule caching on FW2. Note that the above observations hold regardless of the packet payload size. However, Figures 4.3(c) and (f) show that FW3 has the same PPT for all of the TCP Fix, TCP Vary, UDP Fix, and UDP Vary packets. This indicates that FW3 has the same rule caching mechanism for all TCP and UDP packets (when it is in stateless mode).

Second, the results in Figures 4.3(a), (b), (d) and (e) indicate that the packet payload size does not impact the PPT on FW1 and FW2. However, Figures 4.3(c) and (f) show that the PPT increases linearly, with a regression slope of 0.1945 ± 0.0014, as the packet payload size increases.

Finally, we observe from Figure 4.3 that the impact of packet protocol and payload size on PPT is independent of whether the firewall has background traffic load.

Figure 4.2. The STD of the PPT for probe packets that match against a rule in the firewall policy ((a) local measurement; (b) remote measurement)

Table 4.1. Cache effectiveness based on local and remote measurements
Figure 4.3. The remote PPT for probe packets with different packet payload sizes ((a)-(c) FW1, FW2, and FW3 with load; (d)-(f) FW1, FW2, and FW3 with no load)

CHAPTER 5 Firewall Inference

The first step toward defeating an opponent is to know them; the same principle applies when it comes to attacking a firewall. If attackers can successfully infer the type (e.g., vendor/version) and the characteristics (e.g., statefulness) of the target firewalls, they can potentially mount a much more effective attack. In this chapter, we examine the feasibility and effectiveness of firewall implementation inference using probe packets. Our approach is motivated by the wide range of so-called operating system (OS) fingerprinting techniques [7, 14, 25]. The idea is that different operating systems respond differently to non-conforming interactions in protocols such as TCP or HTTP. By tracking the error-handling responses, one may uniquely identify the OS of the target host. In our case, we study the decisions of firewalls for sequences of TCP probe packets with varying TCP header flags; the decision of the firewall is limited to a binary sequence of whether the corresponding packet is accepted or discarded. To control for accept/discard decisions that stem from the configured policy, we force all probe packets in the same sequence to share the same source IP, destination IP, and destination port. This ensures that these probe packets hit the same firewall rule in typical firewall settings. However, in some uncommon cases, certain types of firewalls support policies that are based on TCP flags in addition to the other common TCP header information. This makes our binary decision sequence unreliable. Hence, we further supplement the binary decision sequence with the PPT of the probe packets and use both to infer the target firewall implementation. Note that in the remainder of the paper, all PPTs are measured remotely.

To extract the firewall behavior fingerprints and construct classifiers, we first establish a controlled environment, which includes various candidate firewalls of interest, devices outside the firewall that can be used to launch probe packets, and devices behind the firewall that can be used to receive the probe packets and measure the delays.
The testbed network shown in Figure 3.1 is an example of such a setup. The signatures and classifiers identified herein can then be applied in the "battlefield". We next describe in detail the methodology we applied for firewall implementation inference and present the results for the three firewalls in our testbed. While our testbed is limited to the three firewalls available to us, we believe that our methodology is more generally applicable for fingerprinting other firewalls on the market.

5.1 Firewall inference using TCP Probe Packets

As there are eight different TCP flags defined in a TCP header, one can construct 2^8 = 256 different flag combinations in each probe packet. This can be further compounded by the permutations of different probe packet sequences. In our limited testbed case, we find that it is sufficient to use two consecutive probe packets to distinguish the behavior of the different firewalls (and their different modes, stateful and stateless). We show the results when we enable the TCP SYN flag along with one other TCP flag in each of the two packets. Table 5.1 and Table 5.2 present the results for the stateful and the stateless firewalls, respectively. For ease of presentation, we condense the information in the tables such that the columns represent the TCP flag enabled in the first probe packet (besides the TCP SYN flag) and the rows represent the TCP flag enabled in the second probe packet (besides the SYN flag). In each cell, there are three indicators representing the decisions of firewalls FW1, FW2, and FW3, respectively. Indicator "*" means that both probe packets are accepted by the firewall and successfully received at the receiving device behind the firewall; indicator "-" means that one or both probe packets are discarded, or, more strictly speaking, missed by the receiver.

Table 5.1. Stateful Firewall

Table 5.2. Stateless Firewalls

The results in Tables 5.1 and 5.2 demonstrate that tracking the firewalls' feedback on well-crafted TCP probe packets can be effective in obtaining valuable information to distinguish different firewalls. Unlike FW1 and FW3, which filter out some probe packets, FW2 passed both probe packets in all 98 cases across its stateful and stateless settings. While our example uses certain combinations of TCP flags, other combinations can prove useful for other firewall types and settings. We next demonstrate that one can use supplemental information from the PPTs to distinguish these cases.

5.2 Firewall inference using Packet Processing Time

Although firewall fingerprinting using a sequence of TCP packets is a deterministic method for inferring the firewall implementation, firewall rules on TCP flags can change the default decision of the firewall and cause misidentification of the firewall implementation. Thus, we propose an alternative method that uses implementation characteristics, including the PPT and the firewall cache effectiveness, to infer the firewall implementation and its statefulness.
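Both inference methods rest on the same primitive: send a short, crafted probe sequence and record, per packet, whether it arrives and how long it takes. A hedged Scapy sketch of that primitive follows; the thesis experiments instead used hping2 for sending and tcpdump time-stamps for timing, and the address, port, and flag pair here are illustrative.

```python
import time
from scapy.all import IP, TCP, sr1

DST, DPORT = "203.0.113.10", 80   # placeholder: a host/port the firewall accepts

def probe_pair(flags1="SU", flags2="SF", timeout=2):
    """Send two SYN+<flag> probes (here SYN+URG then SYN+FIN) sharing the
    same source and destination so both hit the same rule; return, per probe,
    (got_through, elapsed_seconds). The got_through bits build a
    Table 5.1/5.2-style signature; the timings feed the PPT features."""
    results = []
    for flags in (flags1, flags2):
        t0 = time.time()
        ans = sr1(IP(dst=DST) / TCP(sport=40000, dport=DPORT, flags=flags),
                  timeout=timeout, verbose=False)
        results.append((ans is not None, time.time() - t0))
    return results
```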
In this method, we build a classification model for each firewall and each statefulness mode based on the median PPT, the STD of the PPTs, and the cache effectiveness. We then use this classification model to infer the firewall implementation. To build a classification model and analyze its accuracy, we first create a dataset containing 3,600 data points. For each data point, we send 11 consecutive probe packets in four different modes (TCP Fix, TCP Vary, UDP Fix, and UDP Vary), with and without payload (a total of 8 probe sequences). Each data point is represented by $x = (x_1, \cdots, x_{24})$ with 24 features, where, for the $i$-th sequence of 11 probe packets, $x_{3i-2}$ is the median of the PPTs, $x_{3i-1}$ is the STD of the PPTs, and $x_{3i}$ is the cache effectiveness. The data points are collected when the firewalls are under three different load levels: no load, medium load, and full load. We also use three sets of labels: the labels for the firewall type (Y1 = {'FW1', 'FW2', 'FW3'}), the labels for the firewall statefulness (Y2 = {'stateful', 'stateless'}), and the labels for firewall type and statefulness (Y3 = {'FW1-SF', 'FW2-SF', 'FW3-SF', 'FW1-SL', 'FW2-SL', 'FW3-SL'}).

For the classification, we use multi-class Support Vector Machines (SVMs) with a Radial Basis Function (RBF) kernel with parameters (γ = 0.01, C = 500). Note that the values for the RBF kernel parameters were chosen by model selection algorithms that we used to maximize the classification accuracy. We conduct classification separately for each set of labels, under two conditions: (1) the attacker somehow knows the firewall load, and (2) the attacker does not know the firewall load when he tries to infer the firewall implementation and its statefulness.

The classification problem under the first condition is a classic one, where the data point is x with an additional feature, the firewall load level, denoted by z. However, the classification problem under the second condition is not as straightforward. When we train the classifier we know the firewall load level, but when we use it for testing we do not know the load level. To solve this problem, we first formalize it as follows:

$$P(Y|x) = \sum_z P(Y, z|x) \quad (5.1)$$

$$= \sum_z P(Y|z, x)\,P(z|x) \quad (5.2)$$

Formula 5.2 indicates that we need two probabilistic classifiers: the first one speculates the firewall load level (z) given a test data point (x), and the second one predicts the firewall label given the test data point and the speculated load level. Using the probabilistic classifiers, we first calculate the probability of each label for a given data point and then choose the label with the highest probability as the final label of the data point. For the probabilistic classifier, we use multi-class libsvm with probability estimates [26].

Before classification, we use feature selection to maximize the classification accuracy rates. By using feature selection, we not only alleviate the curse of dimensionality, but also find the most important and distinctive features in our feature set. We use the Sequential Forward Searching (SFS) algorithm [23] for feature selection for each set of labels. The results indicate that: (1) To infer the firewall implementation, we only need 6 features, comprising the median of the PPTs for probe packets in the TCP Fix and TCP Vary modes with payload, and the cache effectiveness for all probe modes with payload. (2) To infer the statefulness, we need 16 features, comprising all features from probe packets with payload as well as the cache effectiveness features from probe packets with no payload.
(3) To infer the firewall implementation and statefulness together, we need 7 features, comprising the cache effectiveness for all probe test modes with payload, the median of the PPTs for probe packets in the TCP Fix and UDP Vary modes with payload, and the cache effectiveness for probe packets in the UDP Vary mode with no payload. Note that when using different methods of classification, the important features may be different. However, our results show that the most distinctive features are the cache effectiveness values for the probe packets with payload, which is consistent with Figure 4.3.

Table 5.3. Accuracy and Misclassification for Firewall Type Labels

Y1: Firewall Type - Known Load (total accuracy 94.56%)
            Classified as:  FW1      FW2      FW3
  FW1                       100%     0%       0%
  FW2                       12.13%   84.36%   3.51%
  FW3                       0.54%    0.35%    99.11%

Y1: Firewall Type - Unknown Load (total accuracy 94.61%)
            Classified as:  FW1      FW2      FW3
  FW1                       100%     0%       0%
  FW2                       12.41%   84.58%   3.01%
  FW3                       0.63%    0.30%    99%

The accuracy results for the classification under the two conditions for each set of labels are reported in Tables 5.3, 5.4, and 5.5. The results are the means of the accuracy and misclassification rates over 10 cross-validation runs on the dataset. The results in Table 5.3 indicate that we can predict the firewall implementation with 94.56% and 94.61% accuracy for known load and unknown load, respectively. The results also show that while the accuracy of predicting FW1 is 100%, the accuracy of predicting FW2 is 84.36% (84.58% for unknown load) because 12.13% of the time (12.41% for unknown load) it is misclassified as FW1. The closeness of the accuracy rates under the known load and unknown load assumptions shows that the firewall load level plays an insignificant role in classification. Note that with other classification methods or other sets of firewalls, the firewall load level could be an important factor in classification; thus, it should not be overlooked.

The results in Table 5.4 show that we can predict the statefulness of the firewall with 85.79% and 85.72% accuracy for known load and unknown load, respectively. Surprisingly, the accuracy rate is very good given that the statefulness of a firewall has a trivial impact on the PPT.

Table 5.4. Accuracy and Misclassification for Statefulness Labels

Y2: Statefulness - Known Load (total accuracy 85.79%)
            Classified as:  Stateful  Stateless
  Stateful                  89.49%    10.51%
  Stateless                 17.76%    82.24%

Y2: Statefulness - Unknown Load (total accuracy 85.72%)
            Classified as:  Stateful  Stateless
  Stateful                  89.60%    10.40%
  Stateless                 18.01%    81.99%

Table 5.5. Accuracy and Misclassification for Firewall Type and Statefulness Labels (per-label accuracy)

  Label    Known Load   Unknown Load
  Total    74.04%       74.06%
  FW1-SF   69.39%       69.39%
  FW2-SF   59.68%       59.53%
  FW3-SF   97.48%       97.39%
  FW1-SL   79.03%       78.95%
  FW2-SL   41.30%       41.76%
  FW3-SL   98.13%       98.04%
The results in Table 5.5 show that we can predict the type of a firewall and its statefulness together with 74.04% and 74.06% accuracy for known load and unknown load, respectively, which is relatively good given that we have six classes and the random-guess accuracy rate is 16.67%. As shown in Table 5.3, FW1 is classified with high accuracy, while FW2 is classified with relatively low accuracy. Inferring FW3, on the other hand, can be done with very good accuracy. The misclassification rates also suggest that both stateful FW1 and FW2 are misclassified as stateless FW1 and FW2 with a probability as high as 30.61%.

If we use a different set of firewalls we may obtain different accuracy and misclassification rates. However, the results for this set of firewalls indicate that an attacker can effectively use these two methods to identify a network's firewall and design attacks accordingly, either to seriously impact performance or to exploit possible vulnerabilities. Nevertheless, in practice, the accuracy results will be lower because of the impact of the transmission delay induced by other middleboxes along the probe path. Yet the attacker can obtain the number of routers in the probe path (using tools such as traceroute) and build a similar testbed for learning to reduce such impact.

CHAPTER 6 Firewall Attack Design

In order to effectively attack firewalls, we first use the firewall characteristic measurements conducted in Chapter 4 to design effective customized attacks on the firewall. We then examine the effectiveness of the customized attacks by comparing the firewall performance under the customized attacks with the firewall performance under blind attacks. The experimental methodology herein is to create an attack scenario and monitor the firewall performance on legitimate traffic. In our testbed setup, we drive attacks from all machines in VM1 and VM2 and send (legitimate) probe packets from the PMTA machine. We use the PPT observed by the probes as the performance indicator metric. We also use the CPU utilization on the firewall device as a measure of the firewall "stress" level. In our testbed, this CPU utilization information is available from FW1 and FW2, obtained through the Simple Network Management Protocol (SNMP); FW3 does not provide access to this information.

In a blind attack, VM1 and VM2 send random UDP and TCP packets with no payload, which are mostly discarded by the firewall. In contrast, the customized attack packets are chosen to be accepted by the firewall. The attack packets are generated in the TCP Vary and UDP Vary modes. Figure 6.1 and Figure 6.2 show the packet processing time for 500 probe packets sent at one-second intervals when the firewalls are under blind attack, UDP Vary attack, and TCP Vary attack.
Figure 6.1 and Figure 6.2 show the packet processing time for 500 probe packets sent at one-second intervals while the firewalls are under blind attack, UDP Vary attack, and TCP Vary attack. Note that in Figure 6.1 the customized attack packets have no payload, while in Figure 6.2 they carry payload.

[Figure 6.1. The PPT of probe packets when the firewall is under attack with packets with no payload. Panels: (a) FW1 - Stateless, (b) FW2 - Stateless, (c) FW3 - Stateless, (d) FW1 - Stateful, (e) FW2 - Stateful, (f) FW3 - Stateful. Each panel plots the PPT against the probe packet index for the random (blind) attack with discarded packets, the UDP Vary attack with accepted packets, and the TCP Vary attack with accepted packets.]

Figures 6.1(a) and 6.1(d) show the probe PPTs for FW1 in the stateless and stateful modes, respectively. The results indicate that the TCP Vary and UDP Vary attacks are more successful than the blind attack, while the UDP Vary and TCP Vary attacks have almost the same effect on FW1. The average PPTs for probe packets when FW1 is in the stateless mode under the UDP and TCP Vary attacks are 686.10 and 673.86 microseconds, which are 1.68 and 1.65 times larger than under the blind attack. However, the TCP and UDP Vary attacks impose up to 2.08 and 2.06 times more load than the blind attack, meaning that the firewall needs more than twice the processing power to handle the attack. The average PPT results are almost the same when the firewall is in the stateful mode; the firewall load, however, increases by 21.36% in the stateful mode. Hence, if the attacker knows an accepted rule, an attack on that rule will be much more effective. Note that for firewalls like FW1 that use sequential search for packet processing, an attacker who finds a rule with a higher rule index in the firewall policy can increase the effectiveness of the attack, although this depends highly on the size of the firewall policy and the firewall caching algorithm [21].

Figures 6.1(b) and 6.1(e) show the probe PPTs for FW2 in the stateless and stateful modes, respectively. First, the results indicate that the average PPT of probe packets under the blind attack is 1.82 times more than under the UDP Vary attack. Second, the average PPT of probe packets under the TCP Vary attack is 22.4 and 25.77 times more than under the blind attack for the stateless and stateful modes, respectively. Moreover, under the TCP Vary attack, 36.20% and 34.40% of the probe packets are discarded before reaching the destination (i.e., packet loss) in the stateless and stateful modes, respectively. However, looking at the average load of the firewall, there is only a 0.14 increase for the TCP Vary attack compared to the blind attack.
This suggests that the caching table is flooded by new flows, which reduces the performance of the system tremendously even though the system has enough resources to forward the attack packets with a small PPT.

[Figure 6.2. The PPT of probe packets when the firewall is under attack with packets with payload. Panels: (a) FW1 - Stateless, (b) FW2 - Stateless, (c) FW3 - Stateless, (d) FW1 - Stateful, (e) FW2 - Stateful, (f) FW3 - Stateful. Each panel plots the PPT against the probe packet index for the random (blind) attack with no payload, the UDP Vary attack with payload, and the TCP Vary attack with payload.]

Thus, amongst all of the attacks, the TCP Vary attack is quite effective on FW2, while FW2 is quite resilient against the UDP Vary attack.

Figures 6.1(c) and 6.1(f) show the probe PPTs for FW3 in the stateless and stateful modes, respectively. The results indicate that none of the attacks has a considerable impact on the firewall. More specifically, when the firewall is in the stateless mode, the average PPT is 66, 67, and 67 microseconds for the blind, UDP Vary, and TCP Vary attacks, respectively. When the firewall is in the stateful mode, the average PPT increases roughly sixfold to 401, 397, and 397 microseconds. Unlike for the other two firewalls, the generated attacks on FW3 do not have much impact on the firewall performance. It is also notable that although FW3 shows the least fluctuation around the mean value, there are some spikes for the TCP and UDP Vary attacks whose values are 9 times larger than the mean when the firewall is in the stateless mode. Such spikes can also be seen in Figure 6.1(f) for the stateful results, along with spikes for the blind attack. The standard deviation of the PPT for blind attacks when FW3 is in the stateful mode is 11.5 times more than when FW3 is in the stateless mode.

Figure 6.2 shows the impact of attack traffic on probe packets when the attack traffic carries payload. In this figure, the UDP and TCP Vary with-payload PPT results for each probe packet are calculated as the mean of the PPTs for the probe packet with three different payload sizes: 500B, 1000B, and 1400B. Figures 6.2(a) and 6.2(d) show that the payload size of the attack traffic has almost no effect on FW1. However, Figures 6.2(b) and 6.2(e) show that payload on the attack traffic increases the STD of the PPT of probe packets sent to FW2 by 2.38 times for the UDP Vary test when the firewall is in the stateful mode, and decreases the STD of the PPT by 1.61 times for the TCP Vary tests in all the firewall modes. Similarly, Figures 6.2(c) and 6.2(f) show that payload on the attack traffic does not have any impact on the average PPT of probe packets sent to FW3. However, when the firewall is in the stateless mode, the STD of the PPT of probe packets decreases by 1.88 and 1.72 times for the TCP and UDP Vary attacks, respectively.
Likewise, when FW3 is in the stateful mode, the STD of the PPT of probe packets decreases by 2.17 and 2.67 times for the TCP and UDP Vary attacks, respectively. Therefore, to attack these firewalls effectively, packets with no payload are preferred. Conversely, this implies that an administrator who is working to detect and stop such attacks should focus primarily on attack packets that have no payload, as those packets can affect the stability of the firewall.

CHAPTER 7

Proposed Defense Mechanisms

Our proposed fingerprinting algorithm is largely based on identifying the distribution of the PPT of a particular firewall. To defend against such fingerprinting, a firewall's PPT must be altered to hide the particular implementation. In this section, we present two methods for obscuring the PPT of a given firewall.

7.1 Adding delay to packets

The most straightforward way to alter the PPT is to add a delay to packets, but doing so has significant consequences. If we were to add a constant delay to all packets traversing a firewall, the distribution would be shifted but not significantly altered. Our goal in this method is to make every firewall in our testbed look similar with respect to PPT. We do this by taking the firewall with the highest mean PPT and using it as a baseline for the rest of the firewalls. Since delay can only be additive (a negative delay is impossible to implement), we can only increase the PPT of a firewall. To find an appropriate distribution for the delay, we first find the mean and variance of the PPT for each firewall system. After finding these defining characteristics for each one, we take the distribution with the highest mean and find additive distributions for the other firewalls that make their new distributions appear as close as possible to that baseline firewall.

An additional drawback to adding random delay is that it may cause packet reordering within a flow. We propose two measures to counteract this. The first is to keep the variance of the delay distribution relatively small. This reduces the chance that Packet 1 arrives first and is assigned a very high delay while Packet 2 is assigned a very low delay and is thus forwarded before Packet 1. Second, we suggest per-flow delays: if Packet 2 is scheduled to be forwarded before Packet 1, Packet 2 is held until Packet 1 is forwarded and then sent immediately. Since the variance of the delay distribution is low, this situation will be rare, and when it does occur the extra delay imposed on Packet 2 will be minimal. We implement this delay using the Traffic Control (tc) subsystem built into the Linux kernel, which can add delay in multiple ways.

The most significant drawback of adding a random amount of delay is the unnecessary latency and congestion it can introduce. To combat this, a more complex system is presented in the next section.

7.2 Run Parallel Firewalls

An alternative to changing the distribution of the PPT by delay is to run two different firewalls in parallel. Some professional firewalls provide parallel features, but in most instances both firewalls are of the same type, therefore running the same algorithms and yielding the same PPT distribution. Our proposal combines two different firewall implementations into a single logical unit as seen from outside the network. A load balancer sits in front of both firewalls, randomly choosing which firewall a particular packet will be sent to; a sketch of the dispatch logic, including the flow pinning described below, follows.
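The following conceptual sketch illustrates the dispatch logic, including the flow-pinning refinement for stateful operation introduced next. The firewall identifiers and the five-tuple interface are hypothetical simplifications of what iptables or a hardware load balancer would actually provide.

    # Conceptual sketch of the randomizing load balancer in front of two
    # different firewall implementations; "fw-a"/"fw-b" and the 5-tuple
    # dispatch interface are hypothetical simplifications.
    import random

    FIREWALLS = ("fw-a", "fw-b")   # two different implementations
    flow_table = {}                # 5-tuple -> firewall the flow is pinned to

    def dispatch(five_tuple, stateful=True):
        """Return the firewall that should process this packet."""
        if not stateful:
            # Stateless firewalls: a purely random per-packet choice is safe.
            return random.choice(FIREWALLS)
        # Stateful firewalls: pin the flow's first packet to a random
        # firewall and keep all later packets on the same one, so a single
        # state table sees the whole TCP handshake.
        if five_tuple not in flow_table:
            flow_table[five_tuple] = random.choice(FIREWALLS)
        return flow_table[five_tuple]

    # All packets of one flow traverse the same firewall:
    flow = ("192.0.2.1", 12345, "10.0.0.100", 80, "tcp")
    assert dispatch(flow) == dispatch(flow)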
This method presents several advantages over adding delay. First, the network gains many of the benefits of parallel firewalls, such as improved response time and higher network throughput. Second, with separate firewalls running in parallel, vulnerabilities become harder to exploit: an attacker cannot control which firewall a packet is sent through, creating more uncertainty in any attack.

Our proposed parallel-firewall load balancer is simple to implement for stateless firewall configurations, since it does not matter which packet goes through which firewall. For stateful firewalls, however, a truly random assignment could cause complications. For instance, if different portions of the TCP handshake go through different firewalls, neither firewall will see the connection as fully established, and both will block all further communication. To circumvent this, we propose a flow-aware load balancer: the first packet of a flow is pinned to a particular firewall, and all further packets of that flow are routed through the same firewall. This allows each firewall to maintain a valid state table while still adding variability to the process. Our solution uses features of iptables to perform the load balancing, but in a real network a more complete implementation may be required, for example one that allows the load balancer to make decisions based on the load level of each firewall.

Both methods described require testing to determine their performance implications as well as their effectiveness against our proposed attacks. Implementing parallel firewalls provides many benefits beyond the defense against DoF attacks and has already been adopted in industry. Firewall administrators may additionally wish to investigate adding randomized delays on the firewalls themselves in the future.

CHAPTER 8

Related Work

In [21], Salah et al. propose a method of attack on firewalls that perform sequential search. The basic idea is to send packets that match the last rule in the firewall policy. However, it is extremely difficult, if not impossible, for an attacker to find packets that match the last rule without knowing the policy and implementation of the firewall. Furthermore, our results show that attack traffic consisting of accepted packets is actually more effective than attack traffic consisting of packets discarded by the last rule.

Work has also been done on firewall performance evaluation [8, 13, 15, 19]. Lyu and Lau measured the performance of a firewall under seven different policies, where each policy corresponds to one security level [19]. In a similar vein, Funke et al. evaluated firewall performance (mostly firewall throughput) under policies with differing numbers of rules [13]. They also show that more rules do not necessarily imply poorer firewall performance. There are also industrial reports comparing commercial firewalls in terms of performance under different circumstances. Bosen [8] compared Secure Computing Sidewinder [5] with Checkpoint's NGX [10] and reported better throughput for Sidewinder when a high level of protection, including packet and protocol inspection, is required. The Tolly Group, an independent test lab that performs extensive tests on IT devices from different vendors, compared the Checkpoint Firewall (VPN-1 Pro), PIX Firewall 535 [11], and NetScreen-500 [4]. The report indicated that the Checkpoint firewall outperforms the other two in most of the tests run [15].
CHAPTER 9

Conclusion

In this paper, we present methods for finding the firewall characteristics that are introduced by firewall implementations. Such characteristics can be exploited by attackers to identify black-box firewalls with high accuracy and launch effective attacks on them. We show two methods for inferring a firewall implementation using these characteristics. The first is based on the firewall's decisions on a sequence of TCP packets with unusual flags, which can be used as a firewall fingerprint for identification. The second is based on machine learning techniques. We further study the impact of different attacks on different firewalls and show that different firewalls are vulnerable to different attacks. Finally, we propose defense techniques to prevent attackers from inferring the implementation of a firewall. Some defense mechanisms can be applied to all firewalls, and some have to be customized to the firewall implementation. These mechanisms are designed to increase the chance of incorrect firewall implementation inference by concealing firewall TCP flag fingerprints and obscuring the pattern in probe PPT. To evaluate the effectiveness of these defense mechanisms and measure their impact on firewall performance, extensive experiments must be conducted with an expanded testbed.

BIBLIOGRAPHY

[1] Hping. http://www.hping.org.
[2] Mz traffic generator. http://www.perihel.at/sec/mz.
[3] Tcpdump public repository. http://www.tcpdump.org.
[4] NetScreen-500. http://www.juniper.net/us/en/products-services/end-of-sale/ns500/, 2008.
[5] Sidewinder G2 startup guide. http://www.securecomputing.com/techpubs_download.cfm?id=1642, 2009.
[6] Florin Baboescu and George Varghese. Scalable packet classification. In Proc. ACM SIGCOMM, 2001.
[7] Robert Beverly. A robust classifier for passive TCP/IP fingerprinting. In Proc. PAM, 2004.
[8] Bill Bosen. Firewall performance evaluation: Secure Computing Sidewinder vs. Check Point NGX. http://www.comguard.cz/fileadmin/user_upload/aktuality/SIDE-vsCheckpointPerformanceEval.pdf, 2007.
[9] Fyodor. Nmap: Free network security scanner. http://nmap.org.
[10] CheckPoint FireWall-1. http://www.checkpoint.com/.
[11] Cisco PIX Firewalls. http://www.cisco.com/. Date of access: March 25, 2005.
[12] Untangle Forum. http://forums.untangle.com/installation/19181-setting-up-untangle-stop-ddos-attack.html, 2010.
[13] Reiner Funke, Andreas Grote, and Hans-Ulrich Heiss. Performance evaluation of firewalls in gigabit-networks. In Proc. Symposium on Performance Evaluation of Computer and Telecommunication Systems, 1999.
[14] Lloyd G. Greenwald and Tavaris J. Greenwald. Toward undetected operating system fingerprinting. In Proc. USENIX Workshop on Offensive Technologies, 2007.
[15] The Tolly Group. Competitive evaluation of enterprise-class internet security solutions. http://www.tolly.com/TS/2002/CheckPoint/VPN-1Pro/TollyTS202132CheckPointVPN-1ProAug02-b.pdf, 2002.
[16] Pankaj Gupta and Nick McKeown. Packet classification on multiple fields. In Proc. ACM SIGCOMM, 1999.
[17] Pankaj Gupta and Nick McKeown. Packet classification using hierarchical intelligent cuttings. In Proc. Hot Interconnects VII, Aug. 1999.
[18] Zeus Kerravala. As the value of enterprise networks escalates, so does the need for configuration management. Enterprise Computing & Networking, The Yankee Group Report, January 2004.
[19] Michael R. Lyu and Lorrien K. Y. Lau. Firewall security: Policies, testing and performance evaluation. In Proc. Int.
Computer Software and Applications Conference (COMPSAC), 2000.
[20] Luke Macpherson. Performing Under Overload. PhD thesis, School of Computer Science and Engineering, University of New South Wales, 2007.
[21] K. Salah, K. Sattar, M. Sqalli, and Ehab Al-Shaer. A potential low-rate DoS attack against network firewalls. Security and Communication Networks, John Wiley & Sons, 2009.
[22] Sumeet Singh, Florin Baboescu, George Varghese, and Jia Wang. Packet classification using multidimensional cutting. In Proc. ACM SIGCOMM, pages 213–224, 2003.
[23] P. Somol, P. Pudil, J. Novovicova, and P. Paclik. Adaptive floating search methods in feature selection. Pattern Recognition Letters, 20:11–13, 1999.
[24] V. Srinivasan, Subhash Suri, and George Varghese. Packet classification using tuple space search. In Proc. ACM SIGCOMM, pages 135–146, 1999.
[25] David Watson, Matthew Smart, G. Robert Malan, and Farnam Jahanian. Protocol scrubbing: Network security through transparent flow modification. IEEE/ACM Transactions on Networking, 12:261–273, 2004.
[26] Ting-Fan Wu, Chih-Jen Lin, and Ruby C. Weng. Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5:975–1005, 2004.