TCAM REDUCTION TECHNIQUES FOR ALL-MATCH CLASSIFIERS

By Nicholas Jon Wender

A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE
Computer Science
2012

Abstract

TCAM REDUCTION TECHNIQUES FOR ALL-MATCH CLASSIFIERS

By Nicholas Jon Wender

Network intrusion detection systems require all-match packet classification, where all rules matching a packet are reported by the system. The problem of efficiently reporting all matching rules is known as the all-match optimization problem. One solution is to convert an all-match classifier into a first-match classifier (in which only the first classifier rule that matches a packet is reported), and use ternary content addressable memory (TCAM) for packet classification.

In this thesis, we evaluate two classifier minimization approaches. First, we consider the use of all-match classifier-specific optimization algorithms. Second, we use state-of-the-art first-match classifier optimization algorithms in conjunction with all-match algorithms. Our results indicate the appropriate approach is related to the number of TCAM chips available for classification. When using one TCAM chip, we attain 70.85% TCAM space savings using first-match classifier optimization algorithms instead of all-match classifier optimization algorithms. When using multiple TCAM chips, we found that it is best to use all-match specific optimization algorithms.

Contents

List of Tables
List of Figures
Key to Symbols and Abbreviations
1 Introduction
1.1 Ternary Content Addressable Memory
1.2 Snort Rules
1.3 Problem Overview
1.4 Contributions
2 Related Work
2.1 All-Match Optimization
2.2 First-Match Optimization
3 Definitions and Problem Statement
3.1 Definitions
3.2 Problem Statement
4 Algorithms
4.1 All-Match to First-Match Conversion
4.2 All-Match Optimization
4.2.1 Negation Removal
4.2.2 SSA
4.3 First-Match Classifier Optimization
4.3.1 TCAM Razor
4.3.2 Topological Transformation
4.3.3 TCAM SPliT
4.4 Verification
5 Experimental Results
5.1 Methods
5.2 Baseline Results
5.2.1 Single TCAM Approach
5.2.2 Multiple TCAM Approach
5.3 Efficiency
6 Conclusion
Bibliography

List of Tables

Table 4.1: Two-dimensional classifier where each field has domain [0, 7].
Table 5.1: Number of input and output rules through the all-match to first-match conversion.
Table 5.2: Number of TCAM entries required to store the converted rules.
Table 5.3: Summary of results.

List of Figures

Figure 1.1: Using a TCAM to search for a given key in parallel. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis.
Figure 1.2: Naïve negation removal of the port !43 by flipping bits.
Figure 1.3: Example Snort rules with filters, options, and actions.
Figure 4.1: Splitting discontinuous rule ranges.
Figure 4.2: Simple application of negation removal losing classifier semantics.
Figure 4.3: Example of negation removal generating three separators for two negated fields.
Figure 4.4: Example of SSA splitting a set of rules into two sets.
Figure 4.5: FDD corresponding to the classifier shown in table 4.1 with decisions "a" for accept and "d" for discard.
Figure 4.6: Two-dimensional classifier with decisions and its associated FDD.
Figure 4.7: One-dimensional solutions for nodes v2 and v3.
Figure 4.8: One-dimensional solution for node v1 with virtual v2 and v3 nodes.
Figure 4.9: Result of TCAM Razor on the classifier from figure 4.6.
Figure 4.10: Two-dimensional classifier expressed with ranges.
Figure 4.11: Landmarks identified by Topological Transformation for field F1.
Figure 4.12: Transformer for field F1.
Figure 4.13: Transformed classifier.
Figure 4.14: FDD for node and field table generation.
Figure 4.15: Node tables for F1 and F2, and field table for F2.
Figure 4.16: Field table and shadow encoded field table for F2.
Figure 5.1: Classifier sizes obtained using TCAM Razor vs negation removal alone for the Snort 2.9.0.3 rule set.
Figure 5.2: Classifier sizes obtained using TCAM Razor vs negation removal alone for the Snort 2.9.0.5 and 2.9.1.0 rule sets.
Figure 5.3: Time efficiency of TCAM Razor with and without negation removal.
KEY TO SYMBOLS AND ABBREVIATIONS

Symbol    Description
b         Number of bits
d         Number of dimensions
C         Classifier
ri        Rule i of a classifier
d(r)      Decision of rule r
Fi        Field i
r[i]      Field i of rule r
D(Fi)     Domain of field i
p         Packet
p[i]      Field i of packet p
p ⊆ r     Packet p matches rule r
C(p)      Classification of p using C
AM        All-match classifier
FM        First-match classifier
m(ri)     Matchlist of ri
R         Region
D(R)      Domain of region R
P         Set of packets
s         Separator rule
Ni        Set of negated fields for Fi
n         Number of literals/rules
m         Number of clauses/intersections
L         Set of literal pairs
xi        Positive literal for rule i
C         Clause
R         Subset of FM containing non-intersection rules
I         Subset of FM containing intersection rules
I         Intersection rule
CIi       Clause i for intersection I
fi        FDD f for field i
DS        Decision set of a classifier for an FDD
v         FDD node
e         FDD edge
F(v)      Label of FDD node v
I(e)      Set of labels for FDD edge e : u → v
Li        Landmark range for edge i
Ti        Transformer for field i
Si        Range of rule from field i

Chapter 1: Introduction

Network intrusion detection/prevention systems (NIDS/NIPS) detect and prevent malicious attacks on networks. Most NIDS use a set of rules, called a classifier, that defines how to handle network traffic. Given a packet, the system searches the classifier for rules that match the packet. These rules specify how the matched packet should be processed.

The Snort intrusion detection system is a NIDS with a publicly available set of rules [1]. Each Snort rule consists of two parts: a rule header and a set of rule options (a signature). The rule header consists of a filter and an action. We use the filter to determine whether a given packet matches the rule header. For a given packet and rule, if the packet header matches the rule header's filter and the packet payload matches the rule options, then we apply the rule's action to the packet. Our focus in this thesis is on the first step in the above classification procedure: we are only concerned with matching packet headers to rule headers; we do not worry about matching the packet payload to the rule options.

By design, the Snort rules use all-match (also called multi-match) semantics, meaning that a packet can match multiple rule headers. We define the all-match or multi-match optimization problem as the problem of finding all rules in a given classifier that match a given packet. One solution to the all-match optimization problem is to convert a given all-match classifier to a first-match classifier; first-match classifiers have been extensively studied. A first-match classifier is a classifier in which we only return the first (or highest priority) rule matching a given packet. The conversion process takes our all-match classifier and creates a semantically equivalent first-match classifier such that we can find all matching rules for any packet.

Given a first-match classifier, our challenge is the first-match optimization problem, in which we want to minimize the size of the classifier. There are many solutions to the first-match optimization problem, including the very popular solution of using ternary content addressable memory (TCAM) to store the classifier. TCAM chips are widely used because they provide a constant time lookup for all packets. However, TCAM suffers from several drawbacks: high cost, high power consumption, and limited size.
In the end, we are left with two approaches when converting an all-match classifier into a first-match classifier: (1) we can use all-match optimization techniques like Negation Removal and the Set Splitting Algorithm (SSA) to reduce the size of the resulting first-match classifier, or (2) we can simply convert the all-match classifier into a first-match classifier and then apply first-match optimization algorithms to reduce the size of the classifier. We begin with a short discussion of ternary content addressable memory and follow with an overview of the Snort network intrusion detection system.

1.1 Ternary Content Addressable Memory

Hardware based packet classification using ternary content addressable memory (TCAM) has become the industry standard for high performance firewalls and Internet routers [6]. Each TCAM entry holds one rule, and each cell of a TCAM entry can take one of three states: 0, 1, or * — a special "don't care" state that allows a bit to be either a 0 or a 1. Given a packet as a search key, the underlying TCAM circuitry searches all entries in parallel and returns the index of the first-matching (or highest priority) entry. For example, the TCAM shown in figure 1.1 searches all entries in parallel for the key 10100. We can see from the highlighted rows that the search key matches two entries; however, only the index of the first matching entry will be returned.

[Figure 1.1: Using a TCAM to search for a given key in parallel; the key 10100 is compared against every TCAM entry at once and matches two entries. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis.]

While providing a deterministic lookup time for a given packet [18], TCAM suffers from several well known disadvantages. TCAM chips are much more expensive to manufacture, have smaller capacities, and consume significantly more power than a comparable amount of SRAM [18]. The complex circuitry that allows TCAM to perform a parallel search is not only costly, but also uses additional board space and consumes more power. Similarly, while TCAM chips of up to 72 Mbit have been produced, smaller TCAM chips are more widely used due to the high cost.

Another issue faced by TCAM is the extensively studied range expansion problem. While prefixes (discussed in detail in section 4.3) can encode the IP address and protocol fields of classifiers, the ranges given for the port fields usually cannot map directly into one TCAM entry. Thus a TCAM needs multiple entries to encode a single range: a range over a b-bit field may need to be split into as many as 2(b − 1) separate entries [18]. Therefore, each 16-bit port field can require up to 30 TCAM entries in the worst case, and if a rule has a range for both port fields, it can require up to 30 × 30 = 900 TCAM entries. Many strategies have been developed to deal with the range expansion problem [2, 3, 8, 12, 17, 14, 16].
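To make range expansion concrete, the following sketch (ours, not from the thesis; we use Python for illustration throughout) greedily carves an integer range into maximally aligned power-of-two blocks, each of which maps to exactly one ternary prefix entry.

```python
def range_to_prefixes(lo: int, hi: int, bits: int = 16) -> list[str]:
    """Split [lo, hi] into ternary prefixes, greedily taking the largest
    power-of-two block that is aligned at `lo` and still fits in the range."""
    prefixes = []
    while lo <= hi:
        size = 1 << bits
        while (lo % size != 0) or (size > hi - lo + 1):
            size //= 2
        stars = size.bit_length() - 1  # number of trailing don't-care bits
        head = "" if stars == bits else format(lo >> stars, f"0{bits - stars}b")
        prefixes.append(head + "*" * stars)
        lo += size
    return prefixes

# The worst case quoted above: a 16-bit range needing 2(16 - 1) = 30 entries.
assert len(range_to_prefixes(1, 65534)) == 30
```

Running the sketch on [1, 65534] reproduces the 30-entry worst case for a single 16-bit port field.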
A third challenge for TCAM chips is the presence of negated rules. A negated rule is a rule that has a field that is negated, for example TCP 74.12.200.100 43 → 10.1.0.1 !43. Consider the negated destination port !43. In binary, 43 = 0000 0000 0010 1011, and the naïve approach (see figure 1.2) for putting this negated field in TCAM would create sixteen entries, flipping one bit in each and replacing the rest with "don't care" values.

[Figure 1.2: Naïve negation removal of the port !43 by flipping bits; sixteen 16-bit TCAM entries, the ith of which flips the ith bit of 43's binary representation and sets every other bit to *.]

Thankfully there are more intelligent solutions to the problem of negated fields, which we discuss at length in section 4.2.1.
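The naïve expansion just described is mechanical enough to state in a few lines. In the sketch below (our illustration), any value other than the negated one differs from it in at least one bit position and therefore matches the entry for that position, while the negated value itself matches none of them.

```python
def naive_negation_entries(value: int, bits: int = 16) -> list[str]:
    """TCAM entries matching every value EXCEPT `value`: one entry per bit
    position, with that bit flipped and all other bits set to *."""
    pattern = format(value, f"0{bits}b")
    entries = []
    for i, bit in enumerate(pattern):
        flipped = "0" if bit == "1" else "1"
        entries.append("*" * i + flipped + "*" * (bits - i - 1))
    return entries

# naive_negation_entries(43) yields the sixteen entries of figure 1.2,
# beginning with 1*************** and *1**************.
```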
1.2 Snort Rules

Snort is an open source rule-based network intrusion detection and prevention system that can perform real-time packet analysis including content searching and protocol analysis. While Snort can perform simpler operations like packet logging and traffic sniffing, we only concern ourselves with Snort's network intrusion detection mode [1].

As briefly mentioned above, the Snort system is a rule-based system that relies on signature detection for packet classification. To Snort, both the packet header and the packet payload (the data contained in the packet) are important elements. Snort inspects the packet payload of any packet whose header matches a Snort rule header. For this, Snort searches the packet's data according to the options specified in the corresponding rule. There are two aspects of this system that are worth noting. First, an arbitrary packet can match many distinct Snort rule headers. Second, each Snort rule header may correspond to several signatures.

Rule 1
Filter: TCP $EXTERNAL_NET any → $SQL_SERVERS 3306
Options: content:"|0A 00 00 01 85 04 00 00 80|root|00|";
Action: alert msg:"MYSQL root login attempt"

Rule 2
Filter: TCP $EXTERNAL_NET any → $SQL_SERVERS 3306
Options: content:"|0F 00 00 00 03|show databases";
Action: alert msg:"MYSQL show databases attempt"

Rule 3
Filter: TCP $EXTERNAL_NET 21 → $HOME_NET any
Options: isdataat:1023; pcre:"/\d{3}\s+[^\n]{1019}/smi";
Action: alert msg:"FTP Microsoft Internet Explorer FTP Response Parsing Memory Corruption"

Figure 1.3: Example Snort rules with filters, options, and actions.

For example, consider the three Snort rules shown in figure 1.3. Any packet over the TCP protocol originating from any port outside the network and destined for an SQL server on port 3306 (any packet matching TCP $EXTERNAL_NET any → $SQL_SERVERS 3306) will match the header of the first two rules; thus, such packet payloads will be searched for the signatures in both rules. Any incoming packet (any packet going from $EXTERNAL_NET to $HOME_NET) on port 21 will match the header of the third rule; thus, such packet payloads will be searched for the third rule's signature.

Furthermore, given that $SQL_SERVERS is defined as a subset of $HOME_NET, suppose that an incoming packet matched the following fields: TCP $EXTERNAL_NET 21 → $SQL_SERVERS 3306. This packet would then match all three headers; thus, its payload would have to be checked for all three signatures. Admittedly, this is a contrived example because it is not likely that a MySQL server will be accepting FTP connections on port 3306. However, the indicated rules illustrate the major point well: a packet may match many headers and the packet's payload needs to be inspected accordingly.

1.3 Problem Overview

From the preceding discussions of TCAM and the Snort system, we summarize the main challenge we address in this thesis. We wish to construct the smallest possible TCAM classifier that returns all rules that match a given packet header. We state this problem formally in chapter 3.

1.4 Contributions

To solve the all-match optimization problem using TCAM chips, we implement previous solutions (all-match to first-match conversion with negation removal [22] and SSA [23]) in conjunction with additional first-match optimizations: TCAM Razor [11], Topological Transformation [14], and TCAM SPliT [15]. The objective of this thesis is to evaluate two approaches to the all-match optimization problem: (1) evaluate the performance of optimization algorithms designed specifically for all-match classifiers, and (2) convert an all-match classifier to a first-match classifier and evaluate general purpose first-match classifier optimization algorithms. We evaluate the performance and efficiency of various combinations of these algorithms by testing them on three different Snort rule sets.

When using only one TCAM, our results show that TCAM Razor outperforms Negation Removal with TCAM space savings of 70.85%. When using multiple TCAM chips, it is best to use SSA to reduce the all-match classifier. Stated another way, with one TCAM we should use general purpose first-match classifier optimization algorithms, and with multiple TCAM chips, we should use all-match specific optimization algorithms.

The remainder of this thesis proceeds as follows. In chapter 2 we discuss related work for the all-match optimization problem and first-match classifier optimization. In chapter 3 we define our notation and present a formal statement of the all-match optimization problem. Chapter 4 outlines the technical details of our experiments. Chapter 5 presents our simulation results, and chapter 6 offers conclusions.

Chapter 2: Related Work

In this chapter we review the relevant prior work in both all-match and first-match optimization. First we review the prior art of all-match optimization.

2.1 All-Match Optimization

While there is a surprisingly small volume of work in the area of all-match packet classification, the related work can roughly be split into two categories: (1) FPGA-based solutions and (2) TCAM-based solutions. FPGA-based solutions use field programmable gate arrays (FPGAs) to implement all-match classification methods. A review of the work on all-match classification using FPGAs yields a fair body of research. However, we prefer TCAM-based solutions because many network processors already use TCAM chips for first-match classification. As discussed earlier, TCAM is the industry standard for high performance devices (such as firewalls and routers) because TCAM chips provide line rate (or near line rate) classification that software solutions cannot match [20].

The current state of the art in TCAM-based all-match packet classification is the work by Yu et al. presented in [22] and [23]. In addition, TCAM-based solutions have been proposed in [9] and [21]. The solution presented in [9] is intended to scale to larger databases as it requires no additional TCAM entries; however, this approach requires significantly more TCAM lookups to find all matching results. The work of [21] encodes classifier rules and performs both multiple one-dimensional TCAM lookups and hash table accesses.
The geometric approach in [22] transforms an all-match classifier into a first-match classifier which can be stored in TCAM such that, with one additional SRAM lookup, we can obtain all matching rules for a given packet. However, this solution can drastically increase the number of rules that need to be stored. Yu et al. extend the work of [22] with [23], in which the resulting first-match classifier is split into two or four sets to reduce the number of additional rules that need to be stored. This approach requires only a handful of extra TCAM entries at the cost of requiring two or four lookups to classify each packet. We build upon these two all-match optimization techniques in the rest of this thesis.

2.2 First-Match Optimization

First-match packet classification has been extensively studied in the literature (see [6, 20] for a general overview). In the broadest sense, packet classification can be split into two categories, algorithmic and architectural. The industry standard TCAM-based architectural approach resulted from the inability of algorithmic solutions to provide fast enough performance. However, due to the limitations of TCAM, combinations of algorithmic and architectural solutions are becoming more common [20]. Nevertheless, this thesis is not intended to be a comprehensive survey of packet classification algorithms. For this thesis, we apply several state-of-the-art TCAM-based first-match classifier optimization algorithms to reduce the size of a first-match classifier generated from an all-match classifier. We focus on the following three approaches: TCAM Razor [11], Topological Transformation [14], and TCAM SPliT [15]. We describe these methods in more detail in section 4.3.

Chapter 3: Definitions and Problem Statement

As we mention in the introduction, firewalls and Internet routers classify packets based on sets of rules called classifiers. In this chapter we formally define intervals, fields, rules, packets, and classifiers (both all-match and first-match classifiers). Then we formally state the problem we study in this thesis. These definitions provide a formal basis to discuss the all-match optimization problem throughout the rest of this thesis.

3.1 Definitions

We define a classifier as a set of rules C. Each rule r ∈ C is in turn defined as a set of fields associated with an action or decision. The decision for r defines the result imposed upon packets matching r by C. Let ri be the ith rule of classifier C; then d(ri) is the decision (action) of rule ri. In general a rule has d fields, denoted Fi for 1 ≤ i ≤ d, and each rule is a d-tuple r = (r1, ..., rd) over fields F1, ..., Fd. We say that a b-bit field Fi is defined over the domain D(Fi) = [0, 2^b − 1]. For convenience, we denote the ith field of rule r with the notation r[i]. Typically r[i] = [j, k] is an interval¹ such that [j, k] ⊆ D(Fi). Finally, we say that a classifier containing rules with d fields is a d-dimensional classifier.

¹ When working with the Snort rules, we relax our notion of an interval such that the field of a rule can be a set of intervals, each of which lies in the domain of the field.

In practice, five commonly used fields are an 8-bit protocol, 32-bit source and destination IPv4 addresses, and 16-bit source and destination ports. We note that as IPv6 becomes more common, the 32-bit IPv4 address fields will become 128-bit fields. This will also increase the difficulty of packet classification as it will greatly increase the width of each rule.
For classification, we say that a packet matches a rule if each field of the packet matches the corresponding field of the rule. More formally, a packet p is a d-tuple (p1, ..., pd) over fields F1, ..., Fd. We overload the notation we use with rules for packets and say that field Fi of packet p is p[i] for 1 ≤ i ≤ d. The formal difference between a rule r and a packet p is that p[i] = v for 1 ≤ i ≤ d, where v is some discrete value in D(Fi). Packet p is said to match rule r (denoted p ⊆ r) if p[i] = v ∈ [j, k] = r[i] for 1 ≤ i ≤ d. If p ⊆ r, we set the decision of the packet, d(p) = d(r). Let C(p) be the procedure through which C classifies p, and let C(p) = {r1, ..., rk} be the set of k rules that p matches, sorted in descending order by priority. Formally, we have p ⊆ rj ∀ rj ∈ C(p).

Now we define the types of classifiers we consider in this thesis. First let us note that for any classifier C we may have some packet p such that p ⊆ ri and p ⊆ rj for rules ri, rj ∈ C where ri ≠ rj. That is, in any classifier there may exist a packet that matches more than one rule of the classifier. Let ri and rj be two distinct rules matched by p. Without loss of generality, let us say that ri has a higher priority than rj — which in practice implies that ri appears before rj in C. We say that C is a first-match classifier if for any packet p, C(p) is exactly the set {r1}, the highest priority rule matching p; then |C(p)| = 1 for all packets p. On the other hand, we say that C is an all-match classifier if C(p) is the set {r | p ⊆ r}. Typically, |C(p)| > 1 for most packets.

3.2 Problem Statement

Commonly referred to as the multi-match or all-match optimization problem, our goal is to transform an all-match classifier into the smallest equivalent first-match classifier. More specifically, given a classifier C, we must generate a first-match classifier C′ such that for any packet p, we obtain all matching rules of p in C by computing C′(p). Since C is an all-match classifier, C(p) = {r1, ..., rk}. Likewise, because C′ is a first-match classifier, C′(p) = {r1}. Therefore our task is to encode d(r1) so that we can compute {r1, ..., rk} given d(r1).
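The two matching semantics can be pinned down in a short sketch (ours; the representation of a rule as per-field closed intervals follows the definitions above).

```python
# A rule is a list of per-field closed intervals plus a decision label;
# a packet is a list of per-field values.
Rule = tuple[list[tuple[int, int]], str]

def matches(packet: list[int], rule: Rule) -> bool:
    """p ⊆ r: every packet field value lies in the rule's interval."""
    intervals, _ = rule
    return all(lo <= v <= hi for v, (lo, hi) in zip(packet, intervals))

def all_match(classifier: list[Rule], packet: list[int]) -> list[int]:
    """All-match semantics: indices of every rule the packet matches."""
    return [i for i, r in enumerate(classifier) if matches(packet, r)]

def first_match(classifier: list[Rule], packet: list[int]) -> int | None:
    """First-match semantics: index of the highest-priority matching rule."""
    for i, r in enumerate(classifier):
        if matches(packet, r):
            return i
    return None
```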
Chapter 4: Algorithms

In this chapter, we provide an overview of all the algorithms we evaluate. We begin with the all-match to first-match conversion algorithm from [22]. This algorithm is the basis of our work, as it allows us to create a first-match classifier FM from an all-match classifier AM. Following this discussion, we present two all-match specific optimizations: Negation Removal and the Set Splitting Algorithm (SSA). Negation removal allows us to reduce the number of rules required to represent negated fields in TCAM, while SSA enables us to reduce the number of intersection rules we need to store in TCAM. After the sections on all-match algorithms, we present the first-match algorithms we evaluate. We begin by presenting the concepts of prefixes and firewall decision diagrams. Then we review three first-match optimization algorithms: TCAM Razor, Topological Transformation, and TCAM SPliT.

4.1 All-Match to First-Match Conversion

The initial step in solving the all-match optimization problem is to convert the given all-match classifier, AM, into an equivalent first-match classifier, FM. We follow the algorithm presented by Yu et al. [22] to transform AM into FM.

Before we discuss the relationships defined for rules, we explain the concept of a rule's matchlist and discuss how we use this concept with a first-match classifier. With each rule ri ∈ FM we associate a matchlist, denoted m(ri), that enumerates the rules rj ∈ AM that packets matching ri in FM will match in AM. More precisely, the matchlist m(ri) for rule ri ∈ FM defines a subset of rules from AM such that for any packet p: if p ⊆ ri, then p ⊆ rj ∀ rj ∈ m(ri).

The all-match to first-match algorithm works as follows: (1) create FM from AM and store FM in TCAM; (2) compute a matchlist for every ri ∈ FM and store all matchlists in SRAM; (3) for each ri ∈ FM, assign d(ri) = k where k is the SRAM index of m(ri). Then to compute AM(p) for any packet, we perform a TCAM lookup to get FM(p) = d(p), and then perform an SRAM lookup of d(p) to obtain the matchlist for p.

Now we detail the possible relationships rules can have. These relationships are the foundation that we will use to compute FM from AM. Between any pair of rules ri and rj in AM, there exists a relationship (defined over all packets) that satisfies one of the following:

1. ri ∩ rj = ∅. We say that ri and rj are exclusive. If two rules are exclusive, then no packet will match both rules and we can place the rules in any order relative to each other in FM.

2. ri ⊆ rj. If ri is a subset of rj, then any packet that matches ri will also match rj. Thus, we should place ri before rj in FM and add j to m(ri).

3. ri ∩ rj ≠ ∅ (with ri ⊄ rj and rj ⊄ ri). If ri and rj have a non-empty intersection, then we need to create rule rk = ri ∩ rj, place rk before ri and rj in FM, and add m(ri) and m(rj) to m(rk).

Let us explicitly state how we use the preceding relationships to convert AM to FM; a code sketch of this insertion loop follows below. Initially FM is an empty classifier and we sequentially insert each rule r ∈ AM into FM leveraging these relationships. Let ri be the current rule from AM we are inserting. First we initialize m(ri) = (i). Then for each existing rule re of FM, we take one of the following actions based on the relationship between ri and re:

1. If ri and re are exclusive, we take no action and proceed to the next rule in FM.

2. If ri is a subset of re, we insert ri before re, append m(re) to m(ri), and proceed to the next ri in AM.¹

3. If re is a subset of ri, we append m(ri) to m(re) and proceed to the next re in FM.

4. If ri and re have a non-empty intersection, we create a new rule rk = ri ∩ re, insert rk before re, append m(re) to m(rk), append m(ri) to m(rk), and proceed to the next re in FM.

¹ Yu et al. [22] present a proof of why we can skip the rest of the existing rules.

The conversion process from AM to FM will produce many additional rules from step 4. We refer to these rules as intersection rules.
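The following is a simplified sketch (ours) of the insertion loop just described, with rules represented as lists of per-field intervals. It follows the four cases literally; the correctness subtleties that Yu et al. [22] prove, such as why case 2 may stop scanning early, are taken on faith, and matchlist deduplication is omitted.

```python
def intersect(r1, r2):
    """Field-wise interval intersection; None when any field is empty."""
    out = []
    for (a1, b1), (a2, b2) in zip(r1, r2):
        lo, hi = max(a1, a2), min(b1, b2)
        if lo > hi:
            return None
        out.append((lo, hi))
    return out

def subset(r1, r2):
    """True when r1 is contained in r2, field by field."""
    return all(a2 <= a1 and b1 <= b2 for (a1, b1), (a2, b2) in zip(r1, r2))

def convert(am):
    """Build FM as an ordered list of [rule, matchlist] pairs from AM."""
    fm = []
    for i, ri in enumerate(am):
        mi = [i]
        pos, placed = 0, False
        while pos < len(fm):
            re, me = fm[pos]
            common = intersect(ri, re)
            if common is None:                  # case 1: exclusive
                pos += 1
            elif subset(ri, re):                # case 2: insert before re, stop
                fm.insert(pos, [ri, mi + me])
                placed = True
                break
            elif subset(re, ri):                # case 3: re also matches ri's packets
                me.extend(mi)
                pos += 1
            else:                               # case 4: new intersection rule
                fm.insert(pos, [common, me + mi])
                pos += 2                        # skip past the new rule and re
        if not placed:
            fm.append([ri, mi])
    return fm
```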
After transforming AM to FM, we must clean up FM to prepare it for further reduction. First, the rules in FM may contain fields with discontinuous ranges, but the first-match algorithms we use require that each field of a rule is defined over a continuous range. Thus, we split each rule that has discontinuous field ranges into multiple rules. Figure 4.1 illustrates the rule splitting process.

TCP $HOME_NET !50 → $EXTERNAL_NET [0:100]
↓
TCP $HOME_NET [0:49,51:65535] → $EXTERNAL_NET [0:100]
↓
TCP $HOME_NET [0:49] → $EXTERNAL_NET [0:100]
TCP $HOME_NET [51:65535] → $EXTERNAL_NET [0:100]

Figure 4.1: Splitting discontinuous rule ranges.

Second, the rules in FM have a matchlist as the decision, but the first-match classifier optimization algorithms we apply require that each rule has a single integer action. Thus, we map each distinct matchlist to a distinct integer. This mapping must be maintained so that after optimizing FM, we can perform the SRAM lookup to find the corresponding matchlist of AM rules.

4.2 All-Match Optimization

When we transform AM into FM, we can apply existing all-match optimization schemes designed specifically to minimize the resulting first-match classifier. As we discussed previously, representing negated fields in TCAM leads to a large growth in the number of TCAM entries. To mitigate this problem, we use a negation removal algorithm during the conversion process [22]. In addition, the first-match conversion process generates many additional intersection rules. We can limit the number of intersection rules by applying the SSA algorithm [23].

4.2.1 Negation Removal

There is no direct way of mapping a negated field into a single TCAM entry [22]. However, there is a scheme that we can use to replace negated fields with the any keyword. Yet as figure 4.2 illustrates, thoughtless application of negation removal would destroy the semantics of the classifier.

No.  Rule
1    TCP $HOME_NET !50 → $HOME_NET [0:100]
2    TCP $HOME_NET 50  → $HOME_NET 75
3    TCP $HOME_NET any → $HOME_NET 150
↓
1    TCP $HOME_NET any → $HOME_NET [0:100]
2    TCP $HOME_NET 50  → $HOME_NET 75
3    TCP $HOME_NET any → $HOME_NET 150

Figure 4.2: Simple application of negation removal losing classifier semantics.

Prior to negation removal, no packet could match both rule 1 and rule 2 because of the source port field. Yet after replacing rule 1's source port of !50 with any, all packets matching rule 2 will match rule 1. This is a violation we cannot allow.

To remove negation and maintain classifier semantics, [22] presents a scheme for grouping the rules of a classifier into regions. The regions logically partition the classifier. Each region R is a set of rules with a domain D(R) defined as a set of packets P such that ∀ p ∈ P, ∃ r ∈ R such that p ⊆ r. We want to reorder the rules so that all rules of each region are together and the regions are in decreasing order by number of negated fields. To group rules into regions, the algorithm generates a special rule s (called a separator) for each region such that ∀ p ∈ D(R), p ⊆ s. By definition, each separator rule defines a region. The goal is to place the separators at the end of their corresponding regions to stop packets that belong to the region from incorrectly matching later rules. Every separator has an empty matchlist, m(s) = ∅.

To generate separator rules, we must enumerate all negated fields in the classifier. Toward this end, we create d sets of negated fields, one for each dimension of our classifier: for F1, ..., Fd we have a set of negated fields Ni for 1 ≤ i ≤ d. For each rule r ∈ AM, if r[i] is negated we add the non-negated form of r[i] to Ni. For example, if F3 has the negated value !45, we add 45 to N3. After processing all rules, if a set Ni = ∅, we append the any keyword to the set. After we have processed all fields of all rules, we generate the separator set S by taking the d-ary Cartesian product of the d negated field sets. Our simple classifier presented above in figure 4.2 would generate just one separator because there is one distinct negated field in the classifier (!50).
See figure 4.3 for a slightly more complex example where we generate three separator rules for two negated fields. We have a negated source port of !80 and a negated destination port of !50, so we store 80 and 50 in the corresponding Ni negated field sets. Then we create the separators shown at the bottom of figure 4.3.

Rules:
TCP 70.10.32.41 !80 → 10.1.0.1 [10:20]
TCP 60.11.42.15 65  → 10.1.0.2 !50
↓
Field             Value
Source port       80
Destination port  50
↓
Separators:
any any 80  → any 50
any any 80  → any any
any any any → any 50

Figure 4.3: Example of negation removal generating three separators for two negated fields.

After we have generated the separator rules as described, we set FM = S. That is, we initialize our first-match classifier to be the set of separators. Finally, we can apply the conversion of AM to FM as described in the preceding section. Our separators ensure that each rule in AM is inserted into the appropriate region. After we have transformed AM to FM, we replace all negated fields in FM with the any keyword [22].
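Note that a plain d-ary product of the Ni sets alone would yield a single separator for figure 4.3; the reading that reproduces the three separators shown is that every Ni is padded with any and the all-any combination is then dropped. The sketch below (ours) implements that reading, with rules represented as tuples of field strings — an assumption for illustration, not the thesis's data structure.

```python
from itertools import product

def build_separators(am_rules, num_fields):
    """Collect the non-negated form of every negated field value, pad each
    field set with 'any', and take the Cartesian product, dropping the
    all-'any' tuple (this reproduces the three separators of figure 4.3)."""
    negated = [{"any"} for _ in range(num_fields)]
    for rule in am_rules:
        for i, field in enumerate(rule):
            if field.startswith("!"):          # e.g. "!80" on a port field
                negated[i].add(field[1:])      # store the non-negated form
    all_any = tuple(["any"] * num_fields)
    return [s for s in product(*negated) if s != all_any]

# Figure 4.3's two rules (protocol, src IP, src port, dst IP, dst port):
rules = [("TCP", "70.10.32.41", "!80", "10.1.0.1", "[10:20]"),
         ("TCP", "60.11.42.15", "65", "10.1.0.2", "!50")]
# Yields 2 * 2 - 1 = 3 separators, matching the figure.
assert len(build_separators(rules, 5)) == 3
```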
4.2.2 SSA

As noted in [22, 9, 23], the geometric all-match to first-match conversion used above generates many intersection rules, which greatly increases the size of the resulting classifier. The Set Splitting Algorithm (SSA) of [23] reduces the number of intersection rules that we must store for the classifier. We now provide a brief overview of SSA.

SSA seeks to minimize the number of additional intersection rules that need to be stored by splitting the classifier into multiple logically distinct sets. The key observation is that if we store the rules that compose an intersection in different sets, we no longer need the intersection [23]. However, the cost of this solution is that we require two or four TCAM chips with 104-bit wide entries.

Thus, the problem is reduced to finding the best way to split the rules of AM into multiple sets. The ideal solution would split the rules such that any two rules that intersect are placed into different sets; in such a case, no new rules would be created. However, such a solution may not exist for all classifiers. In general, the problem is equivalent to the NP-hard maximum set splitting or maximum hypergraph cut problem [4]. Since the problem is NP-hard, SSA uses Johnson's approximation algorithm [7] for the maximum satisfiability problem to split the classifier. We leave out the underlying details of Johnson's algorithm and only provide an overview of SSA.

Nevertheless, let us restate the maximum satisfiability problem as presented by Yu et al. [23]. A literal xi is a variable (a positive literal) or the negation of a variable (a negative literal) to which we can assign either true or false. We are given a set of n literal pairs: L = {{x1, ¬x1}, {x2, ¬x2}, ..., {xn, ¬xn}}. A clause is the disjunction of some set of literals; for example, we may have a clause C = x1 ∨ ¬x5 ∨ ¬x8. By definition, a clause evaluates to true if any of its literals evaluates to true. For our problem, we are given m clauses. The task at hand is to assign values to the literals in L such that we satisfy as many clauses as possible.

We assume that we have applied the all-match to first-match conversion (with or without negation removal) and have the two classifiers AM and FM. Let I ⊆ FM be the subset of FM containing all generated intersection rules. Furthermore, let R = FM − I be the subset of rules that are not intersection rules. Let |R| = n and |I| = m. SSA proceeds as follows. For each rule ri ∈ R, we create two literals: xi and ¬xi. Next we create two clauses for each intersection I ∈ I. Let I be an intersection of y rules: I = r1 ∩ ... ∩ ry. Then the clauses we generate are CI1 = (x1 ∨ ... ∨ xy) and CI2 = (¬x1 ∨ ... ∨ ¬xy), where xj is the literal for rule rj [23]. After we create the literals and clauses, we run Johnson's algorithm, which assigns a value to each literal. Finally we split R into two sets based on the literal assignments: if xi is true, we assign ri to set one; otherwise we assign ri to set two. Yu et al. [23] show that we only need to store an intersection I ∈ I if either CI1 or CI2 is unsatisfied.

Rule #  Rule                                            Literals
1       TCP $EXTERNAL_NET [10:30] → $HOME_NET [10:30]   x1, ¬x1
2       TCP $EXTERNAL_NET [15:35] → $HOME_NET [15:35]   x2, ¬x2
3       TCP $EXTERNAL_NET any     → $HOME_NET [5:20]    x3, ¬x3
↓
Rules   Intersection                                    Clauses
1,2,3   TCP $EXTERNAL_NET [15:30] → $HOME_NET [15:20]   (x1 ∨ x2 ∨ x3), (¬x1 ∨ ¬x2 ∨ ¬x3)
1,2     TCP $EXTERNAL_NET [15:30] → $HOME_NET [15:30]   (x1 ∨ x2), (¬x1 ∨ ¬x2)
1,3     TCP $EXTERNAL_NET [10:30] → $HOME_NET [10:20]   (x1 ∨ x3), (¬x1 ∨ ¬x3)
2,3     TCP $EXTERNAL_NET [15:35] → $HOME_NET [15:20]   (x2 ∨ x3), (¬x2 ∨ ¬x3)
↓
Literal assignments: x1 = False, x2 = True, x3 = False
↓
SSA Set 1: Rule 2; Intersection of Rule 1 and Rule 3
SSA Set 2: Rule 1; Rule 3

Figure 4.4: Example of SSA splitting a set of rules into two sets.

Figure 4.4 illustrates SSA on a set of three rules that have four intersections. SSA splits the seven rules into two sets with four total rules.
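Given any truth assignment (Johnson's algorithm supplies a good one; here it is simply an input), the split itself and the check for surviving intersections are mechanical: an intersection must still be stored exactly when all of its constituent rules landed in the same set, since that is when one of its two clauses is unsatisfied. A sketch (ours):

```python
def split_rules(rules, intersections, assignment):
    """rules: the non-intersection rules R; assignment[i] is True when xi is
    true (rule i goes to set 1); intersections: (member_indices, rule) pairs.
    Returns the two sets plus the intersection rules that must be kept."""
    set1 = [r for r, v in zip(rules, assignment) if v]
    set2 = [r for r, v in zip(rules, assignment) if not v]
    kept = []
    for members, inter_rule in intersections:
        if len({assignment[i] for i in members}) == 1:
            kept.append(inter_rule)   # all members in one set: clause unsatisfied
    return set1, set2, kept

# Figure 4.4 with 0-based indices: x1 = False, x2 = True, x3 = False keeps
# only the intersection of rules 1 and 3, whose members both land in set 2.
_, _, kept = split_rules(["r1", "r2", "r3"],
                         [((0, 1, 2), "i123"), ((0, 1), "i12"),
                          ((0, 2), "i13"), ((1, 2), "i23")],
                         [False, True, False])
assert kept == ["i13"]
```

In figure 4.4 the surviving intersection is stored in the set opposite its constituent rules (set 1 here), so one lookup per set still reports every match.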
4.3 First-Match Classifier Optimization

After performing all-match to first-match conversion and processing our first-match classifier to be ready for additional minimization, we evaluated the following first-match optimization algorithms: (1) TCAM Razor, (2) Topological Transformation, and (3) TCAM SPliT. Before we present an overview of each algorithm, we discuss two important topics: prefixes and firewall decision diagrams (FDDs).

The first-match algorithms we apply work almost exclusively with prefixes. A b-bit prefix represents an integer interval and has k leading bits (each 0 or 1) followed by (b − k) *'s. That is, a prefix has the form W{*}^(b−k) and represents the interval [W{0}^(b−k), W{1}^(b−k)] where W ∈ {0, 1}^k. For example, the prefix 101** represents the interval [10100, 10111], or in decimal, [20, 23].² When storing rules in TCAM we must convert ranges into prefixes, which leads to the previously discussed range expansion problem. For example, the range [2, 7] converts to the two 3-bit prefixes 01* and 1**.

² When working with an interval from a to b for first-match classifiers we use the common notation [a, b], but when discussing Snort rules we use the Snort notation [a : b].

All of the first-match algorithms we use convert the given classifier into a firewall decision diagram [10], which is a canonical representation of a classifier. A firewall decision diagram (FDD) represents a classifier and the set of decisions (denoted DS) of the classifier. An FDD is a directed, acyclic graph that has both node and edge labels. All FDDs must satisfy the following properties:

1. There exists exactly one root node that has no incoming edges. We refer to all nodes with no outgoing edges as terminals, and all other nodes as non-terminals.

2. F(v) is the label of node v: (a) F(v) ∈ {F1, ..., Fd} if v is a non-terminal node; (b) F(v) ∈ DS if v is a terminal node.

3. I(e) is a non-empty set of labels for edge e : u → v such that I(e) ⊆ D(F(u)). That is, I(e) is a subset of the domain of the field corresponding to node u.

4. A path from the root to a terminal is called a decision path. All nodes on a decision path have distinct labels.

5. Two properties hold for E(v), the set of outgoing edges of node v: (a) I(e1) ∩ I(e2) = ∅ for all distinct e1, e2 ∈ E(v), so a value can match at most one outgoing edge of v; this is called consistency. (b) ⋃_{e ∈ E(v)} I(e) = D(F(v)); the union of the labels of all outgoing edges of node v covers the domain of the field associated with v; this property is called completeness.

Rule #  F1   F2   Decision
1       10*  10*  discard
2       10*  11*  accept
3       10*  0**  accept
4       0**  ***  discard
5       1**  ***  discard

Table 4.1: Two-dimensional classifier where each field has domain [0, 7].

Let us illustrate the concept of an FDD with the classifier shown in table 4.1 and figure 4.5. The two-dimensional classifier is over fields F1 and F2 with D(Fi) = [0, 7] for 1 ≤ i ≤ 2. We can see how the FDD properties apply from the table and the figure.

[Figure 4.5: FDD corresponding to the classifier shown in table 4.1, with decisions "a" for accept and "d" for discard: the root is labeled F1; its edge 10* leads to an F2 node with edges 10* → d, 11* → a, and 0** → a, while its edges 0** and 11* lead to an F2 node with edge *** → d.]

The following discussions present an overview of each of the first-match optimization algorithms we use. After we construct an FDD for the classifier, we use an FDD reduction algorithm to make the FDD smaller, which reduces the number of rules the FDD generates. FDD reduction ensures three properties of the FDD: (1) no two nodes are isomorphic, (2) any pair of nodes has at most one edge between them, and (3) no node has exactly one outgoing edge [5]. For example, we can reduce the FDD in figure 4.5 by removing the non-terminal right child of the root and instead having the outgoing edge go directly to the terminal node.

It is important to realize that the order in which we use the fields to build an FDD changes the FDD generated. This is an important consideration for two of the first-match classifier optimization algorithms that we use: TCAM Razor and TCAM SPliT. To maximize the performance of these two algorithms we must try all permutations (orderings) of the fields when minimizing a classifier.
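A minimal FDD sketch (ours) makes the consistency and completeness properties checkable and shows how a packet follows its unique decision path; fields are indexed integers and edge labels are plain value sets rather than prefixes.

```python
class FDDNode:
    def __init__(self, field=None, decision=None):
        self.field = field        # index of F(v) for a non-terminal
        self.decision = decision  # decision label for a terminal
        self.edges = []           # list of (set_of_values, FDDNode)

def well_formed(node, domain_sizes):
    """Property 5: outgoing edge labels are disjoint (consistency)
    and together cover D(F(v)) (completeness)."""
    if node.decision is not None:
        return True
    seen = set()
    for values, _ in node.edges:
        if seen & values:
            return False
        seen |= values
    return seen == set(range(domain_sizes[node.field]))

def classify(node, packet):
    """Follow the unique decision path for `packet`."""
    while node.decision is None:
        node = next(child for values, child in node.edges
                    if packet[node.field] in values)
    return node.decision

# The reduced FDD of figure 4.5 over domains [0, 7] x [0, 7]:
a, d = FDDNode(decision="a"), FDDNode(decision="d")
v2 = FDDNode(field=1)
v2.edges = [({4, 5}, d), ({6, 7}, a), ({0, 1, 2, 3}, a)]   # 10*, 11*, 0**
root = FDDNode(field=0)
root.edges = [({4, 5}, v2), ({0, 1, 2, 3, 6, 7}, d)]       # 10* vs 0**/11*
assert well_formed(root, [8, 8]) and classify(root, [4, 6]) == "a"
```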
4.3.1 TCAM Razor

In [11], Liu et al. build on the dynamic programming solution presented in [19] to solve the weighted one-dimensional TCAM minimization problem optimally. The optimal one-dimensional solution is then extended to the multi-dimensional case to find a greedy solution for each dimension. Due to its greedy nature, Razor does not guarantee a globally optimal solution, but it is one of the best first-match minimization algorithms available. TCAM Razor is classified as an equivalent transformation algorithm because it produces a semantically equivalent, but smaller, classifier [11].

Given a classifier C with d fields F1, ..., Fd, Razor converts the classifier into a firewall decision diagram (FDD) and, before minimizing the classifier, reduces the FDD. Following FDD creation and reduction, Razor builds a classifier C′ that is semantically equivalent to C in a bottom-up fashion by repeatedly applying the optimal one-dimensional solution to obtain a solution for each field. These solutions are combined to give an overall solution. Finally, Razor applies a redundancy removal algorithm to C′ to remove redundant rules.

F1   F2   Decision
101  11*  d
101  10*  a
101  0**  a
110  11*  d
110  10*  a
110  0**  a
111  ***  d
0**  ***  d

Figure 4.6: Two-dimensional classifier with decisions and its associated FDD (root v1 labeled F1, with children v2 and v3 labeled F2; the FDD drawing is omitted here).

Consider the two-dimensional classifier and the associated FDD shown in figure 4.6. For Razor, we use the one-dimensional minimization algorithm on both subgraphs rooted at nodes v2 and v3 to get the TCAM tables shown in figure 4.7. Next we consider v2 and v3 to be decisions, process v1 as if it were a one-dimensional node, and obtain the resulting TCAM table in figure 4.8. Finally we put the three tables together to obtain the final TCAM table in figure 4.9. The resulting TCAM table has four rules while the original classifier had eight rules.

v2: F2 11* → d; F2 *** → a
v3: F2 *** → d

Figure 4.7: One-dimensional solutions for nodes v2 and v3.

v1: F1 111 → v3; F1 1** → v2; F1 *** → v3

Figure 4.8: One-dimensional solution for node v1 with virtual v2 and v3 nodes.

F1   F2   Decision
111  ***  d
1**  11*  d
1**  ***  a
***  ***  d

Figure 4.9: Result of TCAM Razor on the classifier from figure 4.6.
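First-match semantics over the four-rule output of figure 4.9 can be exercised with a short ternary-lookup sketch (ours):

```python
def ternary_match(bits: str, pattern: str) -> bool:
    """A value matches a TCAM pattern when every non-* position agrees."""
    return all(p in ("*", b) for b, p in zip(bits, pattern))

# The Razor output of figure 4.9, highest priority first.
razor_table = [("111", "***", "d"), ("1**", "11*", "d"),
               ("1**", "***", "a"), ("***", "***", "d")]

def first_match_lookup(f1: int, f2: int) -> str:
    b1, b2 = format(f1, "03b"), format(f2, "03b")
    return next(dec for p1, p2, dec in razor_table
                if ternary_match(b1, p1) and ternary_match(b2, p2))

# (F1, F2) = (101, 100) falls through to (1**, ***) -> a, agreeing with
# the original eight-rule classifier's rule (101, 10*) -> a.
assert first_match_lookup(0b101, 0b100) == "a"
```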
4.3.2 Topological Transformation

Topological Transformation [14] is a range encoding scheme for minimizing first-match classifiers. Unlike other range encoding schemes, Topological Transformation uses rule decisions when minimizing the classifier. The algorithm's key observation is that many values of each field are equivalent; two values are said to be equivalent if they can be interchanged without changing the decision for any packet. A classifier partitions each field into equivalence classes such that all values with the same decision behavior belong to the same class. The goal is to reduce the number and size of TCAM entries by eliminating equivalent values.

Like TCAM Razor, Topological Transformation first converts a classifier to an FDD. Then we apply two reduction techniques, domain compression and prefix alignment. Domain compression attempts to make the domain of each field, D(Fi), as small as possible, while prefix alignment modifies the domain of each field so that we can convert ranges to prefixes with less expansion [14].

Domain compression uses three steps: (1) first we compute the equivalence classes for each field, (2) we build transformers for each field, and (3) we apply the field transformers to create the re-encoded classifier [14]. For a d-dimensional classifier, we compute the equivalence classes for each field Fi by first creating an FDD fi rooted at Fi for 1 ≤ i ≤ d. Let u be the root of the FDD fi and let u have outgoing edges e1, ..., em. Meiners et al. [14] observe that the labels of these outgoing edges form the equivalence classes for Fi. Each label is associated with several ranges, but all ranges belong to the same equivalence class. Thus we choose the range that intersects the fewest rules in C to represent the equivalence class of each edge ej. Once a range, denoted Lj, has been chosen for each edge, we put the edges in ascending order by their range values Lj. After we sort the edges, we have the transformer Ti for field Fi, which assigns edge ej the decision j for 1 ≤ j ≤ m. Next we can use these transformers to create the transformed classifier.

To create the transformed classifier C′ for C we must re-encode each rule using the transformers we computed in the preceding step. Let Si = [x, y] be the original range for field Fi. The transformed range is S′i = [a, b], where a = min{ j ∈ [0, m − 1] : Lj ∩ Si ≠ ∅ } and b = max{ j ∈ [0, m − 1] : Lj ∩ Si ≠ ∅ }. If a range in C does not have an intersection with any landmark, we do not create a transformed rule in C′. That is, for any rule r ∈ C, if there exists a field Fi where r[i] has an empty intersection with all landmark ranges of Fi, then we do not create the transformed rule r′ in C′.

The second optimization that Topological Transformation provides is prefix alignment. Prefix alignment again re-encodes the domain of each field, but the goal now is to change the field domains so that ranges can be encoded with fewer prefixes.

Domain compression is illustrated in figures 4.10 to 4.13. Figure 4.10 presents a two-dimensional classifier on which we want to perform domain compression. Figure 4.11 shows the landmark ranges identified after Topological Transformation creates an FDD rooted at field F1. Figure 4.12 lists the transformer results for the ranges in field F1. Figure 4.13 shows the transformed classifier, which clearly contains redundant rules. The crossed-out rule in figure 4.13 indicates that the rule would not be generated because [29, 31] does not intersect any landmark ranges of F1.

F1        F2        Decision
[1, 9]    [2, 26]   d
[29, 31]  [2, 26]   d
[0, 31]   [0, 19]   a
[12, 17]  [19, 28]  d
[0, 30]   [4, 27]   a
[0, 31]   [0, 31]   a

Figure 4.10: Two-dimensional classifier expressed with ranges.

Range     Code
[0, 0]    0
[1, 9]    1
[10, 11]  2
[12, 17]  3
[18, 28]  2
[29, 31]  1

Figure 4.11: Landmarks identified by Topological Transformation for field F1.

F1 (transformed, per rule): [1, 1], [1, 1], [0, 3], [3, 3], [0, 3], [0, 3]

Figure 4.12: Transformer for field F1.

F1      F2      Decision
[1, 1]  [0, 1]  d
[1, 1]  [0, 1]  d   (crossed out: not generated)
[0, 3]  [0, 2]  a
[3, 3]  [0, 1]  d
[0, 3]  [0, 1]  a
[0, 3]  [0, 2]  a

Figure 4.13: Transformed classifier.
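The range re-encoding step reduces to an interval-overlap scan. The sketch below (ours) assumes the chosen landmark representatives for F1 in the example are [0, 0], [1, 9], [10, 11], and [12, 17] with codes 0 through 3 — the reading under which the crossed-out rule of figure 4.13 is dropped.

```python
def transform_range(si, landmarks):
    """landmarks: ((lo, hi), code) pairs. Re-encode si = [x, y] as
    [min code, max code] over the landmarks si intersects; None when the
    range misses every landmark (the rule is not generated)."""
    hits = [code for (lo, hi), code in landmarks
            if not (hi < si[0] or si[1] < lo)]
    return (min(hits), max(hits)) if hits else None

f1_landmarks = [((0, 0), 0), ((1, 9), 1), ((10, 11), 2), ((12, 17), 3)]
assert transform_range((1, 9), f1_landmarks) == (1, 1)    # rule 1 of fig 4.12
assert transform_range((0, 30), f1_landmarks) == (0, 3)   # rule 5
assert transform_range((29, 31), f1_landmarks) is None    # the crossed-out rule
```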
4.3.3 TCAM SPliT

TCAM SPliT [15] is a first-match optimization algorithm that splits a classifier into several smaller classifiers. SPliT is intended for use in an environment where multiple TCAM chips can be pipelined to perform packet classification. The observation exploited by SPliT is that higher dimensional classifiers suffer from a multiplicative effect in which rules interact badly and cause additional TCAM entries. By splitting a classifier into multiple lower dimensional classifiers, SPliT avoids the cost of the multiplicative effect [15].

SPliT first converts a classifier C into an FDD. Then we reduce the FDD by merging all isomorphic subgraphs [14]. Following this setup phase, SPliT applies a divide and conquer approach to create several smaller classifiers from C. The two steps SPliT uses to create smaller classifiers are: (1) node table generation, and (2) field table generation.

During node table generation, SPliT creates a one-dimensional TCAM table for each non-terminal in the FDD. Each of these tables has one rule per prefix, per outgoing edge. The decision SPliT assigns each of these rules is the ID of the node to which the outgoing edge goes.

Field table generation combines the node tables for each field created in the previous step into one TCAM table. This involves three steps. First we assign a unique node ID to each of the node tables. Next we prepend the ID of each node table to all the rules contained in the table. Lastly, SPliT combines all tables for each field into one TCAM table.

We can optimize field table generation by using shadow encoding when combining the tables for field Fi [13]. While FDD reduction merges isomorphic nodes, there are often nodes that are not isomorphic, but similar. Shadow encoding leverages the similarity among nodes during field table generation using two key observations: we have free rein over which table IDs to assign, and we can use ternary strings as table IDs. Shadow encoding allows us to remove similar entries from one node table and use a ternary string as the table ID of another table to match both tables, thus reducing the overall size of the field table [15].

In the end, SPliT produces multiple classifiers of smaller dimension than C. Typically, SPliT will produce either d one-dimensional classifiers or two classifiers, one with k dimensions and the other with (d − k) dimensions. TCAM SPliT is a state-of-the-art first-match optimization algorithm when multiple TCAM chips can be used [15].

[Figure 4.14: FDD for node and field table generation: root v1 labeled F1 with children v2, v3, and v4 labeled F2; the drawing is omitted here.]

We assign nodes v2, v3, and v4 node IDs 00, 01, and 11, respectively. Then we can create the field table for F2: we prepend the node ID of a node table to its rules when we add them to the field table.

v1 node table: 101 → 00; 01* → 01; *** → 11
v2 node table: *** → d
v3 node table: 110 → a; *** → d
v4 node table: 0** → a; *** → d

F2 field table: 00 *** → d; 01 110 → a; 01 *** → d; 11 0** → a; 11 *** → d

Figure 4.15: Node tables for F1 and F2, and field table for F2.

To illustrate shadow encoding, consider the field table composed of two node tables (t1 and t2) shown in figure 4.16. We can see that t1 and t2 are identical except that t1 has one additional rule. We let t1 defer to t2, and we get the shadow encoded field table in figure 4.16.

t1 (ID 00): 110 → a; 100 → a; *** → d
t2 (ID 01): 110 → a; *** → d
↓
Shadow encoded: 00 100 → a; 0* 110 → a; 0* *** → d

Figure 4.16: Field table and shadow encoded field table for F2.

4.4 Verification

Our work would be pointless without verification that the first-match classifier resulting from the all-match to first-match conversion is actually equivalent to the original all-match classifier. To our knowledge there is no algorithm short of brute force to confirm that an all-match classifier AM and a first-match classifier FM are equivalent. We verified classifier equivalence by generating a comprehensive set of packets and verifying that both classifiers produced the same result for each packet.

We generate two sets of packets, InputPackets and OutputPackets, based on AM and FM, respectively. Since the packet generation procedure is the same for both classifiers, let us focus on the packet generation for AM. Recall that each field in AM can consist of several discontinuous intervals and that the ith field of rule r is r[i]. For each interval int = (s, e) of r[i], for each field of each rule r ∈ AM, we generate four packets pj for 1 ≤ j ≤ 4 as follows: (1) pj[k] ∈ Fk ∀ k ≠ i; (2) pj[i] takes the values s − 1, s, e, and e + 1 for j = 1, ..., 4, respectively. For example, let r be a rule with a source port field of [25:30,35:40]; then we generate eight packets (four for each interval) with source port values 24, 25, 30, 31, 34, 35, 40, and 41.

After using the above method to generate the InputPackets and OutputPackets sets for AM and FM, we cross-classified the packets against both classifiers to ensure that neither classifier made misclassifications.
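The corner-case packet generator is straightforward to sketch (ours). One detail is assumed rather than stated in the thesis: the remaining fields are pinned to the low endpoint of each field's first interval, which is one way to pick "some value" for the other dimensions.

```python
def probe_packets(rule):
    """For every interval (s, e) of every field of `rule` (a list of
    interval lists), emit four probes at s-1, s, e, e+1 in that field,
    with the other fields pinned to a value the rule matches."""
    base = [intervals[0][0] for intervals in rule]   # assumption: low endpoints
    packets = []
    for i, intervals in enumerate(rule):
        for s, e in intervals:
            for v in (s - 1, s, e, e + 1):
                probe = list(base)
                probe[i] = v
                packets.append(probe)
    return packets

# The thesis's example: a source-port field of [25:30, 35:40] yields probes
# 24, 25, 30, 31 and 34, 35, 40, 41 in that field.
```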
Chapter 5: Experimental Results

In this chapter, we present our experimental results. We evaluated three versions of the Snort rules using our two approaches: (1) applying all-match classifier specific optimization algorithms, and (2) transforming the given all-match classifier to a first-match classifier and then using state-of-the-art first-match classifier optimization algorithms.

When optimizing for a one-TCAM environment, our results indicate it is best to perform first-match classifier optimization on a transformed classifier when we perform no all-match specific optimizations. Using TCAM Razor, we had TCAM space savings of nearly 71%. However, when optimizing for a multiple-TCAM environment, our results indicate using SSA with Negation Removal is most effective.

First we discuss some pertinent implementation details. Then we present the classifiers we used and what effect the all-match to first-match conversion has on them. We end by giving performance and efficiency results for both approaches.

5.1 Methods

In our experiments, we used the three Snort rule sets shown in column one of table 5.1. The Snort rules often used values like $HOME_NET and $EXTERNAL_NET for the source IP and destination IP fields. In our experiments, we set $HOME_NET to be a full class A subnet. By default, we defined $EXTERNAL_NET to be the negation of $HOME_NET. We did test different initializations for $HOME_NET, but noticed no significant effect on the results.

We implemented the all-match classifier to first-match classifier transformation algorithm, the negation removal algorithm, and the Set Splitting Algorithm (SSA). In addition, we used previous implementations of the first-match optimization algorithms TCAM Razor, Topological Transformation, and TCAM SPliT. For TCAM Razor and TCAM SPliT, the order in which we process the fields of the classifier has an impact on the results. We evaluated all permutations of field ordering and report the results for the best ordering for each algorithm. To compare classifier efficiency, we averaged the running times of the five best permutations of field ordering for each classifier and algorithm.

While evaluating first-match optimizations for our classifiers, we considered different values for the width of each TCAM entry. We evaluated using a perfect TCAM width of 104 bits, where each TCAM entry is exactly as wide as each rule, as well as multiples of the 36-bit and 40-bit TCAM entry widths. Specifically, we focused on TCAM widths of 104, 144, and 160 bits. The 104-bit perfect TCAM width is the sum of the widths of the fields; we assume the protocol is an 8-bit field, the source and destination IP fields are 32 bits each, and the source and destination port fields are 16 bits each. While a perfect-width TCAM entry is unrealistic in practice, it provides a lower bound on the size of our classifiers. Negation removal, SSA, and TCAM Razor all use 144- or 160-bit wide TCAM entries because each rule is 104 bits wide; thus, we waste 40 or 56 bits per entry, respectively. For TCAM SPliT and Topological Transformation transformers, we also considered TCAM widths of 36, 40, 72, and 80 bits since each such TCAM entry only encodes one field plus a table ID.

To evaluate the effectiveness of each algorithm, we compute the TCAM space saved compared to our baseline result using only negation removal. Specifically, we calculate

TCAM space saved = 1 − (TCAM size of optimized classifier) / (TCAM size of classifier with only negation removal).

More precisely, the TCAM space saved results we present are in addition to the inherent savings given in [22] when performing negation removal during all-match to first-match conversion.
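As a worked check of the formula (ours), using the entry counts and widths reported later in table 5.3:

```python
def space_saved(opt_entries, base_entries, opt_bits=144, base_bits=144):
    """TCAM space saved relative to the negation-removal baseline."""
    return 1 - (opt_entries * opt_bits) / (base_entries * base_bits)

# Snort 2.9.0.3 with TCAM Razor (144-bit entries): 1 - 57018/131043,
# i.e. the 56.49% reported in table 5.3.
print(f"{space_saved(57018, 131043):.2%}")
# Topological Transformation uses 40-bit entries against the 144-bit
# baseline: 1 - (28491*40)/(131043*144) gives the reported 93.96%.
print(f"{space_saved(28491, 131043, opt_bits=40):.2%}")
```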
    Rules     No. Rules   Negation Removal               w/o Negation Removal
                          Output Rules  Converted Rules  Output Rules  Converted Rules
    2.9.0.3   471         19240         90279            18685         177662
    2.9.0.5   552         28239         176562           27564         349809
    2.9.1.0   553         28243         176566           27565         349810

Table 5.1: Number of input and output rules through the all-match to first-match conversion.

    Rules     Negation Removal   w/o Negation Removal
              TCAM Entries       TCAM Entries
    2.9.0.3   131043             1022184
    2.9.0.5   246276             1939700
    2.9.1.0   246282             1939701

Table 5.2: Number of TCAM entries required to store the converted rules.

5.2 Baseline Results

First we discuss how the classifiers expand as we progress through the all-match to first-match conversion process. As we can see in Table 5.1, the number of rules increases dramatically when we perform the all-match to first-match conversion. The "Output Rules" columns of Table 5.1 indicate how many rules the all-match to first-match algorithm produced with or without negation removal. The "Converted Rules" columns indicate the number of rules in each classifier after splitting rules with discontinuous ranges. Table 5.2 shows the number of TCAM entries required to store the converted rules without further optimization. We can compute the size of these classifiers by multiplying the entry counts by the number of bits used per entry.

The all-match to first-match algorithm in [22] increases the size of the classifier. This is obvious when comparing the "No. Rules" column to the values in any of the output columns. In addition, the "Output Rules" counts are greater when we perform negation removal. This is to be expected because the separator rules we add to maintain classifier semantics generate new intersection rules with other rules.

At first glance it may be puzzling why the "Converted Rules" counts are greater when we do not perform negation removal, but there is a simple explanation for this phenomenon. When we skip negation removal, the rules with a field of $EXTERNAL_NET have discontinuous ranges that we must split into multiple rules. Since most of the rules have $EXTERNAL_NET for one of the IP fields, we effectively double the size of the classifier.
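The doubling is a direct consequence of interval negation, as the Python sketch below shows: complementing a single contiguous subnet inside the 32-bit address space leaves two discontinuous intervals. The choice of 10.0.0.0/8 for $HOME_NET is purely illustrative; our experiments used a full class A subnet, but the specific subnet does not matter here.

    def negate_intervals(intervals, lo=0, hi=2**32 - 1):
        # Complement a sorted list of disjoint (s, e) intervals within [lo, hi].
        result, cursor = [], lo
        for s, e in intervals:
            if cursor < s:
                result.append((cursor, s - 1))
            cursor = e + 1
        if cursor <= hi:
            result.append((cursor, hi))
        return result

    # $HOME_NET as the (illustrative) class A subnet 10.0.0.0/8:
    home_net = [(10 * 2**24, 11 * 2**24 - 1)]
    external_net = negate_intervals(home_net)
    print(len(external_net))  # 2: each rule using $EXTERNAL_NET splits in two

Each of the two intervals must occupy its own rule, which is why skipping negation removal roughly doubles the "Converted Rules" counts in Table 5.1.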
5.2.1 Single TCAM Approach

We first evaluated the TCAM Razor first-match classifier optimization algorithm [11]. The TCAM space saved using TCAM Razor on the first-match classifier for Snort 2.9.0.3 was 56.49%, as can be seen in Figure 5.1. On the newer and larger Snort rule sets 2.9.0.5 and 2.9.1.0, Razor achieved space savings of 70.85%, as can be seen in Figure 5.2. Because TCAM Razor builds an FDD while minimizing classifiers, we attain the same TCAM space savings whether or not we use negation removal during the all-match to first-match conversion.

Figure 5.1: Classifier sizes obtained using TCAM Razor vs. negation removal alone for the Snort 2.9.0.3 rule set.

Figure 5.2: Classifier sizes obtained using TCAM Razor vs. negation removal alone for the Snort 2.9.0.5 and 2.9.1.0 rule sets.

5.2.2 Multiple TCAM Approach

When using multiple TCAM chips is an option, our results indicate it is best to use the all-match-specific optimizations negation removal and SSA instead of the first-match classifier optimizations Topological Transformation and TCAM SPliT. SSA achieves better performance because it solves a slightly different problem: while Topological Transformation and TCAM SPliT return all matching rules per query, SSA returns only a subset, and the results of all SSA lookups must be unioned together to obtain the complete solution.

    Algorithm                    Bits                   2.9.0.3    2.9.0.5    2.9.1.0
    Negation Removal             144   Entries          131043     246276     246282
                                       Size             18.00 Mb   33.82 Mb   33.82 Mb
                                       Space Savings    —          —          —
    TCAM Razor                   144   Entries          57018      71798      71799
                                       Size             7.83 Mb    9.86 Mb    9.86 Mb
                                       Space Savings    56.49%     70.85%     70.85%
    Topological Transformation   40    Entries          28491      36978      36979
                                       Size             1113 Kb    1444 Kb    1444 Kb
                                       Space Savings    93.96%     95.83%     95.83%
    TCAM SPliT                   36    Entries          26465      33952      33952
                                       Size             931 Kb     1194 Kb    1194 Kb
                                       Space Savings    94.95%     96.55%     96.55%
    SSA 2                        144   Entries          6518       1948       1432
                                       Size             917 Kb     274 Kb     201 Kb
                                       Space Savings    95.03%     99.21%     99.42%
    SSA 4                        144   Entries          1149       1323       1188
                                       Size             162 Kb     186 Kb     167 Kb
                                       Space Savings    99.12%     99.46%     99.52%

Table 5.3: Summary of results.

Table 5.3 summarizes the performance of all the algorithms we evaluated. We omit the space savings for negation removal because it is our baseline; its own savings can be calculated by comparing against the size of the classifier without negation removal. We can see that SSA 2 and SSA 4 outperformed both Topological Transformation and TCAM SPliT. Of the multi-TCAM optimization algorithms, Topological Transformation had the least space savings, 93.96%, and SSA 4 had the most, 99.52%.
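To illustrate the union step, the following is a schematic Python sketch of a multi-chip SSA lookup. It is not our SSA implementation: it models each TCAM chip as a plain list of (rule, rule ID) pairs and simulates the per-chip match report in software.

    def matches(rule, packet):
        # A rule is a tuple of per-field (s, e) intervals; a packet matches
        # when every field value falls inside the corresponding interval.
        return all(s <= v <= e for (s, e), v in zip(rule, packet))

    def ssa_lookup(chips, packet):
        # Each SSA chip stores only a subset of the rules, so no single chip
        # reports the full match set; the union over all chips does.
        result = set()
        for chip in chips:
            result |= {rid for rule, rid in chip if matches(rule, packet)}
        return result

    # Two chips holding different rule subsets over a single 8-bit field:
    chips = [[(((0, 127),), "r1")], [(((64, 191),), "r2")]]
    print(ssa_lookup(chips, (100,)))  # {'r1', 'r2'} (set order may vary)

The extra union work is the price SSA pays for its much smaller per-chip tables.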
5.3 Efficiency

Compressing large classifiers like the Snort rules requires nontrivial computational effort. In this section, we evaluate the efficiency of our approaches. Surprisingly, we observed an increase in efficiency when we skip negation removal during the all-match to first-match conversion. We collected efficiency data only for TCAM Razor and Topological Transformation; Topological Transformation had average run times on the order of hours, while SSA 2 and SSA 4 ran on the order of tens of minutes. We therefore present efficiency results only for TCAM Razor.

Figure 5.3: Time efficiency of TCAM Razor with and without negation removal.

As Figure 5.3 shows, we found that on average Razor ran 38% faster on input classifiers generated without negation removal. This is a meaningful result given that the classifiers without negation removal are twice as large as the classifiers with negation removal (see Table 5.1). Since the two classifiers are semantically equivalent, they are represented by the same FDD. Therefore, the only explanation for Razor being more efficient on one input or the other is that Razor can build the FDD faster from the classifier formed without negation removal.

Chapter 6: Conclusion

In this thesis, we evaluate the performance and efficiency of two approaches to minimizing all-match classifiers by converting them to first-match classifiers. In the first approach, we consider optimization algorithms specific to all-match classifiers: negation removal and SSA. Our second approach applies state-of-the-art first-match classifier optimization algorithms to a first-match classifier generated from an all-match classifier. We evaluated both approaches in the context of how many TCAM chips are used.

We found that when only one TCAM chip is available, it is best to avoid the all-match-specific optimizations and apply first-match classifier optimizations instead. Using this approach, we were able to achieve a 70.85% reduction in TCAM space with TCAM Razor over negation removal alone. On the other hand, in a multiple-TCAM-chip environment, the best method is SSA with negation removal, which achieves TCAM space savings of 99.52% over negation removal alone.

BIBLIOGRAPHY

[1] Snort. http://www.snort.org/.

[2] A. Bremler-Barr and D. Hendler. Space-efficient TCAM-based classification using Gray coding. IEEE Transactions on Computers, 61(1):18–30, Jan. 2012.

[3] Hao Che, Zhijun Wang, Kai Zheng, and Bin Liu. DRES: Dynamic range encoding scheme for TCAM coprocessors.

[4] Pierluigi Crescenzi and Viggo Kann. A compendium of NP optimization problems. http://www.nada.kth.se/~viggo/wwwcompendium/node145.html.

[5] Mohamed G. Gouda and Alex X. Liu. Structured firewall design. Computer Networks, 51(4):1106–1120, Mar. 2007.

[6] P. Gupta and N. McKeown. Algorithms for packet classification. IEEE Network, 15(2):24–32, Mar./Apr. 2001.

[7] David S. Johnson. Approximation algorithms for combinatorial problems. In Proceedings of the Fifth Annual ACM Symposium on Theory of Computing, STOC '73, pages 38–49, New York, NY, USA, 1973. ACM.

[8] Karthik Lakshminarayanan, Anand Rangarajan, and Srinivasan Venkatachary. Algorithms for advanced packet classification with ternary CAMs. In ACM SIGCOMM, pages 193–204, 2005.

[9] Karthik Lakshminarayanan, Anand Rangarajan, and Srinivasan Venkatachary. Algorithms for advanced packet classification with ternary CAMs. In ACM SIGCOMM, pages 193–204, 2005.

[10] Alex X. Liu and Mohamed G. Gouda. Diverse firewall design. IEEE Transactions on Parallel and Distributed Systems, 19(9):1237–1251, Sept. 2008.

[11] A. X. Liu, C. R. Meiners, and E. Torng. TCAM Razor: A systematic approach towards minimizing packet classifiers in TCAMs. IEEE/ACM Transactions on Networking, 18(2):490–500, Apr. 2010.

[12] Huan Liu. Efficient mapping of range classifier into ternary-CAM. In Proceedings of the 10th Symposium on High Performance Interconnects, pages 95–100, 2002.

[13] Chad R. Meiners, Jignesh Patel, Eric Norige, Eric Torng, and Alex X. Liu. Fast regular expression matching using small TCAMs for network intrusion detection and prevention systems. In Proceedings of the 19th USENIX Conference on Security, USENIX Security '10, pages 8–8, Berkeley, CA, USA, 2010. USENIX Association.

[14] C. R. Meiners, A. X. Liu, and E. Torng. Topological transformation approaches to TCAM-based packet classification. IEEE/ACM Transactions on Networking, 19(1):237–250, Feb. 2011.

[15] C. R. Meiners, A. X. Liu, E. Torng, and J. Patel. SPliT: Optimizing space, power, and throughput for TCAM-based classification. In Proceedings of the Seventh ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), pages 200–210, Oct. 2011.

[16] D. Pao, P. Zhou, B. Liu, and X. Zhang. Enhanced prefix inclusion coding filter-encoding algorithm for packet classification with ternary content addressable memory. IET Computers & Digital Techniques, 1(5):572–580, 2007.

[17] Derek Pao, Yiu Keung Li, and Peng Zhou. Efficient packet classification using TCAMs. Computer Networks, 50(18):3523–3535, 2006.

[18] E. Spitznagel, D. Taylor, and J. Turner. Packet classification using extended TCAMs. In Proceedings of the 11th IEEE International Conference on Network Protocols, pages 120–131, Nov. 2003.
[19] Subhash Suri, Tuomas Sandholm, and Priyank Warkhede. Compressing two-dimensional routing tables, 2003.

[20] David E. Taylor. Survey and taxonomy of packet classification techniques. ACM Computing Surveys, 37(3):238–275, Sept. 2005.

[21] Pi-Chung Wang and Chia-Ming Chang. Scalable packet classification for network intrusion detection. In Proceedings of the Fifth IASTED International Conference on Circuits, Signals and Systems, CSS '07, pages 64–69, Anaheim, CA, USA, 2007. ACTA Press.

[22] F. Yu, R. H. Katz, and T. V. Lakshman. Efficient multimatch packet classification and lookup with TCAM. IEEE Micro, 25(1):50–59, Jan./Feb. 2005.

[23] F. Yu, T. V. Lakshman, M. A. Motoyama, and R. H. Katz. Efficient multimatch packet classification for network security applications. IEEE Journal on Selected Areas in Communications, 24(10):1805–1816, Oct. 2006.