(1 - 2 of 2)
- Algorithms for deep packet inspection
- Patel, Jignesh
- Electronic Theses & Dissertations
The core operation in network intrusion detection and prevention systems is Deep Packet Inspection (DPI), in which each security threat is represented as a signature, and the payload of each data packet is matched against the set of current security threat signatures. DPI is also used for other networking applications like advanced QoS mechanisms, protocol identification etc.. In the past, attack signatures were specified as strings, and a great deal of research has been done in string...
Show moreThe core operation in network intrusion detection and prevention systems is Deep Packet Inspection (DPI), in which each security threat is represented as a signature, and the payload of each data packet is matched against the set of current security threat signatures. DPI is also used for other networking applications like advanced QoS mechanisms, protocol identification etc.. In the past, attack signatures were specified as strings, and a great deal of research has been done in string matching for network applications. Today most DPI systems use Regular Expression (RE) to represent signatures. RE matching is more diffcult than string matching, and current string matching solutions don't work well for REs. RE matching for networking applications is diffcult for several reasons. First, the DPI application is usually implemented in network devices, which have limited computing resources. Second, as new threats are discovered, size of the signature set grows over time. Last, the matching needs to be done at network speeds, the growth of which out paces improvements in computing speed; so there is a need for novel solutions that can deliver higher throughput. So RE matching for DPI is a very important and active research area.In our research, we investigate the existing methods proposed for RE matching, identify their limitations, and propose new methods to overcome these limitations. RE matching remains a fundamentally challenging problem due to the diffculty in compactly encoding DFA. While the DFA for any one RE is typically small, the DFA that corresponds to the entire set of REs is usually too large to be constructed or deployed. To address this issue, many alternative automata implementations that compress the size of the final automaton have been proposed. However, previously proposed automata construction algorithms employ a “Union then Minimize” framework where the automata for each RE are first joined before minimization occurs. This leads to expensive minimization on a large automata, and a large intermediate memory footprint. We propose a “Minimize then Union” framework for constructing compact alternative automata, which minimizes smaller automata first before combining them. This approach required much less time and memory, allowing us to handle a much larger RE set. Prior hardware based RE matching algorithms typically use FPGA. The drawback of FPGA is that resynthesizing and updating FPGA circuitry to handle RE updates is slow and diffcult. We propose the first hardware-based RE matching approach that uses Ternary Content Addressable Memory (TCAM). TCAMs have already been widely used in modern networking devices for tasks such as packet classification, so our solutions can be easily deployed. Our methods support easy RE updates, and we show that we can achieve very high throughput. The main reason combined DFAs for multiple REs grow exponentially in size is because of replication of states. We developed a new overlay automata model which exploit this replication to compress the size of the DFA. The idea is to group together the replicated DFA structures instead of repeating them multiple times. The result is that we get a final automata size that is close to that of a NFA (which is linear in the size of the RE set), and simultaneously achieve fast deterministic matching speed of a DFA.