3.3.7.. ‘ .4 .u. ; nu . 3.2.. fr.» 2.. . 11. ‘ Qt. . q :3» Skill” - . .. a. . J . 1 l flu! . .3.qu x, n . .4 v- A... tag, . 4.51 . .1. . fin . . w . . Vang ..., Um» .3 fi ‘ I" AHunmmfl d... I? ”'9, 1. 114.34..“ ’Hf;¢:%¢rflflm.~bhfi.% 49...: . 3,. , .. a». . . a : ufl x . vi!‘ V )hlwfifi, by», .gflfiuwi 01:...43‘. .n 2.3;. 5.94.. ‘9": ..w t :«5. . hack... 0: 3.. . .7 a 3.3.1}. .9 _ .. tun-I.” I. dull.) 11$?! .. It’ll: I... 1:511: 9.411? I . it .13.... I, . 1:091; RSV . £15.90 . 81“- .ll :1... \v .1 in ... I w. I... .. \fl .. a lucid! raj." :3 iii! :4; > .‘§uplkm. 5‘ f: ‘ 1) a! tilt- . ‘ 4%.». (Ln. Aqucxn<fl;;.twrau. _. .‘s:o:..; ‘ ‘ ....si;.w....: This is to certify that the thesis entitled USING DEVELOPMENTAL LEARNING FOR NETWORK INTRUSION DETECTION presented by DAVID B. KNOESTER has been accepted towards fulfillment of the requirements for the MS. degree in COMPUTER SCIENCE Major RrofessoFs Signature \ 5/14/07 Date MSU is an Aflinnative Action/Equal Opportunity Institution LIBRARY Michigan State University PLACE IN RETURN BOX to remove this checkout from your record. To AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 6/01 c:/CIRC/DaIeDue.p65-p.15 USING DEVELOPMENTAL LEARNING FOR NETWORK INTRUSION DETECTION By DAVID B. KNOESTER A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Department of Computer Science 2004 .w'l' w" U.‘ a q- v ,4. ‘P‘vr '0‘. g. I ,I ' chuvsu. . I c ,; 0,.P'., 4 6.- to v - x«‘.a ’ I \ 1-1 -r - “thij i- “Vtiklt. ~£ ‘ . .I . , '7 ~,, V U; ‘4 " “0 tAdAa ‘ . "5.“\ l .J’ ’ '1 db.“ "lg... | O. I- 1' P12?“ lHJHCA I I': :I-."7 . M ‘4'“ I“- "l' Cw ’— I;‘\ "“4. ABSTRACT USING DEVELOPMENTAL LEARNING FOR NETWORK INTRUSION DETECTION By DAVID B. KNOESTER A key problem in the field of network security is network intrusion detection. Variability in network attacks, frequent software vulnerabilities, and the lack of gen- eral techniques for security combine to make network intrusion detection a challeng- ing problem. This thesis describes the application of developmental learning (a form of machine learning) to network intrusion detection. An extensible framework for network intrusion detection has been developed. The framework is general enough to leverage existing intrusion detection methods, while providing a solid mathemat- ical foundation for identifying network attacks in packet streams. This thesis also describes difficulties encountered when casting a fundamentally systems-oriented problem into the machine-learning domain. Experimental results are presented that demonstrate the effectiveness of this approach for misuse and anomaly detection. Copyright by DAVID B. KN OESTER 2004 LIST 0] LIST 0] 1 Intro 2 Back I 3 Featt 3.1 1 3.2 i 3.3 4 Misu w to H 44 .4 - 1-) .J- .- ‘7’) Table of Contents LIST OF TABLES vi LIST OF FIGURES vii 1 Introduction 1 2 Background 7 2.1 Developmental Learning ......................... 7 2.2 MESO-Tree Algorithm .......................... 9 2.3 1999 DARPA IDS Evaluation ...................... 12 2.4 Related Work ............................... 15 3 Features 18 3.1 Features in MESONet .......................... 19 3.2 Feature Types and Requirements .................... 21 3.3 Feature Composition ........................... 24 3.4 Feature Detail ............................... 26 4 Misuse Detection 35 4.1 Training and Testing Procedures ..................... 
36 4.2 Performance Analysis ........................... 38 4.3 Single Packet Features .......................... 40 Experiment 1 ............................... 40 Experiment 2 ............................... 44 Experiment 3 ............................... 48 4.4 Sliding Window Features ......................... 51 Experiment 4 ............................... 51 Experiment 5 ............................... 55 Experiment 6 ............................... 58 Experiment 7 ............................... 61 4.5 Packet Number Features ......................... 64 Experiment 8 ............................... 64 Experiment 9 ............................... 68 4.6 Destination Grouping Features ...................... 71 Experiment 10 ............................... 71 Experiment 11 ............................... 74 iv 5 C4 APP :I PE B .\l C A: USI 4.7 Summary of Results ........................... 5 Anomaly Detection 5.1 'Ii‘aining and Testing Procedures ..................... 5.2 Performance Analysis ........................... 5.3 Proof of Concept ............................. Experiment 1 ............................... 5.4 Feature Sets from Misuse Detection ................... Experiment 2 ............................... Experiment 3 ............................... Experiment 4 ............................... 5.5 Derivative Features ............................ Experiment 5 ............................... Experiment 6 ............................... 5.6 Summary of Results ........................... 6 Conclusions and Future Work APPENDICES A Feature Development B Misuse Detection Feature Set Configurations C Anomaly Detection Feature Set Configurations LIST OF REFERENCES 79 79 80 84 84 87 87 91 94 97 97 101 104 106 108 109 112 121 126 3.1 i 3.2 t 3.. l 3.4 5' 3.5 ’ 4.1 ' 4.2 f 4.3 .1 3.1 3.2 3.3 3.4 3.5 4.1 4.2 4.3 5.1 5.2 List of Tables Features used to detect the LAND attack. ............... 21 Summary of feature classes. ....................... 28 Packet number feature detail. ...................... 30 Single packet feature detail. ....................... 32 Time feature detail ............................. 34 Dacefiles used for misuse detection with MESONet ........... 36 Sample confusion matrix .......................... 39 Summary of misuse detection validation tests. ............. 78 Tracefiles used for anomaly detection with MESONet .......... 80 Summary of anomaly detection validation tests: per-packet and per- attack True Positive Rate (TPR) at 10% false positive rate, 20—second time-agglomerative TPR at 1% false positive rate ............ 105 vi 4.1 4.2 4.3 H 4.5 l6 1.1 2.1 2.2 3.1 3.2 3.3 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 List of Figures Sequence Diagram for MESONet. .................... 6 High level view of a classifier-based DLA ................. 9 Example feature space ........................... 11 A feature set for MESONet, consisting of all the features extracted by both the SinglePacketFeatures and PayloadCountFeatures classes. 25 A feature set for MESONet, consisting of a 5-packet sliding window of all the features extracted by PacketNumberFeatures . ....... 26 A feature set for MESONet, consisting of a 5-packet sliding window of all the features extracted by SinglePacketFeatures, grouped by destination IP address. .......................... 27 Ti‘acefile A ground truth packet graph. ................. 37 Tracefile B ground truth packet graph. ................. 
38 Experiment 1 resubstitution test results: packet graph (top) and con- fusion matrix (bottom). ......................... 42 Experiment 1 validation test results: packet graph (top) and confu- sion matrix (bottom) ............................ 43 Experiment 2 resubstitution test results: packet graph (top) and con- fusion matrix (bottom). ......................... 46 Experiment 2 validation test results: packet graph (top) and confu- sion matrix (bottom) ............................ 47 Experiment 3 resubstitution test results: packet graph (top) and con- fusion matrix (bottom). ......................... 49 Experiment 3 validation test results: packet graph (top) and confu- sion matrix (bottom) ............................ 50 Experiment 4 resubstitution test results: packet graph (top) and con- fusion matrix (bottom). ......................... 53 Experiment 4 validation test results: packet graph (top) and confu- sion matrix (bottom) ............................ 54 Experiment 5 resubstitution test results: packet graph (top) and con- fusion matrix (bottom). ......................... 56 Experiment 5 validation test results: packet graph (top) and confu- sion matrix (bottom) ............................ 57 Experiment 6 resubstitution test results: packet graph (top) and con- fusion matrix (bottom). ......................... 59 vii cu Q1! cu cu g," cm C.” c.” C.” L'JI C.” C." U1 414 +- 5.4 L1]. ' 5;." C." C." c," an (,1! (xi—Imurumoorap—a Paco »-s “ I (1‘ C ' ' uLa CA“ I a. j On—J (‘1'? p. 4 p- .4 (.&fi gun» [UH 4.14 Experiment 6 validation test results: packet graph (top) and confu- sion matrix (bottom) ............................ 60 4.15 Experiment 7 resubstitution test results: packet graph (top) and con- fusion matrix (bottom). ......................... 62 4.16 Experiment 7 validation test results: packet graph (top) and confu- sion matrix (bottom) ............................ 63 4.17 Experiment 8 resubstitution test results: packet graph (top) and con- fusion matrix (bottom). ......................... 66 4.18 Experiment 8 validation test results: packet graph (top) and confu- sion matrix (bottom) ............................ 67 4.19 Experiment 9 resubstitution test results: packet graph (top) and con- fusion matrix (bottom). ......................... 69 4.20 Experiment 9 validation test results: packet graph (top) and confu- sion matrix (bottom) ............................ 70 4.21 Experiment 10 resubstitution test results: packet graph (top) and confusion matrix (bottom) ......................... 72 4.22 Experiment 10 validation test results: packet graph (top) and confu- sion matrix (bottom) ............................ 73 4.23 Experiment 11 resubstitution test results: packet graph (top) and confusion matrix (bottom) ......................... 75 4.24 Experiment 11 validation test results: packet graph (top) and confu- sion matrix (bottom) ............................ 76 5.1 'Ii'acefile D ground truth packet graph. ................. 81 5.2 Sample ROC curve ............................. 83 5.3 Nearest-cluster distances .......................... 85 5.4 Nearest-cluster standard-deviations(top) and packet labels (bottom). 86 5.5 Per-packet ROC curve for Experiment 2. ................ 88 5.6 Per-attack ROC curve for Experiment 2. ................ 89 5.7 Time-agglomerative ROC curve for Experiment 2 ............ 90 5.8 Per-packet ROC curve for Experiment 3. ................ 91 5.9 Per-attack ROC curve for Experiment 3. ................ 
92 5.10 Time—agglomerative ROC curve for Experiment 3 ............ 93 5.11 Per-packet ROC curve for Experiment 4. ................ 94 5.12 Per-attack ROC curve for Experiment 4. ................ 95 5.13 Time-agglomerative ROC curve for Experiment 4 ............ 96 5.14 Per-packet ROC curve for Experiment 5. ................ 98 5.15 Per-attack ROC curve for Experiment 5. ................ 99 5.16 Time-agglomerative ROC curve for Experiment 5 ............ 100 5.17 Per-packet ROC curve for Experiment 6. ................ 101 5.18 Per-attack ROC curve for Experiment 6. ................ 102 5.19 Time-agglomerative ROC curve for Experiment 6 ............ 103 A.l Java interface for feature extraction. .................. 110 A2 Listing of all code for the TCPPercentage class. ............ 111 viii 13 B2 3.3 Bi 3.5 8.6 8..“ BS 39 81+ .1. E C1 C2 % ll C3 C4 C5 A.3 Feature set using the TCPPercentage class ................ 111 3.1 Feature set configuration for Experiment 1 ................ 112 B2 Feature set configuration for Experiment 2 ................ 112 B3 Feature set configuration for Experiment 3 ................ 113 8.4 Feature set configuration for Experiment 4 ................ 113 3.5 Feature set configuration for Experiment 5 ................ 114 B6 Feature set configuration for Experiment 6 ................ 115 B7 Feature set configuration for Experiment 7 ................ 116 38 Feature set configuration for Experiment 8 ................ 117 3.9 Feature set configuration for Experiment 9 ................ 118 B.10 Feature set configuration for Experiment 10. .............. 119 RH Feature set configuration for Experiment 11. .............. 120 C.1 Feature set configuration for Experiment 2 ................ 122 C2 Feature set configuration for Experiment 3 ................ 123 C3 Feature set configuration for Experiment 4 ................ 123 G4 Feature set configuration for Experiment 5 ................ 124 C5 Feature set configuration for Experiment 6 ................ 125 ix Chapter 1 Introduction Ensuring an individual’s right to privacy, protecting national infrastructure and preventing costly service outages all require network security. Although some net- work intrusions are as innocuous as undesired email, others disrupt global network services and cause billions of dollars in lost productivity [15]. The variability of network intrusions coupled with frequent software vulnerabilities and the lack of general techniques for securing networks has made network intrusion detection a challenging problem. Network intrusions, also called network attacks, are activities that lead to the unauthorized use of computer resources. They include attempts to compromise integrity, confidentiality, or availability of network services [6]. Most network intru- sions have a distinct attack signature, comprised of unique and observable charac- teristics, that allow Network Intrusion Detection Systems (NIDS) [9, 11, 18, 19, 25, 27—29,31,41,42,44,47, 48] to positively identify them. However, slight variations in attack signature are often all that is required to foil detection. Furthermore, there are many different styles of network intrusions. Some target specific applications or operating systems, while others engage in Distributed Denial of Service (DDoS) attacks [26]. Finally, new network attacks, especially those that exploit a recently I I‘V .ryw- ‘. __,=.- l-» Q-ir ‘ -qo phi ”.V and .U. o P" 0? L's. L..- . 
Un‘n - [aura [- Q A“ W0 , _ disclosed or unpublicized software vulnerability, often do not have a known attack signature. Network attack variability has led to two approaches for detecting network intrusions. First, misuse detection searches network traffic for known attack signa- tures. Second, anomaly detection recognizes unusual or anomalous network condi- tions. Each approach has different strengths and weaknesses. For example, misuse detectors can recognize known attacks, but are unable to reliably detect variations on these attacks. Misuse detectors are also unable to detect previously unknown (often called zero-day) attacks [20,21]. Conversely, anomaly detectors are capa- ble of detecting zero-day attacks, but generally do so with a high false-positive rate [8, 20,21,23]. Effective network intrusion detection requires techniques that can help address the weaknesses of current approaches to misuse and anomaly detection. In this the- sis, we explore the application of methods based on developmental learning [50], a form of machine learning, to network intrusion detection. By deriving features, or measurements suitable for use by a developmental learning algorithm, from network packets, we are able to mathematically determine the similarity (or lack thereof) be- tween packet streams. We address both misuse and anomaly detection, and describe difficulties related to deriving features from network packets. Thesis Statement: By applying methods based on developmental learning to net- work intrusion detection, it is possible to perform both misuse and anomaly detection within the same framework, providing good detection accuracy with few false-positive alarms. The major contributions of this work can be summarized as follows: 1. MESOTree-based Network Intrusion Detection System (MESONet) design and implementation. MESONet is both a framework for extracting features from g9 fifi>~9~ .M x “ lurcm, ~3“. 5.. usfpr "i L :5? .-, network traffic, and an implementation of a network intrusion detection sys- tem. MESONet uses MESO-Tree [24], a classification algorithm similar to Hierarchical Discriminant Regression [22, 49], for developmental learning. MESONet is independent of the particular learning algorithm used and can be easily redeployed atop a different system, enabling the use of improved al- gorithms and implementations without extensive software modification. Ad- ditionally, MESONet supports a wide variety of runtime selectable features, and is easily extended to incorporate new features. 2. Application of developmental learning to a systems-oriented problem. Net- work intrusion detection systems have become increasingly complex. Machine learning techniques offer an approach that is capable of dealing with this com- plexity. This thesis shows that machine learning techniques can be successfully applied in a systems domain, specifically network intrusion detection, and we present difficulties encountered when doing so. 3. Performance evaluation through experimentation. We evaluate the perfor- mance of MESONet for misuse and anomaly detection using a subset of the 1999 DARPA IDS Evaluation datasets1 [33]. MESONet is a platform for conducting developmental-learning based exper- iments into network intrusion detection. Because MESONet is a learning-based system, it must be trained on known data prior to conducting tests. 
As a practical matter, both training and testing data must be known; that is, every packet used for training or testing must be labeled either normal, meaning that it is not part of a network attack, or attack, predictably meaning that it is part of a network attack. Although all experiments presented in this thesis are the results of offline tests, MESONet is capable of operating online, sniffing traffic directly from a network. 1We are aware of research critical of the 1999 DARPA IDS Evaluation. This issue is addressed in detail in Section 2.3. ['0] [[15 {6350 $553150.) open source J 35543} 3 Experimé :5. for each one. liESO.‘ rated on 0:. étrenenta' :: :erork. Normal 1, tgexerutim: I [535:3 dim: | ii'fidii ll [30» king. Net“. | | Mid path , I u used as t: “Sling each : as. simiiar _ if”? mi: 4-? ' For this reason, and to ensure compatibility with publicly available network trace- files, MESONet also incorporates packet capturing, or sniffing. MESONet uses the open source JPCap package [12] for network traffic capturing, which is built upon libpcap [3], a cross-platform packet capturing facility. Experiments presented in this thesis are the results of batch training, meaning that for each experiment, MESONet was trained on all available training data at once. MESONet is capable of incremental training, for example, MESONet can be trained on one network packet at a time. Combined with its online capabilities, incremental training enables near “plug-and-play” installation of MESONet on any network. Normal operation of MESONet follows the sequence shown in Figure 1.1. Dur- ing execution, J PCap reads packets from a previously captured tracefile (or captures packets directly from the network). Each packet is then labeled as either normal or attack; if possible, attack packets are labeled with the specific attack to which they belong. Next, an array of features, called a feature vector, is extracted from each labeled packet. During training, these feature vectors and their associated labels are used as training data by the developmental learning algorithm (DLA). During testing, each feature vector is used as input to the DLA, and the DLA returns the most similar instance of training data. Finally, we perform different analyses on the test results depending on whether MESONet is operating in misuse or anomaly detection mode. The remainder of this thesis is organized as follows. Chapter 2 provides back- ground information on developmental learning and the MESO-tree algorithm, de- scribes the 1999 DARPA IDS Evaluation, and presents a summary of related work. Chapter 3 describes features in MESONet and the difficulties associated with the de- velopmental learning approach to network intrusion detection. Chapter 4 describes misuse detection with MESONet, including training and testing procedures, perfor- detection Wit. 3.153585- am; wit and cor. mance analyses, and experiments performed. Similarly, Chapter 5 describes anomaly detection with MESONet, including training and testing procedures, performance analyses, and experiments performed. Finally, Chapter 6 describes possible future work and concludes. .umZOmmE 8m Ewumsa cocosvom A; onE E: fi=< l ...... E32 H E5“?— _ _ u EEO.— Eamumumutauso 33.3mm». _ _ 2.8: n E32 aatSnoEoboc—oum ................ 
I u—: H 53‘”.— .oxqumuzs Bacon m ” BESEEE SefiEsuxsar _oo._._.IOmm_2 _ _ xmagmoaaaom _ _>wu§w_on§_ uoEsmcoUHoc—onn— anon: .35 3552 Chapter 2 Background MESONet draws upon previous research from both the network intrusion de- tection and developmental learning communities. As these communities have little, if any, overlap, this chapter provides background information sufficient to the un- derstanding of MESONet. Section 2.1 describes developmental learning, Section 2.2 describes the MESO-tree algorithm, Section 2.3 describes the 1999 DARPA IDS Evaluation, and Section 2.4 describes related work. 2.1 Developmental Learning Developmental learning is a recently established machine learning paradigm based on Automated, Animal-like Learning, or AA-learning [50]. The goal of devel- opmental learning is to enable an agent1 to learn online through interaction with its environment. In contrast to task-specific machine learning approaches, develop- mental learning focuses on general purpose, non-task-specific autonomous agents. For example, an agent implemented using evolution-based machine learning may be evolved to perform a specific task [46], while a developmental agent is capable of 1An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through effectors [43]. dynamically learning new tasks, without need for reprogramming. A developmental learning algorithm (DLA) is responsible for building and main- taining a representation of what the agent has been taught. Teaching a develop- mental agent requires that the agent: 1) observe relevant information about its environment, and 2) is provided the correct action to perform given its observa- tions. Environmental observations are collected via sensors, which produce scalar measurements of particular environmental features. For example, a thermometer is a sensor that produces a measurement of the temperature feature. Features may also be calculated from sensor readings. For example, interpacket delay may be cal- culated using the arrival times of sequentially sensed network packets. Both sensed and calculated features are collected as an array called a feature vector, or sample, that represents all of an agent’s observations at a given point in time. Actions, or labels [14], are typically communicated to the developmental agent by an external entity. In some cases a human may perform this role, interactively providing feedback to the agent based on the agent’s performance [10]. In other cases, software may provide the agent with a label based on specialized knowledge. For example, a database of network packets could provide labels (“attack” or “nor— mal”) to an agent based on whether or not a particular packet is present in the database. Each label is associated with a feature vector, representing the desired action an agent is to perform when observing environmental conditions similar to those contained in the feature vector. As shown in Figure 2.1, our DLA implementation produces a statistical model using techniques based on pattern classification and agglomerative clustering [14]. This statistical model is a classifier [24], and its task is to determine the label for unlabeled feature vectors. Typically, DLAs have two distinct phases of operation: training and testing. During training, the DLA builds its classifier from a set of .r ‘ ‘ .4 WW“ “.l. I ...ouv 1 ltu. .L. I 7 ,Sc'fi mad (\J <¢. ."- "IT 1 labeled feature vectors. 
Once trained, the DLA can be queried using an unlabeled feature vector. The DLA then tests this vector against the classifier, and returns a label indicating the action that is most likely desired of the agent. EA A» 1.x mm mm \ I" cum D Figure 2.1: High level view of a classifier-based DLA. 2.2 MESO-Tree Algorithm The classifier used in MESONet is called MESO-tree, for Multi-Element Self- Organizing Tree [24]. MESO-tree uses small agglomerative clusters, called sensitivity spheres to group similar training samples together. Each sensitivity sphere acts as a proxy for the samples it contains, which, depending on the training data, may be anywhere from a single sample to tens of thousands of samples. The motivation for utilizing sensitivity spheres rather than operating directly on the training data is to reduce computational requirements, which is especially significant in cases where sensitivity spheres represent many samples. Sensitivity spheres are partitioned into sets during the construction of a memory-efficient decision tree [37], which produces a hierarchical model of the train— I'll-l. nib}. .j Van—1. I V“: ing data. Each node of the decision tree comprises a set of sensitivity spheres called a partition? Each partition contains a subset of the sensitivity spheres that are most similar to each other. At subsequent tree depths, parent partitions are split according to a branching factor, producing smaller child partitions of greater detail. These child partitions comprise distinct subsets of the parent’s sensitivity spheres, and provide finer discrimination and better classification of test samples. Classifying a sample requires that the sample be compared to the sensitivity spheres within the partitions, and the path of greatest similarity followed. Once a leaf node of the tree is reached, the sample is compared directly to the training data within the most similar sensitivity sphere, and the most similar training sam- ple therein is found. Both the label associated with this training sample and the similarity metric between it and the tested sample are returned. This final similarity metric is termed the nearest-cluster distance. A novel capability of MESO-tree is its independence from a particular measure of similarity. As such, any algorithm that calculates a metric between two feature vectors may be used by MESO-tree to determine similarity between samples. For example, MESONet uses Euclidean distance, but Mahalanobis or Canberra distances could be used as well. Additionally, MESOTree-based DLAs can support both batch and incremen- tal training. Incremental training enables construction of the classifier’s statistical model through the addition of one or only a few samples at a time. As such, the classifier can be trained and tested during concurrent interaction with users or other system components. By leveraging this structure, a relatively simple distance metric, such as Eu- clidean distance, can be used to achieve high classification accuracy. One key advan- tage of this approach is that it requires a small amount of calculation, enabling rapid 2As used here, the term partition is unrelated to mathematical combinatorics. 10 5.; u Lon: TI .. v... . ‘10 7‘}. IR). A I C TE“ ”w. I Is... .15.... . Fl classifier training and testing, as wells as limiting the impact of high-dimensional feature vectors. 
To illustrate, consider a feature vector to represent a single point in the feature- space, an N-dimensional space where each dimension corresponds to a single feature, which in the case of MESONet would be related to network packets. Figure 2.2 is a graphical depiction of a three-dimensional feature space, where the dimensions correspond to the packet length, average interpacket delay, and an indicator feature representing whether or not the packet uses the TCP protocol. This figure shows two clusters of feature vectors, labeled A and B, each of which contains feature vectors for 3 packets. The feature vectors in cluster A have an average packet length of approximately 256 bytes, an average interpacket delay of approximately 0.5 seconds, and are all TCP packets. The feature vectors in cluster B have an average packet length of approximately 768 bytes, an average interpacket delay of approximately 2 seconds, and are not TCP packets. IsTCP ii A (I) Yes ._ ' c | x0 1 x1] x2] b [yo] Y1 1 Y2] a L201 21 1 22 | 50 0.5 l 0 1 5 2 0 Average Interpacket Delay (0) No E J I | | ; Figure 2.2: Example feature space. 11 3531300 I‘EfiOTS. l rises. I true it atesini .n 12:51 It 1.5 thE g . “Us '— Based on the preceding discussion of MESO—tree, we see that cluster A is a partition containing a single sensitivity sphere, which in turn aggregates 3 feature vectors. Similarly, we see that cluster B is a partition containing two sensitivity spheres, which together aggregate 3 feature vectors. Figure 2.2 also contains the feature vector T, which is nearest to the S1 sensitivity sphere in cluster B. If T were a testing sample, it would be assigned the same label as feature vector d, the most- similar training sample, and the nearest-cluster distance would be approximately 1.5 (The Euclidean distance between feature vectors T and d). 2.3 1999 DARPA IDS Evaluation The 1999 DARPA IDS Evaluation (henceforth “IDSEval”) was the second for— mal evaluation of intrusion detection systems, the first being the 1998 DARPA IDS Evaluation. Many different datasets are available that contain network traffic, such as [2,4,38]. However, as stated in [36], the 1998 and 1999 DARPA IDS Evaluation datasets are the only ones to contain an accurate “ground truth,” or actual labels (eg, “attack” or “normal”) for individual network packets3. The IDSEval dataset consists of tracefiles of network traffic, audit logs from selected machines, and a detection scoring truth for evaluating intrusion detection systems [33]. In order to guarantee an accurate ground truth for the IDSEval, all non-attack network traffic was automatically generated by custom software au- tomata developed by the Air Force Research Laboratory (AFRL) [33]. These custom automata were designed to simulate hundreds of programmers, secretaries, man- agers, and other users common to UNIX and Windows NT application programs. Attacks were performed manually, and each attack was recorded in the detection 3We are aware of other corpora, such as [2], which contains entirely unlabeled traffic, [4], which contains sanitized packet traces, sans payload, and [38], which contains solely attack traffic. Due to these factors, these traces are not sufficient for our purposes. 12 so 3011 ._Q ‘ :Ulf.’ ll 15511 di‘ 6 ,j 1 ”mi; .- 403 d , ’Y I 6. at r - scoring truth. Network tracefiles, containing both normal and attack traffic, were captured at two points in the IDSEval network, from “inside” and “outside” sniffers. 
The inside sniffer was only able to capture network traffic visible on the internal network, while the outside sniffer captured all inbound and outbound network traffic. For each network attack, the attack ID, start time, duration, and victim IP address(es) were recorded in the detection scoring truth as an attack record. All traffic not belonging to an attack record is assumed to be normal traffic. Network-based misuse detection is then the problem of detecting which of the packets in a given network tracefile belong to specific attack records. Similarly, network-based anomaly detection is the problem of detecting which packets in a given network tracefile are anomalous when compared to a known-normal set of packets, with the hope that these anomalous packets also belong to an attack record. We are aware of two major critiques of the IDSEval: Mahoney and Chan’s anal- ysis of background traffic [34], and McHugh’s general critique of the IDSEval [36]. McHugh primarily addresses problems with the setup and evaluation portions of the IDSEval. As a result of his critique, we do away with the psuedo-ROC curve as used in the IDSEval, instead using a traditional parametric ROC curve based upon a probabilistic measure. Phrthermore, we address the base-rate fallacy [8] in Section 5.2. The critique of the background traffic in the IDSEval by Mahoney and Chan focuses on possible simulation artifacts appearing in the IDSEval datasets. Specifi- cally, Mahoney and Chan state that three preconditions must exist for a simulation artifact to be present: 1. The attribute is well behaved in simulated traffic without attacks. 2. The attribute is not well behaved in real traffic without attacks. 13 If? If?” -1 \_l‘ 34.7": ..‘ U - I :J‘ a, ‘ . "4 Th. 'r . .“ ~17 \] 3. The attribute is not well behaved in traffic containing simulated attacks. Putting aside the fact that their analysis does not prove precondition two (though we agree it is probably correct), we address the specific attributes in ques- tion as related to our work in MESONet. TCP SYN Regularity. In [34], it is noted that the option bytes of different TCP SYN packets have the same values in the IDSEval traffic, and different values in real network traffic. We do not use the presence, or value, of option bytes in any of the features used by MESONet, as will be shown in Section 3.4. Source Address Predictability. In [34], it is noted that a limited number of source IP addresses are used in the IDSEval data, while real network traffic appears to have a steadily increasing number of source IP addresses. We do not include specific IP addresses in any of the features used by MESONet. We do, however, include the relationship between the source and destination addresses of packets, as will be shown in Section 3.4. Checksum Errors. In [34], it is noted that TCP and ICMP checksum errors occur in real network traffic, however not in the IDSEval data. MESONet un- conditionally filters out all malformed TCP, UDP, and ICMP traffic, therefore the presence (or absence) of these packets has no effect on the operational characteristics of MESONet. Packet Header Fields. In [34], it is noted that a large variety of values for the TTL and TOS fields occur in real network traffic, while few different values occur for the same fields in the IDSEval data. We do not use TTL or TOS in any of the features used by MESONet, as will be shown in Section 3.4. “Crud” [sic]. 
In [34], the following anomalies were noticed in real network traffic and not in the IDSEval data: 0 Nonzero TCP ACK fields. 14 o Nonzero URG pointer field. 0 Nonzero values in TCP reserved flags. 0 Nonzero values in the TCP reserved header bits. None of these attributes were used in MESONet, as will be shown in Section 3.4. HTTP Requests. In [34], variations in HTTP requests were noticed in real network traffic, and not in the IDSEval traffic. The particular variations mentioned deal with alphanumeric characters. As will be shown in Section 3.4, alpha-numeric characters are not used in any features appearing in MESONet. SMTP Requests. In [34], variations in SMTP requests were noticed in real network traffic, and not in the IDSEval traffic. The particular variations mentioned deal with alphanumeric characters. As will be shown in Section 3.4, alphanumeric characters are not used in any features appearing in MESONet. However, a small number of the SMTP requests in real traffic are malformed, and include binary data. This characteristic may have a small effect on MESONet, proportional to the number of symbolic characters appearing in the first 100 bytes of the SMTP request. SSH Requests. In [34], variations of SSH version identification numbers were detected in real network traffic, and only one in the IDSEval data. This characteris- tic may have a small impact on MESONet, proportional to the number of symbolic characters appearing in SSH version identification strings. 2.4 Related Work Intrusion detection systems are based on two main approaches: anomaly de- tection and misuse detection. Anomaly detection systems are those that detect abnormal behavior [9, 18, 19, 28, 29, 41, 42], while misuse detection systems are those that detect known attacks [9, 11, 27, 31, 41,47]. Furthermore, some intrusion detec- 15 tion systems are purely network-based [25, 41,44, 48]. The remainder of this section describes a number of N IDS that use techniques similar, in either approach or im- plementation, to MESONet. In [28], Lane and Brodley describe a user-based anomaly detection system. They show that user profiles based on UNIX command sequences are sufficiently unique to distinguish between different users. However, as mentioned in [19], this approach has difficulty with informed intruders and concept drift. Informed in- truders are aware of a privileged users habits, for example a proclivity for “vi” as opposed to “emacs,” and are able to masquerade as that user. Concept drift occurs when a user’s behavior changes enough over time such that the system no longer recognizes malicious behavior, or when a user’s normal behavior appears malicious. This research shows that anomaly detection systems must employ machine-learning techniques to avoid problems with concept drift. In [30], Lee et a1. use auditing programs to extract an extensive set of features from network traffic, in this case traffic from the 1998 DARPA IDS Evaluation. They then apply data mining techniques to these features to learn rules that capture the behavior of both normal and attack traffic. In their particular case, they used Bro [41] and NFR [39] for IP packet filtering, packet reassembly, and to generate the audit data. They used RIPPER [13], a classification rule learning program, to mine the audit data. An interesting characteristic of this approach is that determining classification rules is entirely automated via use of RIPPER. 
However, a shortcoming of this particular approach is the dependency upon audit data; it is unable to operate directly on network traffic. In [7], Asaka et a1. developed a method for intrusion detection based on dis- criminate analysis. They developed a host-based IDS, designed to detect network intrusions on the local machine. It monitors system logs for occurrences of 11 differ- ent system calls performed by privileged network servers, and uses discriminant and 16 principle component analysis to determine if the behavior is indicative of an intru- sion. It is capable of detecting new intrusions and can distinguish between normal and attack traffic. However, it is limited to monitoring a single host, is dependent on the logging capabilities of the host, and is not designed to operate directly on network traffic. In [45], Chen et al. describe an anomaly detection scheme based on princi- ple component classification. They assume that anomalies can be treated as out- liers, and construct a predictive model from the major and minor principle compo- nents of normal instances. They conducted experiments with the 1999 KDD Cup data [1], which contains connection records derived from the 1998 DARPA IDS Eval- uation [32], rather than raw packet traces. They show that their proposed method outperforms nearest-neighbor, density-based local outliers, and Canberra metric approaches. Although this research was successful when operating on connection records, it was not designed to operate directly on network traffic. In [35], Mahoney and Chan describe a two-part intrusion detection system, consisting of a packet header anomaly detector (PHAD) and an application layer anomaly detector (ALAD). Both parts are nonstationary, meaning that they model probabilities based on the time since the last event, rather than the average rate of events, which reduces the importance of frequent events. The authors tested the combination of PHAD and ALAD on the 1999 DARPA IDS Evaluation data. Interestingly, a point is made of mentioning that network anomaly detection is “. . . essentially the machine learning problem of modeling normal network traffic from a training set.” To put it another way, anomaly detection tries to learn the characteristics of the normal class and determine if new instances do not belong to that class. This research shows that time-based characteristics of network traffic are significant for intrusion detection. 17 Chapter 3 Features As mentioned in Section 2.1, sensors collect environmental observations that produce scalar measurements of environmental features. In MESONet, the only available sensor is the network interface, therefore all features must be derived from network packets. Features are a critical component of MESONet because they are the only link between the classifier and network packets. Furthermore, feature vectors must comprise distinguishing characteristics of network attacks in order for MESONet to be effective at network intrusion detection. The remainder of this chapter is organized as follows. Section 3.1 describes the use of features in MESONet and difficulties associated with the developmental learning approach to network intrusion detection. Section 3.2 describes the different types of features in MESONet and requirements for features to be effective in a developmental learning environment. Section 3.3 presents examples of the flexibility afforded by the MESONet framework to feature vector composition, and Section 3.4 details all features used in MESONet. 
18 3.1 Features in MESONet We define MESONet features to be scalar values that are derived from charac- teristics of network packets. A feature vector, or sample, is thus an array of scalar values in which each value has been derived from a network packet. This process of calculating a feature vector is termed feature extraction. Some features may directly depend on data within the packet (for example, packet length), while others may depend on characteristics of the packet itself (for example, the time of arrival). For every packet received by MESONet, a new feature vector is extracted, labeled, and delivered to the DLA, which then performs either training or testing. The boundary between classifiers (such as MESOTree, discussed in Section 2.2) and feature extraction is somewhat arbitrary [14]. An ideal set of features would make classification trivial, just as an omnipotent classifier has no need of sophisti- cated feature extraction. The practical goal of feature extraction is to seek discrim- inating features — those features that have similar values for samples that have the same class, and different values for samples that have different classes. One of the strengths of the developmental learning approach is that very little penalty is paid for utilizing a large number of features, which allows us to err on the side of too many features (however, performance can be penalized for using “bad” features, as will be shown in Chapter 4). The selection of the particular features to extract from network packets is the main difficulty in the application of developmental learning to network intrusion detection. In more traditional applications of developmental learning, for example in robotics [51], sensors and features are naturally related: Each feature is extracted from a sensor that relates to the physical environment in which the robot operates. However, in MESONet all features must be extracted from packets, which has no such natural model. 19 In MESONet, selecting the particular features to be extracted from network packets relies heavily on domain knowledge, much as in other forms of classifica— tion [14]. In general, the approach we have taken is to include those features that “cover” the distinguishing characteristics of all network attacks appearing in the IDSEval data, without developing specific features for that attack. For example, consider the “LAND” attack [26]. This attack can be detected by a single TCP SYN packet that has equal source and destination IP addresses and port numbers. We can detect LAND by extracting a single indicator feature that checks for exactly this condition, for example: if( (pkt.src_address == pkt.dst_address) && (pkt.src_port == pkt.dst-port) at pkt.is_tcp && pkt.is_syn ) { return LAND_ATTACK; else return NORMAL; However, this approach is not scalable when we consider that the IDSEval consists of many different types of attacks, not to mention that this approach is ineffectual at detecting zero—day attacks. Alternatively, we can extract a series of more general features, which when used in concert are capable of detecting the LAND attack. These features include indicator features for the type of packet (TCP, UDP, or ICMP), as well as metrics for the similarity of source and destination IP address and port numbers. These features assume certain values when extracted from a LAND attack packet, and different values when extracted from a normal packet. We then rely upon MESO- tree to distinguish between the two different feature vectors. 
Table 3.1 is a listing of the features that detect the LAND attack, how they are calculated, and the values 20 Feature Calculation Attack Normal IP Address Distance src_ip[0 XOR dst.ip0 0 ¢ 0 src_ip.1‘ XOR dst.ip_1] 0 7f 0 src_ip_2‘ XOR dst.ip.2, 0 74 0 src_ip_3] XOR dst_ip_3 0 aé 0 Port Distance src_port XOR dst..port 0 75 0 isTCP pkt.is-TCP ? True : False True True/ False isSYN pkt.is_SYN ? True : False True True/ False Table 3.1: Features used to detect the LAND attack. that distinguish between attack and normal packets. It should be noted that, due to training, only packets that are most similar the values listed in the Attack column are considered attacks. 3.2 Feature Types and Requirements Four different types of features are used in MESONet: packet features, calcu- lated features, windowing features, and grouping features. Packet Features. These features are directly related to values and charac- teristics intrinsic to a packet. For example, packet length, protocol type, time of arrival, and flag values are all packet features. Calculated Features. These features are not derived directly from fields in network packets. Rather, they include composites or transformations of packet characteristics, as well as multi—packet characteristics, such as the running average of packet length, interpacket delay, and average packet rate. We refer to calculated features that have a dependence upon past packets as exhibiting “memory.” For example, average interpacket delay depends not only on the arrival time of the current packet, but also on the arrival time of previous packets. Windowing Features. These “meta-features” alter the feature vector so as to include features from multiple packets. The rationale for using windowing features is that many intrusions cannot be detected in a single packet, so we must include 21 features from multiple packets within a single feature vector. For example, a five— packet windowing feature concatenates all features from four previous packets to the features from a fifth packet, generating a single, larger feature vector. Grouping Features. These “meta-features” are used to select the particular feature extractor to use for any captured packet. Grouping enforces that all packets using a particular feature extractor either belong to the same flow, are destined for the same host, or originated from the same host. Grouping features are particularly useful when combined with windowing, in which case all packets used to construct final feature vectors are guaranteed to be related (they are destined for, and/ or originate from, the same host). In addition to being one of these types, features must satisfy requirements regarding ordering and boundedness to be effective within a developmental learning environment. These requirements help to ensure that the calculation of a distance metric (such as Euclidean distance; see Section 2.2) is meaningful with respect to observed features. Ordering. Feature values must obey a semantically correct strict weak order- ingl. For example, when given three different samples that have values at, y, z for the same feature, such that :c < y < z and y — a: < z — y, it is semantically valid to say that :r: and y are “closer” to each other than to z. This requirement stems from the use of a metric to determine the distance between multiple feature vectors. 
In MESONet, this requirement manifests itself in the handling of categorical variables, specifically nominal categorical variables such as the IP protocol and the ICMP message type fields. For example, packet length is a quantitative variable - it has meaning as a number, and obeys a semantically correct strict weak ordering. However, ICMP message type is a categorical variable; values include 1 (echo-reply), lA strict weak ordering is a partial order over the “less than” relation. By semantically correct strict weak ordering, we mean that it must “make sense” to compare the feature values. 22 ‘ ”w. W. N t I. . . l 91m.” r H“ Ml 5. 9.7» I; u o . m. - N v ”r u m .. ...- .1 it 3 (destination unreachable), and 8 (echo-request), among others. If we applied a strict weak ordering to ICMP message type, we would determine that echo-reply (1) is “closer” to destination unreachable (3) than to echo-request (8) — obviously se- mantically meaningless. Similarly, the protocol field in IP datagrams is a categorical variable. If we applied a strict weak ordering to this field we would find that ICMP (protocol 1), is “closer” to TCP (protocol 6), than UDP (protocol 17). Obviously, saying that ICMP is “close” to TCP is also semantically meaningless. To mitigate this problem, we split each categorical variable appearing in net- work packets into a set of indicator variables, one for each possible category. For example, we split the IP protocol field into three different indicator variables: isTCP (1 if the packet is a TCP packet, 0 otherwise), isUDP (1 if the packet is a UDP packet, 0 otherwise), and isICMP (1 if the packet is an ICMP packet, 0 otherwise). Each packet will only have one of these indicator variables set, and it is now seman- tically correct to say that a packet with the isTCP variable set is “close” to another packet that also has the isTCP variable set — in fact, they are equal in that feature. Boundedness. The range of feature values must be bounded. For example, given any feature x, the maximum value that can be assumed by :1: must be known, and fixed. This requirement, too, stems from the use of a metric to determine the distance between multiple feature vectors. In MESONet, this requirement manifests itself in the handling of quantitative variables, specifically in “counting” variables such as “the count of ICMP packets” and “the count of TCP fragments.” This requirement is particularly important when the quantity of training traffic is significantly different than testing traffic. For example, let us consider a feature “ICMPCount,” that counts the number of ICMP packets captured. If training consisted of one hour of traffic, during which time only 10 ICMP packets were captured, and testing consisted of 10 hours of traffic with an average ICMP rate of 10 per hour, then the final ICMP packets will 23 have a Euclidean distance of at least 90 from the training traffic. Applying a known bound removes this dependence upon duration of testing and training. To mitigate this problem, we encode such unbounded features as a percentage. For example, ICMPCount is encoded as the percentage of ICMP traffic versus IP traffic. 3.3 Feature Composition MESONet is a framework for extracting features from network traffic in a man— ner suitable for use with developmental learning algorithms. To facilitate rapid training and testing, MESONet supports a variety of runtime selectable features, and is easily extended to incorporate new features. 
Dynamic composition of feature sets, together with the ease of developing new feature classes, also allow us to easily incorporate other research efforts into the MESONet framework. For example, both Bro [41] and NPR [39] produce audit data that is suitable for inclusion into MESONet as features. Similarly, system call information [7] and nonstationary models [35] may also be incorporated into MESONet as features. In short, any scalar value that can be calculated from a packet stream can be easily included in MESONet. Appendix A explores the development of new features in detail. For any given training or testing period, MESONet dynamically assembles a hierarchy of Java objects representing the current feature set, or the list of fea- tures to be extracted from each network packet. These objects perform any needed calculation to extract features from packets. The composition of the feature set is determined by an XML configuration file. For example, (featureStrategy c1assname="SinglePacketFeatures"/> is an example of a very simple feature set, comprised of a single Java class, in this case SinglePacketFeatures. Although 24 Figure 3.1: A feature set for MESONet, consisting of all the features extracted by both the SinglePacketFeatures and PayloadCountFeatures classes. SinglePacketFeatures is contained within a single XML element, it extracts 20 different features from every network packet. The ability for a class to provide mul- tiple features allows logically related features to be collected in a single location, which not only improves performance but enhances maintainability. The particular features that are extracted by each class mentioned here are described in detail in Section 3.4. Figure 3.1 describes a more complex feature set. In this case, CompoundI-‘eatures is used to aggregate the features that are extracted by both SinglePacketFeatures and PayloadCountFeatures into a single fea- ture vector. Again, although it is contained within a single XML element, PayloadCountFeatures extracts 24 features from every network packet. It should be noted that CompoundFeatures does not itself extract any features from network packets; it is used only to assist in assembling the final feature vector. In this case, the resultant feature vector comprises 44 features; 20 from SinglePacketFeatures, and 24 from PayloadCountFeatures. SinglePacketFeatures and PayloadCountFeatures are examples of packet features, as described in Section 3.2. Figure 3.2 describes a feature set containing a 5—packet window of calculated features. SlidingWindoeratures maintains a list of features that were previously extracted by the classes it encapsulates. This 25 (featureStrategy classname="PacketNumberFeatures"/> Figure 3.2: A feature set for MESONet, consisting of a 5-packet sliding window of all the features extracted by PacketNumberFeatures. list is treated as a FIFO sliding window, similar in spirit to TCP sliding windows, whereby older features are replaced by newer features after a certain number have been stored. There is no restriction placed upon the number of features that may be extracted at a time - indeed, PacketNumberFeatures extracts 6 different features from each packet. Finally, Figure 3.3 describes a 5-packet window of SinglePacketFeatures that is grouped by destination IP address. In this case, HostFeatures instantiates a new feature set hierarchy for every unique destination IP address occurring in all captured network packets. 
This ensures that the same objects will be used to extract features from subsequent network packets belonging to the same packet stream. This is particularly useful when used with SlidingWindoeratures, as it ensures that all features in the window belong to the same packet stream. 3.4 Feature Detail As mentioned in Section 3.3, the ability of a single Java class to extract multi- ple features from a network packet provides a logical grouping for related features. Based on this grouping, this section describes a taxonomy of the features that can currently be extracted by the MESONet framework. Table 3.2 summarizes all indi- vidual features described in this section, and contains the feature name, the class in 26 Figure 3.3: A feature set for MESONet, consisting of a 5—packet sliding window of all the features extracted by SinglePacketFeatures, grouped by destination IP address. which it is defined, and a brief description. CompoundFeatures. The CompoundFeatures class, as alluded to in Sec- tion 3.3, aggregates features extracted by other classes into a single feature vector; in essence, it is a container for other feature extractors. It does not, itself, extract features from network packets. Configuration parameters to CompoundFeatures are: 0 subStrategy: Any number of XML elements containing configuration infor- mation for the feature extractors to be aggregated. DerivativeFeatures. The DerivativeFeatures class, similar to CompoundFeatures, is also a container for other feature extractors, how- ever DerivativeFeatures also augments the resultant feature. vector with the first derivative of the contained features. For example, if the upcoming PacketNumberFeatures class was contained by the DerivativeFeatures class, then the resultant feature vector would contain scalar values for both the percent- age of TCP traffic and the rate at which the percentage of TCP traffic has been changing. The derivative is calculated by averaging the change between successive 27 Feature Name Class Description Derivative Features DerivativeFeatures Augments contained feature class with its derivative. Sliding Window SlidingWindoeratures Sliding window of previous feature vectors. Host Grouping HostFeatures Groups feature vectors by source and /or destination. Compound CompoundFeatures Aggregates features from con- tained feature classes. IP Address Distance SinglePacketFeatures Distance between source and destination IP addresses. Port Number Distance SinglePacketFeatures Distance between source and destination port numbers. 
HostFeatures. The HostFeatures class, as mentioned in Section 3.3, constructs a new feature set hierarchy based upon the source or destination of captured packets. Internally, HostFeatures maintains a hash table mapping IP addresses to feature extractors; a sketch of this grouping follows the parameter list below. For every captured packet, a lookup in this hash table is performed, based on either the source or the destination IP address of the packet. The resultant feature extractor is then used to extract features from the current packet. This approach is most powerful when used in concert with stateful feature extractors, such as SlidingWindowFeatures or DerivativeFeatures, which are then guaranteed that all feature extractions are performed on packets from the same stream. Finally, HostFeatures can also be nested with itself to allow feature extraction from the same source-destination IP address pair. Configuration parameters to HostFeatures are:
- sortby: May have the value "src" or "dst", which provides either source or destination grouping.
- subStrategy: An XML element containing configuration information for the contained feature extractor; can be another HostFeatures with different sorting, to provide unique source-destination grouping.
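The sketch below illustrates this per-host grouping, again reusing the assumed FeatureStrategy and Packet types from the earlier sketches: a hash map keyed by IP address hands each packet to the feature extractor hierarchy created for that host. The factory argument, the grouping flag, and the address accessors are illustrative assumptions.

import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class HostGroupingSketch implements FeatureStrategy {
    private final Map<String, FeatureStrategy> perHost = new HashMap<>();
    private final Supplier<FeatureStrategy> subStrategyFactory;
    private final boolean groupBySource;               // "src" vs. "dst" sortby

    HostGroupingSketch(Supplier<FeatureStrategy> factory, boolean groupBySource) {
        this.subStrategyFactory = factory;
        this.groupBySource = groupBySource;
    }

    @Override
    public double[] extractFeatures(Packet packet) {
        String key = groupBySource ? sourceAddress(packet) : destinationAddress(packet);
        // A fresh sub-strategy hierarchy is created for each previously unseen host,
        // so stateful extractors only ever see packets from "their" host.
        FeatureStrategy strategy = perHost.computeIfAbsent(key, k -> subStrategyFactory.get());
        return strategy.extractFeatures(packet);
    }

    // Placeholder accessors; a real implementation would read the IP header.
    private String sourceAddress(Packet packet) { return "0.0.0.0"; }
    private String destinationAddress(Packet packet) { return "0.0.0.0"; }
}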
PacketNumberFeatures. The PacketNumberFeatures class calculates the percentage of traffic that meets certain protocol characteristics. For example, this class calculates the percentage of traffic that is TCP. There are no configuration parameters to PacketNumberFeatures. The specific features calculated by PacketNumberFeatures are shown in Table 3.3.

Feature Name | Description
% SYN | The percentage of traffic with the "SYN" TCP flag set. SYN packets signify the beginning of a TCP connection, and are also used during some DoS attacks.
% FIN | The percentage of traffic with the "FIN" TCP flag set. FIN packets signify the end of a TCP connection, and are also used by attackers for reconnaissance purposes.
% RST | The percentage of traffic with the "RST" TCP flag set. RST packets are used to abort an improper TCP connection setup, and are used during some DoS attacks.
% TCP | The percentage of traffic that belongs to the TCP protocol.
% UDP | The percentage of traffic that belongs to the UDP protocol.
% ICMP | The percentage of traffic that belongs to the ICMP protocol.
Table 3.3: Packet number feature detail.

PayloadCountFeatures. The PayloadCountFeatures class extracts features from the payload portion of captured packets. Rather than searching for specific strings of characters, or particular application protocols, PayloadCountFeatures counts the number of certain characters appearing in the payload. The alphabet of characters counted includes printable symbolic characters, many of which are used in network attacks (for example, thousands of backslash characters appear in the payload of the "back" attack). The motivation behind counting characters is to avoid problems with ordering, as described in Section 3.2. Configuration parameters to PayloadCountFeatures are:
- payloadLength: Specifies the maximum number of payload bytes to scan for counted characters.
The specific features calculated by PayloadCountFeatures are:
- Character Alphabet: The count of certain characters appearing in the packet payload. One feature is extracted for each of the following characters: { } / \ ' ! @ # $ % * ( ) - _ = + ` ' < > , . ?

PayloadValueFeatures. The PayloadValueFeatures class extracts features from the payload portion of captured packets. In opposition to PayloadCountFeatures and the ordering principle discussed in Section 3.2, this class copies the ASCII value of payload bytes directly into the feature vector. This feature extractor is used primarily as motivation to follow the ordering principle. Configuration parameters to PayloadValueFeatures are:
- payloadLength: Specifies the maximum number of bytes to copy into the feature vector.
The specific features calculated by PayloadValueFeatures are:
- Character Value(s): One feature per payload byte; the ASCII values of the first payloadLength bytes are copied directly into the feature vector.
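Returning to PayloadCountFeatures, the sketch below illustrates the character counting it performs, reusing the assumed FeatureStrategy and Packet types from the earlier sketches. The alphabet string approximates the (partially garbled) character list above, and the payload accessor is a placeholder.

class PayloadCountSketch implements FeatureStrategy {
    // Approximation of the counted-character alphabet described in the text.
    private static final String ALPHABET = "{}/\\'!@#$%*()-_=+`'<>,.?";
    private final int payloadLength;                  // maximum bytes to scan

    PayloadCountSketch(int payloadLength) {
        this.payloadLength = payloadLength;
    }

    @Override
    public double[] extractFeatures(Packet packet) {
        byte[] payload = payload(packet);
        double[] counts = new double[ALPHABET.length()];
        int limit = Math.min(payloadLength, payload.length);
        for (int i = 0; i < limit; i++) {
            int index = ALPHABET.indexOf((char) (payload[i] & 0xff));
            if (index >= 0) {
                counts[index]++;                      // one feature per alphabet character
            }
        }
        return counts;
    }

    // Placeholder; a real implementation would return the captured payload bytes.
    private byte[] payload(Packet packet) { return new byte[0]; }
}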
SinglePacketFeatures. The SinglePacketFeatures class extracts features from each packet based upon the packet's protocol, flags, source, and destination. In short, it extracts features from the packet header. There are no configuration parameters to SinglePacketFeatures. The specific features calculated by SinglePacketFeatures are shown in Table 3.4.

Feature Name | Description
IP Distance | Actually four features, one for each IP address byte; IP distance is used to measure the similarity between source and destination network masks. The calculation is performed by taking the XOR of the source and destination IP address bytes: src[i] ⊕ dst[i]. These feature values will be zero for packets that have identical source and destination addresses, and nonzero otherwise.
Port Number Distance | Similar to IP distance, port number distance measures the similarity between source and destination port numbers, and is also calculated via an XOR: src_port ⊕ dst_port. This feature is only extracted from TCP and UDP packets.
Fragment Offset | The packet's fragment offset from the IP header.
Packet Length | The total length of the packet.
Packet Header Length | The length of the header portion of the packet.
Is TCP | An indicator feature; set to 1 if the captured packet is a TCP packet, and 0 otherwise.
TCP ACK | An indicator; set to 1 if the captured packet is a TCP packet and the ACK bit is set, 0 otherwise.
TCP FIN | An indicator; set to 1 if the captured packet is a TCP packet and the FIN bit is set, 0 otherwise.
TCP PSH | An indicator; set to 1 if the captured packet is a TCP packet and the PSH bit is set, 0 otherwise.
TCP RST | An indicator; set to 1 if the captured packet is a TCP packet and the RST bit is set, 0 otherwise.
TCP SYN | An indicator; set to 1 if the captured packet is a TCP packet and the SYN bit is set, 0 otherwise.
TCP URG | An indicator; set to 1 if the captured packet is a TCP packet and the URG bit is set, 0 otherwise.
Is UDP | An indicator feature; set to 1 if the captured packet is a UDP packet, and 0 otherwise.
Is ICMP | An indicator feature; set to 1 if the captured packet is an ICMP packet, and 0 otherwise.
Table 3.4: Single packet feature detail.

SlidingWindowFeatures. The SlidingWindowFeatures class, as alluded to in Section 3.3, creates a FIFO sliding window, similar to a TCP sliding window, of the features extracted by its contained class. It can be thought of as adding "history" to the feature vector: SlidingWindowFeatures adds a configurable number of prior feature vectors to every feature vector extracted by the contained class. SlidingWindowFeatures does not, itself, extract features from network packets; a sketch of this window appears at the end of this section. Configuration parameters to SlidingWindowFeatures are:
- subStrategy: An XML element containing configuration information for the contained feature extractor; can be CompoundFeatures.

TimeFeatures. The TimeFeatures class extracts time-based features from network packets, specifically mean interpacket delay and packets-per-second statistics. When combined with HostFeatures, TimeFeatures can calculate these statistics for all packets destined for a host, thus helping to detect DoS attacks. Note that using DerivativeFeatures in conjunction with TimeFeatures would calculate the rate of change of interpacket delay and the packet acceleration. The specific features calculated by TimeFeatures are shown in Table 3.5. Configuration parameters to TimeFeatures are:
- windowSize: The number of packets over which statistics are to be calculated.

Feature Name | Description
Mean Interpacket Delay | The average of the time between subsequent network packets.
Packets Per Second | The number of network packets arriving during the last second.
Table 3.5: Time feature detail.
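Before turning to the misuse detection experiments, which rely heavily on SlidingWindowFeatures, the sketch below illustrates the FIFO window behavior described above, reusing the assumed FeatureStrategy and Packet types from the earlier sketches. Zero-padding the emitted vector until the window fills is an assumption; the thesis does not specify the warm-up behavior.

import java.util.ArrayDeque;
import java.util.Deque;

class SlidingWindowSketch implements FeatureStrategy {
    private final FeatureStrategy subStrategy;
    private final int windowSize;                     // number of prior vectors retained
    private final Deque<double[]> window = new ArrayDeque<>();

    SlidingWindowSketch(FeatureStrategy subStrategy, int windowSize) {
        this.subStrategy = subStrategy;
        this.windowSize = windowSize;
    }

    @Override
    public double[] extractFeatures(Packet packet) {
        double[] newest = subStrategy.extractFeatures(packet);
        window.addLast(newest);
        if (window.size() > windowSize) {
            window.removeFirst();                     // oldest features fall out of the window
        }

        // Concatenate the windowed feature vectors, oldest to newest.
        double[] combined = new double[windowSize * newest.length];
        int offset = 0;
        for (double[] vector : window) {
            System.arraycopy(vector, 0, combined, offset, vector.length);
            offset += vector.length;
        }
        return combined;                              // remaining slots stay zero until full
    }
}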
Chapter 4

Misuse Detection

Misuse detection systems search network traffic for known attack signatures. Although misuse detection systems are unable to reliably detect zero-day attacks, they are able to detect known attacks with a very low false positive rate [20,21], and are thus a critical component of effective network intrusion detection. Experiments presented in this chapter show largely negative results; that is, although MESONet achieves good accuracy when performing misuse detection, it does not achieve a good recall rate. These experiments were, however, integral to our understanding of feature extraction, and ultimately led us to better results with anomaly detection, as will be shown in Chapter 5.

The remainder of this chapter describes misuse detection with MESONet, and is organized as follows. Section 4.1 describes the training and testing procedures for misuse detection with MESONet. Section 4.2 describes the analyses used to evaluate the performance of misuse detection with MESONet. Sections 4.3 through 4.6 describe experimental results using different feature sets, and Section 4.7 summarizes misuse detection with MESONet.

Tracefile | Source | Use | Tracefile Size | Packet Count | Attacks
A | 3/8/1999 | training | 329MB | 1263543 | 7; 7178 packets
B | 3/11/1999 | testing | 330MB | 1263543 | 9; 25047 packets
Table 4.1: Tracefiles used for misuse detection with MESONet.

4.1 Training and Testing Procedures

Performing misuse detection with MESONet requires both training and testing phases. Table 4.1 summarizes the tracefiles used to perform training and testing. Tracefile A, used for training, contains 1.25M normal packets and 7,178 attack packets, comprising 7 different network attacks. Tracefile B contains 1.26M normal packets and 25,047 attack packets, comprising 9 different network attacks. It is typical in network intrusion detection for the normal traffic to far outweigh the attack traffic [8].

Figure 4.1, termed a packet graph, shows the specific attacks occurring in the training data (Tracefile A). Each point in Figure 4.1 represents one attack packet (for the sake of clarity, normal packets are not plotted; there are 1.25M of them). As shown in this figure, 7 different attacks occur in Tracefile A: NTInfoscan, pod, back, httptunnel, land, secret, and ps attack [26]. Of these attacks, httptunnel contains the most attack packets, 5,061 out of 7,178. Figure 4.2 is a packet graph of the attacks occurring in Tracefile B, the tracefile used for misuse detection testing. As shown in this figure, 8 different attacks occur in Tracefile B, comprising 25,047 packets. Furthermore, only one of the attacks in Tracefile B is also represented in Tracefile A: the LAND attack. Effective misuse detection requires that MESONet be trained with attacks that are representative of the attacks expected to appear in the testing traffic. However, the realities of network intrusion detection dictate that misuse detection systems also be evaluated using non-trained attacks. For this reason, the testing data used with MESONet contains 7 non-trained attacks.

Figure 4.1: Tracefile A ground truth packet graph.
All misuse detection experiments are performed with a different feature set. Two different tests are performed with each feature set: a resubstitution test and a validation test. For the resubstitution test, MESONet is both trained and tested on the same data, in this case Tracefile A. This test is conducted to show the sufficiency of the feature set for learning-based misuse detection. In theory, the results of a resubstitution test should be perfect; however, this is not always the case. Intuitively, resubstitution tests are performed to make sure that the feature set contains discriminating features; a poor resubstitution result suggests that the feature set does not contain enough information to discriminate between attack and normal traffic. For the validation test, MESONet is trained on the entirety of Tracefile A and tested on the entirety of Tracefile B. This test is conducted to show the applicability of what MESONet learned during training to a different tracefile.

Figure 4.2: Tracefile B ground truth packet graph.

4.2 Performance Analysis

The method by which the performance of an intrusion detection system is judged is as important as the intrusion detection system itself [8,32,36]. In order to provide an unbiased performance analysis of misuse detection with MESONet, we generate packet graphs and calculate precision, recall, false positive rate, accuracy, and a confusion matrix for both resubstitution and validation tests. Packet graphs allow an intuitive understanding of how MESONet classifies each packet by showing the attack classification of each packet.

A confusion matrix, also known as a correlation matrix, tabulates information about actual and predicted classifications performed by a classification system [5]. Referring to the sample confusion matrix in Table 4.2, the value in cell a is the number of correct normal predictions, also known as true negatives. Cell b is the number of normal packets that are predicted to be an attack, also known as false positives. Cell c is the number of attack packets that are predicted to be normal, also known as false negatives, and cell d is the number of attack packets that are predicted to be attack, also known as true positives.

Predicted: Normal | Attack
Actual Normal: a | b
Actual Attack: c | d
Table 4.2: Sample confusion matrix.

False positive rate is defined as the proportion of normal traffic that is incorrectly predicted to be an attack. If TN = True Negatives and FP = False Positives, the false positive rate FPR is defined as FPR = FP/(FP + TN). A low false positive rate is desirable. Intuitively, the false positive rate is a measure of the trustworthiness of attack predictions.
The false positive rate is also the typical method of evaluating intrusion detection systems, but is somewhat suspect due to the base-rate fallacy [8], which states that false positive rates well below .0001% may be needed to account for the vast number of packets per day, typically well over 1 million.

Precision is defined as the proportion of predicted attacks that are correct. If TP = True Positives and FP = False Positives, precision P is defined as P = TP/(TP + FP). A high precision is desirable. Intuitively, precision measures the effectiveness of training to distinguish between attack and normal classes.

Recall is defined as the proportion of all attacks that are correctly predicted. Recall is equivalent to the true positive rate. If TP = True Positives and FN = False Negatives, recall R is defined as R = TP/(TP + FN). A high recall is desirable. Intuitively, recall measures the effectiveness of training to detect attacks.

Accuracy is defined as the proportion of correctly identified samples. If S = Samples, TP = True Positives, and TN = True Negatives, accuracy A is defined as A = (TP + TN)/S. A high accuracy is desirable.
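These four metrics are computed directly from the confusion matrix cells. The sketch below does so in Java; as a check, feeding it the Experiment 1 validation counts reported later in this chapter (TN = 1312660, FP = 1351, FN = 25024, TP = 23) reproduces the published values to rounding: FPR .0010, accuracy .9803, recall .0009, precision .0167.

final class DetectionMetrics {
    static double falsePositiveRate(long tn, long fp) {
        return (double) fp / (fp + tn);
    }

    static double precision(long tp, long fp) {
        return (double) tp / (tp + fp);
    }

    static double recall(long tp, long fn) {          // equivalently, the true positive rate
        return (double) tp / (tp + fn);
    }

    static double accuracy(long tp, long tn, long samples) {
        return (double) (tp + tn) / samples;
    }

    public static void main(String[] args) {
        // Experiment 1 validation counts from Figure 4.4.
        long tn = 1312660, fp = 1351, fn = 25024, tp = 23;
        long samples = tn + fp + fn + tp;
        System.out.printf("FPR=%.4f P=%.4f R=%.4f A=%.4f%n",
                falsePositiveRate(tn, fp), precision(tp, fp),
                recall(tp, fn), accuracy(tp, tn, samples));
    }
}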
4.3 Single Packet Features

In this section we describe experiments based on features that can be extracted from a single network packet. Experiment 1 uses only the SinglePacketFeatures class; Experiment 2 adds the PayloadCountFeatures class, which is then replaced by PayloadValueFeatures in Experiment 3. All feature classes used here are described in Section 3.4. Complete feature set configurations for all experiments conducted in this chapter can be found in Appendix B.

Experiment 1. The first misuse detection experiment performed with MESONet used only the SinglePacketFeatures class. This class includes features such as IP address distance, protocol indicators, and payload length. The rationale behind selecting SinglePacketFeatures as the first test was that all network attacks have packet characteristics. Indeed, some network attacks, such as the "land" and "pod" attacks, are detectable based solely on packet characteristics.

As shown by the resubstitution confusion matrix in Figure 4.3, less than half of the attack packets were correctly classified. The recall rate of 46% verifies that this feature set alone is insufficient for attack detection. This confusion matrix does show an interesting characteristic, one that will be carried through to other tests: the relationship between the false positive and recall rates. A low false positive rate shows that MESONet is correctly learning the characteristics of normal packets; however, a low recall rate shows that MESONet is not learning the characteristics of attack packets. In other words, based on the results of the resubstitution test, we can hypothesize that this feature set adequately captures the behavior of normal packets, but not of attack packets.

The validation confusion matrix shown in Figure 4.4 verifies that SinglePacketFeatures is not a viable feature set for attack detection. Although accuracy is still high due to the large number of normal packets, recall and precision are low, 0.09% and 1.6% respectively. Furthermore, the false positive rate has reached 0.1%, representing 1,351 false alerts. Using SinglePacketFeatures alone describes normal packet characteristics with 98% accuracy, but is incapable of describing attack packet characteristics. All future experiments include SinglePacketFeatures because, based on domain knowledge, we know that network attack detection must be predicated upon individual packet characteristics. Upcoming experiments show that although SinglePacketFeatures alone is not a sufficient feature set, performance is improved when SinglePacketFeatures is used in conjunction with other feature classes.

Predicted: Normal | Attack
Actual Normal: 1256206 | 159
Actual Attack: 3819 | 3359
False Positive Rate .0001 | Accuracy .9968 | Recall .4679 | Precision .9548
Figure 4.3: Experiment 1 resubstitution test results: packet graph (top) and confusion matrix (bottom).

Predicted: Normal | Attack
Actual Normal: 1312660 | 1351
Actual Attack: 25024 | 23
False Positive Rate .0010 | Accuracy .9803 | Recall .0009 | Precision .0167
Figure 4.4: Experiment 1 validation test results: packet graph (top) and confusion matrix (bottom).

Experiment 2. The second misuse detection experiment performed with MESONet augments SinglePacketFeatures with payload characteristics. Specifically, we add PayloadCountFeatures to the feature set. PayloadCountFeatures adds features for the count of certain characters appearing in each packet's payload. The rationale behind adding payload characteristics to the feature set is that many network attacks contain payload designed specifically to exploit network applications. For example, the "back" attack payload contains thousands of "/" characters.

The resubstitution test confusion matrix in Figure 4.5 differs from that of Experiment 1 in two respects. First, the false positive rate increased from 0.01% to 0.06%, showing that this feature set has more difficulty distinguishing normal packets from attack packets. Second, the recall rate increased to 68%, showing that this feature set is better at describing the characteristics of attack packets.

The confusion matrix shown in Figure 4.6 verifies that SinglePacketFeatures, even when used in conjunction with PayloadCountFeatures, is still an insufficient feature set for misuse detection. The false positive rate is at the higher level of 0.37%, representing 4,956 false alerts. The high false positive rate confirms that this feature set does not accurately describe the characteristics of normal packets. However, the recall rate has risen to 0.11%, representing an increase of 5 correctly detected attacks. The packet graph in Figure 4.6 shows that there is a weak correlation between attack predictions made by MESONet and actual attack packets. For example, MESONet predicts a large series of "httptunnel" attack packets at the same time that the ground truth shows a series of "perl" attack packets. This behavior shows that, based on this particular feature set, there are similarities between these two attacks. With the exception of Experiment 3, all future experiments include PayloadCountFeatures because, based on domain knowledge, we know that network attacks are payload specific. Upcoming experiments show that although SinglePacketFeatures and PayloadCountFeatures are not a sufficient feature set, performance is improved by the addition of other feature classes.
Predicted: Normal | Attack
Actual Normal: 1255595 | 770
Actual Attack: 2228 | 4950
False Positive Rate .0006 | Accuracy .9976 | Recall .6896 | Precision .8653
Figure 4.5: Experiment 2 resubstitution test results: packet graph (top) and confusion matrix (bottom).

Predicted: Normal | Attack
Actual Normal: 1309055 | 4956
Actual Attack: 25019 | 28
False Positive Rate .0037 | Accuracy .9776 | Recall .0011 | Precision .0056
Figure 4.6: Experiment 2 validation test results: packet graph (top) and confusion matrix (bottom).

Experiment 3. The third misuse detection experiment performed with MESONet augments SinglePacketFeatures with PayloadValueFeatures. PayloadValueFeatures adds features for the ASCII value of the first 100 characters appearing in each packet's payload. It should be noted that the features added by the PayloadValueFeatures class violate the ordering requirement described in Section 3.2. In fact, the poor performance of this experiment was the motivation for the ordering requirement. As a result of this poor performance, upcoming experiments do not use PayloadValueFeatures.

Compared to previous resubstitution tests, the use of PayloadValueFeatures has a detrimental effect on misuse detection performance. The confusion matrix shown in Figure 4.7 lists 15,302 false positives and only 455 true positives. The recall rate of 6.3% verifies that this feature set is not sufficient for detecting attacks, while the high false positive rate of 1.2% shows that this feature set is incapable of distinguishing between attack and normal traffic.

Even though the recall rate of 0.21%, shown in Figure 4.8, is larger than that of previous validation tests, the number of false positives has increased dramatically, to 18,125. The greater recall rate suggests that this feature set is better at detecting attacks; however, the false positive rate shows that MESONet is still incapable of distinguishing between attack and normal packets. Furthermore, the packet graph in Figure 4.8 shows that MESONet is detecting only a single attack, the "NTinfoscan" attack, and that the attack predictions are in no way correlated to actual attack packets.

Predicted: Normal | Attack
Actual Normal: 1241063 | 15302
Actual Attack: 6723 | 455
False Positive Rate .0121 | Accuracy .9825 | Recall .0633 | Precision .0288
Figure 4.7: Experiment 3 resubstitution test results: packet graph (top) and confusion matrix (bottom).

Predicted: Normal | Attack
Actual Normal: 1295886 | 18125
Actual Attack: 24994 | 53
False Positive Rate .0137 | Accuracy .9677 | Recall .0021 | Precision .0029
Figure 4.8: Experiment 3 validation test results: packet graph (top) and confusion matrix (bottom).

4.4 Sliding Window Features

In this section we describe experiments based on the use of sliding window features.
Experiment 4 adds SlidingWindowFeatures to the feature set from Experiment 2, Experiments 5 and 6 add different configurations of TimeFeatures, and Experiment 7 uses a larger sliding window. All feature classes used here are described in Section 3.4. Complete feature set configurations for all experiments conducted in this chapter can be found in Appendix B.

Experiment 4. The fourth misuse detection experiment performed with MESONet applies a two-packet sliding window to the feature set from Experiment 2. Specifically, we use SlidingWindowFeatures to construct feature vectors from the SinglePacketFeatures and PayloadCountFeatures of two packets. The rationale for using a sliding window of features is that very few network attacks consist of only a single packet. For example, in Tracefiles A and B, only the "land" attack consists of a single packet. Therefore, accurate attack detection must consider features from multiple packets. It should be noted that the sliding window is updated with packets as they are captured off the network; that is to say, there may or may not be any correlation between the two feature sets in the window.

Through the use of SlidingWindowFeatures, both recall and false positive rates improve dramatically over previous experiments, as shown by Figure 4.9. Indeed, this particular feature set is the first to suggest that recall and false positive rates are independent of each other, and that they can both reach near-optimal values. Figure 4.9 lists a recall rate of 97.9%, verifying that the feature set is sufficient for detecting attacks, while the false positive rate of .01% shows that the training data and feature set together are capable of distinguishing between attack and normal packets. Of course, considering that these are resubstitution test results, near-optimal values are to be expected.

As shown by the confusion matrix in Figure 4.10, the number of correct attack predictions has increased dramatically as compared to previous experiments, from the previous maximum of 53 in Experiment 3 to the present 358. At the same time, false positives have been reduced from 4,956 in Experiment 2 to 3,050. Use of SlidingWindowFeatures is clearly beneficial; however, the false positive rate is still high. Subsequent experiments attempt to reduce the false positive rate while minimizing the adverse effect on recall rate. It should be noted that the 1.42% recall rate in this experiment is the maximum recall rate achieved on misuse detection with MESONet.

Predicted: Normal | Attack
Actual Normal: 1256224 | 141
Actual Attack: 145 | 7033
False Positive Rate .0001 | Accuracy .9997 | Recall .9797 | Precision .9803
Figure 4.9: Experiment 4 resubstitution test results: packet graph (top) and confusion matrix (bottom).

Predicted: Normal | Attack
Actual Normal: 1310961 | 3050
Actual Attack: 24689 | 358
False Positive Rate .0023 | Accuracy .9792 | Recall .0142 | Precision .1050
Figure 4.10: Experiment 4 validation test results: packet graph (top) and confusion matrix (bottom).

Experiment 5. The fifth misuse detection experiment performed with MESONet adds TimeFeatures to the feature set from Experiment 4.
Specif- ically, we use SlidingWindoeratures to construct feature vectors from the SinglePacketFeatures, PayloadCountFeatures, and TimeFeatures of two pack- ets. TimeFeatures adds both interpacket delay and packet rate features. The rationale for including time—based features is based on observations of the timing characteristics of certain network attacks. For example, the “secret” attack has a highly regular timing pattern of packets. As shown by the resubstitution test confusion matrix in Figure 4.11, the addi- tion of TimeFeatures to the feature set has resulted in a near-ideal resubstitution test, with only 2 false positives and 1 false negative. The high recall rate of 99.9% signifies that the characteristics of the attacks are being learned, while the near-zero false positive rate shows that the feature set accurately describes characteristics of normal packets. At the cost of 9 true positives, the addition of TimeFeatures to the feature set has resulted in a reduction of 360 false positives, as shown by the validation test confusion matrix in Figure 4.12. Unfortunately, the false positive rate is still an unmanageably high 0.2%. Similar to Experiment 4, the packet graph in Figure 4.12 shows that MESONet is predicting nearly all attacks to belong to the “httptunnel” class. This result indicates that, based on this feature set, the “httptunnel” attack is similar to normal network traffic. Although this experiment achieved the best overall results for misuse detection, we were concerned with the placement of TimeFeatures within the sliding window. Specifically, this experiment includes TimeFeatures for both packets appearing in the sliding window, rather than including a single set of TimeFeatures (that would have been calculated, in part, from both packets appearing in the window). 55 2 Ground Trim: + MESONet x mm .. .................................................................. . ..... ...... . ............................................................................................. .. .... mom i “an .. ..................................................................... ................. . .......................................................................... .. m .. ...................... , ................................................ ,. ............................................. m .. ................................................................. ....................... . ............................................................................ u m . ................................................................... ....... . ...... . ................................................................................ .. g hm .. ..................................................................... .................................................................................................... u w .. ...................................................................... . ............... . ............................................................... , ....................... ., DOM Fm---” ... .. ........... . .......................................................................... .. ”m p. ................................................................... ..................................................................... 3'5 .................. .. m l- ........................................................................ ........................................................................ ; ..... i. .................. .. I'm .. .. .-. 
Predicted: Normal | Attack
Actual Normal: 1256363 | 2
Actual Attack: 1 | 7177
False Positive Rate 1E-6 | Accuracy .9999 | Recall .9998 | Precision .9997
Figure 4.11: Experiment 5 resubstitution test results: packet graph (top) and confusion matrix (bottom).

Predicted: Normal | Attack
Actual Normal: 1311321 | 2690
Actual Attack: 24698 | 349
False Positive Rate .0020 | Accuracy .9795 | Recall .0139 | Precision .1148
Figure 4.12: Experiment 5 validation test results: packet graph (top) and confusion matrix (bottom).

Experiment 6. The sixth misuse detection experiment performed with MESONet is identical to Experiment 5, with the exception of rearranging the time-based features with regard to the sliding window. In this experiment we append TimeFeatures to a two-packet sliding window consisting of SinglePacketFeatures and PayloadCountFeatures, rather than including them directly in the sliding window (and hence duplicating them). The rationale for this feature set is to examine the effect of rearranging the feature set.

As shown by the resubstitution test confusion matrix in Figure 4.13, the feature set is clearly sufficient for training. The recall rate of 99.9% and false positive rate of 0.000001% verify that this feature set and training data are sufficient for distinguishing between attack and normal packets.
The validation test confusion matrix in Figure 4.14 shows that this feature set approaches the performance of Experiments 4 and 5. Compared to Experiment 5, the false positive rate is 0.2% greater, while the recall rate is 0.12% less. Compared to Experiment 4, the false positive rate is 0.17% greater, while the recall rate is 0.15% less. The results of this experiment show that time-based features should in fact be included in the sliding window. Although this experiment shows that time-based features should be included in the sliding window, Experiments 8 and 9 show that including features in the sliding window is not a generally applicable rule.

Predicted: Normal | Attack
Actual Normal: 1256363 | 2
Actual Attack: 1 | 7177
False Positive Rate 1E-6 | Accuracy .9999 | Recall .9998 | Precision .9997
Figure 4.13: Experiment 6 resubstitution test results: packet graph (top) and confusion matrix (bottom).

Predicted: Normal | Attack
Actual Normal: 1308638 | 5373
Actual Attack: 24727 | 320
False Positive Rate .0040 | Accuracy .9775 | Recall .0127 | Precision .0562
Figure 4.14: Experiment 6 validation test results: packet graph (top) and confusion matrix (bottom).

Experiment 7. The seventh misuse detection experiment performed with MESONet explores the effect of larger sliding windows on detection accuracy. Specifically, we use a three-packet sliding window of SinglePacketFeatures, PayloadCountFeatures, and TimeFeatures.

The resubstitution test confusion matrix in Figure 4.15 shows that the feature set and training data are sufficient for distinguishing between attack and normal packets; indeed, the recall and false positive rates are ideal.

The validation test confusion matrix in Figure 4.16 shows that with this feature set, larger sliding windows have a beneficial effect on false positive rate, while at the same time they have a detrimental effect on recall rate. Compared to Experiment 6, the false positive rate is 0.05% less, while the recall rate is 0.07% less. This experiment shows that although the window size does have an effect upon performance, window size is not a substitute for proper feature selection.

Predicted: Normal | Attack
Actual Normal: 1256365 | 0
Actual Attack: 0 | 7178
False Positive Rate 0.0 | Accuracy 1.0 | Recall 1.0 | Precision 1.0
Figure 4.15: Experiment 7 resubstitution test results: packet graph (top) and confusion matrix (bottom).

Predicted: Normal | Attack
Actual Normal: 1309396 | 4615
Actual Attack: 24745 | 302
False Positive Rate .0035 | Accuracy .9780 | Recall .0120 | Precision .0614
Figure 4.16: Experiment 7 validation test results: packet graph (top) and confusion matrix (bottom).

4.5 Packet Number Features

In this section we describe experiments based on the use of packet number features. Experiments 8 and 9 use different configurations of PacketNumberFeatures in conjunction with the feature set from Experiment 5. All feature classes used here are described in Section 3.4. Complete feature set configurations for all experiments conducted in this chapter can be found in Appendix B.

Experiment 8.
The eighth misuse detection experiment performed with MESONet adds PacketNumberFeatures to the feature set from Experiment 5. Specifically, we use SlidingWindowFeatures to construct feature vectors from the SinglePacketFeatures, PayloadCountFeatures, TimeFeatures, and PacketNumberFeatures of two packets. PacketNumberFeatures adds features such as the percentage of TCP, UDP, and ICMP traffic. The rationale for including packet number-based features is based on the observation that a number of attacks, specifically port and IP sweeps, use inordinately high numbers of SYN, RST, or FIN packets.

The resubstitution test confusion matrix shown in Figure 4.17 shows that this training data and feature set are capable of distinguishing between attack and normal packets, and adequately describe the characteristics of attack packets.

Unfortunately, the confusion matrix in Figure 4.18 shows that the validation test has been adversely affected by the addition of PacketNumberFeatures. Indeed, the false positive rate has reached 0.34%, representing 4,546 false alerts, and the recall rate has dropped to 0.19%. The packet graph in Figure 4.18 shows very little correlation between predicted and actual attack packets, though MESONet does appear to detect the "portsweep" attack (however, mislabeled as "httptunnel"). Based on these validation test results, we can conclude that this particular feature set is unsuitable for misuse detection with MESONet, especially when compared to earlier, better performing experiments.

Predicted: Normal | Attack
Actual Normal: 1256365 | 0
Actual Attack: 0 | 7178
False Positive Rate 0.0 | Accuracy 1.0 | Recall 1.0 | Precision 1.0
Figure 4.17: Experiment 8 resubstitution test results: packet graph (top) and confusion matrix (bottom).

Predicted: Normal | Attack
Actual Normal: 1309465 | 4546
Actual Attack: 24998 | 49
False Positive Rate .0034 | Accuracy .9779 | Recall .0019 | Precision .0106
Figure 4.18: Experiment 8 validation test results: packet graph (top) and confusion matrix (bottom).

Experiment 9. The ninth misuse detection experiment performed with MESONet uses the identical feature classes as Experiment 8, however with a different arrangement of features with regard to the sliding window. Specifically, in this experiment we append TimeFeatures and PacketNumberFeatures to a two-packet sliding window consisting of SinglePacketFeatures and PayloadCountFeatures, rather than including them directly in the sliding window (and hence duplicating them). The rationale for this rearrangement of features is that time- and number-based features cannot change rapidly over the course of two packets, and Experiment 8 showed that the typical organization of features was insufficient.

As with the resubstitution test in Experiment 8, the confusion matrix in Figure 4.19 is ideal.
The recall rate of 100% verifies that the feature set is sufficient to describe the characteristics of attacks, and the false positive rate of 0% shows that the feature set and training data are able to distinguish between attack and normal traffic.

Compared to Experiment 8, the validation test confusion matrix in Figure 4.20 adds 33 true positives and reduces the number of false positives by 517. The recall rate of 0.32% has increased marginally, still less than one-quarter of that in Experiment 4, while the false positive rate of 0.3%, representing 4,029 false positives, is 0.07% greater.

Experiment 9 shows that removing packet number-based features from the sliding window had a positive effect on the validation test, but the ill effect of including PacketNumberFeatures still dominates. Additionally, this experiment, in combination with Experiment 6, shows that there is no general rule regarding the placement of features with respect to the sliding window.

Predicted: Normal | Attack
Actual Normal: 1256365 | 0
Actual Attack: 0 | 7178
False Positive Rate 0.0 | Accuracy 1.0 | Recall 1.0 | Precision 1.0
Figure 4.19: Experiment 9 resubstitution test results: packet graph (top) and confusion matrix (bottom).

Predicted: Normal | Attack
Actual Normal: 1309982 | 4029
Actual Attack: 24965 | 82
False Positive Rate .0030 | Accuracy .9783 | Recall .0032 | Precision .0199
Figure 4.20: Experiment 9 validation test results: packet graph (top) and confusion matrix (bottom).

4.6 Destination Grouping Features

In this section we describe experiments based on the use of grouping features, specifically destination grouping. Experiment 10 applies destination grouping, via the HostFeatures class, to the feature set from Experiment 9, while Experiment 11 does the same to the feature set from Experiment 5. All feature classes used here are described in Section 3.4. Complete feature set configurations for all experiments conducted in this chapter can be found in Appendix B.

Experiment 10. The tenth misuse detection experiment performed with MESONet introduces destination grouping. Specifically, this experiment uses the same feature set as in Experiment 9, appending TimeFeatures and PacketNumberFeatures to a two-packet sliding window consisting of SinglePacketFeatures and PayloadCountFeatures, and additionally applies destination grouping through the use of HostFeatures. The rationale for utilizing destination grouping is twofold: first, we hypothesize that destination grouping will further reduce the false positive rate associated with the use of PacketNumberFeatures; second, destination grouping ensures that the features extracted from both packets in the sliding window are related, insofar as they are destined for the same host.

As with Experiment 9, the resubstitution test confusion matrix shown in Figure 4.21 is perfect. The low false positive and high recall rates suggest that the training data and feature set are sufficient for misuse detection.

As shown by the confusion matrix in Figure 4.22, the number of true positives has increased slightly over that in Experiment 9 (by 3); however, the number of false positives has increased dramatically. In this case, when compared to Experiment 9, the false positive rate has increased to 1.07%, representing 14,101 false alerts. Though this feature set is sufficient for training, grouping by destination is harmful to the correct classification of normal traffic.
Predicted: Normal | Attack
Actual Normal: 1256365 | 0
Actual Attack: 0 | 7178
False Positive Rate 0.0 | Accuracy 1.0 | Recall 1.0 | Precision 1.0
Figure 4.21: Experiment 10 resubstitution test results: packet graph (top) and confusion matrix (bottom).

Predicted: Normal | Attack
Actual Normal: 1299910 | 14101
Actual Attack: 24962 | 85
False Positive Rate .0107 | Accuracy .9708 | Recall .0033 | Precision .0059
Figure 4.22: Experiment 10 validation test results: packet graph (top) and confusion matrix (bottom).

Experiment 11. The eleventh misuse detection experiment performed with MESONet uses a feature set identical to that in Experiment 10, with the exception of removing the PacketNumberFeatures class. Specifically, Experiment 11 appends TimeFeatures to a two-packet sliding window consisting of SinglePacketFeatures and PayloadCountFeatures, and additionally applies destination grouping through the use of HostFeatures.

The resubstitution test confusion matrix shown in Figure 4.23 shows that this feature set is sufficient for describing the characteristics of normal packets, and for discriminating between normal and attack packets. The low false positive rate of 0.00007% and high recall rate of 99.9% are very nearly ideal.

Compared to Experiment 10, the validation test confusion matrix in Figure 4.24 has 1,686 fewer false positives and 167 more true positives, corresponding to a false positive rate of 0.94% and a recall rate of 1.0%, respectively. Although the removal of PacketNumberFeatures has clearly had a beneficial effect on performance, the false positive rate is still much higher than that in Experiment 5: 0.94% compared to 0.20%.

Predicted: Normal | Attack
Actual Normal: 1256364 | 1
Actual Attack: 7 | 7171
False Positive Rate 7E-7 | Accuracy .9999 | Recall .9990 | Precision .9998
Figure 4.23: Experiment 11 resubstitution test results: packet graph (top) and confusion matrix (bottom).

Predicted: Normal | Attack
Actual Normal: 1301596 | 12415
Actual Attack: 24795 | 252
False Positive Rate .0094 | Accuracy .9722 | Recall .0100 | Precision .0198
Figure 4.24: Experiment 11 validation test results: packet graph (top) and confusion matrix (bottom).

4.7 Summary of Results

This chapter has presented the results of 11 different experiments using MESONet for misuse detection. A different feature set was used for each experiment, and we tested packet features, calculated features, window features, and grouping features.

Table 4.3 summarizes each experiment described in this chapter. It lists the feature classes used and the false positive, accuracy, and recall rates for each validation test.
As shown in this table, Experiment 1 achieved the best false positive rate, 0.1%; however, it also has the lowest recall rate, 0.09%. Experiment 4 achieved the best recall rate, 1.4%, and has the second-lowest false positive rate, 0.23%. In conclusion, we should not expect learning-based misuse detection systems to perform well when very little correlation exists between training and testing traffic. More precisely, based on the feature sets used, there is very little similarity between the attacks present in Tracefile A and those in Tracefile B. However, as we shall see in Chapter 5, feature sets used here for misuse detection perform admirably for anomaly detection.

Experiment | False Positive Rate (%) | Recall Rate (%) | Accuracy (%) | Feature Classes
1 | 0.10 | 0.09 | 98.0 | SinglePacketFeatures
2 | 0.37 | 0.11 | 97.8 | SinglePacketFeatures, PayloadCountFeatures
3 | 1.37 | 0.21 | 96.8 | SinglePacketFeatures, PayloadValueFeatures
4 | 0.23 | 1.42 | 97.9 | SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures
5 | 0.20 | 1.39 | 97.9 | SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures, TimeFeatures
6 | 0.40 | 1.27 | 97.7 | SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures, TimeFeatures
7 | 0.35 | 1.20 | 97.8 | SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures, TimeFeatures
8 | 0.34 | 0.19 | 97.8 | SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures, TimeFeatures, PacketNumberFeatures
9 | 0.30 | 0.32 | 97.8 | SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures, TimeFeatures, PacketNumberFeatures
10 | 1.07 | 0.33 | 97.1 | HostFeatures, SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures, TimeFeatures, PacketNumberFeatures
11 | 0.94 | 1.00 | 97.2 | HostFeatures, SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures, TimeFeatures
Table 4.3: Summary of misuse detection validation tests.

Chapter 5

Anomaly Detection

Anomaly detection systems search network traffic for unusual, or anomalous, traffic. Although anomaly detection systems are unable to reliably detect known attacks, they are capable of detecting zero-day attacks [8,20,21,23], and are thus a critical component of effective network intrusion detection.

The remainder of this chapter describes anomaly detection with MESONet, and is organized as follows. Section 5.1 describes the training and testing procedures for anomaly detection with MESONet. Section 5.2 describes the analyses used to evaluate the performance of anomaly detection with MESONet. Sections 5.3 through 5.5 describe experimental results using different feature sets, and Section 5.6 summarizes anomaly detection with MESONet.

5.1 Training and Testing Procedures

Performing anomaly detection with MESONet requires both training and testing phases. However, as stated in [35], anomaly detection tries to learn the characteristics of normal packets and determine whether unknown packets do not belong to that class. For this reason, when performing anomaly detection, MESONet must be trained only with normal traffic.

Tracefile | Source | Use | Tracefile Size | Packet Count | Attacks
C | 3/8/1999 | training | 329MB | 1256365 | 0 (filtered)
D | 3/29/1999 | testing | 216MB | 1208920 | 12; 7789 packets
Table 5.1: Tracefiles used for anomaly detection with MESONet.

Table 5.1 summarizes the tracefiles used to perform training and testing. Tracefile C, used for training, contains 1.25M normal packets and 0 attack packets (attack packets were filtered out of Tracefile A to construct Tracefile C). Tracefile D contains 1.2M normal packets and 7,789 attack packets, comprising 12 different network attacks.
It is typical in network intrusion detection for the normal traffic to far outweigh the attack traffic [8]. Figure 5.1, termed a packet graph, shows the specific attacks that occur in the testing data (Tracefile D). Each point in Figure 5.1 represents one attack packet (for the sake of clarity, the 1.2M normal packets are not plotted). As shown in this figure, 12 different attacks occur in Tracefile D: smurf, yaga, guesstelnet, snmpget, xsnoop, sshtrojan, ntfsdos, sendmail, ftp-write, portsweep, ps attack, and secret [26]. It should be noted that although Figure 5.1 lists the particular attacks to which packets belong, anomaly detection itself does not distinguish between attacks.

Figure 5.1: Tracefile D ground truth packet graph.

Because an anomaly detection system is trained with only normal traffic, resubstitution tests (such as those performed in Chapter 4) are inappropriate; we therefore perform only a validation test for each different feature set. With the exception of Experiment 1, the proof-of-concept experiment, MESONet is trained on the entirety of Tracefile C and tested on the entirety of Tracefile D.

5.2 Performance Analysis

Performance analysis of anomaly detection with MESONet is significantly different from that of misuse detection. For misuse detection, MESONet labeled every packet as either attack or normal traffic. For anomaly detection, MESONet labels every packet as normal traffic (because it is trained on only normal traffic), and returns the nearest-cluster distance for each feature vector with which it is tested.

The IDSEval states that intrusion detection systems should be evaluated based on the number of false positives occurring during one day [33]. However, as mentioned in [36], this is not an appropriate way to evaluate the performance of an intrusion detection system. In particular, McHugh finds fault with the fact that the IDSEval does not require intrusion detection systems to report a confidence measure with each alert.¹ However, using MESONet we are able to use nearest-cluster distance as a measure of anomaly probability.

¹ To be fair, McHugh also mentions that this may not be possible with some IDSs. To quote: "For example, a system that performs pattern matching either finds a match or it does not."
Nearest-cluster distance, recalling Section 2.2, is the distance between the tested feature vector and the nearest trained feature vector. As a large nearest-cluster distance signifies a feature vector that was not seen during training, it is by definition an anomaly. Analyzing the performance of anomaly detection with MESONet therefore focuses on the correlation between large nearest-cluster distances and attack traffic, with the hope that large nearest-cluster distances closely correspond to attack traffic.

Because the analyses performed in Chapter 4 are unsuited for analyzing the correlation between nearest-cluster distances and attack traffic, we use a parametric Receiver Operating Characteristic (ROC) curve to analyze the performance of anomaly detection with MESONet.

The goal of ROC analysis is to discriminate between two conditions: event plus noise, and noise alone [14, 16]. Applied to intrusion detection, a ROC curve is a plot of true-positive detection rate (event plus noise) versus false-positive rate (noise alone), where true positives represent detected attacks and false positives represent normal packets incorrectly labeled as attacks. Ideally, a ROC curve increases along the true-positive axis very quickly. The strength of this approach is that it allows calculation of the true positive rate given a tolerable false positive rate.

Figure 5.2 is a sample non-parametric ROC curve. Point A is located at 0.8 along the True Positive Fraction axis and at 0.4 along the False Positive Fraction axis. If this ROC curve were to belong to an intrusion detection system, it would be correct to state that it would detect 80% of all attacks with a 40% false positive rate.

Figure 5.2: Sample ROC curve (true positive fraction versus false positive fraction).

A parametric ROC curve is constructed based upon the decreasing probability of detection (the nearest-cluster distance, in this case). For MESONet, this means that to construct a parametric ROC curve all tested samples must first be sorted in descending order by their nearest-cluster distance. The sorted list is then traversed, and for each sample, if it is an attack we plot a point higher along the true-positive axis, while if it is normal we plot a point farther right along the false-positive axis. In this way the nearest-cluster distance decreases from the origin to the point (1, 1).

In addition to a per-packet ROC curve, where the curve is adjusted based on the classification of each packet, we also construct two additional ROC curves: a per-attack ROC curve and a time-agglomerative ROC curve.

In a per-attack ROC curve, we change the true-positive axis to represent the fraction of different attacks that are detected, rather than the fraction of attack packets that are detected. This ROC curve is constructed in response to the realization that MESONet may not detect all attack packets belonging to a given attack, but it may detect at least one attack packet belonging to each attack.
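The parametric construction described above (sort by decreasing nearest-cluster distance, then step up for attacks and right for normal packets) is mechanical enough to sketch directly. The Java fragment below is illustrative only and is not part of MESONet; the class and method names (Sample, rocPoints) are hypothetical, and it assumes the tested feature vectors are available together with their nearest-cluster distances and ground-truth labels.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// One tested feature vector: its nearest-cluster distance and ground-truth label.
class Sample {
    final double distance;   // nearest-cluster distance returned by MESONet
    final boolean attack;    // ground truth: does this packet belong to an attack?
    Sample( double distance, boolean attack ) {
        this.distance = distance;
        this.attack = attack;
    }
}

public class ParametricRoc {
    // Returns the (false positive fraction, true positive fraction) points of a
    // parametric ROC curve, ordered by decreasing nearest-cluster distance.
    // Assumes both attack and normal samples are present in the input.
    public static List<double[]> rocPoints( List<Sample> samples ) {
        List<Sample> sorted = new ArrayList<Sample>( samples );
        Collections.sort( sorted, new Comparator<Sample>() {
            public int compare( Sample a, Sample b ) {
                return Double.compare( b.distance, a.distance );
            }
        } );

        long attacks = 0;
        for ( Sample s : sorted ) if ( s.attack ) attacks++;
        long normals = sorted.size() - attacks;

        List<double[]> points = new ArrayList<double[]>();
        long tp = 0, fp = 0;
        for ( Sample s : sorted ) {
            if ( s.attack ) tp++; else fp++;            // step up, or step right
            points.add( new double[] { (double) fp / normals,
                                       (double) tp / attacks } );
        }
        return points;                                  // runs from the origin to (1, 1)
    }
}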
In a time-agglomerative ROC curve, we agglomerate per-attack results via a time-based sliding window. When traversing the sorted list of nearest-cluster distances, we also examine packets that arrived at a similar time to the packet currently being examined. If any of them is actually an attack packet, then we count it as a detection. For a 1-second sliding window this automatically reduces the detection unit from the number of packets to the number of seconds. Applied to Tracefile D, this reduces the total possible number of detections from 1.2M (the number of packets) to 79,200 (the number of seconds in the 22-hour tracefile), thereby addressing the base-rate fallacy [8] and keeping the number of false alerts at a manageable level.

5.3 Proof of Concept

Experiment 1. To establish that MESONet is capable of using the nearest-cluster distance for anomaly detection, we performed an m-fold cross-validation test (with m = 10), as described in [14], on the first 20,000 packets from Tracefile A and recorded the nearest-cluster distance for each tested packet. We then calculated the mean and standard deviation of these distances, and found the mean nearest-cluster distance to be 62.52 and the standard deviation to be 59.5.

We then trained MESONet on the first 20,000 normal packets from Tracefile A, tested it on the first 20,000 packets from Tracefile B (which contains both attack and normal packets), and recorded the resulting nearest-cluster distances. Figure 5.3 is a per-packet plot of the nearest-cluster distances. It shows that the majority of packets have a low nearest-cluster distance, though there are groups of packets that have significantly larger nearest-cluster distances.

Figure 5.3: Nearest-cluster distances (nearest-cluster distance versus packet number).

Figure 5.4 is a plot of the packet label (attack or normal) and the nearest-cluster distance expressed in standard deviations from the mean. It should be noted that the horizontal lines are in fact closely-spaced marks, each of which represents one packet. The "line" appearing at standard deviation 0 near the middle of the figure represents a large amount of traffic having a nearest-cluster distance less than or equal to the mean, while marks above it represent packets with a nearest-cluster distance larger than the mean. The marks (and "lines") in the packet label portion of the figure represent known labels (based on the IDSEval detection truth), and are marked as either "Normal" or "Attack."

Figure 5.4: Nearest-cluster standard deviations (top) and packet labels (bottom).

Figure 5.4 shows that all attacks contained in the tested traffic are identified as having nearest-cluster distances greater than two standard deviations from the mean. Although a number of false positives did occur, a nearest-cluster distance of greater than two standard deviations from the mean clearly separates the majority of attack and normal traffic.

This experiment suggests that nearest-cluster distance may be an accurate measure of network anomalies, and subsequent experiments establish the correlation between attack packets and large nearest-cluster distance.
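The separation observed in Experiment 1 amounts to a simple thresholding rule, sketched below. The fragment is illustrative only: the class name DistanceThreshold is hypothetical, and two standard deviations is simply the cutoff suggested by Figure 5.4, not a parameter of MESONet itself. The training distances come from the cross-validation run, and the testing distances from the tested traffic.

// Flags tested packets whose nearest-cluster distance lies more than two
// standard deviations above the mean distance observed during cross-validation.
public class DistanceThreshold {
    public static boolean[] flagAnomalies( double[] trainDistances, double[] testDistances ) {
        double mean = 0;
        for ( double d : trainDistances ) mean += d;
        mean /= trainDistances.length;

        double variance = 0;
        for ( double d : trainDistances ) variance += ( d - mean ) * ( d - mean );
        double std = Math.sqrt( variance / trainDistances.length );

        boolean[] anomalous = new boolean[ testDistances.length ];
        for ( int i = 0; i < testDistances.length; i++ ) {
            // For Experiment 1 this threshold is 62.52 + 2 * 59.5.
            anomalous[ i ] = testDistances[ i ] > mean + 2 * std;
        }
        return anomalous;
    }
}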
5.4 Feature Sets from Misuse Detection

In this section we describe experiments based on the best performing feature sets from the misuse detection experiments of Chapter 4. If any of these feature sets also perform well for anomaly detection, then it will show that the features used here are significant for both misuse and anomaly detection. Complete feature set configurations for all experiments conducted in this chapter can be found in Appendix C.

Experiment 2. The second anomaly detection experiment performed with MESONet used the same feature set as Experiment 6 in Section 4.4. For this experiment, we append TimeFeatures to a two-packet sliding window consisting of SinglePacketFeatures and PayloadCountFeatures. This experiment is the first of three using the best performing feature sets from Chapter 4.

Figure 5.5 is the per-packet ROC curve resulting from using this feature set for anomaly detection. As shown in this figure, initial performance quickly reaches a 72% true positive rate, with a 15% false positive rate. Performance then slowly increases to a 90% true positive rate, with a 60% false positive rate. As shown by Figure 5.5, and confirmed by upcoming experiments, per-packet ROC analysis is, at best, a losing battle; the false positive rates are much too high. Indeed, even a 10% false positive rate results in 120,000 false alerts.

Figure 5.5: Per-packet ROC curve for Experiment 2.

Figure 5.6 shows more clearly that the feature set used for this experiment captures the characteristics of all attacks. Indeed, 100% of attacks are detected with a false positive rate of 28%. However, even though all attacks are detected, the 28% false positive rate results in 336,000 false alerts.

Figure 5.6: Per-attack ROC curve for Experiment 2.

Figure 5.7 shows that there is a time-based correlation between large nearest-cluster distances and attack packets. When using a 1-second window, this feature set detects all attacks with a false positive rate of 16.5%. However, when utilizing a 60-second window, this feature set detects all attacks with a false positive rate of 0.5%, corresponding to 396 false alerts throughout the 22-hour tracefile, the first practical level of false alerts achieved by MESONet. Subsequent experiments attempt to lower the false positive rate even further, while at the same time reducing the size of the sliding window.

Figure 5.7: Time-agglomerative ROC curve for Experiment 2 (1-, 20-, 40-, and 60-second windows).

This experiment establishes the trend, borne out by subsequent experiments, that per-packet and per-attack ROC analyses are plagued by a high false positive rate. This trend is not surprising, especially when considering research showing that network traffic is inherently "strange" [17, 40]. However, subsequent experiments also confirm that time-based ROC analyses, even when using the small 1-second window, are effective at detecting network attacks with a low false positive rate.
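The time-agglomerative analysis used here and in the remaining experiments can be sketched as follows. The fragment is illustrative rather than MESONet's actual implementation: it assumes each tested sample carries an arrival timestamp in seconds, and it treats the window as centered on the examined packet, which is an assumption.

import java.util.List;

// A tested sample with its arrival time (seconds since the start of the
// tracefile), nearest-cluster distance, and ground-truth label.
class TimedSample {
    final double time;
    final double distance;
    final boolean attack;
    TimedSample( double time, double distance, boolean attack ) {
        this.time = time;
        this.distance = distance;
        this.attack = attack;
    }
}

public class TimeAgglomeration {
    // Counts the sample as a detection if it, or any packet arriving within
    // window/2 seconds of it, is an attack packet. A linear scan is used here
    // for clarity; sorting the trace by time would make this more efficient.
    public static boolean isDetection( TimedSample s, List<TimedSample> trace, double window ) {
        for ( TimedSample other : trace ) {
            if ( Math.abs( other.time - s.time ) <= window / 2 && other.attack ) {
                return true;
            }
        }
        return false;
    }
}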
Experiment 3. The third anomaly detection experiment performed with MESONet used the same feature set as Experiment 5 in Section 4.4. For this experiment, we use SlidingWindowFeatures to construct feature vectors from the SinglePacketFeatures, PayloadCountFeatures, and TimeFeatures of two packets. The difference between this experiment and Experiment 2 is that here we include TimeFeatures in the sliding window.

Figure 5.8 is the per-packet ROC curve resulting from using this feature set for anomaly detection. Compared to Figure 5.5, this ROC curve is more gradual, increasing to a 70% true positive rate, with a 20% false positive rate, followed by a 90% true positive rate, with a 60% false positive rate.

Figure 5.8: Per-packet ROC curve for Experiment 3.

Compared to Figure 5.6, the per-attack ROC curve in Figure 5.9 increases more quickly, detecting 100% of attacks with a false positive rate of 18%.

Figure 5.9: Per-attack ROC curve for Experiment 3.

Figure 5.10 shows an improvement in the performance of the 1-second window when compared to Experiment 2, achieving a 100% true positive rate with an 11.8% false positive rate. However, performance of the 20-second window is slightly worse, detecting 100% of attacks at a 2% false positive rate as opposed to a 1.5% false positive rate. When utilizing a 60-second window, this feature set detects all attacks with a false positive rate of 0.7%, corresponding to 554 false alerts.

Figure 5.10: Time-agglomerative ROC curve for Experiment 3 (1-, 20-, 40-, and 60-second windows).

As in Experiment 2, the true positive rates achieved by the per-packet and per-attack ROC analyses are crippled by a high false positive rate, whereas time-based ROC analysis is successful at achieving a high true positive rate with few false alerts.

Experiment 4. The fourth anomaly detection experiment performed with MESONet used the same feature set as Experiment 4 in Section 4.4. For this experiment, we use SlidingWindowFeatures to construct feature vectors from the SinglePacketFeatures and PayloadCountFeatures of two packets.
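For reference, the windowing step itself is straightforward to sketch. The fragment below is illustrative and is not the thesis's SlidingWindowFeatures class: it simply concatenates the per-packet float arrays (as returned by a feature extractor of the kind described in Appendix A) of the most recent packets into one feature vector; zero-padding until the window fills is an assumption.

import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sliding-window composition: concatenates the feature arrays of
// the last windowSize packets (two, in this experiment) into one vector.
public class SlidingWindowSketch {
    private final int windowSize;
    private final Deque<float[]> window = new ArrayDeque<float[]>();

    public SlidingWindowSketch( int windowSize ) {
        this.windowSize = windowSize;
    }

    public float[] add( float[] packetFeatures ) {
        window.addLast( packetFeatures );
        if ( window.size() > windowSize ) {
            window.removeFirst();
        }

        // Per-packet arrays must all be the same length (the FeatureStrategy
        // contract), so the composed vector also has a fixed size.
        float[] composed = new float[ windowSize * packetFeatures.length ];
        int pos = 0;
        for ( float[] f : window ) {
            System.arraycopy( f, 0, composed, pos, f.length );
            pos += f.length;
        }
        return composed;    // trailing entries remain zero until the window fills
    }
}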
Figure 5.11 is the per-packet ROC curve resulting from using this feature set for anomaly detection. This ROC curve is very similar to that in Figure 5.8, achieving a roughly equivalent 69% true positive rate, with a 20% false positive rate, followed by a 91% true positive rate, with a 60% false positive rate.

Figure 5.11: Per-packet ROC curve for Experiment 4.

The per-attack ROC curve in Figure 5.12 also achieves a 100% true positive rate, with an 18% false positive rate, similar to Figure 5.9. However, this ROC curve achieves a 58% true positive rate at a false positive rate of 1%, compared to the 2.5% true positive rate achieved by Figure 5.9 at the same false positive rate.

Figure 5.12: Per-attack ROC curve for Experiment 4.

Figure 5.13 shows that this feature set is capable of detecting 100% of attacks within a 1-second window, at a false positive rate of 14%, slightly worse than that in Experiment 3. However, when using a 40-second or 60-second window, 100% of attacks are detected with the low false positive rate of 0.2%, corresponding to 158 false alerts. This is the best performing time-agglomerative ROC analysis conducted for anomaly detection, and it reaches an acceptable number of false alerts.

Figure 5.13: Time-agglomerative ROC curve for Experiment 4 (1-, 20-, 40-, and 60-second windows).

This experiment is the last using a misuse detection feature set. All three have shown that per-packet and per-attack ROC analyses are ineffectual, while even a 1-second time-agglomerative ROC analysis performs with an acceptable false positive rate when detecting 100% of network attacks. Overall, larger time windows during time-agglomerative ROC analysis have resulted in higher true positive and lower false positive rates. This suggests that there is a time-based relationship between network anomalies and network attacks, specifically when using the best performing feature sets from misuse detection.

5.5 Derivative Features

In this section we describe experiments using feature sets that include grouping, derivatives, packet number, time, single packet features, and in one case, payload count features. Complete feature set configurations for all experiments conducted in this chapter can be found in Appendix C.

Experiment 5. The fifth anomaly detection experiment performed with MESONet used a different feature set from those used for misuse detection in Chapter 4.
For this experiment, we use a destination-grouped two-packet sliding window of SinglePacketFeatures and PayloadCountFeatures, appended to PacketNumberFeatures and the derivative of TimeFeatures. The main difference between this experiment and the others is the inclusion of derivative features (through the use of the DerivativeFeatures class) and PacketNumberFeatures. The rationale for including these features is that some network attacks have unusual timing characteristics (for example, DoS attacks), or send unusually high numbers of certain packets over a short period of time (for example, portsweeps) [26].

Overall, the per-packet ROC curve for this feature set, shown in Figure 5.14, is only marginally better than that in Experiments 2 through 4, achieving a 90% true positive rate, with a 50% false positive rate. However, an interesting characteristic of this ROC curve is that it reaches a 67% true positive rate very quickly, with only a 5% false positive rate. This characteristic shows that feature vectors with large nearest-cluster distances are very likely to be attacks, but that middling nearest-cluster distances are not an accurate measure of attacks.

Figure 5.14: Per-packet ROC curve for Experiment 5.

Similar to the per-packet ROC curve, the per-attack ROC curve shown in Figure 5.15 also reaches a 100% true positive rate quickly, with only an 8% false positive rate. Figure 5.15 shows that when using this feature set, large nearest-cluster distances cover all network attacks.

Figure 5.15: Per-attack ROC curve for Experiment 5.

With a 1-second window, the time-agglomerative ROC curve shown in Figure 5.16 reaches a 100% true positive rate with a 3% false positive rate. Larger time windows also achieve a 100% true positive rate, though with a 6% false positive rate. As with both the per-packet and per-attack ROC curves, the time-agglomerative curve achieves a high true positive rate faster than the experiments from Section 5.4. However, for all time-agglomerative ROC curves, Experiments 2 through 4 achieve the same true positive rate (100%) at a lower false positive rate.

Figure 5.16: Time-agglomerative ROC curve for Experiment 5 (1-, 20-, 40-, and 60-second windows).

Experiment 6. The sixth anomaly detection experiment performed with MESONet removes the PayloadCountFeatures from the feature set of Experiment 5. For this experiment, we use a destination-grouped two-packet sliding window of SinglePacketFeatures, appended to PacketNumberFeatures and the derivative of TimeFeatures. The rationale for this experiment is that payload is the most variable aspect of network traffic, and hence is most likely to generate false positives.
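Before turning to the results, the derivative idea itself can be illustrated against the FeatureStrategy interface of Appendix A. The class below is hypothetical and is not the thesis's DerivativeFeatures class; in particular, the Timeval accessor used to obtain an arrival time in seconds is an assumption. It returns the change in packet inter-arrival time, the kind of timing signal the rationale above appeals to.

import org.jdom.*;
import net.sourceforge.jpcap.net.*;
import net.sourceforge.jpcap.util.*;

/*
 * Hypothetical feature extractor: a discrete derivative of packet
 * inter-arrival time, written against the FeatureStrategy interface
 * listed in Appendix A. Not the thesis's DerivativeFeatures class.
 */
public class InterArrivalDerivative implements FeatureStrategy {

    private double lastTime = 0;    // arrival time of the previous packet (seconds)
    private float lastDelta = 0;    // previous inter-arrival time

    public void init( Element featureStrategyTag ) { }

    private float[] derivative( Timeval time ) {
        // Assumed accessor: convert the capture timestamp to seconds.
        double now = time.getDate().getTime() / 1000.0;
        float delta = (float) ( now - lastTime );
        float[] result = new float[] { delta - lastDelta };
        lastTime = now;
        lastDelta = delta;
        return result;
    }

    public float[] toFloatArray( TCPPacket pkt, Timeval time ) { return derivative( time ); }
    public float[] toFloatArray( UDPPacket pkt, Timeval time ) { return derivative( time ); }
    public float[] toFloatArray( ICMPPacket pkt, Timeval time ) { return derivative( time ); }
}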
The per-packet ROC curve for this feature set, shown in Figure 5.17, is nearly identical to that in Experiment 5, achieving a 70% true positive rate, with a 10% false positive rate, increasing to a 90% true positive rate, with a 50% false positive rate.

Figure 5.17: Per-packet ROC curve for Experiment 6.

Although the per-attack ROC curve for this feature set, shown in Figure 5.18, also achieves a 100% true positive rate, the false positive rate of 12% is slightly higher than that in Experiment 5. Figure 5.18 also shows that this feature set is "slower" at detecting all the attacks present in Tracefile D, which suggests that the feature set used in Experiment 5 is better at correlating large nearest-cluster distances to attack packets.

Figure 5.18: Per-attack ROC curve for Experiment 6.

Finally, the time-agglomerative ROC curve in Figure 5.19 shows a strong time-based correlation between large nearest-cluster distances and attack packets, achieving a 100% true positive rate with a 2% false positive rate when using a 1-second window, and with a 0.6% false positive rate when using a 20-second or larger window. Compared to Experiment 5, Figure 5.19 shows a stronger correlation between large nearest-cluster distances and attack packets.

Figure 5.19: Time-agglomerative ROC curve for Experiment 6 (1-, 20-, 40-, and 60-second windows).

5.6 Summary of Results

This chapter has presented the results of 6 different experiments using MESONet for anomaly detection. Different feature sets were used for each experiment, and we tested the best performing feature sets from Chapter 4 as well as feature sets specific to anomaly detection. Table 5.2 is a summary of the feature classes used for each experiment described in this chapter. It lists the true positive rate at a 10% false positive rate for both per-packet and per-attack ROC curves, and the true positive rate for 20-second time-agglomerative ROC curves at a 1% false positive rate.

In conclusion, the success of time-agglomerative ROC curves suggests that anomaly detection systems have potential for detecting the presence of an attack, if not the specific attack.
Although merely alerting that an attack is occurring is less useful than pinpointing the attack packets, it is a necessary step toward the wide deployment of intrusion detection systems, especially if they are to extend beyond network-based systems.

Experiment | Packet TPR (%) | Attack TPR (%) | 20-second TPR (%) | Feature Classes
2 | 44 | 75  | 91  | SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures, TimeFeatures
3 | 34 | 75  | 91  | SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures, TimeFeatures
4 | 55 | -   | 91  | SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures
5 | 69 | 100 | 100 | HostFeatures, SlidingWindowFeatures, SinglePacketFeatures, PayloadCountFeatures, PacketNumberFeatures, TimeFeatures, DerivativeFeatures
6 | 70 | 91  | 100 | HostFeatures, SlidingWindowFeatures, SinglePacketFeatures, PacketNumberFeatures, TimeFeatures, DerivativeFeatures

Table 5.2: Summary of anomaly detection validation tests: per-packet and per-attack True Positive Rate (TPR) at 10% false positive rate, 20-second time-agglomerative TPR at 1% false positive rate.

Chapter 6

Conclusions and Future Work

In this thesis we explored the application of developmental learning techniques to network intrusion detection. We used a MESOTree-based developmental learning algorithm, in conjunction with runtime-selectable features, to construct a unified framework for misuse and anomaly detection systems. This framework, called MESONet, encapsulates network traffic capturing, feature extraction, training, and misuse and anomaly testing.

We have shown that machine learning techniques can be successfully applied in a systems domain, specifically network intrusion detection. As systems become increasingly complex, machine learning approaches become increasingly important, as they offer an approach that is capable of dealing with this complexity.

We have evaluated the performance of MESONet through experimentation, conducting a series of misuse and anomaly detection experiments on a subset of the 1999 DARPA IDS Evaluation datasets. Performance tests show that while MESONet is capable of misuse detection with high accuracy, it suffers from a low attack detection rate. On the other hand, depending on the feature set used, MESONet achieves high accuracy when performing anomaly detection, achieving a 100% true positive rate with a 0.2% false positive rate when using a 40-second detection window.

Possible future work in network intrusion detection includes incorporating different types of features into MESONet, including non-stationary models [35] and additional traffic statistics [41]. A problem facing all network intrusion detection research, however, is the lack of available real-world network traffic that includes a ground truth.

Possible future work not related to network intrusion detection includes the application of these techniques to general system security. For example, features regarding system behavior, including process state, hardware activity, and audit logs, could be extracted from a running operating system kernel. A framework similar to MESONet could be created for these features, after which developmental learning techniques could be used to classify system behavior. A difficulty associated with this research includes the lack of ground truth for training data.

APPENDICES

Appendix A

Feature Development

Composable feature sets do little good for rapid experimentation if developing new features is onerous; therefore, MESONet provides a common interface for developing new features.
Figure A.1 is a code listing of the Java interface that must be implemented by any class that extracts features from network packets. The toFloatArray(...) methods are called with the current captured packet, which must be either a TCP, UDP, or ICMP packet, and its arrival time. The init(...) method is called at object instantiation time, and is passed the enclosing XML element from the configuration file for initialization purposes.

As an example of the ease with which new features can be developed for MESONet, Figure A.2 is a listing of the hypothetical TCPPercentage class. This simplistic class calculates the percentage of traffic that uses the TCP protocol, on a packet-by-packet basis. Figure A.3 is the relevant portion from a configuration file that would use this new feature. If this feature were to be used with destination grouping (by using HostFeatures, for example), then it would calculate the percentage of TCP traffic destined for the same host. No additional work is needed to incorporate this feature into MESONet.

import org.jdom.*;
import net.sourceforge.jpcap.util.*;
import net.sourceforge.jpcap.net.*;

/*
 * Interface for all feature extractors in MESONet.
 */
public interface FeatureStrategy {

    // Initialization.
    // featureStrategyTag is the enclosing XML
    // element from the configuration file.
    public void init( Element featureStrategyTag );

    // One of these is called for each packet that is captured
    // by MESONet. They return an array of floats that MUST
    // be the same size for every packet.
    public float[] toFloatArray( TCPPacket pkt, Timeval time );
    public float[] toFloatArray( UDPPacket pkt, Timeval time );
    public float[] toFloatArray( ICMPPacket pkt, Timeval time );
}

Figure A.1: Java interface for feature extraction.

import org.jdom.*;
import net.sourceforge.jpcap.net.*;
import net.sourceforge.jpcap.util.*;

/*
 * Example implementation of a feature extractor.
 * This class calculates the percentage of all traffic that is
 * TCP, and returns a 1-element float array containing this
 * percentage.
 */
public class TCPPercentage implements FeatureStrategy {

    private float count = 0;
    private float tcp = 0;

    // No initialization.
    public void init( Element featureStrategyTag ) { }

    // Adjust the count of TCP traffic.
    public float[] toFloatArray( TCPPacket pkt, Timeval time ) {
        ++tcp;
        ++count;
        return new float[] { tcp / count };
    }

    // Adjust the count of non-TCP traffic.
    public float[] toFloatArray( UDPPacket pkt, Timeval time ) {
        ++count;
        return new float[] { tcp / count };
    }

    public float[] toFloatArray( ICMPPacket pkt, Timeval time ) {
        ++count;
        return new float[] { tcp / count };
    }
}

Figure A.2: Listing of all code for the TCPPercentage class.

Figure A.3: Feature set using the TCPPercentage class.

Appendix B

Misuse Detection Feature Set Configurations

This appendix lists the complete feature set configurations for all misuse detection experiments conducted in Chapter 4.

Figure B.1 lists the feature set configuration used for Experiment 1, consisting of only the SinglePacketFeatures class.

Figure B.1: Feature set configuration for Experiment 1.

Figure B.2 lists the feature set configuration used for Experiment 2, which augments SinglePacketFeatures with the PayloadCountFeatures class via the use of the CompoundFeatureStrategy.

Figure B.2: Feature set configuration for Experiment 2.

Figure B.3 lists the feature set configuration used for Experiment 3, which augments SinglePacketFeatures with the PayloadValueFeatures class.
It should be noted that PayloadValueFeatures violates the ordering principle described in Section 3.2.

Figure B.3: Feature set configuration for Experiment 3.

Figure B.4 lists the feature set configuration used for Experiment 4, which uses SlidingWindowFeatures to construct feature vectors from the SinglePacketFeatures and PayloadCountFeatures of two packets.

Figure B.4: Feature set configuration for Experiment 4.

Figure B.5 lists the feature set configuration used for Experiment 5, which uses SlidingWindowFeatures to construct feature vectors from the SinglePacketFeatures, PayloadCountFeatures, and TimeFeatures of two packets.

Figure B.5: Feature set configuration for Experiment 5.

Figure B.6 lists the feature set configuration used for Experiment 6, which appends TimeFeatures to a two-packet sliding window consisting of SinglePacketFeatures and PayloadCountFeatures.

Figure B.6: Feature set configuration for Experiment 6.

Figure B.7 lists the feature set configuration used for Experiment 7, which uses a three-packet sliding window of SinglePacketFeatures, PayloadCountFeatures, and TimeFeatures.

Figure B.7: Feature set configuration for Experiment 7.

Figure B.8 lists the feature set configuration used for Experiment 8, which uses SlidingWindowFeatures to construct feature vectors from the SinglePacketFeatures, PayloadCountFeatures, TimeFeatures, and PacketNumberFeatures of two packets.

Figure B.8: Feature set configuration for Experiment 8.

Figure B.9 lists the feature set configuration used for Experiment 9, which appends TimeFeatures and PacketNumberFeatures to a two-packet sliding window consisting of SinglePacketFeatures and PayloadCountFeatures.

Figure B.9: Feature set configuration for Experiment 9.

Figure B.10 lists the feature set configuration used for Experiment 10, which appends TimeFeatures and PacketNumberFeatures to a two-packet sliding window consisting of SinglePacketFeatures and PayloadCountFeatures, and additionally applies destination grouping through the use of HostFeatures.

Figure B.10: Feature set configuration for Experiment 10.

Figure B.11 lists the feature set configuration used for Experiment 11, which appends TimeFeatures to a two-packet sliding window consisting of SinglePacketFeatures and PayloadCountFeatures, and additionally applies destination grouping through the use of HostFeatures.

Figure B.11: Feature set configuration for Experiment 11.

Appendix C

Anomaly Detection Feature Set Configurations

This appendix lists the complete feature set configurations for all anomaly detection experiments conducted in Chapter 5.

Figure C.1 lists the feature set configuration used for Experiment 2, which appends TimeFeatures to a two-packet sliding window consisting of SinglePacketFeatures and PayloadCountFeatures.

Figure C.1: Feature set configuration for Experiment 2.

Figure C.2 lists the feature set configuration used for Experiment 3, which uses SlidingWindowFeatures to construct feature vectors from the SinglePacketFeatures, PayloadCountFeatures, and TimeFeatures of two packets.

Figure C.2: Feature set configuration for Experiment 3.

Figure C.3 lists the feature set configuration used for Experiment 4, which uses SlidingWindowFeatures to construct feature vectors from the SinglePacketFeatures and PayloadCountFeatures of two packets.

Figure C.3: Feature set configuration for Experiment 4.

Figure C.4 lists the feature set configuration used for Experiment 5, which uses a destination-grouped two-packet sliding window of SinglePacketFeatures and PayloadCountFeatures, appended to PacketNumberFeatures and the derivative of TimeFeatures.

Figure C.4: Feature set configuration for Experiment 5.
Figure C.5 lists the feature set configuration used for Experiment 6, which uses a destination-grouped two-packet sliding window of SinglePacketFeatures, appended to PacketNumberFeatures and the derivative of TimeFeatures.

Figure C.5: Feature set configuration for Experiment 6.

Bibliography

[1] 1999 KDD Cup. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[2] The Shmoo Group, capture the flag contest, DEFCON-9. Website, July 2001. http://www.cs.ucsb.edu/~dhm/defcon-tcpdumps.

[3] tcpdump/libpcap, 2001. http://www.tcpdump.org.

[4] UCLA CSD packet traces, D-WARD project. Website, December 2002. http://lever.cs.ucla.edu/ddos/traces.

[5] Charu C. Aggarwal, Cecilia Procopiuc, Joel L. Wolf, Philip S. Yu, and Jong Soo Park. Fast algorithms for projected clustering. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 61-72, Philadelphia, Pennsylvania, USA, June 1999.

[6] J. P. Anderson. Computer security threat monitoring and surveillance. Technical report, NIST, April 1980.

[7] M. Asaka, T. Onabuta, T. Inoue, S. Okazawa, and S. Goto. A new intrusion detection method based on discriminant analysis. In Institute of Electronics, Information, and Communication Engineers (IEICE), number 577 in Transactions, 2001.

[8] Stefan Axelsson. The base-rate fallacy and its implications for the difficulty of intrusion detection. In ACM Conference on Computer and Communications Security, pages 1-7, 1999.

[9] Stefan Axelsson. Intrusion detection systems: A survey and taxonomy. Technical Report 99-15, Chalmers Univ., March 2000.

[10] Bruce Blumberg, Marc Downie, Yuri Ivanov, Matt Berlin, Michael Patrick Johnson, and Bill Tomlinson. Integrated learning for interactive synthetic characters. In Computer Graphics, Proceedings of SIGGRAPH'02, 2002.

[11] J. Cannady. Artificial neural networks for misuse detection. In Proceedings of the 1998 National Information Systems Security Conference (NISSC'98), October 5-8, 1998, Arlington, VA, pages 443-456, 1998.

[12] Patrick Charles, John Guthrie, Justin Haddad, Steve Bitteker, and David B. Knoester. JPCap: Network packet capture facility for Java, May 2001. http://sourceforge.net/projects/jpcap.

[13] William W. Cohen. Fast effective rule induction. In Proc. of the 12th International Conference on Machine Learning, pages 115-123. Morgan Kaufmann, July 1995.

[14] Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification, Second Edition. John Wiley and Sons, Incorporated, New York, New York, USA, 2001.

[15] Computer Economics. Virus attack costs on the rise - again. Website, 2002. http://www.computereconomics.com.

[16] James P. Egan. Signal Detection Theory and ROC Analysis. Series in Cognition and Perception. Academic Press, June 1975.

[17] S. Floyd and V. Paxson. Difficulties in simulating the Internet. In IEEE/ACM Transactions on Networking (TON), volume 9, August 2001.

[18] Stephanie Forrest, Steven A. Hofmeyr, Anil Somayaji, and Thomas A. Longstaff. A sense of self for Unix processes. In Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy, pages 120-128. IEEE Computer Society Press, 1996.

[19] A. Ghosh, J. Wanken, and F. Charron. Detecting anomalous and unknown intrusions against programs.
In Proceedings of the 1998 Annual Computer Security Applications Conference (ACSAC'98), pages 259-267, December 1998. IEEE Computer Society, Los Alamitos, CA, USA.

[20] Anup K. Ghosh and Aaron Schwartzbard. A study in using neural networks for anomaly and misuse detection. In Proceedings of the 8th USENIX Security Symposium, pages 141-152, August 1999.

[21] Anup K. Ghosh, Aaron Schwartzbard, and Michael Schatz. Learning program behavior profiles for intrusion detection. In Proceedings of the 1st USENIX Workshop on Intrusion Detection and Network Monitoring, pages 51-62, April 1999.

[22] Wey-Shiuan Hwang and Juyang Weng. Hierarchical discriminant regression. In IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1277-1293, November 2000.

[23] K. Julisch. Dealing with false positives in intrusion detection. In 3rd Intl. Workshop on Recent Advances in Intrusion Detection (RAID), Lecture Notes in Computer Science. Springer-Verlag, 2000.

[24] Eric P. Kasten and Philip K. McKinley. A framework-based classifier for online learning. Technical Report MSU-CSE-04-1, Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA, January 2004.

[25] Richard A. Kemmerer. NSTAT: A model-based real-time network intrusion detection system. Technical Report TRCS97-18, University of California, Santa Barbara, 1998.

[26] Kris Kendall. A database of computer attacks for the evaluation of intrusion detection systems. Master's thesis, 1998.

[27] Sandeep Kumar and Eugene H. Spafford. A pattern matching model for misuse intrusion detection. In Proceedings of the 17th National Computer Security Conference, pages 11-21, 1994.

[28] T. Lane and C. E. Brodley. An application of machine learning to anomaly detection. In Proc. 20th NIST-NCSC National Information Systems Security Conference, pages 366-380, 1997.

[29] Terran Lane and Carla E. Brodley. Temporal sequence learning and data reduction for anomaly detection. ACM Transactions on Information and System Security, 2(3):295-331, 1999.

[30] Wenke Lee, Salvatore J. Stolfo, and Kui W. Mok. A data mining framework for building intrusion detection models. In IEEE Symposium on Security and Privacy, pages 120-132, 1999.

[31] Lin, Wang, and Jajodia. Abstraction-based misuse detection: High-level specifications and adaptable strategies. In PCSFW: Proceedings of the 11th Computer Security Foundations Workshop. IEEE Computer Society Press, 1998.

[32] Richard Lippmann, David Fried, Isaac Graf, Joshua Haines, Kristopher Kendall, David McClung, Dan Weber, Seth Webster, Dan Wyschogrod, Robert Cunningham, and Marc Zissman. Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion detection evaluation. In Proceedings of the DARPA Information Survivability Conference and Exposition, Los Alamitos, CA, 2000. IEEE Computer Society Press.

[33] Richard Lippmann, Joshua W. Haines, David J. Fried, Jonathan Korba, and Kumar Das. The 1999 DARPA off-line intrusion detection evaluation. Computer Networks, 34(4):579-595, 2000.

[34] M. V. Mahoney and P. K. Chan. An analysis of the 1999 DARPA/Lincoln Laboratory evaluation for network anomaly detection. In Proc. of Recent Advances in Intrusion Detection (RAID), pages 220-237, 2003.

[35] Matthew V. Mahoney and Philip K. Chan. Learning nonstationary models of normal network traffic for detecting novel attacks.

[36] J. McHugh.
Testing intrusion detection systems: A critique of the 1998 and 1999 DARPA off-line intrusion detection system evaluation as performed by Lincoln Laboratory. In ACM Transactions on Information and System Security, November 2000.

[37] Sreerama K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4):345-389, 1998.

[38] Darren Mutz. Mucus tcpdumps. Website, November 2003. http://www.cs.ucsb.edu/~dhm/mucus-tcpdumps.

[39] NFR Inc. Network flight recorder, 1997. http://www.nfr.com.

[40] V. Paxson and S. Floyd. Why we don't know how to simulate the Internet. In Proceedings of the 29th Conference on Winter Simulation (WSC), 1997.

[41] Vern Paxson. Bro: a system for detecting network intruders in real-time. Computer Networks (Amsterdam, Netherlands: 1999), 31(23-24):2435-2463, 1999.

[42] P. A. Porras and P. G. Neumann. EMERALD: Event monitoring enabling responses to anomalous live disturbances. In Proc. 20th NIST-NCSC National Information Systems Security Conference, pages 353-365, 1997.

[43] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, 1995.

[44] R. Sekar, Y. Guang, S. Verma, and T. Shanbhag. A high-performance network intrusion detection system. In ACM Conference on Computer and Communications Security, pages 8-17, 1999.

[45] Mei-Ling Shyu, Shu-Ching Chen, Kanoksri Sarinnapakorn, and LiWu Chang. A novel anomaly detection scheme based on principal component classifier. In Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, pages 172-179, 2003. Held in conjunction with the Third IEEE International Conference on Data Mining (ICDM'03).

[46] Arlindo Silva, Ana Neves, and Ernesto Costa. Genetically programming networks to evolve memory mechanisms. In Wolfgang Banzhaf, Jason Daida, Agoston E. Eiben, Max H. Garzon, Vasant Honavar, Mark Jakiela, and Robert E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference, volume 2, page 1448, Orlando, Florida, USA, 13-17 1999. Morgan Kaufmann.

[47] G. Vigna, S. Eckmann, and R. Kemmerer. The STAT tool suite. In DARPA Information Survivability Conference and Exposition (DISCEX). IEEE Computer Society Press, 2000.

[48] Giovanni Vigna and Richard A. Kemmerer. NetSTAT: A network-based intrusion detection approach. In Annual Computer Security Applications Conference (ACSAC), 1998.

[49] J. Weng and W. Hwang. Online image classification using IHDR. In International Journal on Document Analysis and Recognition, volume 5, pages 118-125, 2003.

[50] Juyang Weng. A theory for mentally developing robots. In Proceedings 2nd International Conference on Development and Learning. IEEE Computer Society Press, June 2002.

[51] Juyang Weng, Colin H. Evans, Wey S. Hwang, and Yong-Beom Lee. The developmental approach to artificial intelligence: Concepts, developmental algorithms and experimental results. Technical Report MSU-CPS-98-25, Michigan State University, July 1998.