Towards machine learning based source identification of encrypted video traffic
The rapid growth of the Internet has helped to popularize video streaming services, which have now become the dominant content on the Internet. The management of video streaming traffic is complicated by its enormous volume, its diverse communication protocols and data formats, and the widespread adoption of encryption. The aim of this thesis is to develop a novel firewall framework, named Soft-margined Firewall, for managing encrypted video streaming traffic without violating user privacy. The system distinguishes itself from conventional firewall systems by incorporating machine learning and Traffic Analysis (TA) as its traffic detection and blocking mechanism. The goal is to detect unknown network traffic, including traffic that is encrypted, tunneled through a Virtual Private Network (VPN), or obfuscated, in realistic application scenarios. Existing TA methods are limited in that they can deal only with simple traffic patterns: usually, only a single source of traffic is allowed in a tunnel, and a trained classifier is not portable between network locations, so it must be retrained at each one. This work aims to address these limitations with new machine learning techniques. The three main contributions of this work are: 1) new statistical features, computed around traffic surge periods, that better identify websites with dynamic content; 2) a two-stage classifier architecture that solves the mixed-traffic problem with state-of-the-art TA features; and 3) a novel natural-language-inspired feature that solves the mixed-traffic problem using deep-learning methods. A fully working Soft-margin Firewall with the above distinctive features has been designed, implemented, and verified for both conventional classifiers and the proposed deep-learning-based classifiers. The efficacy of the proposed system is confirmed via experiments conducted on actual network setups with a custom-built prototype firewall and OpenVPN servers.
The proposed feature-classifier combinations outperform previous state-of-the-art results. The solution that combines the natural-language-inspired traffic feature with deep learning is shown to solve the mixed-traffic problem and to predict multiple labels for a single sample. Additionally, the classifier can classify traffic recorded at locations different from where the training traffic was collected. These results are the first of their kind and are expected to pave the way for next-generation TA-based firewall systems.
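The abstract does not spell out how the surge-period features are computed, so the following is only an illustrative sketch of the general idea, not the thesis's method. It assumes that a trace is a list of (timestamp, packet size) pairs, that a "surge" is a run of consecutive time windows whose byte volume exceeds a fraction of the busiest window, and that the features are simple summary statistics of those runs. The function name `surge_features`, the parameters `window` and `threshold_ratio`, and the returned feature names are all hypothetical.

```python
from statistics import mean


def surge_features(packets, window=1.0, threshold_ratio=0.5):
    """Summarize traffic surges in a trace (illustrative assumption only).

    packets: list of (timestamp_seconds, size_bytes) tuples.
    window: width of each time bin in seconds (hypothetical parameter).
    threshold_ratio: a bin counts as part of a surge when its byte volume
        exceeds this fraction of the busiest bin (hypothetical parameter).
    """
    empty = {"surge_count": 0, "mean_surge_duration": 0.0, "mean_surge_bytes": 0.0}
    if not packets:
        return empty

    # Bin the packets into fixed-width windows of byte volume.
    start = min(t for t, _ in packets)
    end = max(t for t, _ in packets)
    n_bins = int((end - start) // window) + 1
    volume = [0] * n_bins
    for t, size in packets:
        volume[int((t - start) // window)] += size

    # Collect runs of consecutive bins above the cutoff.
    cutoff = threshold_ratio * max(volume)
    surges = []  # (duration_seconds, total_bytes) per surge
    run_len, run_bytes = 0, 0
    for v in volume + [0]:  # trailing 0 closes a surge that ends the trace
        if v > cutoff:
            run_len += 1
            run_bytes += v
        elif run_len:
            surges.append((run_len * window, run_bytes))
            run_len, run_bytes = 0, 0

    if not surges:
        return empty
    return {
        "surge_count": len(surges),
        "mean_surge_duration": mean(d for d, _ in surges),
        "mean_surge_bytes": mean(b for _, b in surges),
    }
```

A feature vector of this kind could then be passed to any conventional classifier; which statistics are actually informative for a given site would have to be determined empirically.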
    
- In Collections: Electronic Theses & Dissertations
- Copyright Status: Attribution 4.0 International
- Material Type: Theses
- Thesis Advisors: Biswas, Subir
- Committee Members: Li, Tongtong; Morris, Daniel; Torng, Eric
- Date Published: 2019
- Subjects: Machine learning; Firewalls (Computer security); Computer security; Computer networks--Security measures; Computer networks--Management; Streaming technology (Telecommunications); Telecommunication
- Program of Study: Electrical Engineering - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xii, 134 pages
- ISBN: 9781085631464; 108563146X
- Permalink: https://doi.org/doi:10.25335/dg57-w216