Search results
 Title
 Near duplicate image search
 Creator
 Li, Fengjie
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Information retrieval addresses the fundamental problem of identifying the objects in a database that satisfy the information needs of users. In the face of information overload, the major challenge in search algorithm design is to ensure that useful information can be found both accurately and efficiently in large databases.

To address this challenge, different indexing and retrieval methods have been proposed for different types of data, namely sparse data (e.g. documents), dense data (e.g. dense feature vectors) and bag-of-features data (e.g. images represented by local features). For sparse data, inverted indexes and document retrieval models have proved very effective for large-scale retrieval problems. For dense data and bag-of-features data, however, open problems remain. For example, Locality Sensitive Hashing, a state-of-the-art method for searching high-dimensional vectors, often fails to achieve a good tradeoff between precision and recall: it tends to achieve high precision with low recall, or vice versa. The bag-of-words model, a popular approach for searching objects represented as bags of features, has limited performance because of the information lost during the quantization procedure.

Since the general problem of searching objects represented as dense vectors and bags of features may be too challenging, in this dissertation we focus on near duplicate search, in which the matched objects are almost identical to the query. By effectively exploiting the statistical properties of near duplicates, we are able to design more effective indexing schemes and search algorithms. Thus, the focus of this dissertation is to design new indexing methods and retrieval algorithms for near duplicate search in large-scale databases that accurately capture data similarity and deliver more accurate and efficient search.
Below, we summarize the main contributions of this dissertation.

Our first contribution is a new algorithm for searching near duplicate bag-of-features data. The proposed algorithm, named random seeding quantization, is more efficient in generating bag-of-words representations for near duplicate images. The new scheme is motivated by approximating the optimal partial matching between bags of features, and thus produces a bag-of-words representation that captures the true similarities of the data, leading to more accurate and efficient retrieval of bag-of-features data.

Our second contribution, termed Random Projection Filtering (RPF), is a search algorithm designed for efficient near duplicate vector search. By explicitly exploiting the statistical properties of near duplicity, the algorithm projects high-dimensional vectors into a lower-dimensional space and filters out irrelevant items. This effective filtering procedure makes RPF more accurate and efficient at identifying near duplicate objects in databases.

Our third contribution is to develop and evaluate a new randomized range search algorithm for near duplicate vectors in high-dimensional spaces, termed Random Projection Search. Unlike RPF, this algorithm is suitable for a wider range of applications because it does not require the sparsity constraints for high search accuracy. The key idea is to project both the data points and the query point into a one-dimensional space by a random projection, and then to perform a one-dimensional range search, using binary search, to find the subset of data points that are within the range of a given query. We prove a theoretical guarantee for the proposed algorithm and evaluate its empirical performance on a dataset of 1.1 billion image features.
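The key idea behind Random Projection Search (project to one dimension, then binary search a range) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the class name `ProjectionIndex`, its methods, the choice of a single unit-length Gaussian direction, and the window width equal to the search radius are all assumptions made for illustration. With a unit direction u, |u·(x − q)| ≤ ‖x − q‖, so every point within the radius falls inside the one-dimensional window, and an exact distance check then removes false positives.

```python
import bisect
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random direction by normalising a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ProjectionIndex:
    """Hypothetical index: points sorted by their 1-D random projection."""

    def __init__(self, points, seed=0):
        rng = random.Random(seed)
        self.points = points
        self.direction = random_unit_vector(len(points[0]), rng)
        # Project every point once and sort by the projected value.
        pairs = sorted((dot(p, self.direction), i) for i, p in enumerate(points))
        self.keys = [k for k, _ in pairs]
        self.ids = [i for _, i in pairs]

    def candidates(self, query, radius):
        """Binary search for points whose projection lies within radius of the query's."""
        q = dot(query, self.direction)
        lo = bisect.bisect_left(self.keys, q - radius)
        hi = bisect.bisect_right(self.keys, q + radius)
        return self.ids[lo:hi]

    def range_search(self, query, radius):
        """Verify candidates with the exact Euclidean distance."""
        return sorted(
            i for i in self.candidates(query, radius)
            if math.dist(query, self.points[i]) <= radius
        )
```

In this sketch the one-dimensional window only prunes; how tightly the window can be set, and with what probabilistic guarantee, is the subject of the dissertation's analysis and is not reproduced here.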
 Title
 Hardware algorithms for highspeed packet processing
 Creator
 Norige, Eric
 Date
 2017
 Collection
 Electronic Theses & Dissertations
 Description

The networking industry is facing enormous challenges of scaling devices to support the exponential growth of internet traffic as well as the increasing number of features being implemented inside the network. Algorithmic hardware improvements to networking components have largely been neglected due to the ease of leveraging increased clock frequency and compute power and the risks of implementing complex hardware designs. As clock frequency slows its growth, algorithmic solutions become important to fill the gap between current-generation capability and next-generation requirements. This paper presents algorithmic solutions to networking problems in three domains: Deep Packet Inspection (DPI), firewall (and other) ruleset compression, and non-cryptographic hashing.

The improvements in DPI are two-pronged. The first is in the area of application-level protocol field extraction, which allows security devices to precisely identify packet fields for targeted validity checks. By using counting automata, we achieve precise parsing of non-regular protocols with small, constant per-flow memory requirements, extracting at rates of up to 30 Gbps on real traffic in software while using only 112 bytes of state per flow. The second DPI improvement is on the long-standing regular expression matching problem, where we complete the HFA solution to the DFA state explosion problem with efficient construction algorithms and an optimized memory layout for hardware or software implementation. These methods construct, in seconds, automata too complex to be constructed by previous methods, while being capable of 29 Gbps throughput with an ASIC implementation.

Firewall ruleset compression enables more firewall entries to be stored in a fixed-capacity pattern matching engine, and can also be used to reorganize a firewall specification for higher-performance software matching. A novel recursive structure called TUF is given to unify the best known solutions to this problem and suggest future avenues of attack. These algorithms, with little tuning, achieve a 13.7% improvement in compression on large, real-life classifiers, and can achieve the same results as existing algorithms while running 20 times faster.

Finally, non-cryptographic hash functions can be used for anything from hash tables to track network flows to packet sampling for traffic characterization. We give a novel approach to generating hardware hash functions in between the extremes of expensive cryptographic hash functions and low-quality linear hash functions. To evaluate these mid-range hash functions properly, we develop new evaluation methods to better distinguish non-cryptographic hash function quality. The hash functions described in this paper achieve low-latency, wide hashing with good avalanche and universality properties at a much lower cost than existing solutions.
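As a rough illustration of what an avalanche property measures, the sketch below estimates how often each output bit flips when a single input bit is flipped; an ideal hash flips every output bit with probability close to 0.5. This is a standard textbook-style test, not the new evaluation method developed in the thesis. The two hash functions are stand-ins chosen for illustration: `xor_fold` is a hypothetical linear hash, and `fmix32` is the well-known 32-bit finalizer from MurmurHash3, not one of the hardware hash functions the thesis describes.

```python
import random

MASK32 = 0xFFFFFFFF


def xor_fold(x):
    """A deliberately weak hash that is linear over GF(2) (illustrative only)."""
    return (x ^ (x >> 16)) & MASK32


def fmix32(x):
    """MurmurHash3 32-bit finalizer, used here as a stand-in mixing function."""
    x &= MASK32
    x ^= x >> 16
    x = (x * 0x85EBCA6B) & MASK32
    x ^= x >> 13
    x = (x * 0xC2B2AE35) & MASK32
    x ^= x >> 16
    return x


def avalanche_matrix(hash_fn, samples=200, bits=32, seed=0):
    """Entry [i][j] estimates P(output bit j flips | input bit i flips)."""
    rng = random.Random(seed)
    counts = [[0] * bits for _ in range(bits)]
    for _ in range(samples):
        x = rng.getrandbits(bits)
        h = hash_fn(x)
        for i in range(bits):
            diff = h ^ hash_fn(x ^ (1 << i))
            for j in range(bits):
                counts[i][j] += (diff >> j) & 1
    return [[c / samples for c in row] for row in counts]


def mean_flip_probability(matrix):
    """Average flip probability over all (input bit, output bit) pairs."""
    cells = [p for row in matrix for p in row]
    return sum(cells) / len(cells)


def worst_bias(matrix):
    """Largest deviation of any cell from the ideal probability 0.5."""
    return max(abs(p - 0.5) for row in matrix for p in row)
```

Comparing `mean_flip_probability` and `worst_bias` for the two functions shows the gap between a linear hash, where each input bit touches only one or two output bits, and a mixing function whose cells sit near 0.5.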