Search results
 Title
 Near duplicate image search
 Creator
 Li, Fengjie
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Information retrieval addresses the fundamental problem of how to identify the objects in a database that satisfy the information needs of users. In the face of information overload, the major challenge in search algorithm design is to ensure that useful information can be found both accurately and efficiently in large databases.
To address this challenge, different indexing and retrieval methods have been proposed for different types of data, namely sparse data (e.g. documents), dense data (e.g. dense feature vectors) and bag-of-features data (e.g. images represented by local features). For sparse data, inverted indexes and document retrieval models have proved very effective for large-scale retrieval problems. For dense data and bag-of-features data, however, there are still some open problems. For example, Locality Sensitive Hashing, a state-of-the-art method for searching high-dimensional vectors, often fails to make a good tradeoff between precision and recall: it tends to achieve high precision with low recall, or vice versa. The bag-of-words model, a popular approach for searching objects represented as bags of features, has limited performance because of the information loss during the quantization procedure.
Since the general problem of searching objects represented as dense vectors and bags of features may be too challenging, in this dissertation we focus on near duplicate search, in which the matched objects are almost identical to the query. By effectively exploring the statistical properties of near duplicates, we will be able to design more effective indexing schemes and search algorithms. Thus, the focus of this dissertation is to design new indexing methods and retrieval algorithms for near duplicate search in large-scale databases that accurately capture the data similarity and deliver more accurate and efficient search.
Below, we summarize the main contributions of this dissertation.
Our first contribution is a new algorithm for searching near duplicate bag-of-features data. The proposed algorithm, named random seeding quantization, is more efficient in generating bag-of-words representations for near duplicate images. The new scheme is motivated by approximating the optimal partial matching between bags of features, and thus produces a bag-of-words representation that captures the true similarities of the data, leading to more accurate and efficient retrieval of bag-of-features data.
Our second contribution, termed Random Projection Filtering (RPF), is a search algorithm designed for efficient near duplicate vector search. By explicitly exploiting the statistical properties of near duplicates, the algorithm projects high-dimensional vectors into a lower-dimensional space and filters out irrelevant items. This effective filtering procedure makes RPF more accurate and efficient at identifying near duplicate objects in databases.
Our third contribution is to develop and evaluate a new randomized range search algorithm for near duplicate vectors in high-dimensional spaces, termed Random Projection Search. Unlike RPF, the algorithm presented in this chapter is suitable for a wider range of applications because it does not require the sparsity constraints for high search accuracy. The key idea is to project both the data points and the query point into a one-dimensional space by a random projection, and to perform a one-dimensional range search, using binary search, to find the subset of data points that are within the range of a given query. We prove the theoretical guarantee for the proposed algorithm and evaluate its empirical performance on a dataset of 1.1 billion image features.
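The key idea of projecting to one dimension and then running a binary search can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the function names (build_index, range_search), the use of a single unit-length Gaussian direction, the choice of the search radius as the window width in the projected space, and the final exact-distance check are all assumptions made for illustration. The abstract does not state the actual thresholds or the form of the theoretical guarantee.

```python
import bisect
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_index(points, seed=0):
    """Project every point onto one random unit direction and sort by projection.

    Hypothetical helper: returns (direction, sorted projections, point ids).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    direction = [d / norm for d in direction]
    projected = sorted((dot(direction, p), i) for i, p in enumerate(points))
    keys = [key for key, _ in projected]
    ids = [i for _, i in projected]
    return direction, keys, ids


def range_search(points, index, query, radius):
    """Return ids of points within `radius` of `query` (Euclidean distance).

    Because the direction has unit length, |projection difference| never
    exceeds the true distance, so the 1-D window [q - radius, q + radius]
    contains every true match. Candidates are then verified exactly
    (an assumption of this sketch, not a step stated in the abstract).
    """
    direction, keys, ids = index
    q = dot(direction, query)
    lo = bisect.bisect_left(keys, q - radius)    # binary search, lower end
    hi = bisect.bisect_right(keys, q + radius)   # binary search, upper end
    candidates = ids[lo:hi]
    return sorted(i for i in candidates if math.dist(points[i], query) <= radius)
```

In this sketch the one-dimensional window only prunes candidates; how aggressively a real system can shrink that window while keeping recall high is exactly the kind of question the dissertation's analysis addresses.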
 Title
 Structure and evolutionary dynamics in fitness landscapes
 Creator
 Pakanati, Anuraag R.
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

Evolution can be conceptualized as an optimization algorithm that allows populations to search through genotypes for those that produce high fitness solutions. This search process is commonly depicted as exploring a “fitness landscape”, which combines similarity relationships among genotypes with the concept of a genotype-fitness map. As populations adapt to their fitness landscape, they accumulate information about the fitness landscape in which they live. A greater understanding of evolution on fitness landscapes will help elucidate fundamental evolutionary processes.
I examine methods of estimating information acquisition in evolving populations and find that these techniques have largely ignored the effects of common descent. Since information is estimated by measuring conserved genomic regions across a population, common descent can create a severe bias by increasing similarities among unselected regions. I introduce a correction method to compensate for the effects of common descent on genomic information and empirically demonstrate its efficacy.
Next, I explore three instantiations of NK, Avida, and RNA fitness landscapes to better understand structural properties such as the distribution of peaks and the size of basins of attraction. I find that the fitness of peaks is correlated with the fitness of peaks within their neighborhood, and that the size of peaks' basins of attraction tends to be proportional to the heights of the peaks. Finally, I visualize local dynamics and perform a detailed comparison between the space of evolutionary trajectories that are technically possible from a single starting point and the results of actual evolving populations.
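To make the phrase "measuring conserved genomic regions across a population" concrete, the following Python sketch computes a common entropy-based estimate: each site contributes one unit minus its normalized per-site entropy, so fully conserved sites count as 1 and uniformly variable sites as 0. Whether the dissertation uses this exact estimator is an assumption, and the function names (site_entropy, information_content) are hypothetical. The sketch is the uncorrected estimate only; it does not implement the author's correction for common descent, whose details are not given in this abstract.

```python
import math
from collections import Counter


def site_entropy(column, alphabet_size):
    """Shannon entropy of one genome position, in units of log base alphabet_size."""
    counts = Counter(column)
    total = len(column)
    return -sum(
        (c / total) * math.log(c / total, alphabet_size) for c in counts.values()
    )


def information_content(population, alphabet_size):
    """Sum over sites of (1 - normalized entropy) for equal-length genomes.

    Assumed, uncorrected estimator: sites that look conserved only because
    the genomes share a recent ancestor are counted as informative too.
    """
    length = len(population[0])
    return sum(
        1.0 - site_entropy([genome[i] for genome in population], alphabet_size)
        for i in range(length)
    )
```

For a population such as ["AAAA", "AAAT", "AACG", "AAGC"] over a four-letter alphabet, the first two sites are fully conserved and dominate the total. In a recently founded population, that conservation could reflect shared ancestry rather than selection, which is the bias the abstract describes.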