Search results (1–20 of 47)
 Title
 Harnessing evolutionary computation for the design and generation of adaptive embedded controllers within the context of uncertainty
 Creator
 Byers, Chad Michael
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

A critical challenge for the design of embedded controllers is incorporating desirable qualities such as robustness, fault tolerance, and adaptability into the control process in order to respond to dynamic environmental conditions. An embedded controller governs the execution of a task-specific system by monitoring information from its environment via sensors and producing an appropriate response through the system's actuators, often independent of any supervisory control. For a human developer, identifying the set of all possible combinations of conditions a system might experience and designing a solution to accommodate this set is burdensome, costly, and often infeasible. To alleviate this burden, a variety of techniques have been explored to automate the generation of embedded controller solutions. In this dissertation, we focus on the bio-inspired technique referred to as evolutionary computation, where we harness evolution's power as a population-based, global search technique to build up good behavioral components. In this way, evolution naturally selects for these desirable qualities in order for a solution to remain competitive over time in the population. Often, these search techniques operate in the context of uncertainty, where aspects of the (1) problem domain, (2) solution space, and (3) search process itself are subject to variation and change. To mitigate issues associated with uncertainty in the problem domain, we propose the digital enzyme, a biologically inspired model that maps the complexity of both the environment and the system into the space of values rather than instructions. To address uncertainty in the solution space, we remove constraints in our initial digital enzyme model to allow the genome structure to be dynamic and open-ended, accommodating a wider range of evolved solution designs. Finally, to help explore the inherent uncertainty that exists in the search process, we uncover a hidden feature interaction involving the diversity-preserving search operator of a popular evolutionary algorithm and propose a new way to use niching to mitigate its unwanted effects and bias on search.
 Title
 Kernel-based clustering of big data
 Creator
 Chitta, Radha
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

There has been a rapid increase in the volume of digital data over recent years. A study by IDC and EMC Corporation predicted the creation of 44 zettabytes (10^21 bytes) of digital data by the year 2020. Analysis of these massive amounts of data, popularly known as big data, necessitates highly scalable data analysis techniques. Clustering is an exploratory data analysis tool used to discover the underlying groups in the data. The state-of-the-art algorithms for clustering big data sets are linear clustering algorithms, which assume that the data is linearly separable in the input space, and use measures such as the Euclidean distance to define the inter-point similarities. Though efficient, linear clustering algorithms do not achieve high cluster quality on real-world data sets, which are not linearly separable. Kernel-based clustering algorithms employ nonlinear similarity measures to define the inter-point similarities. As a result, they are able to identify clusters of arbitrary shapes and densities. However, kernel-based clustering techniques suffer from two major limitations: (i) their running time and memory complexity increase quadratically with the size of the data set, so they cannot scale up to data sets containing billions of data points; and (ii) the performance of kernel-based clustering algorithms is highly sensitive to the choice of the kernel similarity function. Ad hoc approaches, relying on prior domain knowledge, are currently employed to choose the kernel function, and it is difficult to determine the appropriate kernel similarity function for a given data set. In this thesis, we develop scalable approximate kernel-based clustering algorithms using random sampling and matrix approximation techniques. They can cluster big data sets containing billions of high-dimensional points not only as efficiently as linear clustering algorithms but also as accurately as classical kernel-based clustering algorithms.

Our first contribution is based on the premise that the similarity matrices corresponding to big data sets can usually be well-approximated by low-rank matrices built from a subset of the data. We develop an approximate kernel-based clustering algorithm which uses a low-rank approximate kernel matrix, constructed from a uniformly sampled small subset of the data, to perform clustering. We show that the proposed algorithm has linear running time complexity and low memory requirements, and also achieves high cluster quality when provided with a sufficient number of data samples. We also demonstrate that the proposed algorithm can be easily parallelized to handle distributed data sets. We then employ nonlinear random feature maps to approximate the kernel similarity function, and design clustering algorithms which enhance the efficiency of kernel-based clustering as well as label assignment for previously unseen data points. Our next contribution is an online kernel-based clustering algorithm that can cluster potentially unbounded stream data in real time. It intelligently samples the data stream and finds the cluster labels using these sampled points. The proposed scheme is more effective than current kernel-based and linear stream clustering techniques, both in terms of efficiency and cluster quality. We finally address the issues of high dimensionality and scalability to data sets containing a large number of clusters. Under the assumption that the kernel matrix is sparse when the number of clusters is large, we modify the above online kernel-based clustering scheme to perform clustering in a low-dimensional space spanned by the top eigenvectors of the sparse kernel matrix. The combination of sampling and sparsity further reduces the running time and memory complexity.

The proposed clustering algorithms can be applied in a number of real-world applications. We demonstrate the efficacy of our algorithms using several large benchmark text and image data sets. For instance, the proposed batch kernel clustering algorithms were used to cluster large image data sets (e.g., Tiny) containing up to 80 million images. The proposed stream kernel clustering algorithm was used to cluster over a billion tweets from Twitter for hashtag recommendation.
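The low-rank approximation idea behind the first contribution can be illustrated with a Nyström-style sketch: sample a small set of landmark points, build the kernel between all points and the landmarks, map every point into an approximate kernel feature space, and run ordinary k-means there. This is a minimal sketch of the general technique, not the dissertation's algorithm; the RBF kernel, landmark count, and farthest-point k-means initialization are all illustrative assumptions.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # Pairwise RBF kernel between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, m, gamma=1.0, rng=None):
    """Rank-m approximate kernel feature map from a uniform sample of m landmarks."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X), size=m, replace=False)
    L = X[idx]                          # landmark points
    Kmm = rbf(L, L, gamma)              # m x m landmark kernel
    Knm = rbf(X, L, gamma)              # n x m cross kernel
    w, U = np.linalg.eigh(Kmm)
    w = np.maximum(w, 1e-10)            # guard tiny/negative eigenvalues
    return Knm @ U / np.sqrt(w)         # n x m features with K ~ F F^T

def kmeans(F, k, iters=50):
    # Farthest-point initialization, then standard Lloyd iterations.
    C = [F[0]]
    for _ in range(k - 1):
        d = np.min([((F - c) ** 2).sum(1) for c in C], axis=0)
        C.append(F[int(d.argmax())])
    C = np.array(C)
    for _ in range(iters):
        lab = ((F[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        C = np.array([F[lab == j].mean(0) if (lab == j).any() else C[j]
                      for j in range(k)])
    return lab
```

Running k-means on the m-dimensional features costs O(nm) per iteration instead of the O(n^2) needed to form the full kernel matrix, which is the source of the claimed scalability.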
 Title
 Evolution of cooperation in the light of information theory
 Creator
 Mirmomeni, Masoud
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

Cooperation is ubiquitous at different biological levels and is necessary for evolution to shape life and create new forms of organization. Genes cooperate in controlling cells; cells efficiently collaborate to produce cohesive multicellular organisms; members of insect colonies and animal clans cooperate in protecting the colony and providing food. Cooperation means that members of a group bear a cost, c, for other individuals to earn a benefit, b. While cooperators in the group help others by paying a cost, defectors receive the benefits of this altruistic behavior without providing any service in return to the group. To address this dilemma, here we use a game-theoretic approach to model and study evolutionary dynamics that can lead to unselfish behavior. Evolutionary game theory is an approach to studying frequency-dependent systems, in which the fitness of individuals depends on the relative abundance of the various types in the population. We explore different strategies and different games, such as iterated games between players with conditional strategies, multi-player games, and iterated games between fully stochastic strategies in noisy environments, to find the necessary conditions that lead to cooperation. Interestingly, we see that in all of these games communication is the key factor for maintaining cooperation among selfish individuals. We show that communication and information exchange are necessary for the emergence of costly altruism, and that to maintain cooperation in the group there should be a minimum rate of communication between individuals. We quantify this minimum amount of information exchange, which is necessary for individuals to exhibit cooperative behavior, by defining a noisy communication channel between them in iterated stochastic games and measuring the communication rate (in bits) during the breakdown of cooperation.
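The role of a noisy channel in sustaining cooperation can be pictured with a toy simulation: two memory-one players repeat a donation game, and each observation of the partner's last move passes through a channel that flips it with some probability. The donation-game framing, the payoff values, and the tit-for-tat strategy below are illustrative assumptions, not the dissertation's actual models.

```python
import random

def play(p_strat, q_strat, noise, rounds=2000, seed=0):
    """Iterated donation game: a cooperator pays cost c so the partner gains b.
    Each player sees the other's last move through a binary channel that
    flips it with probability `noise`; returns the fraction of cooperative moves."""
    rng = random.Random(seed)
    a_last = b_last = True              # both start by cooperating
    coop = 0
    for _ in range(rounds):
        seen_b = b_last != (rng.random() < noise)   # noisy observations
        seen_a = a_last != (rng.random() < noise)
        a_move = rng.random() < p_strat[seen_b]     # strategies map the seen
        b_move = rng.random() < q_strat[seen_a]     # move to P(cooperate)
        coop += a_move + b_move
        a_last, b_last = a_move, b_move
    return coop / (2 * rounds)

# Tit-for-tat as a memory-one strategy: cooperate iff the partner was seen cooperating.
TFT = {True: 1.0, False: 0.0}
```

With a perfect channel, tit-for-tat against itself cooperates forever; as the channel error rate grows, mis-read moves trigger echoing defections and the cooperation rate falls, which is the intuition behind measuring a minimum communication rate.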
 Title
 Discrete vector and 2-tensor analyses and applications
 Creator
 Liu, Beibei
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

We present novel analysis methods for vector fields and an intrinsic representation of 2-tensor fields on meshes, and show the benefits they bring to discrete calculus, geometry processing, texture synthesis, and fluid simulation. For instance, such vector fields and tensor fields in flat 2D space are necessary for example-based texture synthesis. However, many existing methods cannot ensure continuity automatically or control the singularities accurately. Moreover, extending such analyses to curved surfaces involves several known challenges. First, vectors at different surface points are defined in different tangent planes, so their comparison necessarily involves a concept called a connection, which transports vectors from one tangent plane to another in a parallel way. The few existing approaches for discrete connections offer neither a globally optimal principled definition nor a consistent discretization of differential operators. Second, symmetric 2-tensors, which play a crucial role in geometry processing, are often discretized as components stored in predefined local frames. There is no convenient way to perform coordinate-independent computations with arbitrary 2-tensor fields on triangulated surface meshes. Finally, the persistent pursuit of efficiency in the processing of vector fields in applications such as incompressible fluid simulation often results in undesired artifacts such as numerical viscosity, which prevents a predictive preview of the fine-resolution simulation at coarse spatial and temporal resolutions. We offer solutions to address these issues using our novel representation and analysis tools.

First, we present a framework for example-based texture synthesis with feature alignment to vector fields with two-way rotational symmetry, also known as orientation fields. Our contribution is twofold: a design tool for orientation fields with a natural boundary condition and singularity control, and a parallel texture synthesis adapted specifically for such fields in feature alignment. Second, we define a discrete connection on triangle meshes, which involves closed-form expressions within edges and triangles and finite rotations between pairs of incident vertices, edges, or triangles. The finite set of parameters of this connection can be optimally computed by minimizing a quadratic measure of the deviation from the connection induced by the embedding of the input triangle mesh. Local integrals of other first-order derivatives as well as L2-based energies can also be computed. Third, we offer a coordinate-free representation of arbitrary 2-tensor fields on triangle meshes, where we leverage a decomposition of continuous 2-tensors in the plane to construct a finite-dimensional encoding of tensor fields through scalar values on oriented pieces of a manifold triangulation. We also provide closed-form expressions of common operators for tensor fields, including pairing, inner product, and trace for this discrete representation, and formulate a discrete covariant derivative induced by the 2-tensors instead of the metric of the surface. Other operators, such as the discrete Lie bracket, can be constructed from these operators. This approach extends computational tools for tensor fields and offers a numerical framework for discrete tensor calculus on triangulations. Finally, a spectral vector field calculus on embedded irregular shapes is introduced to build a model-reduced variational Eulerian integrator for incompressible fluids. The resulting simulation combines the efficiency gains of dimension reduction, the qualitative robustness to coarse spatial and temporal resolutions of geometric integrators, and the simplicity of subgrid-accurate boundary conditions on regular grids to deal with arbitrarily shaped domains. A functional map approach to fluid simulation is also proposed, where scalar-valued and vector-valued eigenfunctions of the Laplacian operator can be easily used as reduced bases. Using a variational integrator in time to preserve liveliness and a simple yet accurate embedding of the fluid domain onto a Cartesian grid, our model-reduced fluid simulator can achieve realistic animations in significantly less computation time than full-scale non-dissipative methods, but without the numerical viscosity from which current reduced methods suffer.
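The "finite rotations between pairs of incident triangles" can be pictured with a tiny sketch: a discrete connection assigns one rotation angle per crossed (dual) edge, transporting a tangent vector along a path of adjacent triangles composes those rotations, and the net rotation around a closed loop is the holonomy, which reflects curvature enclosed by the loop. This is a generic illustration of the concept under assumed conventions, not the dissertation's optimized construction.

```python
import math

def rotate(v, theta):
    # 2D rotation of a tangent vector expressed in a triangle's local frame.
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def transport(v, angles):
    """Parallel-transport v across a sequence of dual edges; the discrete
    connection supplies one rotation angle per edge crossing."""
    for theta in angles:
        v = rotate(v, theta)
    return v

def holonomy(angles):
    # Net rotation after a closed loop of dual edges (modulo a full turn);
    # nonzero values indicate curvature concentrated inside the loop.
    return math.fmod(sum(angles), 2 * math.pi)
```

In a full method the per-edge angles are not free: they are found by minimizing a quadratic deviation from the connection induced by the surface embedding, as the abstract describes.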
 Title
 Approaches to scaling and improving metagenome sequence assembly
 Creator
 Pell, Jason (Jason A.)
 Date
 2013
 Collection
 Electronic Theses & Dissertations
 Description

Since the completion of the Human Genome Project in the early 2000s, new high-throughput sequencing technologies have been developed that produce more DNA sequence reads at a much lower cost. Because of this, large quantities of data have been generated that are difficult to analyze computationally, not only because of the sheer number of reads but also due to errors. One area where this is a particularly difficult problem is metagenomics, where an ensemble of microbes in an environmental sample is sequenced. In this scenario, blends of species with varying abundance levels must be processed together in a bioinformatics pipeline. One common goal with a sequencing dataset is to assemble the genome from the set of reads, but since comparing reads with one another scales quadratically, new algorithms had to be developed to handle the large quantity of short reads generated from the latest sequencers. These assembly algorithms frequently use de Bruijn graphs, where reads are broken down into k-mers, or small DNA words of a fixed size k. Despite these algorithmic advances, DNA sequence assembly still scales poorly due to errors and computer memory inefficiency.

In this dissertation, we develop approaches to tackle the current shortcomings in metagenome sequence assembly. First, we devise the novel use of a Bloom filter, a probabilistic data structure with false positives, for storing a de Bruijn graph in memory. We study the properties of the de Bruijn graph with false positives in detail and observe that the components in the graph abruptly connect together at a specific false positive rate. Then, we analyze the memory efficiency of a partitioning algorithm at various false positive rates and find that this approach can lead to a 40x decrease in memory usage. Extending the idea of a probabilistic de Bruijn graph, we then develop a two-pass error correction algorithm that effectively discards erroneous reads and corrects the remaining majority to be more accurate. In the first pass, we use the digital normalization algorithm to collect novel reads and discard reads from regions that are already at sufficient coverage. In the second, a read-to-graph alignment strategy is used to correct reads. Some heuristics are employed to improve performance. We evaluate the algorithm with an E. coli dataset as well as a mock human gut metagenome dataset and find that the error correction strategy works as intended.
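The Bloom-filter idea can be sketched in a few lines: k-mers are hashed into a bit array, membership queries may return false positives but never false negatives, and the de Bruijn graph never needs explicit edges because a k-mer's successors can be enumerated by extending its (k-1)-suffix with each nucleotide and testing membership. The array size and salted-hash scheme below are arbitrary choices for illustration, not the dissertation's implementation.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: membership with false positives, no false negatives."""
    def __init__(self, n_bits, n_hashes):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits)

    def _positions(self, kmer):
        # Derive n_hashes positions from salted SHA-256 digests (toy scheme).
        for salt in range(self.n_hashes):
            digest = hashlib.sha256(f"{salt}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, kmer):
        for p in self._positions(kmer):
            self.bits[p] = 1

    def __contains__(self, kmer):
        return all(self.bits[p] for p in self._positions(kmer))

def kmers(seq, k):
    # All overlapping k-mers of a DNA sequence.
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def graph_neighbors(bf, kmer):
    # Implicit de Bruijn graph: successors are stored k-mers that
    # overlap this one by k-1 characters.
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bf]
```

Because only the bit array is stored, memory is fixed up front regardless of how the graph is traversed; the trade-off is the false-positive edges whose abrupt percolation behavior the dissertation studies.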
 Title
 Multi-objective regression with application to the climate domain
 Creator
 Abraham, Zubin John
 Date
 2013
 Collection
 Electronic Theses & Dissertations
 Description

Regression-based approaches are widely used in climate research to derive the statistical, spatial, and temporal relationships among climate variables. Despite the extensive literature, existing approaches are insufficient to address the unique challenges arising from the data characteristics and requirements of this domain. For example, climate variables such as precipitation have zero-inflated distributions, which render ineffective any linear regression models constructed from the data. In addition, whereas traditional regression-based approaches emphasize minimizing the discrepancy between observed and predicted values, there is a growing demand for regression outputs that satisfy other domain-specific criteria. To address these challenges, this thesis presents multi-objective regression frameworks designed to extend current regression-based approaches to meet the needs of climate researchers. First, a framework called Integrated Classification and Regression (ICR) is developed to accurately capture the timing of rain events and the magnitude of rain amounts in zero-inflated precipitation data. The second multi-objective regression framework focuses on modeling the extreme values of a distribution without degrading its overall accuracy in predicting non-extreme values. The third framework emphasizes both minimizing the divergence between the regression output and observed data and maximizing the fit of their cumulative distribution functions. The fourth contribution extends this framework to a multi-output setting, to ensure that the joint distribution of the multiple regression outputs is realistic and consistent with true observations.
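The motivation for coupling classification with regression on zero-inflated data can be sketched as a naive two-stage baseline: fit a classifier for rain occurrence, fit a regression for rain amount on wet days only, and predict zero whenever the classifier says dry. The linear-probability classifier and least-squares fits below are stand-ins invented for the example; the actual ICR framework couples the two objectives rather than fitting them independently.

```python
import numpy as np

def fit_zero_inflated(X, y):
    """Two-stage fit: a linear-probability model for rain occurrence (y > 0),
    then ordinary least squares for rain amount on wet days only."""
    Xb = np.column_stack([np.ones(len(X)), X])   # add intercept column
    wet = y > 0
    w_cls, *_ = np.linalg.lstsq(Xb, wet.astype(float), rcond=None)
    w_reg, *_ = np.linalg.lstsq(Xb[wet], y[wet], rcond=None)
    return w_cls, w_reg

def predict(X, w_cls, w_reg, thresh=0.5):
    # Predict zero on "dry" days; otherwise the (non-negative) fitted amount.
    Xb = np.column_stack([np.ones(len(X)), X])
    amount = np.maximum(Xb @ w_reg, 0.0)
    return np.where(Xb @ w_cls >= thresh, amount, 0.0)
```

Fitting the regression only on wet days avoids the bias a single linear model suffers when most targets are exactly zero, which is the failure mode the abstract describes.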
 Title
 Inferring regulatory interactions in transcriptional regulatory networks
 Creator
 Mahmoud, Sherine Awad
 Date
 2013
 Collection
 Electronic Theses & Dissertations
 Description

Living cells are realized by complex gene expression programs that are moderated by regulatory proteins called transcription factors (TFs). The TFs control the differential expression of target genes in the context of transcriptional regulatory networks (TRNs), either individually or in groups. Deciphering the mechanisms by which the TFs control the expression of target genes is a challenging task, especially when multiple TFs collaboratively participate in transcriptional regulation. Recent developments in biotechnology have been applied to uncover TF-target binding relationships to reconstruct draft regulatory circuits at a systems level. Furthermore, to identify regulatory interactions in vivo and consequently reveal their functions, TF single/double knockout and overexpression experiments have been systematically carried out. However, the results of many single- or even double-knockout experiments are often non-conclusive, since many genes are regulated by multiple TFs with complementary functions. To predict which TF combinations, when knocked out, are most likely to bring about a phenotypic change, we developed a new computational tool called TRIM that models the interactions between the TFs and the target genes in terms of both the TF-target interaction's function (activation or repression) and its corresponding logical role (necessary and/or sufficient). We used DNA-protein binding and gene expression data to construct regulatory modules for inferring the transcriptional regulatory interaction models for the TFs and their corresponding target genes. Our TRIM algorithm is based on an HMM and a set of constraints that relate gene expression patterns to regulatory interaction models. However, TRIM infers interactions involving at most two TFs. Inferring the collaborative interactions of multiple TFs is a computationally difficult task, because when multiple TFs simultaneously or sequentially control their target genes, a single gene responds to merged inputs, resulting in complex gene expression patterns. We developed mTRIM to solve this problem with a modified association rule mining approach. mTRIM is a novel method to infer TF collaborations in transcriptional regulation networks. It can identify not only TF groups that regulate their common targets collaboratively but also TFs with complementary functions. However, mTRIM ignores the effect of miRNAs on target genes. In order to take the effect of miRNAs into consideration, we developed a new computational model called TmiRNA that incorporates miRNAs into the inference. TmiRNA infers the interactions between a set of regulators, including both TFs and miRNAs, and the set of their target genes. We used our model to study the combinatorial code of human cancer transcriptional regulation.
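The association-rule flavor of this approach can be illustrated with a toy support/confidence computation over TF-to-target binding sets: a pair of TFs is a candidate collaboration when they share enough common targets. The data layout and thresholds here are invented for the example; the actual mTRIM method modifies standard rule mining and additionally detects TFs with complementary functions.

```python
from itertools import combinations

def pair_rules(binding, min_support=0.3, min_conf=0.6):
    """Support/confidence for TF pairs over shared target genes.
    `binding` maps each TF name to the set of genes it binds (assumed layout)."""
    genes = set().union(*binding.values())
    rules = {}
    for a, b in combinations(sorted(binding), 2):
        both = binding[a] & binding[b]
        support = len(both) / len(genes)          # shared targets / all genes
        conf = len(both) / len(binding[a]) if binding[a] else 0.0
        if support >= min_support and conf >= min_conf:
            rules[(a, b)] = (round(support, 3), round(conf, 3))
    return rules
```

Scoring pairs (and, in general, larger itemsets) by support and confidence is what makes the approach scale beyond the two-TF limit of an HMM-based model.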
 Title
 Differential geometry based representations for meshes and correspondences
 Creator
 Wang, Yuanzhen, 1986
 Date
 2013
 Collection
 Electronic Theses & Dissertations
 Description

This dissertation presents novel meshing and correspondence inference techniques, designed for generating high-quality representations of both geometric objects and the mappings among these objects, which benefit subsequent geometry processing, modeling, and scientific computing. From abstract.
 Title
 Automatic verification and revision of multi-tolerant programs
 Creator
 Chen, Jingshu
 Date
 2013
 Collection
 Electronic Theses & Dissertations
 Description

The notion of multi-tolerance is based on the observation that modern programs are often subject to multiple faults, and the requirements in the presence of these faults vary based on the nature of the faults, their severity, and the cost of providing fault-tolerance against them. Moreover, assurance of multi-tolerant systems is necessary because they are integral parts of our lives. This dissertation proposes to provide such assurance via automated verification and revision.

Regarding verification, we focus on verification of self-stabilization, which is the ability of a program to recover from arbitrary states. We consider verification of self-stabilization because several multi-tolerant systems are indeed stabilizing. Also, most of the literature on verification of fault-tolerance focuses on safety properties; our work complements it by considering liveness properties. Hence, we envision verification of multi-tolerant programs by using existing approaches for verifying safety and using the results from this dissertation for verifying liveness. We propose a technique that is based on a bottleneck (fairness requirements) identified in existing work on verification of stabilization. Our approach uses the role of fairness along with symbolic model checking, and hence reduces the cost of verification substantially. We also propose a constraint-based approach that reduces the task of verifying self-stabilization to the well-studied problem of constraint solving, so that one can leverage existing, highly optimized solutions (SAT/SMT solvers) to reduce the verification cost.

Regarding revision, we focus on revising existing programs to obtain the corresponding multi-tolerant ones. Revising a program manually is expensive, since it requires additional verification steps to guarantee correctness. Also, manual revision may violate existing requirements. For these reasons, we propose an automatic approach to revise a given program to add multi-tolerance to the given class(es) of faults. Specifically, we characterize multi-tolerance in terms of strong multi-tolerance and weak multi-tolerance. Intuitively, strong multi-tolerance provides higher guarantees than weak multi-tolerance. However, there are scenarios where designing a strong multi-tolerant program is expensive or impossible although designing a weak multi-tolerant one is feasible. We investigate the complexity of automatic revision for adding multi-tolerance. In particular, we identify instances where adding weak multi-tolerance is NP-hard even though adding strong multi-tolerance in the same setting is in P. We also develop algorithms (and heuristics) for automatic revision to add multi-tolerance to existing programs. We implement these algorithms in a model repair tool for automatically adding multi-tolerance. Additionally, we build a lightweight framework that utilizes our model repair tool for automatically revising UML state diagrams to add fault-tolerance. This framework has practical and methodological significance for the development of concurrent software: it allows designers to revise an existing UML model to add fault-tolerance without detailed knowledge of the formalism behind the model repair algorithms.
 Title
 Profile HMM-based protein domain analysis of next-generation sequencing data
 Creator
 Zhang, Yuan
 Date
 2013
 Collection
 Electronic Theses & Dissertations
 Description

Sequence analysis is the process of analyzing DNA, RNA, or peptide sequences using a wide range of methodologies in order to understand their functions, structures, or evolutionary history. Next-generation sequencing (NGS) technologies generate large-scale sequence data of high coverage and nucleotide-level resolution at low cost, benefiting a variety of research areas such as gene expression profiling, metagenomic annotation, and ncRNA identification. Therefore, functional analysis of NGS sequences becomes increasingly important because it provides insightful information, such as gene expression, protein composition, and phylogenetic complexity, about the species from which the sequences are generated. One basic step during functional analysis is to classify genomic sequences into different functional categories, such as protein families or protein domains (or domains for short), which are independent functional units in a majority of annotated protein sequences. The state-of-the-art method for protein domain analysis is based on comparative sequence analysis, which classifies query sequences against annotated protein or domain databases. There are two types of domain analysis methods: pairwise alignment and profile-based similarity search. The first uses pairwise alignment tools such as BLAST to search query genomic sequences against reference protein sequences in databases such as NCBI nr. The second uses profile HMM-based tools such as HMMER to classify query sequences into annotated domain families such as Pfam. Compared to the first method, the profile HMM-based method has a smaller search space and higher sensitivity for remote homolog detection. Therefore, I focus on profile HMM-based protein domain analysis.

There are several challenges with protein domain analysis of NGS sequences. First, sequences generated by some NGS platforms such as pyrosequencing have relatively high error rates, making it difficult to classify the sequences into their native domain families. Second, existing protein domain analysis tools have low sensitivity with short query sequences and poorly conserved domain families. Third, the volume of NGS data is usually very large, making it difficult to assemble short reads into longer contigs. In this work, I focus on addressing these three challenges using different methods. To be specific, we have proposed four tools: HMMFRAME, MetaDomain, SALT, and SATAssembler. HMMFRAME focuses on detecting and correcting frameshift errors in sequences generated by pyrosequencing technology, thus accurately classifying metagenomic sequences containing frameshift errors into their native protein domain families. MetaDomain and SALT are both designed for short reads generated by NGS technologies. MetaDomain uses relaxed position-specific score thresholds and alignment positions to increase sensitivity while keeping the false positive rate at a low level. SALT combines position-specific score thresholds and graph algorithms and achieves higher accuracy than MetaDomain. SATAssembler conducts targeted gene assembly from large-scale NGS data. It has smaller memory usage, higher gene coverage, and a lower chimera rate compared with existing tools. Finally, I conclude my work and briefly discuss some directions for future work.
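The idea of position-specific scoring, which MetaDomain relaxes to gain sensitivity on short reads, can be sketched in miniature: each model position carries its own residue scores, a read aligned to the model accumulates those scores, and a threshold decides family membership. The matrix values, the fixed alignment at the model start, and the threshold below are all made up for illustration; real profile HMMs also model insertions and deletions.

```python
def score_alignment(read, pssm):
    """Sum position-specific scores for a read aligned at the model start.
    `pssm` is one {residue: score} dict per model position (toy layout);
    unseen residues get a flat penalty."""
    return sum(col.get(ch, -1.0) for ch, col in zip(read, pssm))

def classify(read, pssm, threshold):
    # Accept the read into the family if it clears the threshold; relaxing
    # the threshold trades false positives for sensitivity on short reads.
    return score_alignment(read, pssm) >= threshold
```

Lowering `threshold` (per position, in MetaDomain's case) admits more short, partial matches, which is why the dissertation pairs relaxed thresholds with alignment-position information to keep the false positive rate low.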
 Title
 Cortexinspired developmental learning networks for stereo vision
 Creator
 Solgi, Mojtaba
 Date
 2013
 Collection
 Electronic Theses & Dissertations
 Description

How does the human brain make sense of the 3D world while its visual input, the retinal images, is only two-dimensional? The brain exploits multiple depth cues to create a 3D model of the world. Despite the importance of this subject for both scientists and engineers, the underlying computational mechanisms of stereo vision in the human brain are still largely unknown. This thesis is an attempt toward creating a developmental model of stereo vision in the visual cortex. By developmental we mean that the features of each neuron are developed, instead of handcrafted, so that limited resources are optimally used. This approach helps us learn more about biological stereo vision, and also yields results superior to those of traditional computer vision approaches, e.g., under weak textures. Developmental networks, such as Where-What Networks (WWN), have shown promise for simultaneous attention and recognition, handling variations in scale, location, and type as well as inter-class variations. Moreover, in a simpler prior setting, they have shown sub-pixel accuracy in disparity detection on challenging natural images. However, the previous work on stereo vision was limited to 20-pixel stripes of shifted images and unable to scale to real-world problems. This dissertation presents work on building neuromorphic developmental models for stereo vision, focusing on (1) dynamic synapse retraction and growth as a method of developing more efficient receptive fields, (2) training on images that involve complex natural backgrounds, and (3) integration of depth perception with location and type information. In a setting of 5 object classes, 7 × 7 = 49 locations, and 11 disparity levels, the network achieves above a 95% recognition rate for object shapes, under one pixel of disparity detection error, and under 10 pixels of location error. These results are reported using challenging natural and synthetic textures on both background and foreground objects in disjoint testing.
 Title
 Measurement and modeling of large scale networks
 Creator
 Shafiq, Muhammad Zubair
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

The goal of this thesis is to identify measurement, modeling, and optimization opportunities for large-scale networks, with specific focus on cellular networks and online social networks. These networks are facing unprecedented operational challenges due to their very large scale. Cellular networks have experienced an explosive increase in traffic volume over the last few years. This unprecedented increase in mobile traffic is attributed to a growing subscriber base, improving network connection speeds, and improving hardware and software capabilities of modern smartphones. In contrast to traditional fixed IP networks, mobile network operators face the constraint of the limited radio frequency spectrum at their disposal. As communication technologies evolve beyond 3G to Long Term Evolution (LTE), competition for the limited radio frequency spectrum is becoming even more intense. Therefore, mobile network operators increasingly focus on optimizing different aspects of the network through customized design and management to improve key performance indicators (KPIs). Online social networks are growing at a very rapid pace while trying to provide more content-rich and interactive services to their users. For instance, Facebook currently has more than 1.2 billion monthly active users and offers news feed, graph search, groups, photo sharing, and messaging services. The information for such a large user base cannot be efficiently and securely managed by traditional database systems. Social network service providers are deploying novel large-scale infrastructure to cope with these scaling challenges. In this thesis, I present novel approaches to tackle these challenges by revisiting the current practices for the design, deployment, and management of large-scale network systems using a combination of theoretical and empirical methods. I take a data-driven approach in which the theoretical and empirical analyses are intertwined. First, I measure and analyze the trends in data and then model the identified trends using suitable parametric models. Finally, I rigorously evaluate the developed models and the resulting system design prototypes using extensive simulations, realistic testbed environments, or real-world deployment. This methodology is used to address several problems related to cellular networks and online social networks.
 Title
 Fingerprint recognition : models and applications
 Creator
 Yoon, Soweon
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Fingerprint recognition has been successfully used in law enforcement and forensics to identify suspects and victims for over a century. Recent advances in automated fingerprint identification systems (AFIS), coupled with the growing need for reliable person recognition, have resulted in an increased deployment of AFIS in broad applications such as border control, employment background checks, secure facility access, and user authentication in laptops and mobile devices. Despite the success of fingerprint recognition in many large-scale and diverse person identification applications, several challenging issues still need to be addressed. First, the persistence and uniqueness of fingerprints, the fundamental premises for fingerprint recognition, remain presumptions rather than facts with solid scientific underpinnings. Although some studies have addressed the uniqueness of fingerprints, no systematic study of the persistence of fingerprints has been reported. Given a large longitudinal fingerprint database, we have analyzed it with a multilevel statistical model to assess the impact of the time interval between a genuine fingerprint pair on the corresponding match score and identification decision. Second, an appropriate mathematical model of the fingerprint orientation field is needed to address a number of challenging problems, including feature extraction (e.g., orientation field and minutiae) from noisy or partial fingerprints, detecting abnormality in fingerprint patterns, and representing fingerprints in a compact form. To this end, we have developed a global orientation field model in the form of ordinary differential equations, constrained by the number of singular points in a fingerprint. The model represents the global orientation field with a small number of polynomial terms and outputs fingerprint-like orientation fields with the specified number of singular points after fitting an input pattern. We use this model to check the "fingerprintness" of an input image and ensure the integrity of exemplar fingerprint databases. Third, given that automatic latent fingerprint matching is difficult due to the poor quality of latent fingerprints found at crime scenes, we have developed a semi-automatic latent enhancement method. The proposed algorithm is based on robust orientation field estimation that is able to (i) reduce human intervention in feature markup and (ii) improve automatic feature extraction and matching accuracy on latents. Fourth, fingerprint obfuscation or alteration is a type of attack on AFIS that concerns law enforcement and border crossing agencies. We show that fingerprint matching accuracy can deteriorate greatly when altered fingerprints are presented to an AFIS. To address this deficiency, we have (i) analyzed the types of fingerprint obfuscation, (ii) developed a detection algorithm for altered fingerprints that measures abnormality in the orientation field and minutiae distribution, and (iii) proposed a restoration and matching algorithm to possibly link an altered fingerprint to its pre-altered mate. By addressing these contemporary problems, this dissertation has advanced our understanding of fingerprint recognition and enabled the development of robust and accurate fingerprint recognition algorithms.
 Title
 Statistical and learning algorithms for the design, analysis, measurement, and modeling of networking and security systems
 Creator
 Shahzad, Muhammad (College teacher)
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

"The goal of this thesis is to develop statistical and learning algorithms for the design, analysis, measurement, and modeling of networking and security systems with specific focus on RFID systems, network performance metrics, user security, and software security. Next, I give a brief overview of these four areas of focus."  Abstract.
 Title
 Data clustering with pairwise constraints
 Creator
 Yi, Jinfeng
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Classical unsupervised clustering is an ill-posed problem due to the absence of a unique clustering criterion. This issue can be addressed by introducing additional supervised information, usually cast in the form of pairwise constraints, into the clustering procedure. Depending on the source, most pairwise constraints fall into two categories: (i) pairwise constraints collected from a set of non-expert crowd workers, which leads to the problem of crowdclustering, and (ii) pairwise constraints collected from an oracle or experts, which leads to the problem of semi-supervised clustering. In both cases, collecting pairwise constraints can be expensive, so it is important to identify the minimal number of pairwise constraints needed to accurately recover the underlying true data partition, also known as the sample complexity problem. In this thesis, we first analyze the sample complexity of crowdclustering. We propose a novel crowdclustering approach based on the theory of matrix completion. Unlike the existing crowdclustering algorithm, which is based on a Bayesian generative model, the proposed approach needs far fewer crowdsourced pairwise annotations to accurately cluster all the objects. Our theoretical analysis shows that in order to accurately cluster $N$ objects, only $O(N\log^2 N)$ randomly sampled pairs need to be annotated by crowd workers. To further reduce the sample complexity, we then introduce a semi-crowdsourced clustering framework that effectively incorporates the low-level features of the objects to be clustered. In this framework, we only need to sample a subset of $n \ll N$ objects and generate their pairwise constraints via crowdsourcing. After completing an $n \times n$ similarity matrix using the proposed crowdclustering algorithm, we can recover an $N \times N$ similarity matrix by applying a regression-based distance metric learning algorithm to the completed smaller similarity matrix. This enables us to reliably cluster $N$ objects with only $O(n\log^2 n)$ crowdsourced pairwise constraints. Next, we study the problem of sample complexity in semi-supervised clustering. To this end, we propose a novel convex semi-supervised clustering approach based on the theory of matrix completion. In order to reduce the number of pairwise constraints needed to achieve a perfect data partitioning, we apply a natural assumption that the feature representations of the objects reflect the similarities between objects. This enables us to utilize only $O(\log N)$ pairwise constraints to perfectly recover the data partition of $N$ objects. Lastly, in addition to sample complexity, which relates to labeling costs, we also consider the computational costs of semi-supervised clustering. Specifically, we study the problem of efficiently updating clustering results when pairwise constraints are generated sequentially, a common case in real-world applications such as social networks. To address this issue, we develop a dynamic semi-supervised clustering algorithm that casts the clustering problem as a search problem in a feasible convex space, i.e., a convex hull whose extreme points are an ensemble of multiple data partitions. Unlike classical semi-supervised clustering algorithms that need to re-optimize their objective functions when new pairwise constraints are generated, the proposed method only needs to update a low-dimensional vector, and its time complexity is independent of the number of data points to be clustered. This enables us to update large-scale clustering results in an extremely efficient way.
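As a warm-up for how pairwise constraints encode a data partition, the sketch below merges must-link pairs with union-find and checks cannot-link pairs for consistency. The example data are hypothetical, and this is only the combinatorial core of the problem, not the matrix-completion algorithms proposed in the thesis:

```python
# Must-link pairs are merged with union-find; cannot-link pairs are then
# checked for consistency; each connected component becomes one cluster.

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def partition_from_constraints(n, must_link, cannot_link):
    parent = list(range(n))
    for a, b in must_link:
        parent[find(parent, a)] = find(parent, b)
    for a, b in cannot_link:
        if find(parent, a) == find(parent, b):
            raise ValueError("inconsistent constraints on (%d, %d)" % (a, b))
    roots = {}
    return [roots.setdefault(find(parent, i), len(roots)) for i in range(n)]

labels = partition_from_constraints(
    5, must_link=[(0, 1), (3, 4)], cannot_link=[(1, 3)])
print(labels)  # → [0, 0, 1, 2, 2]
```

With noisy crowd annotations this direct propagation breaks down, which is one motivation for the robust matrix-completion formulation described above.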
 Title
 Out of the box optimization using the parameterless population pyramid
 Creator
 Goldman, Brian W.
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

The Parameterless Population Pyramid (P3) is a recently introduced method for performing evolutionary optimization without requiring any user-specified parameters. P3’s primary innovation is to replace the generational model with a pyramid of multiple populations that are iteratively created and expanded. In combination with local search and advanced crossover, P3 scales to problem difficulty, exploiting previously learned information before adding more diversity. Across seven problems, each tested using on average 18 problem sizes, P3 outperformed all five advanced comparison algorithms. This improvement includes requiring fewer evaluations to find the global optimum and reaching better fitness when using the same number of evaluations. Using both algorithm analysis and comparison, we show that P3’s effectiveness is due to its ability to properly maintain, add, and exploit diversity. Unlike the best comparison algorithms, P3 achieved this quality without any problem-specific tuning. Thus, unlike previous parameterless methods, P3 does not sacrifice quality for applicability. We therefore conclude that P3 is an efficient, general, parameterless approach to black-box optimization that is more effective than existing state-of-the-art techniques. Furthermore, P3 can be specialized for gray-box problems, which have known, limited, nonlinear relationships between variables. Gray-Box P3 leverages the Hamming-Ball Hill Climber, an exceptionally efficient form of local search, as well as a novel method for performing crossover using the known variable interactions. In doing so, Gray-Box P3 is able to find the global optimum of large problems in seconds, improving over Black-Box P3 by up to two orders of magnitude.
 Title
 Largescale high dimensional distance metric learning and its application to computer vision
 Creator
 Qian, Qi
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

Learning an appropriate distance function (i.e., similarity) is one of the key tasks in machine learning, especially for distance-based machine learning algorithms, e.g., the $k$-nearest neighbor classifier, $k$-means clustering, etc. Distance metric learning (DML), the subject studied in this dissertation, is designed to learn a metric that pulls examples from the same class together and pushes examples from different classes away from each other. Although many DML algorithms have been developed in the past decade, most of them can handle only small data sets with hundreds of features, significantly limiting their use in real-world applications that often involve millions of training examples represented by hundreds of thousands of features. Three main challenges arise in learning a metric from such large-scale, high-dimensional data: (i) to ensure the learned metric is a Positive Semi-Definite (PSD) matrix, a projection onto the PSD cone is required at every iteration, whose cost is cubic in the dimensionality, making it unsuitable for high-dimensional data; (ii) the number of variables to be optimized in DML is quadratic in the dimensionality, resulting in slow convergence and high memory requirements; (iii) the number of constraints used by DML is at least quadratic, if not cubic, in the number of examples, depending on whether pairwise or triplet constraints are used. In addition, features can be redundant in high-dimensional representations (e.g., face features), and DML with feature selection is preferred for such applications. The main contribution of this dissertation is to address these challenges both theoretically and empirically.
First, for the challenge arising from the PSD projection, we exploit a mini-batch strategy and adaptive sampling with a smooth loss function to significantly reduce the number of updates (i.e., projections) while maintaining similar performance. Second, for the challenge arising from high dimensionality, we propose a dual random projection approach, which enjoys the light computation of random projection and, at the same time, significantly improves its effectiveness. Third, for the challenge of large-scale constraints, we develop a novel multistage metric learning framework that divides the original optimization problem into multiple stages and reduces the computation by adaptively sampling a small subset of constraints at each stage. Finally, to handle redundant features with group structure, we develop a greedy algorithm that selects a feature group and learns the corresponding metric simultaneously at each iteration, further improving learning efficiency when combined with the adaptive mini-batch strategy and incremental sampling. Besides the theoretical and empirical investigation of DML on benchmark machine learning datasets, we also apply the proposed methods to several important computer vision applications, i.e., fine-grained visual categorization (FGVC) and face recognition.
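To make the PSD-projection bottleneck concrete, the toy sketch below learns a *diagonal* Mahalanobis metric from a triplet constraint; the data, learning rate, and margin are all invented, and this is not any of the dissertation's algorithms. In the diagonal case the PSD projection reduces to clipping negative weights to zero, whereas for a full matrix the same projection requires an eigendecomposition, cubic in the dimensionality, which is exactly the cost discussed above:

```python
# Toy triplet-based DML with a diagonal metric: pull anchor-positive
# pairs closer than anchor-negative pairs by a margin, then project
# the weights back to the feasible (nonnegative) set.

def dist2(w, x, y):
    """Squared Mahalanobis distance with diagonal weights w."""
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y))

def learn_diagonal_metric(triplets, dim, lr=0.1, epochs=50):
    w = [1.0] * dim
    for _ in range(epochs):
        for a, p, n in triplets:
            if dist2(w, a, p) + 1.0 > dist2(w, a, n):  # margin violated
                for i in range(dim):
                    grad = (a[i] - p[i]) ** 2 - (a[i] - n[i]) ** 2
                    w[i] -= lr * grad
                # PSD projection, trivial in the diagonal case:
                w = [max(wi, 0.0) for wi in w]
    return w

# Synthetic triplet: dimension 0 separates classes, dimension 1 is noise.
triplets = [((0.0, 0.3), (0.1, 0.9), (1.0, 0.35))]
w = learn_diagonal_metric(triplets, dim=2)
print(w[0] > w[1])  # True: the informative dimension gets more weight
```

The diagonal restriction is only a pedagogical simplification; the mini-batch and adaptive sampling strategies above aim to make the full-matrix projection affordable instead of avoiding it.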
 Title
 Elucidating the evolutionary origins of collective animal behavior
 Creator
 Olson, Randal S.
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

Despite over a century of research, the evolutionary origins of collective animal behavior remain unclear. Dozens of hypotheses explaining the evolution of collective behavior have risen and fallen in the past century, but until recently it has been difficult to perform controlled behavioral evolution experiments to isolate these hypotheses and test their individual effects. In this dissertation, I outline a relatively new method using digital models of evolution to perform controlled behavioral evolution experiments. In particular, I use these models to directly explore the evolutionary consequences of the selfish herd, predator confusion, and many-eyes hypotheses, and demonstrate how the models can lend key insights useful to behavioral biologists, computer scientists, and robotics researchers. This dissertation lays the groundwork for the experimental study of the hypotheses surrounding the evolution of collective animal behavior, and establishes a path for future experiments to explore and disentangle how the various hypothesized benefits of collective behavior interact over evolutionary time.
 Title
 Hostsymbiont coevolution in digital and microbial systems
 Creator
 Zaman, Luis
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Darwin's image of the entangled bank captures foremost the pervasiveness of life as it clothes the earth, but it also captures how intimately species interact and often depend on one another. This interaction is particularly pronounced for obligate parasites, whose livelihoods depend on interactions with their hosts and whose hosts often pay severely. In my thesis, I first demonstrate how antagonistic coevolution in Avida leads to a diverse set of interacting host and parasite phenotypes: a digital entangled bank. Second, I show how further evolution is embedded within this community context by studying the coevolution of complexity driven by the parasites' population genetic memory, where the diversifying community of parasites "remembers" previously evolved hosts. Continuing to study the intersection of coevolution and community ecology, I investigate the structure of communities produced by the coevolutionary process in Avida. I show that a nested structure of interactions is common in our experiments, the same structure often found in natural host-parasite and plant-pollinator communities as well as many phage-bacteria interaction networks. In addition, I show that "growing" networks are nested by virtue of the process of incrementally adding nodes and edges. Thus, coevolution is expected to produce significantly nested communities when compared to random networks. However, the coevolved digital host-parasite networks are significantly more nested than expected from this neutral growth process. The interactions between hosts and their intimately interacting partners are not just parasitic; instead, they span a broad range and include many mutualistic interactions. In the last section of my thesis, I study evolution and coevolution along the parasitism-mutualism continuum using a temperate λ phage system that provides its host with access to an otherwise unavailable metabolic pathway. Instead of evolving more mutualistic phage as I predicted, both the phage and bacteria evolved cheating strategies.
 Title
 Near duplicate image search
 Creator
 Li, Fengjie
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Information retrieval addresses the fundamental problem of how to identify the objects in a database that satisfy the information needs of users. Facing information overload, the major challenge in search algorithm design is to ensure that useful information can be found both accurately and efficiently in large databases. To address this challenge, different indexing and retrieval methods have been proposed for different types of data, namely sparse data (e.g., documents), dense data (e.g., dense feature vectors), and bag-of-features data (e.g., images represented by local features). For sparse data, the inverted index and document retrieval models have proven very effective for large-scale retrieval problems. For dense data and bag-of-features data, however, there are still open problems. For example, Locality Sensitive Hashing, a state-of-the-art method for searching high-dimensional vectors, often fails to make a good tradeoff between precision and recall: it tends to achieve high precision with low recall, or vice versa. The bag-of-words model, a popular approach for searching objects represented by bags of features, has limited performance because of the information loss during the quantization procedure. Since the general problem of searching objects represented as dense vectors and bags of features may be too challenging, in this dissertation we focus on near duplicate search, in which the matched objects are almost identical to the query. By effectively exploiting the statistical properties of near duplicates, we can design more effective indexing schemes and search algorithms. Thus, the focus of this dissertation is to design new indexing methods and retrieval algorithms for near duplicate search in large-scale databases that accurately capture data similarity and deliver more accurate and efficient search.
Below, we summarize the main contributions of this dissertation. Our first contribution is a new algorithm for searching near duplicate bag-of-features data. The proposed algorithm, named random seeding quantization, is more efficient in generating bag-of-words representations for near duplicate images. The new scheme is motivated by approximating the optimal partial matching between bags of features, and thus produces a bag-of-words representation capturing the true similarities of the data, leading to more accurate and efficient retrieval of bag-of-features data. Our second contribution, termed Random Projection Filtering (RPF), is a search algorithm designed for efficient near duplicate vector search. By explicitly exploiting the statistical properties of near duplicates, the algorithm projects high-dimensional vectors into a lower-dimensional space and filters out irrelevant items. This effective filtering procedure makes RPF more accurate and efficient at identifying near duplicate objects in databases. Our third contribution is to develop and evaluate a new randomized range search algorithm for near duplicate vectors in high-dimensional spaces, termed Random Projection Search. Unlike RPF, the algorithm presented in this chapter suits a wider range of applications because it does not require sparsity constraints for high search accuracy. The key idea is to project both the data points and the query point into a one-dimensional space by a random projection, and perform a one-dimensional range search, using binary search, to find the subset of data points within a given range of the query. We prove a theoretical guarantee for the proposed algorithm and evaluate its empirical performance on a dataset of 1.1 billion image features.
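The key idea behind Random Projection Search described above can be sketched as follows. This is a minimal illustration with made-up data and radius, not the dissertation's algorithm or its theoretical guarantee: project all points onto one random unit direction, binary-search the sorted projections for candidates, then verify candidates with exact distances. Because projection onto a unit vector never increases distances, no true neighbor within the radius is lost at the candidate stage:

```python
import bisect
import math
import random

def project(u, x):
    """Dot product of direction u with point x."""
    return sum(ui * xi for ui, xi in zip(u, x))

def build_index(points, dim, rng):
    """Sort points by their projection onto one random unit direction."""
    u = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]
    keyed = sorted((project(u, p), i) for i, p in enumerate(points))
    return u, keyed

def range_search(points, u, keyed, query, radius):
    """1-D binary search for candidates, then exact verification."""
    q = project(u, query)
    lo = bisect.bisect_left(keyed, (q - radius, -1))
    hi = bisect.bisect_right(keyed, (q + radius, len(points)))
    return sorted(i for _, i in keyed[lo:hi]
                  if math.dist(points[i], query) <= radius)

rng = random.Random(0)
points = [(0.0, 0.0), (0.05, 0.02), (3.0, 3.0)]
u, keyed = build_index(points, 2, rng)
print(range_search(points, u, keyed, (0.0, 0.0), 0.1))  # → [0, 1]
```

The projection only filters: far-away points may survive as candidates when the random direction is unfavorable, but the exact-distance check makes the final answer correct regardless.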