Search results (1-20 of 91)
 Title
 Defending against browser-based data exfiltration attacks
 Creator
 Sood, Aditya
 Date
 2013
 Collection
 Electronic Theses & Dissertations
 Description

The global nature of the Internet has revolutionized cultural and commercial interactions while at the same time providing opportunities for cyber criminals. Crimeware services now exist that have transformed the nature of cyber crime by making it more automated and robust. Furthermore, these crimeware services are sold as part of a growing underground economy. This underground economy has provided a financial incentive to create and market more sophisticated crimeware. Botnets have evolved to become the primary, automated crimeware. The current, third generation of botnets targets online financial institutions across the globe. Willie Sutton, the bank robber, when asked why he robbed banks, is credited with replying: "That is where the money is." Today, financial institutions are online, so "that is where the money is" and criminals are swarming. Because the browser is most people's window to the Internet, it has become the primary target of crimeware, bots in particular. A common task is to steal credentials for financial institutions, such as accounts and passwords.

Our goal is to prevent browser-based data exfiltration attacks. Currently bots use a variant of the Man-in-the-Middle attack known as the Man-in-the-Browser attack for data exfiltration. The two most widely deployed browser-based data exfiltration attacks are Form-grabbing and Web Injects. Form-grabbing is used to steal data such as credentials in web forms, while the Web Injects attack is used to coerce the user into providing supplemental information such as a Social Security Number (SSN). Current security techniques emphasize detection of malware. We take the opposite approach: we assume that clients are infected with malware and then work to thwart the attack. This thesis makes the following contributions:

We introduce WPSeal, a method that a financial institution can use to discover that a Web-inject attack is happening, so an account can be shut down before any damage occurs. This technique is done entirely on the server side (i.e., the financial institution's side).

We developed a technique to encrypt form data, rendering it useless for theft. This technique is controlled from the server side (i.e., the financial institution's side). Using WPSeal, we can detect if the encryption scheme has been tampered with.

We present an argument that current hooking-based capabilities of bots cannot circumvent WPSeal (or the encryption that WPSeal protects). That is, criminals will have to come up with a totally different class of attack.

In both cases, we do not prevent the attack. Instead, we detect the attack before damage can be done, rendering it harmless.
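The form-data encryption idea can be illustrated with a toy sketch. Everything here is an assumption for illustration (the server secret, the HMAC-counter keystream, the XOR cipher) and is not the thesis's actual scheme: the server derives a per-session keystream from a secret only it holds, the server-delivered page script encrypts each field before submission, and a hooking form-grabber captures only ciphertext.

```python
import hashlib
import hmac
import secrets

SERVER_SECRET = b"demo-server-secret"   # hypothetical; held only by the server


def keystream(session_nonce: bytes, length: int) -> bytes:
    """Derive a keystream from the server secret and a per-session nonce."""
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(SERVER_SECRET,
                        session_nonce + counter.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_field(value: str, session_nonce: bytes) -> bytes:
    """What the server-delivered page script would do before submission:
    a form-grabber hooking the form sees only this ciphertext."""
    data = value.encode()
    ks = keystream(session_nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


def decrypt_field(blob: bytes, session_nonce: bytes) -> str:
    """Server side: same keystream, so XOR inverts the encryption."""
    ks = keystream(session_nonce, len(blob))
    return bytes(a ^ b for a, b in zip(blob, ks)).decode()


nonce = secrets.token_bytes(16)         # fresh per session
ct = encrypt_field("hunter2", nonce)
```

The grabbed value `ct` is useless without the server secret, and only the server can decrypt it.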
 Title
 Aquatic environment monitoring using robotic sensor networks
 Creator
 Wang, Yu, Ph. D.
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

The aquatic environment has been facing an increasing number of threats from harmful aquatic processes such as oil spills, harmful algal blooms (HABs), and aquatic debris. These processes greatly endanger aquatic ecosystems, marine life, human health, and water transport. Hence, it is of great interest to detect these processes and monitor their evolution so that proper actions can be taken to prevent the potential risks. This dissertation explores four representative problems in aquatic environment monitoring: diffusion process profiling, spatiotemporal field reconstruction, aquatic debris surveillance, and water surface monitoring.

First, we propose an accuracy-aware approach to profiling an aquatic diffusion process such as an oil spill. In our approach, robotic sensors collaboratively profile the characteristics of a diffusion process, including its source location, discharged substance amount, and evolution over time. In particular, the robotic sensors reposition themselves to progressively improve the profiling accuracy. We formulate a novel movement scheduling problem that aims to maximize the profiling accuracy subject to limited sensor mobility and energy budget. To solve this problem, we develop an efficient gradient-ascent-based algorithm and a near-optimal dynamic-programming-based algorithm.

Second, we present a novel approach to reconstructing a spatiotemporal aquatic field such as HABs. This approach features a rendezvous-based mobility control scheme in which robotic sensors collaborate as a swarm to sense the aquatic environment in a series of carefully chosen rendezvous regions. We design a novel feedback control algorithm that maintains the desirable level of wireless connectivity for a sensor swarm in the presence of significant environment and system dynamics. Moreover, information-theoretic analysis is used to guide the selection of rendezvous regions so that the reconstruction accuracy is maximized subject to the limited sensor mobility.

Third, we develop a vision-based, cloud-enabled, low-cost, yet intelligent solution to aquatic debris surveillance. Our approach features real-time debris detection and coverage-based rotation scheduling algorithms. Specifically, the image processing algorithms for debris detection are designed to address the unique challenges of the aquatic environment, e.g., constant camera shaking due to waves. The rotation scheduling algorithm provides effective coverage of sporadic debris arrivals despite the camera's limited angular view. Moreover, we design a dynamic task offloading scheme that offloads the computation-intensive processing tasks to the cloud to conserve battery power.

Finally, we design Samba, an aquatic surveillance robot that integrates an off-the-shelf Android smartphone and a robotic fish for general water surface monitoring. Using the built-in camera of the onboard smartphone, Samba can detect spatially dispersed aquatic processes. To reduce the excessive false alarms caused by the non-water area, Samba segments the captured images and performs target detection in the identified water area only. We propose a novel approach that leverages the power-efficient inertial sensors on the smartphone to assist the image processing. Samba also features a set of lightweight and robust computer vision algorithms, which detect harmful aquatic processes based on their distinctive color features.
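The gradient-based profiling step can be sketched as follows. This is illustrative only: the Gaussian plume model, the known spread parameter, and the plain gradient-descent update are assumptions, not the dissertation's algorithm. Sensors observe a Gaussian-shaped concentration field, and gradient steps on the squared fitting error recover the source location and discharged amount.

```python
import numpy as np

# Assumed diffusion model: c(p) = A * exp(-||p - s||^2 / (2 * SIGMA^2)),
# with known spread SIGMA, unknown source s and discharged amount A.
SIGMA = 1.5
true_s, true_A = np.array([2.0, 3.0]), 5.0

# Sensor positions on a grid and their (noiseless) concentration readings.
xs, ys = np.meshgrid(np.linspace(-1, 5, 7), np.linspace(-1, 6, 8))
P = np.column_stack([xs.ravel(), ys.ravel()])
y = true_A * np.exp(-((P - true_s) ** 2).sum(1) / (2 * SIGMA ** 2))


def profile(P, y, iters=20000, lr=0.01):
    """Gradient descent on sum_i (A*e_i - y_i)^2 over source s and amount A,
    where e_i = exp(-||p_i - s||^2 / (2 SIGMA^2))."""
    s, A = np.zeros(2), 1.0
    for _ in range(iters):
        e = np.exp(-((P - s) ** 2).sum(1) / (2 * SIGMA ** 2))
        r = A * e - y                              # residuals
        grad_A = 2 * (r * e).sum()
        grad_s = 2 * (r * A * e) @ ((P - s) / SIGMA ** 2)
        A -= lr * grad_A
        s -= lr * grad_s
    return s, A
```

In the dissertation's setting the sensors additionally reposition themselves to sharpen this estimate; here the sensor positions are fixed for simplicity.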
 Title
 Noncoding RNA identification in large-scale genomic data
 Creator
 Yuan, Cheng
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Noncoding RNAs (ncRNAs), which function directly as RNAs without being translated into proteins, play diverse and important biological roles. ncRNAs function not only through their primary structures but also through secondary structures, which are defined by interactions between Watson-Crick and wobble base pairs. Common types of ncRNA include microRNA, rRNA, snoRNA, and tRNA. Functions of ncRNAs vary among different types. Recent studies suggest the existence of a large number of ncRNA genes. Identification of novel and known ncRNAs is becoming increasingly important for understanding their functionalities and the underlying communities.

Next-generation sequencing (NGS) technology sheds light on more comprehensive and sensitive ncRNA annotation. Lowly transcribed ncRNAs, or ncRNAs from rare species with low abundance, may be identified via deep sequencing. However, there exist several challenges in ncRNA identification in large-scale genomic data. First, the massive volume of datasets can lead to very long computation times, making existing algorithms infeasible. Second, NGS has a relatively high error rate, which can further complicate the problem. Third, high sequence similarity among related ncRNAs can make them difficult to distinguish, resulting in incorrect output. Fourth, while secondary structures should be adopted for accurate ncRNA identification, they usually incur high computational complexity. In particular, some ncRNAs contain pseudoknot structures, which cannot be effectively modeled by state-of-the-art approaches. As a result, ncRNAs containing pseudoknots are hard to annotate.

In my PhD work, I aimed to tackle the above challenges in ncRNA identification. First, I designed a progressive search pipeline to identify ncRNAs containing pseudoknot structures. The algorithms are more efficient than state-of-the-art approaches and can be used for large-scale data. Second, I designed an ncRNA classification tool for short reads in NGS data lacking quality reference genomes. The initial homology search phase significantly reduces the size of the original input, making the tool feasible for large-scale data. Last, I focused on identifying 16S ribosomal RNAs from NGS data. 16S ribosomal RNAs are a very important type of ncRNA that can be used for phylogenetic study. A set of graph-based assembly algorithms was applied to form longer or full-length 16S rRNA contigs. I utilized paired-end information in NGS data so that lowly abundant 16S genes can also be identified. To reduce the complexity of the problem and make the tool practical for large-scale data, I designed a list of error correction and graph reduction techniques for graph simplification.
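The core idea of assembling short reads into longer contigs can be sketched with a toy greedy overlap merger. This is a deliberately simplified stand-in, not the thesis's graph-based 16S assembler: the toy "genome" contains no repeated symbols, so suffix-prefix overlaps are unambiguous, whereas real assemblers must resolve repeats and sequencing errors with overlap or de Bruijn graphs.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, 0, 1)  # (overlap length, i, j)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = reads[i] + reads[j][k:]
        reads = [r for t, r in enumerate(reads) if t not in (i, j)] + [merged]
    return reads[0]


# Toy "genome" with all-distinct symbols so overlaps are unambiguous.
genome = "THEQUICKBROWNFXJMPSVLAZYDG"
reads = [genome[i:i + 10] for i in (8, 0, 16, 4, 12)]   # shuffled windows
contig = greedy_assemble(reads)
```

Each read overlaps its neighbor by six symbols, so the greedy merge reconstructs the full sequence from the shuffled reads.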
 Title
 Data clustering with pairwise constraints
 Creator
 Yi, Jinfeng
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Classical unsupervised clustering is an ill-posed problem due to the absence of a unique clustering criterion. This issue can be addressed by introducing additional supervised information, usually cast in the form of pairwise constraints, into the clustering procedure. Depending on the source, most pairwise constraints can be classified into two categories: (i) pairwise constraints collected from a set of non-expert crowd workers, which leads to the problem of crowdclustering, and (ii) pairwise constraints collected from an oracle or experts, which leads to the problem of semi-supervised clustering. In both cases, the cost of collecting pairwise constraints can be high, so it is important to identify the minimal number of pairwise constraints needed to accurately recover the underlying true data partition; this is known as a sample complexity problem.

In this thesis, we first analyze the sample complexity of crowdclustering. At first, we propose a novel crowdclustering approach based on the theory of matrix completion. Unlike the existing crowdclustering algorithm that is based on a Bayesian generative model, the proposed approach is more desirable since it needs far fewer crowdsourced pairwise annotations to accurately cluster all the objects. Our theoretical analysis shows that in order to accurately cluster $N$ objects, only $O(N\log^2 N)$ randomly sampled pairs need to be annotated by crowd workers. To further reduce the sample complexity, we then introduce a semi-crowdsourced clustering framework that is able to effectively incorporate the low-level features of the objects to be clustered. In this framework, we only need to sample a subset of $n \ll N$ objects and generate their pairwise constraints via crowdsourcing. After completing an $n \times n$ similarity matrix using the proposed crowdclustering algorithm, we can further recover an $N \times N$ similarity matrix by applying a regression-based distance metric learning algorithm to the completed smaller similarity matrix. This enables us to reliably cluster $N$ objects with only $O(n\log^2 n)$ crowdsourced pairwise constraints.

Next, we study the problem of sample complexity in semi-supervised clustering. To this end, we propose a novel convex semi-supervised clustering approach based on the theory of matrix completion. In order to reduce the number of pairwise constraints needed to achieve a perfect data partitioning, we apply a natural assumption that the feature representations of the objects are able to reflect the similarities between objects. This enables us to utilize only $O(\log N)$ pairwise constraints to perfectly recover the data partition of $N$ objects.

Lastly, in addition to sample complexity, which relates to labeling costs, we also consider the computational costs of semi-supervised clustering. Specifically, we study the problem of efficiently updating clustering results when the pairwise constraints are generated sequentially, a common case in various real-world applications such as social networks. To address this issue, we develop a dynamic semi-supervised clustering algorithm that casts the clustering problem into a search problem in a feasible convex space, i.e., a convex hull whose extreme points are an ensemble of multiple data partitions. Unlike classical semi-supervised clustering algorithms that need to re-optimize their objective functions when new pairwise constraints are generated, the proposed method only needs to update a low-dimensional vector, and its time complexity is independent of the number of data points to be clustered. This enables us to update large-scale clustering results in an extremely efficient way.
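The matrix-completion view of crowdclustering can be illustrated with a toy sketch. The assumptions here (an ideal block-structured 0/1 similarity matrix, noiseless annotations, and a simple alternating-projection completion) are for illustration only; the thesis's actual algorithm and its $O(N\log^2 N)$ guarantee are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ideal similarity matrix for N=30 objects in 3 clusters of 10:
# entry (i, j) is 1 if i and j share a cluster, else 0. Its rank is 3.
N, K = 30, 3
labels = np.repeat(np.arange(K), N // K)
M = (labels[:, None] == labels[None, :]).astype(float)

# Crowd workers annotate only a random (symmetric) subset of pairs.
mask = rng.random((N, N)) < 0.45
mask = mask | mask.T


def complete(observed, mask, rank, iters=300):
    """Alternating projection: match observed entries, then truncate to low rank."""
    X = np.zeros_like(observed)
    for _ in range(iters):
        X[mask] = observed[mask]                  # consistency with annotations
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-k approximation
    return X


X = complete(M * mask, mask, K)
recovered = (X > 0.5).astype(float)               # round back to 0/1 similarities
```

Because the ideal similarity matrix is low rank, a random subset of its entries suffices to fill in the unannotated pairs, after which the clusters can be read off the completed rows.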
 Title
 Exploiting cross-technology interference for efficient network services in wireless systems
 Creator
 Zhou, Ruogu
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

In the last decade, we have witnessed the wide adoption of a variety of wireless technologies like WiFi, Cellular, Bluetooth, ZigBee, and Near-field Communication (NFC). However, the fast growth of wireless networks generates significant cross-technology interference, which leads to network performance degradation and potential security breaches. In this dissertation, we propose two novel physical layer techniques to deal with the interference, and improve the performance and security of sensor networks and mobile systems, respectively. First, we exploit WiFi interference as a "blessing" in the design of sensor networks and develop novel WiFi interference detection techniques for ZigBee sensors. Second, utilizing these techniques, we design three efficient network services: WiFi discovery, which detects the existence of nearby WiFi networks using ZigBee sensors; WiFi performance monitoring, which measures and tracks the performance of WiFi networks using a ZigBee sensor network; and time synchronization, which provides synchronized clocks for sensor networks based on WiFi signals. Third, we design a novel, non-invasive NFC security system called nShield to reduce the transmission power of NFC radios, which protects NFC against passive eavesdropping. nShield implements a novel adaptive RF attenuation scheme, in which the extra RF energy of NFC transmissions is determined and absorbed by nShield. At the same time, nShield scavenges the extra RF energy to sustain perpetual operation. Together with an extremely low-power design, this enables nShield to provide the host uninterrupted protection against malicious eavesdropping. The above systems are implemented and extensively evaluated on a testbed of sensor networks and smartphones.
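The flavor of WiFi interference detection on a narrowband radio can be sketched with a crude energy-burst heuristic. Everything here is an assumption for illustration (the RSSI threshold, the sampling period, and the frame-airtime range); the dissertation's detection techniques for ZigBee sensors are far more sophisticated.

```python
def detect_bursts(rssi_dbm, threshold_dbm=-75):
    """Return the lengths (in samples) of contiguous runs above the threshold."""
    bursts, start = [], None
    for i, v in enumerate(rssi_dbm):
        if v > threshold_dbm and start is None:
            start = i
        elif v <= threshold_dbm and start is not None:
            bursts.append(i - start)
            start = None
    if start is not None:
        bursts.append(len(rssi_dbm) - start)
    return bursts


def looks_like_wifi(rssi_dbm, sample_us=100, frame_us=(200, 5000)):
    """Crude heuristic: WiFi frames occupy the channel for roughly hundreds of
    microseconds to a few milliseconds, unlike e.g. a microwave oven's long
    emissions. All thresholds here are illustrative guesses."""
    durations = [b * sample_us for b in detect_bursts(rssi_dbm)]
    lo, hi = frame_us
    return bool(durations) and all(lo <= d <= hi for d in durations)


quiet = [-95] * 50
wifi_like = [-95] * 10 + [-60] * 5 + [-95] * 10 + [-62] * 8 + [-95] * 10
oven_like = [-95] * 5 + [-55] * 200 + [-95] * 5
```

A real detector must additionally separate WiFi from ZigBee's own traffic and from Bluetooth hops, which is exactly the hard part the dissertation addresses.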
 Title
 Exploiting smoothness in statistical learning, sequential prediction, and stochastic optimization
 Creator
 Mahdavi, Mehrdad
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

In the last several years, the intimate connection between convex optimization and learning problems, in both statistical and sequential frameworks, has shifted the focus of algorithmic machine learning to examine this interplay. On one hand, this intertwinement brings forward new challenges in reassessing the performance of learning algorithms, including generalization and regret bounds, under the assumptions imposed by convexity, such as analytical properties of loss functions (e.g., Lipschitzness, strong convexity, and smoothness). On the other hand, the emergence of datasets of unprecedented size demands the development of novel and more efficient optimization algorithms to tackle large-scale learning problems. The overarching goal of this thesis is to reassess the smoothness of loss functions in statistical learning, sequential prediction/online learning, and stochastic optimization, and to explicate its consequences. In particular, we examine how leveraging the smoothness of the loss function can be beneficial or detrimental in these settings in terms of sample complexity, statistical consistency, regret analysis, and convergence rate.

In the statistical learning framework, we investigate the sample complexity of learning problems when the loss function is smooth and strongly convex and the learner is provided with the target risk as prior knowledge. We establish that under these assumptions, by exploiting the smoothness of the loss function, we are able to improve the sample complexity of learning exponentially. Furthermore, the proof of our results is constructive and rooted in a properly designed stochastic optimization algorithm, which could be of significant practical importance. We also investigate smoothness from the viewpoint of statistical consistency and show that, in sharp contrast to optimization and generalization, where smoothness is favorable because of its computational and theoretical virtues, the smoothness of a surrogate loss function might deteriorate the binary excess risk. Motivated by this negative result, we provide a unified analysis of three types of errors, including optimization error, generalization bound, and the error in translating convex excess risk into binary excess risk, and underline the conditions under which smoothness might be preferred.

We then turn to elaborating the importance of smoothness in sequential prediction/online learning. We introduce a new measure to assess the performance of online learning algorithms, referred to as gradual variation. The gradual variation is measured by the sum of the distances between every two consecutive loss functions and is more suitable for gradually evolving environments such as stock prediction. Under a smoothness assumption, we devise novel algorithms for online convex optimization with regret bounded by the gradual variation. The proposed algorithms can take advantage of benign sequences and at the same time protect against adversarial sequences of loss functions.

Finally, we investigate how to exploit the smoothness of the loss function in convex optimization. Unlike optimization methods based on full gradients, most existing stochastic optimization methods do not exploit the smoothness assumption. We propose a novel optimization paradigm, referred to as mixed optimization, which interpolates between stochastic and full gradient methods and is able to exploit the smoothness of loss functions to obtain faster convergence rates in stochastic optimization, and condition-number-independent accesses of full gradients in deterministic optimization. The key underlying insight of mixed optimization is to utilize infrequent full gradients of the objective function to progressively reduce the variance of the stochastic gradients. These results show an intricate interplay between stochastic and deterministic convex optimization that takes advantage of their individual merits. We also propose efficient projection-free optimization algorithms to tackle the computational challenge arising from the projection steps required at each iteration of most existing gradient-based optimization methods to ensure the feasibility of intermediate solutions. In the stochastic optimization setting, by introducing and leveraging smoothness, we develop novel methods that require only one projection, at the final iteration. In the online learning setting, we consider online convex optimization with soft constraints, where the constraints are allowed to be satisfied in the long term. We show that by compromising on the learner's regret, one can devise efficient online learning algorithms with sublinear bounds on both the regret and the violation of the constraints.
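The "infrequent full gradients reduce variance" insight can be sketched on a least-squares problem. This is an illustrative variance-reduction loop in the spirit the abstract describes (similar to SVRG-style schemes), not the thesis's mixed optimization algorithm; the step size, epoch count, and inner-loop length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Least-squares objective f(w) = (1/n) * sum_i (x_i . w - y_i)^2.
n, d = 50, 5
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star                              # noiseless, so min f = 0


def loss(w):
    return float(np.mean((X @ w - y) ** 2))


def grad_i(w, i):
    """Stochastic gradient from a single sample."""
    return 2 * X[i] * (X[i] @ w - y[i])


w = np.zeros(d)
eta = 0.01
for epoch in range(30):
    w_snap = w.copy()
    mu = 2 * X.T @ (X @ w_snap - y) / n     # infrequent full gradient
    for _ in range(2 * n):                  # cheap stochastic inner steps
        i = rng.integers(n)
        # Variance-reduced direction: the stochastic gradient, recentered by
        # the snapshot's stochastic gradient plus the full gradient.
        g = grad_i(w, i) - grad_i(w_snap, i) + mu
        w -= eta * g
```

As the iterate approaches the snapshot and the optimum, the correction term drives the variance of `g` toward zero, which is what permits the faster rates the abstract refers to.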
 Title
 Cybersickness Prioritization and Modeling
 Creator
 Rebenitsch, Lisa Renee
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

Motion sickness-like symptoms due to visual stimuli, or cybersickness, are a topic of significant public concern. From three-dimensional movies to new head-mounted displays such as the Oculus Rift, the general public is being offered ever-increasing opportunities for immersive experiences. However, virtual environment opportunities are often accompanied by increased reports of "movie theatre sickness" and other cybersickness symptoms. Research in the field has posed over forty potential factors that affect cybersickness. The objective of this thesis is to prioritize cybersickness human factors and develop models to predict their effects. A thorough search of the literature is given, which summarizes the support, or lack thereof, for each factor. New experimentation is conducted for the factors with insufficient existing evidence. The factors with sufficient support were fitted to single-term models to use for guidelines and normalization. Cumulative models are then developed to predict individual- and configuration-level cybersickness effects.

The literature on certain factors is unclear due to selective reporting of configuration features, varying configurations, different measurement methods, and limited studies. The literature regarding individual susceptibility to cybersickness is particularly limited. Therefore, experiments were performed that specifically addressed individual susceptibility with different display modalities. Common display methods such as head-mounted displays and screens were examined. Stereoscopic effects were also examined due to recent renewed interest and unclear effects. All the experiments included surveys to collect data on individual susceptibility factors.

Cybersickness effects are not normally distributed, which decreases the reliability of common statistical measures on cybersickness data. A new statistical method for cybersickness research, called the zero-inflated model, is examined as an improved method for comparing the impact of the features involved in cybersickness. The zero-inflated model has the benefit of having a "non-susceptible" or "well" group built into the statistical test, which can be upwards of 50% of the participants in a study. An analysis of the literature permitted several variables to be removed from further consideration and allowed single-factor models to be created. Three independent variability models for individual susceptibility were found. The individual model explains 37% of the adjusted variance, and the linear models created for the hardware and software explain 55-70% of the adjusted variance. Since the variance as predicted by the models is not purely additive, the total explained variance with these models is likely 70-80%, which is a significant improvement over the models proposed by Kolasinski, which explain 34% of the variability of the model [1]. The models created using experimental results incorporating the individual, hardware, and software factors had an average absolute residual accuracy of less than 1%.
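A zero-inflated count model can be sketched as a mixture of a "well" point mass at zero and a Poisson distribution for the susceptible group. This is a generic illustration of the model class, not the thesis's formulation or fitting procedure. With mixing weight pi, the mean and variance of a zero-inflated Poisson satisfy m = (1 - pi) * lam and v = m * (1 + pi * lam), which yields closed-form method-of-moments estimates: pi * lam = v/m - 1, lam = m + (v/m - 1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a zero-inflated Poisson: with probability PI a participant is in
# the "well" (non-susceptible) group and reports 0 symptoms; otherwise the
# symptom count follows Poisson(LAM).
PI, LAM, n = 0.4, 3.0, 20000
well = rng.random(n) < PI
counts = np.where(well, 0, rng.poisson(LAM, n))


def fit_zip_moments(x):
    """Method-of-moments fit using m = (1-pi)*lam and v = m*(1 + pi*lam)."""
    m, v = x.mean(), x.var()
    r = v / m - 1          # equals pi * lam
    lam = m + r
    pi = r / lam
    return float(pi), float(lam)


pi_hat, lam_hat = fit_zip_moments(counts)
```

The estimated `pi_hat` is exactly the built-in "well group" size the thesis exploits: the excess zeros beyond what a plain Poisson would produce.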
 Title
 Finding optimized bounding boxes of polytopes in d-dimensional space and their properties in k-dimensional projections
 Creator
 Shahid, Salman (Of Michigan State University)
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Using minimal bounding boxes to encapsulate or approximate a set of points in d-dimensional space is a nontrivial problem that has applications in a variety of fields, including collision detection, object rendering, high-dimensional databases, and statistical analysis, to name a few. While a significant amount of work has been done on the three-dimensional variant of the problem (i.e., finding the minimum-volume bounding box of a set of points in three dimensions), it is difficult to find a simple method to do the same for higher dimensions. Even in three dimensions, existing methods suffer from either high time complexity or suboptimal results traded for a speedup in execution time. In this thesis we present a new approach to finding the optimized minimum bounding boxes of a set of points defining convex polytopes in d-dimensional space. The solution also gives the optimal bounding box in three dimensions with a much simpler implementation, while significantly speeding up the execution time for a large number of vertices. The basis of the proposed approach is a series of unique properties of the k-dimensional projections that are leveraged into an algorithm. The algorithm works by constructing the convex hulls of a given set of points and optimizing the projections of those hulls in two-dimensional space using the new concept of Simultaneous Local Optimal. We show that the proposed algorithm performs significantly better than the current state-of-the-art approach in terms of time and accuracy. To illustrate the importance of the result in a real-world application, the optimized bounding box algorithm is used to develop a method for carrying out range queries in high-dimensional databases. This method uses data transformation techniques in conjunction with a set of heuristics to provide significant performance improvement.
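The two-dimensional building block, optimizing a bounding rectangle over a projected hull, can be sketched using the classical result that a minimum-area enclosing rectangle has a side collinear with some edge of the convex hull. This is the standard textbook algorithm, not the thesis's Simultaneous Local Optimal procedure.

```python
import math


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(pts):
    """Smallest enclosing-rectangle area: try each hull edge as a side direction."""
    hull = convex_hull(pts)
    best = float("inf")
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(-theta), math.sin(-theta)
        rx = [c * x - s * y for x, y in hull]   # rotate so the edge is axis-aligned
        ry = [s * x + c * y for x, y in hull]
        best = min(best, (max(rx) - min(rx)) * (max(ry) - min(ry)))
    return best
```

For an axis-aligned unit square the optimum is 1.0, while for a diamond (a square rotated 45 degrees) the edge-aligned rectangle beats the naive axis-aligned bounding box by a factor of two.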
 Title
 Fingerprint recognition : models and applications
 Creator
 Yoon, Soweon
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Fingerprint recognition has been successfully used in law enforcement and forensics to identify suspects and victims for over a century. Recent advances in automated fingerprint identification systems (AFIS), coupled with the growing need for reliable person recognition, have resulted in an increased deployment of AFIS in broad applications such as border control, employment background checks, secure facility access, and user authentication in laptops and mobile devices. Despite the success of fingerprint recognition technology in many large-scale and diverse person identification applications, several challenging issues in fingerprint recognition still need to be addressed.

First, the persistence and uniqueness of fingerprints, the fundamental premises of fingerprint recognition, remain presumptions rather than facts with solid scientific underpinnings. Although some studies have addressed the uniqueness of fingerprints, there has been no systematic study reported on the persistence of fingerprints. Given a large longitudinal fingerprint database, we have analyzed it with a multilevel statistical model to assess the impact of the time interval between a genuine fingerprint pair on the corresponding match score and identification decision.

Second, an appropriate mathematical model for the fingerprint orientation field is necessary for addressing a number of challenging problems, including feature extraction (e.g., orientation field and minutiae) from noisy or partial fingerprints, detecting abnormality in fingerprint patterns, and representing fingerprints in a compact form. To this end, we have developed a global orientation field model in the form of ordinary differential equations, constrained by the number of singular points in a fingerprint. The model represents the global orientation field with a small number of polynomial terms and outputs fingerprint-like orientation fields with the specified number of singular points after fitting an input pattern. We use this model to check the "fingerprintness" of an input image and ensure the integrity of exemplar fingerprint databases.

Third, given that automatic latent fingerprint matching is difficult due to the poor quality of latent fingerprints found at crime scenes, we have developed a semi-automatic latent enhancement method. The proposed algorithm is based on robust orientation field estimation that is able to (i) reduce human intervention in feature markup and (ii) improve automatic feature extraction and the matching accuracy of latents.

Fourth, fingerprint obfuscation, or alteration, is a type of attack on AFIS that is of concern to law enforcement and border crossing agencies. We show that fingerprint matching accuracy can greatly deteriorate when altered fingerprints are presented to AFIS. To address this deficiency of AFIS, we have (i) analyzed the types of fingerprint obfuscation, (ii) developed a detection algorithm for altered fingerprints by measuring the abnormality in orientation field and minutiae distribution, and (iii) proposed a restoration and matching algorithm to possibly link an altered fingerprint to its pre-altered mate. By addressing these contemporary problems, this dissertation has advanced our understanding of fingerprint recognition and enabled the development of robust and accurate fingerprint recognition algorithms.
 Title
 Exploring joint-level control in evolutionary robotics
 Creator
 Moore, Jared M.
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

"In this dissertation, we use computational evolution and physics simulations to explore both control and morphology in robotic systems. Specifically we investigate jointlevel control strategies and their interaction with morphological elements."  Abstract.
 Title
 Identification and analysis of non-coding RNAs in large scale genomic data
 Creator
 Achawanantakun, Rujira
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

High-throughput sequencing technologies have created the opportunity for large-scale transcriptome analyses and intensified attention on the study of non-coding RNAs (ncRNAs). ncRNAs play important roles in many cellular processes. For example, transfer RNAs and ribosomal RNAs are involved in the protein translation process; microRNAs regulate gene expression; long ncRNAs are found to be associated with many human diseases ranging from autism to cancer. Many ncRNAs function through both their sequences and secondary structures. Thus, accurate secondary structure prediction provides important information for understanding the tertiary structures, and thus the functions, of ncRNAs. The state-of-the-art ncRNA identification tools are mainly based on two approaches. The first approach is comparative structure analysis, which determines the consensus structure from homologous ncRNAs. Structure prediction is a costly process because the number of putative structures increases exponentially with the sequence length, so it is not practical for very long ncRNAs such as lncRNAs. Moreover, the accuracy of current structure prediction tools is still not satisfactory, especially on sequences containing pseudoknots. An alternative identification approach that has become increasingly popular is sequence-based expression analysis, which relies on next-generation sequencing (NGS) technologies to quantify gene expression on a genome-wide scale; the specific expression patterns are used to identify the type of ncRNA. This method is therefore limited to ncRNAs that have medium to high expression levels and expression patterns distinct from those of other ncRNAs. In this work, we address the challenges presented in ncRNA identification using different approaches. Specifically, we propose four tools: grammar-string based alignment, KnotShape, KnotStructure, and lncRNAID.
A grammar string is a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar and a full RNA grammar including pseudoknots. It simplifies a complicated structure alignment to a simple grammar-string-based alignment. Also, grammar-string-based alignment incorporates both sequence and structure into multiple sequence alignment. We can thus enhance the speed of alignment and achieve an accurate consensus structure. KnotShape and KnotStructure focus on reducing the size of the structure search space to speed up structure prediction. KnotShape predicts the best shape by grouping similar structures together and applying SVM classification to select the best representative shape. KnotStructure improves the performance of structure prediction by using grammar-string-based alignment and the predicted shape output by KnotShape. lncRNAID is specially designed for lncRNA identification. It incorporates balanced random forest learning to construct a classification model that distinguishes lncRNAs from protein-coding sequences. Its major advantage is that it can maintain good predictive performance with limited or imbalanced training data.
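The exponential growth of the structure search space is easiest to appreciate against the classic dynamic-programming baseline for pseudoknot-free folding. The sketch below implements Nussinov-style base-pair maximization; it is a textbook illustration only, not the grammar-string, KnotShape, or KnotStructure algorithms of this thesis, and the `min_loop` hairpin constraint is an assumed parameter.

```python
# Nussinov base-pair maximization: a classic O(n^3) DP baseline for
# pseudoknot-free RNA secondary structure prediction (illustrative).
# Watson-Crick pairs plus the G-U wobble pair are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq,
    requiring at least min_loop unpaired bases inside a hairpin."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: best count on seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]              # case: j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

Even this simplest formulation is cubic in the sequence length, which is why the shape-based pruning of KnotShape/KnotStructure matters for long sequences.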
 Title
 Referential grounding towards mediating shared perceptual basis in situated dialogue
 Creator
 Liu, Changsong, Ph. D.
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

In situated dialogue, although an artificial agent (e.g., a robot) and its human partner are co-present in a shared environment, they have significantly mismatched perceptual capabilities (e.g., in recognizing objects in the surroundings). When a shared perceptual basis is missing, it becomes difficult for the agent to identify referents in the physical world that are referred to by the human (i.e., a problem of referential grounding). The work presented in this dissertation focuses on computational approaches to enable robust and adaptive referential grounding in situated settings. First, graph-based representations are employed to capture a human speaker's linguistic discourse and an agent's visual perception. Referential grounding is then formulated as a graph-matching problem, and a state-space search algorithm is applied to ground linguistic references onto perceived objects. Furthermore, hypergraph representations are used to account for group-based descriptions, and one prevalent pattern of collaborative communication observed in a human-human dialogue dataset is incorporated into the search algorithm. This graph-matching based approach thus provides a principled way to model and utilize spatial relations, group-based descriptions, and collaborative referring discourse in situated dialogue. Evaluation results demonstrate that, when the agent's visual perception is unreliable due to computer vision errors, the graph-based approach significantly improves referential grounding accuracy over a baseline that relies only on object properties. Second, an optimization-based approach is proposed to mediate the perceptual differences between an agent and a human. Through online interaction with the human, the agent can learn a set of weights which indicate how reliably or unreliably each dimension (object type, object color, etc.) of its perception of the environment maps to the human's linguistic descriptors.
Then the agent can adapt to the situation by applying the learned weights to the grounding process and/or adjusting its word grounding models accordingly. Empirical evaluation shows that this weight-learning approach can successfully adjust the weights to reflect the agent's perceptual insufficiencies. The learned weights, together with updated word grounding models, lead to a significant improvement in referential grounding in subsequent dialogues. Third, a probabilistic labeling algorithm is introduced to handle uncertainties from visual perception and language processing, and to potentially support generation of collaborative responses in the future. The probabilistic labeling algorithm is formulated under the Bayesian reasoning framework. It provides a unified probabilistic scheme to integrate different types of evidence from the collaborative referring discourse and to generate ranked multiple grounding hypotheses for follow-up processes. Evaluated on the same dataset, probabilistic labeling significantly outperforms state-space search in both accuracy and efficiency. All these approaches contribute to the ultimate goal of building collaborative dialogue agents for situated interaction, so that the next generation of intelligent machines and devices can better serve human users in daily work and life.
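As a concrete, if toy, illustration of Bayesian hypothesis ranking of the kind described above: score each candidate object by a prior times the likelihood of each attribute the speaker mentioned, then normalize and rank. All object names, attributes, and reliability values below are invented for illustration; this is not the dissertation's probabilistic labeling algorithm, which integrates much richer discourse evidence.

```python
# Toy Bayesian ranking of referential grounding hypotheses (a sketch).
def rank_hypotheses(candidates, description, likelihood, prior=None):
    """candidates: list of perceived objects (dicts of attributes);
    description: attributes the speaker mentioned;
    likelihood(attr, said, perceived) -> P(said | perceived)."""
    scores = []
    for obj in candidates:
        p = prior[obj["id"]] if prior else 1.0 / len(candidates)
        for attr, said in description.items():
            p *= likelihood(attr, said, obj.get(attr))
        scores.append((obj["id"], p))
    total = sum(s for _, s in scores) or 1.0
    return sorted(((i, s / total) for i, s in scores), key=lambda x: -x[1])

# Assumed per-dimension reliabilities: type perception is trusted more
# than color perception (mirroring the learned-weights idea above).
def likelihood(attr, said, perceived):
    if perceived is None:
        return 0.1  # the agent did not perceive this attribute at all
    reliability = {"type": 0.9, "color": 0.6}[attr]
    return reliability if said == perceived else (1 - reliability)

objs = [{"id": "o1", "type": "mug", "color": "red"},
        {"id": "o2", "type": "mug", "color": "blue"},
        {"id": "o3", "type": "book", "color": "red"}]
ranked = rank_hypotheses(objs, {"type": "mug", "color": "red"}, likelihood)
```

Because the output is a ranked posterior rather than a single answer, a downstream dialogue manager could ask a clarification question when the top two hypotheses score closely.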
 Title
 Distance-preserving graphs
 Creator
 Nussbaum, Ronald
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Let G be a simple graph on n vertices, where d_G(u,v) denotes the distance between vertices u and v in G. An induced subgraph H of G is isometric if d_H(u,v) = d_G(u,v) for all u,v in V(H). We say that G is a distance-preserving graph if G contains at least one isometric subgraph of order k for every k with 1 <= k <= n. A number of sufficient conditions exist for a graph to be distance-preserving. We show that all hypercubes and all graphs with delta(G) >= 2n/3 - 1 are distance-preserving. Towards this end, we carefully examine the role of "forbidden" subgraphs. We discuss our observations and provide some conjectures, which we have computationally verified for small values of n. We say that a distance-preserving graph is sequentially distance-preserving if each subgraph in the set of isometric subgraphs is a superset of the previous one, and we consider this special case as well. There are a number of questions involving the construction of distance-preserving graphs. We show that it is always possible to add an edge to a non-complete sequentially distance-preserving graph such that the augmented graph is still sequentially distance-preserving, and we conjecture that the same is true of all distance-preserving graphs. We discuss our observations on making non-distance-preserving graphs distance-preserving by adding edges. We show methods for constructing regular distance-preserving graphs, and we consider constructing distance-preserving graphs for arbitrary degree sequences. As before, all conjectures here have been computationally verified for small values of n.
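The definition of an isometric subgraph translates directly into a check: compute shortest-path distances by BFS both in G and in the induced subgraph, and compare them pairwise. A minimal sketch on plain adjacency dicts (illustrating the definition only, not the thesis's constructions):

```python
from collections import deque

def bfs_dist(adj, src):
    """All shortest-path distances from src in an unweighted graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def induced(adj, verts):
    """Induced subgraph on the vertex subset verts."""
    vs = set(verts)
    return {u: [v for v in adj[u] if v in vs] for u in vs}

def is_isometric(adj, verts):
    """True iff the induced subgraph H satisfies d_H(u,v) == d_G(u,v)."""
    sub = induced(adj, verts)
    for u in verts:
        dg = bfs_dist(adj, u)
        dh = bfs_dist(sub, u)
        for v in verts:
            if dh.get(v) != dg.get(v):  # unreachable in H also fails
                return False
    return True

# 4-cycle a-b-c-d: the induced path {a,b,c} preserves d(a,c) = 2,
# while {a,c} induces no edge at all, so it is not isometric.
C4 = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
```

Brute-forcing this check over all vertex subsets of every order k is how the small-n conjectures mentioned above can be verified computationally, though the subset enumeration is exponential in n.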
 Title
 Hierarchical learning for large multiclass classification in network data
 Creator
 Liu, Lei
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

Multiclass learning from network data is an important but challenging problem with many applications, including malware detection in computer networks, user modeling in social networks, and protein function prediction in biological networks. Despite the extensive research on large multiclass learning, there are still numerous issues that have not been sufficiently addressed, such as the efficiency of model testing, the interpretability of the induced models, and the ability to handle imbalanced classes. To overcome these challenges, there has been increasing interest in recent years in developing hierarchical learning methods for large multiclass problems. Unfortunately, none of them were designed for classification of network data. In addition, there are very few studies devoted to classification of heterogeneous networks, where the nodes may have different feature sets. This thesis aims to overcome these limitations with the following contributions. First, as the number of classes in big data applications can be very large, ranging from thousands to possibly millions, two hierarchical learning schemes are proposed to deal with these so-called extreme multiclass learning problems. The first approach, known as recursive non-negative matrix factorization (RNMF), is designed to achieve sublinear runtime in classifying test data. Although RNMF reduces the test time significantly, it may also assign the same class to multiple leaf nodes, which hampers the interpretability of the model as a concept hierarchy for the classes. Furthermore, since RNMF employs a greedy strategy to partition the classes, there is no theoretical guarantee that the partitions generated by the tree lead to a globally optimal solution. To address the limitations of RNMF, an alternative hierarchical learning method known as the matrix factorization tree (MFTree) is proposed. Unlike RNMF, MFTree is designed to optimize a global objective function while learning its taxonomy structure.
A formal proof is provided to show the equivalence between the objective function of MFTree and the Hilbert-Schmidt Independence Criterion (HSIC). Furthermore, to improve its training efficiency, a fast algorithm for inducing an approximate MFTree is also developed. Next, an extension of MFTree to network data is proposed. This approach can seamlessly integrate both the link structure and node attribute information into a unified learning framework. To the best of our knowledge, this is the first study that automatically constructs a taxonomy structure to predict large multiclass problems for network classification. Empirical results suggest that the approach can effectively capture the relationships between classes and generate class taxonomy structures that resemble those produced by human experts. The approach is also easily parallelizable and has been implemented in a MapReduce framework. Finally, we introduce a network learning task known as co-classification to classify heterogeneous nodes in multiple networks. Unlike existing node classification problems, the goal of co-classification is to learn the classifiers in multiple networks jointly, instead of learning to classify each network independently. The framework proposed in this thesis can utilize prior information about the relationship between classes in different networks to improve its prediction accuracy.
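The sublinear test time that motivates RNMF and MFTree comes from routing a test example down a tree over classes instead of scoring every class. The sketch below is a deliberately naive label tree: the split rule is arbitrary alternation of sorted labels, where the thesis methods partition by matrix factorization. It only illustrates why prediction cost is O(tree depth) rather than O(number of classes); all centroids and labels are invented toy data.

```python
import math

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def build_tree(class_centroids):
    """Recursively split classes in two; leaves are class labels.
    Internal nodes store (left_centroid, left, right_centroid, right)."""
    labels = sorted(class_centroids)
    if len(labels) == 1:
        return labels[0]
    # Naive split by alternation (a real method would cluster/factorize).
    lc = {l: class_centroids[l] for l in labels[::2]}
    rc = {l: class_centroids[l] for l in labels[1::2]}
    return (centroid(list(lc.values())), build_tree(lc),
            centroid(list(rc.values())), build_tree(rc))

def predict(tree, x):
    """Route x down the tree: one centroid comparison per level."""
    while isinstance(tree, tuple):
        lcent, lsub, rcent, rsub = tree
        tree = lsub if math.dist(x, lcent) <= math.dist(x, rcent) else rsub
    return tree

cents = {"a": [0, 0], "b": [10, 0], "c": [0, 10]}  # toy class centroids
tree = build_tree(cents)
```

With a balanced split, a million classes would need only about 20 such comparisons per test example, which is the efficiency argument behind the hierarchical schemes, while the quality of the splits is what RNMF and MFTree actually optimize.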
 Title
 Toward efficient spectrum use in multicarrier wireless networks
 Creator
 Huang, Pei
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

The last decade has witnessed growing interest in dynamic spectrum access, motivated by the observation that a large portion of the radio spectrum has been licensed but remains highly underutilized, while the few small unlicensed bands that are open to anyone are getting more crowded due to the explosive expansion of wireless services. To provide more flexible access to radio spectrum, dynamic spectrum access enables unlicensed users to opportunistically utilize vacant spectrum chunks (known as spectrum holes) in licensed frequency bands. In dynamic spectrum access, non-contiguous orthogonal frequency division multiplexing (NC-OFDM) is widely adopted to efficiently utilize fragmented spectrum, because it is convenient to keep silent on some spectrum fragments to avoid interference with licensed users. In NC-OFDM, a band of spectrum is divided into many orthogonal subcarriers and data are transmitted on a subset of them simultaneously; the subcarriers that would interfere with licensed users are deactivated. Because each subcarrier can be managed independently, this dissertation introduces a series of techniques that exploit the subcarriers to address problems in dynamic spectrum access. When unlicensed users, called secondary users (SUs), are granted permission to operate in the licensed bands, they must ensure that the interference they cause to licensed users, known as primary users (PUs), stays within a limit. Even without such a requirement, SUs should avoid as many collisions as possible. To improve the spectrum hole extraction rate and reduce the collision rate, we propose a spectrum occupancy prediction model that helps estimate spectrum availability. It measures a wide band of spectrum with OFDM and groups subcarriers into subchannels based on spectrum use activities. In each subchannel, frequent spectrum occupancy patterns are identified and used to predict future channel states (i.e., busy or idle).
With this prediction, SUs are able to make use of spectrum holes more aggressively without introducing undue interference to PUs. In the spectrum holes discovered above, a mechanism is needed to coordinate medium access between devices. Because devices opportunistically utilize spectrum holes, a device may experience severe contention with devices from various applications. We propose a collision detection and bitwise arbitration (CDBA) mechanism that quickly identifies the winner of a contention using combined information from both the time domain and the frequency domain. It enables collision detection in the frequency domain by selectively deactivating subcarriers at each transmitter. CDBA assumes that all devices adopt the same channel width, but different radio technologies have different requirements on channel width. When heterogeneous radios coexist in a contention domain, wideband devices can hardly win medium access opportunities in contention with narrowband devices. To address this problem, we propose an adaptive channel bonding protocol in which a wideband device initiates transmission as long as some idle narrow channels exist and gradually increases its channel width during transmission whenever new narrow channels become available. To increase its chance of winning some narrow channels, a wideband device contends on the subcarriers of each narrow channel with a different priority. After the contention problem is addressed, we study how to increase the transmission speed once a device is granted permission to transmit. As wireless networks move toward wider channel widths, it is common for different subcarriers to experience different fading. To cope with this frequency-selective fading, the modulation scheme for each subcarrier should be selected based on the subcarrier's channel quality.
We exploit this modulation diversity in our modulation scheme coding to improve network throughput. Besides unicast, broadcast is another fundamental mechanism in wireless networks. Because devices utilize spectrum fragments opportunistically, different receivers may have different vacant spectrum fragments at different locations. Broadcast is thus more challenging in dynamic spectrum access because the transmitter needs to consider the diversity of spectrum availability. We propose a spectrum fragment agile broadcast (SFAB) protocol to support broadcast under non-uniform spectrum availability. It encodes unique sequences on the subcarriers of each spectrum fragment to achieve fast spectrum agreement between the transmitter and the receivers.
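The frequency-domain winner selection in CDBA is in the spirit of classic bitwise arbitration (familiar from CAN buses): contenders signal their priority bits simultaneously, and any device that hears a dominant bit it did not send withdraws. The sketch below simulates only that generic idea with invented integer priorities; it is not CDBA's actual subcarrier signaling.

```python
# Toy bitwise arbitration, MSB first: a '1' is the dominant level
# (modeling energy detected on a subcarrier); devices that sent '0'
# while a '1' was on the air drop out of the contention.
def arbitrate(priorities, width=8):
    """priorities: dict of device id -> integer priority.
    Returns the ids still contending after all bits (ties survive)."""
    contenders = dict(priorities)
    for bit in reversed(range(width)):
        sent = {i: (p >> bit) & 1 for i, p in contenders.items()}
        heard = max(sent.values())  # the medium ORs all transmissions
        contenders = {i: p for i, p in contenders.items()
                      if sent[i] == heard}
    return sorted(contenders)
```

The appeal of this scheme is that arbitration resolves in a fixed number of bit slots regardless of how many devices contend, and the "collision" itself carries the information needed to pick a winner.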
 Title
 Geometric and topological modeling techniques for large and complex shapes
 Creator
 Feng, Xin
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

The past few decades have witnessed incredible advancements in modeling, digitizing, and visualizing techniques for three-dimensional shapes. Those advancements have led to an explosion in the number of three-dimensional models being created for design, manufacture, architecture, medical imaging, etc. At the same time, the structure, function, stability, and dynamics of proteins, subcellular structures, organelles, and multi-protein complexes have emerged as a leading interest in structural biology, another major source of large and complex geometric models. Geometric modeling not only provides visualizations of shapes for large biomolecular complexes but also fills the gap between structural information and theoretical modeling, enabling the understanding of function, stability, and dynamics. We first propose, for tessellated volumes of arbitrary topology, a compact data structure that offers constant-time-complexity incidence queries among cells of any dimension. Our data structure is simple to implement, easy to use, and allows for arbitrary, user-defined 3-cells such as prisms and hexahedra, while remaining highly efficient in memory usage compared to previous work. We also analyze its time complexity for commonly used incidence and adjacency queries such as vertex and edge one-rings. We then introduce a suite of computational tools for volumetric data processing, information extraction, surface mesh rendering, geometric measurement, and curvature estimation for biomolecular complexes. Particular emphasis is given to the modeling of Electron Microscopy Data Bank (EMDB) and Protein Data Bank (PDB) data. Lagrangian and Cartesian representations are discussed for the surface representation. Based on these representations, practical algorithms are developed for surface area and surface-enclosed volume calculation, and for curvature estimation. Methods for volumetric meshing are also presented.
Because technological development in computer science and mathematics has led to a variety of choices at each stage of geometric modeling, we discuss the rationale behind the design and selection of the various algorithms. Analytical test models are designed to verify the computational accuracy and convergence of the proposed algorithms. We selected six EMDB datasets and six PDB datasets to demonstrate the efficacy of the proposed algorithms in handling biomolecular surfaces and to explore their capability for geometric characterization of binding targets. Thus, our toolkit offers a comprehensive protocol for the geometric modeling of proteins, subcellular structures, organelles, and multi-protein complexes. Furthermore, we present a method for computing "choking" loops: a set of surface loops that describe the narrowing of the volumes inside or outside of the surface and extend the notion of surface homology and homotopy loops. The intuition behind their definition is that a choking loop represents the region where an offset of the original surface would get pinched. Our generalized loops naturally include the usual 2g handles/tunnels computed from the topology of the genus-g surface, but also include loops that identify chokepoints or bottlenecks, i.e., boundaries of small membranes separating the inside or outside volume of the surface into disconnected regions. Our definition is based on persistent homology theory, which assigns a measure to topological structures, thus providing resilience to noise and a well-defined way to determine topological feature size. Finally, we explore the application of persistent homology theory to protein folding analysis. The extremely complex process of protein folding brings challenges for both experimental study and theoretical modeling. The persistent homology approach studies the Euler characteristics of the protein conformations during the folding process.
More precisely, the persistence is measured by the variation of the van der Waals radius, which drives the change of the protein's 3D structure and uncovers its inter-connectivity. Our results on fullerenes demonstrate the potential of our geometric and topological approach for protein stability analysis.
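The Euler-characteristic bookkeeping described above can be illustrated with a toy Vietoris-Rips-style filtration: as a radius parameter grows (standing in for the varying van der Waals radius), edges and triangles appear and chi = V - E + F changes. This is a generic sketch on invented 2D points, not the thesis's method or data.

```python
from itertools import combinations
import math

def build_complex(points, radius):
    """Vietoris-Rips style, up to dimension 2: include an edge when two
    points lie within `radius`, and a triangle when all three of its
    edges are present."""
    n = len(points)
    close = lambda i, j: math.dist(points[i], points[j]) <= radius
    edges = [(i, j) for i, j in combinations(range(n), 2) if close(i, j)]
    eset = set(edges)
    tris = [t for t in combinations(range(n), 3)
            if all(e in eset for e in combinations(t, 2))]
    return n, edges, tris

def euler_characteristic(points, radius):
    """chi = V - E + F for the complex at the given filtration value."""
    v, e, f = build_complex(points, radius)
    return v - len(e) + len(f)
```

Tracking how chi (and, in full persistent homology, the individual Betti numbers) changes across the filtration is what reveals when connected components merge and loops are born or filled in.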
 Title
 Multiple kernel and multi-label learning for image categorization
 Creator
 Bucak, Serhat Selçuk
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

"One crucial step towards the goal of converting large image collections to useful information sources is image categorization. The goal of image categorization is to find the relevant labels for a given an image from a closed set of labels. Despite the huge interest and significant contributions by the research community, there remains much room for improvement in the image categorization task. In this dissertation, we develop efficient multiple kernel learning and multilabel learning...
Show more"One crucial step towards the goal of converting large image collections to useful information sources is image categorization. The goal of image categorization is to find the relevant labels for a given an image from a closed set of labels. Despite the huge interest and significant contributions by the research community, there remains much room for improvement in the image categorization task. In this dissertation, we develop efficient multiple kernel learning and multilabel learning algorithms with high prediction performance for image categorization... "  Abstract.
 Title
 Unobtrusive physiological monitoring using smartphones
 Creator
 Hao, Tian (Research scientist)
 Date
 2015
 Collection
 Electronic Theses & Dissertations
 Description

"This thesis presents an indepth investigation in unobtrusive smartphonebased physiological monitoring, which aims to help people get healthier and fitter in a more efficient and less costly way."  Abstract.
 Title
 Differential geometry based representations for meshes and correspondences
 Creator
 Wang, Yuanzhen, 1986
 Date
 2013
 Collection
 Electronic Theses & Dissertations
 Description

This dissertation presents novel meshing and correspondence inference techniques, designed for generating high-quality representations of both geometric objects and the mappings among these objects, which benefit subsequent geometry processing, modeling, and scientific computing. From abstract.
 Title
 Contributions to biometric recognition : matching identical twins and latent fingerprints
 Creator
 Paulino, Alessandra Aparecida
 Date
 2013
 Collection
 Electronic Theses & Dissertations
 Description

Automatic recognition of a person by the use of distinctive physical and behavioral characteristics is called biometrics. Two of the important challenges in biometrics are the recognition of identical twins and the matching of latent fingerprints to their exemplar prints (rolled or slap prints). The contributions of this dissertation are focused on these two topics. Identical (monozygotic) twins are the result of a single fertilized egg that splits into two cells, each one giving birth to one individual. Identical twins have the same deoxyribonucleic acid (DNA), and thus their genotypic features (features influenced by the genetic material) are the same. However, some of their phenotypic features (features influenced by the fetal environment) may be different. Thus, it is essential to determine which biometric traits (either by themselves or in combination with other traits) have the ability to distinguish identical twins, and the extent of this discriminative ability. The first contribution of this dissertation is an evaluation of the performance of biometric systems in the presence of identical twins for the three most commonly used biometric modalities, namely fingerprint, face, and iris. Identical twins are shown to be a challenge to current face recognition systems. On the other hand, fingerprint and iris matching of identical twins show performance comparable to that for unrelated persons. The fusion of different samples from the same modality of a subject (e.g., left and right iris, fingerprints of multiple fingers) yields the best matching performance for identical twins, similar to what has been shown for unrelated persons. Biometric traits can also be used to determine whether two subjects enrolled in a biometric database are identical twins.
By using the face and iris modalities together, for example, we can correctly identify 80% of such identical twin pairs, while only 2% of subject pairs who are not identical twins will be incorrectly considered identical twins. The second contribution of this work is focused on improving latent fingerprint matching performance. Latent fingerprints are partial fingerprint images that typically contain only a small area of friction ridge pattern and large nonlinear distortion, are blurred or smudged, and contain complex background noise. Due to these characteristics, latents are particularly challenging to match to their mates (reference prints) in a database. Given a latent print in which minutiae have been marked by a human expert (as is the current practice in forensics), we have proposed two approaches to improve latent matching performance. The first approach consists of enhancing the latent image and fusing the matching score obtained from the enhanced latent with the score based on manually marked minutiae. This approach outperforms a commercial fingerprint matcher on the public latent database NIST SD 27. The second approach consists of developing a latent fingerprint matcher that utilizes minutiae as well as orientation field information. The proposed matching algorithm outperforms three fingerprint matching algorithms on two different latent fingerprint databases (the NIST SD 27 and WVU latent databases). Latent fingerprint identification accuracy generally deteriorates as the size of the fingerprint database grows. To alleviate this problem and to reduce the overall search time of latent matching, we propose combining various level 1 and level 2 features, including minutia cylinder codes, minutia triplets, singular points, orientation field, and ridge period, to efficiently filter out a large portion of the reference fingerprint database.
The proposed approach outperforms state-of-the-art indexing techniques on the public-domain latent database NIST SD 27 against a large background database of 267K rolled prints. The experimental results also show that the proposed filtering scheme has the desirable property of attaining improved computational efficiency of latent search (20% penetration rate) while maintaining latent matching accuracy.