Search results
- Title
- Using Eventual Consistency to Improve the Performance of Distributed Graph Computation In Key-Value Stores
- Creator
- Nguyen, Duong Ngoc
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Key-value stores have gained increasing popularity due to their fast performance and simple data model. A key-value store usually consists of multiple replicas located in different geographical regions to provide higher availability and fault tolerance. Consequently, a protocol is employed to ensure that data are consistent across the replicas. The CAP theorem states the impossibility of simultaneously achieving three desirable properties in a distributed system, namely consistency, availability, and network partition tolerance. Since failures are a norm in distributed systems and the capability to maintain the service at an acceptable level in the presence of failures is a critical dependability and business requirement of any system, the partition tolerance property is a necessity. Consequently, the trade-off between consistency and availability (performance) is inevitable. Strong consistency is attained at the cost of slow performance and fast performance is attained at the cost of weak consistency, resulting in a spectrum of consistency models suitable for different needs. Among the consistency models, sequential consistency and eventual consistency are two common ones. The former is easier to program with but suffers from poor performance, whereas the latter suffers from potential data anomalies while providing higher performance. In this dissertation, we focus on the problem of what a designer should do if he/she is asked to solve a problem on a key-value store that provides eventual consistency. Specifically, we are interested in the approaches that allow the designer to run his/her applications on an eventually consistent key-value store and handle data anomalies if they occur during the computation. To that end, we investigate two options: (1) using a detect-rollback approach, and (2) using a stabilization approach.
In the first option, the designer identifies a correctness predicate, say $\Phi$, and continues to run the application as if it were running on sequential consistency, while our system monitors $\Phi$. If $\Phi$ is violated (because the underlying key-value store provides eventual consistency), the system rolls back to a state where $\Phi$ holds and the computation is resumed from there. In the second option, the data anomalies are treated as state perturbations and handled by the convergence property of stabilizing algorithms. We choose LinkedIn's Voldemort key-value store as the example key-value store for our study. We run experiments with several graph-based applications on the Amazon AWS platform to evaluate the benefits of the two approaches. From the experiment results, we observe that overall, both approaches provide benefits to the applications when compared to running the applications on sequential consistency. However, stabilization provides higher benefits, especially in the aggressive stabilization mode, which trades more perturbations for no locking overhead. The results suggest that while there is some cost associated with making an algorithm stabilizing, there may be a substantial benefit in revising an existing algorithm for the problem at hand to make it stabilizing and reduce the overall runtime under eventual consistency. There are several directions of extension. For the detect-rollback approach, we are working to develop a more general rollback mechanism for the applications and improve the efficiency and accuracy of the monitors. For the stabilization approach, we are working to develop an analytical model for the benefits of eventual consistency in stabilizing programs. Our current work focuses on silent stabilization and we plan to extend our approach to other variations of stabilization.
- Title
- Achieving reliable distributed systems : through efficient run-time monitoring and predicate detection
- Creator
- Tekken Valapil, Vidhya
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
Runtime monitoring of distributed systems to perform predicate detection is a critical as well as a challenging task. It is critical because it ensures the reliability of the system by detecting all possible violations of system requirements. It is challenging because to guarantee lack of violations one has to analyze every possible ordering of system events, and this is an expensive task. In this report, we focus on ordering events in a system run using HLC (Hybrid Logical Clocks) timestamps, which are O(1) sized timestamps, and present some efficient algorithms to perform predicate detection using HLC. Since, with HLC, the runtime monitor cannot find all possible orderings of system events, we present a new type of clock called Biased Hybrid Logical Clocks (BHLC), which are capable of finding more possible orderings than HLC. Thus we show that BHLC-based predicate detection can find more violations than HLC-based predicate detection. Since predicate detection based on either HLC or BHLC does not guarantee detection of all possible violations in a system run, we present an SMT (Satisfiability Modulo Theories) solver based predicate detection approach that guarantees the detection of all possible violations in a system run. While a runtime monitor that performs predicate detection using SMT solvers is accurate, the time taken by the solver to detect the presence or absence of a violation can be high. To reduce the time taken by the runtime monitor, we propose the use of an efficient two-layered monitoring approach, where the first layer of the monitor is efficient but less accurate and the second layer is accurate but less efficient. Together they drastically reduce the overall time taken to perform predicate detection and also guarantee detection of all possible violations.
- Title
- Consistency for distributed data stores
- Creator
- Roohitavaf, Mohammad
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
Geo-replicated data stores are one of the integral parts of today's Internet services. Service providers usually replicate their data on different data centers worldwide to achieve higher performance and data durability. However, when we use this approach, the consistency between replicas becomes a concern. At the highest level of consistency, we want strong consistency that provides the illusion of having only a single copy of the data. However, strong consistency comes with high performance and availability costs. In this work, we focus on weaker consistency models that allow us to provide high performance and availability while preventing certain inconsistencies. Session guarantees (a.k.a. client-centric consistency models) are one such family of weaker consistency models that prevent some of the inconsistencies from occurring in a client session. We provide modified versions of session guarantees that, unlike traditional session guarantees, do not cause the problem of slowdown cascade for partitioned systems. We present a protocol to provide session guarantees for eBay NuKV, a key-value store designed for eBay's internal services with high performance and availability requirements. We utilize Hybrid Logical Clocks (HLCs) to provide wait-free write operations while providing session guarantees. Our experiments, done on the eBay cloud platform, show that our protocol does not cause significant overhead compared with eventual consistency. In addition to session guarantees, a large portion of this dissertation is dedicated to causal consistency. Causal consistency is especially interesting as it has been proved to be the strongest consistency model that allows the system to be available even during network partitions. We provide the CausalSpartanX protocol that, using HLCs, improves current time-based protocols by eliminating the effect of clock anomalies such as clock skew between servers.
CausalSpartanX also supports non-blocking causally consistent read-only transactions that allow applications to read a set of values that are causally consistent with each other. Read-only transactions provide a powerful abstraction that is impossible to be replaced by a set of basic read operations. CausalSpartanX, like other causal consistency protocols, assumes sticky clients (i.e. clients that never change the replica that they access). We prove if one wants immediate visibility for local updates in a data center, clients have to be sticky. Based on the structure of CausalSpartanX, we provide our Adaptive Causal Consistency Framework (ACCF) that is a configurable framework that generalizes current consistency protocols. ACCF provides a basis for designing adaptive protocols that can constantly monitor the system and clients' usage pattern and change themselves to provide better performance and availability. Finally, we present our Distributed Key-Value Framework (DKVF), a framework for rapid prototyping and benchmarking consistency protocols. DKVF lets protocol designers only focus on their high-level protocols, delegating all lower level communication and storage tasks to the framework.
- Title
- Robust multi-task learning algorithms for predictive modeling of spatial and temporal data
- Creator
- Liu, Xi (Graduate of Michigan State University)
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
"Recent years have witnessed the significant growth of spatial and temporal data generated from various disciplines, including geophysical sciences, neuroscience, economics, criminology, and epidemiology. Such data have been extensively used to train spatial and temporal models that can make predictions either at multiple locations simultaneously or along multiple forecasting horizons (lead times). However, training an accurate prediction model in these domains can be challenging especially when there are significant noise and missing values or limited training examples available. The goal of this thesis is to develop novel multi-task learning frameworks that can exploit the spatial and/or temporal dependencies of the data to ensure robust predictions in spite of the data quality and scarcity problems. The first framework developed in this dissertation is designed for multi-task classification of time series data. Specifically, the prediction task here is to continuously classify activities of a human subject based on the multi-modal sensor data collected in a smart home environment. As the classes exhibit strong spatial and temporal dependencies, this makes it an ideal setting for applying a multi-task learning approach. Nevertheless, since the type of sensors deployed often vary from one room (location) to another, this introduces a structured missing value problem, in which blocks of sensor data could be missing when a subject moves from one room to another. To address this challenge, a probabilistic multi-task classification framework is developed to jointly model the activity recognition tasks from all the rooms, taking into account the block-missing value problem. The framework also learns the transitional dependencies between classes to improve its overall prediction accuracy. The second framework is developed for the multi-location time series forecasting problem. 
Although multi-task learning has been successfully applied to many time series forecasting applications such as climate prediction, conventional approaches aim to minimize only the point-wise residual error of their predictions instead of considering how well their models fit the overall distribution of the response variable. As a result, their predicted distribution may not fully capture the true distribution of the data. In this thesis, a novel distribution-preserving multi-task learning framework is proposed for the multi-location time series forecasting problem. The framework uses a non-parametric density estimation approach to fit the distribution of the response variable and employs an L2-distance function to minimize the divergence between the predicted and true distributions. The third framework proposed in this dissertation is for the multi-step-ahead (long-range) time series prediction problem with application to ensemble forecasting of sea surface temperature. Specifically, our goal is to effectively combine the forecasts generated by various numerical models at different lead times to obtain more precise predictions. Towards this end, a multi-task deep learning framework based on a hierarchical LSTM architecture is proposed to jointly model the ensemble forecasts of different models, taking into account the temporal dependencies between forecasts at different lead times. Experiments performed on 29-year sea surface temperature data from North American Multi-Model Ensemble (NAMME) demonstrate that the proposed architecture significantly outperforms standard LSTM and other MTL approaches."--Pages ii-iii.
- Title
- Towards machine learning based source identification of encrypted video traffic
- Creator
- Shi, Yan (Of Michigan State University)
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
The rapid growth of the Internet has helped to popularize video streaming services, which have now become the most dominant content on the Internet. The management of video streaming traffic is complicated by its enormous volume, diverse communication protocols and data formats, and the widespread adoption of encryption. In this thesis, the aim is to develop a novel firewall framework, named Soft-margined Firewall, for managing encrypted video streaming traffic while avoiding violation of user privacy. The system distinguishes itself from conventional firewall systems by incorporating machine learning and Traffic Analysis (TA) as a traffic detection and blocking mechanism. The goal is to detect unknown network traffic, including traffic that is encrypted, tunneled through a Virtual Private Network, or obfuscated, in realistic application scenarios. Existing TA methods have limitations in that they can deal only with simple traffic patterns: usually, only a single source of traffic is allowed in a tunnel, and a trained classifier is not portable between network locations, requiring redundant training. This work aims to address these limitations with new techniques in machine learning. The three main contributions of this work are: 1) developing new statistical features around traffic surge periods that can better identify websites with dynamic contents; 2) a two-stage classifier architecture to solve the mixed-traffic problem with state-of-the-art TA features; and 3) leveraging a novel natural-language inspired feature to solve the mixed-traffic problem using Deep-Learning methods. A fully working Soft-margined Firewall with the above distinctive features has been designed, implemented, and verified for both conventional classifiers and the proposed deep-learning based classifiers. The efficacy of the proposed system is confirmed via experiments conducted on actual network setups with a custom-built prototype firewall and OpenVPN servers. 
The proposed feature-classifier combinations show superior performance compared to previous state-of-the-art results. The solution that combines natural-language inspired traffic feature and Deep-Learning is demonstrated to be able to solve the mixed-traffic problem, and capable of predicting multiple labels associated with one sample. Additionally, the classifier can classify traffic recorded from locations that are different from where the trained traffic was collected. These results are the first of their kind and are expected to lead the way of creating next-generation TA-based firewall systems.
- Title
- On design and implementation of fast & secure network protocols for datacenters
- Creator
- Munir, Ali (Graduate of Michigan State University)
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
My PhD work focuses on improving the performance and security of networked systems. For network performance, my research focuses on scheduling and transport in datacenter networks. For network security, my research focuses on multipath TCP security. To improve the performance of datacenter transport, I proposed PASE, a near-optimal and deployment-friendly transport protocol. To this end, I first identified the underlying strategies used by existing datacenter transports. Next, I showed that these strategies are complementary to each other, rather than substitutes, as they have different strengths and can address each other's limitations. Unfortunately, prior datacenter transports use only one of these strategies and as a result they achieve either near-optimal performance or deployment friendliness but not both. Based on this insight, I designed a datacenter transport protocol called PASE, which carefully synthesizes these strategies by assigning different transport responsibility to each strategy. To further improve the performance of datacenter transport in multi-tenant networks, I proposed Stacked Congestion Control (SCC) to achieve performance isolation and objective scheduling simultaneously. SCC is a distributed host-based bandwidth allocation framework, where an underlay congestion control layer handles contention among tenants, and a private congestion control layer for each tenant optimizes its performance objective. To the best of my knowledge, no prior work supported performance isolation and objective scheduling simultaneously. To improve task scheduling performance in datacenters, I proposed NEAT, a task scheduling framework that leverages information from the underlying network scheduler to make task placement decisions. Existing datacenter schedulers optimize either the placement of tasks or the scheduling of network flows. Inconsistent assumptions of the two schedulers can compromise the overall application performance.
The core of NEAT is a task completion time predictor that estimates the completion time of a task under a given network condition and a given network scheduling policy. Next, a distributed task placement framework leverages the predicted task completion times to make task placement decisions and minimize the average completion time of active tasks. To improve multipath TCP (MPTCP) security, I reported vulnerabilities in MPTCP that arise because of cross-path interactions between MPTCP subflows. MPTCP allows two endpoints to simultaneously use multiple paths between them. An attacker eavesdropping on one MPTCP subflow can infer the throughput of other subflows and can also inject forged MPTCP packets to change the priorities of any MPTCP subflow. An attacker can exploit these vulnerabilities to launch a connection hijack attack on paths he has no access to, or to divert traffic from one path to other paths. My proposed vulnerability fixes, which are changes to the MPTCP specification, guarantee that MPTCP is at least as secure as TCP and the original MPTCP, and they have been adopted by the IETF.
- Title
- Metamodeling framework for simultaneous multi-objective optimization using efficient evolutionary algorithms
- Creator
- Roy, Proteek Chandan
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
Most real-world problems are comprised of multiple conflicting objectives and solutions to those problems are multiple Pareto-optimal trade-off solutions. The main challenge of these practical problems is that the objectives and constraints do not have any closed functional forms and they are expensive for computation as well. Objectives coming from finite element analysis, computational fluid dynamics software, network flow simulators, crop modeling, weather modeling or any other simulations which involve partial differential equations are good examples of expensive problems. These problems can also be regarded as "low-budget" problems since only a few solution evaluations can be performed given limited time. Nevertheless, parameter estimation and optimization of objectives related to these simulations require a good number of solution evaluations to come up with better parameters or a reasonably good trade-off front. To provide an efficient search process within a limited number of exact evaluations, metamodel-assisted algorithms have been proposed in the literature. These algorithms attempt to construct a computationally inexpensive representative model of the problem, having the same global optima and thereby providing a way to carry out the optimization in metamodel space in an efficient way. Population-based methods like evolutionary algorithms have become standard for solving multi-objective problems and recently metamodel-based evolutionary algorithms are being used for solving expensive problems. In this thesis, we would like to address a few challenges of metamodel-based optimization algorithms and propose some efficient and innovative ways to construct these algorithms. To approach efficient design of metamodel-based optimization algorithms, one needs to address the choice of metamodeling functions. The most trivial way is to build metamodels for each objective and constraint separately. 
But we can reduce the number of metamodel constructions by using some aggregated functions and target either single or multiple optima in each step. We propose a taxonomy of possible metamodel-based algorithmic frameworks which not only includes most algorithms from the literature but also suggests some new ones. We improve each of the frameworks by introducing trust region concepts in the multi-objective scenario and present two strategies for building trust regions. Apart from addressing the main bottleneck of the limited number of solution evaluations, we also propose efficient non-dominated sorting methods that further reduce computational time for a basic step of multi-objective optimization. We have carried out extensive experiments over all representative metamodeling frameworks and shown that each of them can solve a good number of test problems. We have not tried to tune the algorithmic parameters yet and it remains as our future work. Our theoretical analyses and extensive experiments suggest that we can achieve efficient metamodel-based multi-objective optimization algorithms for solving test as well as real-world expensive and low-budget problems.
- Title
- Faster algorithms for machine learning problems in high dimension
- Creator
- Ye, Mingquan
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
"When dealing with datasets with high dimension, the existing machine learning algorithms often do not work in practice. Actually, most of the real-world data has the nature of low intrinsic dimension. For example, data often lies on a low-dimensional manifold or has a low doubling dimension. Inspired by this phenomenon, this thesis tries to improve the time complexities of two fundamental problems in machine learning using some techniques in computational geometry. In Chapter two, we propose a bi-criteria approximation algorithm for minimum enclosing ball with outliers and extend it to the outlier recognition problem. By virtue of the "core-set" idea and the Random Gradient Descent Tree, we propose an efficient algorithm which is linear in the number of points n and the dimensionality d, and provides a probability bound. In experiments, compared with some existing outlier recognition algorithms, our method is proven to be efficient and robust to the outlier ratios. In Chapter three, we adopt the "doubling dimension" to characterize the intrinsic dimension of a point set. By the property of doubling dimension, we can approximate the geometric alignment between two point sets by executing the existing alignment algorithms on their subsets, which achieves a much smaller time complexity. More importantly, the proposed approximate method has a theoretical upper bound and can serve as the preprocessing step of any alignment algorithm."--Page ii.
- Title
- Capturing bluetooth traffic in the wild : practical systems and privacy implications
- Creator
- Albazrqaoe, Wahhab
- Date
- 2018
- Collection
- Electronic Theses & Dissertations
- Description
-
"Bluetooth wireless technology is today present in billions of smartphones, mobile devices, and portable electronics. With the prevalence of personal Bluetooth devices, a practical Bluetooth traffic sniffer is of increasing interest due to the following. First, it has been reported that a traffic sniffer is an essential, day-to-day tool for Bluetooth engineers and applications developers [4] [14]; and second, as the communication between Bluetooth devices is privacy-sensitive in nature, exploring the possibility of Bluetooth traffic sniffing in practical settings sheds lights into potential user privacy leakage. To date, sniffing Bluetooth traffic has been widely considered an extremely intricate task due to wideband spread spectrum of Bluetooth, pseudo-random frequency hopping adopted by Bluetooth at baseband, and the interference in the open 2.4 GHz band. This thesis addresses these challenges by introducing novel traffic sniffers that capture Bluetooth packets in practical environments. In particular, we present the following systems. (i) BlueEar, the first practical Bluetooth traffic sniffing system only using general, inexpensive wireless platforms. BlueEar features a novel dual-radio architecture where two inexpensive, Bluetooth-compliant radios coordinate with each other to eavesdrop on hopping subchannels in indiscoverable mode. Statistic models and lightweight machine learning tools are integrated to learn the adaptive hopping behavior of the target. Our results show that BlueEar maintains a packet capture rate higher than 90% consistently in dynamic settings. In addition, we discuss the implications of the BlueEar approach on Bluetooth LE sniffing and present a practical countermeasure that effectively reduces the packet capture rate of sniffer by 70%, which can be easily implemented on the Bluetooth master while requiring no modification to slave devices like keyboards and headsets. 
And (ii) BlueFunnel, the first low-power, wideband traffic sniffer that monitors Bluetooth spectrum in parallel and captures packets in realtime. BlueFunnel tackles the challenge of wideband spread spectrum based on low speed, low cost ADC (2 Msamples/sec) to subsample Bluetooth spectrum. Further, it leverages a suite of novel signal processing algorithms to demodulate Bluetooth signal in realtime. We implement BlueFunnel prototype based on USRP2 devices. Specifically, we employ two USRP2 devices, each equipped with an SBX daughterboard, to build a customized software radio platform. The customized SDR platform is interfaced to the controller, which implements the digital signal processing algorithms on a personal laptop. We evaluate the system performance based on packet capture rates in a variety of interference conditions, mainly introduced by the 802.11-based WLANs. BlueFunnel maintains good levels of packet capture rates in all settings. Further, we introduce two scenarios of attacks against Bluetooth, where BlueFunnel successfully reveals sensitive information about the target link."--Pages ii-iii.
- Title
- Fluid animation on deforming surface meshes
- Creator
- Wang, Xiaojun (Graduate of Michigan State University)
- Date
- 2017
- Collection
- Electronic Theses & Dissertations
- Description
-
"We explore methods for visually plausible fluid simulation on deforming surfaces with inhomogeneous diffusion properties. While there are methods for fluid simulation on surfaces, not much research effort has focused on the influence of the motion of the underlying surface, in particular when it is not a rigid surface, such as knitted or woven textiles in motion. The complexity involved makes it challenging for the simulation to account for the non-inertial local frames typically used to describe the motion and for the anisotropic effects in diffusion, absorption, and adsorption. Thus, our primary goal is to enable a fast and stable method for such scenarios. First, in preparation of the material properties for the surface domain, we describe textiles with salient feature directions by bulk material property tensors in order to reduce the complexity, employing a 2D homogenization technique, which effectively turns microscale inhomogeneous properties into homogeneous properties in macroscale descriptions. We then use standard texture mapping techniques to map these tensors to triangles in the curved surface mesh, taking into account the alignment of each local tangent space with the correct feature directions of the macroscale tensor. We show that this homogenization tool is intuitive, flexible and easily adjusted. Second, for efficient description of the deforming surface, we offer a new geometry representation for the surface that uses solely angles instead of vertex coordinates, to reduce storage for the motion of the underlying surface. Since our simulation tool relies heavily on long sequences of 3D curved triangular meshes, it is worthwhile exploring such efficient representations to make our tool practical by reducing the memory access during real-time simulations as well as reducing the file sizes.
Inspired by angle-based representations for tetrahedral meshes, we use a spectral method to restore the curved surface using both the angles of the triangles and the dihedral angles between adjacent triangles in the mesh. Moreover, in many surface deformation sequences, it is often sufficient to update the dihedral angles while keeping the triangle interior angles fixed. Third, we propose a framework for simulating various effects of fluid flowing on deforming surfaces. We apply our simulator directly on curved surface meshes instead of in parameter domains, whereas many existing simulation methods require a parameterization of the surface. We further demonstrate that fictitious forces induced by the surface motion can be added to the surface-based simulation at a small additional cost. These fictitious forces can be decomposed into different components. Only the rectilinear and Coriolis components are relevant to our choice of local frames. Other effects, such as diffusion, adsorption, absorption, and evaporation, are also incorporated for realistic stain simulation. Finally, we explore the extraction of Lagrangian Coherent Structures (LCS), which are often referred to as the skeleton of fluid motion. LCS are often described by ridges of the finite-time Lyapunov exponent (FTLE) fields, which describe the extremal stretching of fluid parcels following the flow. We propose a novel improvement to the ridge marching algorithm, which extracts such ridges robustly from the typically noisy FTLE estimates even in well-defined fluid flows. Our results are potentially applicable to visualizing and controlling fluid trajectory patterns. In contrast to current methods for LCS calculation, which are only applicable to flat 2D or 3D domains and are sensitive to noise, our ridge extraction is readily applicable to curved surfaces even when they are deforming.
The collection of these computational tools will facilitate the generation of realistic and easy-to-adjust surface fluid animation with various physically plausible effects on surfaces."--Pages ii-iii.
- Title
- Hidden Markov model-based homology search and gene prediction in NGS ERA
- Creator
- Techa-angkoon, Prapaporn
- Date
- 2017
- Collection
- Electronic Theses & Dissertations
- Description
-
The exponential cost reduction of next-generation sequencing (NGS) has enabled researchers to sequence a large number of organisms in order to answer various questions in biology, ecology, health, etc. For newly sequenced genomes, gene prediction and homology search against characterized protein sequence databases are two fundamental tasks for annotating functional elements in the genomes. The main goal of gene prediction is to identify gene loci and their structures. As there is accumulating evidence showing important functions of non-coding RNAs (ncRNAs), comprehensive gene prediction should include both protein-coding genes and ncRNAs. Homology search against protein sequences can aid identification of functional elements in genomes. Although there has been intensive research in the fields of gene prediction, ncRNA search, and homology search, there are still unaddressed challenges. In this dissertation, I made contributions in these three areas. For gene prediction, I designed an HMM-based ab initio gene prediction tool that considers the G+C gradient in grass genomes. For homology search, I designed a method that can align short reads against protein families using profile HMMs. For ncRNA search, I designed an ncRNA alignment tool that can align highly structured ncRNAs using only sequence similarity. Below I summarize my contributions. Despite decades of research on gene prediction, existing gene prediction tools are not carefully designed to deal with variant G+C content and 5'-3' changing patterns inside coding regions. Thus, these tools can miss genes with positive or negative G+C gradient in grass genomes such as rice, maize, sorghum, etc. I implemented a tool named AUGUSTUS-GC that accounts for the 5'-3' G+C gradient. Our tool can accurately predict protein-coding genes in plant genomes, especially grass genomes. A large number of sequencing projects have produced short reads from whole genomes or transcriptomic data.
I designed a short-read homology search tool that employs paired-end reads to improve homology search sensitivity. The experimental results show that our tool can achieve significantly better sensitivity and accuracy in aligning short reads that are part of remote homologs. Despite the extensive studies of ncRNA search, the existing tools that heavily depend on secondary structure in homology search cannot efficiently handle the rapidly accumulating RNA-seq data. It would be ideal to have a faster ncRNA homology search tool with accuracy similar to that of tools adopting secondary structure. I implemented an accurate ncRNA alignment tool called glu-RNA that can achieve similar accuracy to structural alignment tools while keeping the same running time complexity as sequence alignment tools. The experimental results demonstrate that our tool can achieve more accurate alignments than the popular sequence alignment tools and a well-known structural alignment program.
- Title
- Hardware algorithms for high-speed packet processing
- Creator
- Norige, Eric
- Date
- 2017
- Collection
- Electronic Theses & Dissertations
- Description
-
The networking industry is facing enormous challenges of scaling devices to support the exponential growth of internet traffic as well as an increasing number of features being implemented inside the network. Algorithmic hardware improvements to networking components have largely been neglected due to the ease of leveraging increased clock frequency and compute power and the risks of implementing complex hardware designs. As clock frequency slows its growth, algorithmic solutions become important to fill the gap between current generation capability and next generation requirements. This paper presents algorithmic solutions to networking problems in three domains: Deep Packet Inspection (DPI), firewall (and other) ruleset compression and non-cryptographic hashing. The improvements in DPI are two-pronged: first in the area of application-level protocol field extraction, which allows security devices to precisely identify packet fields for targeted validity checks. By using counting automata, we achieve precise parsing of non-regular protocols with small, constant per-flow memory requirements, extracting at rates of up to 30 Gbps on real traffic in software while using only 112 bytes of state per flow. The second DPI improvement is on the longstanding regular expression matching problem, where we complete the HFA solution to the DFA state explosion problem with efficient construction algorithms and optimized memory layout for hardware or software implementation. These methods construct automata too complex to be constructed by previous methods in seconds, while being capable of 29 Gbps throughput with an ASIC implementation. Firewall ruleset compression enables more firewall entries to be stored in a fixed capacity pattern matching engine, and can also be used to reorganize a firewall specification for higher performance software matching. A novel recursive structure called TUF is given to unify the best known solutions to this problem and suggest future avenues of attack.
These algorithms, with little tuning, achieve a 13.7% improvement in compression on large, real-life classifiers, and can achieve the same results as existing algorithms while running 20 times faster. Finally, non-cryptographic hash functions can be used for anything from hash tables to track network flows to packet sampling for traffic characterization. We give a novel approach to generating hardware hash functions in between the extremes of expensive cryptographic hash functions and low quality linear hash functions. To evaluate these mid-range hash functions properly, we develop new evaluation methods to better distinguish non-cryptographic hash function quality. The hash functions described in this paper achieve low-latency, wide hashing with good avalanche and universality properties at a much lower cost than existing solutions.
- Title
- Automated addition of fault-tolerance via lazy repair and graceful degradation
- Creator
- Lin, Yiyan
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
In this dissertation, we concentrate on the problem of automated addition of fault-tolerance, which transforms a fault-intolerant program into a fault-tolerant program. We solve this problem via model repair. Model repair is a correct-by-construction technique to revise an existing model so that the revised model satisfies the given correctness criteria, such as safety, liveness, or fault-tolerance. We consider two problems of using model repair to add fault-tolerance. First, if the repaired model violates the assumptions (e.g., partial observability, inability to detect crashed processes, etc.) made in the underlying system, then it cannot be implemented. We denote these requirements as realizability constraints. Second, the addition of fault-tolerance may fail if the program cannot fully recover after certain faults occur. In this dissertation, we propose a lazy repair approach to address realizability issues in adding fault-tolerance. Additionally, we propose a technique to automatically add graceful degradation to a program, so that the program can recover with partial functionality (that is identified by the designer to be the critical functionality) if full recovery is impossible. A model repair technique transforms a model to another model that satisfies a new set of properties. Such a transformation should also maintain the mapping between the model and the underlying program. For example, in a distributed program, every process is restricted from reading (or writing) some variables in other processes. A model that represents this program should also disallow the process from reading (or writing) those inaccessible variables. If these constraints are violated, then the corresponding model will be unrealizable.
An unrealizable model (in this context, a model that violates the read/write restrictions) may make it impossible to obtain the corresponding implementation. In this dissertation, we call the read (or write) restriction a realizability constraint in distributed systems. An unrealizable model (a model that violates the realizability constraints) may complicate the implementation by requiring extra modification to the program. Such modification may in turn break the program's correctness. Resolving realizability constraints increases the complexity of model repair. Existing model repair techniques introduce heuristics to reduce the complexity. However, this heuristic-based approach is designed and optimized specifically for distributed programs. We need a more generic model repair approach for other types of programs, e.g., synchronous programs, cyber-physical programs, etc. Hence, in this dissertation, we propose a model repair technique, i.e., lazy repair, to add fault-tolerance to programs with different types of realizability constraints. It involves two steps. First, we only focus on repairing to obtain a model that satisfies correctness criteria while ignoring realizability constraints. In the second step, we repair this model further by removing behaviors while ensuring that the desired specification is preserved. The lazy repair approach simplifies the process of developing heuristics, and provides a tradeoff in terms of the time saved in the first step and the extra work required in the second step. We demonstrate that lazy repair is applicable in the context of distributed systems, synchronous systems and cyber-physical systems. In addition, safety-critical systems such as airplanes, automobiles and elevators should operate with high dependability in the presence of faults. If the occurrence of faults breaks down some components, the system may not be able to fully recover.
In this scenario, the system can still operate with remaining resources and deliver partial but core functionality, i.e., display graceful degradation. Existing model repair approaches, such as addition of fault-tolerance, cannot transform a program to provide graceful degradation. In this dissertation, we propose a technique to add fault-tolerance to a program with graceful degradation. In the absence of faults, such a program exhibits ideal behaviors. In the presence of faults, the program is allowed to recover with reduced functionality. This technique involves two steps. First, it automatically generates a program with graceful degradation based on the input fault-intolerant program. Second, it adds fault-tolerance to the output program from the first step. We demonstrate that this technique is applicable in the context of high atomicity programs as well as low atomicity programs (i.e., distributed programs). We also present a case study on adding multi-graceful degradation to a dangerous gas detection and ventilation system. Through this case study, we show that our approach can assist the designer in obtaining a program that behaves like the deployed system.
- Title
- Computational identification and analysis of non-coding RNAs in large-scale biological data
- Creator
- Lei, Jikai
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
Non-protein-coding RNAs (ncRNAs) are RNA molecules that function directly at the level of RNA without being translated into protein. They play important biological roles in all three domains of life, i.e., Eukarya, Bacteria and Archaea. To understand the working mechanisms and the functions of ncRNAs in various species, a fundamental step is to identify both known and novel ncRNAs from large-scale biological data. Large-scale genomic data includes both genomic sequence data and NGS sequencing data. Both types of genomic data provide great opportunities for identifying ncRNAs. For genomic sequence data, many ncRNA identification tools that use comparative sequence analysis have been developed. These methods work well for ncRNAs that have strong sequence similarity. However, they are not well suited for detecting ncRNAs that are remotely homologous. Next generation sequencing (NGS), while it opens a new horizon for annotating and understanding known and novel ncRNAs, also introduces many challenges. First, existing genomic sequence searching tools cannot be readily applied to NGS data because NGS technology produces short, fragmentary reads. Second, most NGS data sets are large-scale. Existing algorithms are infeasible on NGS data because of high resource requirements. Third, metagenomic sequencing, which utilizes NGS technology to sequence uncultured, complex microbial communities directly from their natural habitats, further aggravates the difficulties. Thus, the massive amount of genomic sequence data and NGS data calls for efficient algorithms and tools for ncRNA annotation. In this dissertation, I present three computational methods and tools to efficiently identify ncRNAs from large-scale biological data. Chain-RNA is a tool that combines both sequence similarity and structure similarity to locate cross-species conserved RNA elements with low sequence similarity in genomic sequence data.
It can achieve significantly higher sensitivity in identifying remotely conserved ncRNA elements than sequence-based methods such as BLAST, and is much faster than existing structural alignment tools. miR-PREFeR (miRNA PREdiction From small RNA-Seq data) utilizes expression patterns of miRNA and follows the criteria for plant microRNA annotation to accurately predict plant miRNAs from one or more small RNA-Seq data samples. It is sensitive, accurate, fast and has a low memory footprint. metaCRISPR focuses on identifying Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) from large-scale metagenomic sequencing data. It uses a k-mer hash table to efficiently detect reads that belong to CRISPRs from the raw metagenomic data set. Overlap-graph-based clustering is then conducted on the reduced data set to separate different CRISPRs. A set of graph-based algorithms is used to assemble and recover CRISPRs from the clusters.
- Title
- Novel computational approaches to investigate microbial diversity
- Creator
- Zhang, Qingpeng
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
Species diversity is an important measurement of ecological communities. Scientists believe that there is a strong relationship between species diversity and ecosystem processes. However, efforts to investigate microbial diversity using whole genome shotgun reads data are still scarce. With novel applications of data structures and the development of novel algorithms, firstly we developed an efficient k-mer counting approach and approaches to enable scalable streaming analysis of large and error-prone short-read shotgun data sets. Then based on these efforts, we developed a statistical framework allowing for scalable diversity analysis of large, complex metagenomes without the need for assembly or reference sequences. This method is evaluated on multiple large metagenomes from different environments, such as seawater, human microbiome, soil. Given the velocity in growth of sequencing data, this method is promising for analyzing highly diverse samples with relatively low computational requirements. Further, as the method does not depend on reference genomes, it also provides opportunities to tackle the large amounts of unknowns we find in metagenomic datasets.
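To make the term concrete, k-mer counting means tallying every length-k substring that occurs in a set of reads. The sketch below is only an illustration: the abstract does not say which data structure the thesis uses, so the exact in-memory `Counter` here is an assumption chosen for clarity and would not scale to the large metagenomes described. The function names `count_kmers` and `abundance_histogram` are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map each abundance value to the number of distinct k-mers with that abundance."""
    return Counter(counts.values())


if __name__ == "__main__":
    kmer_counts = count_kmers(["ACGACG", "CGA"], k=3)
    print(dict(kmer_counts))
    print(dict(abundance_histogram(kmer_counts)))
```

The abundance histogram (how many distinct k-mers were seen once, twice, and so on) is one common summary of such counts; whether the thesis's statistical framework uses it in this form is not stated in the abstract.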
- Title
- Statistical and learning algorithms for the design, analysis, measurement, and modeling of networking and security systems
- Creator
- Shahzad, Muhammad (College teacher)
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
"The goal of this thesis is to develop statistical and learning algorithms for the design, analysis, measurement, and modeling of networking and security systems with specific focus on RFID systems, network performance metrics, user security, and software security. Next, I give a brief overview of these four areas of focus." -- Abstract.
- Title
- Near duplicate image search
- Creator
- Li, Fengjie
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Information retrieval addresses the fundamental problem of how to identify the objects in a database that satisfy the information needs of users. Facing information overload, the major challenge in search algorithm design is to ensure that useful information can be found both accurately and efficiently in large databases. To address this challenge, different indexing and retrieval methods have been proposed for different types of data, namely sparse data (e.g. documents), dense data (e.g. dense feature vectors) and bag-of-features (e.g. images represented by local features). For sparse data, inverted indexes and document retrieval models have proved to be very effective for large scale retrieval problems. For dense data and bag-of-features data, however, there are still some open problems. For example, Locality Sensitive Hashing, a state-of-the-art method for searching high dimensional vectors, often fails to make a good tradeoff between precision and recall. Namely, it tends to achieve high precision but with low recall or vice versa. The bag-of-words model, a popular approach for searching objects represented as bag-of-features, has limited performance because of the information loss during the quantization procedure. Since the general problem of searching objects represented as dense vectors and bag-of-features may be too challenging, in this dissertation we focus on near duplicate search, in which the matched objects are almost identical to the query. By effectively exploring the statistical properties of near duplicity, we will be able to design more effective indexing schemes and search algorithms. Thus, the focus of this dissertation is to design new indexing methods and retrieval algorithms for near duplicate search in large scale databases that accurately capture the data similarity and deliver more accurate and efficient search.
Below, we summarize the main contributions of this dissertation: Our first contribution is a new algorithm for searching near duplicate bag-of-features data. The proposed algorithm, named random seeding quantization, is more efficient in generating bag-of-words representations for near duplicate images. The new scheme is motivated by approximating the optimal partial matching between bag-of-features, and thus produces a bag-of-words representation capturing the true similarities of the data, leading to more accurate and efficient retrieval of bag-of-features data. Our second contribution, termed Random Projection Filtering (RPF), is a search algorithm designed for efficient near duplicate vector search. By explicitly exploiting the statistical properties of near duplicity, the algorithm projects high dimensional vectors into a lower dimensional space and filters out irrelevant items. Our effective filtering procedure makes RPF more accurate and efficient in identifying near duplicate objects in databases. Our third contribution is to develop and evaluate a new randomized range search algorithm for near duplicate vectors in high dimensional spaces, termed Random Projection Search. Unlike RPF, the algorithm presented in this chapter is suitable for a wider range of applications because it does not require sparsity constraints for high search accuracy. The key idea is to project both the data points and the query point into a one-dimensional space by a random projection, and to perform a one-dimensional range search, using binary search, to find the subset of data points that are within the range of a given query. We prove the theoretical guarantee for the proposed algorithm and evaluate its empirical performance on a dataset of 1.1 billion image features.
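The key idea described for Random Projection Search (project onto one random direction, then do a one-dimensional range search with binary search) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's algorithm: the single Gaussian direction, the final exact-distance check, and the names `build_index` and `near_duplicates` are all hypothetical choices made here.

```python
import bisect
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_index(points, seed=0):
    """Project every point onto one random unit direction and sort by projection."""
    rng = random.Random(seed)
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(direction, direction))
    direction = [c / norm for c in direction]
    projected = sorted((dot(p, direction), i) for i, p in enumerate(points))
    keys = [value for value, _ in projected]  # sorted 1D projections
    ids = [i for _, i in projected]           # point ids in the same order
    return direction, keys, ids


def near_duplicates(index, points, query, radius):
    """Return ids of points within `radius` of `query` (Euclidean distance)."""
    direction, keys, ids = index
    q = dot(query, direction)
    # Projection onto a unit vector never increases distance, so every true
    # match has a projection inside [q - radius, q + radius].
    lo = bisect.bisect_left(keys, q - radius)
    hi = bisect.bisect_right(keys, q + radius)
    candidates = ids[lo:hi]
    # Assumed verification step: discard candidates that are only close in 1D.
    return [i for i in candidates if math.dist(points[i], query) <= radius]


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.05, 0.05]]
    idx = build_index(data, seed=1)
    print(sorted(near_duplicates(idx, data, [0.0, 0.0], 0.2)))
```

Because the direction has unit length, the one-dimensional window cannot miss a point that lies within the radius in the original space; it can only admit extra candidates, which the exact check removes. How well a single projection prunes candidates depends on the data.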
- Title
- Geometric and topological modeling techniques for large and complex shapes
- Creator
- Feng, Xin
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Show moreThe past few decades have witnessed the incredible advancements in modeling, digitizing and visualizing techniques for three–dimensional shapes. Those advancements led to an explosion in the number of three–dimensional models being created for design, manufacture, architecture, medical imaging, etc. At the same time, the structure, function, stability, and dynamics of proteins, subcellular structures, organelles, and multiprotein complexes have emerged as a leading interest in structural biology, another major source of large and complex geometric models. Geometric modeling not only provides visualizations of shapes for large biomolecular complexes but also fills the gap between structural information and theoretical modeling, and enables the understanding of function, stability, and dynamics.We first propose, for tessellated volumes of arbitrary topology, a compact data structure that offers constant–time–complexity incidence queries among cells of any dimensions. Our data structure is simple to implement, easy to use, and allows for arbitrary, user–defined 3–cells such as prisms and hexahedra, while remaining highly efficient in memory usage compared to previous work. We also provide the analysis on its time complexity for commonly–used incidence and adjacency queries such as vertex and edge one–rings.We then introduce a suite of computational tools for volumetric data processing, information extraction, surface mesh rendering, geometric measurement, and curvature estimation for biomolecular complexes. Particular emphasis is given to the modeling of Electron Microscopy Data Bank (EMDB) data and Protein Data Bank (PDB) data. Lagrangian and Cartesian representations are discussed for the surface presentation. Based on these representations, practical algorithms are developed for surface area and surface–enclosed volume calculation, and curvature estimation. Methods for volumetric meshing have also been presented. 
Because the technological development in computer science and mathematics has led to a variety of choices at each stage of the geometric modeling, we discuss the rationales in the design and selection of various algorithms. Analytical test models are designed to verify the computational accuracy and convergence of proposed algorithms. We selected six EMDB data and six PDB data to demonstrate the efficacy of the proposed algorithms in handling biomolecular surfaces and explore their capability of geometric characterization of binding targets. Thus, our toolkit offers a comprehensive protocol for the geometric modeling of proteins, subcellular structures, organelles, and multiprotein complexes.Furthermore, we present a method for computing “choking” loops—a set of surface loops that describe the narrowing of the volumes inside/outside of the surface and extend the notion of surface homology and homotopy loops. The intuition behind their definition is that a choking loop represents the region where an offset of the original surface would get pinched. Our generalized loops naturally include the usual2g handles/tunnels computed based on the topology of the genus–g surface, but also include loops that identify chokepoints or bottlenecks, i.e., boundaries of small membranes separating the inside or outside volume of the surface into disconnected regions. Our definition is based on persistent homology theory, which gives a measure to topological structures, thus providing resilience to noise and a well–defined way to determine topological feature size.Finally, we explore the application of persistent homology theory in protein folding analysis. The extremely complex process of protein folding brings challenges for both experimental study and theoretical modeling. The persistent homology approach studies the Euler characteristics of the protein conformations during the folding process. 
More precisely, the persistence is measured by varying the van der Waals radius, which changes the protein 3D structure and uncovers its inter–connectivity. Our results on fullerenes demonstrate the potential of our geometric and topological approach to protein stability analysis.
- Title
- The evolutionary potential of populations on complex fitness landscapes
- Creator
- Bryson, David Michael
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
Evolution is a highly contingent process, where the quality of the solutions produced is affected by many factors. I explore and describe the contributions of three such aspects that influence overall evolutionary potential: the prior history of a population, the type and frequency of mutations that the organisms are subject to, and the composition of the underlying genetic hardware. I have systematically tested changes to a digital evolution system, Avida, measuring evolutionary potential in seven different computational environments that range in the complexity of their underlying fitness landscapes. I have examined the trends and general principles that these measurements demonstrate and used my results to optimize the evolutionary potential of the system, broadly enhancing performance. The results of this work show that history and mutation rate play significant roles in evolutionary potential, but the final fitness levels of populations are remarkably stable under substantial changes in the genetic hardware and across a broad range of mutation types.
- Title
- A study of Bluetooth Frequency Hopping sequence : modeling and a practical attack
- Creator
- Albazrqaoe, Wahhab
- Date
- 2011
- Collection
- Electronic Theses & Dissertations
- Description
-
Bluetooth is a wireless interface that enables electronic devices to establish short-range, ad-hoc wireless connections. This kind of short-range wireless networking is known as Wireless Personal Area Networks (WPAN). Because of its attractive features of small size, low cost, and low power, Bluetooth has gained worldwide usage. It is embedded in many portable computing devices and is considered a good replacement for local wired connections. Since wireless data is inherently exposed to eavesdropping, security and confidentiality are central issues for wireless standards, including Bluetooth. To maintain the security and confidentiality of wireless packets, the Bluetooth system relies mainly on the Frequency Hopping mechanism to thwart an adversary. With this technique, a wireless channel is accessed to transmit a packet, and for each wireless packet a single channel is selected in a pseudo-random way. This randomness in channel selection makes it difficult for an eavesdropper to predict the next channel to be accessed. Hence, capturing Bluetooth wireless packets is a challenge. In this work, we investigate the Frequency Hopping sequence and specifically the hop selection kernel. We analyze the operation of the kernel hardware by partitioning it into three parts. Based on this modeling, we propose an attack method for the hop selection kernel. The proposed method shows how to expose the clock value hidden in the kernel. This helps to predict the Bluetooth hopping sequence and hence makes capturing Bluetooth wireless packets possible.