Search results (1 - 20 of 38)
- Title
- Semi-supervised learning with side information : graph-based approaches
- Creator
- Liu, Yi
- Date
- 2007
- Collection
- Electronic Theses & Dissertations
- Title
- Some contributions to semi-supervised learning
- Creator
- Mallapragada, Paven Kumar
- Date
- 2010
- Collection
- Electronic Theses & Dissertations
- Title
- Algorithms for deep packet inspection
- Creator
- Patel, Jignesh
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
The core operation in network intrusion detection and prevention systems is Deep Packet Inspection (DPI), in which each security threat is represented as a signature and the payload of each data packet is matched against the set of current security threat signatures. DPI is also used for other networking applications such as advanced QoS mechanisms and protocol identification. In the past, attack signatures were specified as strings, and a great deal of research has been done on string matching for network applications. Today most DPI systems use regular expressions (REs) to represent signatures. RE matching is more difficult than string matching, and current string matching solutions do not work well for REs. RE matching for networking applications is difficult for several reasons. First, the DPI application is usually implemented in network devices, which have limited computing resources. Second, as new threats are discovered, the size of the signature set grows over time. Last, the matching needs to be done at network speeds, the growth of which outpaces improvements in computing speed, so there is a need for novel solutions that can deliver higher throughput. RE matching for DPI is therefore a very important and active research area. In our research, we investigate the existing methods proposed for RE matching, identify their limitations, and propose new methods to overcome them. RE matching remains a fundamentally challenging problem due to the difficulty of compactly encoding DFAs. While the DFA for any one RE is typically small, the DFA that corresponds to the entire set of REs is usually too large to be constructed or deployed. To address this issue, many alternative automata implementations that compress the size of the final automaton have been proposed. However, previously proposed automata construction algorithms employ a “Union then Minimize” framework in which the automata for each RE are first joined before minimization occurs. This leads to expensive minimization on a large automaton and a large intermediate memory footprint. We propose a “Minimize then Union” framework for constructing compact alternative automata, which minimizes the smaller automata first before combining them. This approach requires much less time and memory, allowing us to handle a much larger RE set. Prior hardware-based RE matching algorithms typically use FPGAs. The drawback of FPGAs is that resynthesizing and updating FPGA circuitry to handle RE updates is slow and difficult. We propose the first hardware-based RE matching approach that uses Ternary Content Addressable Memory (TCAM). TCAMs are already widely used in modern networking devices for tasks such as packet classification, so our solutions can be easily deployed. Our methods support easy RE updates, and we show that we can achieve very high throughput. The main reason combined DFAs for multiple REs grow exponentially in size is the replication of states. We developed a new overlay automata model that exploits this replication to compress the size of the DFA. The idea is to group the replicated DFA structures together instead of repeating them multiple times. The result is a final automaton whose size is close to that of an NFA (which is linear in the size of the RE set), while simultaneously achieving the fast deterministic matching speed of a DFA.
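The memory argument behind “Minimize then Union” can be seen in the standard product construction for DFA union, where the combined state space is bounded by the product of the input sizes. Below is a minimal Python sketch of that construction; the `Dfa` class and example automata are illustrative stand-ins, not the thesis's algorithms.

```python
class Dfa:
    """A complete DFA: trans maps (state, symbol) -> state."""
    def __init__(self, trans, start, accept):
        self.trans = trans
        self.start = start
        self.accept = set(accept)
        self.alphabet = {sym for (_, sym) in trans}

def union(d1, d2):
    """Product construction: accepts L(d1) | L(d2). The product can have up
    to |Q1| * |Q2| states, which is why joining many signature DFAs before
    minimizing ("Union then Minimize") inflates intermediate memory."""
    start = (d1.start, d2.start)
    trans, accept = {}, set()
    todo, seen = [start], {start}
    while todo:
        p, q = todo.pop()
        if p in d1.accept or q in d2.accept:
            accept.add((p, q))
        for sym in d1.alphabet:  # assume both DFAs share a complete alphabet
            nxt = (d1.trans[(p, sym)], d2.trans[(q, sym)])
            trans[((p, q), sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return Dfa(trans, start, accept)

# a1 accepts strings over {a, b} ending in 'a'; a2 accepts strings containing 'b'
a1 = Dfa({(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 0}, 0, {1})
a2 = Dfa({(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 1}, 0, {1})
u = union(a1, a2)
print(len({s for (s, _) in u.trans}))  # reachable product states (<= 2 * 2)
```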
- Title
- Applying evolutionary computation techniques to address environmental uncertainty in dynamically adaptive systems
- Creator
- Ramirez, Andres J.
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
-
A dynamically adaptive system (DAS) observes itself and its execution environment at run time to detect conditions that warrant adaptation. If an adaptation is necessary, then a DAS changes its structure and/or behavior to continuously satisfy its requirements, even as its environment changes. It is challenging, however, to systematically and rigorously develop a DAS due to environmental uncertainty. In particular, it is often infeasible for a human to identify all possible combinations of system and environmental conditions that a DAS might encounter throughout its lifetime. Nevertheless, a DAS must continuously satisfy its requirements despite the threat that this uncertainty poses to its adaptation capabilities. This dissertation proposes a model-based framework that supports the specification, monitoring, and dynamic reconfiguration of a DAS to explicitly address uncertainty. The proposed framework uses goal-oriented requirements models and evolutionary computation techniques to derive and fine-tune utility functions for requirements monitoring in a DAS, identify combinations of system and environmental conditions that adversely affect the behavior of a DAS, and generate adaptations on-demand to transition the DAS to a target system configuration while preserving system consistency. We demonstrate the capabilities of our model-based framework by applying it to an industrial case study involving a remote data mirroring network that efficiently distributes data even as network links fail and messages are dropped, corrupted, and delayed.
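As a rough illustration of how evolutionary computation can fine-tune utility-function weights for requirements monitoring, here is a minimal genetic-algorithm sketch; the weight encoding, fitness function, and toy data are hypothetical stand-ins for the framework's goal-oriented models.

```python
import random

def fitness(weights, observations):
    """Hypothetical fitness: how well a weighted utility function separates
    runs that satisfied requirements (label 1) from runs that violated them."""
    correct = 0
    for features, label in observations:
        utility = sum(w * f for w, f in zip(weights, features))
        correct += 1 if (utility > 0.5) == (label == 1) else 0
    return correct / len(observations)

def evolve(observations, dim, pop_size=30, generations=50, sigma=0.1):
    pop = [[random.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, observations), reverse=True)
        parents = pop[: pop_size // 2]               # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(dim)              # one-point crossover
            child = [w + random.gauss(0, sigma)      # Gaussian mutation
                     for w in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, observations))

# toy data: two monitored features per run, label whether requirements held
data = [((0.9, 0.2), 1), ((0.1, 0.8), 0), ((0.7, 0.3), 1), ((0.2, 0.9), 0)]
best = evolve(data, dim=2)
print(fitness(best, data))
```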
- Title
- Finding optimized bounding boxes of polytopes in d-dimensional space and their properties in k-dimensional projections
- Creator
- Shahid, Salman (Of Michigan State University)
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Using minimal bounding boxes to encapsulate or approximate a set of points in d-dimensional space is a non-trivial problem that has applications in a variety of fields, including collision detection, object rendering, high dimensional databases, and statistical analysis, to name a few. While a significant amount of work has been done on the three dimensional variant of the problem (i.e. finding the minimum volume bounding box of a set of points in three dimensions), it is difficult to find a simple method to do the same for higher dimensions. Even in three dimensions, existing methods suffer either from high time complexity or from suboptimal results traded for faster execution. In this thesis we present a new approach to find the optimized minimum bounding boxes of a set of points defining convex polytopes in d-dimensional space. The solution also gives the optimal bounding box in three dimensions with a much simpler implementation, while significantly speeding up the execution time for a large number of vertices. The basis of the proposed approach is a series of unique properties of the k-dimensional projections that are leveraged into an algorithm. This algorithm works by constructing the convex hulls of a given set of points and optimizing the projections of those hulls in two dimensional space using the new concept of Simultaneous Local Optimal. We show that the proposed algorithm performs significantly better than the current state-of-the-art approach in terms of both time and accuracy. To illustrate the importance of the result for a real world application, the optimized bounding box algorithm is used to develop a method for carrying out range queries in high dimensional databases. This method uses data transformation techniques in conjunction with a set of heuristics to provide significant performance improvement.
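The projection-based approach rests on optimizing 2D projections of convex hulls. Below is a minimal sketch of the classical 2D building block (a minimum-area enclosing rectangle is flush with some hull edge); this is the textbook edge-alignment idea, not the thesis's Simultaneous Local Optimal algorithm.

```python
import numpy as np
from scipy.spatial import ConvexHull

def min_area_rect_2d(points):
    """Minimum-area enclosing rectangle of 2D points, using the classical
    fact that an optimal rectangle is flush with some convex-hull edge."""
    hull = points[ConvexHull(points).vertices]
    best_area, best = np.inf, None
    for i in range(len(hull)):
        edge = hull[(i + 1) % len(hull)] - hull[i]
        u = edge / np.linalg.norm(edge)
        R = np.array([[u[0], u[1]], [-u[1], u[0]]])  # rotate edge onto x-axis
        proj = hull @ R.T
        lo, hi = proj.min(axis=0), proj.max(axis=0)
        area = np.prod(hi - lo)
        if area < best_area:
            best_area, best = area, (R, lo, hi)
    return best_area, best

# project a toy 4D point cloud onto two coordinates, then box the projection
pts4d = np.random.rand(200, 4)
area, _ = min_area_rect_2d(pts4d[:, :2])
print(area)
```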
- Title
- Non-coding RNA identification in large-scale genomic data
- Creator
- Yuan, Cheng
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Noncoding RNAs (ncRNAs), which function directly as RNAs without being translated into proteins, perform diverse and important biological functions. ncRNAs function not only through their primary structures, but also through secondary structures, which are defined by interactions between Watson-Crick and wobble base pairs. Common types of ncRNA include microRNA, rRNA, snoRNA, and tRNA, and functions vary among the different types. Recent studies suggest the existence of a large number of ncRNA genes. Identification of novel and known ncRNAs is becoming increasingly important for understanding their functionalities and the underlying communities. Next-generation sequencing (NGS) technology sheds light on more comprehensive and sensitive ncRNA annotation: lowly transcribed ncRNAs, or ncRNAs from rare species with low abundance, may be identified via deep sequencing. However, there exist several challenges in ncRNA identification in large-scale genomic data. First, the massive volume of datasets can lead to very long computation times, making existing algorithms infeasible. Second, NGS has a relatively high error rate, which can further complicate the problem. Third, high sequence similarity among related ncRNAs can make them difficult to identify, resulting in incorrect output. Fourth, while secondary structures should be adopted for accurate ncRNA identification, they usually incur high computational complexity. In particular, some ncRNAs contain pseudoknot structures, which cannot be effectively modeled by the state-of-the-art approaches; as a result, ncRNAs containing pseudoknots are hard to annotate. In my PhD work, I aimed to tackle the above challenges in ncRNA identification. First, I designed a progressive search pipeline to identify ncRNAs containing pseudoknot structures. The algorithms are more efficient than the state-of-the-art approaches and can be used for large-scale data. Second, I designed an ncRNA classification tool for short reads in NGS data lacking quality reference genomes. The initial homology search phase significantly reduces the size of the original input, making the tool feasible for large-scale data. Last, I focused on identifying 16S ribosomal RNAs from NGS data. 16S ribosomal RNAs are a very important type of ncRNA that can be used for phylogenetic studies. A set of graph-based assembly algorithms were applied to form longer or full-length 16S rRNA contigs. I utilized paired-end information in NGS data so that lowly abundant 16S genes can also be identified. To reduce the complexity of the problem and make the tool practical for large-scale data, I designed a set of error correction and graph reduction techniques for graph simplification.
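As a toy illustration of the kind of error correction and graph reduction used to simplify assembly graphs, the sketch below builds k-mer coverage counts from reads and drops low-coverage k-mers as likely sequencing errors; real 16S assembly pipelines are far more involved.

```python
from collections import Counter

def kmers(read, k):
    """All length-k substrings of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def kmer_coverage(reads, k):
    """Coverage of each k-mer (an edge in a de Bruijn graph view)."""
    return Counter(km for r in reads for km in kmers(r, k))

def prune(cov, min_cov=2):
    """Graph reduction: drop low-coverage k-mers, which are likely
    sequencing errors given NGS's relatively high error rate."""
    return {km: c for km, c in cov.items() if c >= min_cov}

reads = ["ACGTACGT", "CGTACGTT", "ACGTACGA"]  # last read ends in a likely error
clean = prune(kmer_coverage(reads, k=4))
print(sorted(clean))
```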
- Title
- Multiple kernel and multi-label learning for image categorization
- Creator
- Bucak, Serhat Selçuk
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
"One crucial step towards the goal of converting large image collections to useful information sources is image categorization. The goal of image categorization is to find the relevant labels for a given image from a closed set of labels. Despite the huge interest and significant contributions by the research community, there remains much room for improvement in the image categorization task. In this dissertation, we develop efficient multiple kernel learning and multi-label learning algorithms with high prediction performance for image categorization... " -- Abstract.
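Multiple kernel learning combines base kernels as a weighted sum K = Σ_k β_k K_k. A minimal fixed-weight sketch with scikit-learn follows; the dissertation's algorithms learn the weights β jointly with the classifier, and the data here are synthetic.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

# toy "image" features, standing in for two modalities (e.g., color, texture)
rng = np.random.RandomState(0)
X = rng.rand(60, 16)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# combined kernel K = sum_k beta_k K_k; beta is fixed and uniform here,
# whereas MKL algorithms optimize beta jointly with the classifier
betas = [0.5, 0.5]
K = betas[0] * linear_kernel(X, X) + betas[1] * rbf_kernel(X, X, gamma=0.5)

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))  # training accuracy on the toy data
```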
- Title
- Gender-related effects of advanced placement computer science courses on self-efficacy, belongingness, and persistence
- Creator
- Good, Jonathon Andrew
- Date
- 2018
- Collection
- Electronic Theses & Dissertations
- Description
-
The underrepresentation of women in computer science has been a concern of educators for multiple decades. The low representation of women in computer science is a pattern that extends from K-12 schools through the university level and the profession. One of the purposes of the introduction of the Advanced Placement Computer Science Principles (APCS-P) course in 2016 was to help broaden participation in computer science at the high school level. The design of APCS-P allowed teachers to present computer science from a broad perspective, allowing students to pursue problems of personal significance, and allowing computing projects to take a variety of forms. The nationwide enrollment statistics for Advanced Placement Computer Science Principles in 2017 had a higher proportion of female students (30.7%) than Advanced Placement Computer Science A (23.6%) courses. However, it is unknown to what degree enrollment in these courses was related to students’ plans to enroll in future computer science courses. This correlational study examined how students’ enrollment in Advanced Placement Computer Science courses, along with student gender, predicted students’ sense of computing self-efficacy, belongingness, and expected persistence in computer science. A nationwide sample of 263 students from 10 APCS-P and 10 APCS-A courses participated in the study. Students completed pre and post surveys at the beginning and end of their Fall 2017 semester regarding their computing self-efficacy, belongingness, and plans to continue in computer science studies. Using hierarchical linear modeling analysis, due to the nested nature of the data within class sections, the researcher found that the APCS course type was not predictive of self-efficacy, belongingness, or expectations to persist in computer science. The results suggested that female students’ self-efficacy declined over the course of the study. However, gender was not predictive of belongingness or expectations to persist in computer science. Students were found to have entered both courses with a high sense of self-efficacy, belongingness, and expectation to persist in computer science. The results from this study suggest that students enrolled in both Advanced Placement Computer Science courses are already likely to pursue computer science. I also found that the type of APCS course in which students enroll does not relate to students’ interest in computer science. This suggests that educators should look beyond AP courses as a method of exposing students to computer science, possibly through efforts such as computational thinking and cross-curricular uses of computer science concepts and practices. Educators and administrators should also continue to examine whether there are structural biases in how students are directed to computer science courses. As for the drop in self-efficacy related to gender, this is in alignment with previous research suggesting that educators should carefully scaffold students’ initial experiences in the course so as not to negatively influence their self-efficacy. Further research should examine how specific pedagogical practices could influence students’ persistence, as the designation and curriculum of APCS-A or APCS-P alone may not capture the myriad ways in which teachers may be addressing gender inequity in their classrooms. Research can also examine how student interest in computer science is affected at an earlier age, as the APCS courses may be reaching students after they have already formed their opinions about computer science as a field.
- Title
- Energy Conservation in Heterogeneous Smartphone Ad Hoc Networks
- Creator
- Mariani, James
- Date
- 2018
- Collection
- Electronic Theses & Dissertations
- Description
-
In recent years mobile computing has been rapidly expanding, to the point that there are now more devices than there are people. While it was once common for every household to have one PC, it is now common for every person to have a mobile device. With the increased use of smartphone devices, there has also been an increase in the need for mobile ad hoc networks, in which phones connect directly to each other without the need for an intermediate router. Most modern smartphones are equipped with both Bluetooth and Wifi Direct; Wifi Direct has a better transmission range and rate, while Bluetooth is more energy efficient. However, only one or the other is used in a smartphone ad hoc network. We propose HSNet, a framework for Heterogeneous Smartphone Ad Hoc Networks that enables automatic switching between Wifi Direct and Bluetooth, emphasizing minimal energy consumption while still maintaining an efficient network. We develop an application to evaluate the HSNet framework, which shows significant energy savings when utilizing our switching algorithm to send messages by a less energy intensive technology in situations where energy conservation is desired. We discuss additional features of HSNet, such as load balancing to help increase the lifetime of the network by more evenly distributing slave nodes among connected master nodes. Finally, we show that the throughput of our system is not affected by technology switching for most scenarios. Future work on this project includes exploring energy efficient routing as well as simulation/scale testing for larger and more diverse smartphone ad hoc networks.
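A switching policy of the kind HSNet automates might, in its simplest form, compare per-message energy cost and deadline constraints across the two radios. The sketch below is a hypothetical policy with illustrative power and rate figures, not HSNet's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    rate_mbps: float   # transmission rate (illustrative figure)
    power_mw: float    # radio power draw while transmitting (illustrative)

BLUETOOTH = Link("Bluetooth", rate_mbps=2.0, power_mw=30.0)
WIFI_DIRECT = Link("WifiDirect", rate_mbps=100.0, power_mw=2000.0)

def pick_link(message_bytes, battery_frac, deadline_s):
    """Hypothetical policy: prefer the radio that spends less energy on this
    message, but fall back to Wifi Direct when the deadline requires it."""
    def cost(link):
        seconds = message_bytes * 8 / (link.rate_mbps * 1e6)
        return seconds * link.power_mw, seconds  # (energy in mW*s, time)
    bt_energy, bt_time = cost(BLUETOOTH)
    wd_energy, _ = cost(WIFI_DIRECT)
    if bt_time > deadline_s:          # Bluetooth too slow for this message
        return WIFI_DIRECT
    if battery_frac < 0.2:            # conserve energy when battery is low
        return BLUETOOTH if bt_energy <= wd_energy else WIFI_DIRECT
    return WIFI_DIRECT if wd_energy < bt_energy else BLUETOOTH

print(pick_link(50_000, battery_frac=0.15, deadline_s=1.0).name)
```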
- Title
- Network analysis with negative links
- Creator
- Derr, Tyler Scott
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
As we continue rapidly into the information age, the rate at which data is produced has created an unprecedented demand for novel methods to effectively extract insightful patterns. We can then seek to understand the past, make predictions about the future, and ultimately take actionable steps towards improving our society. Because much of today's big data can be represented as graphs, emphasis is being placed on harnessing the natural structure of data through network analysis. Traditionally, network analysis has focused on networks having only positive links, or unsigned networks. However, in many real-world systems, relations between nodes in a graph can be both positive and negative; these are signed networks. For example, in online social media, users not only have positive links such as friends, followers, and those they trust, but can also establish negative links to those they distrust or consider foes, or block and unfriend users. Thus, although signed networks are ubiquitous due to their ability to represent negative links in addition to positive links, they have been significantly underexplored. In addition, the rise in popularity of today's social media and increased polarization online have led to both increased attention and increased demand for advanced methods to perform the typical network analysis tasks while also taking negative links into consideration. More specifically, there is a need for methods that can measure, model, mine, and apply signed networks, harnessing both positive and negative relations. This raises novel challenges, as the properties and principles of negative links are not necessarily the same as those of positive links, and the social theories that have been used in unsigned networks might not apply with the inclusion of negative links. The chief objective of this dissertation is first to analyze the distinct properties negative links have as compared to positive links, and then to improve network analysis with negative links by researching how to harness social theories in a holistic view of networks containing both positive and negative links. We discover that simply extending unsigned network analysis is typically not sufficient, and that although the existence of negative links introduces numerous challenges, it also provides unprecedented opportunities for advancing the frontier of the network analysis domain. In particular, we develop advanced methods in signed networks for measuring node relevance and centrality (i.e., signed network measuring); present the first generative signed network model and extend and analyze balance theory to signed bipartite networks (i.e., signed network modeling); construct the first signed graph convolutional network, which learns node representations that achieve state-of-the-art prediction performance, and introduce the novel idea of transformation-based network embedding (i.e., signed network mining); and apply signed networks by creating a framework that can infer both link and interaction polarity levels in online social media, and by constructing an advanced comprehensive congressional vote prediction framework built around harnessing signed networks.
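Balance theory, which this dissertation extends to signed bipartite networks, classifies a triangle in a signed network as balanced when the product of its edge signs is positive. A minimal networkx sketch on a toy signed graph (illustrative only):

```python
import networkx as nx
from itertools import combinations

# toy signed network: +1 = trust/friend, -1 = distrust/foe
G = nx.Graph()
G.add_edge("a", "b", sign=+1)
G.add_edge("b", "c", sign=+1)
G.add_edge("a", "c", sign=-1)   # two friends sharing a foe link: unbalanced
G.add_edge("c", "d", sign=-1)
G.add_edge("b", "d", sign=-1)

def balanced_fraction(G):
    """Balance theory: a triangle is balanced iff the product of its edge
    signs is positive (i.e., it has zero or two negative edges)."""
    balanced = total = 0
    for u, v, w in combinations(G.nodes, 3):
        if G.has_edge(u, v) and G.has_edge(v, w) and G.has_edge(u, w):
            total += 1
            prod = G[u][v]["sign"] * G[v][w]["sign"] * G[u][w]["sign"]
            balanced += prod > 0
    return balanced / total if total else 0.0

print(balanced_fraction(G))  # 0.5 for this toy graph
```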
- Title
- LIDAR AND CAMERA CALIBRATION USING A MOUNTED SPHERE
- Creator
- Li, Jiajia
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
Extrinsic calibration between lidar and camera sensors is needed for multi-modal sensor data fusion. However, obtaining precise extrinsic calibration can be tedious, computationally expensive, or involve elaborate apparatus. This thesis proposes a simple, fast, and robust method for performing extrinsic calibration between a camera and lidar. The only required calibration target is a hand-held colored sphere mounted on a whiteboard. Convolutional neural networks are developed to automatically localize the sphere relative to the camera and the lidar. Then, using the localization covariance models, the relative pose between the camera and lidar is derived. To evaluate the accuracy of our method, we record image and lidar data of a sphere at a set of known grid positions by using two rails mounted on a wall. Accurate calibration results are demonstrated by projecting the grid centers into the camera image plane and finding the error between these points and the hand-labeled sphere centers.
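The geometric core of deriving a camera-lidar pose from matched sphere centers is a least-squares rigid alignment, commonly solved with the Kabsch (SVD) algorithm. The sketch below shows that standard step on synthetic correspondences; the thesis's covariance-weighted formulation is more elaborate.

```python
import numpy as np

def rigid_transform(A, B):
    """Least-squares R, t with R @ A[i] + t ~ B[i] (Kabsch algorithm).
    A, B: (N, 3) matched sphere centers in lidar and camera frames."""
    cA, cB = A.mean(axis=0), B.mean(axis=0)
    H = (A - cA).T @ (B - cB)               # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cB - R @ cA
    return R, t

# synthetic check: recover a known pose from noiseless correspondences
rng = np.random.RandomState(1)
A = rng.rand(10, 3)
R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
t_true = np.array([0.5, -0.2, 1.0])
B = A @ R_true.T + t_true
R, t = rigid_transform(A, B)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```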
- Title
- I AM DOING MORE THAN CODING : A QUALITATIVE STUDY OF BLACK WOMEN HBCU UNDERGRADUATES’ PERSISTENCE IN COMPUTING
- Creator
- Benton, Amber V.
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
The purpose of my study is to explore why and how Black women undergraduates at historically Black colleges and universities (HBCUs) persist in computing. By centering the experiences of Black women undergraduates and their stories, this dissertation expands traditional, dominant ways of understanding student persistence in higher education. Critical Race Feminism (CRF) was applied as a conceptual framework to the stories of 11 Black women undergraduates in computing, drawing on the small stories qualitative approach to examine the day-to-day experiences of Black women undergraduates at HBCUs as they persisted in their computing degree programs. The findings suggest that: (a) gender underrepresentation in computing affects Black women’s experiences, (b) computing culture at HBCUs directly affects Black women in computing, (c) Black women need access to resources and opportunities to persist in computing, (d) computing-related internships are beneficial professional opportunities but are also sites of gendered racism for Black women, (e) connectedness between Black people is innate but also needs to be fostered, (f) Black women want to engage in computing that contributes to social impact and community uplift, and (g) science identity is not a primary identity for Black women in computing. This paper also argues that disciplinary focused efforts contribute to the persistence of Black women in computing.
- Title
- MICROBLOG GUIDED CRYPTOCURRENCY TRADING AND FRAMING ANALYSIS
- Creator
- Pawlicka Maule, Anna Paula
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
With 56 million people actively trading and investing in cryptocurrency online and globally, there is an increasing need for an automatic social media analysis tool to help understand trading discourse and behavior. Previous works have shown the usefulness of modeling microblog discourse for the prediction of trading stocks and their price fluctuations, as well as content framing. In this work, I present a natural language modeling pipeline that leverages language and social network behaviors for the prediction of cryptocurrency day trading actions and their associated framing patterns. Specifically, I present two modeling approaches. The first determines if the tweets of a 24-hour period can be used to guide day trading behavior, specifically if a cryptocurrency investor should buy, sell, or hold their cryptocurrencies in order to make a trading profit. The second is an unsupervised deep clustering approach to automatically detect framing patterns. My contributions include the modeling pipeline for this novel task, a new dataset of cryptocurrency-related tweets from influential accounts, and a transaction volume dataset. The experiments executed show that this weakly-supervised trading pipeline achieves an 88.78% accuracy for day trading behavior predictions and reveals framing fluctuations prior to and during the COVID-19 pandemic that could be used to guide investment actions.
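A minimal stand-in for the first modeling stage (mapping a 24-hour tweet window to a buy/sell/hold action) can be built with a bag-of-words classifier; the data below are invented, and the dissertation's weakly-supervised pipeline is substantially richer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy stand-ins for 24-hour tweet windows and next-day trading actions
windows = [
    "bullish breakout, whales accumulating, buy the dip",
    "exchange hack rumors, panic selling everywhere",
    "sideways chop, low volume, nothing to do",
    "halving hype, institutional inflows, new all time high soon",
    "regulatory crackdown incoming, dump it",
    "quiet weekend, ranging market",
]
actions = ["buy", "sell", "hold", "buy", "sell", "hold"]

# TF-IDF features over unigrams and bigrams, then a linear classifier
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(windows, actions)
print(clf.predict(["whales accumulating before the halving"]))
```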
- Title
- Discrete de Rham-Hodge Theory
- Creator
- Zhao, Rundong
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
We present a systematic treatment of 3D shape analysis based on the well-established de Rham-Hodge theory in differential geometry and topology. The computational tools we developed are widely applicable to research areas such as computer graphics, computer vision, and computational biology. We extensively tested them in the context of 3D structure analysis of biological macromolecules to demonstrate the efficacy and efficiency of our method in potential applications. Our contributions are summarized in the following aspects. First, we present a compendium of discrete Hodge decompositions of vector fields, which provides the primary building block of the de Rham-Hodge theory for computations performed on the commonly used tetrahedral meshes embedded in 3D Euclidean space. Second, we present a real-world application of the above computational tools to 3D shape analysis of biological macromolecules. Finally, we extend the above method to an evolutionary de Rham-Hodge method to provide a unified paradigm for the multiscale geometric and topological analysis of evolving manifolds constructed from a filtration, which induces a family of evolutionary de Rham complexes. Our work on the decomposition of vector fields, spectral shape analysis of static shapes, and evolving shapes has already shown its effectiveness in biomolecular applications and will lead to a rich set of features for machine learning-based shape analysis currently under development.
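In its simplest discrete form, a Hodge-style decomposition splits an edge flow on a graph into a gradient component plus a divergence-free remainder via the incidence matrix. The numpy sketch below shows that building block; the dissertation works with tetrahedral meshes and the full multi-term decomposition.

```python
import numpy as np

# oriented edges of a small graph (a square with one diagonal)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n_vertices, n_edges = 4, len(edges)

# discrete exterior derivative d0: rows = edges, columns = vertices
d0 = np.zeros((n_edges, n_vertices))
for k, (i, j) in enumerate(edges):
    d0[k, i], d0[k, j] = -1.0, 1.0

f = np.array([1.0, 2.0, -1.0, 0.5, 0.3])   # an arbitrary edge flow

# gradient component: least-squares potential phi with d0 @ phi ~ f
phi, *_ = np.linalg.lstsq(d0, f, rcond=None)
grad = d0 @ phi
rest = f - grad                              # divergence-free remainder

print(np.allclose(d0.T @ rest, 0))           # remainder has zero divergence
```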
- Title
- Dissertation : novel parallel algorithms and performance optimization techniques for the multi-level fast multipole algorithm
- Creator
- Lingg, Michael
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
Since Sir Isaac Newton determined that characterizing the orbits of celestial objects requires considering the gravitational interactions among all bodies in the system, the N-body problem has been a very important tool in physics simulations. Expanding on the early use of the classical N-body problem for gravitational simulations, the method has proven invaluable in fluid dynamics, molecular simulations, and data analytics. The extension of the classical N-body problem to solve the Helmholtz equation for groups of particles with oscillatory interactions has enabled simulations that assist in antenna design, radar cross section prediction, reduction of engine noise, and medical devices that utilize sound waves, to name a sample of possible applications. While N-body simulations are extremely valuable, the computational cost of directly evaluating interactions among all pairs grows quadratically with the number of particles, rendering large scale simulations infeasible even on the most powerful supercomputers. The Fast Multipole Method (FMM), and the broader class of tree algorithms it belongs to, has significantly reduced the computational complexity of N-body simulations while providing controllable accuracy guarantees. While FMM provided a significant boost, the N-body problems tackled by scientists and engineers continue to grow larger in size, necessitating the development of efficient parallel algorithms and implementations to run on supercomputers. The Laplace variant of FMM, which is used to treat the classical N-body problem, has been extensively researched and optimized to the extent that Laplace FMM codes can scale to tens of thousands of processors for simulations involving over a trillion particles. In contrast, the Multi-Level Fast Multipole Algorithm (MLFMA), which is aimed at the Helmholtz kernel variant of FMM, lags significantly behind in efficiency and scaling. The added complexity of an oscillatory potential results in much more intricate data dependency patterns and load balancing requirements among parallel processes, making algorithms and optimizations developed for Laplace FMM mostly ineffective for MLFMA. In this thesis, we propose novel parallel algorithms and performance optimization techniques to improve the performance of MLFMA on modern computer architectures. Proposed algorithms and performance optimizations range from efficient leveraging of the memory hierarchy on multi-core processors, to an investigation of the benefits of the emerging concept of task parallelism for MLFMA, to significant reductions of communication overheads and load imbalances in large scale computations. Parallel algorithms for distributed memory parallel MLFMA are also accompanied by detailed complexity analyses and performance models. We describe efficient implementations of all proposed algorithms and optimization techniques, and analyze their impact in detail. In particular, we show that our work yields significant speedups and much improved scalability compared to existing methods for MLFMA, both in large geometries designed to test the range of the problem space and in real world problems.
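The quadratic cost that FMM-type methods attack is easy to see in a direct evaluation of pairwise oscillatory interactions with the free-space Helmholtz kernel e^{ikr}/r. A numpy sketch with illustrative parameters:

```python
import numpy as np

def direct_helmholtz(points, charges, k=2 * np.pi):
    """Direct O(N^2) evaluation of phi_i = sum_{j != i} q_j e^{ikr_ij}/r_ij.
    This quadratic cost is what tree methods like FMM/MLFMA reduce."""
    diff = points[:, None, :] - points[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, 1.0)                 # placeholder to avoid 0/0
    G = np.exp(1j * k * r) / r               # oscillatory Helmholtz kernel
    np.fill_diagonal(G, 0.0)                 # exclude self-interaction
    return (charges[None, :] * G).sum(axis=1)

rng = np.random.RandomState(0)
pts = rng.rand(500, 3)                        # 500 points -> 250k pair terms
phi = direct_helmholtz(pts, rng.rand(500))
print(phi.shape, phi[0])
```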
- Title
- Online Learning Algorithms for Mining Trajectory data and their Applications
- Creator
- Wang, Ding
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Trajectories are spatio-temporal data that represent traces of moving objects, such as humans, migrating animals, vehicles, and tropical cyclones. In addition to geo-location information, trajectory data often contain other (non-spatial) features describing the states of the moving objects. The time-varying geo-location and state information collectively characterize a trajectory dataset, which can be harnessed to understand the dynamics of the moving objects. This thesis focuses on the development of efficient and accurate machine learning algorithms for forecasting the future trajectory path and state of a moving object. Although many methods have been developed in recent years, numerous challenges have not been sufficiently addressed by existing methods, which hampers their effectiveness when applied to critical applications such as hurricane prediction. These challenges include difficulties in handling concept drift, error propagation in long-term forecasts, missing values, and nonlinearities in the data. In this thesis, I present a family of online learning algorithms to address these challenges. Online learning is an effective approach as it can efficiently fit new observations while adapting to concept drift present in the data. First, I propose an online learning framework called OMuLeT for long-term forecasting of the trajectory paths of moving objects. OMuLeT employs an online learning with restart strategy to incrementally update the weights of its predictive model as new observation data become available. It can also handle missing values in the data using a novel weight renormalization strategy. Second, I introduce the OOR framework to predict the future state of the moving object. Since the state can be represented by ordinal values, OOR employs a novel ordinal loss function to train its model. In addition, the framework was extended to OOQR to accommodate a quantile loss function to improve its prediction accuracy for larger values on the ordinal scale. Furthermore, I also develop the OOR-ε and OOQR-ε frameworks to generate real-valued state predictions using the ε-insensitive loss function. Third, I develop an online learning framework called JOHAN that simultaneously predicts the location and state of the moving object. JOHAN generates its predictions by leveraging the relationship between the state and location information, and utilizes a quantile loss function to bias the algorithm towards predicting large categorical values of the state of the moving object more accurately, say, for a high intensity hurricane. Finally, I present a deep learning framework to capture nonlinear relationships in trajectory data. The proposed DTP framework employs a TDM approach for imputing missing values, coupled with an LSTM architecture for dynamic path prediction. In addition, the framework was extended to ODTP, which applies an online learning setting to address concept drift present in the trajectory data. As proof of concept, the proposed algorithms were applied to the hurricane prediction task. Both OMuLeT and ODTP were used to predict the future trajectory path of a hurricane up to 48 hours of lead time. Experimental results showed that OMuLeT and ODTP outperformed various baseline methods, including the official forecasts produced by the U.S. National Hurricane Center. OOR was applied to predict the intensity of a hurricane up to 48 hours in advance. Experimental results showed that OOR outperformed various state-of-the-art online learning methods and can generate predictions close to the NHC official forecasts. Since hurricane intensity prediction is a notoriously hard problem, JOHAN was applied to improve prediction accuracy by leveraging the trajectory information, particularly for high intensity hurricanes that are near landfall.
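The quantile-loss idea behind OOQR and JOHAN can be sketched as online (sub)gradient descent on the pinball loss, where τ > 0.5 biases the model against under-predicting large values. The following is a generic illustration, not the dissertation's algorithms.

```python
import numpy as np

def pinball_grad(y, y_hat, tau):
    """(Sub)gradient of the quantile (pinball) loss w.r.t. the prediction:
    tau penalizes under-prediction, 1 - tau penalizes over-prediction."""
    return -tau if y > y_hat else (1 - tau)

def online_quantile_regression(stream, dim, tau=0.9, lr=0.05):
    """SGD on the pinball loss; tau > 0.5 biases the model toward not
    under-predicting large targets (e.g., high hurricane intensities)."""
    w = np.zeros(dim)
    for x, y in stream:                 # observations arrive one at a time
        g = pinball_grad(y, w @ x, tau)
        w -= lr * g * x                 # chain rule: d(loss)/dw = g * x
        yield w

rng = np.random.RandomState(0)
data = [(rng.rand(3), rng.rand()) for _ in range(200)]
for w in online_quantile_regression(data, dim=3):
    pass
print(w)  # final weights after one pass over the stream
```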
- Title
- 5D Nondestructive Evaluation : Object Reconstruction to Toolpath Generation
- Creator
- Hamilton, Ciaron Nathan
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
The focus of this thesis is to provide virtualization methods for a Cyber-Physical System (CPS) setup that interfaces physical Nondestructive Evaluation (NDE) scanning environments into virtual spaces through virtual-physical interfacing and path planning. In these environments, a probe used for NDE, mounted as the end-effector of a robot arm, actuates and acquires data along the surface of a Material Under Test (MUT) within virtual and physical spaces. Such configurations are practical for applications that require damage analysis of geometrically complex parts, with uses ranging from the automobile to the aerospace to the military industries. The pipeline of the designed 5D actuation system starts by virtually reconstructing the physical MUT and its surrounding environment, generating a toolpath along the surface of the reconstructed MUT, conducting a physical scan along the toolpath that synchronizes the robot's end effector position with retrieved NDE data, and post processing the obtained data. Most of this thesis focuses on virtual topics, including reconstruction from stereo camera images and toolpath planning. Virtual meshes of the MUT and surrounding environment are generated from stereo camera images, with methods provided for camera positioning, registration, filtering, and reconstruction. Path planning around the MUT uses a customized path planner, in which a 2D grid of rays is generated and each ray intersection with the surface of the MUT's mesh provides the translation and rotation of a waypoint for actuation. Experimental setups include both predefined meshes and meshes reconstructed from several real carbon-fiber automobile components using an Intel RealSense D425i stereo camera, showing both the reconstruction and path planning results. A theoretical review is also included to discuss analytical prospects of the system. The final system is designed to be automated, minimizing the human interaction needed to conduct scans, with later reports planned to discuss the scanning and post processing prospects of the system.
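The ray-grid toolpath idea can be sketched with a standard ray-triangle intersection (Möller-Trumbore): each ray in a 2D grid that hits the mesh yields a waypoint. The sketch below uses a single triangle as a stand-in for a reconstructed MUT mesh and computes positions only; a full planner would also derive each waypoint's rotation from the surface normal (the "5D" part).

```python
import numpy as np

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray/triangle intersection; hit distance or None."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1 @ p
    if abs(det) < eps:
        return None                      # ray parallel to the triangle
    t_vec = origin - v0
    u = (t_vec @ p) / det
    q = np.cross(t_vec, e1)
    v = (direction @ q) / det
    t = (e2 @ q) / det
    if u >= 0 and v >= 0 and u + v <= 1 and t > eps:
        return t
    return None

def grid_waypoints(tris, z0=2.0, n=5):
    """Cast a 2D grid of downward rays; each hit becomes a scan waypoint."""
    d = np.array([0.0, 0.0, -1.0])
    waypoints = []
    for x in np.linspace(0, 1, n):
        for y in np.linspace(0, 1, n):
            o = np.array([x, y, z0])
            hits = []
            for tri in tris:
                t = ray_triangle(o, d, *tri)
                if t is not None:
                    hits.append(t)
            if hits:
                waypoints.append(o + min(hits) * d)  # closest surface point
    return waypoints

# a single triangle as a stand-in for a reconstructed MUT mesh
tri = [np.array(p, float) for p in [(0, 0, 0), (1, 0, 0), (0, 1, 0)]]
print(len(grid_waypoints([tri])))
```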
- Title
- COMBINING FACE AND IRIS FOR PRIVACY PRESERVATION
- Creator
- Ledala, Achsah Junia
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
With the extensive use of biometrics for authenticating users, the need to ensure privacy of biometric data is greater than ever before. Biometric authentication systems are vulnerable to attacks and the loss of biometric data will lead to loss of privacy of an individual. Multibiometrics refers to the use of multiple biometric modalities simultaneously in order to perform matching. In this work, we introduce a multibiometric fusion technique which can be used to ensure that the original raw biometric data are unlikely to be compromised and, at the same time, recognition can be performed. The face and the iris biometric modalities are fused at the feature-level to produce discriminative embeddings that can be used for recognition. The original face or the iris cannot be retrieved from the combined representation, thus preserving the privacy of the individual. We present the results of this approach, provide analysis, discuss the challenges, and list possible future directions.
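Feature-level fusion of the kind described can be sketched as concatenating the two modality embeddings and applying a dimensionality-reducing map, which is many-to-one and hence not uniquely invertible. The random projection and dimensions below are illustrative stand-ins for the thesis's learned fusion.

```python
import numpy as np

rng = np.random.RandomState(0)

def fuse(face_emb, iris_emb, proj):
    """Feature-level fusion: concatenate the two modality embeddings, then
    apply a dimensionality-reducing projection. Because proj maps 1024 -> 256,
    it is many-to-one, so the raw embeddings cannot be uniquely recovered."""
    z = np.concatenate([face_emb, iris_emb])
    out = proj @ z
    return out / np.linalg.norm(out)

proj = rng.randn(256, 1024)   # fixed random projection, standing in for a
                              # learned fusion network
face, iris = rng.randn(512), rng.randn(512)
template = fuse(face, iris, proj)

# matching: cosine similarity between fused templates of the "same person"
other = fuse(face + 0.05 * rng.randn(512), iris + 0.05 * rng.randn(512), proj)
print(float(template @ other))  # close to 1.0 for matching samples
```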
- Title
- OPTIMIZATION OF LARGE SCALE ITERATIVE EIGENSOLVERS
- Creator
- Afibuzzaman, Md
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Sparse matrix computations, in the form of solvers for systems of linear equations, eigenvalue problems, or matrix factorizations, constitute the main kernel in problems from fields as diverse as computational fluid dynamics, quantum many-body problems, machine learning, and graph analytics. Iterative eigensolvers have been preferred over the regular approach because the regular approach is not feasible for industrial-sized matrices. Although dense linear algebra libraries like BLAS, LAPACK, and SCALAPACK are well established, and vendor optimized implementations like mkl from Intel or Cray Libsci exist, the same cannot be said for sparse linear algebra, which lags far behind. The main reason behind the slow progress in the standardization of sparse linear algebra and library development is that the forms and properties of sparse matrices differ by application area. The situation is worsened on the deep memory hierarchies of modern architectures due to low arithmetic intensities and memory bound computations: current technology is driven by deep memory architectures in which increased capacity comes at the expense of increased latency and decreased bandwidth as we move further from the processors. Minimization of data movement and fast access to the matrix are therefore critical. The key to achieving high performance in sparse matrix computations on a deep memory hierarchy is to minimize data movement across the layers of the memory and to overlap data movement with computation. My thesis work contributes towards addressing the algorithmic challenges and developing a computational infrastructure to achieve high performance in scientific applications for both shared memory and distributed memory architectures. For this purpose, I started by optimizing a blocked eigensolver and optimized specific computational kernels using a new storage format. Using this optimization as a building block, we introduce a shared memory task parallel framework focused on optimizing entire solvers rather than specific kernels. Before extending this shared memory implementation to a distributed memory architecture, I simulated the communication pattern and overheads of a large scale distributed memory application, and then introduced communication tasks into the framework to overlap communication and computation. Additionally, I investigated custom scheduling of the tasks using a graph partitioner. To get acquainted with high performance computing and parallel libraries, I began my PhD journey by optimizing a DFT code named Sky3D using dense matrix libraries; though there may not be any single solution to this problem, I sought an optimized one. While the large distributed memory application MFDn is the driver project of this thesis, the framework we developed is not confined to MFDn; it can be used for other scientific applications as well. The output of this thesis is the task parallel HPC infrastructure that we envisioned for both shared and distributed memory architectures.
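The memory-bound behavior described above is visible in the core kernel of iterative eigensolvers: the sparse matrix-vector product. A plain CSR sketch in Python follows; the thesis's optimized kernels use a custom blocked storage format and task parallelism.

```python
import numpy as np

def spmv_csr(data, indices, indptr, x):
    """y = A @ x for a CSR matrix. Each row touches a contiguous slice of
    data/indices but gathers x irregularly, so performance is bound by
    memory bandwidth rather than arithmetic; this motivates blocked
    formats and overlapping data movement with computation."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = data[lo:hi] @ x[indices[lo:hi]]
    return y

# the 3x3 matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form
data = np.array([2.0, 1.0, 3.0, 4.0, 5.0])
indices = np.array([0, 2, 1, 0, 2])
indptr = np.array([0, 2, 3, 5])
print(spmv_csr(data, indices, indptr, np.array([1.0, 1.0, 1.0])))  # [3. 3. 9.]
```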
- Title
- Contributions to Fingerprint Recognition
- Creator
- Engelsma, Joshua James
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
From the early days of the mid-to-late nineteenth century, when scientific research first began to focus on fingerprints, to the present day fingerprint recognition systems we find deployed on our day-to-day devices, the science of fingerprint recognition has come a long way. In spite of this progress, there remain challenging problems to be solved. This thesis highlights a few of these problems and proposes solutions to address them. One area in which further research must be conducted on fingerprint recognition systems is that of robust, operational evaluations. In chapter two of this thesis, we show how the current practices of using calibration patterns to evaluate fingerprint readers are limited. We then propose a realistic fake finger called the Universal Target. The Universal Target is a realistic, 3D fake finger (or phantom) which can be imaged by all major types of fingerprint sensing technologies. We show the entire manufacturing (molding and casting) process for fabricating the Universal Targets. Then, we show a series of evaluations which demonstrate how the Universal Targets can be used to operationally evaluate current commercial fingerprint readers. Our Universal Target is a significant step forward in enabling more realistic, standardized evaluations of fingerprint readers. In our third chapter, we shift gears from improving the evaluation standards of fingerprint readers to instead focus on the security of fingerprint readers. In particular, we turn our attention towards detecting fake fingerprint (spoof) attacks. To do so, we open source a fingerprint reader (built from low-cost, ubiquitous components) called RaspiReader. RaspiReader is a high-resolution fingerprint reader customized with both direct-view imaging and FTIR imaging in order to better detect fingerprint spoofs. We show through a number of experiments that RaspiReader enables state-of-the-art fingerprint spoof detection accuracy. We also demonstrate that RaspiReader enables better generalization to what are known as "unseen attacks" (those attacks which were not seen during training of the spoof detector). Finally, we show that fingerprints captured by RaspiReader are completely compatible with images captured by legacy fingerprint readers for matching. In chapter four, we move on to propose a major improvement to the fingerprint feature extraction and matching sub-modules of fingerprint recognition systems. In particular, we propose a deep network, called DeepPrint, to extract a 200-byte fixed-length fingerprint representation. While prevailing fingerprint matchers primarily utilize minutiae points and expensive graph matching algorithms for comparison, two DeepPrint representations can be compared with only 192 multiplications and 191 additions. This is extremely useful for large scale search, where potentially billions of pairwise fingerprint comparisons must be made. The DeepPrint representation also enables practical encrypted matching using a fully homomorphic encryption scheme. This enables better protection of the fingerprint templates which are stored in the database. While discriminative fixed-length representations are available for both face and iris recognition, such a representation has eluded fingerprint recognition. This chapter aims to fill that void. Finally, we conclude our thesis by working to extend fingerprint recognition to all ages. While current fingerprint recognition systems are being used by billions of teenagers and adults around the world, the youngest among us remain disenfranchised. In particular, modern day fingerprint recognition systems do not work well on infants and young children. In this penultimate chapter, we aim to rectify this major shortcoming. To that end, we prototype a high-resolution (1900 ppi) infant fingerprint reader. Then, we track and fingerprint 315 infants (under the age of 3 months at enrollment) at the Dayalbagh Children's Hospital in Agra, India over the course of 1 year (4 different sessions). To match the infant fingerprints, we develop our own high-resolution infant fingerprint matcher. Our experimental results demonstrate significant promise for the extension of fingerprint recognition to all ages. This work has the potential for major global good, as all young infants and children could be given a verifiable digital identity for better vaccination tracking as a child and for government benefits and assistance as an adult. In summary, this thesis makes major contributions to the entire end-to-end fingerprint recognition system and extends its use case to all ages.
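The abstract's count of 192 multiplications and 191 additions corresponds to a single inner product between 192-dimensional fixed-length representations, which also turns gallery search into one matrix-vector product. A numpy sketch with random stand-in embeddings:

```python
import numpy as np

def match_score(a, b):
    """Similarity of two 192-dim fixed-length fingerprint representations:
    one inner product costs 192 multiplications and 191 additions, versus
    expensive graph matching over variable-size minutiae sets."""
    assert a.shape == b.shape == (192,)
    return float(a @ b)

rng = np.random.RandomState(0)
probe = rng.randn(192)
probe /= np.linalg.norm(probe)                # unit-normalize the embedding
gallery = rng.randn(10_000, 192)
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

# large-scale search: one matrix-vector product scores the whole gallery
scores = gallery @ probe
print(int(scores.argmax()), float(scores.max()))
```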