Search results
(1 - 20 of 46)
- Title
- Semi-supervised learning with side information : graph-based approaches
- Creator
- Liu, Yi
- Date
- 2007
- Collection
- Electronic Theses & Dissertations
- Title
- Some contributions to semi-supervised learning
- Creator
- Mallapragada, Paven Kumar
- Date
- 2010
- Collection
- Electronic Theses & Dissertations
- Title
- Evolution of distributed behavior
- Creator
- Knoester, David B.
- Date
- 2011
- Collection
- Electronic Theses & Dissertations
- Description
-
In this dissertation, we describe a study in the evolution of distributed behavior, where evolutionary algorithms are used to discover behaviors for distributed computing systems. We define distributed behavior as that in which groups of individuals must both cooperate in working towards a common goal and coordinate their activities in a harmonious fashion. As such, communication among individuals is necessarily a key component of distributed behavior, and we have identified three classes of distributed behavior that require communication: data-driven behaviors, where semantically meaningful data is transmitted between individuals; temporal behaviors, which are based on the relative timing of individuals' actions; and structural behaviors, which are responsible for maintaining the underlying communication network connecting individuals. Our results demonstrate that evolutionary algorithms can discover groups of individuals that exhibit each of these different classes of distributed behavior, and that these behaviors can be discovered both in isolation (e.g., evolving a purely data-driven algorithm) and in concert (e.g., evolving an algorithm that includes both data-driven and structural behaviors). As part of this research, we show that evolutionary algorithms can discover novel heuristics for distributed computing, and hint at a new class of distributed algorithm enabled by such studies. The majority of this research was conducted with the Avida platform for digital evolution, a system that has been proven to aid researchers in understanding the biological process of evolution by natural selection. For this reason, the results presented in this dissertation provide the foundation for future studies that examine how distributed behaviors evolved in nature. The close relationship between evolutionary biology and evolutionary algorithms thus aids our study of evolving algorithms for the next generation of distributed computing systems.
- Title
- Algorithms for deep packet inspection
- Creator
- Patel, Jignesh
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
The core operation in network intrusion detection and prevention systems is Deep Packet Inspection (DPI), in which each security threat is represented as a signature, and the payload of each data packet is matched against the set of current security threat signatures. DPI is also used for other networking applications, such as advanced QoS mechanisms and protocol identification. In the past, attack signatures were specified as strings, and a great deal of research has been done on string matching for network applications. Today, most DPI systems use regular expressions (REs) to represent signatures. RE matching is more difficult than string matching, and current string matching solutions do not work well for REs. RE matching for networking applications is difficult for several reasons. First, the DPI application is usually implemented in network devices, which have limited computing resources. Second, as new threats are discovered, the size of the signature set grows over time. Last, the matching needs to be done at network speeds, whose growth outpaces improvements in computing speed, so there is a need for novel solutions that can deliver higher throughput. RE matching for DPI is therefore a very important and active research area. In our research, we investigate the existing methods proposed for RE matching, identify their limitations, and propose new methods to overcome these limitations. RE matching remains a fundamentally challenging problem due to the difficulty of compactly encoding deterministic finite automata (DFAs). While the DFA for any one RE is typically small, the DFA that corresponds to the entire set of REs is usually too large to be constructed or deployed. To address this issue, many alternative automata implementations that compress the size of the final automaton have been proposed. However, previously proposed automata construction algorithms employ a “Union then Minimize” framework, where the automata for each RE are first joined before minimization occurs.
This leads to expensive minimization on a large automaton and a large intermediate memory footprint. We propose a “Minimize then Union” framework for constructing compact alternative automata, which minimizes smaller automata first before combining them. This approach required much less time and memory, allowing us to handle a much larger RE set. Prior hardware-based RE matching algorithms typically use FPGAs. The drawback of FPGAs is that resynthesizing and updating FPGA circuitry to handle RE updates is slow and difficult. We propose the first hardware-based RE matching approach that uses Ternary Content Addressable Memory (TCAM). TCAMs are already widely used in modern networking devices for tasks such as packet classification, so our solutions can be easily deployed. Our methods support easy RE updates, and we show that we can achieve very high throughput. The main reason combined DFAs for multiple REs grow exponentially in size is the replication of states. We developed a new overlay automata model that exploits this replication to compress the size of the DFA. The idea is to group together the replicated DFA structures instead of repeating them multiple times. As a result, the final automaton size is close to that of a nondeterministic finite automaton (NFA), which is linear in the size of the RE set, while simultaneously achieving the fast deterministic matching speed of a DFA.
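To make the ordering difference concrete, here is a minimal Python sketch of "Minimize then Union" on plain DFAs. It assumes a hypothetical dictionary-based DFA representation and two toy patterns over the alphabet {a, b} in place of real signatures; it does not implement the compressed alternative automata, the TCAM encoding, or the overlay automata described above. Each small DFA is first minimized with Moore-style partition refinement, and only then combined with a product construction. The names `minimize`, `union`, `accepts`, and `num_states` are illustrative.

```python
from collections import deque

# Hypothetical DFA representation: (start, accepting, delta), where delta maps
# (state, symbol) -> state and is total over ALPHABET. States must be sortable.
ALPHABET = "ab"


def accepts(dfa, word):
    """Run the DFA over word; True if it ends in an accepting state."""
    state, accepting, delta = dfa
    for ch in word:
        state = delta[(state, ch)]
    return state in accepting


def num_states(dfa):
    """Count the states that appear as transition sources."""
    return len({s for (s, _) in dfa[2]})


def minimize(start, accepting, delta, alphabet=ALPHABET):
    """Moore-style partition refinement over the states reachable from start."""
    reach, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            t = delta[(s, a)]
            if t not in reach:
                reach.add(t)
                queue.append(t)
    order = sorted(reach)
    block = {s: int(s in accepting) for s in order}  # accepting vs. not
    n_blocks = len(set(block.values()))
    while True:
        # A state's signature: its block plus the blocks of its successors.
        sig = {s: (block[s],) + tuple(block[delta[(s, a)]] for a in alphabet)
               for s in order}
        ids = {}
        block = {s: ids.setdefault(sig[s], len(ids)) for s in order}
        if len(ids) == n_blocks:  # no block was split, so the partition is stable
            break
        n_blocks = len(ids)
    new_delta = {(block[s], a): block[delta[(s, a)]] for s in order for a in alphabet}
    new_accepting = {block[s] for s in order if s in accepting}
    return block[start], new_accepting, new_delta


def union(d1, d2, alphabet=ALPHABET):
    """Product construction over reachable pairs: accept if either DFA accepts."""
    (s1, acc1, t1), (s2, acc2, t2) = d1, d2
    start = (s1, s2)
    delta, accepting = {}, set()
    seen, queue = {start}, deque([start])
    while queue:
        pair = queue.popleft()
        p, q = pair
        if p in acc1 or q in acc2:
            accepting.add(pair)
        for a in alphabet:
            nxt = (t1[(p, a)], t2[(q, a)])
            delta[(pair, a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, accepting, delta


# Toy "signatures": strings containing "ab" (with a redundant state 3) and "ba".
contains_ab = (0, {2, 3}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 3,
                          (3, "a"): 2, (3, "b"): 3, (2, "a"): 2, (2, "b"): 2})
contains_ba = (0, {2}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 2, (1, "b"): 1,
                       (2, "a"): 2, (2, "b"): 2})

# Minimize then Union: shrink each small DFA first, then combine them.
combined = union(minimize(*contains_ab), minimize(*contains_ba))
print(num_states(minimize(*contains_ab)), num_states(combined),
      num_states(minimize(*combined)))
```

On this toy input the redundant state in the first DFA disappears before combining (3 states instead of 4), while the product of the two minimized DFAs has 6 states and shrinks to 4 only after a final minimization, a small reminder that combining automata is where state counts grow.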
- Title
- Applying evolutionary computation techniques to address environmental uncertainty in dynamically adaptive systems
- Creator
- Ramirez, Andres J.
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
-
A dynamically adaptive system (DAS) observes itself and its execution environment at run time to detect conditions that warrant adaptation. If an adaptation is necessary, then a DAS changes its structure and/or behavior to continuously satisfy its requirements, even as its environment changes. It is challenging, however, to systematically and rigorously develop a DAS due to environmental uncertainty. In particular, it is often infeasible for a human to identify all possible combinations of system and environmental conditions that a DAS might encounter throughout its lifetime. Nevertheless, a DAS must continuously satisfy its requirements despite the threat that this uncertainty poses to its adaptation capabilities. This dissertation proposes a model-based framework that supports the specification, monitoring, and dynamic reconfiguration of a DAS to explicitly address uncertainty. The proposed framework uses goal-oriented requirements models and evolutionary computation techniques to derive and fine-tune utility functions for requirements monitoring in a DAS, identify combinations of system and environmental conditions that adversely affect the behavior of a DAS, and generate adaptations on-demand to transition the DAS to a target system configuration while preserving system consistency. We demonstrate the capabilities of our model-based framework by applying it to an industrial case study involving a remote data mirroring network that efficiently distributes data even as network links fail and messages are dropped, corrupted, and delayed.
- Title
- Finding optimized bounding boxes of polytopes in d-dimensional space and their properties in k-dimensional projections
- Creator
- Shahid, Salman (Of Michigan State University)
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Using minimal bounding boxes to encapsulate or approximate a set of points in d-dimensional space is a non-trivial problem that has applications in a variety of fields, including collision detection, object rendering, high-dimensional databases, and statistical analysis. While a significant amount of work has been done on the three-dimensional variant of the problem (i.e., finding the minimum-volume bounding box of a set of points in three dimensions), it is difficult to find a simple method to do the same for higher dimensions. Even in three dimensions, existing methods suffer either from high time complexity or from suboptimal results traded for faster execution. In this thesis, we present a new approach to find the optimized minimum bounding boxes of a set of points defining convex polytopes in d-dimensional space. The solution also gives the optimal bounding box in three dimensions with a much simpler implementation, while significantly speeding up the execution time for a large number of vertices. The basis of the proposed approach is a series of unique properties of the k-dimensional projections that are leveraged into an algorithm. This algorithm works by constructing the convex hulls of a given set of points and optimizing the projections of those hulls in two-dimensional space using the new concept of Simultaneous Local Optimal. We show that the proposed algorithm provides significantly better performance than the current state-of-the-art approach in terms of both time and accuracy. To illustrate the importance of the result for a real-world application, the optimized bounding box algorithm is used to develop a method for carrying out range queries in high-dimensional databases. This method uses data transformation techniques in conjunction with a set of heuristics to provide significant performance improvement.
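As a point of reference for the two-dimensional projection step, the following Python sketch computes a minimum-area enclosing rectangle of a planar point set with a standard baseline: build the convex hull (Andrew's monotone chain), then try every hull-edge direction, relying on the classical result that some optimal rectangle has a side collinear with a hull edge. This is an assumed illustration of the 2D case only, not the thesis's Simultaneous Local Optimal algorithm; the function names are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rectangle(points):
    """Return (area, angle_radians) of a smallest enclosing rectangle.

    Relies on the classical result that some optimal rectangle has a side
    collinear with a hull edge, so only hull-edge directions are tried.
    """
    hull = convex_hull(points)
    best = (math.inf, 0.0)
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(theta), math.sin(theta)
        # Project every hull vertex onto the edge direction and its normal.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if area < best[0]:
            best = (area, theta)
    return best


# A square rotated 45 degrees (vertices at distance 1 from the origin) plus an
# interior point: its axis-aligned box has area 4, the tight rectangle area 2.
diamond = [(1, 0), (0, 1), (-1, 0), (0, -1), (0, 0)]
area, angle = min_area_rectangle(diamond)
print(round(area, 6))
```

The edge loop costs O(h^2) for h hull vertices; rotating calipers bring it to O(h), but the simpler loop keeps the idea visible.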
- Title
- Non-coding RNA identification in large-scale genomic data
- Creator
- Yuan, Cheng
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Noncoding RNAs (ncRNAs), which function directly as RNAs without being translated into proteins, play diverse and important biological roles. ncRNAs function not only through their primary structures but also through their secondary structures, which are defined by Watson-Crick and wobble base pairing. Common types of ncRNA include microRNA, rRNA, snoRNA, and tRNA, and their functions vary among types. Recent studies suggest the existence of a large number of ncRNA genes. Identification of novel and known ncRNAs is becoming increasingly important in order to understand their functionalities and the underlying communities. Next-generation sequencing (NGS) technology sheds light on more comprehensive and sensitive ncRNA annotation: lowly transcribed ncRNAs, or ncRNAs from rare species with low abundance, may be identified via deep sequencing. However, there exist several challenges in ncRNA identification in large-scale genomic data. First, the massive volume of the datasets can lead to very long computation times, making existing algorithms infeasible. Second, NGS has a relatively high error rate, which can further complicate the problem. Third, high sequence similarity among related ncRNAs can make them difficult to identify, resulting in incorrect output. Fourth, while secondary structures should be adopted for accurate ncRNA identification, they usually incur high computational complexity. In particular, some ncRNAs contain pseudoknot structures, which cannot be effectively modeled by the state-of-the-art approach. As a result, ncRNAs containing pseudoknots are hard to annotate. In my PhD work, I aimed to tackle the above challenges in ncRNA identification. First, I designed a progressive search pipeline to identify ncRNAs containing pseudoknot structures. The algorithms are more efficient than the state-of-the-art approaches and can be used for large-scale data.
Second, I designed an ncRNA classification tool for short reads in NGS data lacking quality reference genomes. The initial homology search phase significantly reduces the size of the original input, making the tool feasible for large-scale data. Last, I focused on identifying 16S ribosomal RNAs from NGS data. 16S ribosomal RNAs are a very important type of ncRNA, which can be used for phylogenetic studies. A set of graph-based assembly algorithms was applied to form longer or full-length 16S rRNA contigs. I utilized paired-end information in NGS data so that low-abundance 16S genes can also be identified. To reduce the complexity of the problem and make the tool practical for large-scale data, I designed a set of error correction and graph reduction techniques for graph simplification.
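The graph-based assembly and error-correction ideas can be illustrated with a generic de Bruijn graph. The Python sketch below is a simplified assumption rather than the dissertation's pipeline: it counts k-mers, drops k-mers seen fewer than `min_count` times as a crude form of error correction, and greedily extends a contig while each node has a single successor. It ignores reverse complements, paired-end information, and secondary structure; the reads are made-up examples, and the names `kmer_counts`, `de_bruijn`, and `extend_contig` are hypothetical.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn(reads, k, min_count=2):
    """Nodes are (k-1)-mers; each k-mer seen at least min_count times adds an edge."""
    graph = defaultdict(set)
    for kmer, count in kmer_counts(reads, k).items():
        if count >= min_count:  # drop rare k-mers, a crude proxy for sequencing errors
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Greedily extend from start while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on cycles
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Made-up reads covering "ATGCGTAC" twice, plus one read with an error ("TGCTT").
reads = ["ATGCG", "TGCGT", "GCGTA", "CGTAC"] * 2 + ["TGCTT"]
print(extend_contig(de_bruijn(reads, k=4), "ATG"))
print(extend_contig(de_bruijn(reads, k=4, min_count=1), "ATG"))
```

With the default filter, the erroneous read's k-mers are discarded and the walk recovers ATGCGTAC; with `min_count=1`, the extra branch at TGC stops the extension at ATGC, which is the kind of branching that graph simplification aims to remove.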
- Title
- Multiple kernel and multi-label learning for image categorization
- Creator
- Bucak, Serhat Selçuk
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
"One crucial step towards the goal of converting large image collections to useful information sources is image categorization. The goal of image categorization is to find the relevant labels for a given image from a closed set of labels. Despite the huge interest and significant contributions by the research community, there remains much room for improvement in the image categorization task. In this dissertation, we develop efficient multiple kernel learning and multi-label learning algorithms with high prediction performance for image categorization... " -- Abstract.
- Title
- Gender-related effects of advanced placement computer science courses on self-efficacy, belongingness, and persistence
- Creator
- Good, Jonathon Andrew
- Date
- 2018
- Collection
- Electronic Theses & Dissertations
- Description
-
The underrepresentation of women in computer science has been a concern of educators for multiple decades. The low representation of women in computer science is a pattern that extends from K-12 schools through the university level and into the profession. One of the purposes of introducing the Advanced Placement Computer Science Principles (APCS-P) course in 2016 was to help broaden participation in computer science at the high school level. The design of APCS-P allowed teachers to present computer science from a broad perspective, allowing students to pursue problems of personal significance and allowing computing projects to take a variety of forms. The nationwide enrollment statistics for Advanced Placement Computer Science Principles in 2017 showed a higher proportion of female students (30.7%) than Advanced Placement Computer Science A (23.6%) courses. However, it is unknown to what degree enrollment in these courses was related to students’ plans to enroll in future computer science courses. This correlational study examined how students’ enrollment in Advanced Placement Computer Science courses, along with student gender, predicted students’ sense of computing self-efficacy, belongingness, and expected persistence in computer science. A nationwide sample of 263 students from 10 APCS-P and 10 APCS-A courses participated in the study. Students completed pre- and post-surveys at the beginning and end of their Fall 2017 semester regarding their computing self-efficacy, belongingness, and plans to continue in computer science studies. Using hierarchical linear modeling analysis, chosen because of the nested nature of the data within class sections, the researcher found that the APCS course type was not predictive of self-efficacy, belongingness, or expectations to persist in computer science. The results suggested that female students’ self-efficacy declined over the course of the study.
However, gender was not predictive of belongingness or expectations to persist in computer science. Students were found to have entered both courses with a high sense of self-efficacy, belongingness, and expectation to persist in computer science. The results of this study suggest that students enrolled in both Advanced Placement Computer Science courses are already likely to pursue computer science. I also found that the type of APCS course in which students enroll does not relate to students’ interest in computer science. This suggests that educators should look beyond AP courses as a method of exposing students to computer science, possibly through efforts such as computational thinking and cross-curricular uses of computer science concepts and practices. Educators and administrators should also continue to examine whether there are structural biases in how students are directed to computer science courses. As for the drop in self-efficacy related to gender, this is in alignment with previous research suggesting that educators should carefully scaffold students’ initial experiences in the course so as not to negatively influence their self-efficacy. Further research should examine how specific pedagogical practices could influence students’ persistence, as the designation and curriculum of APCS-A or APCS-P alone may not capture the myriad ways in which teachers may be addressing gender inequity in their classrooms. Research can also examine how student interest in computer science is affected at an earlier age, as the APCS courses may be reaching students after they have already formed their opinions about computer science as a field.
- Title
- Energy Conservation in Heterogeneous Smartphone Ad Hoc Networks
- Creator
- Mariani, James
- Date
- 2018
- Collection
- Electronic Theses & Dissertations
- Description
-
In recent years, mobile computing has been rapidly expanding to the point that there are now more devices than there are people. While it was once common for every household to have one PC, it is now common for every person to have a mobile device. With the increased use of smartphones, there has also been an increase in the need for mobile ad hoc networks, in which phones connect directly to each other without the need for an intermediate router. Most modern smartphones are equipped with both Bluetooth and Wi-Fi Direct, where Wi-Fi Direct has a better transmission range and rate and Bluetooth is more energy efficient. However, only one or the other is used in a smartphone ad hoc network. We propose a Heterogeneous Smartphone Ad Hoc Network, HSNet, a framework that enables automatic switching between Wi-Fi Direct and Bluetooth to prioritize minimizing energy consumption while still maintaining an efficient network. We develop an application to evaluate the HSNet framework, which shows significant energy savings when utilizing our switching algorithm to send messages by a less energy-intensive technology in situations where energy conservation is desired. We discuss additional features of HSNet, such as load balancing, which helps increase the lifetime of the network by more evenly distributing slave nodes among connected master nodes. Finally, we show that the throughput of our system is not affected by technology switching in most scenarios. Future work on this project includes exploring energy-efficient routing as well as simulation and scale testing for larger and more diverse smartphone ad hoc networks.
- Title
- Discrete de Rham-Hodge Theory
- Creator
- Zhao, Rundong
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
We present a systematic treatment of 3D shape analysis based on the well-established de Rham-Hodge theory in differential geometry and topology. The computational tools we developed are widely applicable to research areas such as computer graphics, computer vision, and computational biology. We extensively tested them in the context of 3D structure analysis of biological macromolecules to demonstrate the efficacy and efficiency of our method in potential applications. Our contributions are as follows. First, we present a compendium of discrete Hodge decompositions of vector fields, which provides the primary building block of the de Rham-Hodge theory for computations performed on the commonly used tetrahedral meshes embedded in 3D Euclidean space. Second, we present a real-world application of the above computational tool to 3D shape analysis on biological macromolecules. Finally, we extend the above method to an evolutionary de Rham-Hodge method to provide a unified paradigm for the multiscale geometric and topological analysis of evolving manifolds constructed from a filtration, which induces a family of evolutionary de Rham complexes. Our work on the decomposition of vector fields, spectral shape analysis on static shapes, and evolving shapes has already shown its effectiveness in biomolecular applications and will lead to a rich set of features for machine learning-based shape analysis currently under development.
- Title
- Network analysis with negative links
- Creator
- Derr, Tyler Scott
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
As we rapidly continue into the information age, the rate at which data is produced has created an unprecedented demand for novel methods to effectively extract insightful patterns. We can then seek to understand the past, make predictions about the future, and ultimately take actionable steps towards improving our society. Because much of today's big data can be represented as graphs, emphasis is being placed on harnessing the natural structure of data through network analysis. Traditionally, network analysis has focused on networks having only positive links, or unsigned networks. However, in many real-world systems, relations between nodes in a graph can be both positive and negative, yielding signed networks. For example, in online social media, users not only have positive links, such as friends, followers, and those they trust, but can also establish negative links to those they distrust or to their foes, or block and unfriend users. Thus, although signed networks are ubiquitous due to their ability to represent negative links in addition to positive links, they have been significantly underexplored. In addition, the rise in popularity of today's social media and increased polarization online have led to both increased attention to and demand for advanced methods that perform the typical network analysis tasks while also taking negative links into consideration. More specifically, there is a need for methods that can measure, model, mine, and apply signed networks that harness both these positive and negative relations.
However, this raises novel challenges, as the properties and principles of negative links are not necessarily the same as those of positive links, and furthermore the social theories that have been used in unsigned networks might not apply with the inclusion of negative links. The chief objective of this dissertation is first to analyze the distinct properties of negative links compared to positive links, and then to improve network analysis with negative links by researching the utility of social theories established in a holistic view of networks containing both positive and negative links, and how to harness them. We discover that simply extending unsigned network analysis is typically not sufficient and that, although the existence of negative links introduces numerous challenges, they also provide unprecedented opportunities for advancing the frontier of the network analysis domain. In particular, we develop advanced methods in signed networks for measuring node relevance and centrality (i.e., signed network measuring); present the first generative signed network model and extend and analyze balance theory for signed bipartite networks (i.e., signed network modeling); construct the first signed graph convolutional network, which learns node representations that can achieve state-of-the-art prediction performance, and furthermore introduce the novel idea of transformation-based network embedding (i.e., signed network mining); and apply signed networks by creating a framework that can infer both link and interaction polarity levels in online social media and by constructing an advanced, comprehensive congressional vote prediction framework built around harnessing signed networks.
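Balance theory is classically stated for triangles: a triangle is balanced when the product of its three edge signs is positive (for example, "the enemy of my enemy is my friend"). The Python sketch below counts balanced and unbalanced triangles in a small undirected signed network. It is an assumed illustration of the classical unipartite definition only; it does not cover the extension to signed bipartite networks described above, the cubic enumeration is meant for toy graphs, and the function name and example network are hypothetical.

```python
from itertools import combinations


def triangle_balance(signs):
    """Count balanced and unbalanced triangles in an undirected signed graph.

    signs maps frozenset({u, v}) -> +1 or -1. A triangle is balanced when the
    product of its three edge signs is positive.
    """
    nodes = sorted({n for edge in signs for n in edge})
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):  # O(n^3): fine for toy graphs only
        tri = [signs.get(frozenset(pair)) for pair in ((a, b), (b, c), (a, c))]
        if None in tri:
            continue  # the three nodes do not form a triangle
        if tri[0] * tri[1] * tri[2] > 0:
            balanced += 1
        else:
            unbalanced += 1
    return balanced, unbalanced


# Hypothetical toy network: +1 = friend/trust, -1 = foe/distrust.
edges = {
    ("Alice", "Bob"): +1,
    ("Bob", "Carol"): -1,
    ("Alice", "Carol"): -1,
    ("Alice", "Dave"): -1,
    ("Bob", "Dave"): -1,
    ("Carol", "Dave"): -1,
}
signs = {frozenset(pair): sign for pair, sign in edges.items()}
print(triangle_balance(signs))
```

In this four-person example, the two triangles containing the Alice-Bob friendship are balanced, while the two all-negative triangles are unbalanced.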
- Title
- MICROBLOG GUIDED CRYPTOCURRENCY TRADING AND FRAMING ANALYSIS
- Creator
- Pawlicka Maule, Anna Paula
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
With 56 million people actively trading and investing in cryptocurrency online and globally, there is an increasing need for an automatic social media analysis tool to help understand trading discourse and behavior. Previous works have shown the usefulness of modeling microblog discourse for predicting stock trading and price fluctuations, as well as content framing. In this work, I present a natural language modeling pipeline that leverages language and social network behaviors for the prediction of cryptocurrency day trading actions and their associated framing patterns. Specifically, I present two modeling approaches. The first determines whether the tweets of a 24-hour period can be used to guide day trading behavior, specifically whether a cryptocurrency investor should buy, sell, or hold their cryptocurrencies in order to make a trading profit. The second is an unsupervised deep clustering approach to automatically detect framing patterns. My contributions include the modeling pipeline for this novel task, a new dataset of cryptocurrency-related tweets from influential accounts, and a transaction volume dataset. Experiments show that this weakly supervised trading pipeline achieves 88.78% accuracy for day trading behavior predictions and reveals framing fluctuations prior to and during the COVID-19 pandemic that could be used to guide investment actions.
- Title
- I AM DOING MORE THAN CODING : A QUALITATIVE STUDY OF BLACK WOMEN HBCU UNDERGRADUATES’ PERSISTENCE IN COMPUTING
- Creator
- Benton, Amber V.
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
The purpose of my study is to explore why and how Black women undergraduates at historically Black colleges and universities (HBCUs) persist in computing. By centering the experiences of Black women undergraduates and their stories, this dissertation expands traditional, dominant ways of understanding student persistence in higher education. I applied Critical Race Feminism (CRF) as a conceptual framework to the stories of 11 Black women undergraduates in computing and drew on the small stories qualitative approach to examine the day-to-day experiences of Black women undergraduates at HBCUs as they persisted in their computing degree programs. The findings suggest that (a) gender underrepresentation in computing affects Black women’s experiences, (b) computing culture at HBCUs directly affects Black women in computing, (c) Black women need access to resources and opportunities to persist in computing, (d) computing-related internships are beneficial professional opportunities but are also sites of gendered racism for Black women, (e) connectedness between Black people is innate but also needs to be fostered, (f) Black women want to engage in computing that contributes to social impact and community uplift, and (g) science identity is not a primary identity for Black women in computing. This dissertation also argues that discipline-focused efforts contribute to the persistence of Black women in computing.
- Title
- Dissertation : novel parallel algorithms and performance optimization techniques for the multi-level fast multipole algorithm
- Creator
- Lingg, Michael
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
Since Sir Isaac Newton determined that characterizing the orbits of celestial objects required considering the gravitational interactions among all bodies in the system, the N-body problem has been an important tool in physics simulations. Expanding on the early use of the classical N-body problem for gravitational simulations, the method has proven invaluable in fluid dynamics, molecular simulations, and data analytics. The extension of the classical N-body problem to solve the Helmholtz equation for groups of particles with oscillatory interactions has allowed for simulations that assist in antenna design, radar cross section prediction, reduction of engine noise, and medical devices that utilize sound waves, to name a few applications. While N-body simulations are extremely valuable, the computational cost of directly evaluating interactions among all pairs grows quadratically with the number of particles, rendering large-scale simulations infeasible even on the most powerful supercomputers. The Fast Multipole Method (FMM) and the broader class of tree algorithms to which it belongs have significantly reduced the computational complexity of N-body simulations while providing controllable accuracy guarantees. While FMM provided a significant boost, the N-body problems tackled by scientists and engineers continue to grow in size, necessitating the development of efficient parallel algorithms and implementations to run on supercomputers. The Laplace variant of FMM, which is used to treat the classical N-body problem, has been extensively researched and optimized, to the extent that Laplace FMM codes can scale to tens of thousands of processors for simulations involving over a trillion particles. In contrast, the Multi-Level Fast Multipole Algorithm (MLFMA), which targets the Helmholtz kernel variant of FMM, lags significantly behind in efficiency and scaling.
The added complexity of an oscillatory potential results in much more intricate data dependency patterns and load balancing requirements among parallel processes, making algorithms and optimizations developed for Laplace FMM mostly ineffective for MLFMA. In this thesis, we propose novel parallel algorithms and performance optimization techniques to improve the performance of MLFMA on modern computer architectures. The proposed algorithms and optimizations range from efficient use of the memory hierarchy on multi-core processors, to an investigation of the benefits of the emerging concept of task parallelism for MLFMA, to significant reductions in communication overheads and load imbalances in large-scale computations. The parallel algorithms for distributed-memory MLFMA are accompanied by detailed complexity analyses and performance models. We describe efficient implementations of all proposed algorithms and optimization techniques and analyze their impact in detail. In particular, we show that our work yields significant speedups and much improved scalability compared to existing MLFMA methods, both on large geometries designed to test the range of the problem space and on real-world problems.
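The quadratic cost of direct evaluation mentioned above can be made concrete with a direct-sum sketch. This Python is illustrative only and is not code from the dissertation: the names `kernel` and `direct_potentials` are hypothetical, physical constants are omitted, and passing a wavenumber `k` switches from the Laplace kernel 1/r to the oscillatory Helmholtz kernel exp(ikr)/r that MLFMA targets. The function also counts the N(N-1) pairwise kernel evaluations that FMM-type methods are designed to avoid.

```python
import cmath
import math


def kernel(r, k=None):
    """Pairwise interaction kernel (physical constants omitted).

    k=None gives the Laplace kernel 1/r (classical N-body problem);
    a wavenumber k gives the oscillatory Helmholtz kernel exp(i*k*r)/r.
    """
    if k is None:
        return 1.0 / r
    return cmath.exp(1j * k * r) / r


def direct_potentials(positions, charges, k=None):
    """Direct-sum evaluation: every particle interacts with every other one.

    Returns the potential at each particle and the number of kernel
    evaluations, which is N*(N-1) and therefore grows quadratically in N.
    """
    n = len(positions)
    potentials = [0.0] * n
    evaluations = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip the singular self-interaction
            r = math.dist(positions[i], positions[j])
            potentials[i] += charges[j] * kernel(r, k)
            evaluations += 1
    return potentials, evaluations
```

For example, three unit charges at (0, 0, 0), (1, 0, 0), and (0, 2, 0) require 6 kernel evaluations, and ten particles already require 90; doubling N roughly quadruples the work. FMM and MLFMA avoid this by approximating far-field interactions with hierarchical expansions instead of summing every pair.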
- Title
- LIDAR AND CAMERA CALIBRATION USING A MOUNTED SPHERE
- Creator
- Li, Jiajia
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
Extrinsic calibration between lidar and camera sensors is needed for multi-modal sensor data fusion. However, obtaining a precise extrinsic calibration can be tedious, computationally expensive, or require elaborate apparatus. This thesis proposes a simple, fast, and robust method for performing extrinsic calibration between a camera and a lidar. The only required calibration target is a hand-held colored sphere mounted on a whiteboard. Convolutional neural networks are developed to automatically localize the sphere relative to the camera and the lidar. Then, using the localization covariance models, the relative pose between the camera and the lidar is derived. To evaluate the accuracy of our method, we record image and lidar data of a sphere at a set of known grid positions by using two rails mounted on a wall. The accuracy of the calibration is demonstrated by projecting the grid centers into the camera image plane and measuring the error between these points and the hand-labeled sphere centers.
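The evaluation step described above (projecting known grid centers into the image and comparing them with hand-labeled sphere centers) amounts to a reprojection-error computation. The sketch below is not the thesis's code and rests on stated assumptions: an ideal pinhole camera with intrinsic matrix `K` and no lens distortion, an extrinsic rotation `R` and translation `t` that map lidar-frame points into the camera frame, and the mean Euclidean pixel error as the summary statistic. The names `project_point` and `mean_reprojection_error` are hypothetical.

```python
import math


def project_point(K, R, t, X):
    """Project a 3D point X (lidar frame) to pixel coordinates (u, v).

    Assumes an ideal pinhole camera: Xc = R @ X + t, then p = K @ Xc,
    followed by the perspective divide. Lens distortion is ignored.
    """
    # Rigid transform from the lidar frame into the camera frame.
    Xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Apply the intrinsics, then divide by depth.
    p = [sum(K[r][c] * Xc[c] for c in range(3)) for r in range(3)]
    return p[0] / p[2], p[1] / p[2]


def mean_reprojection_error(K, R, t, points_3d, labeled_px):
    """Mean pixel distance between projected points and hand-labeled centers."""
    errors = []
    for X, (u_obs, v_obs) in zip(points_3d, labeled_px):
        u, v = project_point(K, R, t, X)
        errors.append(math.hypot(u - u_obs, v - v_obs))
    return sum(errors) / len(errors)
```

With K = [[500, 0, 320], [0, 500, 240], [0, 0, 1]], an identity R, and zero t, the points (0, 0, 2) and (1, 0, 5) project to pixels (320, 240) and (420, 240); against hand labels (320, 240) and (423, 244), the per-point errors are 0 and 5 pixels, for a mean of 2.5.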
- Title
- TEACHERS IN SOCIAL MEDIA : A DATA SCIENCE PERSPECTIVE
- Creator
- Karimi, Hamid
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Social media has become an integral part of human life in the 21st century. The number of social media users was estimated at around 3.6 billion in 2020. Social media platforms (e.g., Facebook) have facilitated interpersonal communication, the diffusion of information, and the creation of groups and communities, to name a few. As far as education systems are concerned, online social media has transformed and connected traditional social networks within the schoolhouse to a broader, expanded world outside. In this expanded virtual space, teachers engage in various activities within their communities, e.g., exchanging instructional resources, seeking new teaching methods, and engaging in online discussions. Therefore, given the importance of teachers in social media and its tremendous impact on PK-12 education, in this dissertation we investigate teachers in social media from a data science perspective. Our investigation in this direction is essentially an interdisciplinary endeavor bridging modern data science and education. In particular, we have made three contributions, briefly discussed in the following. Current studies of teachers in social media are limited to small numbers of surveyed teachers, while thousands of other teachers are on social media. This hinders large-scale data-driven studies of teachers in social media. To overcome this challenge and further facilitate data-driven studies of teachers in social media, we propose a novel method that automatically identifies teachers on Pinterest, an image-based social media platform popular among teachers. In this framework, we formulate teacher identification as a positive-unlabeled (PU) learning problem, where positive samples are surveyed teachers and unlabeled samples are their online friends. Using our framework, we build the largest dataset of teachers on Pinterest.
With this dataset at our disposal, we perform an exploratory analysis of teachers on Pinterest while considering their genders. Our analysis incorporates two crucial aspects of teachers in social media. First, we investigate various online activities of male and female teachers, e.g., the topics and sources of their curated resources and the professional language employed to describe those resources. Second, we investigate male and female teachers in the context of the social network (the graph) they belong to, e.g., structural centrality and gender homophily. Our analysis and findings in this part of the dissertation can serve as a valuable reference for many entities concerned with teachers' gender, e.g., principals and state and federal governments. Finally, in the third part of the dissertation, we shed light on the diffusion of teacher-curated resources on Pinterest. First, we introduce three measures to characterize the diffusion process. Then, we investigate these three measures while considering two crucial characteristics of a resource, i.e., its topic and its source. Lastly, we investigate how teacher attributes (e.g., the number of friends) affect the diffusion of their resources. The conducted diffusion analysis is the first of its kind and offers a deeper understanding of the complex mechanism driving the diffusion of resources curated by teachers on Pinterest.
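Positive-unlabeled learning can be approached in several ways, and the abstract does not say which one the dissertation uses. As one well-known illustration only, the sketch below follows the Elkan and Noto (2008) adjustment: a classifier g(x) is trained to separate labeled positives (here, surveyed teachers) from unlabeled samples (their friends), and, assuming labeled positives were selected completely at random from all positives, its scores are rescaled by c = P(s=1 | y=1) to estimate P(y=1 | x) = g(x)/c. The function names, the example scores, and the 0.5 threshold are hypothetical.

```python
def estimate_label_frequency(scores_on_labeled_positives):
    """Estimate c = P(s=1 | y=1) as the mean score on held-out labeled positives.

    The scores come from a "non-traditional" classifier g(x) ~ P(s=1 | x)
    trained to separate labeled samples (s=1) from unlabeled ones (s=0).
    """
    if not scores_on_labeled_positives:
        raise ValueError("need at least one held-out labeled positive")
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)


def adjust_scores(unlabeled_scores, c):
    """Convert g(x) into an estimate of P(y=1 | x) = g(x) / c, clipped to 1.0."""
    return [min(1.0, s / c) for s in unlabeled_scores]


def predict_positive(unlabeled_scores, c, threshold=0.5):
    """Return indices of unlabeled samples whose adjusted score meets the threshold."""
    adjusted = adjust_scores(unlabeled_scores, c)
    return [i for i, p in enumerate(adjusted) if p >= threshold]
```

If held-out surveyed teachers receive scores [0.5, 0.4, 0.6], then c is 0.5, and friends scoring [0.1, 0.3, 0.45] are rescaled to about [0.2, 0.6, 0.9], so `predict_positive` flags the second and third friends as likely teachers.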
- Title
- Automated Speaker Recognition in Non-ideal Audio Signals Using Deep Neural Networks
- Creator
- Chowdhury, Anurag
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Speaker recognition entails the use of the human voice as a biometric modality for recognizing individuals. While speaker recognition systems are gaining popularity in consumer applications, most of these systems are negatively affected by non-ideal audio conditions, such as audio degradations, multi-lingual speech, and audio of varying duration. This thesis focuses on developing speaker recognition systems that are robust to non-ideal audio conditions. First, a 1-Dimensional Convolutional Neural Network (1D-CNN) is developed to extract noise-robust, speaker-dependent speech characteristics from Mel Frequency Cepstral Coefficients (MFCC). Second, the 1D-CNN-based approach is extended into a triplet-learning-based feature-fusion framework, called 1D-Triplet-CNN, that improves speaker recognition performance by judiciously combining MFCC and Linear Predictive Coding (LPC) features. Our hypothesis rests on the observation that MFCC and LPC capture two distinct aspects of speech: speech perception and speech production. Third, a time-domain filterbank called DeepVOX is learned from vast amounts of raw speech audio to replace commonly used hand-crafted filterbanks, such as the Mel filterbank, in speech feature extractors. Finally, a vocal style encoding network called DeepTalk is developed to learn speaker-dependent behavioral voice characteristics that improve speaker recognition performance. The primary contribution of the thesis is the development of deep learning-based techniques to extract discriminative, noise-robust physical and behavioral voice characteristics from non-ideal speech audio. A large number of experiments conducted on the TIMIT, NTIMIT, SITW, NIST SRE (2008, 2010, and 2018), Fisher, VOXCeleb, and JukeBox datasets demonstrate the efficacy of the proposed techniques and their importance in improving speaker recognition performance in non-ideal audio conditions.
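The triplet-learning idea behind 1D-Triplet-CNN can be illustrated with the standard triplet loss on embedding vectors. This is a generic sketch, not the thesis's network or training code: the embeddings are plain Python tuples standing in for 1D-CNN outputs, and the Euclidean distance and the margin of 1.0 are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The anchor and positive come from the same speaker and the negative from
    a different speaker; the loss reaches zero once the negative is at least
    `margin` farther from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(triplets, margin=1.0):
    """Mean triplet loss over (anchor, positive, negative) embedding triples."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

For an anchor (0, 0) and a positive (0, 1), a negative at (3, 4) already satisfies the margin and contributes a loss of 0.0, while a negative at (1, 0) is as close as the positive and contributes 1.0; the batch mean of those two triplets is 0.5. Minimizing this loss pulls same-speaker embeddings together and pushes different speakers apart.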
- Title
- EXTENDED REALITY (XR) & GAMIFICATION IN THE CONTEXT OF THE INTERNET OF THINGS (IOT) AND ARTIFICIAL INTELLIGENCE (AI)
- Creator
- Pappas, Georgios
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
The present research develops a holistic framework for, and way of thinking about, Deep Technologies related to Gamification, eXtended Reality (XR), the Internet of Things (IoT), and Artificial Intelligence (AI). Starting with the concept of gamification and the immersive technology of XR, we create interconnections with IoT and AI implementations. While each constituent technology has its own unique impact, our approach uniquely addresses the combined potential of these technologies, which may have greater impact than any one technology on its own. To approach the research problem more efficiently, we first divide it into smaller parts. For each part of the research problem, novel applications were designed and developed, including gamified tools, serious games, and AR/VR implementations. We apply the proposed framework in two different domains: autonomous vehicles (AVs) and distance learning. Specifically, in Chapter 2, an innovative hybrid tool for distance learning is showcased where, among other things, the fusion with IoT provides a novel pseudo-multiplayer mode. This mode may transform advanced asynchronous gamified tools into synchronous ones by enabling or disabling virtual events and phenomena, enhancing the student experience. Next, in Chapter 3, along with gamification, the combination of XR with IoT data streams is presented, this time in an automotive context. We showcase how this fusion of technologies provides low-latency monitoring of vehicle characteristics, and how these characteristics can be visualized in augmented and virtual reality using low-cost hardware and services.
This part of our proposed framework provides a methodology for creating any type of Digital Twin with near-real-time data visualization. Following that, in Chapter 4, we establish the second part of the suggested holistic framework, where Virtual Environments (VEs) in general can work as synthetic data generators and can thus be a great source of artificial data suitable for training AI models. This part of the research includes two novel implementations: the Gamified Digital Simulator (GDS) and the Virtual LiDAR Simulator. Having established the holistic framework, in Chapter 5 we “zoom in” to gamification, exploring deeper aspects of virtual environments, and discuss how serious games can be combined with other facets of virtual layers (cyber ranges, virtual learning environments) to provide enhanced training and advanced learning experiences. Lastly, in Chapter 6, “zooming out” from gamification, an additional enhancement layer is presented. We showcase the importance of human-centered design via an implementation that aims to simulate AV-pedestrian interactions in a virtual and safe environment.
- Title
- Computational Frameworks for Indel-Aware Evolutionary Analysis using Large-Scale Genomic Sequence Data
- Creator
- Wang, Wei
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
With the development of sequencing techniques, genetic sequencing data has been extensively used in evolutionary studies. The phylogenetic reconstruction problem, i.e., the reconstruction of evolutionary history from biomolecular sequences, is a fundamental problem. The evolutionary relationship between organisms is often represented by a phylogeny, which is a tree or network representation. The most widely used approach for reconstructing phylogenies from sequencing data involves two phases: multiple sequence alignment (MSA) and phylogenetic reconstruction from the aligned sequences. As the amount of biomolecular sequence data increases, developing efficient and accurate computational methods for phylogenetic analyses of large-scale sequencing data has become a major challenge. Due to the complexity of the phylogenetic reconstruction problem in modern phylogenetic studies, traditional sequence-based phylogenetic analysis methods involve many over-simplified assumptions. In this thesis, we describe our contributions to relaxing some of these over-simplified assumptions in phylogenetic analysis. Insertion and deletion events, referred to as indels, carry much phylogenetic information but are often ignored in the reconstruction of phylogenies. We take indel uncertainty into account in multiple phylogenetic analyses by applying resampling and re-estimation. Another over-simplified assumption we address is adopted by many commonly used non-parametric algorithms for resampling biomolecular sequences: that all sites in an MSA evolve independently and are identically distributed (i.i.d.). Many evolutionary events, such as recombination and hybridization, may produce intra-sequence and functional dependence in biomolecular sequences that violates this assumption. We introduce SERES, a resampling algorithm for biomolecular sequences that produces resampled replicates that preserve intra-sequence dependence.
We describe the application of the SERES resampling and re-estimation approach to two classical problems: multiple sequence alignment support estimation and recombination-aware local genealogical inference. We show that these two statistical inference problems greatly benefit from the indel-aware resampling and re-estimation approach and from the preservation of intra-sequence dependence. A major drawback of SERES is that it requires parameters to ensure the synchronization of random walks on unaligned sequences. We introduce RAWR, a non-parametric resampling method designed for phylogenetic tree support estimation that does not require extra parameters. We show that the RAWR-based resampling and re-estimation method performs comparably to, and typically better than, the traditional bootstrap approach on the phylogenetic tree support estimation problem. We further relax a commonly used assumption about phylogeny: evolutionary history is usually considered a tree structure, and evolutionary events that cause reticulated gene flow are ignored. Previous studies show that alignment uncertainty greatly impacts downstream tree inference and learning. However, there is little discussion of the impact of MSA uncertainty on phylogenetic network reconstruction. We show evidence that errors introduced in MSA estimation decrease the accuracy of the inferred phylogenetic network, and that an indel-aware reconstruction method is needed for phylogenetic network analysis. In this dissertation, we introduce our contributions to phylogenetic estimation using biomolecular sequence data involving complex evolutionary histories, such as sequence insertion and deletion processes and non-tree-like evolution.
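For contrast with SERES and RAWR, the traditional non-parametric bootstrap resamples the columns (sites) of a fixed MSA with replacement, which is exactly where the i.i.d. site assumption enters. The sketch below shows only that baseline; it is not SERES or RAWR, and the function name `bootstrap_alignment` is hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling MSA columns with replacement.

    alignment: list of equal-length aligned sequences (rows, gaps as '-').
    Each column is drawn independently and uniformly, which encodes the
    i.i.d. site assumption; the MSA itself is treated as fixed and error-free.
    """
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw n_sites column indices with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Rebuild every row from the same sampled columns so rows stay aligned.
    return ["".join(row[c] for c in cols) for row in alignment]
```

A call such as `bootstrap_alignment(["ACG-T", "A-GAT", "ACGAT"], random.Random(42))` returns three rows of length five whose columns are all copies of original columns. In the usual workflow, a tree is re-estimated from each of many replicates and branch support is the fraction of replicate trees containing each branch. Because columns are drawn independently, neighboring sites in a replicate are unrelated, and because the MSA is fixed, alignment uncertainty is ignored; these are the two limitations the dissertation's indel-aware methods address.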