Search results (1 - 20 of 81)
- Title
- Finding optimized bounding boxes of polytopes in d-dimensional space and their properties in k-dimensional projections
- Creator
- Shahid, Salman (Michigan State University)
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Using minimal bounding boxes to encapsulate or approximate a set of points in d-dimensional space is a non-trivial problem that has applications in a variety of fields, including collision detection, object rendering, high-dimensional databases, and statistical analysis. While a significant amount of work has been done on the three-dimensional variant of the problem (i.e., finding the minimum-volume bounding box of a set of points in three dimensions), it is difficult to find a simple method that does the same for higher dimensions. Even in three dimensions, existing methods suffer from either high time complexity or suboptimal results traded for faster execution. In this thesis we present a new approach to finding the optimized minimum bounding boxes of a set of points defining convex polytopes in d-dimensional space. The solution also gives the optimal bounding box in three dimensions with a much simpler implementation while significantly speeding up execution for large numbers of vertices. The basis of the proposed approach is a series of unique properties of the k-dimensional projections that are leveraged into an algorithm. This algorithm works by constructing the convex hulls of a given set of points and optimizing the projections of those hulls in two-dimensional space using the new concept of Simultaneous Local Optimal. We show that the proposed algorithm significantly outperforms the current state-of-the-art approach in both time and accuracy. To illustrate the importance of the result in a real-world application, the optimized bounding box algorithm is used to develop a method for carrying out range queries in high-dimensional databases. This method uses data transformation techniques in conjunction with a set of heuristics to provide significant performance improvements.
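The 2D projection step the algorithm builds on can be illustrated with the classical minimum-area bounding rectangle of a convex hull, in which one side of the optimal rectangle is collinear with a hull edge. A minimal sketch of that 2D subproblem only, not the thesis's d-dimensional algorithm:

```python
import numpy as np
from scipy.spatial import ConvexHull

def min_area_rect(points):
    """Minimum-area bounding rectangle of 2D points via hull-edge alignment."""
    hull = points[ConvexHull(points).vertices]            # hull vertices (CCW)
    edges = np.diff(np.vstack([hull, hull[:1]]), axis=0)  # consecutive hull edges
    best_area, best_angle = np.inf, 0.0
    for theta in np.arctan2(edges[:, 1], edges[:, 0]):
        c, s = np.cos(-theta), np.sin(-theta)
        rot = hull @ np.array([[c, -s], [s, c]]).T        # align edge with x-axis
        extent = rot.max(axis=0) - rot.min(axis=0)
        if extent.prod() < best_area:
            best_area, best_angle = extent.prod(), theta
    return best_area, best_angle

pts = np.random.default_rng(0).random((100, 2))
area, angle = min_area_rect(pts)
print(f"area={area:.4f}, angle={angle:.3f} rad")
```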
- Title
- Non-coding RNA identification in large-scale genomic data
- Creator
- Yuan, Cheng
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Noncoding RNAs (ncRNAs), which function directly as RNAs without being translated into proteins, perform diverse and important biological functions. ncRNAs function not only through their primary structures but also through their secondary structures, which are defined by interactions between Watson-Crick and wobble base pairs. Common types of ncRNA include microRNA, rRNA, snoRNA, and tRNA, and functions vary among types. Recent studies suggest the existence of a large number of ncRNA genes, so identifying novel and known ncRNAs is increasingly important for understanding their functionalities and the underlying communities.

Next-generation sequencing (NGS) technology sheds light on more comprehensive and sensitive ncRNA annotation. Lowly transcribed ncRNAs, or ncRNAs from rare species with low abundance, may be identified via deep sequencing. However, several challenges arise in ncRNA identification in large-scale genomic data. First, the massive volume of the datasets can lead to very long computation times, making existing algorithms infeasible. Second, NGS has a relatively high error rate, which can further complicate the problem. Third, high sequence similarity among related ncRNAs can make them difficult to identify, resulting in incorrect output. Fourth, while secondary structures should be adopted for accurate ncRNA identification, they usually incur high computational complexity. In particular, some ncRNAs contain pseudoknot structures, which cannot be effectively modeled by the state-of-the-art approach; as a result, ncRNAs containing pseudoknots are hard to annotate.

In my PhD work, I aimed to tackle the above challenges in ncRNA identification. First, I designed a progressive search pipeline to identify ncRNAs containing pseudoknot structures. The algorithms are more efficient than the state-of-the-art approaches and can be used for large-scale data. Second, I designed an ncRNA classification tool for short reads in NGS data lacking quality reference genomes. The initial homology search phase significantly reduces the size of the original input, making the tool feasible for large-scale data. Last, I focused on identifying 16S ribosomal RNAs from NGS data. 16S ribosomal RNAs are an important type of ncRNA that can be used for phylogenetic study. A set of graph-based assembly algorithms was applied to form longer or full-length 16S rRNA contigs. I utilized paired-end information in NGS data so that 16S genes of low abundance can also be identified. To reduce the complexity of the problem and make the tool practical for large-scale data, I designed a set of error-correction and graph-reduction techniques for graph simplification.
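The base-pairing rules that define the secondary structures mentioned above are simple to state. A minimal illustration (not code from the thesis) of Watson-Crick and G-U wobble pairing:

```python
# Watson-Crick and G-U wobble base pairing, the interactions that define
# ncRNA secondary structure.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def can_pair(x, y):
    """Return the pairing type for two RNA bases, or None if they cannot pair."""
    pair = (x.upper(), y.upper())
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "wobble"
    return None

print(can_pair("g", "c"))  # watson-crick
print(can_pair("G", "U"))  # wobble
print(can_pair("A", "C"))  # None
```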
- Title
- Semi-supervised learning with side information : graph-based approaches
- Creator
- Liu, Yi
- Date
- 2007
- Collection
- Electronic Theses & Dissertations
- Title
- TEACHERS IN SOCIAL MEDIA : A DATA SCIENCE PERSPECTIVE
- Creator
- Karimi, Hamid
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Social media has become an integral part of human life in the 21st century. The number of social media users was estimated at around 3.6 billion individuals in 2020. Social media platforms (e.g., Facebook) have facilitated interpersonal communication, the diffusion of information, and the creation of groups and communities. As far as education systems are concerned, online social media has transformed and connected traditional social networks within the schoolhouse to a broader and expanded world outside. In such an expanded virtual space, teachers engage in various activities within their communities, e.g., exchanging instructional resources, seeking new teaching methods, and engaging in online discussions. Therefore, given the importance of teachers in social media and its tremendous impact on PK-12 education, in this dissertation we investigate teachers in social media from a data science perspective. Our investigation is essentially an interdisciplinary endeavor bridging modern data science and education. In particular, we have made three contributions, briefly discussed in the following.

Current studies of teachers in social media are limited to a small number of surveyed teachers, while thousands of other teachers are on social media. This hinders large-scale data-driven studies of teachers in social media. Aiming to overcome this challenge and further facilitate such studies, we propose a novel method that automatically identifies teachers on Pinterest, an image-based social media platform popular among teachers. In this framework, we formulate the teacher identification problem as positive-unlabelled (PU) learning, where positive samples are surveyed teachers and unlabelled samples are their online friends. Using our framework, we build the largest dataset of teachers on Pinterest.

With this dataset at our disposal, we perform an exploratory analysis of teachers on Pinterest while considering their genders. Our analysis incorporates two crucial aspects of teachers in social media. First, we investigate various online activities of male and female teachers, e.g., the topics and sources of their curated resources and the professional language employed to describe those resources. Second, we investigate male and female teachers in the context of the social network (the graph) they belong to, e.g., structural centrality and gender homophily. Our analysis and findings in this part of the dissertation can serve as a valuable reference for many entities concerned with teachers' gender, e.g., principals and state and federal governments.

Finally, in the third part of the dissertation, we shed light on the diffusion of teacher-curated resources on Pinterest. First, we introduce three measures to characterize the diffusion process. Then, we investigate these measures while considering two crucial characteristics of a resource, namely its topic and source. Ultimately, we investigate how teacher attributes (e.g., the number of friends) affect the diffusion of their resources. The conducted diffusion analysis is the first of its kind and offers a deeper understanding of the complex mechanism driving the diffusion of resources curated by teachers on Pinterest.
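The PU formulation above (surveyed teachers as positives, their friends as unlabelled) can be sketched with the classic Elkan-Noto correction: fit a classifier to "labelled vs. unlabelled", estimate the labelling frequency on the positives, and rescale. A generic sketch on synthetic features, not the thesis's Pinterest pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(200, 5))    # surveyed teachers (positives)
X_unl = rng.normal(0.0, 1.0, size=(2000, 5))   # online friends (unlabelled)

X = np.vstack([X_pos, X_unl])
s = np.r_[np.ones(len(X_pos)), np.zeros(len(X_unl))]  # s=1 iff labelled positive

clf = LogisticRegression(max_iter=1000).fit(X, s)     # labelled-vs-unlabelled model
c = clf.predict_proba(X_pos)[:, 1].mean()             # estimate P(s=1 | y=1)
p_teacher = np.clip(clf.predict_proba(X_unl)[:, 1] / c, 0, 1)
print("likely teachers among unlabelled:", int((p_teacher > 0.5).sum()))
```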
- Title
- Automated Speaker Recognition in Non-ideal Audio Signals Using Deep Neural Networks
- Creator
- Chowdhury, Anurag
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Speaker recognition entails the use of the human voice as a biometric modality for recognizing individuals. While speaker recognition systems are gaining popularity in consumer applications, most of these systems are negatively affected by non-ideal audio conditions, such as audio degradations, multi-lingual speech, and audio of varying duration. This thesis focuses on developing speaker recognition systems robust to non-ideal audio conditions.

Firstly, a 1-Dimensional Convolutional Neural Network (1D-CNN) is developed to extract noise-robust, speaker-dependent speech characteristics from Mel Frequency Cepstral Coefficients (MFCC). Secondly, the 1D-CNN-based approach is extended to a triplet-learning-based feature-fusion framework, called 1D-Triplet-CNN, which improves speaker recognition performance by judiciously combining MFCC and Linear Predictive Coding (LPC) features. Our hypothesis rests on the observation that MFCC and LPC capture two distinct aspects of speech: speech perception and speech production. Thirdly, a time-domain filterbank called DeepVOX is learned from vast amounts of raw speech audio to replace commonly used hand-crafted filterbanks, such as the Mel filterbank, in speech feature extractors. Finally, a vocal style encoding network called DeepTalk is developed to learn speaker-dependent behavioral voice characteristics that further improve speaker recognition performance.

The primary contribution of the thesis is the development of deep learning-based techniques to extract discriminative, noise-robust physical and behavioral voice characteristics from non-ideal speech audio. A large number of experiments conducted on the TIMIT, NTIMIT, SITW, NIST SRE (2008, 2010, and 2018), Fisher, VOXCeleb, and JukeBox datasets demonstrate the efficacy of the proposed techniques and their importance in improving speaker recognition performance in non-ideal audio conditions.
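The kind of front end described above, a 1D convolution over MFCC frames followed by triplet training, can be sketched in a few lines of PyTorch. Layer sizes here are illustrative, not the thesis's exact 1D-CNN or 1D-Triplet-CNN architecture:

```python
import torch
import torch.nn as nn

class SpeakerCNN(nn.Module):
    def __init__(self, n_mfcc=40, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                  # pool over time frames
        )
        self.fc = nn.Linear(128, emb_dim)

    def forward(self, mfcc):                          # (batch, n_mfcc, frames)
        return self.fc(self.net(mfcc).squeeze(-1))    # (batch, emb_dim)

x = torch.randn(8, 40, 300)           # 8 utterances, 40 MFCCs, 300 frames
emb = SpeakerCNN()(x)
# Triplet training then pulls same-speaker embeddings together:
loss = nn.TripletMarginLoss()(emb[:2], emb[2:4], emb[4:6])
print(emb.shape, loss.item())
```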
- Title
- Iris Recognition : Enhancing Security and Improving Performance
- Creator
- Sharma, Renu
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
Biometric systems recognize individuals based on their physical or behavioral traits, viz., face, iris, and voice. The iris (the colored annular region around the pupil) is one of the most popular biometric traits due to its uniqueness, accuracy, and stability. However, its widespread usage raises security concerns about various adversarial attacks. Another challenge is to match iris images with other compatible biometric modalities (i.e., face) to increase the scope of human identification. The focus of this thesis is therefore two-fold: first, to enhance the security of the iris recognition system by detecting adversarial attacks, and second, to improve its performance in iris-face matching.

To enhance the security of the iris biometric system, we address two types of adversarial attack: presentation attacks and morph attacks. A presentation attack (PA) occurs when an adversary presents a fake or altered biometric sample (a plastic eye, a cosmetic contact lens, etc.) to a biometric system to obfuscate their own identity or impersonate another identity. We propose three deep learning-based iris PA detection frameworks corresponding to three different imaging modalities, namely NIR spectrum, visible spectrum, and Optical Coherence Tomography (OCT), which take as input a NIR image, a visible-spectrum video, and a cross-sectional OCT image, respectively. The techniques effectively detect known iris PAs and generalize well across unseen attacks, unseen sensors, and multiple datasets. We also present the explainability and interpretability of the results from these techniques, along with a robustness analysis and continuous updating (retraining) of the trained iris PA detection models. Another burgeoning security threat to biometric systems is the morph attack, which entails the generation of an image (a morphed image) that embodies multiple identities, whereas a biometric image is typically associated with a single identity. In this work, we first demonstrate the vulnerability of iris recognition techniques to morph attacks and then develop techniques to detect morphed iris images.

The second focus of the thesis is to improve the performance of a cross-modal system where iris images are matched against face images. Cross-modality matching involves various challenges, such as cross-spectral, cross-resolution, cross-pose, and cross-temporal variations. To address these challenges, we extract common features present in both images using a multi-channel convolutional network and also generate synthetic data to augment insufficient training data using a dual-variational autoencoder framework. Together, the two focus areas of this thesis improve the acceptance and widespread usage of the iris biometric system.
- Title
- EFFICIENT AND PORTABLE SPARSE SOLVERS FOR HETEROGENEOUS HIGH PERFORMANCE COMPUTING SYSTEMS
- Creator
- Rabbi, Md Fazlay
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
Sparse matrix computations arise in the form of the solution of systems of linear equations, matrix factorization, linear least-squares problems, and eigenvalue problems in numerous computational disciplines, ranging from quantum many-body problems and computational fluid dynamics to machine learning and graph analytics. The scale of problems in these scientific applications typically necessitates execution on massively parallel architectures. Moreover, due to the irregular data access patterns and low arithmetic intensities of sparse matrix computations, achieving high performance and scalability is very difficult. These challenges are further exacerbated by the increasingly complex deep memory hierarchies of modern architectures, which typically integrate several layers of memory storage. Data movement is a major bottleneck for both efficiency and energy consumption in large-scale sparse matrix computations; minimizing data movement across memory layers and overlapping data movement with computation are key to achieving high performance.

My thesis work contributes toward systematically identifying the algorithmic challenges of sparse solvers and providing optimized, high-performing solutions for both shared-memory and heterogeneous architectures by minimizing data movement between memory layers. For this purpose, we first introduce a shared-memory task-parallel framework focused on optimizing entire solvers rather than a specific kernel. As most recent (and upcoming) supercomputers are equipped with Graphics Processing Units (GPUs), we then evaluate the efficacy of directive-based programming models (i.e., OpenMP and OpenACC) in offloading computations to the GPU to achieve performance portability. Inspired by the promising results of this work, we port and optimize our shared-memory task-parallel framework on GPU-accelerated systems to execute problem sizes that exceed device memory.
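The irregular access pattern and low arithmetic intensity mentioned above are easiest to see in sparse matrix-vector multiply (SpMV), the kernel at the heart of Krylov solvers. Written out over the CSR format (illustrative only; production code would just call `A @ x`), the indirect load `x[A.indices[k]]` is what makes the kernel memory-bound:

```python
import numpy as np
import scipy.sparse as sp

def csr_spmv(A, x):
    """Naive CSR sparse matrix-vector product y = A @ x."""
    y = np.zeros(A.shape[0])
    for i in range(A.shape[0]):
        for k in range(A.indptr[i], A.indptr[i + 1]):
            y[i] += A.data[k] * x[A.indices[k]]   # indirect, cache-unfriendly load
    return y

A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.ones(1000)
assert np.allclose(csr_spmv(A, x), A @ x)
```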
- Title
- EXTENDED REALITY (XR) & GAMIFICATION IN THE CONTEXT OF THE INTERNET OF THINGS (IOT) AND ARTIFICIAL INTELLIGENCE (AI)
- Creator
- Pappas, Georgios
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
The present research develops a holistic framework for, and way of thinking about, Deep Technologies related to Gamification, eXtended Reality (XR), the Internet of Things (IoT), and Artificial Intelligence (AI). Starting with the concept of gamification and the immersive technology of XR, we create interconnections with IoT and AI implementations. While each constituent technology has its own unique impact, our approach uniquely addresses the combinational potential of these technologies, which may have greater impact than any technology on its own. To approach the research problem more efficiently, we first divide it into smaller parts; for each part, novel applications were designed and developed, including gamified tools, serious games, and AR/VR implementations. We apply the proposed framework in two different domains: autonomous vehicles (AVs) and distance learning.

Specifically, in Chapter 2, an innovative hybrid tool for distance learning is showcased where, among other things, the fusion with IoT provides a novel pseudo-multiplayer mode. This mode may transform advanced asynchronous gamified tools into synchronous ones by enabling or disabling virtual events and phenomena, enhancing the student experience. Next, in Chapter 3, along with gamification, the combination of XR with IoT data streams is presented, this time in an automotive context. We showcase how this fusion of technologies provides low-latency monitoring of vehicle characteristics, and how this can be visualized in augmented and virtual reality using low-cost hardware and services. This part of the proposed framework provides a methodology for creating any type of Digital Twin with near-real-time data visualization. Following that, in Chapter 4 we establish the second part of the suggested holistic framework, in which Virtual Environments (VEs) in general can work as synthetic data generators and thus be a great source of artificial data suitable for training AI models. This part of the research includes two novel implementations: the Gamified Digital Simulator (GDS) and the Virtual LiDAR Simulator. Having established the holistic framework, in Chapter 5 we “zoom in” on gamification, exploring deeper aspects of virtual environments and discussing how serious games can be combined with other facets of virtual layers (cyber ranges, virtual learning environments) to provide enhanced training and advanced learning experiences. Lastly, in Chapter 6, “zooming out” from gamification, an additional enhancement layer is presented: we showcase the importance of human-centered design via an implementation that simulates AV-pedestrian interactions in a virtual and safe environment.
- Title
- Applying evolutionary computation techniques to address environmental uncertainty in dynamically adaptive systems
- Creator
- Ramirez, Andres J.
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
-
A dynamically adaptive system (DAS) observes itself and its execution environment at run time to detect conditions that warrant adaptation. If an adaptation is necessary, then a DAS changes its structure and/or behavior to continuously satisfy its requirements, even as its environment changes. It is challenging, however, to systematically and rigorously develop a DAS due to environmental uncertainty. In particular, it is often infeasible for a human to identify all possible combinations of system and environmental conditions that a DAS might encounter throughout its lifetime. Nevertheless, a DAS must continuously satisfy its requirements despite the threat that this uncertainty poses to its adaptation capabilities. This dissertation proposes a model-based framework that supports the specification, monitoring, and dynamic reconfiguration of a DAS to explicitly address uncertainty. The proposed framework uses goal-oriented requirements models and evolutionary computation techniques to derive and fine-tune utility functions for requirements monitoring in a DAS, identify combinations of system and environmental conditions that adversely affect the behavior of a DAS, and generate adaptations on-demand to transition the DAS to a target system configuration while preserving system consistency. We demonstrate the capabilities of our model-based framework by applying it to an industrial case study involving a remote data mirroring network that efficiently distributes data even as network links fail and messages are dropped, corrupted, and delayed.
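The evolutionary computation used above to fine-tune utility-function parameters follows the usual mutate-and-select loop. A generic (mu+lambda) sketch with a stand-in fitness, not the framework's requirements-monitoring objective:

```python
import numpy as np

rng = np.random.default_rng(0)
fitness = lambda w: -np.sum((w - 0.7) ** 2)   # stand-in objective (hypothetical)
pop = rng.random((20, 4))                     # 20 candidate parameter vectors

for gen in range(50):
    children = pop + rng.normal(0, 0.1, pop.shape)            # mutation
    both = np.vstack([pop, children])                         # (mu + lambda) pool
    pop = both[np.argsort([fitness(w) for w in both])[-20:]]  # keep the best 20

print("best parameters found:", pop[-1].round(2))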
- Title
- Using Eventual Consistency to Improve the Performance of Distributed Graph Computation In Key-Value Stores
- Creator
- Nguyen, Duong Ngoc
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Key-value stores have gained increasing popularity due to their fast performance and simple data model. A key-value store usually consists of multiple replicas located in different geographical regions to provide higher availability and fault tolerance; consequently, a protocol is employed to ensure that data are consistent across the replicas.

The CAP theorem states the impossibility of simultaneously achieving three desirable properties in a distributed system, namely consistency, availability, and network partition tolerance. Since failures are the norm in distributed systems, and the capability to maintain service at an acceptable level in the presence of failures is a critical dependability and business requirement of any system, partition tolerance is a necessity. Consequently, the trade-off between consistency and availability (performance) is inevitable: strong consistency is attained at the cost of slow performance, and fast performance is attained at the cost of weak consistency, resulting in a spectrum of consistency models suitable for different needs. Among these, sequential consistency and eventual consistency are two common models. The former is easier to program with but suffers from poor performance, whereas the latter suffers from potential data anomalies while providing higher performance.

In this dissertation, we focus on the problem of what a designer should do if asked to solve a problem on a key-value store that provides eventual consistency. Specifically, we are interested in approaches that allow the designer to run applications on an eventually consistent key-value store and handle data anomalies if they occur during the computation. To that end, we investigate two options: (1) a detect-rollback approach, and (2) a stabilization approach. In the first option, the designer identifies a correctness predicate, say $\Phi$, and continues to run the application as if it were running on sequential consistency, while our system monitors $\Phi$. If $\Phi$ is violated (because the underlying key-value store provides eventual consistency), the system rolls back to a state where $\Phi$ holds and the computation is resumed from there. In the second option, the data anomalies are treated as state perturbations and handled by the convergence property of stabilizing algorithms.

We choose LinkedIn's Voldemort key-value store as the example key-value store for our study. We run experiments with several graph-based applications on the Amazon AWS platform to evaluate the benefits of the two approaches. From the experimental results, we observe that overall, both approaches provide benefits to the applications when compared to running them on sequential consistency. However, stabilization provides higher benefits, especially in the aggressive stabilization mode, which trades more perturbations for no locking overhead. The results suggest that while there is some cost associated with making an algorithm stabilizing, there may be a substantial benefit in revising an existing algorithm for the problem at hand to make it stabilizing and reduce the overall runtime under eventual consistency.

There are several directions of extension. For the detect-rollback approach, we are working to develop a more general rollback mechanism for the applications and to improve the efficiency and accuracy of the monitors. For the stabilization approach, we are working to develop an analytical model of the benefits of eventual consistency in stabilizing programs. Our current work focuses on silent stabilization, and we plan to extend our approach to other variations of stabilization.
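The detect-rollback idea above, run optimistically, monitor $\Phi$, and restore the last $\Phi$-consistent snapshot on a violation, can be shown in a toy single-process sketch. The state and predicate here are invented for illustration; the dissertation targets distributed applications on Voldemort:

```python
import copy

def run_with_rollback(state, steps, phi):
    """Execute steps optimistically; roll back whenever phi is violated."""
    snapshot = copy.deepcopy(state)           # last state known to satisfy phi
    for step in steps:
        step(state)                           # optimistic execution
        if phi(state):
            snapshot = copy.deepcopy(state)   # commit: phi still holds
        else:
            state.clear()                     # violation detected: roll back
            state.update(copy.deepcopy(snapshot))
    return state

# Example predicate: a token is held by at most one node at a time.
phi = lambda s: sum(s.values()) <= 1
state = {"n1": 0, "n2": 0}
steps = [lambda s: s.update(n1=1),
         lambda s: s.update(n2=1),            # would violate phi -> rolled back
         lambda s: s.update(n1=0)]
print(run_with_rollback(state, steps, phi))   # {'n1': 0, 'n2': 0}
```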
- Title
- PRECISION DIAGNOSTICS AND INNOVATIONS FOR PLANT BREEDING RESEARCH
- Creator
- Hugghis, Eli
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Major technological advances are necessary to reach the goal of feeding our world’s growing population. To do this, there is an increasing demand within the agricultural field for rapid diagnostic tools that improve the efficiency of current methods of plant disease and DNA identification. The use of gold nanoparticles has emerged as a promising technology for a range of applications, from smart agrochemical delivery systems to pathogen detection. In addition, advances in image classification have made machine learning approaches more accessible to the agricultural field. Here we present the use of gold nanoparticles (AuNPs) for the detection of transgenic gene sequences in maize and the use of machine learning algorithms for the identification and classification of Fusarium spp.-infected wheat seed. AuNPs show promise in their ability to diagnose the presence of transgenic insertions in DNA samples within 10 minutes through a colorimetric response. Image-based analysis using logistic regression, support vector machines, and k-nearest neighbors accurately identified and differentiated healthy and diseased wheat kernels in the testing set, with accuracies of 95-98.8%. These technologies act as rapid tools for plant breeders and pathologists, improving their ability to make selection decisions efficiently and objectively.
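The three classifiers named above can be compared in a few lines of scikit-learn. A generic sketch on a stand-in image-feature dataset (the wheat-kernel images and features themselves are not reproduced here):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)          # placeholder for kernel image features
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for model in (LogisticRegression(max_iter=5000), SVC(), KNeighborsClassifier()):
    acc = model.fit(Xtr, ytr).score(Xte, yte)
    print(f"{type(model).__name__}: {acc:.3f}")
```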
- Title
- PALETTEVIZ : A METHOD FOR VISUALIZATION OF HIGH-DIMENSIONAL PARETO-OPTIMAL FRONT AND ITS APPLICATIONS TO MULTI-CRITERIA DECISION MAKING AND ANALYSIS
- Creator
- Talukder, AKM Khaled Ahsan
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
Visual representation of a many-objective Pareto-optimal front in a four- or higher-dimensional objective space requires a large number of data points. Moreover, choosing a single point from a large set, even with certain preference information, is problematic, as it imposes a large cognitive burden on the decision-makers. Therefore, many-objective optimization and decision-making practitioners have been interested in effective visualization methods that enable them to filter a large set down to a few critical points for further analysis. Most existing visualization methods are borrowed from other data analytics domains and are too generic to be effective for many-criterion decision making. In this dissertation, we propose a visualization method, using star-coordinate and radial visualization plots, for effectively visualizing many-objective trade-off solutions. The proposed method respects some basic topological, geometric, and functional decision-making properties of high-dimensional trade-off points mapped to a three-dimensional space. We call this method Palette Visualization (PaletteViz). We demonstrate the use of PaletteViz on a number of large-dimensional multi-objective optimization test problems and three real-world multi-objective problems, one of which has 10 objective and 16 constraint functions. We also show the use of the NIMBUS and Pareto-Race concepts from the canonical multi-criterion decision making and analysis literature, introducing them into PaletteViz to demonstrate the ease and advantage of the proposed method.
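The radial visualization mapping the method builds on is compact enough to sketch: each objective gets an anchor on the unit circle, and a point is placed at the weighted average of the anchors. A generic RadViz sketch, not the PaletteViz layering itself:

```python
import numpy as np

def radviz(points):
    """Map nonnegative (n, m) objective vectors to (n, 2) radial coordinates."""
    m = points.shape[1]                                # number of objectives
    theta = 2 * np.pi * np.arange(m) / m
    anchors = np.c_[np.cos(theta), np.sin(theta)]      # unit-circle anchors
    w = points / points.sum(axis=1, keepdims=True)     # normalized weights
    return w @ anchors                                 # weighted anchor average

front = np.random.default_rng(0).random((500, 6))     # stand-in 6-objective set
xy = radviz(front)
print(xy.shape)                                        # (500, 2)
```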
- Title
- Towards Robust and Reliable Communication for Millimeter Wave Networks
- Creator
- Zarifneshat, Masoud
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
The future generations of wireless networks will benefit significantly from millimeter wave (mmW) technology, with frequencies ranging from about 30 GHz to 300 GHz. Specifically, the fifth generation of wireless networks has already implemented mmW technology, and the capacity requirements defined for 6G will also benefit from the mmW spectrum. Despite its attractions, the mmW spectrum has inherent propagation properties that introduce challenges. The first is that free-space pathloss in mmW is more severe than in the sub-6 GHz band. To make the mmW signal travel farther, communication systems need phased-array antennas that concentrate the signal power in a limited direction in space at any given time. Directional communication can incur high overhead on the system because it must probe the space to find signal paths: for efficient communication in the mmW spectrum, the transmitter and the receiver should align their beams on strong signal paths, which is a high-overhead task. The second is the low diffraction of the mmW spectrum. Low diffraction allows almost any object, including the human body, to easily block the mmW signal, degrading link quality. Avoiding and recovering from blockage in mmW communications, especially in dynamic environments, is particularly challenging because of the fast changes in the mmW channel.

Due to these unique propagation characteristics, traditional user association methods perform poorly in the mmW spectrum. We therefore propose user association methods that consider the inherent propagation characteristics of the mmW signal. We first propose a method that collects the history of blockage incidents throughout the network and exploits these historical incidents to associate user equipment with the base station with the lower blockage possibility. Simulation results show that the proposed algorithm improves link quality and the blockage rate in the network. Because user association based on only one objective may deteriorate other objectives, we then formulate a biobjective optimization problem that considers the two objectives of load balance and blockage possibility in the network, and we conduct a Lagrangian dual analysis to decrease time complexity. The results show that our solution to the biobjective optimization problem has a better outcome than optimizing each objective alone.

After investigating the user association problem, we look into the problem of maintaining a robust link between a transmitter and a receiver. The directional propagation of the mmW signal creates the opportunity to exploit multipath for a robust link. The main causes of link quality degradation are blockage and link movement. We devise a learning-based prediction framework that uses diffraction values to classify link blockage and link movement efficiently and quickly, so that appropriate mitigating actions can be taken. Simulations show that the framework can predict blockage with close to 90% accuracy, eliminating the need for time-consuming methods to discriminate between link movement and link blockage. After detecting the reason for link degradation, the system needs to perform beam alignment on the updated mmW signal paths, itself a high-overhead task. We propose using signaling in another frequency band to discover the paths surrounding a receiver working in the mmW spectrum; in this way, the receiver does not have to do an expensive beam scan in the mmW band. Our experiments with off-the-shelf devices show that the paths of a non-mmW frequency band can be used to align the beams in the mmW band.

In this dissertation, we provide solutions to fundamental problems in mmW communication. We propose a user association method designed for mmW networks that considers the challenges of the mmW signal, provide a closed-form solution to a biobjective optimization problem that optimizes both blockage and load balance, and show that out-of-band signals can be used efficiently to exploit the multipath created in mmW communication. Future research directions include applying the methods proposed in this dissertation to some of the classic problems of wireless networks as they arise in the mmW spectrum.
- Title
- Variational Bayes inference of Ising models and their applications
- Creator
- Kim, Minwoo
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
Ising models, which originated in statistical physics, have been widely used in modeling spatial data and computer vision problems. However, statistical inference for this model and its application to many practical fields remain challenging due to the intractable nature of the normalizing constant in the likelihood. This dissertation consists of two main themes: (1) parameter estimation of the Ising model and (2) structured variable selection based on the Ising model using variational Bayes (VB).

In Chapter 1, we review the background, research questions, and development of the Ising model, variational Bayes, and other statistical concepts. An Ising model deals with a binary random vector in which each component depends on its neighbors; various versions of the model exist, depending on the parameterization and the neighboring structure. In Chapter 2, for the two-parameter Ising model, we describe a novel procedure for parameter estimation based on VB that is computationally efficient and accurate compared to existing methods. The traditional pseudo maximum likelihood estimate (PMLE) provides accurate results only for a small number of neighbors. A Bayesian approach based on Markov chain Monte Carlo (MCMC) performs better even with a large number of neighbors, but its computational costs are quite expensive in terms of time. Accordingly, we propose a VB method with two variational families: the mean-field (MF) Gaussian family and the bivariate normal (BN) family. Extensive simulation studies validate the efficacy of both families. Using our VB methods, computing times decrease remarkably without deterioration in accuracy, and in some scenarios the output is much more accurate. In addition, we demonstrate theoretical properties of the proposed VB method under the MF family. The main theoretical contribution of our work lies in establishing the consistency of the variational posterior for the Ising model with the true likelihood replaced by the pseudo-likelihood. Under certain conditions, we first derive the rates at which the true posterior based on the pseudo-likelihood concentrates around εn-shrinking neighborhoods of the true parameters. With a suitable bound on the Kullback-Leibler distance between the true and the variational posterior, we next establish the rate of contraction for the variational posterior and demonstrate that it also concentrates around εn-shrinking neighborhoods of the true parameter.

In Chapter 3, we propose a Bayesian variable selection technique for a regression setup in which the regression coefficients hold structural dependency. We employ spike and slab priors on the regression coefficients as follows: (i) to capture the intrinsic structure, we place an Ising prior on latent binary variables, so that if a latent variable takes the value one, the corresponding regression coefficient is active, and otherwise it is inactive; (ii) we place Gaussian priors (the slab) on the active coefficients, while inactive coefficients are zero with probability one (the spike).
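The pseudo-likelihood that the theory above is built around replaces the intractable joint likelihood with a product of per-site conditionals. A sketch for a two-parameter Ising model under one common convention (spins in {-1, +1}, external field a, interaction b, 4-nearest-neighbor grid with periodic boundaries), chosen here for illustration:

```python
import numpy as np

def log_pseudo_likelihood(x, a, b):
    """x: 2D array of +/-1 spins; sum over sites of log P(x_i | neighbors)."""
    nbr = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
           np.roll(x, 1, 1) + np.roll(x, -1, 1))      # periodic neighbor sums
    eta = a + b * nbr                                 # conditional natural parameter
    # P(x_i = s | nbrs) = exp(s * eta) / (2 * cosh(eta)) for s in {-1, +1}
    return np.sum(eta * x - np.log(2 * np.cosh(eta)))

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(32, 32))
print(log_pseudo_likelihood(x, a=0.0, b=0.3))
```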
- Title
- Optimizing and Improving the Fidelity of Reactive, Polarizable Molecular Dynamics Simulations on Modern High Performance Computing Architectures
- Creator
- O'Hearn, Kurt A.
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
Reactive, polarizable molecular dynamics simulations are a crucial tool for the high-fidelity study of large systems with chemical reactions. In support of this, several approaches have been employed with varying degrees of computational cost and physical accuracy. One of the more successful approaches in recent years, the reactive force field (ReaxFF) model, was designed to fill the gap between traditional classical models and quantum mechanical models by incorporating a dynamic bond order potential term. When coupling ReaxFF with dynamic global charge models for electrostatics, special considerations are necessary to obtain highly performant implementations, especially on modern high-performance computing architectures.

In this work, we detail the performance optimization of the PuReMD (PuReMD Reactive Molecular Dynamics) software package, an open-source, GPLv3-licensed implementation of ReaxFF coupled with dynamic charge models. We begin by exploring the tuning of the iterative Krylov linear solvers underpinning the global charge models in a shared-memory parallel context using OpenMP, with the explicit goal of minimizing the mean combined preconditioner and solver time. We find that with appropriate solver tuning, significant speedups and scalability improvements are achieved. Following these successes, we extend these approaches to the solvers in the distributed-memory MPI implementation of PuReMD and broaden the scope of optimization to other portions of the ReaxFF potential, such as the bond order computations. Here again, sizable performance gains are achieved for large simulations numbering in the hundreds of thousands of atoms.

With these performance improvements in hand, we next change focus to another important use of PuReMD -- the development of ReaxFF force fields for new materials. The high fidelity inherent in ReaxFF simulations for different chemistries often comes at the expense of a steep learning curve for parameter optimization, due in part to complexities in the high-dimensional parameter space and in part to the deep domain knowledge needed to adequately control the ReaxFF functional forms. To diagnose and combat these issues, a study was undertaken to optimize parameters for Li-O systems using the OGOLEM genetic algorithms framework coupled with a modified shared-memory version of PuReMD. We found that with careful training set design, sufficient optimization control with tuned genetic algorithms, and improved polarizability through enhanced charge model use, higher accuracy was achieved in simulations involving ductile fracture behavior, a phenomenon heretofore difficult to model correctly.

Finally, we return to performance optimization for the GPU-accelerated distributed-memory PuReMD codebase. Modern supercomputers have recently achieved exascale levels of peak arithmetic rates, due in large part to the design decision to incorporate massive numbers of GPUs. To take advantage of such computing systems, the MPI+CUDA version of PuReMD was re-designed and benchmarked on modern NVIDIA Tesla GPUs. Performance was on par with or exceeded that of LAMMPS Kokkos, a ReaxFF implementation developed at Sandia National Laboratories, with PuReMD typically out-performing LAMMPS Kokkos at larger scales.
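The preconditioned Krylov solve being tuned above can be sketched with SciPy's conjugate gradient and a Jacobi (diagonal) preconditioner. This is a generic illustration of the solver pattern, not PuReMD's charge-model solvers:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

n = 2000
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")  # SPD
b = np.ones(n)

d = A.diagonal()
M = LinearOperator((n, n), matvec=lambda r: r / d)   # Jacobi preconditioner M^-1

x, info = cg(A, b, M=M)                              # info == 0 means converged
print("info:", info, "residual norm:", np.linalg.norm(b - A @ x))
```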
- Title
- VISIONING THE AGRICULTURE BLOCKCHAIN : THE ROLE AND RISE OF BLOCKCHAIN IN THE COMMERCIAL POULTRY INDUSTRY
- Creator
- Fennell, Chris
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
Blockchain is an emerging technology that technologists and industry leaders are exploring as a way to revolutionize the agricultural supply chain. The problem is that human and ecological insights are needed to understand the complexities of how blockchain could fulfill these visions. In this work, I show how the blockchain's promising vision of traceability, immutability, and distributed properties presents both advancements and challenges to rural farming, and I wrestle with the more subtle ways the technology would be integrated into existing infrastructure. Through interviews and participatory design workshops, I talked with an expansive set of stakeholders, including Amish farmers, contract growers, senior leadership, and field supervisors. This research illuminates that commercial poultry farming is such a complex and diffuse system that any overhaul of its core infrastructure will be difficult to "roll back" once blockchain is "rolled out." Through an HCI and sociotechnical systems perspective, drawing particular insights from Science and Technology Studies theories of infrastructure and breakdown, this dissertation asserts three main concerns. First, it uncovers the dominant narratives on the farm around revision and "roll back" of blockchain, connecting to theories of version control from computer science. Second, it uncovers that a core concern of the poultry supply chain is death, and I reveal the sociotechnical and material implications for the integration of blockchain. Finally, it discusses the meaning of "security" for the poultry supply chain, in which biosecurity is prioritized over cybersecurity, and how blockchain impacts these concerns. Together these findings point to significant implications for designers of blockchain infrastructure and for how rural workers will integrate the technology into the supply chain.
- Title
- Computational Frameworks for Indel-Aware Evolutionary Analysis using Large-Scale Genomic Sequence Data
- Creator
- Wang, Wei
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
With the development of sequencing techniques, genetic sequencing data has been extensively used in evolutionary studies. The phylogenetic reconstruction problem, the reconstruction of evolutionary history from biomolecular sequences, is fundamental: the evolutionary relationship between organisms is often represented by a phylogeny, which is a tree or network representation. The most widely used approach for reconstructing phylogenies from sequencing data involves two phases: multiple sequence alignment (MSA) and phylogenetic reconstruction from the aligned sequences. As the amount of biomolecular sequence data increases, it has become a major challenge to develop efficient and accurate computational methods for phylogenetic analyses of large-scale sequencing data. Due to the complexity of the phylogenetic reconstruction problem, traditional sequence-based phylogenetic analysis methods involve many over-simplified assumptions; in this thesis, we describe our contributions toward relaxing some of them.

Insertion and deletion events, referred to as indels, carry much phylogenetic information but are often ignored in the reconstruction of phylogenies. We take indel uncertainties into account in multiple phylogenetic analyses by applying resampling and re-estimation. Another over-simplified assumption we contribute to relaxing is adopted by many commonly used non-parametric algorithms for the resampling of biomolecular sequences: that all sites in an MSA evolve independently and identically distributed (i.i.d.). Many evolutionary events, such as recombination and hybridization, may produce intra-sequence and functional dependence in biomolecular sequences that violates this assumption. We introduce SERES, a resampling algorithm for biomolecular sequences that produces resampled replicates preserving intra-sequence dependence. We describe the application of the SERES resampling and re-estimation approach to two classical problems: multiple sequence alignment support estimation and recombination-aware local genealogical inference. We show that these two statistical inference problems benefit greatly from indel-aware resampling and re-estimation and from the preservation of intra-sequence dependence.

A major drawback of SERES is that it requires parameters to ensure the synchronization of random walks on unaligned sequences. We introduce RAWR, a non-parametric resampling method designed for phylogenetic tree support estimation that does not require extra parameters. We show that RAWR-based resampling and re-estimation produces comparable or typically better performance than the traditional bootstrap approach on the phylogenetic tree support estimation problem.

We further relax the commonly used tree assumption of phylogeny. Evolutionary history is usually considered a tree structure, and evolutionary events that cause reticulated gene flow are ignored. Previous studies show that alignment uncertainty greatly impacts downstream tree inference and learning, yet there is little discussion of the impact of MSA uncertainties on phylogenetic network reconstruction. We show evidence that errors introduced in MSA estimation decrease the accuracy of the inferred phylogenetic network and that an indel-aware reconstruction method is needed for phylogenetic network analysis. In this dissertation, we thus present our contributions to phylogenetic estimation using biomolecular sequence data involving complex evolutionary histories, such as sequence insertion and deletion processes and non-tree-like evolution.
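The i.i.d. assumption being relaxed above is exactly the one made by the classical bootstrap over alignment columns, in which sites are redrawn independently with replacement, destroying intra-sequence dependence. A minimal sketch of that baseline (not SERES or RAWR):

```python
import numpy as np

def bootstrap_msa(msa, rng):
    """msa: (n_sequences, n_sites) array; resample columns i.i.d. with replacement."""
    cols = rng.integers(0, msa.shape[1], size=msa.shape[1])
    return msa[:, cols]

msa = np.array([list("ACGU-ACGU"),
                list("ACGUUACGU"),
                list("AC-UUACGA")])
replicate = bootstrap_msa(msa, np.random.default_rng(0))
print("".join(replicate[0]))   # one resampled replicate of the first sequence
```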
- Title
- Predicting the Properties of Ligands Using Molecular Dynamics and Machine Learning
- Creator
- Donyapour, Nazanin
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
The discovery and design of new drugs requires extensive experimental assays that are usually very expensive and time-consuming. To cut down the cost and time of the drug development process and help design effective drugs more efficiently, various computational methods have been developed that are referred to collectively as in silico drug design. These in silico methods can be used not only to find compounds that bind to a target receptor but also to determine whether compounds show ideal drug-like properties. I have provided solutions to these problems by developing novel methods for molecular simulation and molecular property prediction.

Firstly, we developed a new enhanced-sampling molecular dynamics (MD) algorithm called Resampling of Ensembles by Variation Optimization, or “REVO”, that can generate binding and unbinding pathways of ligand-target interactions. These pathways are useful for calculating transition rates and Residence Times (RT) of protein-ligand complexes, which is particularly useful for drug design: studies of some systems show that drug efficacy correlates more strongly with RT than with binding affinity. The method is generally useful for generating long-timescale transitions in complex systems, including alternate ligand binding poses and protein conformational changes.

Secondly, we developed a technique we refer to as “ClassicalGSG” to predict the partition coefficient (log P) of small molecules. log P is one of the main factors in determining the drug-likeness of a compound, as it helps determine bioavailability, solubility, and membrane permeability. This method has been very successful compared to other methods in the literature.

Finally, we developed a method called “Flexible Topology” that we hope can eventually be used to screen a database of potential ligands while considering ligand-induced conformational changes. After molecules with drug-like properties are discovered in the drug design pipeline, Virtual Screening (VS) methods are employed to perform an extensive search of drug databases with hundreds of millions of compounds to find candidates that bind tightly to a molecular target. For this to be computationally tractable, however, typically only static snapshots of the target are used, which cannot respond to the presence of the drug compound. To efficiently capture drug-target interactions during screening, we developed a machine-learning algorithm that employs MD simulations with a protein of interest and a set of atoms called “Ghost Particles”. During the simulation, the Flexible Topology method induces forces that constantly modify the ghost particles and optimize them toward drug-like molecules that are compatible with the molecular target.
- Title
- UNDERSTANDING THE GENETIC BASIS OF HUMAN DISEASES BY COMPUTATIONALLY MODELING THE LARGE-SCALE GENE REGULATORY NETWORKS
- Creator
- Wang, Hao
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
Many severe diseases are known to be caused by genetic disorders of the human genome, including breast cancer and Alzheimer's disease. Understanding the genetic basis of human diseases plays a vital role in personalized medicine and precision therapy. However, pervasive spatial correlations between disease-associated SNPs have hindered the ability of traditional GWAS studies to discover causal SNPs and have obscured the underlying mechanisms of disease-associated SNPs. Recently, diverse biological datasets generated by large data consortia have provided a unique opportunity to fill the gap between genotypes and phenotypes using biological networks, which represent the complex interplay between genes, enhancers, and transcription factors (TFs) in 3D space. The comprehensive delineation of the regulatory landscape calls for highly scalable computational algorithms to reconstruct 3D chromosome structures and mechanistically predict enhancer-gene links. In this dissertation, I first developed two algorithms, FLAMINGO and tFLAMINGO, to reconstruct high-resolution 3D chromosome structures. The algorithmic advancements of FLAMINGO and tFLAMINGO enable the reconstruction of 3D chromosome structures at an unprecedented resolution from highly sparse chromatin contact maps. I further developed two integrative algorithms, ComMUTE and ProTECT, to mechanistically predict long-range enhancer-gene links by modeling TF profiles. In extensive evaluations, these two algorithms demonstrate superior performance over existing algorithms in predicting enhancer-gene links and decoding TF regulatory grammars. The successful application of ComMUTE and ProTECT across 127 cell types not only provides a rich resource of gene regulatory networks but also sheds light on the mechanistic understanding of QTLs, disease-associated genetic variants, and high-order chromatin interactions.
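The task FLAMINGO addresses, recovering 3D coordinates from a chromatin contact map, has a textbook baseline: convert contact frequencies to distances with a power law and embed with multidimensional scaling. A generic sketch of that baseline under an illustrative exponent, not FLAMINGO's low-rank formulation:

```python
import numpy as np
from sklearn.manifold import MDS

def reconstruct_3d(contacts, alpha=1.0):
    """Convert contact frequencies to distances, then embed with metric MDS."""
    dist = 1.0 / (contacts + 1.0) ** alpha       # pseudocount avoids divide-by-zero
    np.fill_diagonal(dist, 0.0)
    mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
    return mds.fit_transform(dist)

rng = np.random.default_rng(0)
c = rng.poisson(5.0, size=(50, 50)).astype(float)  # stand-in contact map
c = (c + c.T) / 2.0                                # contact maps are symmetric
print(reconstruct_3d(c).shape)                     # (50, 3) coordinates
```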
- Title
- Robust Learning of Deep Neural Networks under Data Corruption
- Creator
- Liu, Boyang
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
Training deep neural networks in the presence of corrupted data is challenging, as the corrupted data points may significantly impact the generalization performance of the models. Unfortunately, the data corruption issue exists widely in many application domains, including, but not limited to, healthcare, environmental sciences, autonomous driving, and social media analytics. Although there have been previous studies aiming to enhance the robustness of machine learning models against data corruption, most of them either lack theoretical robustness guarantees or are unable to scale to the millions of model parameters governing deep neural networks. The goal of this thesis is to design robust machine learning algorithms that 1) effectively deal with different types of data corruption, 2) have sound theoretical guarantees on robustness, and 3) scale to the large number of parameters in deep neural networks.

There are two general approaches to enhancing model robustness against data corruption: the first is to detect and remove the corrupted data, while the second is to design robust learning algorithms that can tolerate some fraction of corrupted data. In this thesis, I developed two robust unsupervised anomaly detection algorithms and two robust supervised learning algorithms for corrupted supervision and backdoor attacks. Specifically, in Chapter 2, I propose the Robust Collaborative Autoencoder (RCA) approach to enhance the robustness of vanilla autoencoder methods against natural corruption. In Chapter 3, I develop Robust RealNVP, a robust density estimation technique for unsupervised anomaly detection tasks given concentrated anomalies. Chapter 4 presents the Provable Robust Learning (PRL) approach, a robust algorithm against agnostic corrupted supervision. In Chapter 5, a meta-algorithm to defend against backdoor attacks is proposed by exploring the connection between label corruption and backdoor data poisoning attacks. Extensive experiments on multiple benchmark datasets demonstrate the robustness of the proposed algorithms under different types of corruption.
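One common way to tolerate a fraction of corrupted labels, related in spirit to, but not the same as, the PRL algorithm above, is to trim the highest-loss samples each step and update only on the rest. A minimal PyTorch sketch of that loss-trimming idea:

```python
import torch
import torch.nn as nn

def robust_step(model, opt, x, y, keep_frac=0.8):
    """One training step that drops the highest-loss (likely corrupted) samples."""
    losses = nn.functional.cross_entropy(model(x), y, reduction="none")
    k = max(1, int(keep_frac * len(losses)))
    trimmed = losses.topk(k, largest=False).values.mean()  # keep low-loss samples
    opt.zero_grad()
    trimmed.backward()
    opt.step()
    return trimmed.item()

model = nn.Linear(10, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(64, 10)
y = torch.randint(0, 3, (64,))
y[:16] = torch.randint(0, 3, (16,))   # simulate corrupted supervision
print(robust_step(model, opt, x, y))
```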