Search results
(1 - 20 of 41)
 Title
 Semi-supervised learning with side information : graph-based approaches
 Creator
 Liu, Yi
 Date
 2007
 Collection
 Electronic Theses & Dissertations
 Title
 Some contributions to semi-supervised learning
 Creator
 Mallapragada, Paven Kumar
 Date
 2010
 Collection
 Electronic Theses & Dissertations
 Title
 Evolution of distributed behavior
 Creator
 Knoester, David B.
 Date
 2011
 Collection
 Electronic Theses & Dissertations
 Description

In this dissertation, we describe a study in the evolution of distributed behavior, where evolutionary algorithms are used to discover behaviors for distributed computing systems. We define distributed behavior as that in which groups of individuals must both cooperate in working towards a common goal and coordinate their activities in a harmonious fashion. As such, communication among individuals is necessarily a key component of distributed behavior, and we have identified three classes of distributed behavior that require communication: data-driven behaviors, where semantically meaningful data is transmitted between individuals; temporal behaviors, which are based on the relative timing of individuals' actions; and structural behaviors, which are responsible for maintaining the underlying communication network connecting individuals. Our results demonstrate that evolutionary algorithms can discover groups of individuals that exhibit each of these different classes of distributed behavior, and that these behaviors can be discovered both in isolation (e.g., evolving a purely data-driven algorithm) and in concert (e.g., evolving an algorithm that includes both data-driven and structural behaviors). As part of this research, we show that evolutionary algorithms can discover novel heuristics for distributed computing, and hint at a new class of distributed algorithm enabled by such studies.

The majority of this research was conducted with the Avida platform for digital evolution, a system that has been proven to aid researchers in understanding the biological process of evolution by natural selection. For this reason, the results presented in this dissertation provide the foundation for future studies that examine how distributed behaviors evolved in nature. The close relationship between evolutionary biology and evolutionary algorithms thus aids our study of evolving algorithms for the next generation of distributed computing systems.
 Title
 Algorithms for deep packet inspection
 Creator
 Patel, Jignesh
 Date
 2012
 Collection
 Electronic Theses & Dissertations
 Description

The core operation in network intrusion detection and prevention systems is Deep Packet Inspection (DPI), in which each security threat is represented as a signature, and the payload of each data packet is matched against the set of current security threat signatures. DPI is also used for other networking applications like advanced QoS mechanisms, protocol identification, etc. In the past, attack signatures were specified as strings, and a great deal of research has been done in string matching for network applications. Today most DPI systems use Regular Expressions (REs) to represent signatures. RE matching is more difficult than string matching, and current string matching solutions don't work well for REs. RE matching for networking applications is difficult for several reasons. First, the DPI application is usually implemented in network devices, which have limited computing resources. Second, as new threats are discovered, the size of the signature set grows over time. Last, the matching needs to be done at network speeds, the growth of which outpaces improvements in computing speed; so there is a need for novel solutions that can deliver higher throughput. So RE matching for DPI is a very important and active research area.

In our research, we investigate the existing methods proposed for RE matching, identify their limitations, and propose new methods to overcome these limitations. RE matching remains a fundamentally challenging problem due to the difficulty in compactly encoding a DFA. While the DFA for any one RE is typically small, the DFA that corresponds to the entire set of REs is usually too large to be constructed or deployed. To address this issue, many alternative automata implementations that compress the size of the final automaton have been proposed. However, previously proposed automata construction algorithms employ a “Union then Minimize” framework where the automata for each RE are first joined before minimization occurs. This leads to expensive minimization on a large automaton, and a large intermediate memory footprint. We propose a “Minimize then Union” framework for constructing compact alternative automata, which minimizes smaller automata first before combining them. This approach requires much less time and memory, allowing us to handle a much larger RE set. Prior hardware-based RE matching algorithms typically use FPGAs. The drawback of FPGAs is that resynthesizing and updating FPGA circuitry to handle RE updates is slow and difficult. We propose the first hardware-based RE matching approach that uses Ternary Content Addressable Memory (TCAM). TCAMs have already been widely used in modern networking devices for tasks such as packet classification, so our solutions can be easily deployed. Our methods support easy RE updates, and we show that we can achieve very high throughput. The main reason combined DFAs for multiple REs grow exponentially in size is replication of states. We developed a new overlay automata model which exploits this replication to compress the size of the DFA. The idea is to group together the replicated DFA structures instead of repeating them multiple times. The result is that we get a final automaton size that is close to that of an NFA (which is linear in the size of the RE set), and simultaneously achieve the fast deterministic matching speed of a DFA.
 Title
 Applying evolutionary computation techniques to address environmental uncertainty in dynamically adaptive systems
 Creator
 Ramirez, Andres J.
 Date
 2013
 Collection
 Electronic Theses & Dissertations
 Description

A dynamically adaptive system (DAS) observes itself and its execution environment at run time to detect conditions that warrant adaptation. If an adaptation is necessary, then a DAS changes its structure and/or behavior to continuously satisfy its requirements, even as its environment changes. It is challenging, however, to systematically and rigorously develop a DAS due to environmental uncertainty. In particular, it is often infeasible for a human to identify all possible combinations of system and environmental conditions that a DAS might encounter throughout its lifetime. Nevertheless, a DAS must continuously satisfy its requirements despite the threat that this uncertainty poses to its adaptation capabilities. This dissertation proposes a model-based framework that supports the specification, monitoring, and dynamic reconfiguration of a DAS to explicitly address uncertainty. The proposed framework uses goal-oriented requirements models and evolutionary computation techniques to derive and fine-tune utility functions for requirements monitoring in a DAS, identify combinations of system and environmental conditions that adversely affect the behavior of a DAS, and generate adaptations on demand to transition the DAS to a target system configuration while preserving system consistency. We demonstrate the capabilities of our model-based framework by applying it to an industrial case study involving a remote data mirroring network that efficiently distributes data even as network links fail and messages are dropped, corrupted, and delayed.
 Title
 Noncoding RNA identification in large-scale genomic data
 Creator
 Yuan, Cheng
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Noncoding RNAs (ncRNAs), which function directly as RNAs without translating into proteins, play diverse and important biological roles. ncRNAs function not only through their primary structures, but also through their secondary structures, which are defined by interactions between Watson-Crick and wobble base pairs. Common types of ncRNA include microRNA, rRNA, snoRNA, and tRNA. Functions of ncRNAs vary among different types. Recent studies suggest the existence of a large number of ncRNA genes. Identification of novel and known ncRNAs becomes increasingly important in order to understand their functionalities and the underlying communities.

Next-generation sequencing (NGS) technology sheds light on more comprehensive and sensitive ncRNA annotation. Lowly transcribed ncRNAs or ncRNAs from rare species with low abundance may be identified via deep sequencing. However, there exist several challenges in ncRNA identification in large-scale genomic data. First, the massive volume of datasets could lead to very long computation time, making existing algorithms infeasible. Second, NGS has a relatively high error rate, which could further complicate the problem. Third, high sequence similarity among related ncRNAs could make them difficult to identify, resulting in incorrect output. Fourth, while secondary structures should be adopted for accurate ncRNA identification, they usually incur high computational complexity. In particular, some ncRNAs contain pseudoknot structures, which cannot be effectively modeled by the state-of-the-art approach. As a result, ncRNAs containing pseudoknots are hard to annotate.

In my PhD work, I aimed to tackle the above challenges in ncRNA identification. First, I designed a progressive search pipeline to identify ncRNAs containing pseudoknot structures. The algorithms are more efficient than the state-of-the-art approaches and can be used for large-scale data. Second, I designed an ncRNA classification tool for short reads in NGS data lacking quality reference genomes. The initial homology search phase significantly reduces the size of the original input, making the tool feasible for large-scale data. Last, I focused on identifying 16S ribosomal RNAs from NGS data. 16S ribosomal RNAs are a very important type of ncRNA, which can be used for phylogenetic study. A set of graph-based assembly algorithms were applied to form longer or full-length 16S rRNA contigs. I utilized paired-end information in NGS data, so lowly abundant 16S genes can also be identified. To reduce the complexity of the problem and make the tool practical for large-scale data, I designed a list of error correction and graph reduction techniques for graph simplification.
 Title
 Multiple kernel and multi-label learning for image categorization
 Creator
 Bucak, Serhat Selçuk
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

"One crucial step towards the goal of converting large image collections to useful information sources is image categorization. The goal of image categorization is to find the relevant labels for a given image from a closed set of labels. Despite the huge interest and significant contributions by the research community, there remains much room for improvement in the image categorization task. In this dissertation, we develop efficient multiple kernel learning and multi-label learning algorithms with high prediction performance for image categorization..." -- Abstract.
 Title
 Finding optimized bounding boxes of polytopes in d-dimensional space and their properties in k-dimensional projections
 Creator
 Shahid, Salman (Of Michigan State University)
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

Using minimal bounding boxes to encapsulate or approximate a set of points in d-dimensional space is a nontrivial problem that has applications in a variety of fields including collision detection, object rendering, high-dimensional databases and statistical analysis, to name a few. While a significant amount of work has been done on the three-dimensional variant of the problem (i.e., finding the minimum volume bounding box of a set of points in three dimensions), it is difficult to find a simple method to do the same for higher dimensions. Even in three dimensions, existing methods suffer from either high time complexity or suboptimal results with a speedup in execution time. In this thesis we present a new approach to find the optimized minimum bounding boxes of a set of points defining convex polytopes in d-dimensional space. The solution also gives the optimal bounding box in three dimensions with a much simpler implementation while significantly speeding up the execution time for a large number of vertices. The basis of the proposed approach is a series of unique properties of the k-dimensional projections that are leveraged into an algorithm. This algorithm works by constructing the convex hulls of a given set of points and optimizing the projections of those hulls in two-dimensional space using the new concept of Simultaneous Local Optimal. We show that the proposed algorithm provides significantly better performance than the current state-of-the-art approach on the basis of time and accuracy. To illustrate the importance of the result in terms of a real-world application, the optimized bounding box algorithm is used to develop a method for carrying out range queries in high-dimensional databases. This method uses data transformation techniques in conjunction with a set of heuristics to provide significant performance improvement.
 Title
 Example-Based Parameterization of Linear Blend Skinning for Skinning Decomposition (EPLBS)
 Creator
 Hopkins, Kayra M.
 Date
 2017
 Collection
 Electronic Theses & Dissertations
 Description

This thesis presents Example-based Parameterization of Linear Blend Skinning for Skinning Decomposition (EPLBS), a unified and robust method for using example data to simplify and improve the development and parameterization of high-quality 3D models for animation. Animation and three-dimensional (3D) computer graphics have quickly become a popular medium for education, entertainment and scientific simulation. In addition to film, gaming and research applications, recent advancements in augmented reality (AR) and virtual reality (VR) are driving additional demand for 3D content. However, the success of graphics in these arenas depends greatly on the efficiency of model creation and the realism of the animation or 3D image.

A common method for figure animation is skeletal animation using linear blend skinning (LBS). In this method, vertices are deformed based on a weighted sum of displacements due to an embedded skeleton. This research addresses the problem that LBS animation parameter computation, including determining the rig (the skeletal structure), identifying influence bones (which bones influence which vertices), and assigning skinning weights (amounts of influence a bone has on a vertex), is a tedious process that is difficult to get right. Even the most skilled animators must work tirelessly to design an effective character model and often find themselves repeatedly correcting flaws in the parameterization. Significant research, including the use of example data, has focused on simplifying and automating individual components of the LBS deformation process and increasing the quality of resulting animations. However, constraints on LBS animation parameters make automated analytic computation of the values as challenging as traditional 3D animation methods. Skinning decomposition is one such method of computing LBS animation parameters from example data. Skinning decomposition challenges include constraint adherence and computationally efficient determination of LBS parameters.

The EPLBS method presented in this thesis utilizes example data as input to a least-squares nonlinear optimization process. Given a model as a set of example poses captured from scan data or manually created, EPLBS institutes a single optimization equation that allows for simultaneous computation of all animation parameters for the model. An iterative clustering methodology is used to construct an initial parameterization estimate for this model, which is then subjected to nonlinear optimization to improve the fitting to the example data. Simultaneous optimization of weights and joint transformations is complicated by a wide range of differing constraints and parameter interdependencies. To address interdependent and conflicting constraints, parameter mapping solutions are presented that map the constraints to an alternative domain more suitable for nonlinear minimization. The presented research is a comprehensive, data-driven solution for automatically determining skeletal structure, influence bones and skinning weights from a set of example data. Results are presented for a range of models that demonstrate the effectiveness of the method.
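As an illustration of the "weighted sum" deformation that the abstract attributes to linear blend skinning, here is a minimal sketch. It is not taken from the thesis: it works in 2D for brevity, the function names (`bone_transform`, `apply_transform`, `skin_vertex`) are hypothetical, and it assumes the common convention that a vertex's skinning weights are non-negative and sum to 1.

```python
import math

def bone_transform(angle, tx, ty):
    """Hypothetical helper: a 2D rigid bone transform (rotation, then translation)."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, tx), (s, c, ty))

def apply_transform(transform, vertex):
    """Apply one bone's rigid transform to a 2D vertex."""
    x, y = vertex
    return (
        transform[0][0] * x + transform[0][1] * y + transform[0][2],
        transform[1][0] * x + transform[1][1] * y + transform[1][2],
    )

def skin_vertex(vertex, weights, transforms):
    """Linear blend skinning: weighted sum of the vertex as moved by each bone.

    Assumes the weights are non-negative and sum to 1 (illustrative convention).
    """
    x = y = 0.0
    for weight, transform in zip(weights, transforms):
        px, py = apply_transform(transform, vertex)
        x += weight * px
        y += weight * py
    return (x, y)

# A vertex influenced equally by a stationary bone and a bone shifted 2 units in x.
rest_bone = bone_transform(0.0, 0.0, 0.0)
moved_bone = bone_transform(0.0, 2.0, 0.0)
print(skin_vertex((1.0, 1.0), [0.5, 0.5], [rest_bone, moved_bone]))
```

In this sketch the skinning weights and the per-bone transforms are given as inputs; the abstract describes the harder inverse problem of recovering them from example poses.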
 Title
 Gender-related effects of advanced placement computer science courses on self-efficacy, belongingness, and persistence
 Creator
 Good, Jonathon Andrew
 Date
 2018
 Collection
 Electronic Theses & Dissertations
 Description

The underrepresentation of women in computer science has been a concern of educators for multiple decades. The low representation of women in computer science is a pattern from K-12 schools through the university level and profession. One of the purposes of the introduction of the Advanced Placement Computer Science Principles (APCSP) course in 2016 was to help broaden participation in computer science at the high school level. The design of APCSP allowed teachers to present computer science from a broad perspective, allowing students to pursue problems of personal significance, and allowing for computing projects to take a variety of forms. The nationwide enrollment statistics for Advanced Placement Computer Science Principles in 2017 had a higher proportion of female students (30.7%) than Advanced Placement Computer Science A (23.6%) courses. However, it is unknown to what degree enrollment in these courses was related to students’ plans to enroll in future computer science courses. This correlational study examined how students’ enrollment in Advanced Placement Computer Science courses, along with student gender, predicted students’ sense of computing self-efficacy, belongingness, and expected persistence in computer science. A nationwide sample of 263 students from 10 APCSP and 10 APCSA courses participated in the study. Students completed pre- and post-surveys at the beginning and end of their Fall 2017 semester regarding their computing self-efficacy, belongingness, and plans to continue in computer science studies. Using hierarchical linear modeling analysis due to the nested nature of the data within class sections, the researcher found that the APCS course type was not predictive of self-efficacy, belongingness, or expectations to persist in computer science. The results suggested that female students’ self-efficacy declined over the course of the study. However, gender was not predictive of belongingness or expectations to persist in computer science. Students were found to have entered into both courses with a high sense of self-efficacy, belongingness, and expectation to persist in computer science.

The results from this study suggest that students enrolled in both Advanced Placement Computer Science courses are already likely to pursue computer science. I also found that the type of APCS course in which students enroll does not relate to students’ interest in computer science. This suggests that educators should look beyond AP courses as a method of exposing students to computer science, possibly through efforts such as computational thinking and cross-curricular uses of computer science concepts and practices. Educators and administrators should also continue to examine whether there are structural biases in how students are directed to computer science courses. As for the drop in self-efficacy related to gender, this is in alignment with previous research suggesting that educators should carefully scaffold students’ initial experiences in the course to not negatively influence their self-efficacy. Further research should examine how specific pedagogical practices could influence students’ persistence, as the designation and curriculum of APCSA or APCSP alone may not capture the myriad of ways in which teachers may be addressing gender inequity in their classrooms. Research can also examine how student interest in computer science is affected at an earlier age, as the APCS courses may be reaching students after they have already formed their opinions about computer science as a field.
 Title
 Sequence learning with side information : modeling and applications
 Creator
 Wang, Zhiwei
 Date
 2020
 Collection
 Electronic Theses & Dissertations
 Description

Sequential data is ubiquitous and modeling sequential data has been one of the most longstanding computer science problems. The goal of sequence modeling is to represent a sequence with a low-dimensional dense vector that incorporates as much information as possible. A fundamental type of information contained in sequences is the sequential dependency and a large body of research has been devoted to designing effective ways to capture it. Recently, sequence learning models such as recurrent neural networks (RNNs), temporal convolutional networks, and the Transformer have gained tremendous popularity in modeling sequential data. Equipped with effective structures such as gating mechanisms, large receptive fields, and attention mechanisms, these models have achieved great success in many applications across a wide range of fields.

However, besides the sequential dependency, sequences also exhibit side information that remains underexplored. Thus, in this thesis, we study the problem of sequence learning with side information. Specifically, we present our efforts devoted to building sequence learning models to effectively and efficiently capture side information that is commonly seen in sequential data. In addition, we show that side information can play an important role in sequence learning tasks as it can provide rich information that is complementary to the sequential dependency. More importantly, we apply our proposed models in various real-world applications and have achieved promising results.
 Title
 DIGITAL IMAGE FORENSICS IN THE CONTEXT OF BIOMETRICS
 Creator
 Banerjee, Sudipta
 Date
 2020
 Collection
 Electronic Theses & Dissertations
 Description

Digital image forensics entails the deduction of the origin, history and authenticity of a digital image. While a number of powerful techniques have been developed for this purpose, much of the focus has been on images depicting natural scenes and generic objects. In this thesis, we direct our focus on biometric images, viz., iris, ocular and face images.

Firstly, we assess the viability of using existing sensor identification schemes developed for visible spectrum images on near-infrared (NIR) iris and ocular images. These schemes are based on estimating the multiplicative sensor noise that is embedded in an input image. Further, we conduct a study analyzing the impact of photometric modifications on the robustness of the schemes. Secondly, we develop a method for sensor de-identification, where the sensor noise in an image is suppressed but its biometric utility is retained. This enhances privacy by unlinking an image from its camera sensor and, subsequently, the owner of the camera. Thirdly, we develop methods for constructing an image phylogeny tree from a set of near-duplicate images. An image phylogeny tree captures the relationship between subtly modified images by computing a directed acyclic graph that depicts the sequence in which the images were modified. Our primary contribution in this regard is the use of complex basis functions to model any arbitrary transformation between a pair of images and the design of a likelihood-ratio-based framework for determining the original and modified image in the pair. We are currently integrating a graph-based deep learning approach with sensor-specific information to refine and improve the performance of the proposed image phylogeny algorithm.
 Title
 Achieving reliable distributed systems : through efficient runtime monitoring and predicate detection
 Creator
 Tekken Valapil, Vidhya
 Date
 2020
 Collection
 Electronic Theses & Dissertations
 Description

Runtime monitoring of distributed systems to perform predicate detection is a critical as well as challenging task. It is critical because it ensures the reliability of the system by detecting all possible violations of system requirements. It is challenging because to guarantee lack of violations one has to analyze every possible ordering of system events and this is an expensive task. In this report, we focus on ordering events in a system run using HLC (Hybrid Logical Clocks) timestamps, which are O(1)-sized timestamps, and present some efficient algorithms to perform predicate detection using HLC. Since, with HLC, the runtime monitor cannot find all possible orderings of system events, we present a new type of clock called Biased Hybrid Logical Clocks (BHLC), which are capable of finding more possible orderings than HLC. Thus we show that BHLC-based predicate detection can find more violations than HLC-based predicate detection. Since predicate detection based on either HLC or BHLC does not guarantee detection of all possible violations in a system run, we present an SMT (Satisfiability Modulo Theories) solver-based predicate detection approach that guarantees the detection of all possible violations in a system run. While a runtime monitor that performs predicate detection using SMT solvers is accurate, the time taken by the solver to detect the presence or absence of a violation can be high. To reduce the time taken by the runtime monitor, we propose the use of an efficient two-layered monitoring approach, where the first layer of the monitor is efficient but less accurate and the second layer is accurate but less efficient. Together they reduce the overall time taken to perform predicate detection drastically and also guarantee detection of all possible violations.
 Title
 Fast edit distance calculation methods for NGS sequence similarity
 Creator
 Islam, A. K. M. Tauhidul
 Date
 2020
 Collection
 Electronic Theses & Dissertations
 Description

Sequence fragments generated from targeted regions of phylogenetic marker genes provide valuable insight into identifying and classifying organisms and inferring taxonomic hierarchies. In recent years, significant development in targeted gene fragment sequencing through Next Generation Sequencing (NGS) technologies has increased the necessity of efficient sequence similarity computation methods for very large numbers of pairs of NGS sequences. The edit distance has been widely used to determine the dissimilarity between pairs of strings. All the known methods for the edit distance calculation run in near-quadratic time with respect to string lengths, and it may take days or weeks to compute distances between such large numbers of pairs of NGS sequences. To solve the performance bottleneck problem, faster edit distance approximation and bounded edit distance calculation methods have been proposed. Despite these efforts, the existing edit distance calculation methods are not fast enough when computing larger numbers of pairs of NGS sequences. In order to further reduce the computation time, many NGS sequence similarity methods have been proposed using matching k-mers. These methods extract all possible k-mers from NGS sequences and compare similarity between pairs of sequences based on the shared k-mers. However, these methods reduce the computation time at the cost of accuracy. In this dissertation, our goal is to compute NGS sequence similarity using edit-distance-based methods while reducing the computation time. We propose a few edit distance prediction methods using dataset-independent reference sequences that are distant from each other. These reference sequences convert sequences in datasets into feature vectors by computing edit distances between the sequence and each of the reference sequences. Given sequences A, B and a reference sequence r, the edit distance satisfies ed(A, B) ≥ |ed(A, r) − ed(B, r)|.
Since the reference sequences differ significantly from one another, with a sufficiently large number of reference sequences and a high similarity threshold, the differences between the edit distances of A and B with respect to the reference sequences are close to ed(A, B). Using this property, we predict edit distances in the vector space based on the Euclidean distances and the Chebyshev distances. Further, we develop a small set of deterministically generated reference sequences with maximum distance between each of them to predict higher edit distances more efficiently. This method predicts edit distances between corresponding subsequences separately and then merges the partial distances to predict the edit distances between the entire sequences. The computational complexity of this method is linear with respect to sequence length. The proposed edit distance prediction methods are significantly faster while achieving very good accuracy for high similarity thresholds. We have also shown the effectiveness of these methods on agglomerative hierarchical clustering. We also propose an efficient bounded exact edit distance calculation method using the trace [1]. For a given edit distance threshold d, only letters up to d positions apart can be part of an edit operation. Hence, we generate pairs of subsequences up to length difference d so that no edit operation is spilled over to the adjacent pairs of subsequences. Then we compute the trace cost in such a way that the number of matching letters between the subsequences is maximized. This technique does not guarantee locally optimal edit distance; however, it guarantees globally optimal edit distance between the entire sequences for distances up to d. The bounded exact edit distance calculation method is an order of magnitude faster than the dynamic programming edit distance calculation method.
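The reference-sequence idea in this abstract can be illustrated with a small Python sketch. It maps each sequence to a vector of edit distances to a few reference sequences and uses the Chebyshev distance between two such vectors, which by the triangle inequality never exceeds the true edit distance. The function names, the toy reference sequences in the usage note, and the textbook dynamic-programming `edit_distance` are assumptions for illustration, not the dissertation's actual prediction method.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Textbook Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def to_features(seq: str, refs: Sequence[str]) -> List[int]:
    """Feature vector: edit distance from seq to each reference sequence."""
    return [edit_distance(seq, r) for r in refs]


def chebyshev(u: Sequence[int], v: Sequence[int]) -> int:
    """Largest coordinate-wise difference between two feature vectors."""
    return max(abs(x - y) for x, y in zip(u, v))
```

For example, with hypothetical references `["ACGTACGT", "TTTTTTTT", "GGGGCCCC"]`, calling `chebyshev(to_features(a, refs), to_features(b, refs))` gives a lower bound on `edit_distance(a, b)` for any two sequences `a` and `b`. Once the feature vectors are precomputed, each comparison costs time proportional to the number of references rather than to the product of the sequence lengths.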
 Title
 Semi-Adversarial Networks for Imparting Demographic Privacy to Face Images
 Creator
 Mirjalili, Vahid
 Date
 2020
 Collection
 Electronic Theses & Dissertations
 Description

Face recognition systems are being widely used in a number of applications ranging from user authentication in handheld devices to identifying people of interest from surveillance videos. In several such applications, face images are stored in a central database. In such cases, it is necessary to ensure that the stored face images are used for the stated purpose and not for any other purposes. For example, advanced machine learning methods can be used to automatically extract age, gender, race and so on from the stored face images. These cues are often referred to as demographic attributes. When such attributes are extracted without the consent of individuals, it can lead to potential violation of privacy. Indeed, the European Union's General Data Protection Regulation (GDPR) requires the primary purpose of data collection to be declared to individuals prior to data collection. GDPR strictly prohibits the use of this data for any purpose beyond what was stated. In this thesis, we consider this type of regulation and develop methods for enhancing the privacy accorded to face images with respect to the automatic extraction of demographic attributes. In particular, we design algorithms that modify input face images such that certain specified demographic attributes cannot be reliably extracted from them. At the same time, the biometric utility of the images is retained, i.e., the modified face images can still be used for matching purposes. The primary objective of this research is not necessarily to fool human observers, but rather to prevent machine learning methods from automatically extracting such information. The following are the contributions of this thesis. First, we design a convolutional autoencoder known as a semi-adversarial neural network, or SAN, that perturbs input face images such that they are adversarial with respect to an attribute classifier (e.g., gender classifier) while still retaining their utility with respect to a face matcher.
Second, we develop techniques to ensure that the adversarial outputs produced by the SAN are generalizable across multiple attribute classifiers, including those that may not have been used during the training phase. Third, we extend the SAN architecture and develop a neural network known as PrivacyNet that can be used for imparting multi-attribute privacy to face images. Fourth, we conduct extensive experimental analysis using several face image datasets to evaluate the performance of the proposed methods as well as visualize the perturbations induced by the methods. Results suggest the benefits of using semi-adversarial networks to impart privacy to face images while still retaining the biometric utility of the ensuing face images.
 Title
 I AM DOING MORE THAN CODING : A QUALITATIVE STUDY OF BLACK WOMEN HBCU UNDERGRADUATES’ PERSISTENCE IN COMPUTING
 Creator
 Benton, Amber V.
 Date
 2020
 Collection
 Electronic Theses & Dissertations
 Description

The purpose of my study is to explore why and how Black women undergraduates at historically Black colleges and universities (HBCUs) persist in computing. By centering the experiences of Black women undergraduates and their stories, this dissertation expands traditional, dominant ways of understanding student persistence in higher education. Critical Race Feminism (CRF) was applied as a conceptual framework to the stories of 11 Black women undergraduates in computing, and the small stories qualitative approach was used to examine the day-to-day experiences of Black women undergraduates at HBCUs as they persisted in their computing degree programs. The findings suggest that: (a) gender underrepresentation in computing affects Black women’s experiences, (b) computing culture at HBCUs directly affects Black women in computing, (c) Black women need access to resources and opportunities to persist in computing, (d) computing-related internships are beneficial professional opportunities but are also sites of gendered racism for Black women, (e) connectedness between Black people is innate but also needs to be fostered, (f) Black women want to engage in computing that contributes to social impact and community uplift, and (g) science identity is not a primary identity for Black women in computing. This paper also argues that disciplinary-focused efforts contribute to the persistence of Black women in computing.
 Title
 Dissertation : novel parallel algorithms and performance optimization techniques for the multilevel fast multipole algorithm
 Creator
 Lingg, Michael
 Date
 2020
 Collection
 Electronic Theses & Dissertations
 Description

Since Sir Isaac Newton determined that characterizing orbits of celestial objects required considering the gravitational interactions among all bodies in the system, the N-body problem has been a very important tool in physics simulations. Expanding on the early use of the classical N-body problem for gravitational simulations, the method has proven invaluable in fluid dynamics, molecular simulations and data analytics. The extension of the classical N-body problem to solve the Helmholtz equation for groups of particles with oscillatory interactions has allowed for simulations that assist in antenna design, radar cross section prediction, reduction of engine noise, and medical devices that utilize sound waves, to name a sample of possible applications. While N-body simulations are extremely valuable, the computational cost of directly evaluating interactions among all pairs grows quadratically with the number of particles, rendering large-scale simulations infeasible even on the most powerful supercomputers. The Fast Multipole Method (FMM) and the broader class of tree algorithms that it belongs to have significantly reduced the computational complexity of N-body simulations, while providing controllable accuracy guarantees. While FMM provided a significant boost, N-body problems tackled by scientists and engineers continue to grow larger in size, necessitating the development of efficient parallel algorithms and implementations to run on supercomputers. The Laplace variant of FMM, which is used to treat the classical N-body problem, has been extensively researched and optimized to the extent that Laplace FMM codes can scale to tens of thousands of processors for simulations involving over a trillion particles. In contrast, the Multilevel Fast Multipole Algorithm (MLFMA), which targets the Helmholtz kernel variant of FMM, lags significantly behind in efficiency and scaling.
The added complexity of an oscillatory potential results in much more intricate data dependency patterns and load balancing requirements among parallel processes, making algorithms and optimizations developed for Laplace FMM mostly ineffective for MLFMA. In this thesis, we propose novel parallel algorithms and performance optimization techniques to improve the performance of MLFMA on modern computer architectures. Proposed algorithms and performance optimizations range from efficient leveraging of the memory hierarchy on multicore processors, to an investigation of the benefits of the emerging concept of task parallelism for MLFMA, to significant reductions of communication overheads and load imbalances in large-scale computations. Parallel algorithms for distributed memory parallel MLFMA are also accompanied by detailed complexity analyses and performance models. We describe efficient implementations of all proposed algorithms and optimization techniques, and analyze their impact in detail. In particular, we show that our work yields significant speedups and much improved scalability compared to existing methods for MLFMA in large geometries designed to test the range of the problem space, as well as in real-world problems.
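To make the quadratic cost of direct evaluation concrete, here is a minimal Python sketch of brute-force pairwise summation for the classical (Laplace, 1/r) kernel. The function name `direct_potentials` and the unit-free setup are assumptions for illustration. It shows only the baseline that FMM-style methods are designed to avoid, not FMM or MLFMA themselves.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def direct_potentials(
    points: List[Point], charges: List[float]
) -> Tuple[List[float], int]:
    """Return per-particle 1/r potentials and the number of pairs evaluated."""
    n = len(points)
    phi = [0.0] * n
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(points[i], points[j])  # assumes distinct points
            phi[i] += charges[j] / r  # contribution of j at i
            phi[j] += charges[i] / r  # symmetric contribution of i at j
            pairs += 1
    return phi, pairs
```

The pair counter equals N(N-1)/2, so doubling the number of particles roughly quadruples the work.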
 Title
 Quantitative methods for calibrated spatial measurements of laryngeal phonatory mechanisms
 Creator
 Ghasemzadeh, Hamzeh
 Date
 2020
 Collection
 Electronic Theses & Dissertations
 Description

The ability to perform measurements is an important cornerstone and the prerequisite of any quantitative research. Measurements allow us to quantify inputs and outputs of a system, and then to express their relationships using concise mathematical expressions and models. Those models would then enable us to understand how a target system works and to predict its output for changes in the system parameters. Conversely, models would enable us to determine the proper parameters of a system for achieving a certain output. Putting these in the context of voice science research, variations in the parameters of the phonatory system could be attributed to individual differences. Thus, accurate models would enable us to account for individual differences during the diagnosis and to make reliable predictions about the likely outcome of different treatment options. Analysis of vibration of the vocal folds using high-speed videoendoscopy (HSV) could be an ideal candidate for constructing computational models. However, conventional images are not spatially calibrated and cannot be used for absolute spatial measurements. This dissertation is focused on developing the required methodologies for calibrated spatial measurements from in vivo HSV recordings. Specifically, two different approaches for calibrated horizontal measurements of HSV images are presented. The first approach is called the indirect approach, and it is based on the registration of a specific attribute of a common object (e.g. size of a lesion) from a calibrated intraoperative still image to its corresponding non-calibrated in vivo HSV recording. This approach does not require specialized instruments and can be implemented in many clinical settings. However, its validity depends on a couple of assumptions. Violation of those assumptions could lead to significant measurement errors. The second approach is called the direct approach, and it is based on a laser-projection flexible fiberoptic endoscope.
This approach would enable us to make accurate calibrated spatial measurements. This dissertation evaluates the accuracy of the first approach indirectly, by studying its underlying fundamental assumptions. The accuracy of the second approach, in contrast, is evaluated directly, using benchtop experiments with different surfaces, different working distances, and different imaging angles. The main contributions of this dissertation are the following: (1) a formal treatment of indirect horizontal calibration is presented, and the assumptions governing its validity and reliability are discussed. A battery of tests is presented that can indirectly assess the validity of those assumptions in laryngeal imaging applications; (2) pre- and post-surgery recordings from patients with vocal fold mass lesions are used as a testbench for the developed indirect calibration approach. In that regard, a full solution is developed for measuring the calibrated velocity of the vocal folds. The developed solution is then used to investigate post-surgery changes in the closing velocity of the vocal folds from patients with vocal fold mass lesions; (3) the method for calibrated vertical measurement from a laser-projection fiberoptic flexible endoscope is developed. The developed method is evaluated at different working distances, different imaging angles, and on a 3D surface; (4) a detailed analysis and investigation of nonlinear image distortion of a fiberoptic flexible endoscope is presented. The effect of imaging angle and spatial location of an object on the magnitude of that distortion is studied and quantified; (5) the method for calibrated horizontal measurement from a laser-projection fiberoptic flexible endoscope is developed. The developed method is evaluated at different working distances, different imaging angles, and on a 3D surface.
 Title
 Network analysis with negative links
 Creator
 Derr, Tyler Scott
 Date
 2020
 Collection
 Electronic Theses & Dissertations
 Description

As we rapidly continue into the information age, the rate at which data is produced has created an unprecedented demand for novel methods to effectively extract insightful patterns. We can then seek to understand the past, make predictions about the future, and ultimately take actionable steps towards improving our society. Because much of today's big data can be represented as graphs, emphasis is being placed on harnessing the natural structure of data through network analysis. Traditionally, network analysis has focused on networks having only positive links, or unsigned networks. However, in many real-world systems, relations between nodes in a graph can be both positive and negative, giving rise to signed networks. For example, in online social media, users not only have positive links such as friends, followers, and those they trust, but can also establish negative links to those they distrust, towards their foes, or by blocking and unfriending users. Thus, although signed networks are ubiquitous due to their ability to represent negative links in addition to positive links, they have been significantly underexplored. In addition, the rise in popularity of today's social media and increased polarization online have led to both increased attention to and demand for advanced methods that perform the typical network analysis tasks while also taking negative links into consideration. More specifically, there is a need for methods that can measure, model, mine, and apply signed networks that harness both these positive and negative relations.
However, this raises novel challenges, as the properties and principles of negative links are not necessarily the same as those of positive links, and furthermore the social theories that have been used in unsigned networks might not apply with the inclusion of negative links. The chief objective of this dissertation is first to analyze the distinct properties of negative links as compared to positive links, and then to improve network analysis with negative links by researching the utility of established social theories and how to harness them in a holistic view of networks containing both positive and negative links. We discover that simply extending unsigned network analysis is typically not sufficient and that although the existence of negative links introduces numerous challenges, they also provide unprecedented opportunities for advancing the frontier of the network analysis domain. In particular, we develop advanced methods in signed networks for measuring node relevance and centrality (i.e., signed network measuring), present the first generative signed network model and extend/analyze balance theory to signed bipartite networks (i.e., signed network modeling), construct the first signed graph convolutional network, which learns node representations that can achieve state-of-the-art prediction performance, and furthermore introduce the novel idea of transformation-based network embedding (i.e., signed network mining), and apply signed networks by creating a framework that can infer both link and interaction polarity levels in online social media and constructing an advanced comprehensive congressional vote prediction framework built around harnessing signed networks.
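Balance theory, one of the social theories named in this abstract, is commonly stated for triangles: a triangle is balanced when the product of its three edge signs is positive. The Python sketch below counts balanced and unbalanced triangles in a small signed graph. The edge-dictionary representation and the function name `count_triangles` are assumptions for illustration and do not reflect the dissertation's own models or its bipartite extension.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple


def count_triangles(signs: Dict[FrozenSet[str], int]) -> Tuple[int, int]:
    """Return (balanced, unbalanced) triangle counts for a signed graph.

    `signs` maps an undirected edge, stored as a frozenset of two node
    names, to +1 (positive link) or -1 (negative link).
    """
    nodes = sorted({node for edge in signs for node in edge})
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(edge in signs for edge in edges):
            product = signs[edges[0]] * signs[edges[1]] * signs[edges[2]]
            if product > 0:
                balanced += 1  # e.g. "the enemy of my enemy is my friend"
            else:
                unbalanced += 1
    return balanced, unbalanced
```

This brute-force enumeration checks every node triple, so it is only suitable for toy graphs. It is meant to show the sign-product rule rather than an efficient algorithm.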
 Title
 Discrete de Rham-Hodge Theory
 Creator
 Zhao, Rundong
 Date
 2020
 Collection
 Electronic Theses & Dissertations
 Description

We present a systematic treatment of 3D shape analysis based on the well-established de Rham-Hodge theory in differential geometry and topology. The computational tools we developed are widely applicable to research areas such as computer graphics, computer vision, and computational biology. We extensively tested them in the context of 3D structure analysis of biological macromolecules to demonstrate the efficacy and efficiency of our method in potential applications. Our contributions are summarized in the following aspects. First, we present a compendium of discrete Hodge decompositions of vector fields, which provides the primary building block of the de Rham-Hodge theory for computations performed on the commonly used tetrahedral meshes embedded in the 3D Euclidean space. Second, we present a real-world application of the above computational tool to 3D shape analysis on biological macromolecules. Finally, we extend the above method to an evolutionary de Rham-Hodge method to provide a unified paradigm for the multiscale geometric and topological analysis of evolving manifolds constructed from a filtration, which induces a family of evolutionary de Rham complexes. Our work on the decomposition of vector fields, spectral shape analysis on static shapes, and evolving shapes has already shown its effectiveness in biomolecular applications and will lead to a rich set of features for machine-learning-based shape analysis currently under development.