Search results
(1 - 20 of 34)
- Title
- Modeling physical causality of action verbs for grounded language understanding
- Creator
- Gao, Qiaozi
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
Building systems that can understand and communicate through human natural language is one of the ultimate goals in AI. Decades of natural language processing research have mainly focused on learning from large amounts of language corpora. However, human communication relies on a significant amount of unverbalized information, often referred to as commonsense knowledge. This type of knowledge allows us to understand each other's intentions, to connect language with concepts in the world, and to make inferences based on what we hear or read. Commonsense knowledge is generally shared among cognitively capable individuals, so it is rarely stated in human language. This makes it very difficult for artificial agents to acquire commonsense knowledge from language corpora. To address this problem, this dissertation investigates the acquisition of commonsense knowledge, especially knowledge related to basic actions upon the physical world, and how that influences language processing and grounding. Linguistic studies have shown that action verbs often denote some change of state (CoS) as the result of an action. For example, the result of "slice a pizza" is that the state of the object (pizza) changes from one big piece to several smaller pieces. However, the causality of action verbs and its potential connection with the physical world has not been systematically explored. Artificial agents often do not have this kind of basic commonsense causality knowledge, which makes it difficult for them to work with humans and to reason, learn, and perform actions. To address this problem, this dissertation models dimensions of physical causality associated with common action verbs. Based on such modeling, several approaches are developed to incorporate causality knowledge into language grounding, visual causality reasoning, and commonsense story comprehension.
- Title
- Large-scale high dimensional distance metric learning and its application to computer vision
- Creator
- Qian, Qi
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
Learning an appropriate distance function (i.e., similarity) is one of the key tasks in machine learning, especially for distance-based algorithms such as the $k$-nearest neighbor classifier and $k$-means clustering. Distance metric learning (DML), the subject studied in this dissertation, is designed to learn a metric that pulls examples from the same class together and pushes examples from different classes away from each other. Although many DML algorithms have been developed in the past decade, most of them can handle only small data sets with hundreds of features, significantly limiting their use in real-world applications that often involve millions of training examples represented by hundreds of thousands of features. Three main challenges arise when learning a metric from such large-scale, high-dimensional data: (i) to ensure that the learned metric is a Positive Semi-Definite (PSD) matrix, a projection onto the PSD cone is required at every iteration, whose cost is cubic in the dimensionality, making it unsuitable for high-dimensional data; (ii) the number of variables to be optimized in DML is quadratic in the dimensionality, which results in a slow convergence rate and a high memory requirement; (iii) the number of constraints used by DML is at least quadratic, if not cubic, in the number of examples, depending on whether pairwise or triplet constraints are used. Besides, features can be redundant due to high-dimensional representations (e.g., face features), and DML with feature selection is preferred for these applications. The main contribution of this dissertation is to address these challenges both theoretically and empirically. First, for the challenge arising from the PSD projection, we exploit a mini-batch strategy and adaptive sampling with a smooth loss function to significantly reduce the number of updates (i.e., projections) while maintaining similar performance. Second, for the challenge arising from high dimensionality, we propose a dual random projection approach, which enjoys the light computation of random projection while significantly improving its effectiveness. Third, for the challenge of large-scale constraints, we develop a novel multi-stage metric learning framework. It divides the original optimization problem into multiple stages and reduces the computation by adaptively sampling a small subset of constraints at each stage. Finally, to handle redundant features with group structure, we develop a greedy algorithm that selects a feature group and learns the corresponding metric simultaneously at each iteration, leading to further improvement of learning efficiency when combined with the adaptive mini-batch strategy and incremental sampling. Besides the theoretical and empirical investigation of DML on machine learning benchmark datasets, we also apply the proposed methods to several important computer vision applications, namely fine-grained visual categorization (FGVC) and face recognition.
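The PSD projection identified above as the main bottleneck can be made concrete with a minimal sketch (not the dissertation's code; the triplet update, weights, and step size are illustrative placeholders):

```python
# Minimal illustrative sketch: one stochastic triplet update for distance metric learning,
# followed by the projection onto the PSD cone whose cubic cost the abstract describes.
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone by clipping negative eigenvalues.
    The eigendecomposition makes this step cubic in the dimensionality."""
    M = (M + M.T) / 2.0                      # symmetrize to guard against round-off
    eigvals, eigvecs = np.linalg.eigh(M)
    eigvals = np.clip(eigvals, 0.0, None)    # drop negative eigenvalues
    return (eigvecs * eigvals) @ eigvecs.T

def mahalanobis_sq(M, x, y):
    """Squared distance (x - y)^T M (x - y) under the metric M."""
    d = x - y
    return float(d @ M @ d)

rng = np.random.default_rng(0)
M = np.eye(5)                                # start from the Euclidean metric
anchor, pos, neg = rng.normal(size=(3, 5))   # a toy triplet constraint
lr = 0.1
# hinge-style triplet update: the same-class pair should be closer than the different-class pair
if mahalanobis_sq(M, anchor, pos) + 1.0 > mahalanobis_sq(M, anchor, neg):
    dp, dn = anchor - pos, anchor - neg
    M -= lr * (np.outer(dp, dp) - np.outer(dn, dn))
M = project_psd(M)                           # the expensive step required at every iteration
```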
- Title
- Finding optimized bounding boxes of polytopes in d-dimensional space and their properties in k-dimensional projections
- Creator
- Shahid, Salman (Of Michigan State University)
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Using minimal bounding boxes to encapsulate or approximate a set of points in d-dimensional space is a non-trivial problem that has applications in a variety of fields, including collision detection, object rendering, high-dimensional databases, and statistical analysis, to name a few. While a significant amount of work has been done on the three-dimensional variant of the problem (i.e., finding the minimum-volume bounding box of a set of points in three dimensions), it is difficult to find a simple method that does the same for higher dimensions. Even in three dimensions, existing methods suffer either from high time complexity or from suboptimal results traded for faster execution. In this thesis we present a new approach to finding optimized minimum bounding boxes of a set of points defining convex polytopes in d-dimensional space. The solution also gives the optimal bounding box in three dimensions with a much simpler implementation, while significantly reducing execution time for large numbers of vertices. The basis of the proposed approach is a series of unique properties of the k-dimensional projections that are leveraged into an algorithm. This algorithm works by constructing the convex hulls of a given set of points and optimizing the projections of those hulls in two-dimensional space using the new concept of Simultaneous Local Optimal. We show that the proposed algorithm performs significantly better than the current state-of-the-art approach in terms of time and accuracy. To illustrate the importance of the result in a real-world application, the optimized bounding box algorithm is used to develop a method for carrying out range queries in high-dimensional databases. This method uses data transformation techniques in conjunction with a set of heuristics to provide significant performance improvement.
- Title
- Data clustering with pairwise constraints
- Creator
- Yi, Jinfeng
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Classical unsupervised clustering is an ill-posed problem due to the absence of a unique clustering criterion. This issue can be addressed by introducing additional supervised information, usually cast in the form of pairwise constraints, into the clustering procedure. Depending on their sources, most pairwise constraints can be classified into two categories: (i) pairwise constraints collected from a set of non-expert crowd workers, which leads to the problem of crowdclustering, and (ii) pairwise constraints collected from an oracle or experts, which leads to the problem of semi-supervised clustering. In both cases, collecting pairwise constraints can be expensive, so it is important to identify the minimal number of pairwise constraints needed to accurately recover the underlying true data partition, also known as the sample complexity problem. In this thesis, we first analyze the sample complexity of crowdclustering. We begin by proposing a novel crowdclustering approach based on the theory of matrix completion. Unlike the existing crowdclustering algorithm that is based on a Bayesian generative model, the proposed approach needs far fewer crowdsourced pairwise annotations to accurately cluster all the objects. Our theoretical analysis shows that in order to accurately cluster $N$ objects, only $O(N\log^2 N)$ randomly sampled pairs need to be annotated by crowd workers. To further reduce the sample complexity, we then introduce a semi-crowdsourced clustering framework that effectively incorporates the low-level features of the objects to be clustered. In this framework, we only need to sample a subset of $n \ll N$ objects and generate their pairwise constraints via crowdsourcing. After completing an $n \times n$ similarity matrix using the proposed crowdclustering algorithm, we can recover an $N \times N$ similarity matrix by applying a regression-based distance metric learning algorithm to the completed smaller similarity matrix. This enables us to reliably cluster $N$ objects with only $O(n\log^2 n)$ crowdsourced pairwise constraints. Next, we study the problem of sample complexity in semi-supervised clustering. To this end, we propose a novel convex semi-supervised clustering approach based on the theory of matrix completion. In order to reduce the number of pairwise constraints needed, we apply a natural assumption that the feature representations of the objects are able to reflect the similarities between objects. This enables us to utilize only $O(\log N)$ pairwise constraints to perfectly recover the data partition of $N$ objects. Lastly, in addition to sample complexity, which relates to labeling costs, we also consider the computational costs of semi-supervised clustering. Specifically, we study the problem of efficiently updating clustering results when the pairwise constraints are generated sequentially, a common case in various real-world applications such as social networks. To address this issue, we develop a dynamic semi-supervised clustering algorithm that casts the clustering problem into a search problem in a feasible convex space, i.e., a convex hull whose extreme points are an ensemble of multiple data partitions.
Unlike classical semi-supervised clustering algorithms that need to re-optimize their objective functions when new pairwise constraints are generated, the proposed method only needs to update a low-dimensional vector, and its time complexity is independent of the number of data points to be clustered. This enables us to update large-scale clustering results extremely efficiently.
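As a rough illustration of the low-rank intuition behind the matrix-completion view of crowdclustering (a toy setup of my own, not the algorithm from the thesis), the sketch below samples on the order of N log² N pairwise annotations, zero-fills the unobserved entries, and recovers the partition from a rank-k spectral embedding:

```python
# Toy sketch: recover a cluster partition from sparsely observed pairwise annotations
# by exploiting the low-rank structure of the complete "same cluster?" matrix.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
N, k = 300, 3
truth = rng.integers(k, size=N)                  # hidden ground-truth partition

# observe only a random subset of pairwise annotations (+1 = same cluster, -1 = different)
S = np.zeros((N, N))
n_pairs = int(2 * N * np.log(N) ** 2)            # on the order of N log^2 N sampled pairs
for i, j in rng.integers(N, size=(n_pairs, 2)):
    if i != j:
        S[i, j] = S[j, i] = 1.0 if truth[i] == truth[j] else -1.0

# zero-filled truncated SVD as a cheap stand-in for nuclear-norm matrix completion:
# the fully observed annotation matrix has rank at most k, so its structure survives sampling
U, s, _ = np.linalg.svd(S, full_matrices=False)
embedding = U[:, :k] * s[:k]

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
# with high probability `labels` matches `truth` up to a relabeling of the k clusters
```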
- Title
- Robust multi-task learning algorithms for predictive modeling of spatial and temporal data
- Creator
- Liu, Xi (Graduate of Michigan State University)
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
"Recent years have witnessed significant growth of spatial and temporal data generated from various disciplines, including geophysical sciences, neuroscience, economics, criminology, and epidemiology. Such data have been extensively used to train spatial and temporal models that can make predictions either at multiple locations simultaneously or along multiple forecasting horizons (lead times). However, training an accurate prediction model in these domains can be challenging, especially when there is significant noise, missing values, or a limited number of training examples. The goal of this thesis is to develop novel multi-task learning frameworks that can exploit the spatial and/or temporal dependencies of the data to ensure robust predictions despite data quality and scarcity problems. The first framework developed in this dissertation is designed for multi-task classification of time series data. Specifically, the prediction task is to continuously classify activities of a human subject based on multi-modal sensor data collected in a smart home environment. As the classes exhibit strong spatial and temporal dependencies, this is an ideal setting for a multi-task learning approach. Nevertheless, since the types of sensors deployed often vary from one room (location) to another, this introduces a structured missing-value problem, in which blocks of sensor data can be missing when a subject moves from one room to another. To address this challenge, a probabilistic multi-task classification framework is developed to jointly model the activity recognition tasks from all the rooms, taking into account the block-missing-value problem. The framework also learns the transitional dependencies between classes to improve its overall prediction accuracy. The second framework is developed for the multi-location time series forecasting problem. Although multi-task learning has been successfully applied to many time series forecasting applications such as climate prediction, conventional approaches aim to minimize only the point-wise residual error of their predictions rather than considering how well their models fit the overall distribution of the response variable. As a result, the predicted distribution may not fully capture the true distribution of the data. In this thesis, a novel distribution-preserving multi-task learning framework is proposed for the multi-location time series forecasting problem. The framework uses a non-parametric density estimation approach to fit the distribution of the response variable and employs an L2-distance function to minimize the divergence between the predicted and true distributions. The third framework proposed in this dissertation is for the multi-step-ahead (long-range) time series prediction problem, with application to ensemble forecasting of sea surface temperature. Specifically, the goal is to effectively combine the forecasts generated by various numerical models at different lead times to obtain more precise predictions. Towards this end, a multi-task deep learning framework based on a hierarchical LSTM architecture is proposed to jointly model the ensemble forecasts of different models, taking into account the temporal dependencies between forecasts at different lead times. Experiments performed on 29 years of sea surface temperature data from the North American Multi-Model Ensemble (NMME) demonstrate that the proposed architecture significantly outperforms standard LSTM and other MTL approaches."--Pages ii-iii.
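The distribution-matching idea in the second framework can be sketched as follows (my own illustration of the stated L2-divergence idea; the thesis' estimator and loss may differ in detail): fit kernel density estimates to the predicted and observed response values and penalize their L2 distance on a grid.

```python
# Hedged sketch of an L2 divergence between predicted and true response distributions.
import numpy as np
from scipy.stats import gaussian_kde

def l2_divergence(y_pred, y_true, grid_size=512):
    """Approximate the integral of (p_pred(y) - p_true(y))^2 dy with the trapezoidal rule."""
    lo = min(y_pred.min(), y_true.min())
    hi = max(y_pred.max(), y_true.max())
    grid = np.linspace(lo, hi, grid_size)
    p_pred = gaussian_kde(y_pred)(grid)      # non-parametric density of the predictions
    p_true = gaussian_kde(y_true)(grid)      # non-parametric density of the observations
    return np.trapz((p_pred - p_true) ** 2, grid)

rng = np.random.default_rng(0)
y_true = rng.normal(0.0, 1.0, size=500)
y_shrunk = 0.5 * y_true                      # a prediction that shrinks the variance
print(l2_divergence(y_true, y_true))         # 0: the distribution is preserved exactly
print(l2_divergence(y_shrunk, y_true))       # clearly larger: the distribution is not preserved
```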
- Title
- Distance-preserving graphs
- Creator
- Nussbaum, Ronald
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Let G be a simple graph on n vertices, where d_G(u,v) denotes the distance between vertices u and v in G. An induced subgraph H of G is isometric if d_H(u,v) = d_G(u,v) for all u,v in V(H). We say that G is a distance-preserving graph if G contains at least one isometric subgraph of order k for every k with 1 <= k <= n. A number of sufficient conditions exist for a graph to be distance-preserving. We show that all hypercubes and all graphs with delta(G) >= 2n/3 - 1 are distance-preserving. Towards this end, we carefully examine the role of "forbidden" subgraphs. We discuss our observations and provide some conjectures, which we have computationally verified for small values of n. We say that a distance-preserving graph is sequentially distance-preserving if each subgraph in the set of isometric subgraphs is a superset of the previous one, and we consider this special case as well. There are a number of questions involving the construction of distance-preserving graphs. We show that it is always possible to add an edge to a non-complete sequentially distance-preserving graph such that the augmented graph is still sequentially distance-preserving. We further conjecture that the same is true of all distance-preserving graphs. We discuss our observations on making non-distance-preserving graphs into distance-preserving ones by adding edges. We show methods for constructing regular distance-preserving graphs and consider constructing distance-preserving graphs for arbitrary degree sequences. As before, all conjectures here have been computationally verified for small values of n.
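The definition above translates directly into a small computational check; the sketch below (illustrative only, not from the thesis) tests whether the subgraph induced by a vertex subset is isometric by comparing BFS distances in G and in the induced subgraph:

```python
# Check whether an induced subgraph is isometric: d_H(u, v) == d_G(u, v) for all u, v in V(H).
from collections import deque

def bfs_dist(adj, src):
    """Single-source distances in an unweighted graph given as adjacency lists."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def is_isometric(adj, subset):
    """Does the subgraph induced by `subset` preserve all pairwise distances of G?"""
    sub = {u: [w for w in adj[u] if w in subset] for u in subset}
    for u in subset:
        dg, dh = bfs_dist(adj, u), bfs_dist(sub, u)
        if any(dh.get(v, float('inf')) != dg.get(v, float('inf'))
               for v in subset if v != u):
            return False
    return True

# C5 (5-cycle): the path induced by {0, 1, 2, 3} stretches d(0, 3) from 2 to 3, so it is
# not isometric, while {0, 1, 2} preserves all of its pairwise distances.
c5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(is_isometric(c5, {0, 1, 2, 3}))   # False
print(is_isometric(c5, {0, 1, 2}))      # True
```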
- Title
- Performance analysis and privacy protection of network data
- Creator
- Ahmed, Faraz (Research engineer)
- Date
- 2018
- Collection
- Electronic Theses & Dissertations
- Description
-
"The goal of this thesis is to address network management research challenges faced by operational networks, with a specific focus on cellular networks, content delivery networks, and online social networks. Next, I give an overview of my research on network management for these networks. Cellular networks utilize existing service quality management systems for detecting performance degradation issues inside the network; however, under certain conditions, degradation in End-to-End (E2E) performance may go undetected. These conditions may arise due to problems in the mobile device hardware, smartphone applications, and content providers. In this thesis, I present a system for detecting and localizing E2E performance degradation at cellular service providers across four administrative domains: the cellular network, content providers, device manufacturers, and smartphone applications. Cellular networks also need systems that can prioritize performance degradation issues according to the number of customers impacted. Cell tower outages are performance degradation issues that directly impact the connectivity of cellular network users. In this thesis, we design and evaluate a cell tower outage monitoring system that analyzes and estimates device-level impact during cell tower outages. Content delivery networks (CDNs) maintain multiple transit routes from content distribution servers to eyeball ISP networks, which provide Internet connectivity to end users. Two major considerations for CDNs are transit prices and the performance dynamics of delivering content to end users. The dynamic nature of transit pricing and performance makes it challenging to optimize the cost-performance tradeoff. There are thousands of eyeball ISPs reachable via different transit routes and different geographical locations. Each choice of transit route for a particular eyeball ISP and geographical location has distinct cost and performance characteristics, which makes developing a transit routing strategy challenging. In this thesis, I present a measurement approach to actively collect client-perceived network performance and then use these measurements for optimal transit route selection for CDNs. Online Social Networks (OSNs) often refuse to publish their social network graphs due to privacy concerns. Differential privacy has become the widely accepted criterion for privacy-preserving data publishing. In this thesis, I present a random matrix approach to OSN graph publishing, which achieves storage and computational efficiency by reducing the dimensions of adjacency matrices and achieves differential privacy by adding a small amount of noise."--Pages ii-iii.
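The graph-publishing idea mentioned at the end of the abstract can be sketched very roughly as follows (entirely my assumption of the general shape; the noise scale is a placeholder and carries no calibrated differential-privacy guarantee):

```python
# Rough sketch: release a dimension-reduced, noise-perturbed view of an adjacency matrix
# instead of the raw social graph.
import numpy as np

def publish_projected_graph(A, m, noise_scale, seed=0):
    """Project the n x n adjacency matrix A down to n x m with a random Gaussian matrix,
    then perturb the sketch with Laplace noise before release."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    P = rng.normal(0.0, 1.0 / np.sqrt(m), size=(n, m))   # random projection matrix
    Y = A @ P                                             # n x m sketch of the graph
    return Y + rng.laplace(0.0, noise_scale, size=Y.shape)

rng = np.random.default_rng(42)
n = 200
A = np.triu((rng.random((n, n)) < 0.05).astype(float), k=1)
A = A + A.T                                               # a random undirected graph
released = publish_projected_graph(A, m=32, noise_scale=0.1)
print(released.shape)   # (200, 32): far smaller than the 200 x 200 adjacency matrix
```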
- Title
- Exploiting smoothness in statistical learning, sequential prediction, and stochastic optimization
- Creator
- Mahdavi, Mehrdad
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
In the last several years, the intimate connection between convex optimization and learning problems, in both statistical and sequential frameworks, has shifted the focus of algorithmic machine learning to examining this interplay. In particular, on one hand, this intertwinement brings forward new challenges in reassessing the performance of learning algorithms, including generalization and regret bounds, under the assumptions imposed by convexity, such as analytical properties of loss functions (e.g., Lipschitzness, strong convexity, and smoothness). On the other hand, the emergence of datasets of unprecedented size demands the development of novel and more efficient optimization algorithms to tackle large-scale learning problems. The overarching goal of this thesis is to reassess the smoothness of loss functions in statistical learning, sequential prediction/online learning, and stochastic optimization, and to explicate its consequences. In particular, we examine how leveraging the smoothness of the loss function can be beneficial or detrimental in these settings in terms of sample complexity, statistical consistency, regret analysis, and convergence rate. In the statistical learning framework, we investigate the sample complexity of learning problems when the loss function is smooth and strongly convex and the learner is provided with the target risk as prior knowledge. We establish that under these assumptions, by exploiting the smoothness of the loss function, we are able to improve the sample complexity of learning exponentially. Furthermore, the proof of our results is constructive and is rooted in a properly designed stochastic optimization algorithm, which could be of significant practical importance. We also investigate smoothness from the viewpoint of statistical consistency and show that, in sharp contrast to optimization and generalization, where smoothness is favorable because of its computational and theoretical virtues, the smoothness of a surrogate loss function might deteriorate the binary excess risk. Motivated by this negative result, we provide a unified analysis of three types of errors, including the optimization error, the generalization bound, and the error in translating convex excess risk into binary excess risk, and underline the conditions under which smoothness might be preferred. We then turn to elaborate on the importance of smoothness in sequential prediction/online learning. We introduce a new measure to assess the performance of online learning algorithms, referred to as gradual variation. The gradual variation is measured by the sum of the distances between every two consecutive loss functions and is more suitable for gradually evolving environments such as stock prediction. Under a smoothness assumption, we devise novel algorithms for online convex optimization with regret bounded by the gradual variation. The proposed algorithms can take advantage of benign sequences and at the same time protect against adversarial sequences of loss functions. Finally, we investigate how to exploit the smoothness of the loss function in convex optimization. Unlike optimization methods based on full gradients, the smoothness assumption has not been exploited by most existing stochastic optimization methods.
We propose a novel optimization paradigm, referred to as mixed optimization, which interpolates between stochastic and full gradient methods and is able to exploit the smoothness of loss functions to obtain faster convergence rates in stochastic optimization, and condition-number-independent accesses of full gradients in deterministic optimization. The key underlying insight of mixed optimization is to utilize infrequent full gradients of the objective function to progressively reduce the variance of the stochastic gradients. These results show an intricate interplay between stochastic and deterministic convex optimization that takes advantage of their individual merits. We also propose efficient projection-free optimization algorithms to tackle the computational challenge arising from the projection steps that are required at each iteration of most existing gradient-based optimization methods to ensure the feasibility of intermediate solutions. In the stochastic optimization setting, by introducing and leveraging smoothness, we develop novel methods that require only one projection, at the final iteration. In the online learning setting, we consider online convex optimization with soft constraints, where the constraints are allowed to be satisfied in the long term. We show that by compromising on the learner's regret, one can devise efficient online learning algorithms with sub-linear bounds on both the regret and the violation of the constraints.
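The "infrequent full gradients to reduce variance" recipe can be illustrated with a short SVRG-style loop on least squares (an illustration of the general idea only, not the dissertation's mixed optimization algorithm; problem sizes and step size are arbitrary):

```python
# Variance reduction via occasional full gradients: an SVRG-style sketch on least squares.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.01 * rng.normal(size=n)

def grad_i(w, i):                       # stochastic gradient of 0.5 * (x_i @ w - y_i)^2
    return (X[i] @ w - y[i]) * X[i]

def full_grad(w):                       # full gradient, computed only once per epoch
    return X.T @ (X @ w - y) / n

w = np.zeros(d)
lr, epochs = 0.002, 30
for _ in range(epochs):
    w_ref = w.copy()
    g_ref = full_grad(w_ref)            # the infrequent full gradient
    for _ in range(n):
        i = rng.integers(n)
        # unbiased stochastic step whose variance shrinks as w approaches w_ref
        w -= lr * (grad_i(w, i) - grad_i(w_ref, i) + g_ref)
print(np.linalg.norm(w - w_star))       # small: close to the least-squares solution
```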
- Title
- Network analysis with negative links
- Creator
- Derr, Tyler Scott
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
As we rapidly continue into the information age, the rate at which data is produced has created an unprecedented demand for novel methods to effectively extract insightful patterns. We can then seek to understand the past, make predictions about the future, and ultimately take actionable steps towards improving our society. Thus, because much of today's big data can be represented as graphs, emphasis is being placed on harnessing the natural structure of data through network analysis. Traditionally, network analysis has focused on networks having only positive links, or unsigned networks. However, in many real-world systems, relations between nodes in a graph can be both positive and negative, forming signed networks. For example, in online social media, users not only have positive links such as friends, followers, and those they trust, but can also establish negative links to those they distrust, towards their foes, or by blocking and unfriending users. Thus, although signed networks are ubiquitous due to their ability to represent negative links in addition to positive links, they have been significantly underexplored. In addition, the rise in popularity of today's social media and increased polarization online have led to both increased attention and demand for advanced methods to perform typical network analysis tasks while also taking negative links into consideration. More specifically, there is a need for methods that can measure, model, mine, and apply signed networks, harnessing both positive and negative relations. However, this raises novel challenges, as the properties and principles of negative links are not necessarily the same as those of positive links, and furthermore the social theories that have been used in unsigned networks might not apply with the inclusion of negative links. The chief objective of this dissertation is first to analyze the distinct properties negative links have compared to positive links, and then to improve network analysis with negative links by studying the utility of, and how to harness, social theories in a holistic view of networks containing both positive and negative links. We discover that simply extending unsigned network analysis is typically not sufficient, and that although the existence of negative links introduces numerous challenges, it also provides unprecedented opportunities for advancing the frontier of the network analysis domain. In particular, we develop advanced methods in signed networks for measuring node relevance and centrality (i.e., signed network measuring); present the first generative signed network model and extend and analyze balance theory for signed bipartite networks (i.e., signed network modeling); construct the first signed graph convolutional network, which learns node representations that achieve state-of-the-art prediction performance, and furthermore introduce the novel idea of transformation-based network embedding (i.e., signed network mining); and apply signed networks by creating a framework that can infer both link and interaction polarity levels in online social media and by constructing a comprehensive congressional vote prediction framework built around harnessing signed networks.
- Title
- Value-aware multi-objective interconnect optimization
- Creator
- Jayaprakash, Sharath
- Date
- 2011
- Collection
- Electronic Theses & Dissertations
- Description
-
Geometrical scaling of critical semiconductor dimensions while keeping the electric field constant improves logic delay and power consumption, but degrades global and semi-global interconnect delay. Repeater insertion and higher aspect-ratio wires partially alleviate interconnect delay, but adversely impact power consumption, temperature, and crosstalk noise. Although a number of interconnect design techniques have been proposed, they primarily target (1) one or the other design metric, not all relevant metrics simultaneously, and/or (2) random or worst-case traffic conditions instead of real-workload traffic, which is important because interconnect behavior (energy, delay, temperature, etc.) depends upon traffic value characteristics. To address the shortcomings of previous work, we present a value-aware interconnect design methodology that permits simultaneous optimization of multiple metrics in a prioritized manner tailored to target application requirements. Towards this end, we first survey existing commonly-used interconnect design techniques and analyze their influence on different interconnect circuit parameters and design metrics. From among these, we focus on wire spacing and data encoding as examples of two techniques to which we apply our value-aware methodology for optimization of interconnect energy, average delay, and peak wire temperature in the context of the SPEC CPU2k benchmarks. Next, we describe and compare two value-aware techniques that optimize energy savings within area constraints: partitioned hybrid encoding and value-aware wire spacing. Finally, we present three encoding techniques with increasing flexibility to adapt to traffic value characteristics: (1) static, which employs a single encoding mode, fixed at design time based on traffic value characteristics, in all cycles; (2) dynamic, which chooses the most suitable encoding mode in any given cycle from among two or more predetermined modes; and (3) adaptive, which selects the most appropriate encoding mode in any given cycle from among a set of modes that changes with traffic conditions at run time. These techniques provide increasingly greater benefits, with our static technique easily outperforming previous, more complex, but value-oblivious, dynamic encoding techniques.
- Title
- Climate change impact assessments for regions of the United States
- Creator
- Tang, Ying
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
The Earth's climate is projected to change significantly in the future, which will greatly impact many natural and human systems. Climate projections are important components of climate change impact, vulnerability, and adaptation assessments, and this research aims to examine several issues related to climate models, model projections, and the use of these projections in impact assessment studies. The research is composed of three individual studies. The first study compares various methods for generating climate change projections for use in agricultural assessment studies at several lake-modified sites in the Great Lakes region of the United States. By producing climate change projections using different data sources and methods and comparing their similarities and differences, the study aims to inform impact researchers and decision makers about the various choices for generating climate change projections and their advantages and disadvantages. The second study assesses the skill of regional climate models (RCMs) in simulating low-level wind maxima, often referred to as low-level jets (LLJs). As a pronounced climate feature in the central United States, LLJs have impacts ranging from wind energy to precipitation to bird migration. Knowing how well RCMs simulate the climatology of LLJs is a necessary first step towards a better understanding of RCMs as a powerful tool for generating regional climate change projections through dynamical downscaling for the central US and other regions affected by LLJs. Finally, the third study applies RCM projections to assess the potential risk of extreme wildfires in the United States. Climate change is expected to alter the frequency and severity of atmospheric conditions conducive to wildfires. Using outputs from a suite of RCMs, this study examines the changes in an operational fire weather index, the Haines Index, between the current climate and the projected future climate. The results are expected to inform fire managers that future summers might be more conducive to extreme and erratic wildfires.
- Title
- Edge impact in graphs and social network matrix completion
- Creator
- Ross, Dennis
- Date
- 2016
- Collection
- Electronic Theses & Dissertations
- Description
-
Every graph G can be associated with many well-known invariant properties along with their corresponding values. A framework is proposed to measure the change in any particular invariant upon addition of a new edge e, yielding the graph G+e. The P-impact of an edge e is the 'magnitude' of the difference between the values of the invariant P in G+e and in G. Several well-known invariants are explored, and a proof towards optimal edge addition for distance-impact in trees is given. A natural application of measuring the impact of edge addition to a graph is link prediction. An efficient algorithm for link prediction, even with cold-start vertices, is presented, using a subspace-sharing method that decouples matrix completion from side-information transduction. This method is extended to predict ratings in user-item recommender systems where both users and items may be cold-start.
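One way to make the impact framework concrete (an assumed formalization for illustration, not the thesis code) is to take the average shortest-path distance as the invariant P and score each candidate edge by how much its addition lowers it:

```python
# Distance-impact of adding an edge: the drop in average shortest-path distance.
import networkx as nx

def distance_impact(G, u, v):
    """P(G) - P(G + e) with P = average shortest-path distance and e = (u, v)."""
    H = G.copy()
    H.add_edge(u, v)
    return (nx.average_shortest_path_length(G)
            - nx.average_shortest_path_length(H))

# toy usage on the path graph P6: rank all missing edges by their distance-impact
P6 = nx.path_graph(6)
candidates = [(u, v) for u in P6 for v in P6 if u < v and not P6.has_edge(u, v)]
best = max(candidates, key=lambda e: distance_impact(P6, *e))
print(best, distance_impact(P6, *best))   # (0, 4): the average distance drops by 9/15 = 0.6
```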
- Title
- Multimodal learning and its application to modeling Alzheimer's disease
- Creator
- Wang, Qi (Graduate of Michigan State University)
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
Multimodal learning has gained increasing attention in recent years as heterogeneous data modalities are collected from diverse domains or extracted by various feature extractors and used for learning. Multimodal learning aims to integrate predictive information from different modalities to enhance the performance of the learned models. For example, when modeling Alzheimer's disease, multiple brain imaging modalities are collected from the patients, and effective fusion of these modalities has been shown to benefit predictive performance. Multimodal learning is associated with many challenges. One outstanding challenge is severe overfitting due to the high feature dimension that results from concatenating the modalities. For example, the feature dimension of diffusion-weighted MRI modalities, which have been used in Alzheimer's disease diagnosis, is usually much larger than the sample size available for training. To solve this problem, in the first work, I propose a sparse learning method that selects the important features and modalities to alleviate overfitting. Another challenge in multimodal learning is the heterogeneity among the modalities and their potential interactions. My second work explores non-linear interactions among the modalities. The proposed model learns a modality-invariant component, which serves as a compact feature representation of the modalities and has high predictive power. In addition to the modality-invariant information, modalities may provide supplementary information, and correlating them during learning can be more informative. Thus, in the third work, I propose a multimodal information bottleneck to fuse supplementary information from different modalities while eliminating the irrelevant information in them. One challenge of utilizing the supplementary information of multiple modalities is that most work can only be applied to data with complete modalities. The missing-modality problem exists widely in multimodal learning tasks, and for these tasks only a small portion of the data can be used to train the model. Thus, to make full use of all the available data, in the fourth work, I propose a knowledge-distillation-based algorithm that utilizes all the data, including samples with missing modalities, while fusing the supplementary information.
- Title
- Multiple kernel and multi-label learning for image categorization
- Creator
- Bucak, Serhat Selçuk
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
"One crucial step towards the goal of converting large image collections into useful information sources is image categorization. The goal of image categorization is to find the relevant labels for a given image from a closed set of labels. Despite the huge interest and significant contributions by the research community, there remains much room for improvement in the image categorization task. In this dissertation, we develop efficient multiple kernel learning and multi-label learning algorithms with high prediction performance for image categorization..." -- Abstract.
- Title
- Contributions to machine learning in biomedical informatics
- Creator
- Baytas, Inci Meliha
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
"With innovations in digital data acquisition devices and increased memory capacity, virtually all commercial and scientific domains have been witnessing exponential growth in the amount of data they can collect. For instance, healthcare is experiencing tremendous growth in digital patient information due to the high adoption rate of electronic health record systems in hospitals. The abundance of data offers many opportunities to develop robust and versatile systems, as long as the underlying salient information in the data can be captured. On the other hand, today's data, often called big data, is challenging to analyze due to its large scale and high complexity. For this reason, efficient data-driven techniques are necessary to extract and utilize the valuable information in the data. The field of machine learning essentially develops such techniques to learn effective models directly from data. Machine learning models have been successfully employed to solve complicated real-world problems. However, the big data concept has numerous properties that pose additional challenges in algorithm development. Namely, high dimensionality, class membership imbalance, non-linearity, distributed data, heterogeneity, and temporal nature are some of the big data characteristics that machine learning must address. Biomedical informatics is an interdisciplinary domain where machine learning techniques are used to analyze electronic health records (EHRs). EHRs comprise digital patient data in various modalities and are an instance of big data. For this reason, analysis of digital patient data is quite challenging, although it provides a rich source for clinical research. While the scale of EHR data used in clinical research might not be huge compared to other domains, such as social media, it is still not feasible for physicians to analyze and interpret longitudinal and heterogeneous data from thousands of patients. Therefore, computational approaches and graphical tools to assist physicians in summarizing the underlying clinical patterns of the EHRs are necessary. The field of biomedical informatics employs machine learning and data mining approaches to provide the essential computational techniques to analyze and interpret complex healthcare data to assist physicians in patient diagnosis and treatment. In this thesis, we propose and develop machine learning algorithms, motivated by prevalent biomedical informatics tasks, to analyze EHRs. Specifically, we make the following contributions: (i) A convex sparse principal component analysis approach, along with variance-reduced stochastic proximal gradient descent, is proposed for the patient phenotyping task, which is defined as finding clinical representations for patient groups sharing the same set of diseases. (ii) An asynchronous distributed multi-task learning method is introduced to learn predictive models for distributed EHRs. (iii) A modified long short-term memory (LSTM) architecture is designed for the patient subtyping task, where the goal is to cluster patients based on similar progression pathways. The proposed LSTM architecture, T-LSTM, performs a subspace decomposition on the cell memory such that the short-term effect in the previous memory is discounted based on the length of the time gap. (iv) An alternative approach to the T-LSTM model is proposed with a decoupled memory to capture short- and long-term changes.
The proposed model, the decoupled memory gated recurrent network (DM-GRN), is designed to learn two types of memories focusing on different components of the time series data. In this study, in addition to healthcare applications, the behavior of the proposed model is investigated for the traffic speed prediction problem to illustrate its generalization ability. In summary, the aforementioned machine learning approaches have been developed to address complex characteristics of electronic health records in routine biomedical informatics tasks such as computational patient phenotyping and patient subtyping. The proposed models are also applicable to other domains with data characteristics similar to EHRs."--Pages ii-iii.
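The time-gap discounting that T-LSTM applies to the cell memory, described in contribution (iii), can be sketched as follows (illustrative only; the weight matrix here is a random placeholder rather than a learned subspace):

```python
# Sketch of a time-aware memory adjustment: decay only the short-term part of the previous
# cell memory, with the decay growing with the elapsed time gap between records.
import numpy as np

rng = np.random.default_rng(0)
hidden = 8
W_d = rng.normal(0.0, 0.1, size=(hidden, hidden))   # placeholder "subspace" weights
b_d = np.zeros(hidden)

def adjust_memory(c_prev, delta_t):
    """Discount the short-term component of the previous cell memory by the time gap."""
    c_short = np.tanh(W_d @ c_prev + b_d)            # short-term component of the memory
    c_short_hat = c_short / np.log(np.e + delta_t)   # longer gaps -> stronger discount
    c_long = c_prev - c_short                        # long-term component is kept intact
    return c_long + c_short_hat

c_prev = rng.normal(size=hidden)
print(adjust_memory(c_prev, delta_t=0.0))    # no gap: the memory passes through unchanged
print(adjust_memory(c_prev, delta_t=365.0))  # long gap: the short-term effect is mostly forgotten
```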
- Title
- Learning with structures
- Creator
- Zhou, Yang
- Date
- 2010
- Collection
- Electronic Theses & Dissertations
- Description
-
In this dissertation we discuss learning with structures, which appears frequently in both machine learning theory and applications. We first review existing structure learning algorithms, then study several problems of particular interest. The first problem we study is the structure learning of dynamic systems. We investigate using dynamic Bayesian networks to reconstruct functional cortical networks from the spike trains of neurons. Next we study structure learning from matrix factorization, which has been a popular research area in recent years. We propose an efficient non-negative matrix factorization algorithm that derives not only the membership assignments to the clusters but also the interaction strengths among the clusters. Following that, we study hierarchical and grouped structure in regularization. We propose a novel regularizer, called group lasso, which introduces competition among variables in groups and thus results in sparse solutions. Finally, we study the sparse structure in a novel problem of online feature selection and propose an online learning algorithm that only needs to sense a small number of attributes before a reliable decision can be made.
- Title
- High-dimensional variable selection for spatial regression and covariance estimation
- Creator
- Nandy, Siddhartha
- Date
- 2016
- Collection
- Electronic Theses & Dissertations
- Description
-
Spatial regression is an important predictive tool in many scientific applications, and an additive model provides a flexible regression relationship between predictors and a response variable. Such a model has proved effective in regression-based prediction. In this work, we develop a regularized variable selection technique for building a spatial additive model. We find that approaches developed for independent data do not work well for spatially dependent data. This motivates us to propose a spatially weighted L2-error norm with a group LASSO type penalty to select additive components for spatial additive models. We establish the selection consistency of the proposed approach, where the penalty parameter depends on several factors, such as the order of approximation of the additive components and characteristics of the spatial weight and spatial dependence. An extensive simulation study provides a vivid picture of the impact of dependent data structures and choices of spatial weight on selection results, as well as the asymptotic behavior of the estimates. We also investigate the impact of correlated predictor variables. As an illustrative example, the proposed approach is applied to lung cancer mortality data over the period 2000-2005, obtained from the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute, U.S. Providing a best linear unbiased predictor (BLUP) is always a challenge for non-repetitive, irregularly spaced spatial data. The estimation process, as well as prediction, involves inverting an $n\times n$ covariance matrix, which computationally requires O(n^3) operations. Studies have shown that the observed process covariance matrix can be decomposed into two additive matrix components: measurement error and an underlying process, which can be non-stationary. The non-stationary component is often assumed to be of fixed, low rank. This assumption allows us to write the underlying process as a linear combination of a fixed number of spatial random effects, a technique known as fixed rank kriging (FRK). The benefit of the smaller rank is an improved computation time of O(n r^2), where r is the rank of the low-rank covariance matrix. In this work we generalize FRK by rewriting the underlying process as a linear combination of n random effects, although only a few of these are actually responsible for quantifying the covariance structure. Further, whereas FRK represents the covariance matrix of the random effects through an r x r Cholesky decomposition, the generalization leads us to an n x n Cholesky decomposition and a group-wise penalized likelihood in which each row of the lower triangular matrix is penalized. More precisely, we present a two-step approach using a group LASSO type shrinkage estimation technique for estimating the rank of the covariance matrix and finally the matrix itself. We investigate our findings through a simulation study and finally apply the method to rainfall data from Colorado, US.
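The group-LASSO-type penalty used to select additive components acts through block soft-thresholding, which zeroes out whole groups of coefficients at once; a minimal sketch of that proximal step (assumed form for illustration, not the thesis implementation):

```python
# Block soft-thresholding: the proximal operator of a group-LASSO penalty.
import numpy as np

def prox_group_lasso(beta, groups, lam):
    """Shrink each group's coefficient vector toward zero; drop the whole group
    (i.e., remove that additive component) when its norm falls below lam."""
    out = np.zeros_like(beta)
    for g in groups:
        norm = np.linalg.norm(beta[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * beta[g]   # shrink but keep the group
        # else: the entire group stays at zero
    return out

# toy usage: three additive components, four spline basis coefficients each
beta = np.array([0.9, -1.1, 0.8, 1.0,      # strong component: survives shrinkage
                 0.05, -0.02, 0.03, 0.01,  # weak component: removed entirely
                 0.4, 0.3, -0.5, 0.2])
groups = [slice(0, 4), slice(4, 8), slice(8, 12)]
print(prox_group_lasso(beta, groups, lam=0.5))
```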
- Title
- Biometric template security
- Creator
- Nagar, Abhishek
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
"This dissertation provides a thorough analysis of the vulnerabilities of a biometric recognition system with emphasis on the vulnerabilities related to the information stored in biometric systems in the form of biometric templates."--From abstract.
- Title
- Directed information for complex network analysis from multivariate time series
- Creator
- Liu, Ying
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
Complex networks, ranging from gene regulatory networks in biology to social networks in sociology, have received growing attention from the scientific community. The analysis of complex networks employs techniques from graph theory, machine learning, and signal processing. In recent years, complex network analysis tools have been applied to neuroscience and neuroimaging studies to gain a better understanding of the human brain. In this thesis, we focus on inferring and analyzing the complex functional brain networks underlying multichannel electroencephalogram (EEG) recordings. Understanding this complex network requires the development of a measure to quantify the relationship between multivariate time series, algorithms to reconstruct the network based on the pairwise relationships, and identification of functional modules within the network. Functional and effective connectivity are two widely studied approaches to quantify the connectivity between two recordings. Unlike functional connectivity, which only quantifies the statistical dependencies between two processes by measures such as cross-correlation, phase synchrony, and mutual information (MI), effective connectivity quantifies the influence one node exerts on another. The directed information (DI) measure is one of the approaches recently proposed to capture the causal relationships between two time series. Two major challenges remain with the application of DI to multivariate data: the computational complexity of computing DI with increasing signal length, and the accuracy of estimation from limited realizations of the data. Expressions that can simplify the computation of the original definition of DI while still quantifying the causality relationship are needed. In addition, the advantage of DI over conventional causality measures such as Granger causality has not been fully investigated. In this thesis, we propose time-lagged directed information and modified directed information to address the issue of computational complexity, and compare the performance of this model-free measure with model-based measures (e.g., Granger causality) for different realistic signal models. Once the pairwise DI between two random processes is computed, another problem is to infer the underlying structure of the complex network with minimal false positive detection. We propose to use the conditional directed information (CDI) proposed by Kramer to address this issue, and introduce time-lagged conditional directed information and modified conditional directed information to lower the computational complexity of CDI. Three network inference algorithms are presented to infer directed acyclic networks that can quantify causality and detect indirect couplings simultaneously from multivariate data. One last challenge in the study of complex networks, specifically in neuroscience applications, is to identify the functional modules from multichannel, multiple-subject recordings. Most research on community detection in this area so far has focused on finding the association matrix based on functional connectivity, instead of effective connectivity, thus not capturing the causality in the network. In addition, in order to find a modular structure that best describes all of the subjects in a group, a group analysis strategy is needed.
In this thesis, we propose a multi-subject hierarchical community detection algorithm suitable for a group of weighted and asymmetric (directed) networks representing effective connectivity, and apply the algorithm to multichannel electroencephalogram (EEG) data.
- Title
- Learning from noisily connected data
- Creator
- Yang, Tianbao
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
Machine learning is a discipline of developing computational algorithms for learning predictive models from data. Traditional analytical learning methods treat the data as independent and identically distributed (i.i.d.) samples from unknown distributions. However, this assumption is often violated in real-world applications, leading to challenges in learning predictive models. For example, on an electronic commerce website, customers may purchase a product on the recommendation of their friends; hence customers' purchase records are not i.i.d. samples but are correlated. Nowadays, data become correlated due to collaborations, interactions, communications, and many other types of connections. Effective learning from these connected data not only provides a better understanding of the data but also brings significant economic benefits. Learning from connected data also brings unique challenges to both supervised and unsupervised learning algorithms, because these algorithms are designed for i.i.d. data and are often sensitive to the noise in the connections. In this dissertation, I focus on developing theory and algorithms for learning from connected data. In particular, I consider two types of connections: the first type is naturally formed in real-world networks, while the second type is manually created to facilitate the learning process and is known as must-links and cannot-links. In the first part of this dissertation, I develop efficient algorithms for detecting communities in the first type of connected data. In the second part, I develop clustering algorithms that effectively utilize both must-links and cannot-links for the second type of connected data. A common approach to learning from connected data is to assume that if two data points are connected, they are likely to be assigned to the same class/cluster. This assumption is often violated in real-world applications, leading to noisy connections. One key challenge of learning from connected data is how to model the noisy pairwise connections that indicate the pairwise class relationship between two data points. For the problem of detecting communities in networked data, I develop Bayesian approaches that explicitly model the noisy pairwise links by introducing additional hidden variables, besides community memberships, to explain potential inconsistency between the pairwise connections and the pairwise class relationships. For clustering must-and-cannot-linked data, I model how noise is added to the pairwise connections during the manual generation process. The main contributions of this dissertation are: (i) it introduces popularity and productivity for the first time, besides community memberships, to model the generation of noisy links in real networks, and the effectiveness of these factors is demonstrated through the task of community detection; (ii) it proposes, for the first time, a discriminative model that combines content and link analysis for detecting communities, to alleviate the impact of noisy connections in community detection; (iii) it presents a general approach for learning from noisily labeled data, proves theoretical convergence results for the first time, and applies the approach to clustering noisy must-and-cannot-linked data.