Search results
(1 - 20 of 36)
- Title
- Robust multi-task learning algorithms for predictive modeling of spatial and temporal data
- Creator
- Liu, Xi (Graduate of Michigan State University)
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
"Recent years have witnessed significant growth of spatial and temporal data generated from various disciplines, including geophysical sciences, neuroscience, economics, criminology, and epidemiology. Such data have been extensively used to train spatial and temporal models that can make predictions either at multiple locations simultaneously or along multiple forecasting horizons (lead times). However, training an accurate prediction model in these domains can be challenging, especially when there is significant noise, there are missing values, or only limited training examples are available. The goal of this thesis is to develop novel multi-task learning frameworks that can exploit the spatial and/or temporal dependencies of the data to ensure robust predictions in spite of the data quality and scarcity problems. The first framework developed in this dissertation is designed for multi-task classification of time series data. Specifically, the prediction task here is to continuously classify the activities of a human subject based on multi-modal sensor data collected in a smart home environment. As the classes exhibit strong spatial and temporal dependencies, this makes it an ideal setting for applying a multi-task learning approach. Nevertheless, since the types of sensors deployed often vary from one room (location) to another, this introduces a structured missing value problem, in which blocks of sensor data can be missing when a subject moves from one room to another. To address this challenge, a probabilistic multi-task classification framework is developed to jointly model the activity recognition tasks from all the rooms, taking into account the block-missing value problem. The framework also learns the transitional dependencies between classes to improve its overall prediction accuracy. The second framework is developed for the multi-location time series forecasting problem.
Although multi-task learning has been successfully applied to many time series forecasting applications such as climate prediction, conventional approaches aim to minimize only the point-wise residual error of their predictions instead of considering how well their models fit the overall distribution of the response variable. As a result, their predicted distribution may not fully capture the true distribution of the data. In this thesis, a novel distribution-preserving multi-task learning framework is proposed for the multi-location time series forecasting problem. The framework uses a non-parametric density estimation approach to fit the distribution of the response variable and employs an L2-distance function to minimize the divergence between the predicted and true distributions. The third framework proposed in this dissertation is for the multi-step-ahead (long-range) time series prediction problem, with application to ensemble forecasting of sea surface temperature. Specifically, our goal is to effectively combine the forecasts generated by various numerical models at different lead times to obtain more precise predictions. Towards this end, a multi-task deep learning framework based on a hierarchical LSTM architecture is proposed to jointly model the ensemble forecasts of different models, taking into account the temporal dependencies between forecasts at different lead times. Experiments performed on 29 years of sea surface temperature data from the North American Multi-Model Ensemble (NMME) demonstrate that the proposed architecture significantly outperforms standard LSTM and other MTL approaches."--Pages ii-iii.
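The distribution-preserving idea in the second framework can be illustrated with a minimal numpy sketch: fit a non-parametric (Gaussian kernel) density estimate to both the predicted and observed values and add the squared L2 distance between the two densities to the usual point-wise error. The bandwidth, grid, and weighting below are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth=0.3):
    """Non-parametric density estimate evaluated on a grid of points."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

def distribution_preserving_loss(y_pred, y_true, alpha=0.5):
    """Point-wise MSE plus a squared L2 divergence between fitted densities."""
    grid = np.linspace(min(y_true.min(), y_pred.min()) - 1,
                       max(y_true.max(), y_pred.max()) + 1, 200)
    p = gaussian_kde(y_pred, grid)
    q = gaussian_kde(y_true, grid)
    l2 = ((p - q) ** 2).sum() * (grid[1] - grid[0])  # Riemann-sum approximation
    mse = np.mean((y_pred - y_true) ** 2)
    return mse + alpha * l2

rng = np.random.default_rng(0)
y = rng.normal(size=500)
loss_same = distribution_preserving_loss(y, y)          # perfect fit: zero loss
loss_shifted = distribution_preserving_loss(y + 1.0, y) # shifted distribution is penalized
```

A forecast that matches the targets point-wise also matches them in distribution, so the extra term only activates when the predicted distribution drifts from the true one.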
- Title
- Modeling physical causality of action verbs for grounded language understanding
- Creator
- Gao, Qiaozi
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
Building systems that can understand and communicate through human natural language is one of the ultimate goals in AI. Decades of natural language processing research have mainly focused on learning from large amounts of language corpora. However, human communication relies on a significant amount of unverbalized information, often referred to as commonsense knowledge. This type of knowledge allows us to understand each other's intentions, to connect language with concepts in the world, and to make inferences based on what we hear or read. Commonsense knowledge is generally shared among cognitively capable individuals and is thus rarely stated explicitly in human language. This makes it very difficult for artificial agents to acquire commonsense knowledge from language corpora. To address this problem, this dissertation investigates the acquisition of commonsense knowledge, especially knowledge related to basic actions upon the physical world, and how that influences language processing and grounding. Linguistic studies have shown that action verbs often denote some change of state (CoS) as the result of an action. For example, the result of "slice a pizza" is that the state of the object (pizza) changes from one big piece to several smaller pieces. However, the causality of action verbs and its potential connection with the physical world has not been systematically explored. Artificial agents often lack this kind of basic commonsense causality knowledge, which makes it difficult for them to work with humans and to reason, learn, and perform actions. To address this problem, this dissertation models the dimensions of physical causality associated with common action verbs. Based on such modeling, several approaches are developed to incorporate causality knowledge into language grounding, visual causality reasoning, and commonsense story comprehension.
- Title
- Finding optimized bounding boxes of polytopes in d-dimensional space and their properties in k-dimensional projections
- Creator
- Shahid, Salman (Of Michigan State University)
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Using minimal bounding boxes to encapsulate or approximate a set of points in d-dimensional space is a non-trivial problem with applications in a variety of fields, including collision detection, object rendering, high-dimensional databases, and statistical analysis, to name a few. While a significant amount of work has been done on the three-dimensional variant of the problem (i.e., finding the minimum-volume bounding box of a set of points in three dimensions), it is difficult to find a simple method that does the same for higher dimensions. Even in three dimensions, existing methods suffer either from high time complexity or from suboptimal results traded for faster execution. In this thesis, we present a new approach to finding the optimized minimum bounding boxes of a set of points defining convex polytopes in d-dimensional space. The solution also gives the optimal bounding box in three dimensions with a much simpler implementation, while significantly speeding up the execution time for a large number of vertices. The basis of the proposed approach is a series of unique properties of the k-dimensional projections that are leveraged into an algorithm. This algorithm works by constructing the convex hulls of a given set of points and optimizing the projections of those hulls in two-dimensional space using the new concept of Simultaneous Local Optimal. We show that the proposed algorithm performs significantly better than the current state-of-the-art approach in terms of both time and accuracy. To illustrate the importance of the result for a real-world application, the optimized bounding box algorithm is used to develop a method for carrying out range queries in high-dimensional databases. This method uses data transformation techniques in conjunction with a set of heuristics to provide a significant performance improvement.
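The two-dimensional building block the thesis optimizes over has a classical baseline: a minimum-area bounding rectangle of a convex polygon is always aligned with one of its edges. The sketch below computes a convex hull (Andrew's monotone chain) and tries each edge-aligned rectangle; it is the standard rotating-calipers-style baseline, not the thesis's Simultaneous Local Optimal method.

```python
import numpy as np

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(map(tuple, points))
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def half_hull(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = half_hull(pts), half_hull(pts[::-1])
    return np.array(lower[:-1] + upper[:-1])

def min_area_bounding_rect_area(points):
    """Smallest area over rectangles aligned with some hull edge
    (in 2-D the optimum is always edge-aligned)."""
    hull = convex_hull(points)
    best = np.inf
    for i in range(len(hull)):
        edge = hull[(i + 1) % len(hull)] - hull[i]
        ux = edge / np.linalg.norm(edge)
        uy = np.array([-ux[1], ux[0]])          # perpendicular axis
        px, py = hull @ ux, hull @ uy           # project hull onto both axes
        best = min(best, (px.max() - px.min()) * (py.max() - py.min()))
    return best

# A unit square rotated by 30 degrees still has a minimum bounding
# rectangle of area 1, found by aligning with a hull edge.
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]]) @ rot.T
area = min_area_bounding_rect_area(square)
```

An axis-aligned box around the same rotated square has area (cos θ + sin θ)² ≈ 1.87, which is exactly the kind of gap projection-based optimization closes.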
- Title
- Data clustering with pairwise constraints
- Creator
- Yi, Jinfeng
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Classical unsupervised clustering is an ill-posed problem due to the absence of a unique clustering criterion. This issue can be addressed by introducing additional supervised information, usually cast in the form of pairwise constraints, into the clustering procedure. Depending on their sources, most pairwise constraints can be classified into two categories: (i) pairwise constraints collected from a set of non-expert crowd workers, which leads to the problem of crowdclustering, and (ii) pairwise constraints collected from an oracle or experts, which leads to the problem of semi-supervised clustering. In both cases, the cost of collecting pairwise constraints can be high, so it is important to identify the minimal number of pairwise constraints needed to accurately recover the underlying true data partition, also known as the sample complexity problem. In this thesis, we first analyze the sample complexity of crowdclustering. To begin, we propose a novel crowdclustering approach based on the theory of matrix completion. Unlike the existing crowdclustering algorithm, which is based on a Bayesian generative model, the proposed approach needs far fewer crowdsourced pairwise annotations to accurately cluster all the objects. Our theoretical analysis shows that in order to accurately cluster $N$ objects, only $O(N\log^2 N)$ randomly sampled pairs need to be annotated by crowd workers. To further reduce the sample complexity, we then introduce a semi-crowdsourced clustering framework that is able to effectively incorporate the low-level features of the objects to be clustered. In this framework, we only need to sample a subset of $n \ll N$ objects and generate their pairwise constraints via crowdsourcing.
After completing an $n \times n$ similarity matrix using the proposed crowdclustering algorithm, we can further recover an $N \times N$ similarity matrix by applying a regression-based distance metric learning algorithm to the completed smaller similarity matrix. This enables us to reliably cluster $N$ objects with only $O(n\log^2 n)$ crowdsourced pairwise constraints. Next, we study the problem of sample complexity in semi-supervised clustering. To this end, we propose a novel convex semi-supervised clustering approach based on the theory of matrix completion. In order to reduce the number of pairwise constraints needed to achieve a perfect data partitioning, we apply a natural assumption that the feature representations of the objects are able to reflect the similarities between objects. This enables us to utilize only $O(\log N)$ pairwise constraints to perfectly recover the data partition of $N$ objects. Lastly, in addition to the sample complexity that relates to labeling costs, we also consider the computational costs of semi-supervised clustering. Specifically, we study the problem of efficiently updating clustering results when the pairwise constraints are generated sequentially, a common case in various real-world applications such as social networks. To address this issue, we develop a dynamic semi-supervised clustering algorithm that casts the clustering problem as a search problem in a feasible convex space, i.e., a convex hull whose extreme points are an ensemble of multiple data partitions.
Unlike classical semi-supervised clustering algorithms, which need to re-optimize their objective functions when new pairwise constraints are generated, the proposed method only needs to update a low-dimensional vector, and its time complexity is independent of the number of data points to be clustered. This enables us to update large-scale clustering results in an extremely efficient way.
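All three frameworks are driven by pairwise constraints. In the simplest noiseless case, must-link constraints alone determine a partition through their transitive closure, which a union-find structure computes in near-linear time; this is a minimal illustration of constraints-to-partition, not any of the thesis's matrix-completion algorithms.

```python
class UnionFind:
    """Disjoint-set forest with path halving."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def clusters_from_must_links(n, must_links):
    """Partition n objects by the transitive closure of must-link pairs."""
    uf = UnionFind(n)
    for a, b in must_links:
        uf.union(a, b)
    groups = {}
    for i in range(n):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())

# Must-links (0,1) and (1,2) transitively place 0, 1, 2 together.
parts = clusters_from_must_links(6, [(0, 1), (1, 2), (3, 4)])
```

With noisy crowdsourced constraints this transitive closure breaks down, which is precisely why the thesis turns to matrix completion over a partially observed similarity matrix instead.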
- Title
- Large-scale high dimensional distance metric learning and its application to computer vision
- Creator
- Qian, Qi
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
Learning an appropriate distance function (i.e., similarity) is one of the key tasks in machine learning, especially for distance-based machine learning algorithms, e.g., the $k$-nearest neighbor classifier, $k$-means clustering, etc. Distance metric learning (DML), the subject studied in this dissertation, is designed to learn a metric that pulls examples from the same class together and pushes examples from different classes away from each other. Although many DML algorithms have been developed in the past decade, most of them can handle only small data sets with hundreds of features, significantly limiting their use in real-world applications that often involve millions of training examples represented by hundreds of thousands of features. Three main challenges are encountered when learning a metric from such large-scale, high-dimensional data: (i) to ensure that the learned metric is a Positive Semi-Definite (PSD) matrix, a projection onto the PSD cone is required at every iteration, whose cost is cubic in the dimensionality, making it unsuitable for high-dimensional data; (ii) the number of variables that need to be optimized in DML is quadratic in the dimensionality, which results in slow convergence in optimization and high memory requirements; (iii) the number of constraints used by DML is at least quadratic, if not cubic, in the number of examples, depending on whether pairwise or triplet constraints are used. In addition, features can be redundant due to high-dimensional representations (e.g., face features), and DML with feature selection is preferred for these applications. The main contribution of this dissertation is to address these challenges both theoretically and empirically.
First, for the challenge arising from the PSD projection, we exploit a mini-batch strategy and adaptive sampling with a smooth loss function to significantly reduce the number of updates (i.e., projections) while maintaining similar performance. Second, for the challenge arising from high dimensionality, we propose a dual random projection approach, which enjoys light computation due to the use of random projection while significantly improving its effectiveness. Third, for the challenge of large-scale constraints, we develop a novel multi-stage metric learning framework. It divides the original optimization problem into multiple stages and reduces the computation by adaptively sampling a small subset of constraints at each stage. Finally, to handle redundant features with a group property, we develop a greedy algorithm that selects a feature group and learns the corresponding metric simultaneously at each iteration, leading to a further improvement in learning efficiency when combined with the adaptive mini-batch strategy and incremental sampling. Besides the theoretical and empirical investigation of DML on machine learning benchmark datasets, we also apply the proposed methods to several important computer vision applications (i.e., fine-grained visual categorization (FGVC) and face recognition).
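The expensive step the first contribution works to avoid is the PSD-cone projection itself. A standard way to compute it (the nearest PSD matrix in Frobenius norm) is to symmetrize and clip negative eigenvalues to zero; the eigendecomposition is the O(d³) cost the abstract refers to. A minimal numpy sketch:

```python
import numpy as np

def project_to_psd(M):
    """Nearest PSD matrix in Frobenius norm: symmetrize, then clip
    negative eigenvalues to zero. The eigendecomposition here is the
    cubic-cost step that mini-batching tries to invoke less often."""
    S = (M + M.T) / 2
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs @ np.diag(np.clip(eigvals, 0, None)) @ eigvecs.T

# An indefinite matrix gets its negative direction zeroed out.
M = np.array([[2.0, 0.0], [0.0, -1.0]])
P = project_to_psd(M)
```

Running this once per gradient step is what makes naive DML infeasible at high dimensionality, motivating the reduced-update schemes above.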
- Title
- High-dimensional variable selection for spatial regression and covariance estimation
- Creator
- Nandy, Siddhartha
- Date
- 2016
- Collection
- Electronic Theses & Dissertations
- Description
-
Spatial regression is an important predictive tool in many scientific applications, and an additive model provides a flexible regression relationship between predictors and a response variable. Such models have proved effective in regression-based prediction. In this work, we develop a regularized variable selection technique for building a spatial additive model. We find that approaches developed for independent data do not work well for spatially dependent data. This motivates us to propose a spatially weighted L2-error norm with a group LASSO type penalty to select additive components for spatial additive models. We establish the selection consistency of the proposed approach, where the penalty parameter depends on several factors, such as the order of approximation of the additive components, characteristics of the spatial weight, and the spatial dependence. An extensive simulation study provides a vivid picture of the impacts of dependent data structures and choices of spatial weight on selection results, as well as the asymptotic behavior of the estimates. We also investigate the impact of correlated predictor variables. As an illustrative example, the proposed approach is applied to lung cancer mortality data over the period 2000-2005, obtained from the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute, U.S. Providing a best linear unbiased predictor (BLUP) is always a challenge for non-repetitive, irregularly spaced spatial data. The estimation process, as well as prediction, involves inverting an $n\times n$ covariance matrix, which requires O(n^3) computation. Studies have shown that the observed-process covariance matrix can be decomposed into two additive matrix components: measurement error and an underlying process that can be non-stationary. The non-stationary component is often assumed to be fixed but of low rank.
This assumption allows us to write the underlying process as a linear combination of a fixed number of spatial random effects, known as fixed rank kriging (FRK). The smaller rank has been used to reduce the computation time to O(n r^2), where r is the rank of the low-rank covariance matrix. In this work we generalize FRK by rewriting the underlying process as a linear combination of n random effects, although only a few of these are actually responsible for quantifying the covariance structure. Further, FRK assumes that the covariance matrix of the random effects can be represented through an r x r Cholesky decomposition. Our generalization leads to an n x n Cholesky decomposition, and we use a group-wise penalized likelihood in which each row of the lower triangular matrix is penalized. More precisely, we present a two-step approach using a group LASSO type shrinkage estimation technique for estimating the rank of the covariance matrix and, finally, the matrix itself. We investigate our findings in a simulation study and finally apply the method to rainfall data from Colorado, US.
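Both the additive-component selection and the row-wise penalized Cholesky estimate rely on group-LASSO-type shrinkage, whose defining operation is block soft-thresholding: an entire group (an additive component, or a row of the lower-triangular factor) is shrunk toward zero and dropped entirely once its norm falls below the penalty level. A sketch of the standard proximal step, under the usual prox form rather than this thesis's exact weighted criterion:

```python
import numpy as np

def group_soft_threshold(v, lam):
    """Proximal operator of the group-LASSO penalty lam * ||v||_2:
    shrink the whole group toward zero, or zero it out entirely."""
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)      # the group is selected out
    return (1 - lam / norm) * v      # the group survives, shrunk

shrunk = group_soft_threshold(np.array([3.0, 4.0]), 1.0)  # norm 5 > 1: kept
zeroed = group_soft_threshold(np.array([0.3, 0.4]), 1.0)  # norm 0.5 <= 1: dropped
```

The all-or-nothing behavior of this operator is what makes the penalty a rank estimator here: penalized rows of the Cholesky factor that are zeroed out no longer contribute to the covariance, so the number of surviving rows estimates r.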
- Title
- Teachers in social media : a data science perspective
- Creator
- Karimi, Hamid
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Social media has become an integral part of human life in the 21st century. The number of social media users was estimated to be around 3.6 billion individuals in 2020. Social media platforms (e.g., Facebook) have facilitated interpersonal communication, the diffusion of information, and the creation of groups and communities, to name a few. As far as education systems are concerned, online social media has transformed and connected traditional social networks within the schoolhouse to a broader and expanded world outside. In such an expanded virtual space, teachers engage in various activities within their communities, e.g., exchanging instructional resources, seeking new teaching methods, and engaging in online discussions. Therefore, given the importance of teachers in social media and its tremendous impact on PK-12 education, in this dissertation we investigate teachers in social media from a data science perspective. Our investigation in this direction is essentially an interdisciplinary endeavor bridging modern data science and education. In particular, we have made three contributions, as briefly discussed in the following. Current studies of teachers in social media are limited to a small number of surveyed teachers, while thousands of other teachers are on social media. This hinders us from conducting large-scale data-driven studies pertinent to teachers in social media. Aiming to overcome this challenge and further facilitate data-driven studies related to teachers in social media, we propose a novel method that automatically identifies teachers on Pinterest, an image-based social media platform popular among teachers. In this framework, we formulate the teacher identification problem as a positive-unlabeled (PU) learning problem where positive samples are surveyed teachers and unlabeled samples are their online friends. Using our framework, we build the largest dataset of teachers on Pinterest.
With this dataset at our disposal, we perform an exploratory analysis of teachers on Pinterest while considering their genders. Our analysis incorporates two crucial aspects of teachers in social media. First, we investigate various online activities of male and female teachers, e.g., the topics and sources of their curated resources and the professional language employed to describe their resources. Second, we investigate male and female teachers in the context of the social network (the graph) they belong to, e.g., structural centrality and gender homophily. Our analysis and findings in this part of the dissertation can serve as a valuable reference for many entities concerned with teachers' gender, e.g., principals and state and federal governments. Finally, in the third part of the dissertation, we shed light on the diffusion of teacher-curated resources on Pinterest. First, we introduce three measures to characterize the diffusion process. Then, we investigate these three measures while considering two crucial characteristics of a resource, namely its topic and its source. Ultimately, we investigate how teacher attributes (e.g., the number of friends) affect the diffusion of their resources. The conducted diffusion analysis is the first of its kind and offers a deeper understanding of the complex mechanism driving the diffusion of resources curated by teachers on Pinterest.
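The abstract does not name the three diffusion measures, but cascade analyses commonly characterize a resource's spread by its size, depth, and maximum breadth, all computable from the repin tree with one breadth-first traversal. The measures and edge format below are illustrative assumptions, not necessarily the dissertation's definitions.

```python
from collections import defaultdict, deque

def cascade_measures(edges, root):
    """Size, depth, and max breadth of a diffusion tree given
    (parent, child) repin edges. Measure names are illustrative."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    depth_counts = defaultdict(int)       # nodes reached per level
    queue = deque([(root, 0)])
    size = 0
    while queue:
        node, d = queue.popleft()
        size += 1
        depth_counts[d] += 1
        for c in children[node]:
            queue.append((c, d + 1))
    return size, max(depth_counts), max(depth_counts.values())

# A small hypothetical repin cascade rooted at the original pin "r".
edges = [("r", "a"), ("r", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
size, depth, breadth = cascade_measures(edges, "r")
```

Statistics of this kind can then be correlated with resource characteristics (topic, source) and curator attributes (e.g., number of friends), as the dissertation's third part does.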
- Title
- Machine Learning on Drug Discovery : Algorithms and Applications
- Creator
- Sun, Mengying
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
Drug development is an expensive and time-consuming process in which thousands of chemical compounds are tested and experiments conducted in order to find drugs that are safe and effective. Modern drug development aims to speed up the intermediate steps and reduce cost by leveraging machine learning techniques, typically at the drug discovery and preclinical research stages. Better identification of promising candidates can significantly reduce the load of later processes, e.g., clinical trials, saving substantial resources as well as time. In this dissertation, we explore and propose novel machine learning algorithms for drug discovery from the aspects of robustness, knowledge transfer, and molecular generation and optimization. First, labels from high-throughput experiments (e.g., biological profiling and chemical screening) often contain inevitable noise due to technical and biological variations. We propose a method that leverages both disagreement and agreement among deep neural networks to mitigate the negative effect brought by noisy labels and better predict drug responses. Second, graph neural networks (GNNs) have become popular for modeling graph-structured data (e.g., molecules). Graph contrastive learning, by maximizing the mutual information between paired graph augmentations, has been shown to be an effective strategy for pretraining GNNs. However, existing graph contrastive learning methods have intrinsic limitations when adopted for molecular tasks. Therefore, we propose a method that utilizes domain knowledge at both the local and global levels to assist representation learning. The local-level domain knowledge guides the augmentation process such that variation is introduced without changing graph semantics. The global-level knowledge encodes the similarity information between graphs in the entire dataset and helps to learn representations with richer semantics.
Last but not least, we propose a search-based approach for multi-objective molecular generation and optimization. We show that, given proper design and sufficient information, search-based methods can achieve performance comparable to or even better than deep learning methods while being computationally efficient. Specifically, the proposed method starts with existing molecules and uses a two-stage search strategy to gradually modify them into new ones, based on transformation rules derived from large compound libraries. We demonstrate all the proposed methods with extensive experiments.
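The core bookkeeping of any multi-objective search is deciding which candidates survive: a candidate is kept only if no other candidate dominates it on every objective. A minimal Pareto-front filter (maximizing all objectives) is sketched below; the (potency, drug-likeness) score tuples are hypothetical stand-ins for actual molecular property evaluations, and this is not the thesis's two-stage search itself.

```python
def dominates(a, b):
    """a dominates b if a >= b on every objective and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep candidates not dominated by any other (maximization)."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# Hypothetical (potency, drug-likeness) scores for generated molecules.
scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.9), (0.3, 0.3)]
front = pareto_front(scores)   # (0.3, 0.3) is dominated by (0.5, 0.5)
```

A search loop would repeatedly apply transformation rules to molecules on the current front, score the results, and re-filter, so the front only ever improves.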
- Title
- Distance-preserving graphs
- Creator
- Nussbaum, Ronald
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Let G be a simple graph on n vertices, where d_G(u,v) denotes the distance between vertices u and v in G. An induced subgraph H of G is isometric if d_H(u,v)=d_G(u,v) for all u,v in V(H). We say that G is a distance-preserving graph if G contains at least one isometric subgraph of order k for every k where 1<=k<=n. A number of sufficient conditions exist for a graph to be distance-preserving. We show that all hypercubes and all graphs with delta(G)>=2n/3-1 are distance-preserving. Towards this end, we carefully examine the role of "forbidden" subgraphs. We discuss our observations and provide some conjectures, which we have computationally verified for small values of n. We say that a distance-preserving graph is sequentially distance-preserving if each subgraph in the set of isometric subgraphs is a superset of the previous one, and we consider this special case as well. There are a number of questions involving the construction of distance-preserving graphs. We show that it is always possible to add an edge to a non-complete sequentially distance-preserving graph such that the augmented graph is still sequentially distance-preserving. We further conjecture that the same is true of all distance-preserving graphs. We discuss our observations on making non-distance-preserving graphs into distance-preserving ones by adding edges. We show methods for constructing regular distance-preserving graphs and consider constructing distance-preserving graphs for arbitrary degree sequences. As before, all conjectures here have been computationally verified for small values of n.
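The isometric-subgraph definition above translates directly into code: compare BFS distances in G against BFS distances in the induced subgraph, for every pair of subset vertices. A small sketch of that check (the adjacency-dict representation is an implementation choice, not from the thesis):

```python
from collections import deque

def bfs_distances(adj, src):
    """Unweighted shortest-path distances from src; unreachable nodes absent."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_isometric(adj, subset):
    """Does the induced subgraph on `subset` preserve all distances in G?"""
    subset = set(subset)
    sub_adj = {u: [v for v in adj[u] if v in subset] for u in subset}
    for u in subset:
        dg, dh = bfs_distances(adj, u), bfs_distances(sub_adj, u)
        for v in subset:
            if dh.get(v) != dg.get(v):   # mismatch or unreachable in H
                return False
    return True

# 5-cycle: consecutive vertices induce an isometric path; skipping one does not.
c5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
```

For example, vertices {0,1,2} of the 5-cycle induce a path with d_H(0,2)=2=d_G(0,2), while {0,1,3} induce a disconnected graph, so distances are not preserved.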
- Title
- Network analysis with negative links
- Creator
- Derr, Tyler Scott
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
As we continue rapidly into the information age, the rate at which data is produced has created an unprecedented demand for novel methods to effectively extract insightful patterns. We can then seek to understand the past, make predictions about the future, and ultimately take actionable steps toward improving our society. Because much of today's big data can be represented as graphs, emphasis is being placed on harnessing the natural structure of data through network analysis. Traditionally, network analysis has focused on networks having only positive links, or unsigned networks. However, in many real-world systems, relations between nodes in a graph can be both positive and negative; such networks are called signed networks. For example, in online social media, users not only have positive links, such as friends, followers, and those they trust, but can also establish negative links toward those they distrust or consider foes, or block and unfriend users. Thus, although signed networks are ubiquitous due to their ability to represent negative links in addition to positive links, they have been significantly underexplored. In addition, the rise in popularity of today's social media and increased polarization online have led to both increased attention and increased demand for advanced methods to perform the typical network analysis tasks while also taking negative links into consideration. More specifically, there is a need for methods that can measure, model, mine, and apply signed networks, harnessing both positive and negative relations.
However, this raises novel challenges, as the properties and principles of negative links are not necessarily the same as positive links, and furthermore the social theories that have been used in unsigned networks might not apply with the inclusion of negative links.The chief objective of this dissertation is to first analyze the distinct properties negative links have as compared to positive links and towards improving network analysis with negative links by researching the utility and how to harness social theories that have been established in a holistic view of networks containing both positive and negative links. We discover that simply extending unsigned network analysis is typically not sufficient and that although the existence of negative links introduces numerous challenges, they also provide unprecedented opportunities for advancing the frontier of the network analysis domain. In particular, we develop advanced methods in signed networks for measuring node relevance and centrality (i.e., signed network measuring), present the first generative signed network model and extend/analyze balance theory to signed bipartite networks (i.e., signed network modeling), construct the first signed graph convolutional network which learns node representations that can achieve state-of-the-art prediction performance and then furthermore introduce the novel idea of transformation-based network embedding (i.e., signed network mining), and apply signed networks by creating a framework that can infer both link and interaction polarity levels in online social media and constructing an advanced comprehensive congressional vote prediction framework built around harnessing signed networks.
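The balance theory referenced in this abstract can be illustrated with a minimal sketch (hypothetical helper names, not code from the dissertation): a signed triad is balanced when the product of its three edge signs is positive, e.g., "the enemy of my enemy is my friend."

```python
from itertools import combinations

def balanced_triads(edges):
    """Classify every triangle in a signed graph as balanced or unbalanced.

    edges: dict mapping frozenset({u, v}) to +1 (positive) or -1 (negative).
    A triad is balanced when the product of its three edge signs is +1.
    """
    nodes = set()
    for e in edges:
        nodes |= e
    result = {}
    for u, v, w in combinations(sorted(nodes), 3):
        keys = [frozenset({u, v}), frozenset({v, w}), frozenset({u, w})]
        if all(k in edges for k in keys):  # only fully connected triads
            sign_product = edges[keys[0]] * edges[keys[1]] * edges[keys[2]]
            result[(u, v, w)] = sign_product > 0
    return result

# Two friends sharing a common foe form a balanced triad.
signs = {frozenset({"a", "b"}): +1,
         frozenset({"b", "c"}): -1,
         frozenset({"a", "c"}): -1}
print(balanced_triads(signs))  # {('a', 'b', 'c'): True}
```

Counting the fraction of balanced triads in this way is one common measure of how well balance theory describes a given signed network.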
- Title
- Multimodal learning and its application to modeling Alzheimer's disease
- Creator
- Wang, Qi (Graduate of Michigan State University)
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
Multimodal learning has gained increasing attention in recent years as heterogeneous data modalities are collected from diverse domains, or extracted from various feature extractors, and used for learning. The goal of multimodal learning is to integrate predictive information from different modalities to enhance the performance of the learned models. For example, when modeling Alzheimer's disease, multiple brain imaging modalities are collected from patients, and effective fusion of these modalities has been shown to improve predictive performance. Multimodal learning comes with many challenges. One outstanding challenge is severe overfitting caused by the high feature dimension that results from concatenating the modalities. For example, the feature dimension of the diffusion-weighted MRI modality, which has been used in Alzheimer's disease diagnosis, is usually much larger than the sample size available for training. To solve this problem, in the first work, I propose a sparse learning method that selects the important features and modalities to alleviate overfitting. Another challenge in multimodal learning is the heterogeneity among the modalities and their potential interactions. My second work explores non-linear interactions among the modalities. The proposed model learns a modality-invariant component, which serves as a compact feature representation of the modalities and has high predictive power. Beyond the modality-invariant information, modalities may also provide supplementary information, and correlating them during learning can be more informative. Thus, in the third work, I propose a multimodal information bottleneck to fuse supplementary information from different modalities while eliminating the irrelevant information in them. One challenge of utilizing the supplementary information of multiple modalities is that most existing work can only be applied to data with complete modalities.
The missing-modality problem exists widely in multimodal learning tasks. In such tasks, only a small portion of the data can be used to train the model. Thus, to make full use of all the available data, in the fourth work, I propose a knowledge distillation based algorithm that utilizes all the data, including samples with missing modalities, while fusing the supplementary information.
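The knowledge-distillation idea in the fourth work can be sketched generically. The following is the standard distillation objective (an illustration of the general technique, not the dissertation's exact algorithm): a teacher trained on complete-modality data softly supervises a student that only sees the available modalities.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled, numerically stable softmax."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Weighted sum of (i) cross-entropy against the hard label and
    (ii) KL divergence between temperature-softened teacher and student
    distributions. In a missing-modality setting, the teacher could be
    trained on complete-modality samples and the student on whatever
    modalities are present (hypothetical setup for illustration)."""
    p_student = softmax(student_logits)
    hard_loss = -np.log(p_student[label])
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft_loss = np.sum(p_t * (np.log(p_t) - np.log(p_s)))  # KL(teacher || student)
    return alpha * hard_loss + (1 - alpha) * (T ** 2) * soft_loss

loss = distillation_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], label=0)
```

The T² factor keeps the gradient magnitude of the soft term comparable across temperatures; when student and teacher agree exactly, the soft term vanishes.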
- Title
- Advanced Operators for Graph Neural Networks
- Creator
- Ma, Yao
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Graphs, which encode pairwise relations between entities, are a universal data structure for many real-world data, including social networks, transportation networks, and chemical molecules. Many important applications on these data can be treated as computational tasks on graphs. For example, friend recommendation in social networks can be regarded as a link prediction task, and predicting properties of chemical compounds can be treated as a graph classification task. An essential step in facilitating these tasks is to learn vector representations either for nodes or for entire graphs. Given the great success of representation learning on images and text, deep learning offers great promise for graphs. However, compared to images and text, deep learning on graphs faces immense challenges. Graphs are irregular: nodes are unordered, and each node can have a distinct number of neighbors. Thus, traditional deep learning models cannot be directly applied to graphs, which calls for dedicated efforts to design novel deep graph models. To help meet this pressing demand, we developed and investigated novel graph neural network (GNN) algorithms to generalize deep learning techniques to graph-structured data. Two key operations in GNNs are the graph filtering operation, which aims to refine node representations, and the graph pooling operation, which aims to summarize node representations to obtain a graph representation. In this thesis, we provide deep understanding of, or develop novel algorithms for, these two operations from new perspectives. For graph filtering operations, we propose a unified framework from the perspective of graph signal denoising, which demonstrates that most existing graph filtering operations perform feature smoothing. Then, we further investigate what information typical graph filtering operations can capture and how they can be understood beyond feature smoothing.
For graph pooling operations, we study the pooling procedure from the perspective of graph spectral theory and present a novel graph pooling operation. We also propose a technique to downsample nodes considering both node importance and representativeness, which leads to another novel graph pooling operation.
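The claim that graph filtering performs feature smoothing can be made concrete with a small sketch using the symmetrically normalized filter common in GCN-style layers (toy graph and values chosen for illustration, not taken from the thesis):

```python
import numpy as np

# Toy graph: 4 nodes on a path. Add self-loops, GCN-style.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)
d = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)
S = D_inv_sqrt @ A_hat @ D_inv_sqrt  # normalized filter used by GCN layers

# One scalar feature per node; repeated filtering smooths it over the graph.
x = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(3):
    x = S @ x
# After a few applications, the spike at node 0 is spread to its neighbors:
# the filter acts as a low-pass (smoothing) operator on the graph signal.
print(np.round(x, 3))
```

The variance of the signal shrinks with each application, which is exactly the "feature smoothing" behavior the unified denoising framework attributes to existing graph filters.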
- Title
- GEOGRAPHIC APPLICATIONS OF KNOWLEDGE-RICH MACHINE LEARNING APPROACHES IN SPATIOTEMPORAL DATA ANALYSIS
- Creator
- Hatami bahman beiglou, Pouyan
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
In the modern realm of pervasive, frequent, sizable, and instant data capture, enabled by advancements in instrumentation, data generation, and data gathering techniques, we gain new prospects for comprehending and analyzing the role of geography in everyday life. However, traditional geographic data analytics are now severely challenged by the volume, velocity, variety, and veracity of the data that must be analyzed to extract value. As a result, geographic data science has garnered great interest in the past two decades. Considering that much of data science's success was formed outside of geography, there is a risk within such perspectives that location will remain simply an additional column within a database, no more or less important than any other feature. Geographic data science combines this data with spatial and temporal components. Spatial and temporal dependence allow us to interpolate and extrapolate to fill gaps when data are inadequate, and to infer reasonable approximations elsewhere by incorporating information from diverse data types and sources. However, within scientific communities there are arguments about whether geographic data science is a scientific discipline of its own. Because data science is still in its early adoption phases in geography, geographic data science needs to develop its own unique concepts, differentiating itself from other disciplines such as statistics or computer science. This becomes possible when geographers, within a community of practice, are enabled to learn and connect current tools, methods, and domain knowledge to address the existing challenges of geographic data analysis. To take a step toward that purpose, this dissertation studies three knowledge-rich applications of data science in the analysis of large geographic spatiotemporal datasets and explores the opportunities and challenges facing this research along the way.
The first chapter of this dissertation reviews the challenges and opportunities of the era of spatiotemporal big data; the following chapters tackle three different problems within geography, one in the subfield of human geography and two in physical geography. Finally, the last chapter discusses some final thoughts on the current state of geographic data science and considers the potential for future studies.
- Title
- Adaptive and Automated Deep Recommender Systems
- Creator
- Zhao, Xiangyu
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Recommender systems are intelligent information retrieval applications that have been leveraged in numerous domains such as e-commerce, movies, music, books, and points of interest. They play a crucial role in users' information-seeking process and overcome the information overload issue by recommending personalized items (products, services, or information) that best match users' needs and preferences. Driven by recent advances in machine learning theory and the prevalence of deep learning techniques, there has been tremendous interest in developing deep learning based recommender systems. These systems have unprecedentedly advanced the effectiveness of mining non-linear user-item relationships and learning feature representations from massive datasets, bringing great vitality and improvement to recommendation in both academia and industry. Despite the prominence of existing deep recommender systems, their adaptiveness and automation remain under-explored. Thus, in this dissertation, we study the problem of adaptive and automated deep recommender systems. Specifically, we present our efforts devoted to building adaptive deep recommender systems that continuously update recommendation strategies according to the dynamic nature of user preference, maximizing the cumulative reward from users in practical streaming recommendation scenarios. In addition, we propose a group of automated and systematic approaches that design deep recommender system frameworks effectively and efficiently in a data-driven manner. More importantly, we apply our proposed models to a variety of real-world recommendation platforms and have achieved promising enhancements of social and economic benefits.
- Title
- Identification and analysis of non-coding RNAs in large scale genomic data
- Creator
- Achawanantakun, Rujira
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
High-throughput sequencing technologies have created the opportunity for large-scale transcriptome analyses and intensified attention on the study of non-coding RNAs (ncRNAs). NcRNAs play important roles in many cellular processes. For example, transfer RNAs and ribosomal RNAs are involved in the protein translation process; micro RNAs regulate gene expression; and long ncRNAs have been found to be associated with many human diseases, ranging from autism to cancer. Many ncRNAs function through both their sequences and their secondary structures. Thus, accurate secondary structure prediction provides important information for understanding the tertiary structures, and thus the functions, of ncRNAs. State-of-the-art ncRNA identification tools are mainly based on two approaches. The first approach is comparative structure analysis, which determines the consensus structure from homologous ncRNAs. Structure prediction is a costly process, because the number of putative structures increases exponentially with the sequence length; it is therefore not practical for very long ncRNAs such as lncRNAs. Moreover, the accuracy of current structure prediction tools is still not satisfactory, especially on sequences containing pseudoknots. An alternative identification approach that has become increasingly popular is sequence-based expression analysis, which relies on next generation sequencing (NGS) technologies to quantify gene expression on a genome-wide scale. Specific expression patterns are used to identify the type of ncRNA. This method is therefore limited to ncRNAs that have medium to high expression levels and expression patterns distinct from those of other ncRNAs. In this work, we address the challenges of ncRNA identification using different approaches. Specifically, we propose four tools: grammar-string based alignment, KnotShape, KnotStructure, and lncRNA-ID.
Grammar-string is a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar and of a full RNA grammar including pseudoknots. It simplifies a complicated structure alignment into a simple grammar-string based alignment. In addition, grammar-string based alignment incorporates both sequence and structure into multiple sequence alignment, so we can enhance the speed of alignment and achieve an accurate consensus structure. KnotShape and KnotStructure focus on reducing the size of the structure search space to speed up structure prediction. KnotShape predicts the best shape by grouping similar structures together and applying SVM classification to select the best representative shape. KnotStructure improves the performance of structure prediction by using grammar-string based alignment and the predicted shape output by KnotShape. lncRNA-ID is specially designed for lncRNA identification. It incorporates balanced random forest learning to construct a classification model that distinguishes lncRNAs from protein-coding sequences. Its major advantage is that it maintains good predictive performance with limited or imbalanced training data.
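The core idea of the balanced random forest learning behind lncRNA-ID can be sketched as follows: each tree is trained on a balanced bootstrap sample, so a rare class (e.g., lncRNAs among abundant protein-coding sequences) is not swamped by the majority class. This is a minimal illustration of the sampling step, with hypothetical names, not lncRNA-ID's implementation:

```python
import numpy as np

def balanced_bootstrap(y, rng):
    """Draw a balanced bootstrap sample of indices: take the same number
    of examples (with replacement) from every class, capped at the size
    of the smallest class, as done per tree in a balanced random forest."""
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()
    idx = []
    for c in classes:
        members = np.where(y == c)[0]
        idx.append(rng.choice(members, size=n_per_class, replace=True))
    return np.concatenate(idx)

rng = np.random.default_rng(0)
y = np.array([0] * 90 + [1] * 10)   # toy imbalance: 90 coding vs 10 lncRNA
sample = balanced_bootstrap(y, rng)
# Each class now contributes equally to the per-tree training set.
print(np.bincount(y[sample]))  # [10 10]
```

Training each tree on such a sample (rather than a plain bootstrap) is what lets the ensemble keep reasonable recall on the minority class under imbalanced training data.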
- Title
- Detecting and Mitigating Bias in Natural Languages
- Creator
- Liu, Haochen
- Date
- 2022
- Collection
- Electronic Theses & Dissertations
- Description
-
Natural language processing (NLP) is an increasingly prominent subfield of artificial intelligence (AI). NLP techniques enable intelligent machines to understand and analyze natural languages, making it possible for humans and machines to communicate through natural language. However, more and more evidence indicates that NLP applications exhibit human-like discriminatory bias or make unfair decisions. As NLP algorithms play an increasingly irreplaceable role in automating people's lives, bias in NLP is closely tied to users' vital interests and demands considerable attention. While there is a growing number of studies on bias in natural language, research on this topic is far from complete. In this thesis, we propose several studies to fill gaps in the area of bias in NLP from three perspectives. First, existing studies are mainly confined to traditional and relatively mature NLP tasks; for certain newly emerging tasks, such as dialogue generation, research on how to define, detect, and mitigate their bias is still absent. We conduct pioneering studies on bias in dialogue models to answer these questions. Second, previous studies mostly focus on explicit bias in NLP algorithms but overlook implicit bias. We investigate implicit bias in text classification tasks, proposing novel methods to detect, explain, and mitigate it. Third, existing research on bias in NLP concentrates on in-processing and post-processing bias mitigation strategies, but rarely considers how to avoid producing bias during the generation of training data, especially in the data annotation phase. To this end, we investigate annotator bias in crowdsourced data for NLP tasks and its group effect.
We verify the existence of annotator group bias, develop a novel probabilistic graphical framework to capture it, and propose an algorithm to eliminate its negative impact on NLP model learning.
- Title
- Sequence learning with side information : modeling and applications
- Creator
- Wang, Zhiwei
- Date
- 2020
- Collection
- Electronic Theses & Dissertations
- Description
-
Sequential data are ubiquitous, and modeling sequential data has been one of the longest-standing problems in computer science. The goal of sequence modeling is to represent a sequence with a low-dimensional dense vector that incorporates as much information as possible. A fundamental type of information contained in sequences is sequential dependency, and a large body of research has been devoted to designing effective ways to capture it. Recently, sequence learning models such as recurrent neural networks (RNNs), temporal convolutional networks, and Transformers have gained tremendous popularity for modeling sequential data. Equipped with effective structures such as gating mechanisms, large receptive fields, and attention mechanisms, these models have achieved great success in applications across a wide range of fields. However, besides sequential dependency, sequences also carry side information that remains under-explored. Thus, in this thesis, we study the problem of sequence learning with side information. Specifically, we present our efforts devoted to building sequence learning models that effectively and efficiently capture the side information commonly seen in sequential data. In addition, we show that side information can play an important role in sequence learning tasks, as it provides rich information that is complementary to the sequential dependency. More importantly, we apply our proposed models to various real-world applications and have achieved promising results.
- Title
- Online Learning Algorithms for Mining Trajectory data and their Applications
- Creator
- Wang, Ding
- Date
- 2021
- Collection
- Electronic Theses & Dissertations
- Description
-
Trajectories are spatio-temporal data that represent the traces of moving objects, such as humans, migrating animals, vehicles, and tropical cyclones. In addition to geo-location information, trajectory data often contain other (non-spatial) features describing the states of the moving objects. The time-varying geo-location and state information collectively characterize a trajectory dataset, which can be harnessed to understand the dynamics of the moving objects. This thesis focuses on developing efficient and accurate machine learning algorithms for forecasting the future trajectory path and state of a moving object. Although many methods have been developed in recent years, numerous challenges remain that existing methods have not sufficiently addressed, hampering their effectiveness in critical applications such as hurricane prediction. These challenges include difficulties in handling concept drift, error propagation in long-term forecasts, missing values, and nonlinearities in the data. In this thesis, I present a family of online learning algorithms to address these challenges. Online learning is an effective approach because it can efficiently fit new observations while adapting to concept drift present in the data. First, I propose an online learning framework called OMuLeT for long-term forecasting of the trajectory paths of moving objects. OMuLeT employs an online-learning-with-restart strategy to incrementally update the weights of its predictive model as new observation data become available. It can also handle missing values in the data using a novel weight renormalization strategy. Second, I introduce the OOR framework to predict the future state of a moving object. Since the state can be represented by ordinal values, OOR employs a novel ordinal loss function to train its model.
In addition, the framework was extended to OOQR, which accommodates a quantile loss function to improve prediction accuracy for larger values on the ordinal scale. Furthermore, I also developed the OOR-ε and OOQR-ε frameworks to generate real-valued state predictions using the ε-insensitive loss function. Third, I developed an online learning framework called JOHAN, which simultaneously predicts the location and state of a moving object. JOHAN generates its predictions by leveraging the relationship between the state and location information, and utilizes a quantile loss function to bias the algorithm toward more accurately predicting large categorical values of the moving object's state, say, for a high intensity hurricane. Finally, I present a deep learning framework to capture non-linear relationships in trajectory data. The proposed DTP framework employs a TDM approach for imputing missing values, coupled with an LSTM architecture for dynamic path prediction. The framework was further extended to ODTP, which applies an online learning setting to address concept drift present in trajectory data. As proof of concept, the proposed algorithms were applied to the hurricane prediction task. Both OMuLeT and ODTP were used to predict the future trajectory path of a hurricane up to a 48-hour lead time. Experimental results showed that OMuLeT and ODTP outperformed various baseline methods, including the official forecasts produced by the U.S. National Hurricane Center (NHC). OOR was applied to predict the intensity of a hurricane up to 48 hours in advance. Experimental results showed that OOR outperformed various state-of-the-art online learning methods and can generate predictions close to the NHC official forecasts. Since hurricane intensity prediction is a notoriously hard problem, JOHAN was applied to improve its prediction accuracy by leveraging the trajectory information, particularly for high intensity hurricanes that are near landfall.
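The quantile loss used by OOQR and JOHAN can be illustrated with the standard pinball loss (the dissertation's exact formulation may differ): for a quantile level tau > 0.5, under-prediction is penalized more heavily than over-prediction, which biases the model against underestimating large values such as the intensity of a strong hurricane.

```python
import numpy as np

def quantile_loss(y_true, y_pred, tau):
    """Standard quantile (pinball) loss, averaged over samples.
    tau in (0, 1) sets the asymmetry: errors where y_true > y_pred
    (under-prediction) are weighted by tau, the rest by (1 - tau)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.maximum(tau * err, (tau - 1) * err))

# With tau = 0.9, under-predicting an intensity by 10 units costs
# nine times as much as over-predicting it by the same margin.
under = quantile_loss([100.0], [90.0], tau=0.9)    # 0.9 * 10 = 9.0
over = quantile_loss([100.0], [110.0], tau=0.9)    # 0.1 * 10 = 1.0
```

Minimizing this loss yields the tau-th conditional quantile rather than the conditional mean, which is why it suits forecasting tasks where the costly errors are the large underestimates.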
- Title
- Referring expression generation towards mediating shared perceptual basis in situated dialogue
- Creator
- Fang, Rui (Research engineer)
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
Situated human-robot dialogue has received increasing attention in recent years. In situated dialogue, robots/artificial agents and their human partners are co-present in a shared physical world. Robots need to automatically perceive and make inferences about the shared environment. Due to its limited perceptual and reasoning capabilities, a robot's representation of the shared world is often incomplete, error-prone, and significantly mismatched from its human partner's. Although the two are physically co-present, a joint perceptual basis between the human and the robot cannot be established, and referential communication between them becomes difficult. Robots need to collaborate with human partners to establish a joint perceptual basis; referring expression generation (REG) thus becomes an important problem in situated dialogue. REG is the task of generating referring expressions to describe target objects such that the intended objects can be correctly identified by the human. Although extensively studied, most existing REG algorithms were developed and evaluated under the assumption that agents and humans have access to the same kind of domain information. This is clearly not true in situated dialogue. The objective of this thesis is to investigate how to generate referring expressions that mediate a mismatched perceptual basis between humans and agents. As a first step, a hypergraph-based approach is developed to account for group-based spatial relations and uncertainties in perceiving the environment. This approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%).
This big performance gap indicates that when the agent applies traditional approaches (which usually generate a single minimum description) to describe target objects, the intended objects often cannot be correctly identified by the human. To address this problem, motivated by collaborative behaviors in human referential communication, two collaborative models are developed for REG: an episodic model and an installment model. In both models, instead of generating a single referring expression to describe a target object as in previous work, the system generates multiple small expressions that lead to the target object, with the goal of minimizing the collaborative effort. In particular, the installment model incorporates human feedback in a reinforcement learning framework to learn optimal generation strategies. Our empirical results show that the episodic model and the installment model outperform our non-collaborative hypergraph-based approach with absolute gains of 6% and 21%, respectively. Lastly, the collaborative models are further extended to embodied collaborative models to facilitate human-robot interaction. These embodied models seamlessly incorporate robot gesture behaviors (i.e., pointing to an object) and the human's gaze feedback (i.e., looking at a particular object) into the collaborative model for REG. The empirical results show that when robot gestures and human verbal feedback are incorporated, the new collaborative model achieves over 28% absolute gains compared to the baseline collaborative model. This thesis further discusses the opportunities and challenges brought by modeling embodiment in collaborative referential communication in human-robot interaction.
- Title
- Learning from noisily connected data
- Creator
- Yang, Tianbao
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
Machine learning is the discipline of developing computational algorithms for learning predictive models from data. Traditional learning methods treat the data as independent and identically distributed (i.i.d.) samples from unknown distributions. However, this assumption is often violated in many real-world applications, creating challenges for learning predictive models. For example, on an e-commerce website, customers may purchase a product on the recommendation of their friends, so customers' purchase records are not i.i.d. samples but correlated. Nowadays, data become correlated through collaborations, interactions, communications, and many other types of connections. Effective learning from these connected data not only provides a better understanding of the data but also brings significant economic benefits. Learning from connected data also poses unique challenges to both supervised and unsupervised learning algorithms, because these algorithms are designed for i.i.d. data and are often sensitive to noise in the connections. In this dissertation, I focus on developing theory and algorithms for learning from connected data. In particular, I consider two types of connections: the first is naturally formed in real-world networks, while the second is manually created to facilitate the learning process, in the form of must-links and cannot-links. In the first part of this dissertation, I develop efficient algorithms for detecting communities in the first type of connected data. In the second part, I develop clustering algorithms that effectively utilize both must-links and cannot-links for the second type of connected data. A common approach to learning from connected data is to assume that if two data points are connected, they are likely to be assigned to the same class/cluster.
This assumption is often violated in real-world applications, leading to noisy connection problems. One key challenge of learning from connected data is how to model the noisy pairwise connections that indicate the pairwise class relationship between two data points. For the problem of detecting communities in networked data, I develop Bayesian approaches that explicitly model noisy pairwise links by introducing additional hidden variables, besides community memberships, to explain potential inconsistency between the pairwise connections and the pairwise class relationships. For clustering must-and-cannot linked data, I model how noise is added to the pairwise connections during their manual generation. The main contributions of this dissertation are: (i) it introduces popularity and productivity for the first time, besides community memberships, to model the generation of noisy links in real networks, and demonstrates the effectiveness of these factors through the task of community detection; (ii) it proposes, for the first time, a discriminative model that combines content and link analysis for detecting communities, to alleviate the impact of noisy connections in community detection; (iii) it presents a general approach for learning from noisily labeled data, proves theoretical convergence results for the first time, and applies the approach to clustering noisy must-and-cannot linked data.