Search results (1 - 9 of 9)
- Title
- Modeling physical causality of action verbs for grounded language understanding
- Creator
- Gao, Qiaozi
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
Building systems that can understand and communicate through human natural language is one of the ultimate goals of AI. Decades of natural language processing research have focused mainly on learning from large amounts of language corpora. However, human communication relies on a significant amount of unverbalized information, often referred to as commonsense knowledge. This type of knowledge allows us to understand each other's intentions, to connect language with concepts in the world, and to make inferences based on what we hear or read. Commonsense knowledge is generally shared among cognitively capable individuals and is therefore rarely stated explicitly in human language. This makes it very difficult for artificial agents to acquire commonsense knowledge from language corpora. To address this problem, this dissertation investigates the acquisition of commonsense knowledge, especially knowledge related to basic actions upon the physical world, and how that knowledge influences language processing and grounding. Linguistic studies have shown that action verbs often denote some change of state (CoS) as the result of an action. For example, the result of "slice a pizza" is that the state of the object (pizza) changes from one big piece to several smaller pieces. However, the causality of action verbs and its potential connection with the physical world has not been systematically explored. Artificial agents often lack this kind of basic commonsense causality knowledge, which makes it difficult for them to work with humans and to reason, learn, and perform actions. To address this problem, this dissertation models dimensions of physical causality associated with common action verbs. Based on such modeling, several approaches are developed to incorporate causality knowledge into language grounding, visual causality reasoning, and commonsense story comprehension.
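The abstract above frames verb meaning in terms of change of state (CoS). As a rough illustration of that idea only (not the dissertation's actual model; the verbs, dimensions, and values below are hypothetical), a verb can be paired with the state change it denotes on its patient object:

```python
# Illustrative sketch of verb causality as change-of-state (CoS) dimensions.
# Verbs, dimensions, and values are invented examples, not the dissertation's
# actual inventory or model.

from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectState:
    """A coarse physical state of an object along a few CoS dimensions."""
    pieces: int = 1            # number of parts (wholeness)
    temperature: str = "room"  # e.g. "room", "hot", "cold"
    location: str = "table"

# Each verb is associated with the state change it denotes on its patient.
CAUSALITY = {
    "slice": lambda s: ObjectState(pieces=max(2, s.pieces + 1),
                                   temperature=s.temperature,
                                   location=s.location),
    "heat":  lambda s: ObjectState(pieces=s.pieces,
                                   temperature="hot",
                                   location=s.location),
}

def apply_verb(verb: str, state: ObjectState) -> ObjectState:
    """Predict the post-condition of applying `verb` to an object in `state`."""
    return CAUSALITY[verb](state)

if __name__ == "__main__":
    pizza = ObjectState(pieces=1)
    print(apply_verb("slice", pizza))
    # ObjectState(pieces=2, temperature='room', location='table')
```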
- Title
- Semantic role labeling of implicit arguments for nominal predicates
- Creator
- Gerber, Matthew Steven
- Date
- 2011
- Collection
- Electronic Theses & Dissertations
- Description
-
Natural language is routinely used to express the occurrence of an event and existence of entities that participate in the event. The entities involved are not haphazardly related to the event; rather, they play specific roles in the event and relate to each other in systematic ways with respect to the event. This basic semantic scaffolding permits construction of the rich event descriptions encountered in spoken and written language. Semantic role labeling (SRL) is a method of automatically identifying events, their participants, and the existing relations within textual expressions of language. Traditionally, SRL research has focused on the analysis of verbs due to their strong connection with event descriptions. In contrast, this dissertation focuses on emerging topics in noun-based (or nominal) SRL. One key difference between verbal and nominal SRL is that nominal event descriptions often lack participating entities in the words that immediately surround the predicate (i.e., the word denoting an event). Participants (or arguments) found at longer distances in the text are referred to as implicit. Implicit arguments are relatively uncommon for verbal predicates, which typically require their arguments to appear in the immediate vicinity. In contrast, implicit arguments are quite common for nominal predicates. Previous research has not systematically investigated implicit argumentation, whether for verbal or nominal predicates. This dissertation shows that implicit argumentation presents a significant challenge to nominal SRL systems: after introducing implicit argumentation into the evaluation, the state-of-the-art nominal SRL system presented in this dissertation suffers a performance degradation of more than 8%. Motivated by these observations, this dissertation focuses specifically on implicit argumentation in nominal SRL. Experiments in this dissertation show that the aforementioned performance degradation can be reduced by a discriminative classifier capable of filtering out nominals whose arguments are implicit. The approach improves performance substantially for many frequent predicates - an encouraging result, but one that leaves much to be desired. In particular, the filter-based nominal SRL system makes no attempt to identify implicit arguments, despite the fact that they exist in nearly all textual discourses. As a first step toward the goal of identifying implicit arguments, this dissertation presents a manually annotated corpus in which nominal predicates have been linked to implicit arguments within the containing documents. This corpus has a number of unique properties that distinguish it from preexisting resources, of which few address implicit arguments directly. Analysis of this corpus shows that implicit arguments are frequent and often occur within a few sentences of the nominal predicate. Using the implicit argument corpus, this dissertation develops and evaluates a novel model capable of recovering implicit arguments. The model relies on a variety of information sources that have not been used in prior SRL research. The relative importance of these information sources is assessed and particularly troubling error types are discussed. This model is an important step forward because it unifies work on traditional verbal and nominal SRL systems. The model extracts semantic structures that cannot be recovered by applying the systems independently. Building on the implicit argument model, this dissertation then develops a preliminary joint model of implicit arguments. The joint model is motivated by the fact that semantic arguments do not exist independently of each other. The presence of a particular argument can promote or inhibit the presence of another. Argument dependency is modeled by using the TextRunner information extraction system to gather general purpose knowledge from millions of Internet webpages. Results for the joint model are mixed; however, a number of interesting insights are drawn from the study.
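One concrete piece of the approach described above is a discriminative filter that decides whether a nominal predicate's arguments are likely implicit before running the SRL system. A minimal sketch of that filtering idea, using invented features and toy data rather than the dissertation's actual feature set:

```python
# Hypothetical sketch of filtering out nominal predicate instances whose
# arguments are implicit, so nominal SRL is only run on instances with
# locally realized arguments. Features and training data are invented for
# illustration; the dissertation's feature set is far richer.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each instance: a nominal predicate plus simple contextual features.
# Label 1 = arguments realized locally (keep), 0 = arguments implicit (filter).
train = [
    ({"lemma": "sale", "has_pp_of": True,  "in_headline": False}, 1),
    ({"lemma": "sale", "has_pp_of": False, "in_headline": True},  0),
    ({"lemma": "loss", "has_pp_of": True,  "in_headline": False}, 1),
    ({"lemma": "loss", "has_pp_of": False, "in_headline": False}, 0),
]

X, y = zip(*train)
filter_clf = make_pipeline(DictVectorizer(sparse=False), LogisticRegression())
filter_clf.fit(list(X), list(y))

test = {"lemma": "sale", "has_pp_of": True, "in_headline": False}
if filter_clf.predict([test])[0] == 1:
    print("run nominal SRL on this instance")
else:
    print("skip: arguments likely implicit")
```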
- Title
- Grounded language processing for action understanding and justification
- Creator
- Yang, Shaohua (Graduate of Michigan State University)
- Date
- 2019
- Collection
- Electronic Theses & Dissertations
- Description
-
Recent years have witnessed an increasing interest in cognitive robots entering our lives. In order to reason, collaborate, and communicate with humans in the shared physical world, agents need to understand the meaning of human language, especially descriptions of actions, and connect them to the physical world. Furthermore, to make communication more transparent and trustworthy, agents should have a human-like ability to justify their actions and explain their decision-making behaviors. The goal of this dissertation is to develop approaches that learn to understand actions in the perceived world through language communication. Towards this goal, we study three related problems. Semantic role labeling captures semantic roles (or participants) such as agent, patient, and theme associated with verbs in text. While it provides important intermediate semantic representations for many traditional NLP tasks, it does not capture grounded semantics with which an artificial agent can reason, learn, and perform actions. We utilize semantic role labeling to connect visual semantics with linguistic semantics. On one hand, this structured semantic representation can extend traditional visual scene understanding beyond simple object recognition and relation detection, which is important for human-robot collaboration tasks. On the other hand, because of the shared common ground, not every language instruction is fully specified explicitly. We therefore propose to ground not only explicit semantic roles but also implicit roles that are left unstated during communication. Our empirical results show that by incorporating this semantic information, we achieve better grounding performance as well as a better semantic representation of the visual world. Another challenge for an agent is to explain to humans why it recognizes an observed activity as a particular action. With recent advances in deep learning, many models have proven very effective at action recognition, but most of them function as black boxes and offer no interpretation of the decisions they make. To enable collaboration and communication between humans and agents, we developed a generative conditional variational autoencoder (CVAE) approach that allows the agent to learn to acquire commonsense evidence for action justification. Our empirical results show that, compared to a typical attention-based model, the CVAE has a significantly higher explanation ability in terms of identifying correct commonsense evidence to justify perceived actions. An experiment on communication grounding further shows that the commonsense evidence identified by the CVAE can be communicated to humans to achieve a significantly higher common ground between humans and agents. The third problem combines action grounding with action justification in the context of visual commonsense reasoning. Humans have tremendous visual commonsense knowledge with which to answer a question and justify the rationale, but an agent does not. On one hand, this process requires the agent to jointly ground both the answers and the rationales to the images. On the other hand, it also requires the agent to learn the relation between the answer and the rationale. We propose a deep factorized model to better capture the relations between the image, question, answer, and rationale. Our empirical results show that the proposed model outperforms strong baselines in overall performance. By explicitly modeling the factors of language grounding and commonsense reasoning, the proposed model also provides a better understanding of the effects of these factors on grounded action justification.
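The justification component described above is built on a conditional variational autoencoder. A minimal CVAE sketch in PyTorch is shown below; the layer sizes, the evidence and action encodings, and the loss details are placeholders, not the dissertation's actual architecture:

```python
# Minimal conditional VAE (CVAE) sketch, illustrating the general idea of
# generating commonsense evidence conditioned on a perceived action.
# Dimensions and encodings are invented placeholders.

import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, evid_dim=64, cond_dim=32, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(evid_dim + cond_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 128), nn.ReLU(),
            nn.Linear(128, evid_dim))

    def forward(self, evidence, condition):
        h = self.enc(torch.cat([evidence, condition], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        recon = self.dec(torch.cat([z, condition], dim=-1))
        return recon, mu, logvar

def loss_fn(recon, target, mu, logvar):
    rec = nn.functional.mse_loss(recon, target, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

if __name__ == "__main__":
    model = CVAE()
    evidence = torch.randn(8, 64)  # stand-in for encoded commonsense evidence
    action = torch.randn(8, 32)    # stand-in for the perceived-action encoding
    recon, mu, logvar = model(evidence, action)
    print(loss_fn(recon, evidence, mu, logvar).item())
```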
- Title
- Online adaptation for mobile device text input personalization
- Creator
- Baldwin, Tyler
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
As mobile devices have become more common, the need for efficient methods of mobile device text entry has grown. With this growth come new challenges, as the constraints imposed by the size, processing power, and design of mobile devices impair traditional text entry mechanisms in ways not seen in previous text entry tasks. To combat this, researchers have developed a variety of text entry aids, such as automatic word completion and correction, that help the user input the desired text more quickly and accurately than unaided input. Text entry aids are able to produce meaningful gains by attempting to model user behavior. These aids rely on models of the language the user speaks and types in and of user typing behavior to understand the intent of a user's input. Because these models require a large body of supervised training data to build, they are often built offline using aggregate data from many users. When they must predict the behavior of a new user, they do so by comparing that user's input to the behavior of the "average" user used to build the models. Alternatively, a model built on the current user's data rather than that of an average user may be better able to adapt to the user's individual quirks and provide better overall performance. However, to enable this personalized experience for a previously unseen user, the system must be able to collect the data to build the models online, from the natural input provided by the user. This not only allows the system to better model the user's behavior, but also allows it to continuously adapt to behavioral changes. This work examines this personalization and adaptation problem, with a particular focus on solving the online data collection problem. This work looks at the online data collection, personalization, and adaptation problems at two levels. At the first level, it examines lower-level text entry aids that attempt to help users input each individual character. Online data collection and personalization are examined in the context of one commonly deployed character-level text entry aid, key-target resizing. Several simple and computationally inexpensive data collection and assessment methods are proposed and evaluated. The results of these experiments suggest that by using these data assessment techniques we are able to dynamically build personalized models that outperform general models after observing less than one week's worth of text input from the average user. Additional analyses suggest that further improvements can be obtained by hybrid approaches that consider both aggregate and personalized data. We then step back and examine the data assessment and collection process for higher-level text entry aids. To do so we examine two text entry aids that work at the word level, automatic word correction and automatic word completion. Although their stated goals differ, these aids work similarly and, critically, fail similarly. To improve performance, data assessment methods that can detect cases of system failure are proposed. By automatically and dynamically detecting when a system fails for a given user, we are better able to understand user behavior and help the system overcome its shortfalls. The results of these experiments suggest that a careful examination of user dialogue behavior allows the system to assess its own performance. Several methods for utilizing this self-assessment data for personalization are proposed and are shown to be plausibly able to improve performance.
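Key-target resizing, the character-level aid discussed above, personalizes which key an ambiguous touch is assigned to. A toy sketch of that idea, modeling each key's touch history as a 2D Gaussian; the keyboard geometry and the online data-assessment step are simplified assumptions, not the dissertation's method:

```python
# Toy key-target resizing: each key's observed touch points form a 2D Gaussian,
# and an incoming touch is assigned to the key whose model explains it best.

import numpy as np

class KeyModel:
    """Personalized touch model for a single key."""
    def __init__(self, center):
        self.touches = [np.asarray(center, dtype=float)]  # seed with the key center

    def add(self, point):
        self.touches.append(np.asarray(point, dtype=float))

    def log_likelihood(self, point):
        pts = np.vstack(self.touches)
        mean = pts.mean(axis=0)
        if len(pts) > 2:
            cov = np.cov(pts, rowvar=False) + np.eye(2) * 1e-3  # regularize
        else:
            cov = np.eye(2)  # too few samples: fall back to a unit Gaussian
        diff = np.asarray(point, dtype=float) - mean
        return float(-0.5 * (diff @ np.linalg.inv(cov) @ diff
                             + np.log(np.linalg.det(cov))))

# Nominal key centers on a toy one-row keyboard.
keys = {"q": KeyModel((0, 0)), "w": KeyModel((10, 0)), "e": KeyModel((20, 0))}

def resolve_touch(point):
    """Assign a touch to the key whose personalized model best explains it."""
    return max(keys, key=lambda k: keys[k].log_likelihood(point))

# Simulate a user who systematically hits 'w' well to the left of its center.
for p in [(7.0, 0.4), (6.8, -0.2), (7.2, 0.1)]:
    keys["w"].add(p)

# This touch is geometrically closer to 'q' (distance 4.5 vs. 5.5), but the
# personalized model assigns it to 'w'.
print(resolve_touch((4.5, 0.0)))
```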
- Title
- Verb semantics as denoting change of state in the physical world
- Creator
- Doering, Malcolm
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
In the not-too-distant future we anticipate the existence of robots that will help around the house, in particular in the kitchen. It is therefore critical that robots can understand the language commonly used within this domain. In this work we explore the semantics of verbs that frequently occur in descriptions of cooking activities. Motivated by linguistic theory on the lexical semantics of concrete action verbs and by data collected via crowdsourcing, an ontology of the changes of state of the physical world denoted by concrete action verbs is presented. Furthermore, additional datasets are collected for the purpose of validating the ontology, exploring the effects of context on verbal change-of-state semantics, and testing the automatic identification of the changes of state denoted by verbs. In conclusion, several areas of further investigation are suggested.
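The ontology mentioned above organizes verbs by the changes of state they denote. A tiny, purely illustrative sketch of such a structure; the categories and verb assignments are invented placeholders, not the thesis's ontology:

```python
# Hypothetical change-of-state (CoS) ontology: CoS categories in a small
# hierarchy, with verbs mapped to the categories they typically denote.

COS_ONTOLOGY = {
    "physical_change": {
        "separation":  {"cut", "slice", "chop"},
        "deformation": {"mash", "flatten"},
        "temperature": {"heat", "boil", "chill"},
    },
    "location_change": {
        "translation": {"move", "pour", "put"},
    },
}

def cos_categories(verb):
    """Return (top-level, subcategory) pairs whose verb sets contain `verb`."""
    return [(top, sub)
            for top, subs in COS_ONTOLOGY.items()
            for sub, verbs in subs.items()
            if verb in verbs]

print(cos_categories("slice"))  # [('physical_change', 'separation')]
```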
- Title
- Eye gaze for reference resolution in multimodal conversational interfaces
- Creator
- Prasov, Zahar
- Date
- 2011
- Collection
- Electronic Theses & Dissertations
- Description
-
Multimodal conversational interfaces allow users to carry on a spoken dialogue with an artificial conversational agent while looking at a graphical display. The dialogue is used to accomplish purposeful tasks. Motivated by previous psycholinguistic findings, this dissertation investigates how eye gaze contributes to automated spoken language understanding in such a setting, specifically focusing on robust reference resolution: a process that identifies the referring expressions in an utterance and determines which entities these expressions refer to. As a part of this investigation we attempt to model user focus of attention during human-machine conversation by utilizing the users' naturally occurring eye gaze. We study which eye gaze and auxiliary visual factors contribute to this model's accuracy. Among the various features extracted from eye gaze, fixation intensity has been shown to be the most indicative of attention. We combine user speech with this gaze-based attentional model in an integrated reference resolution framework. This framework fuses linguistic, dialogue, domain, and eye gaze information to robustly resolve the various kinds of referring expressions that occur during human-machine conversation. Our studies have shown that, based on this framework, eye gaze can compensate for limited domain models and dialogue processing capability. We further extend this framework to handle recognized speech input acquired from situated dialogue within an immersive virtual environment. We utilize word confusion networks to model the set of alternative speech recognition hypotheses and incorporate confusion networks into the reference resolution framework. The empirical results indicate that incorporating eye gaze significantly improves reference resolution performance, especially when limited domain model information is available to the reference resolution framework. The results also indicate that modeling recognized speech via confusion networks rather than the single best recognition hypothesis leads to better reference resolution performance.
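The framework described above fuses gaze-based attention (e.g., fixation intensity) with linguistic evidence to resolve referring expressions. A toy sketch of such a fusion, with invented objects, scores, and weights rather than the dissertation's actual features:

```python
# Toy fusion of fixation intensity and lexical match for reference resolution.
# The scene, attribute words, fixation durations, and weighting are invented.

objects = {
    "lamp_1":  {"words": {"lamp", "small", "blue"}, "fixation_ms": 120},
    "lamp_2":  {"words": {"lamp", "large", "red"},  "fixation_ms": 640},
    "chair_1": {"words": {"chair", "wooden"},       "fixation_ms": 80},
}

def resolve(expression, objects, gaze_weight=0.5):
    """Score each candidate by word overlap plus normalized fixation intensity."""
    tokens = set(expression.lower().split())
    total_fix = sum(o["fixation_ms"] for o in objects.values())
    scores = {}
    for name, obj in objects.items():
        lexical = len(tokens & obj["words"]) / max(len(tokens), 1)
        gaze = obj["fixation_ms"] / total_fix
        scores[name] = (1 - gaze_weight) * lexical + gaze_weight * gaze
    return max(scores, key=scores.get), scores

referent, scores = resolve("the lamp", objects)
print(referent)  # 'lamp_2': the lexical match ties the two lamps, gaze breaks the tie
```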
- Title
- Referring expression generation towards mediating shared perceptual basis in situated dialogue
- Creator
- Fang, Rui (Research engineer)
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
Situated human-robot dialogue has received increasing attention in recent years. In situated dialogue, robots/artificial agents and their human partners are co-present in a shared physical world. Robots need to automatically perceive and make inferences about the shared environment. Due to its limited perceptual and reasoning capabilities, the robot's representation of the shared world is often incomplete, error-prone, and significantly mismatched from that of its human partner. Although the two are physically co-present, a joint perceptual basis between the human and the robot cannot be taken for granted, and referential communication between them becomes difficult. Robots need to collaborate with their human partners to establish a joint perceptual basis, and referring expression generation (REG) thus becomes an important problem in situated dialogue. REG is the task of generating referring expressions to describe target objects such that the intended objects can be correctly identified by the human. Although extensively studied, most existing REG algorithms were developed and evaluated under the assumption that agents and humans have access to the same kind of domain information, which is clearly not true in situated dialogue. The objective of this thesis is to investigate how to generate referring expressions that mediate the mismatched perceptual basis between humans and agents. As a first step, a hypergraph-based approach is developed to account for group-based spatial relations and uncertainties in perceiving the environment. This approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent's perception is imperfect (e.g., 45%). This large performance gap indicates that when the agent applies traditional approaches (which usually generate a single minimum description) to describe target objects, the intended objects often cannot be correctly identified by the human. To address this problem, motivated by collaborative behaviors in human referential communication, two collaborative models are developed for REG: an episodic model and an installment model. In both models, instead of generating a single referring expression to describe a target object as in previous work, the system generates multiple small expressions that lead to the target object, with the goal of minimizing the collaborative effort. In particular, the installment model incorporates human feedback in a reinforcement learning framework to learn optimal generation strategies. Our empirical results have shown that the episodic model and the installment model outperform the non-collaborative hypergraph-based approach with absolute gains of 6% and 21%, respectively. Lastly, the collaborative models are further extended to embodied collaborative models to facilitate human-robot interaction. These embodied models seamlessly incorporate robot gesture behaviors (i.e., pointing to an object) and the human's gaze feedback (i.e., looking at a particular object) into the collaborative model for REG. The empirical results have shown that when robot gestures and human verbal feedback are incorporated, the new collaborative model achieves over 28% absolute gains compared to the baseline collaborative model. This thesis further discusses the opportunities and challenges brought by modeling embodiment in collaborative referential communication in human-robot interaction.
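The installment model described above replaces a single minimal description with a sequence of small expressions guided by partner feedback. A schematic sketch of that installment idea follows; the scene, property order, and simulated feedback are invented, and the dissertation learns the generation strategy with reinforcement learning rather than using a fixed order:

```python
# Installment-style referring expression generation: emit one property at a
# time and use (simulated) partner feedback to decide whether to continue.

scene = {
    "obj1": {"color": "red",  "shape": "mug", "position": "left"},
    "obj2": {"color": "red",  "shape": "mug", "position": "right"},
    "obj3": {"color": "blue", "shape": "mug", "position": "left"},
}

def generate_installments(target, scene, order=("shape", "color", "position")):
    """Emit properties one at a time until only the target remains consistent."""
    candidates = set(scene)
    installments = []
    for prop in order:
        value = scene[target][prop]
        installments.append(f"the {value}" if prop == "shape" else f"the {value} one")
        # Simulated partner feedback: which objects are still consistent?
        candidates = {o for o in candidates if scene[o][prop] == value}
        if candidates == {target}:
            break
    return installments

print(generate_installments("obj1", scene))
# ['the mug', 'the red one', 'the left one']: three installments needed here
```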
- Title
- Referential grounding towards mediating shared perceptual basis in situated dialogue
- Creator
- Liu, Changsong, Ph. D.
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
In situated dialogue, although an artificial agent (e.g., robot) and its human partner are co-present in a shared environment, they have significantly mismatched perceptual capabilities (e.g., recognizing objects in the surroundings). When a shared perceptual basis is missing, it becomes difficult for the agent to identify referents in the physical world that are referred to by the human (i.e., a problem of referential grounding). The work presented in this dissertation focuses on computational approaches to enable robust and adaptive referential grounding in situated settings. First, graph-based representations are employed to capture a human speaker's linguistic discourse and an agent's visual perception. Referential grounding is then formulated as a graph-matching problem, and a state-space search algorithm is applied to ground linguistic references onto perceived objects. Furthermore, hypergraph representations are used to account for group-based descriptions, and one prevalent pattern of collaborative communication observed from a human-human dialogue dataset is incorporated into the search algorithm. This graph-matching based approach thus provides a principled way to model and utilize spatial relations, group-based descriptions, and collaborative referring discourse in situated dialogue. Evaluation results demonstrate that, when the agent's visual perception is unreliable due to computer vision errors, the graph-based approach significantly improves referential grounding accuracy over a baseline which only relies on object properties. Second, an optimization based approach is proposed to mediate the perceptual differences between an agent and a human. Through online interaction with the human, the agent can learn a set of weights which indicate how reliably/unreliably each dimension (object type, object color, etc.) of its perception of the environment maps to the human's linguistic descriptors. Then the agent can adapt to the situation by applying the learned weights to the grounding process and/or adjusting its word grounding models accordingly. Empirical evaluation shows this weight-learning approach can successfully adjust the weights to reflect the agent's perceptual insufficiencies. The learned weights, together with updated word grounding models, can lead to a significant improvement for referential grounding in subsequent dialogues. Third, a probabilistic labeling algorithm is introduced to handle uncertainties from visual perception and language processing, and to potentially support generation of collaborative responses in the future. The probabilistic labeling algorithm is formulated under the Bayesian reasoning framework. It provides a unified probabilistic scheme to integrate different types of evidence from the collaborative referring discourse, and to generate ranked multiple grounding hypotheses for follow-up processes. Evaluated on the same dataset, probabilistic labeling significantly outperforms state-space search in both accuracy and efficiency. All these approaches contribute to the ultimate goal of building collaborative dialogue agents for situated interaction, so that the next generation of intelligent machines/devices can better serve human users in daily work and life.
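The weight-learning and probabilistic-labeling ideas above amount to scoring grounding hypotheses by combining per-dimension perceptual evidence, weighted by how reliable each dimension is. A simplified sketch of that kind of scoring; the scene, compatibility probabilities, and reliability weights are invented for illustration, and the dissertation formulates the full problem as graph matching and probabilistic labeling:

```python
# Score each perceived object against a linguistic reference by combining
# per-dimension compatibility probabilities, down-weighting unreliable
# perceptual dimensions. All numbers below are illustrative.

import math

# Agent's (possibly noisy) perception of the scene.
perceived = {
    "obj_a": {"type": "cup",    "color": "green"},
    "obj_b": {"type": "bottle", "color": "blue"},
}

# How well each perceived value matches a human descriptor, per dimension.
compat = {
    ("cup", "mug"): 0.7, ("bottle", "mug"): 0.1,
    ("green", "blue"): 0.3, ("blue", "blue"): 0.9,
}

# Learned reliability of each perceptual dimension (color perception is shakier).
reliability = {"type": 1.0, "color": 0.5}

def score(reference, obj):
    """Log-linear combination of per-dimension compatibilities."""
    log_p = 0.0
    for dim, descriptor in reference.items():
        p = compat.get((perceived[obj][dim], descriptor), 0.05)
        log_p += reliability[dim] * math.log(p)
    return log_p

reference = {"type": "mug", "color": "blue"}   # "the blue mug"
ranked = sorted(perceived, key=lambda o: score(reference, o), reverse=True)
print(ranked)  # ['obj_a', 'obj_b']: type evidence outweighs down-weighted color
```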
- Title
- Interactive learning of verb semantics towards human-robot communication
- Creator
- She, Lanbo
- Date
- 2017
- Collection
- Electronic Theses & Dissertations
- Description
-
"In recent years, a new generation of cognitive robots has started to enter our lives. Robots such as ASIMO, PR2, and Baxter have been studied and applied in education and service applications. Unlike traditional industrial robots that perform specific repetitive tasks in a well-controlled environment, cognitive robots must be able to work with human partners in a dynamic environment that is filled with uncertainties and exceptions. It is impractical to pre-program every type of knowledge (e.g., perceptual knowledge such as different colors or shapes; action knowledge such as how to complete a task) into the robot systems ahead of time. Just as children learn from their parents, it is desirable for robots to continuously acquire knowledge and learn from human partners how to handle novel and unknown situations. Driven by this motivation, the goal of this dissertation is to develop approaches that allow robots to acquire and refine knowledge, particularly knowledge related to verbs and actions, through interaction and dialogue with their human partners. Towards this goal, this dissertation makes the following contributions. First, we propose a goal-state-based verb semantics and develop a three-tier action/task knowledge representation. This representation, on one hand, supports the connection between symbolic representations of language and continuous sensorimotor representations of the robot; on the other hand, it supports the application of existing planning algorithms to address novel situations. Our empirical results have shown that, given this representation, the robot can immediately apply newly learned action knowledge to perform actions under novel situations. Second, the goal state representation and the three-tier structure are integrated into a dialogue system on board a SCHUNK robotic arm to learn new actions through human-robot dialogue in a simplified blocks world. For a novel complex action, the human can give an illustration through dialogue using the robot's existing action knowledge. By comparing the environment before and after the action illustration, the robot can identify a goal state to represent the novel action, which can be immediately applied to new environments. Empirical studies have shown that action knowledge can be acquired by following human instructions. Furthermore, the results also demonstrate that step-by-step instructions lead to better learning performance than one-shot instructions. Third, to address the insufficiency of a single goal state representation in more complex domains (e.g., kitchen and living room), the single goal state is extended to a hierarchical hypothesis space to capture different possible outcomes of a verb action. Our empirical results demonstrate that the hypothesis space representation, combined with the learned hypothesis selection algorithm, outperforms approaches using a single hypothesis representation. Lastly, we address uncertainties in the environment for verb acquisition. Previous work relies on perfect environment sensing and human language understanding, which does not hold in real-world situations. In addition, the rich interactions between teachers and learners observed in human teaching and learning have not been explored. To address these limitations, the last part of the dissertation presents a new interactive learning approach that allows robots to proactively engage in interaction with human partners by asking good questions to handle uncertainties of the environment. Reinforcement learning is applied for the robot to acquire an optimal policy for its question-asking behaviors by maximizing the long-term reward. Empirical results have shown that the interactive learning approach leads to more reliable models for grounded verb semantics, especially in noisy environments."--Pages ii-iii.
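The goal-state verb semantics described above represents a verb by the state it should bring about, so a planner can realize the verb in novel situations. A minimal sketch of that idea with a naive forward-search planner; the verbs, predicates, and primitive actions are hypothetical stand-ins for the dissertation's three-tier representation:

```python
# Verb semantics as a goal state: the verb names the state to achieve, and a
# naive forward search over known primitive actions finds a plan to reach it.

from itertools import count

# Primitive actions the robot already knows: name -> (preconditions, effects).
PRIMITIVES = {
    "grasp(cup)":   (set(),              {"holding(cup)"}),
    "moveto(sink)": ({"holding(cup)"},   {"at(sink)"}),
    "turnon(tap)":  ({"at(sink)"},       {"filled(cup)"}),
}

# Verb semantics as a goal state rather than a fixed action sequence.
VERB_GOALS = {"fill(cup)": {"filled(cup)"}}

def plan(state, goal, max_depth=5):
    """Breadth-first forward search: apply applicable primitives until goal holds."""
    frontier = [(frozenset(state), [])]
    for _ in count():
        if not frontier:
            return None
        next_frontier = []
        for s, steps in frontier:
            if goal <= s:
                return steps
            if len(steps) >= max_depth:
                continue
            for name, (pre, eff) in PRIMITIVES.items():
                if pre <= s:
                    next_frontier.append((s | eff, steps + [name]))
        frontier = next_frontier

print(plan(set(), VERB_GOALS["fill(cup)"]))
# ['grasp(cup)', 'moveto(sink)', 'turnon(tap)']
```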