Referential grounding towards mediating shared perceptual basis in situated dialogue
In situated dialogue, although an artificial agent (e.g., robot) and its human partner are co-present in a shared environment, they have significantly mismatched perceptual capabilities (e.g., recognizing objects in the surroundings). When a shared perceptual basis is missing, it becomes difficult for the agent to identify referents in the physical world that are referred to by the human (i.e., a problem of referential grounding). The work presented in this dissertation focuses on computational approaches to enable robust and adaptive referential grounding in situated settings.

First, graph-based representations are employed to capture a human speaker's linguistic discourse and an agent's visual perception. Referential grounding is then formulated as a graph-matching problem, and a state-space search algorithm is applied to ground linguistic references onto perceived objects. Furthermore, hypergraph representations are used to account for group-based descriptions, and one prevalent pattern of collaborative communication observed from a human-human dialogue dataset is incorporated into the search algorithm. This graph-matching based approach thus provides a principled way to model and utilize spatial relations, group-based descriptions, and collaborative referring discourse in situated dialogue. Evaluation results demonstrate that, when the agent's visual perception is unreliable due to computer vision errors, the graph-based approach significantly improves referential grounding accuracy over a baseline which only relies on object-properties.

Second, an optimization based approach is proposed to mediate the perceptual differences between an agent and a human. Through online interaction with the human, the agent can learn a set of weights which indicate how reliably/unreliably each dimension (object type, object color, etc.) of its perception of the environment maps to the human’s linguistic descriptors. Then the agent can adapt to the situation by applying the learned weights to the grounding process and/or adjusting its word grounding models accordingly. Empirical evaluation shows this weight-learning approach can successfully adjust the weights to reflect the agent’s perceptual insufficiencies. The learned weights, together with updated word grounding models, can lead to a significant improvement for referential grounding in subsequent dialogues.

Third, a probabilistic labeling algorithm is introduced to handle uncertainties from visual perception and language processing, and to potentially support generation of collaborative responses in the future. The probabilistic labeling algorithm is formulated under the Bayesian reasoning framework. It provides a unified probabilistic scheme to integrate different types of evidence from the collaborative referring discourse, and to generate ranked multiple grounding hypotheses for follow-up processes. Evaluated on the same dataset, probabilistic labeling significantly outperforms state-space search in both accuracy and efficiency.

All these approaches contribute to the ultimate goal of building collaborative dialogue agents for situated interaction, so that the next generation of intelligent machines/devices can better serve human users in daily work and life.
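To make the graph-matching formulation in the abstract more concrete, the following is a minimal illustrative sketch, not the dissertation's implementation. It assumes a beam search as the state-space search, a single hypothetical relation (`left_of`), and an unweighted additive score; the names `Obj`, `node_score`, `edge_score`, `ground`, and `beam_width` are all invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """A perceived object (hypothetical vision output)."""
    name: str
    color: str
    kind: str
    x: float  # horizontal position; smaller means further left


def node_score(desc, obj):
    """Fraction of the described properties that the perceived object matches."""
    if not desc:
        return 1.0
    hits = sum(1 for key, value in desc.items() if getattr(obj, key) == value)
    return hits / len(desc)


def edge_score(relation, a, b):
    """Score a binary spatial relation between two perceived objects."""
    if relation == "left_of":
        return 1.0 if a.x < b.x else 0.0
    raise ValueError(f"unsupported relation: {relation}")


def ground(refs, relations, objects, beam_width=5):
    """Beam search over partial assignments of references to objects.

    refs: dict mapping a reference id to its described properties
    relations: list of (ref_a, relation, ref_b) edges of the language graph
    Returns the best (score, mapping) pair found.
    """
    by_name = {obj.name: obj for obj in objects}
    beam = [(0.0, {})]
    for ref in refs:
        candidates = []
        for score, partial in beam:
            used = set(partial.values())
            for obj in objects:
                if obj.name in used:
                    continue  # each object grounds at most one reference
                mapping = {**partial, ref: obj.name}
                total = score + node_score(refs[ref], obj)
                # Add every relation that becomes fully assigned at this step.
                for a, rel, b in relations:
                    if ref in (a, b) and a in mapping and b in mapping:
                        total += edge_score(rel, by_name[mapping[a]], by_name[mapping[b]])
                candidates.append((total, mapping))
        candidates.sort(key=lambda item: item[0], reverse=True)
        beam = candidates[:beam_width]
    return beam[0]


# "The red cup to the left of the blue bottle", where the (hypothetical)
# vision system mislabels the intended cup as orange.
objects = [
    Obj("o1", "orange", "cup", 1.0),
    Obj("o2", "red", "cup", 5.0),
    Obj("o3", "blue", "bottle", 3.0),
]
refs = {
    "r1": {"color": "red", "kind": "cup"},
    "r2": {"color": "blue", "kind": "bottle"},
}
relations = [("r1", "left_of", "r2")]

score, mapping = ground(refs, relations, objects)
print(score, mapping)
```

In this toy scene, matching on object properties alone would pick `o2` for the red cup, while the spatial relation shifts the best joint assignment to `o1`. This illustrates, under the stated assumptions, why relational evidence can help when perception is unreliable.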
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Liu, Changsong, Ph. D.
- Thesis Advisors: Chai, Joyce Y.
- Committee Members: Altmann, Erik M.; Jin, Rong; Tong, Yiying
- Date Published: 2015
- Program of Study: Computer Science - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: xii, 127 pages
- ISBN: 9781321739473; 1321739478
- Permalink: https://doi.org/doi:10.25335/xfk9-c067