Eye gaze for reference resolution in multimodal conversational interfaces
Multimodal conversational interfaces allow users to carry on a spoken dialogue with an artificial conversational agent while looking at a graphical display. The dialogue is used to accomplish purposeful tasks. Motivated by previous psycholinguistic findings, this dissertation investigates how eye gaze contributes to automated spoken language understanding in such a setting, focusing specifically on robust reference resolution, a process that identifies the referring expressions in an utterance and determines which entities these expressions refer to. As part of this investigation, we attempt to model the user's focus of attention during human-machine conversation by utilizing the user's naturally occurring eye gaze. We study which eye gaze and auxiliary visual factors contribute to this model's accuracy. Among the various features extracted from eye gaze, fixation intensity has been shown to be the most indicative of attention. We combine user speech with this gaze-based attentional model in an integrated reference resolution framework. This framework fuses linguistic, dialogue, domain, and eye gaze information to robustly resolve the various kinds of referring expressions that occur during human-machine conversation. Our studies have shown that, within this framework, eye gaze can compensate for limited domain models and limited dialogue processing capability. We further extend this framework to handle recognized speech input acquired from situated dialogue within an immersive virtual environment. We utilize word confusion networks to model the set of alternative speech recognition hypotheses and incorporate these confusion networks into the reference resolution framework. The empirical results indicate that incorporating eye gaze significantly improves reference resolution performance, especially when limited domain model information is available to the reference resolution framework. The empirical results also indicate that modeling recognized speech via confusion networks, rather than via the single best recognition hypothesis, leads to better reference resolution performance.
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Prasov, Zahar
- Thesis Advisors: Chai, Joyce Y.
- Committee Members: Jin, Rong; Jain, Anil K.; Biocca, Frank
- Date Published: 2011
- Subjects: Computational linguistics; Electronic data processing--Psychological aspects; Gaze; Human-computer interaction; Language and languages--Computer programs
- Program of Study: Computer Science
- Degree Level: Doctoral
- Language: English
- Pages: 168 pages
- ISBN: 9781124603155; 1124603158
- Permalink: https://doi.org/doi:10.25335/e8jd-p284