MENTAL MODELS AND HOME VIRTUAL ASSISTANTS (HVAs): A SINGLE TURN-TAKING IS NOT ENOUGH TO HAVE A HUMAN-LIKE CONVERSATION

By

Janghee Cho

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Information and Media — Master of Arts

2018

ABSTRACT

MENTAL MODELS AND HOME VIRTUAL ASSISTANTS (HVAs): A SINGLE TURN-TAKING IS NOT ENOUGH TO HAVE A HUMAN-LIKE CONVERSATION

By

Janghee Cho

As virtual assistants become more intelligent, these applications are dispersed throughout our everyday life. However, virtual assistants are still used for very routine tasks (Wingfield, 2018). People are expected to speak to virtual assistants in a conversational manner; nevertheless, there has been little research looking at people's mental models for what kinds of interactions they think the devices are capable of. I conducted a qualitative study to describe the strategies people use to seek a wide range of information when interacting with Google Home, a type of Home Virtual Assistant (HVA), and how their understandings of HVAs may change their interaction. I found that people believed applying human-to-human communication skills may facilitate a conversation with Google Home (i.e., common-sense models). People also expected Google Home to perform like Google search, or like an artificial intelligence system that becomes more intelligent if people provide more information (i.e., machine-oriented models). Based on these results, I present implications for the design of HVAs that encourage inexperienced people to engage in interaction with HVAs.

Copyright by JANGHEE CHO 2018

ACKNOWLEDGEMENTS

Firstly, I would like to express my sincere gratitude to my advisor Dr. Emilee Rader for her continuous support of my master's study. Her guidance helped me throughout the research and writing of this thesis. Besides my advisor, I would like to thank the rest of my thesis committee, Dr. Susan Wyche and Dr. Taiwoo Park, not only for their insightful comments and encouragement, but also for the questions that prompted me to widen my research from various perspectives. My sincere thanks also go to our Behavior Information Technology lab members, who provided many ideas and much feedback. Without their precious support, it would not have been possible to conduct this research. Also, this work would not have been possible without the financial support of the Department of Media and Information at Michigan State University. Last but not least, I would like to thank my family, especially my wife Ahyoung, for supporting me spiritually throughout the writing of this thesis and my life in general.

TABLE OF CONTENTS

LIST OF TABLES ……………………………………………………………………………… vi
LIST OF FIGURES …………………………………………………………………………… vii
CHAPTER 1. INTRODUCTION ................................................................................................ 1
CHAPTER 2. RELATED WORKS ............................................................................................ 5
2.1 MENTAL MODELS .............................................................................................................. 5
2.2 DIFFERENT CONVERSATIONAL PATTERNS IN HUMAN AND VIRTUAL ASSISTANT COMMUNICATION .................................................................................................................... 7
2.3 TASK DESIGN..................................................................................................................... 9
CHAPTER 3.
METHODS ........................................................................................................ 11 3.1 PARTICIPANTS .................................................................................................................. 11 3.2 PROCEDURE ..................................................................................................................... 12 3.3 TASK DESIGN................................................................................................................... 14 3.4 ANALYSIS ........................................................................................................................ 15 CHAPTER 4 FINDINGS .......................................................................................................... 17 4.1. HOW DID PEOPLE SEEK SPECIFIC INFORMATION? ............................................................ 17 4.2. HOW DID PARTICIPANTS INTERACT WITH GOOGLE HOME AFTER LEARNING ITS LIMITATIONS? ....................................................................................................................... 24 4.3. WHAT MADE PARTICIPANTS STOP USING IT. ................................................................... 28 4.4 MENTAL MODELS OF HVAS ............................................................................................. 32 CHAPTER 5. DISCUSSION .................................................................................................... 40 5.1 GIVING AN ANSWER IS NOT A SOLUTION FOR THE EFFECTIVE INTERACTION ....................... 40 5.2. HOW TO SUPPORT COMMON-SENSE MODELS ................................................................... 41 5.3 LIMITATION ..................................................................................................................... 43 6. CONCLUSION .................................................................................................................... 44 APPENDICES .......................................................................................................................... 46 APPENDIX A....................................................................................................................... 47 APPENDIX B. ...................................................................................................................... 49 APPENDIX C. ...................................................................................................................... 50 APPENDIX D....................................................................................................................... 51 APPENDIX E ....................................................................................................................... 54 APPENDIX F ....................................................................................................................... 57 REFERENCES ......................................................................................................................... 59 v LIST OF TABLES Table 1: The order of tasks for each participant ……………………………………………….. 14 Table 2: Two mental models and related strategies ……………………………………………. 39 Table 3: Participants’ demographic information ………………………………………………. 50 vi LIST OF FIGURES Figure 1: Participants’ questions in the factual tasks ………………………………………….. 20 Figure 2: Participants’ questions in the instrumental tasks …………………………………..... 22 Figure 3: Data from Google Activity in terms of task 3 ……………………………………….. 
23 Figure 4: Participants’ questions in the unanswerable tasks …………………………………… 26 Figure 5: Participant’s reaction toward Google Home’s preprogrammed answer …………….. 30 Figure 6: Participants’ reaction toward Google Home’s long response. ………………………. 31 Figure 7: The result of Google search(left) and Google Home(right) in response to the same question ………………………………………………………………………………………… 36 Figure 8: Participant’s interaction with Google Home in task 3 and the debriefing from the interview session …………………………………………………………….………………… 38 Figure 9: Google Home’s current error handling strategies …………………………………… 43 vii CHAPTER 1. INTRODUCTION Conversation is becoming a key interface of Human-computer interaction (HCI) as virtual assistants, such as Amazon Alexa, Google Assistant or Apple Siri, integrate into personal or home devices. Virtual assistants, so-called conversational agents, refer to a dialogue system by “combining automatic speech recognition and speech synthesis with natural language processing and dialogue management” (Roger K. Moore, 2017, p. 281). As virtual assistants become more intelligent, these applications take on a wide variety of roles and functions in our everyday life. It is expected that virtual assistants will become essential user interfaces to many services in the near future (R K Moore, 2013). In many cases, virtual assistants’ practical value is still in doubt and limited to a few simple tasks such as checking weather forecasts, playing music, and controlling other smart appliance (Cohen, Cheyer, Horvitz, El Kaliouby, & Whittaker, 2016; Gašić, Hakkani-Tür, & Celikyilmaz, 2017; Roger K. Moore, 2017). Since current virtual assistants are developed based on task-oriented functions (e.g., playing music, turning on lights, setting an alarm and so on), virtual assistants are used in very routine tasks (Wingfield, 2018). Moreover, more than half of people rarely or only occasionally use virtual assistants and many people have little knowledge about how to take advantage of virtual assistants’ ability in their life (Roger K. Moore, 2017; Richter, 2016). System designers keep updating virtual assistants’ abilities to make them more likely to have natural interactions with people. For instance, Amazon announced a new feature (i.e., Brief- mode1) to make interactions with virtual assistants seamless. If people activate this mode, 1About brief mode: https://www.theverge.com/2018/3/16/17128860/amazon-alexa-brief-mode-testing-responses- verbal-beeps 1 Amazon Alexa does not give a reply to people’s request to turn on the light by using a verbal confirmation, instead it just turns on the light. Furthermore, Google continued to develop the quality of its virtual assistant’s voice to make it more realistic2. It was reported that most people get used to using virtual assistants through playful interaction such as humorous responses (Luger & Sellen, 2016). Given the fact that anthropomorphic cues such as name, gender, voice or physical appearance cues enhance users’ engagement and usability in the interaction between people and virtual assistants (Kopp, Gesellensetter, Krämer, & Wachsmuth, 2005; Tegos et al., 2012; Niculescu et al., 2014), system designers might presume that designing virtual assistants to be more like humans would be helpful to encourage people to adopt this advanced technology. The basic idea to be like a human in conversation is a joint action by aligning each other’s knowledge and experience to achieve a goal (Clark, 1997). 
Since people use virtual assistants by taking turns talking to them, mimicking conversational manners well would be critical for making virtual assistants to be more human-like. Understanding people’s beliefs and expectations regarding virtual assistants is important for knowing how to improve a conversation between people and virtual assistants in order to engage inexperienced people in the interaction with virtual assistants. Recent research studies have examined how people use them in their everyday life, focusing on task-oriented usage and interaction with smart home applications (Clark, Dutta, & Newman, 2016; Cowan et al., 2017; McGregor & Tang, 2017; Porcheron, Fischer, Reeves, & Sharples, 2018; Rong, Fourney, Brewer, Morris, & Bennett, 2017). However, most studies investigated experienced users who already made use of virtual assistants in some way, which means that they were already educated about how to use virtual assistants through their experience. Moreover, people who tend to adopt 2 https://www.engadget.com/2018/03/27/google-text-to-speech-service-has-more-realistic-voices/ 2 advanced technology as soon as it becomes available or have a professional background regarding technology may be adept at exploiting virtual assistants’ abilities. Rather than a single task-oriented usage, in this paper, I focus more on the conversation between people and virtual assistants to understand how inexperienced people create a dialogue to seek certain information. Since the interaction between people and virtual assistants is based on conversation, analyzing their verbal strategies is critical for developing virtual assistants. Also, by focusing on potential users who haven’t experienced using virtual assistants, I study how they behave in their interactions with virtual assistants by applying their beliefs and expectations. In this study, I use the term, home virtual assistant (HVA), to specify a certain domain for a conversational interface, which is the focus of this study. In the previous research, virtual assistants were not categorized according to different domains or purposes. Different labels are often used (e.g., conversational agents, personal assistants, or chatbot) for the same services or devices, or they are used interchangeably. Given the fact that virtual assistants have been learning numerous abilities incessantly, it is necessary to distinguish virtual assistants used in the context of the home (i.e., home virtual assistants) from the mobile-driven context in a portable form. The main motivation to use mobile-driven virtual assistants was that they were determined to be time-saving (Luger & Sellen, 2016). While carrying out multi-tasking is the main motivation for the mobile-driven virtual assistants, home virtual assistants may be utilized in our everyday life for other purposes. Among home virtual assistants in the current market, I exclusively use the Google Home device to observe the extent of their conversation, not only one single turn-taking between people and HVAs. In a recent study from a market research firm, Google Home was seen as having the highest accuracy rate in answering the most questions (Enge, 2017). A single turn-taking is not enough to assess the process of how users learn and 3 change their approach so that using a device with better performance to give an answer for is necessary in this study. 
Moreover, tasks are designed in a scenario-based context, therefore, I am intentionally choosing a device that allows the participant to talk as much as they can. I conducted a qualitative study that is comprised of two phases in order to understand people’s mental models. Mental models refer to sets of beliefs and understandings that help users make decisions (Norman, 1983). Mental models form people’s thoughts and guide their interaction with the system. Understanding the mental model that people use in the interaction is important in designing a more effective system for people. I began with an observation to identify strategies that users use to complete the tasks. I asked participants to think-aloud while conducting the tasks. Then, I had a semi-structured interview to probe for a set of causal beliefs that impact their behavior during the tasks. In think-aloud, people verbalize everything that they are observing into words, which reveals the detail of human problem-solving processes that controls users’ reasoning (Johnson-Laird, 2010). People can verbalize the linguistic contents of their working memory that affect how knowledge is organized and used in our mind (D'Andrade, 1995;Norman, 2013). In this sense, I identified themes and patterns in participants’ utterance data that reveal their mental models. In this study, I examined how inexperienced people interact with HVAs and how their understanding of HVAs may change their interaction with them. I introduce tasks for seeking information through HVAs focusing on our everyday conversation and describe what mental models inexperienced people form while developing their verbal strategies to proceed with a conversation with Google Home. I also present design implications as to how to promote a conversation between people and HVAs and to encourage inexperienced people to be involved in the interaction with HVAs. 4 CHAPTER 2. RELATED WORKS As a way of understanding people’s interaction with HVAs, I reviewed literature that studied mental model’s approach to understand human behavior in the system, conversation patterns in human-to-human interaction and human-to-virtual assistant interaction, and the way to develop a task to investigate information seeking behavior. 2.1 MENTAL MODELS Research in HCI has commonly studied people’s mental models to understand how people understand a system’s operation, especially those designed with a complex structure. Mental models refer to a human’s reasoning mechanism that people use to interact with the world around them (Jones, Ross, Lynam, Perez, & Leitch, 2011); these models are sets of beliefs and understanding that help people’s decisions (Norman, 1983; Rader & Slaker, 2017). Mental models are formed by an implicit collection of beliefs about the system based on their experiences and interaction (French & Hancock, 2017). People obtain information from other people that are like themselves, from the media, from communications and interaction with experts, and from their own experiences (Wash & Rader, 2011). In this sense, “mental model tends to be functional rather than complete or accurate representation of reality” (Jones et al., 2011, p. 5); people continue to modify the mental model in order to solve a problem through interaction with the system (Norman, 1983). In interacting with technological artifacts, people formulate mental models of themselves and mental models of the things with which they are interacting (Young, 1983). 
5 Besides experience and background knowledge, the structure of the human information processing system is also a component of mental models (Norman, 1983). Encountering new knowledge in the world, users try to understand it by extrapolating from their own knowledge in their heads (Norman, 2013). They use their knowledge stored in their working memory, invaluable in the performance of everyday tasks, to make a decision to select and manipulate information (Jones et al., 2011). Working memory is available immediately, clear and complete without mental effort, but it also enables users to retrieve information from their long-term memory (Nersessian, 2002). Thus, working memory holds a lot of clues in regards to capturing a user’s understanding of system operations. Researchers have also investigated users’ mental models in the context of technologies with an opaque operation – web searching (Choo, Detlor, & Turnbull, 2000; Hölscher & Strube, 2000; Muramatsu & Pratt, 2001), and social media algorithms (Eslami et al., 2016; Rader & Gray, 2015). These operations provide little feedback and prevent a user from understanding the details of their functionality, which is how a particular input is processed and interpreted. Home virtual assistants like Google Home also have an opaque operational interface, obstructing an understanding of the system’s process. Koenemann and Belkin (1996) define an opaque level of interactivity as an interface that does not provide any indication of the underlying transformations that the system performs. The mental model approach is generally recognized as a useful method to unveil underlying users’ behaviors in the context where users encounter a complex or unfamiliar domain (DeVito, Gergle, & Birnholtz, 2017; French & Hancock, 2017; Kang, Dabbish, Fruchter, & Kiesler, 2015; Rader & Gray, 2015). This is because people develop an understanding of causal processes associated with mechanical systems by drawing upon their mental models, 6 formed through analogical thinking (Jones et al., 2011). To design a more effective interface for those opaque systems, most studies aim to understand users’ beliefs and patterns. In this study, I examine mental models, focusing on the conversational interface, without the consideration of visual feedback. 2.2 DIFFERENT CONVERSATIONAL PATTERNS IN HUMAN AND VIRTUAL ASSISTANT COMMUNICATION Understanding human-to-human communication is essential to analyze the interaction between people and virtual assistants. Based on the way people have a conversation, I can investigate the reverberation of current HVAs’ design on the interaction between humans and HVAs. Also, it allows me to understand how differently people talk with HVAs in comparison with human-to-human communication. The most common type of human-to-human communication is a conversation (Clark, 1997). When we say 'conversation', we picture a situation where more than one person express their thoughts, feelings, or any ideas to each other. Sometimes we often figuratively say, “This is not a conversation”, once we feel the interlocutors do not commit themselves to the conversation. In this sense, some extent of the effort should be included in the conversation. Clark (1997) argues human-to-human communication is a joint project, "in which people establish and carry out joint purposes they are willing and able to commit to" (Clark, 1997, p. 319). 
Each interlocutor does not simply act in response to the other; interlocutors also coordinate with each other by aligning their knowledge and experience (i.e., grounding; Clark, 1997). Clark argues people cannot have a conversation without certain pieces of common ground. We are usually aware of the other person: we can see their appearance or know their background, which helps us assume who they are. In exchanging words, we also form mutual knowledge and beliefs. Having a conversation in human-to-human communication is a continuous process of working toward a mutual purpose by updating common ground. Virtual assistants have been developed based on the metaphor of human conversation. People are expected to speak to virtual assistants in a conversational manner. However, people are required to know a certain way to interact with them. Considering the fact that "the common human conversation is face-to-face setting" (Clark, 1997, p. 9), people simultaneously interact with others: they signal whether they want to keep interacting with one another by using physical gestures or short responses. Since Google Home does not have any visual signifiers, except an LED light on the touch surface (see details: https://support.google.com/googlehome/answer/7073219?hl=en), it is difficult for people to formulate a question (Reeves, 2017). Moreover, people often use ambiguous expressions, which connote subtle nuances, in communication, which makes the interaction with virtual assistants worse (Rong et al., 2017). Previous research studied how people interact with virtual assistants in certain situations such as a museum (Kopp, Gesellensetter, Krämer, & Wachsmuth, 2005), an office (McGregor & Tang, 2017) or a home (Clark et al., 2016; Porcheron et al., 2018). Despite the fact that those studies provided rich empirical findings, they are limited to task-oriented interaction. Nevertheless, a few studies identified patterns in how people proceed in their conversations with a virtual assistant. Luger and Sellen (2016) found that people who have used virtual assistants for a fairly long time use a particular economy of language (i.e., limiting the types of language used and shortening utterances), which is perceived as a way to successfully interact with virtual assistants. Specifically, they report that people focused on keywords to form a simple sentence, and also enunciated their questions. Porcheron, Fischer, and Sharples (2017) also found that people keep asking the same question when they fail to receive an expected response. Even while repeating the same question, people changed their tone, made their voices louder, and emphasized key sounds. Similar speech modifications were observed in children's interactions with virtual assistants (Lovato & Piper, 2015). Given that people have little common ground with virtual assistants, they cannot help modifying the way they have a conversation with them. In this study, I do not analyze how people use virtual assistants to deal with routine tasks, which would reveal their usage but not a pattern of dialogue. Rather, I investigate in more detail how people develop a strategy while interacting with virtual assistants, in order to shed light on what future interactions with conversational agents might be like. 2.3 TASK DESIGN To examine people's strategies for having a conversation with HVAs, it is necessary to understand how to design tasks that allow me to observe their information-seeking interaction.
Information seeking is a process in which people purposefully engage in order to learn or solve a problem (Schamber & Marchionini, 1996). A task in information seeking research has been considered as an important tool to investigate users’ behavior on the web (Jinyoung Kim, McNally, Norooz, & Druin, 2017; Rieh, 2004). Depending on how to manipulate a task design, the cognitive process can be divergent so that a classification of the task is important to study the relationship between tasks and human information behavior (K.-S. Kim & Allen, 2002; Li & Belkin, 2008). 9 Kim and Allen (2002) classified two search tasks, a known-item search and subject search. While a known-item search task offers an explicit direction to find a piece of information that is known to exist, a subject search task just requires participants to find any pieces of information that are related to the subject and regarded as useful to them. They found that a known-item search task took less time to complete with higher search performance. However, providing too many clues and describing the steps for the search would restrict the participants’ behavior taking the same route, which means that a researcher may get less useful results to understand participants’ cognitive process. Tasks are also categorized depending on the users’ information needs (Jeonghyun Kim, 2006). Users may just seek specific information (i.e., factual tasks), aim to define a thing, event, reason, means, and so on (i.e., descriptive tasks), or search for an instrument to know how to use something (i.e., instrumental tasks). Users also explore the web to compare, evaluate, and infer information in order to generalize meaningful patterns (i.e., exploratory tasks). I introduce two additional types of tasks related to the exploratory tasks: a controversial task and a predictive task. People often have a conversation with their friends or family to exchange opinions on social issues or forecast a future event or trend. Both controversial tasks and predictive tasks are designed in an unanswerable way, but I pose a simple context that enables a participant to talk with Google Home. The controversial task includes only a controversial issue that contemporarily gives rise to public disagreement. In this task, participants were asked to check Google Home’s opinion about it and make their own decision. In the predictive task, I designed a scenario in which participants need to check the prediction of a certain situation. Through those types of tasks, I examined qualitative research to assess people’s attempt to have a conversation in order to understand their mental models. 10 CHAPTER 3. METHODS I conducted user study sessions in which I gave six tasks to 20 novice users to complete and asked them to think aloud as they completed those tasks. The study took place from November 2017 to January 2018 at the Behavior Information Technology Lab (BITLab) at Michigan State University. 3.1 PARTICIPANTS I recruited 20 native speakers of American English in the Midwest United States through a paid research pool organized by our institution for maximum variation. Google Home is too inflexible to understand different English accents such that people who are not native English speakers are struggling to deliver their commands (Paul, 2017). To work within this context, I limited my participants to native speakers. Each participant was given $20 compensation for their participation for a 50-60-minute session. 
Participants were screened based on four criteria in a recruiting survey. Eligible participants 1) were at least 18 years of age, 2) had no experience with any home virtual assistant devices, 3) had little or no experience with mobile-driven virtual assistants, and 4) were not experts in virtual assistant technology. Although my study is focused on Google Home, I categorized people who had any experience with other home virtual assistant devices as ineligible. This is because such users have pre-existing mental models formed through their interactions with the other devices that could affect their behavior. However, I restrictively allowed some mobile-driven virtual assistants’ users to participate. Given the fact that the usage between home virtual assistants and mobile-driven virtual assistants are different, people who have used only certain features that 11 require a particular economy of language (e.g., “directions to home”, “tomorrow’s weather”) a few times are permitted in the study. I also excluded participants who identified themselves as technology experts (job responsibilities include computer programming, quality assurance and testing, IT security, or network administration) or virtual assistant experts (work experience at any company related to virtual assistants). I believe that those expert groups have knowledge that helps them effortlessly use such a device, or at minimum, easily apply their knowledge to unexpected situations (Luger & Sellen, 2016). I also chose participants who passed through the recruiting survey (Appendix 1). As divided by age group (e.g., 18-27, 28-37, 38-47, 48-57 or 58+), I selected 20 participants from a wide variety of ages on a ‘first-come first-served’ basis (mean age = 40; age range = 19 - 71). In addition, I selected 10 females and 10 males. Eleven participants had never used any virtual assistant device. Nine participants had used mobile-driven virtual assistants, but they rarely used it in their daily life; three for a few times per month, four for about once per month and two for less than once per month. (more detailed information is shown in Appendix 2) 3.2 PROCEDURE Upon arriving at the lab, the participants were provided with a consent form. Once necessary consent had been granted, brief instructions (Appendix 3) on how to use Google Home were offered in case a participant did not know how to converse with Google Home. Each participant was given a different new Google account I created for the purpose of data collection in the study session. While I created new Google accounts for each participant, there may be some device identifier and history on Google’s servers that I did not have access to and cannot change. Also, each Google account may be subject to a certain geographical data since Google 12 Home was located in the lab, which is leading to provide location-based answers during the study sessions. Note that I reset Google Home to the factory defaults in every 5 accounts since Google Home allows to link to only 5 accounts. After providing Google Home’s information, each participant was given brief instructions to understand how to do a think-aloud and a training set to practice a think-aloud session (Appendix 4). After finishing all of the instrument sessions, I asked participants to think aloud while conducting the tasks. The think-aloud instruction in this study is based on the guidelines provided by Ericsson and Simon (Ericsson & Simon, 1980). Each task was written on a separate sheet of paper. 
To control for order effects, I randomized the order of tasks with the help of a Latin square. To be specific, the first, sixth, eleventh, and sixteenth participants began with the first task, and the second, seventh, twelfth, and seventeenth participants began with the second task (Table 1). I did not impose a time limit for the completion of each task; a task was finished when the participant desisted from doing it. After completing all tasks, I conducted a semi-structured interview with general questions, like "Tell me what you thought about the interaction with Google Home during the task", that sought to comprehend the overall perceptions but avoid prejudging the answer (Granger Morgan, Fischhoff, Bostrom, & Atman, 2002). Then, the participants were asked to elaborate on their behavior in order to recall the underlying reasons they might have had for formulating their strategies. Developing those follow-up questions based on the participants' answers, I probed for a set of causal beliefs that impacted their behavior during the tasks. During the interview, a video camera was set up in the lab to record the interviews and to capture participants' facial expressions while they were interacting with Google Home. Lastly, the participants were asked to complete a survey for demographic information (Appendix 5).

                          Order:  1       2       3       4       5       6
Participant
P1, P6, P11, P16                  Task 1  Task 2  Task 3  Task 4  Task 5  Task 6
P2, P7, P12, P17                  Task 2  Task 3  Task 4  Task 5  Task 1  Task 6
P3, P8, P13, P18                  Task 3  Task 4  Task 5  Task 1  Task 2  Task 6
P4, P9, P14, P19                  Task 4  Task 5  Task 1  Task 2  Task 3  Task 6
P5, P10, P15, P20                 Task 5  Task 1  Task 2  Task 3  Task 4  Task 6
Table 1: The order of tasks for each participant

3.3 TASK DESIGN I designed six tasks (Appendix 6) to learn about how people converse with Google Home. All tasks were provided with a scenario-based context, except the last one, which allowed participants to ask anything they wanted, so they could easily pretend to seek certain information (Nielsen, 2014). To make the tasks actionable, the prompts in all tasks were designed to ask participants to talk with Google Home (e.g., "please ask Google Home about it"). Further, no task contained information that provided an explicit direction to find a particular piece of information. In other words, tasks in this study were subject search types (K.-S. Kim & Allen, 2002). For instance, I did not ask participants to find travel information from a certain information source in the first task. By using subject search type tasks, it is possible to gain data that more closely resemble real-world information seeking (Rieh, 2002). Posing a specific direction to do a task may result in biased behavior that is restricted within limited results, so I intentionally used only subject search type tasks in this study. Tasks were developed based on the ability lists on Google Home's website4 to demonstrate users' practical actions as the usage would be in real life. To design unanswerable tasks, some tasks were intentionally designed not to refer to the ability lists on the website. The goal in this study is to understand how people perceive home virtual assistants rather than to test the performance of the system, and thus, I intentionally included unanswerable tasks to observe the way participants adjusted their approach to find the answer. I designed the unanswerable tasks by adapting a controversial task and a predictive task, and designed the answerable tasks to apply certain information needs, such as factual tasks or instrumental tasks (Jeonghyun Kim, 2006).
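The rotation in Table 1 is a simple cyclic one, with the open-ended final task (Task 6) fixed in the last position. As a minimal illustrative sketch, not part of the study materials and assuming exactly this cyclic scheme, such a task order could be generated as follows:

    # Illustrative sketch (not from the thesis): reproduces the cyclic rotation
    # in Table 1, assuming Task 6 (the open-ended task) is always given last.
    NUM_PARTICIPANTS = 20
    ROTATED_TASKS = [1, 2, 3, 4, 5]   # tasks counterbalanced with a Latin square
    FINAL_TASK = 6                    # open-ended task, fixed in last position

    def task_order(participant: int) -> list:
        """Return the task order for a 1-indexed participant (P1..P20)."""
        offset = (participant - 1) % len(ROTATED_TASKS)
        rotated = ROTATED_TASKS[offset:] + ROTATED_TASKS[:offset]
        return rotated + [FINAL_TASK]

    for p in range(1, NUM_PARTICIPANTS + 1):
        print(f"P{p}: {task_order(p)}")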
When asking for a definition, for example, it is easy to receive an answer by using only one question, so I excluded descriptive type tasks in this study. The tasks were given in written form, with fields such as travel, shopping, passport, Obamacare and sports. The final task was open to participants where I allowed them to ask anything they want. 3.4 ANALYSIS I recorded the interviews and collected log data in the task sessions through Google activity5 where Google accumulated users' verbalized sentences and displays its results. All recorded data was transcribed in the text data by Scribie (www.scribie.com). Based on the transcripts from Scribie, I conducted qualitative data analysis by using Nvivo. To identify strategies and mental models, I iteratively analyzed the data by using an inductive qualitative approach. I began the process of open coding by reading the full transcripts to identify similar patterns in participants’ questions and emergent causal explanations about participants’ beliefs 4 Tasks were developed in October and November in 2017. Google Home’s ability lists are available to see at https://support.google.com/chromecast/answer/7130274?hl=e n&ref_topic=7195641 5 Available at https://myactivity.google.com/ 15 and expectations toward Google Home. To be specific, I focused on a certain phrase and an underlying purpose in the question that the participants said in order to identify each strategy. I examined only tasks' transcripts to extract participants' strategies. In the case of mental models, I examined both tasks' transcripts and semi-structured interviews’ transcripts. With this approach, I investigated how frequently people talk about a certain phrase (wh- question, direct action, or only a word) or a certain concept in order to discern their strategy, focusing only on a single question at first. The unit of analysis for the strategy was one question that begins by uttering a trigger word, such as "Hey Google" or "Okay, Google". I then re-read the entire task transcripts to recognize whether each question implied a certain intention. For instance, I coded both “Hey, Google, do you know anything about Obamacare?” and “Hey, Google, do you have prices for televisions in East Lansing?” in the ‘Tell me anything you know’ strategy, although the second one does not contain the specified word “any”, because both sentences are used to try to extract any information related to a certain topic. To identify mental models from the transcripts, I focused on how participants explained the way they formulated the next question based on their previous experience and interaction with Google Home during the task. In addition, I examined participants' general thoughts and feelings and any causal explanations in the semi-structured interviews' transcripts. 16 CHAPTER 4 FINDINGS My findings below focus on patterns of participants’ strategies and their mental models that formed their strategies. Note that the goal of my study is not intended to evaluate Google Home’s success or failure in answering users’ questions. Even though I might assess success based on whether Google Home provides the right answer to what users are asking, my particular concern in this paper is to gain insight into how participants change the way they speak to Google Home based on its responses. 4.1. HOW DID PEOPLE SEEK SPECIFIC INFORMATION? In this section, I described how participants interact with Google Home to find information on answerable tasks. 
People tend to provide more information for their goals Participants spent the longest time to perform factual tasks to find out several pieces of information in comparison with the other types of tasks. They did not finish the factual tasks immediately, which means that participants applied a variety of strategies in the two factual tasks. The most commonly used strategy was a 'personal context' strategy to explain personal backgrounds or desire. Participants often explained what information they were looking for as the first question. This is because participants were not familiar with interacting with Google Home and they adapted their habitual behavior that demonstrated what they wanted to know exactly to the interlocutor: 17 “Yeah, I usually try to give some kind of background as far as where I'm coming from and what I'm doing. So I think that's more of habit.” - P11, interview “I don't know how Google Home works internally really, but I figured that would potentially help if I gave it context for what I wanted then maybe it would be able to find results more tailored to what I wanted specifically.” - P13, interview This is a common factor in human conversation that people share information and update their common ground to make clear the conversation. Some participants even gave more than one sentence to detail their needs: “Okay, Google. I'm thinking of going to Japan here real soon and I would like to find a few things or get some answers to a few things here, like about flights or accommodations. Maybe what I can do there, where I can stay, that's basically accommodations. Give me some ideas about the flights, how much it might cost, how long it would take going to Japan.” - P6, think aloud However, Google Home could not answer the question including a personal context such as their needs or future plans. Even worse, Google Home kept asking participants to set up a payment method if they said they wanted to buy a television (“It looks like you don't have a payment method or delivery address set up yet. You can do that in the Google Home app.” – Google Home). Once Google Home could not give an answer or provided an inaccurate answer, participants usually made their questions more specific to clarify what they meant. Participants surmised that the reason Google Home could not give the right answer was because of their questions: “I think I just asked what are the most recent flights, but probably I just didn't give it a specific day to search for. Hey, Google, when is the next flight going out to Japan?” - P12, think aloud 18 Based on their knowledge, they also specified their questions by adding a representative brand (e.g., “Okay, Google. Does Best Buy have Visio televisions?” – P2), or a certain place. They thought it would help Google Home understand more clearly what they were looking for. For example, P18 added a hotel brand to the question: “Yeah, that's a too wide-open question. Let's see. Yeah, I think without really having a specific destination in mind, if I was doing this normally, I would look at a map of Tokyo and identify specific places I want to go to Tokyo. Hey Google, what is the nearest hotel in Tokyo to the Imperial Palace?” - P18, think aloud An experience allowed them to form a simple question Using a simple imperative form of the question, instead of a wh-question, is another strategy to extract information from Google Home when participants could not receive an answer from Google Home (e.g., “Hey, Google. Tell me about Japan.”). 
One participant (P13) used only the imperative form after receiving the right answer. Another participant (P8) used this strategy for the first question since they had seen the simple imperative form of the question in a commercial for an HVA. They believed they were supposed to use this strategy to be successful and that it is a natural way to interact with Google Home: “I just feel like I've heard that said to Google Home before, maybe like commercials or whatever. That's the way that they want you to use it, or I shouldn't say that necessarily, but the way that they advertise, Google Company advertises it being used so that feels like the highest success rate for...” - P8, interview “Perhaps, or at least the easiest way to most likely get an answer.” - P13, interview Google Home increases people’s expectations due to its answer Participants tended to ask for additional information when Google Home gave a satisfactory answer. Participants wanted to know the exact flight time or the airline’s name 19 and the hotel’s price, amenities in the hotel, or more detailed information, conducting the first task (see details in Figure 1). For example, P1 asked about the entrance fee of a certain place that Google Home answered: P1: “Hey, Google. What other places of interest are there in Osaka Japan?” Google Home: “Here's a summary from the website Japan-top.com. Amongst Osaka's many attractions, these stand out: Dotonbori in Namba, and K-Comedy Theatre, Osaka Castle, Tenjin Matsuri, Sumiyoshi Taisha Shrine, Shinsaibashi and more. To read more, look for the link in your Google Home app.” P1: “Hey, Google. How much does it cost to visit Osaka Castle?” Furthermore, they also wanted to know specific information regarding the TV and the other option to issue a passport, after receiving the right answer (see details in Figure 1). P10: Hey Google, does the Royal Park Hotel have a swimming pool? P17: Hey Google, what is the prices of those rooms in Tokyo? P14: Hey, Google, for the flight from Lansing, Michigan, to Tokyo, Japan, what are my flight times? P9: Okay, Google. What airlines fly from Lansing to Tokyo? P2: Does Best Buy have flat screen televisions? P15: Okay, Google, does Best Buy have smaller TVs? Task 1: Task 2: Task 3: P4: Hey, Google, is there a place other than online I can go to get my passport? Figure 1: Participants’ questions in the factual tasks: After receiving the right answer, participants asked questions again to know additional information. 20 Because a majority of participants believed that Google Home would perform like a Google search, they expected Google Home might be able to provide detailed information, reflecting their previous experience with Google search, where they can find anything they want to know: “which is weird because if you were to Google that, that’s the first thing that would come up is all the hotels, how much they are, that kinda thing.” - P2, interview People tended to align with the language use from Google Home’s answer Participants also sought more detailed information, receiving an appropriate answer. To ask for detailed information, participants developed a question based on information that was given by Google Home. They tended to reuse a word that Google Home said (see details in Figure 2). People tended to align their questions by adopting the computer’s word choice like the interaction between humans (Branigan, Pickering, Pearson, & McLean, 2010). 21 P1: Hey, Google, what documents do I need to get a passport? 
Google Home: On the website usa.gov they say, "What you'll need to provide, Form DS-11, proof of US citizenship, government-issued photo ID, color passport photo, and fee payment." P1: Hey, Google, where do I get a passport photo? Example 1: Example 2: P13: Okay Google, how do I get a passport for the first time? Google Home: On the website travel.state.gov, they say, "Inside the United States, you must submit your completed application including citizenship evidence, photocopy of ID, photo, and fees in person at a passport acceptance facility. You may be eligible to apply in person at a passport agency if you are traveling soon." P13: Okay Google, where is the nearest passport acceptance agency to me? Figure 2: Participants’ questions in the instrumental tasks: Participants align the questions by using a word to which Google Home gave in order to seek additional information. Relatively easy to find information in terms of how to do something Unlike the other types of task, it was easier for participants to interact with Google Home on the instrumental task. I found similar patterns and strategies in the instrumental task (i.e., search for an instrument to know how to use something) with factual tasks (i.e., seek specific information). It seems obvious, since both tasks are designed to find out specific information, even though the instrumental task is specialized to find ‘how-to’ information. However, participants did not stumble over coming up with a new question to find information. Participants received a means to issue a passport, using a straightforward wh-question, which was the most common way to ask (e.g., “Okay Google, where could I get a passport?” - P16, 22 “Okay Google, how can I get a passport?” - P19). Even though some participants explained their needs to form the first question (e.g., “Okay, Google. I need a passport, so I can travel overseas. How do I do this?” – P6), they immediately asked again, leaving out their personal contexts: “..when I changed the wording of the question, it did give me, it found a website that had information. So that was, it was very straightforward. It understood those questions very easily.” – P20, interview Sometimes participants received an inadequate reply, however, they seemed to be more persistent in this task than the other tasks. For example, Google Home formed a hasty conclusion that participants wanted to know how to renew a passport. Then it gave the information how to renew the passport (see details Figure 3). Despite its blunder, participants readily rephrased their questions to be clearer, adding just an adjective (e.g., “new”, “first”). Figure 3: Data from Google Activity in terms of task 3: Google Home provided information regarding a passport renewal, although a participant asked, ‘where can I get a passport from’. It happened because Google Home is currently designed to just read a text on the website that has a relevant title. 23 4.2. HOW DID PARTICIPANTS INTERACT WITH GOOGLE HOME AFTER LEARNING ITS LIMITATIONS? In this section, I described how participants interact with Google Home to find information on unanswerable tasks. Different approaches in the first attempt More than half of the participants asked Google Home about general information by using a broad type of question in the controversial task (e.g., “Okay Google, can you give me any information about Obamacare?” – P6, “Hey, Google. Please tell me about Obamacare.” – P8, “Okay Google, what is Obamacare? 
– P16), although participants were asked to check Google Home’s opinion on Obamacare. Some participants explained that they wanted to know how to proceed with a conversation based on the broad question first: “First, I wanted to be just like, "What is it?" so that I would get an idea of where I could start to understand that part.” – P16, interview “I guess I thought that rather than starting with just the opinion it was important for me to ask questions about facts and kind of then create my own opinion before I asked their opinion.” – P11, interview Nevertheless, they could not gain enough information to develop the next question so that they asked Google Home’s opinion in the end. After learning of Google Home’s inability to answer in terms of its opinion on Obamacare, a few participants gave up on the task. However, most participants attempted to ask more questions, making use of different strategies. In case of the predictive task, most participants asked about Google Home’s anticipation regarding the NBA playoffs, while some participants sought to determine the current state of the Detroit Pistons by asking a broad question at the beginning (e.g., “how are the Detroit Pistons doing?” - P15). Since the predictive task was designed to be unanswerable, participants could not 24 receive any piece of information regarding Google Home’s prediction. To deal with this challenge, participants applied various strategies to extract information. People commonly use a broad question Although there was no dominant strategy for seeking information regarding the unanswerable tasks, the most commonly used strategy was ‘Tell me anything you know’ strategy (see detailed in figure 4). In other words, participants provided an open question to draw out any information Google Home knew. They took a step backward and tried to think broadly about how to formulate the right question by using this strategy. Some participants were doubtful about Google Home’s knowledge on a basic topic so that they wanted to make sure whether Google Home was able to provide information regarding basketball or NBA. Since Google Home could not show its limitations to form a common ground between participants and Google Home, participants adjusted their expectations of its ability by themselves: "Cause I think the first time I asked about the Detroit Pistons, it said, "I don't understand," or "I can't help you with that." Or I think I asked about the Pistons, and then I asked about the Detroit Pistons, so I'm like, "Alright. Does this thing even understand what basketball is?" So that's why I went that route to make sure it had some clue.” - P14, interview 25 Case 1: ‘Tell me anything you know’ strategy in the controversial task. P9: Okay. She's not sure about the answer. Okay, Google. What do you know about Obamacare? P12: So it said it didn't know when I asked for its opinion. Hey, Google, do you know anything about Obamacare? P13: Okay. What... Do I have anything else to say? Okay, Google, can you give me your opinions on anything? Case 2: ‘Tell me anything you know’ strategy in the predictive task P14: Hey, Google, do you know what the NBA is?” P5: Hey Google, do you know the National Basketball Association? P6: Okay, Google. What do you know about basketball? Figure 4: Participants’ questions in the unanswerable tasks: Participants broadly asked questions to extract any related information or check Google Home’s ability. 
A diverse range of strategies to gain related information

Participants also made their questions more specific, providing their personal information or knowledge. To deal with Google Home's inability to answer, participants asked about related information – the Pistons' current record, last year's results, remaining games, or the criteria for determining playoff teams – to give more specific context to Google Home. Because participants recognized Google Home's limited ability to predict a future event, they at least attempted to accumulate pieces of information to predict the possibility themselves:

"I want to just find out any information about the Pistons that I can, and from there maybe get some semblance of a prediction for how they'd be doing the playoffs." - P13, interview

Some participants (P9, P13, P14) believed using another name (the Affordable Healthcare Act) would help to get an answer from Google Home. P14 thought that since Obamacare is the informal name for the Affordable Healthcare Act, she needed to use the right word:

"Well, that doesn't help me. Obamacare is also the Affordable Healthcare Act, so let me try asking about that. Hey, Google, can you tell me about the Affordable Healthcare Act?" – P14, think-aloud

In the predictive task, some participants (P9, P10, P15, P19) rephrased their questions, asking about the odds of the Pistons making the playoffs. They believed that Google Home was able to figure out the odds by calculating data itself or by referring to a certain website. This perception resulted in relatively high expectations toward Google Home, so participants kept asking questions using different keywords and making their questions more specific:

"I guess it's kind of hard to predict on whether or not you make the playoffs, but I would think maybe that Google would have the ability to run some sort of analytics to say, "This is their odds of finishing in the top eight or not."…" – P9, interview

P18, who was at one time a sports journalist, made his questions more specific by drawing on his experience. He attempted to extract Google Home's prediction by using a hypothetical question (e.g., "Hey, Google, would the Detroit Pistons be in the playoffs if the playoffs started today?" – P18). Despite this participant's professional knowledge, Google Home did not provide any additional information. Another participant behaved as if he were teaching Google Home, explaining a common idea (e.g., "Okay. Okay, Google. Detroit Pistons are a basketball team. Do you know anything about them?" – P6). He believed Google Home would learn if he kept explaining certain information.

4.3. WHAT MADE PARTICIPANTS STOP USING IT

Participants usually gave up on tasks once they had no idea how to clarify their questions any further. Sometimes participants easily completed their tasks, either receiving an adequate answer or realizing Google Home's limitations. In those cases, they lowered their expectations regarding Google Home's performance, and thus, did not demand any further detailed information from Google Home. Nevertheless, the majority of participants thought having a conversation with Google Home was challenging. While Google Home was considered a good tool for asking a simple question about a trivial fact, participants mentioned that formulating the next question as a follow-up on Google Home's feedback was difficult.
In the end, they were discouraged from using Google Home to find information, even though they had attempted a variety of strategies to extract more information:

"It's difficult to tell it exactly the question that you want and have it understand you. Sometimes you get inaccurate information or... Not inaccurate, but not helpful information. Sometimes it doesn't understand you and you have to re-ask your question multiple times." - P19, interview

"Like the Obamacare one, I was trying to ask her what the political debates about it were, but she didn't have any information on that. Then I tried to rephrase it and use different words, but she still didn't get it. It was difficult to think of a new way to say it that she would understand." - P16, interview

One distinct thing that deterred participants in the first task was Google Home's preprogrammed answer for the flight reservation. Although it seemed convenient to find a ticket in response to the question, participants struggled to follow Google Home's joint action since it was able to reply to only a certain form of short answer (see details in Figure 5). The majority of participants also struggled to follow up on Google Home's long responses to unfamiliar or unrelated topics in order to develop the next question. It was difficult for them to memorize or extract specific information from Google Home's long answers (see details in Figure 6). Such preprogrammed answers can be considered Google Home's effort to exchange information for a successful interaction, but participants could not appropriately follow its way of communicating.

P14: Hey, Google, can you find me a flight from Lansing, Michigan, to Tokyo, Japan?
Google Home: Upcoming round-trip flights from Lansing to Tokyo start at $1,379. Did you have any dates in mind?
P14: Hey, Google, I would wanna go January 2nd through January 8th.
Google Home: Sorry, one more time.
P14: January 2nd through January 8th.
Google Home: Sorry, I'm not sure how to help with that.
P14: Okay. So I don't know how to talk to Google to do the followup that they asked me a question, so we'll try that again and I'll just start with "Hey Google," which is now awake. Hey, Google, can you find me a flight from Lansing, Michigan, to Tokyo, Japan, January 2nd to January 8th?
Google Home: I didn't catch that either. Try saying your dates like, "February 16th to February 20th."
P14: Hey Google, January 2nd to January 8th.
Google Home: I'm still not getting it. Please try again later.
……
P14: Hey, Google, to fly to Tokyo, Japan, from Lansing, Michigan, on January 2nd, what is my departing flight time?
Google Home: Round-trip flights from Lansing to Tokyo, leaving January 2nd and coming back January 8th, start at $1,682.
P14: It's interesting that Google Home can still remember the timeframe that I was looking at, the exact dates I wanted to travel,
Figure 5: Participant's reaction toward Google Home's preprogrammed answer: Since Google Home restricted participants to a certain type of short answer in order to provide additional information, the participant struggled to follow up on Google Home's question. Also, P14 was surprised that Google Home still remembered the exact dates, although P14 asked only about the flight time on the departure date.

P15: Okay Google, is there a restaurant in the airport in Tokyo, Japan?
Google Home: I found a few places. The first one is at Hanedakuko, 3 Chome-4-2, 2 in Ota. The second one is Tsurutontan Haneda Airport shop at 2-6-5, 4F in Ota.
P15: Okay Google, is there a restaurant in the airport in Tokyo, Japan?
Google Home: I found a few places. The first one is at Hanedakuko, 3 Chome-4-2, 2 in Ota. The second one is Tsurutontan Haneda Airport shop at 2-6-5, 4F in Ota. The third one is Takafuku at 2-6-5 in Ota.
P15: Wow. [chuckle] Okay Google. Gee, I don't know what to think here.

Figure 6: Participant's reaction toward Google Home's long response: Participants usually could not follow up on the information when Google Home provided 'text-to-speech' style answers.

Moreover, participants mentioned that the way Google Home deals with errors is too unspecific to let them proceed to a follow-up question. In other words, participants thought Google Home fails in its function of informing them how to reword their question:

"like I say, the "I'm still learning," is a little frustrating after a while because... The questions are not... We're not asking it really complex questions but it's... It goes over so many options" - P18, interview

Relying on speech alone to seek information also made the interaction harder. Compared with using Google search on a desktop computer or mobile device, participants explained that obtaining further information through only a conversational interface was too difficult. Since participants had no experience with HVAs, they were accustomed to reformulating search queries based on the search results on the screen. Thus, they found it difficult to develop another question while relying only on Google Home's answer:

"It's obviously not like Google where they give you 20 million choices and then you can scroll down and pick which ones sound the best. I don't know, I guess maybe if she gave you choices like, "Is this what you want? Is this what you want?" that would be cool…Cause at least actual Google does that. It doesn't... It might not know exactly what you want, but it at least gives you multiple suggestions and then you can pick which one that you're like, "Okay, that's sounds most like it might be helpful to me." - P2, interview

"Well, I wonder if there's a specific format to use to ask questions and if that's how it's programmed than the user needs to know how to ask the questions if that's what's wrong." - P1, interview

4.4 MENTAL MODELS OF HVAS

My findings indicate that participants formulated different strategies depending on Google Home's answers and on their different levels of expectations. A majority of participants used diverse strategies rather than adhering to a single strategy across the different types of tasks. Although participants tended to use more specific strategies in the answerable tasks (Tasks 1, 2, 3) and broader strategies in the unanswerable tasks (Tasks 4, 5), there was no distinguishable pattern between the individual tasks. Since mental models keep evolving to incorporate new knowledge, participants consistently adjusted the way they communicated with Google Home. I found two types of mental models that influenced the way people interacted with Google Home. Most participants made use of both models to find information. I briefly summarize the two models and their related strategies in Table 2. The strategies are not mutually exclusive, and sometimes a participant used more than two strategies in one question.

Common-sense models

The first model is the common-sense model, in which participants opt to make use of common sense in a human conversational manner (i.e., grounding) or of clichés that anyone easily understands. Although participants did not perceive Google Home as a human being, they believed that using human-to-human communication skills might facilitate a conversation with Google Home, especially when Google Home failed to answer them.
People have conversations to coordinate with each other by aligning their knowledge and experience (Clark, 1997). This led participants to bring a question with personal context at the beginning of the task in order to establish common ground:

"That's just naturally. I guess when I talk to people, I try to provide that context because if somebody comes with me with just a question goes, "I want to buy a TV." - P14, interview

People also tend to adopt the words or phrases used by their interlocutor in human-to-human communication (Branigan et al., 2010). In a similar way, participants developed their questions based on the words Google Home used. Participants typically used this model when they believed Google Home could understand general knowledge that everyone may know. They believed that mentioning a representative brand name or a place that everyone may know would make it easier to receive answers from Google Home. For example, P10 asked a question that included Las Vegas ("Hey Google, What does Las Vegas say the odds are for the Detroit Pistons basketball game to win the NBA playoffs?") because she believed this would help Google Home find the likelihood of the Pistons making the playoffs:

"Like the NBA. Who could predict that? But I thought Las Vegas could predict that. So when I asked for odds I thought maybe that would help [….] cause they set the odds for all the professional games. And then when people bet they know how much of a chance there is for their team." - P10, interview

Participants also included the representative retail shop – Best Buy – where they usually went to buy a TV:

"I feel like that's one of the biggest tech stores, and it's the one that I'm most familiar with. That's where I usually get my electronics from, and they're everywhere, so..." - P8, interview

"Because I know that you can buy a television at Best Buy. That's where I would go, so I probably would give it information.." – P5, interview

Machine-oriented models

The second model is the machine-oriented model, which is associated with high expectations of Google Home that usually led participants to push for a specific piece of information. People tend to focus on a certain entity (e.g., the Google search engine) during an interaction with a computer, and this source orientation influences their behavior (Solomon & Wash, 2014). Since none of the participants had experience with any type of HVA, they thought of Google's affiliated services in lieu of Google Home itself:

"when you think of Google Home, it's Google and Google has everything you ever wanted to know, so […] I'm surprised that it couldn't find any answer to that." – P9, interview

"So I'm thinking, "Okay, so this is a Google device. Obviously, it probably has a way to access the Google engine."" - P12, interview

This source orientation led participants to formulate specific questions that demanded further information, because they believed Google Home could find information through Google search, where they were used to finding anything they wanted. Some participants described their strategies for seeking information through Google Home as based on how they typed in Google search. They believed that recombining keywords by adding particular information would help them receive the right answer from Google Home:

"Yeah, I try to see what are the current debates or what is the political debate. 'Cause I thought that normally when I search stuff into Google, I'll use keywords like "political" or "current" to focus the search more" - P16, interview

"So the only way I can equate that is when I was first learning how to use Google Scholar knowing how to put in the search terms to find what I wanted and figure out what language it understood." – P14, interview

Although Google Home reminded participants of Google search, this conceptualization did not help them use Google Home. Google Home sometimes gave the same answer that is provided on the Google search result page (see details in Figure 7); however, this did not help them carry the conversation further. On the Google search page, people can see more than one result at a time, which lets them formulate their next query to find information. On the other hand, Google Home provided only one answer, which made it difficult to have a conversation.

P1: Hey, Google. Where is a good place to buy a television in East Lansing, Michigan?
Google Home: I found a few places. The first one is WLNS at 2820 East Saginaw Street in Lansing. The second one is WLNS-TV at 3660 Van Atta Road in Okemos. The third one is Best Buy at 2020 West Grand River Avenue in Okemos.

Figure 7: The result of Google search (left) and Google Home (right) in response to the same question: Google Home provides the same answer that people are given in Google search. However, Google search displayed more than one result, which allows people to select the information they need.

Participants also conceptualized Google Home as an artificial intelligence device that is able to learn from the pieces of information it receives from users. Participants attempted to train Google Home by presenting certain information when they believed Google Home had the ability to collect data from the verbal interaction between Google Home and users. In particular, people formed this model once Google Home provided an answer based on their previous questions. For instance, P14 believed Google Home operates using not only its own dataset but also data received from users, after seeing that Google Home retained the travel schedule during the conversation (see details in Figure 5). Reflecting on this experience, P14 said that using a broad question to extract what Google Home knows about a certain topic would help move a conversation with Google Home forward:

"it does seem to have some memory and some retention [….] so it did remember something. It did hold something. So had that question come before the NBA question, I guess I could have started with, "Do you know what the NBA is?" "Do you know who Detroit Pistons are?" And built it that way to see if something could stay in its memory to build the information that I wanted better." - P14, interview

P9 also pushed information that might help develop Google Home's ability in order to carry on further conversations. P9 thought Google Home was able to retain information from his previous questions and provide related information (see details in Figure 8).
Task 3 during the think-aloud session:
P9: Okay, Google, why is an LED TV better than a plasma TV?
Google Home: On the website pcmag.com, they say, "In plasma HDTVs, the phosphorus that create the image on the screen light up themselves and don't require backlighting. This doesn't mean it's more efficient than back-lit LCD screens, though. On the contrary, plasma panels are much heavier than both CCFL and LED back-lit LCDs and consume much more power."
P9: Okay, Google, can you recommend a TV for me to buy?
Google Home: On the website ebay.com, they say, "LCD technology is more expensive than plasma so a large plasma screen can be bought for the price of a much smaller LCD screen."
P9: That is not the question that I asked. Okay, Google, can you recommend a TV for me to buy?
Google Home: Here's a summary from the website bestbuy.com. Best Buy customers often prefer the following products when searching for LG plasma TV: LG 65 inches class LED 2160p Smart 4K Ultra HDTV with high dynamic range and more.

Semi-structured interview session:
P9: I feel like it learned. Or it understood what I was looking for based on a previous question, so I was, like I said, trying to coach it into understand what I want it to do

Figure 8: Participant's interaction with Google Home in task 3 and the debriefing from the interview session: P9 received a TV recommendation from Google Home based on the context of his previous question.

Both models consist of some degree of expectation about Google Home, formed through the interaction with Google Home. The two models indicate the underlying causal process of how Google Home works and why people used a certain strategy. In the next section, I demonstrate design implications based on the two mental models.

1) Common-sense model
Description of the model:
• Google Home should have common sense.
• Google Home has a conversation to coordinate with people by aligning each other's knowledge and experience.
Related strategies:
• 'Personal contexts' strategy: Participants provide context regarding personal information or their needs.
• 'Knowledge in my head' strategy: Participants specify a context by using their own knowledge or adding a representative brand or a certain place.
• 'Alignment with what Google Home just said' strategy: Participants align their queries with a word that Google Home used.

2) Machine-oriented model
Description of the model:
• Google Home operates like Google search, where you can find any information.
• Google Home is able to learn and give better answers by harnessing the data users provide.
Related strategies:
• 'General specific-wh-question' strategy: Participants rephrase a question by adding particular information.
• 'Teach you' strategy: Participants try to develop Google Home's ability by providing certain ideas.
• 'Tell me anything you know' strategy: Participants pose an open question to extract what Google Home knows about a certain topic.

Table 2: Two mental models and related strategies

CHAPTER 5. DISCUSSION

People without previous experience with home virtual assistants used a variety of strategies to seek information, and their understandings of HVAs changed throughout their interaction. Unlike human-to-human communication, interacting with HVAs is not communicating with a conversation partner that is capable of grounding, even though HVAs kept answering people's questions. Based on my findings, I present the most important factors for improving the design of HVAs to foster a more realistic conversation with them.

5.1 GIVING AN ANSWER IS NOT A SOLUTION FOR THE EFFECTIVE INTERACTION

Google Home responds to questions whenever it can detect related keywords in people's questions, even when its answer is not exactly related to people's intention. Considering the machine-oriented model, giving a relevant answer is not the best solution for having a conversation.
After receiving the answer, participants formed unwarranted expectations toward Google Home and overestimated its ability. As a result, people may be unable to make an accurate judgement of Google Home's limitations for having a conversation. Google Home is known as an HVA device that gives a correct answer more often than other HVA devices (Sterling, 2017). Google Home provided the most relevant information about a topic or listed the closest places, in the same manner as Google search, where people are given the most relevant information. For example, Google Home could find an upcoming flight schedule with the price once it was given a question about the flight. However, Google Home could not provide any additional information, such as a departure time, the airline, or an alternative route, information that is often provided in a Google search. In addition, Google Home provided related information from Wikipedia or certain websites in response to participants' questions. Participants naturally re-asked for more detailed information based on that information, although Google Home was limited in discovering further information.

Considering that people are accustomed to seeking information by exploring back and forth between hyperlinks through a visual interface such as a desktop or mobile device, Google Home should not provide an answer in the style of a Google search result. While people can reshape their search keywords after seeing more than one search result on a webpage, they cannot reformulate their questions based on the single answer from Google Home. Presenting the single most relevant answer, as Google search does, may raise accuracy and people's satisfaction for a while. Nevertheless, it makes people quickly become exhausted trying to continue the conversation, so that inexperienced people are not inclined to engage in the interaction with HVAs. In this sense, the system designer should consider not only how to provide a relevant answer, but also how to shape the answer to suit the HVA's interface, so as not to implant unwarranted expectations.
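To illustrate one possible direction, the sketch below shows how a voice assistant could offer a short spoken list of candidate answers and let the user pick one, rather than committing to a single result. This is a minimal illustration of the idea, not Google Home's actual behavior; the Candidate class, the search_candidates backend, and the prompt wording are hypothetical placeholders.

# A minimal sketch of voice-based disambiguation: instead of reading out a single
# "best" answer, the assistant offers a few candidates and asks the user to choose.
# search_candidates and the prompts are hypothetical, not Google Home's actual API.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    title: str   # short spoken label, e.g., "Best Buy in Okemos"
    detail: str  # fuller answer read out after the user picks this option


def search_candidates(query: str, limit: int = 3) -> List[Candidate]:
    """Placeholder for whatever backend search the HVA uses."""
    raise NotImplementedError


def answer_with_choices(query: str,
                        ask: Callable[[str], str],
                        say: Callable[[str], None]) -> Optional[str]:
    """Offer up to three candidates by voice and let the user pick one."""
    candidates = search_candidates(query, limit=3)
    if not candidates:
        say("I couldn't find anything for that. Could you rephrase your question?")
        return None
    if len(candidates) == 1:
        say(candidates[0].detail)
        return candidates[0].detail

    # Read a short, numbered menu instead of committing to a single answer.
    menu = ", ".join(f"option {i + 1}: {c.title}" for i, c in enumerate(candidates))
    reply = ask(f"I found a few possibilities: {menu}. Which one would you like to hear about?")

    # Accept either the option number or part of the title in the user's reply.
    for i, candidate in enumerate(candidates):
        if str(i + 1) in reply or candidate.title.lower() in reply.lower():
            say(candidate.detail)
            return candidate.detail

    say("Sorry, I didn't catch which option you meant.")
    return None

Reading out two or three labeled options keeps the turn short enough to remember while still giving listeners something to reformulate against, which is what participants such as P2 said they missed.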
The system designer should investigate how people associate a certain topic with other things which involve a representative place, product, individual, or group or a metaphor that people often use to describe it. Although what people say and how people describe a certain topic are chosen from a great range of possibilities (Monaghan, Goodman, & Robinson, 2012), HVAs should be designed to be able to recognize a high association between more than two ideas or things in order to support the common-sense model. Diversifying HVA’s error handling strategies In order to help people to reframe their questions, asking different types of clarification questions to get a better idea of what information the people actually need is also important. Current Google Home’s error handling strategies to give an excuse when it can’t answer are not enough for people to support their grounding behavior (see current Google Home’s error handling strategies in Figure 10). After receiving the answer indicating a failure, people sometimes realized they needed to speak a different way. However, they were struggling to develop a new way to ask a question since there is no hint to modify their questions, and thus, 42 they ceased from having a conversation with Google Home. In human discourse, people incessantly make sure each other’s intention if something on the interlocuter’s question is not clear to proceed the conversation. In this sense, Google Home also need to ask questions to clarify people’s intention rather than make rash judgments to provide incomplete answers. Examples of Google Home’s error handling strategies when Google Home cannot respond in people’s questions: • Sorry, I'm not sure how to help with that, but I'm still learning. • Sorry, I don't know how to help with that yet. • Sorry, I'm not sure how to help with that yet, but my team's helping me learn. Figure 9: Google Home’s current error handling strategies: When Google Home cannot provide answers, it did not explain the exact underlying reasons why it cannot respond in the user’s question. Instead, it just gave an excuse to the user. 5.3 LIMITATIONS There are several limitations to this study. Since I designed the tasks using specific topics and conducted a qualitative study using a think-aloud approach, participants’ behavior might be influenced by their familiarity with the topics. A different number of questions in each different task type also impacted my analysis. Compared with factual tasks, the other tasks had relatively fewer questions. Another limitation is that my study cannot be generalized to all inexperienced people, considering my sample size. Also, since I focused more on the diverse range of strategies that participants applied in the task, strategies in my findings cannot be considered as general patterns in the other potential users, because I included all emergent strategies, even though they are not mainly used in the task. In addition, I could not cover smart home usage in the task since my study was run in the lab, even though controlling other smart home devices is one of primary usage. 43 CHAPTER 6. CONCLUSION For this work, I examined a qualitative study to understand the interaction between people and HVAs, observing inexperienced people’s strategies of talk that proceed a conversation and the impacts on the change of interaction from their beliefs and expectations. Current virtual assistant technologies only allow people to use their abilities in routine tasks. 
5.3 LIMITATIONS

There are several limitations to this study. Since I designed the tasks around specific topics and conducted a qualitative study using a think-aloud approach, participants' behavior might have been influenced by their familiarity with the topics. The different number of questions in each task type also affected my analysis; compared with the factual tasks, the other tasks produced relatively few questions. Another limitation is that, given my sample size, my study cannot be generalized to all inexperienced people. Also, since I focused on the diverse range of strategies that participants applied in the tasks, the strategies in my findings cannot be considered general patterns among other potential users, because I included all emergent strategies, even those that were not used often in the tasks. In addition, I could not cover smart home usage in the tasks since my study was run in the lab, even though controlling other smart home devices is one of HVAs' primary uses.

CHAPTER 6. CONCLUSION

For this work, I conducted a qualitative study to understand the interaction between people and HVAs, observing the conversational strategies inexperienced people used to keep a conversation going and how their beliefs and expectations changed their interaction. Current virtual assistant technologies only allow people to use their abilities in routine tasks. Yet, people expect these devices to handle more detailed information and want these assistants to understand human conversational manners. In order to encourage inexperienced individuals to become involved in interaction with virtual assistants, understanding potential users' mental models is necessary for developing virtual assistants that are prevalent in our daily lives.

My findings reveal that inexperienced people attempted to exchange pieces of information to create a dialogue, while HVAs were not capable of grounding with people. People assumed that their participatory actions could clarify the miscommunication. However, due to their lack of experience interacting with HVAs, they adapted their habitual behaviors to demonstrate what they wanted to know. Also, inexperienced people overestimated HVAs' abilities once they were given a certain answer. I conclude that a single turn-taking is not enough to have a human-like conversation. Rather than focusing on giving a keyword-related answer, optimizing the answer for a voice-only interface so that grounding can occur between people and HVAs is critical to virtual assistants' development.

My research provides the HCI community with a first step toward understanding how the majority of inexperienced people perceive virtual assistants and thus make use of them to search for information. To better understand the general population's understanding of virtual assistants, subsequent work should further examine mental models across a number of different applications with embedded virtual assistants, such as Amazon Echo. Furthermore, my work suggests a need for more research into human-like virtual assistants that are able to manage multiple turn-takings in conversation and to control unmet expectations in developing affordable systems.

APPENDICES

APPENDIX A. Recruiting Survey Questions

Q1. What is your age in years? [short answer fill in the blank; must be at least 18 to be eligible]

Q2. What is your gender?
• Woman
• Man
• Fill in the blank _______________
• Prefer not to answer

Q3. Have you ever used any devices on the list below? Select all that apply:
• Google Home, Google Home mini, Google Home max, any device with Google Assistant inside [ineligible]
• Amazon Echo, Amazon Echo dot, Amazon Tap, Amazon Echo Show, Amazon Echo spot, any devices with Alexa inside [ineligible]
• Apple HomePod [ineligible]
• Any other home appliance operated by spoken language [ineligible]
• None of the above

Q4. Have you ever used any service on the list below? Select all that apply:
• Apple Siri [to Q4.1]
• Google Assistant, Google Now [to Q4.1]
• MS Cortana [to Q4.1]
• Samsung Bixby [to Q4.1]
• Any other service operated by spoken language on the smartphone [to Q4.1]
• None of the above

(If the subject answers anything except "None of the above" on Q4, one or two additional follow-up questions will be presented.)

(follow-up) Q4.1. How often do you usually use ?
• Several times per day [ineligible]
• About once per day [ineligible]
• A few times per week [ineligible]
• About once per week [ineligible]
• A few times per month
• About once per month
• Less than once per month

(follow-up) Q4.2. Which feature do you use?
Select all that apply:
• Get directions
• To call or text someone
• Set alarms or timers
• Set reminders or make a "To-Do" list
• Check weather information
• Play music
• Launch an app
• Check news [ineligible]
• Read emails or messages aloud to me [ineligible]
• To search for something on the internet [ineligible]
• Others ____________________ [ineligible]

Q5. Do you now, or have you ever worked in a job where your responsibilities included:
• Development or Quality Assurance for a computer program operated by a spoken language interface or dialog system (i.e., Facebook messenger bot, Slackbot), or any kind of virtual assistant
• Communicating with clients or customers via a computer program operated by a spoken language interface or dialog system (i.e., Facebook messenger bot, Slackbot)
• Working on the development of the conversational interface in a virtual assistant or a computer program operated by a spoken language interface or dialog system (i.e., Facebook messenger bot, Slackbot) for your organization?
Yes [ineligible]
No

Q6. Do you now, or have you ever worked in a job where your responsibilities included: computer programming, quality assurance and testing, IT security, or network administration?
Yes [ineligible]
No

Appendix B. Participants' information

Participant   Age   Gender   Experience in VA   Internet Literacy (1-5)
P1            71    F        No                 2.9
P2            38    F        No                 2.1
P3            20    F        Yes                2.9
P4            56    F        No                 2.1
P5            48    M        No                 2.5
P6            56    M        No                 3.6
P7            39    M        Yes                3.9
P8            19    M        Yes                2.8
P9            34    M        No                 4
P10           61    F        Yes                2.6
P11           35    F        Yes                2.8
P12           22    F        No                 3.1
P13           21    M        No                 3.4
P14           35    F        Yes                3.0
P15           61    F        No                 3.5
P16           19    F        Yes                3.1
P17           47    M        No                 1.8
P18           59    M        Yes                4.4
P19           36    M        Yes                3.1
P20           22    M        No                 3.5

Table 3: Participants' demographic information

Appendix C. A brief instruction of Google Home

• Instruction: Google Home is a smart speaker developed by Google. Google Home enables users to speak voice commands to interact with services through the intelligent personal assistant called Google Assistant. Google Home is able to detect your voice when you say "Hey Google" or "Okay, Google." During the test, you need to utter one of these two phrases; otherwise Google Home will not answer your question.
• Conduct a sample task to make sure the participant understands how to use it (Sample task: Please find out the name of the president of France. After finding it, please say "Thank you, Google.")

Appendix D. Think-aloud instruction

"In this session, you'll be asked to 'think aloud'. The most important thing for you to remember is that this is not a test – there is no right or wrong answer to ask a question to Google Home. The point of the study is to learn about how people talk with this kind of device to achieve what they want. Your honest responses will be extremely useful in helping us understand users' belief or knowledge of the Google Home. Also, participation in this study is completely voluntary. Although I don't know why this would happen, if you become uncomfortable in any way feel free to stop at any time, and you will still receive $20 for participating."

"So, during this session, I'm interested in two things: how you approach to get an outcome/answer what you want, and what you think about as you converse. I'll teach you how to do this and give you a chance to practice in just a second. It's very easy to do, and we get a lot of good information from these kinds of studies about people's thought processes while completing certain activities.
It’s basically like you’re talking to yourself, but loud enough for other people to hear.” “Now, when I say ‘think aloud’, what I mean is that you should say whatever is on your mind while you complete the exercises. I want you to tell me EVERYTHING that you are thinking from the time you begin an exercise until you finish it. I would like you to talk aloud as CONTINUOUSLY as possible – even if the only thing you are thinking is “I’m drawing a blank.” 51 “I want to hear about what you’re looking for or what you’re trying to do, even if it seems obvious. If you hesitate or are indecisive, describe what’s causing your hesitation. Don't try to plan out what to say or try to explain to me what you are thinking. Don’t hold back hunches, guesses, wild ideas, or negative comments – these things will all help us understand how the software is being used. Just try to act as if you are alone, speaking to yourself – only a little louder.” “You can also ask me questions, but I can’t explain anything about the system or task to you right now. If you do have questions, go ahead and verbalize them anyway so we can learn about the kinds of questions and problems people have, and I’ll answer your questions at the end of the session. And don't worry, if you forget to “think aloud” I will jump in and remind you to keep talking.” “Now, you try thinking aloud. Here's a problem: please think aloud while you answer the question, ' How many windows are there in your best friend's house or apartment?.” “Good!” “I demonstrate an example to use Google Home. Let’s say I want to know whether I need a jacket tomorrow.” ################################################################# Interviewer: Okay, let’s ask whether I need a jacket tomorrow or not. “Okay, google. Do you think do I need a jacket tomorrow? Google Home: Tomorrow in East Lansing expected a high of x degrees and a low of x degrees. Interviewer: Hmm..I do not still know whether I need a jacket or not. Let’s say, “Okay, google. Is it cold tomorrow? 52 Google Home: Tomorrow in East Lansing expected a high of x degrees and a low of x degrees. Interviewer: Oh, no it gives me the same answer. Maybe Google does not know the meaning of jacket. Let’s try another way. Today I wear a jacket, so maybe I can ask like this. “Okay google, Tomorrow is colder than today?” Google Home: Sorry I’m not sure how I can help. Interviewer: Okay, then..What was the low temperature… “Okay, google. What is the low temperature in tomorrow?” Google Home: degree Interviewer: Okay let’s ask “Okay, Google. X is cold? Google Home: Sorry I’m not sure how I can help. Interviewer: Hmm..maybe Google home can answer only high and low temperature. Okay, then maybe I can ask about cloth. “Ok Google. What should I wear tomorrow?” Google Home: Interviewer: Ok stop google. I’m done! ################################################################# 53 Appendix E. Post Survey Questions about Demographics Q1. What is the last grade or class you completed in school? • None, or grades 1-8 • Some high school • High school graduate or GED certificate • Technical, trade, or vocational school AFTER high school • Some college, no 4-year degree • 4-year college degree • Some postgraduate or professional schooling, no postgraduate degree • Postgraduate or professional degree, including master’s, doctorate, medical or law degree Q2. What was your total household income before taxes during the past 12 months? 
• Less than $25,000 • $25,000 to $34,999 • $35,000 to $49,999 • $50,000 to $74,999 • $75,000 to $99,999 • $100,000 to $149,999 • $150,000 to $199,999 54 • $200,000 or more Q3. Which categories below best describe you? Select all that apply: • White • Hispanic, Latino or Spanish • Black or African American • Asian • American Indian or Alaska Native • Middle Eastern or North African • Native Hawaiian or Other Pacific Islander • Some Other Race, Ethnicity or Origin (please specify) Q4. In politics, as of today, do you consider yourself a Republican, a Democrat or an independent? • Republicans • Independents [to Q4-1] • Democrats (follow-up) Q4-1. As of today, do you lean more to the Democratic Party or the Republican Party? • Republicans • Democrats 55 Based on Gallup. Party affiliation http://news.gallup.com/poll/15370/party-affiliation.aspx, October 2017. Question about Internet Literacy How familiar are you with the following Internet-related terms? Please rate your understanding of each term below from None (no understanding) to Full (full understanding): • Wiki [None, Little, Some, Good, Full] • Meme [None, Little, Some, Good, Full] • Phishing [None, Little, Some, Good, Full] • Bookmark [None, Little, Some, Good, Full] • Cache [None, Little, Some, Good, Full] • SSL [None, Little, Some, Good, Full] • AJAX [None, Little, Some, Good, Full] • RSS [None, Little, Some, Good, Full] Based on Hargittai, E., & Hsieh, Y. P. (2011). Succinct Survey Measures of Web-Use Skills. Social Science Computer Review, 30(1), 95–107. 56 Appendix F. Task lists Each task gives you a certain situation that you need to find some information, talking with Google Home. If you think you receive an acceptable answer that you are expecting, please say “Thanks, Google”, and then move to the next one. Task 1: (Answerable / factual) Assume that you are planning to take a trip to Japan in the near future. You would like to find useful information about flights, accommodation, and features of interests in a certain destination in Japan. Please ask Google Home about it. When you finally decide where to go, how to get there, what to do, and so on, please say “Thanks, Google.” (Please select any period you want to go) Task 2: (Answerable / factual) Assume that you want to purchase a new television. You want to find a place where you can buy one near your home. Please ask Google Home about it. When you decide the place where you can buy a new one, please say “Thanks, Google.” (Please assume that this place is your home) Task 3: (Answerable / instrumental) Assume that you need to issue a passport to travel overseas, but you do not know how to issue it and where to go for it. Please talk with Google Home to find out solutions. Task 4: (Unanswerable / exploratory – controversial) Assume that you just read news about Obamacare and you are not sure if you want to sign up for it. So, you want to check Google Home’s opinion on Obamacare in order to make a decision on whether you should enroll. Ask for Google Home’s opinion. Task 5: (Unanswerable / exploratory – predictive) 57 Assume that you are a big fan of the Detroit Pistons basketball team and you hope they will advance in the Playoffs this season. You may want to know Google Home’s prediction about it. Please ask Google Home about it. (NBA Playoffs are a tournament among the best 16 teams in the league) Task 6: Please ask anything you want to ask. 58 REFERENCES 59 REFERENCES Branigan, H. P., Pickering, M. J., Pearson, J., & McLean, J. F. (2010). 
Linguistic alignment between people and computers. Journal of Pragmatics, 42(9), 2355–2368. https://doi.org/10.1016/j.pragma.2009.12.012 Choo, C. W., Detlor, B., & Turnbull, D. (2000). Information seeking on the web: An integrated model of browsing and searching. First Monday, 5(2). https://doi.org/10.5210/fm.v5i2.729 Clark, M., Dutta, P., & Newman, M. W. (2016). Towards a Natural Language Programming Interface for Smart Homes. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, 49–52. https://doi.org/10.1145/2968219.2971443 Cohen, P., Cheyer, A., Horvitz, E., El Kaliouby, R., & Whittaker, S. (2016). On the Future of Personal Assistants. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems - CHI EA ’16 (pp. 1032–1037). https://doi.org/10.1145/2851581.2886425 Cowan, B. R., Pantidi, N., Coyle, D., Morrissey, K., Clarke, P., Al-Shehri, S., … Bandeira, N. (2017). “What Can I Help You With?”: Infrequent Users’ Experiences of Intelligent Personal Assistants. Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services - MobileHCI ’17, 1–12. https://doi.org/10.1145/3098279.3098539 DeVito, M. A., Gergle, D., & Birnholtz, J. (2017). Algorithms ruin everything: RIPTwitter, Folk Theories, and Resistance to Algorithmic Change in Social Media. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems - CHI ’17, 3163–3174. https://doi.org/10.1145/3025453.3025659 Enge, E. (2017). Digital Personal Assistants: Which Is Smartest? Retrieved December 29, 2017, from https://www.stonetemple.com/digital-personal-assistants-test Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87(3), 215–251. https://doi.org/10.1037/0033-295X.87.3.215 Eslami, M., Karahalios, K., Sandvig, C., Vaccaro, K., Rickman, A., Hamilton, K., & Kirlik, A. (2016). First I "like" it, then I hide it: Folk Theories of Social Feeds. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems - CHI ’16 (pp. 2371–2382). https://doi.org/10.1145/2858036.2858494 French, M., & Hancock, J. (2017). What’s the Folk Theory? Reasoning About Cyber-Social Systems. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2910571 Gašić, M., Hakkani-Tür, D., & Celikyilmaz, A. (2017). Spoken language understanding and interaction: Machine learning for human-like conversational systems. Computer Speech and 60 Language, 0, 5–7. https://doi.org/10.1016/j.csl.2017.05.006 Granger Morgan, M., Fischhoff, B., Bostrom, A., & Atman, C. J. (2002). Mental Models Interviews. Risk Communication. A Mental Models Approach, 63–83. https://doi.org/0521002567 Hölscher, C., & Strube, G. (2000). Web search behavior of internet experts and newbies. Computer Networks, 33(1), 337–346. https://doi.org/10.1016/S1389-1286(00)00031-1 Johnson-Laird, P. N. (2010). Mental models and human reasoning. Proceedings of the National Academy of Sciences of the United States of America, 107(43), 18243–50. https://doi.org/10.1073/pnas.1012933107 Jones, N. a., Ross, H., Lynam, T., Perez, P., & Leitch, A. (2011). Mental Model an Interdisciplinary Synthesis of Theory and Methods. Ecology and Society, 16(1), 46–46. https://doi.org/46 Kang, R., Dabbish, L., Fruchter, N., & Kiesler, S. (2015). “My data just goes everywhere”: User mental models of the internet and implications for privacy and security. In Symposium on Usable Privacy and Security (SOUPS) 2015 (pp. 39–52). Kim, J. (2006). 
Task as a predictable indicator for information seeking behavior on the Web. ProQuest Dissertations and Theses. Retrieved from http://search.proquest.com/docview/305277561?accountid=14643%5Cnhttp://mlbsfx.sibi.us p.br:3410/sfxlcl41?url_ver=Z39.88- 2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&genre=dissertations+%26+theses&sid =ProQ:ProQuest+Dissertations+%26+Theses+Global&atit Kim, J., McNally, B., Norooz, L., & Druin, A. (2017). Internet Search Roles of Adults in their Homes. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems - CHI ’17 (pp. 4948–4959). https://doi.org/10.1145/3025453.3025572 Kim, K.-S., & Allen, B. (2002). Cognitive and task influences on Web searching behavior. Journal of the American Society for Information Science and Technology, 53(2), 109–119. https://doi.org/10.1002/asi.10014 Kopp, S., Gesellensetter, L., Krämer, N. C., & Wachsmuth, I. (2005). A conversational agent as museum guide - Design and evaluation of a real-world application. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3661 LNAI, pp. 329–343). https://doi.org/10.1007/11550617_28 Li, Y., & Belkin, N. J. (2008). A faceted approach to conceptualizing tasks in information seeking. Information Processing and Management, 44(6), 1822–1837. https://doi.org/10.1016/j.ipm.2008.07.005 Lovato, S., & Piper, A. M. (2015). “Siri, is this you?”: Understanding Young Children’s Interactions with Voice Input Systems. In Proceedings of the 14th International Conference 61 on Interaction Design and Children - IDC ’15 (pp. 335–338). https://doi.org/10.1145/2771839.2771910 Luger, E., & Sellen, A. (2016). “Like Having a Really Bad PA”: The Gulf between User Expectation and Experience of Conversational Agents. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems - CHI ’16, 5286–5297. https://doi.org/10.1145/2858036.2858288 McGregor, M., & Tang, J. C. (2017). More to Meetings: Challenges in Using Speech-Based Technology to Support Meetings. Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW ’17, 2208–2220. https://doi.org/10.1145/2998181.2998335 Monaghan, L. F., Goodman, J. E., & Robinson, J. M. (2012). A cultural approach to interpersonal communication : essential readings. Wiley-Blackwell. Retrieved from https://books.google.com/books?id=4U7VLOg7J3kC&pg=PA158&lpg=PA158&dq=What+ we+say+and+how+we+say+it+are+chosen+from+a+great+range+of+possibilities&source= bl&ots=7NGxtxDloR&sig=PoUxnBajoMifVJRl2uCvDlCiy0Y&hl=en&sa=X&ved=0ahUK Ewihy6a3rrvaAhVK5oMKHahWBwIQ6AEINzAB#v=onepage&q=What we say and how we say it are chosen from a great range of possibilities&f=false Moore, R. K. (2013). Spoken language processing: Where do we go from here? Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://doi.org/10.1007/978-3-642-37346-6-10 Moore, R. K. (2017). Is spoken language all-or-nothing? Implications for future speech-based human-machine interaction. In Lecture Notes in Electrical Engineering (Vol. 999 LNEE, pp. 281–291). https://doi.org/10.1007/978-981-10-2585-3_22 Muramatsu, J., & Pratt, W. (2001). Transparent Queries: investigation users’ mental models of search engines. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 226–233). 
https://doi.org/http://doi.acm.org/10.1145/383952.383991 Nersessian, N. J. (2002). The cognitive basis of model-based reasoning in science. The Cognitive Basis of Science, 133–153. https://doi.org/10.1017/cbo9780511613517.008 Norman, D. A. (1983). Some Observations on Mental Models. In Mental Models (Vol. 7, pp. 7– 14). https://doi.org/Cited By (since 1996) 346\rExport Date 12 September 2012 Norman, D. A. (2013). The Design of Everyday Things. Revised and expanded edition. Human Factors and Ergonomics in Manufacturing. https://doi.org/10.1002/hfm.20127 Paul, S. (2017). Voice Is the Next Big Platform, Unless You Have an Accent | WIRED. Retrieved March 11, 2018, from https://www.wired.com/2017/03/voice-is-the-next-big- platform-unless-you-have-an-accent/ Porcheron, M., Fischer, J. E., Reeves, S., & Sharples, S. (2018). Voice Interfaces in Everyday 62 Life. Proceedings of the 2018 ACM Conference on Human Factors in Computing Systems. https://doi.org/doi.org/10.1145/3173574.3174214 Rader, E., & Gray, R. (2015). Understanding User Beliefs About Algorithmic Curation in the Facebook News Feed. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems - CHI ’15 (pp. 173–182). https://doi.org/10.1145/2702123.2702174 Rader, E., & Slaker, J. (2017). The Importance of Visibility for Folk Theories of Sensor Data. In Symposium on Usable Privacy and Security (SOUPS) 2017. Reeves, S. (2017). Some Conversational Challenges of Talking with Machines. In Companion of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW ’17 Companion (pp. 431–436). https://doi.org/10.1145/3022198.3022666 Richter, F. (2016). • Chart: What the Amazon Echo Is Actually Used For | Statista. Retrieved April 7, 2018, from https://www.statista.com/chart/6080/amazon-echo-usage/ Rieh, S. Y. (2002). Judgment of information quality and cognitive authority in the Web. Journal of the American Society for Information Science and Technology, 53(2), 145–161. https://doi.org/10.1002/asi.10017 Rieh, S. Y. (2004). On the Web at home: Information seeking and Web searching in the home environment. Journal of the American Society for Information Science and Technology. https://doi.org/10.1002/asi.20018 Rong, X., Fourney, A., Brewer, R. N., Morris, M. R., & Bennett, P. N. (2017). Managing Uncertainty in Time Expressions for Virtual Assistants. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems - CHI ’17 (pp. 568–579). https://doi.org/10.1145/3025453.3025674 Schamber, L., & Marchionini, G. (1996). Information Seeking in Electronic Environments. Journal of Education for Library and Information Science, 37(1), 81. https://doi.org/10.2307/40324289 Solomon, J., & Wash, R. (2014). Human-what interaction? Understanding user source orientation. In Proceedings of the Human Factors and Ergonomics Society (Vol. 2014– Janua, pp. 422–426). https://doi.org/10.1177/1541931214581088 Sterling, G. (2017). Study: Google Home has more answers than Amazon Echo - Marketing Land. Retrieved April 14, 2018, from https://marketingland.com/amazon-echo-google- home-go-head-head-knowledge-test-208589 Task Scenarios for Usability Testing. (2014). Retrieved March 11, 2018, from https://www.nngroup.com/articles/task-scenarios-usability-testing/ Wash, R., & Rader, E. (2011). Influencing mental models of security. In Proceedings of the 2011 63 workshop on New security paradigms workshop - NSPW ’11 (p. 57). https://doi.org/10.1145/2073276.2073283 Wingfield, N. (2018). 
Alexa, We’re Still Trying to Figure Out What to Do With You - The New York Times. Retrieved April 7, 2018, from https://www.nytimes.com/2018/01/15/technology/virtual-assistants-alexa.html 64