MENTAL MODELS AND HOME VIRTUAL ASSISTANTS (HVAs): A SINGLE TURN-TAKING IS NOT ENOUGH TO HAVE A HUMAN-LIKE CONVERSATION

By

Janghee Cho

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Information and Media — Master of Arts

2018

ABSTRACT

MENTAL MODELS AND HOME VIRTUAL ASSISTANTS (HVAs): A SINGLE TURN-TAKING IS NOT ENOUGH TO HAVE A HUMAN-LIKE CONVERSATION

By

Janghee Cho

As virtual assistants become more intelligent, these applications are dispersed throughout our everyday life. However, virtual assistants are still used for very routine tasks (Wingfield, 2018). People are expected to speak to virtual assistants in a conversational manner; nevertheless, there has been little research looking at people's mental models for what kinds of interactions they think the devices are capable of. I conducted a qualitative study to describe the strategies people use to seek a wide range of information when interacting with Google Home, a type of Home Virtual Assistant (HVA), and how their understandings of HVAs may change their interaction. I found that people believed applying human-to-human communication skills may facilitate a conversation with Google Home (i.e., common-sense models). People also expected Google Home to perform like Google search, or like an artificial intelligence system that becomes more intelligent if people provide more information (i.e., machine-oriented models). Based on these results, I present implications for the design of HVAs that encourage inexperienced people to engage in interaction with HVAs.

Copyright by JANGHEE CHO 2018

ACKNOWLEDGEMENTS

Firstly, I would like to express my sincere gratitude to my advisor Dr. Emilee Rader for her continuous support of my master's study. Her guidance helped me throughout the research and writing of this thesis. Besides my advisor, I would like to thank the rest of my thesis committee, Dr. Susan Wyche and Dr. Taiwoo Park, not only for their insightful comments and encouragement, but also for the questions that prompted me to widen my research from various perspectives. My sincere thanks also go to our Behavior Information Technology lab members, who provided many ideas and much feedback. Without their precious support, it would not have been possible to conduct this research. Also, this work would not have been possible without the financial support of the Department of Media and Information at Michigan State University. Last but not least, I would like to thank my family, especially my wife Ahyoung, for supporting me spiritually throughout the writing of this thesis and my life in general.

TABLE OF CONTENTS

LIST OF TABLES ……………………………………………………………………………… vi
LIST OF FIGURES …………………………………………………………………………… vii
CHAPTER 1. INTRODUCTION ................................................................................................ 1
CHAPTER 2. RELATED WORKS ............................................................................................ 5
2.1 MENTAL MODELS .............................................................................................................. 5
2.2 DIFFERENT CONVERSATIONAL PATTERNS IN HUMAN AND VIRTUAL ASSISTANT COMMUNICATION .................................................................................................................... 7
2.3 TASK DESIGN..................................................................................................................... 9
CHAPTER 3.
METHODS ........................................................................................................ 11 3.1 PARTICIPANTS .................................................................................................................. 11 3.2 PROCEDURE ..................................................................................................................... 12 3.3 TASK DESIGN................................................................................................................... 14 3.4 ANALYSIS ........................................................................................................................ 15 CHAPTER 4 FINDINGS .......................................................................................................... 17 4.1. HOW DID PEOPLE SEEK SPECIFIC INFORMATION? ............................................................ 17 4.2. HOW DID PARTICIPANTS INTERACT WITH GOOGLE HOME AFTER LEARNING ITS LIMITATIONS? ....................................................................................................................... 24 4.3. WHAT MADE PARTICIPANTS STOP USING IT. ................................................................... 28 4.4 MENTAL MODELS OF HVAS ............................................................................................. 32 CHAPTER 5. DISCUSSION .................................................................................................... 40 5.1 GIVING AN ANSWER IS NOT A SOLUTION FOR THE EFFECTIVE INTERACTION ....................... 40 5.2. HOW TO SUPPORT COMMON-SENSE MODELS ................................................................... 41 5.3 LIMITATION ..................................................................................................................... 43 6. CONCLUSION .................................................................................................................... 44 APPENDICES .......................................................................................................................... 46 APPENDIX A....................................................................................................................... 47 APPENDIX B. ...................................................................................................................... 49 APPENDIX C. ...................................................................................................................... 50 APPENDIX D....................................................................................................................... 51 APPENDIX E ....................................................................................................................... 54 APPENDIX F ....................................................................................................................... 57 REFERENCES ......................................................................................................................... 59 v LIST OF TABLES Table 1: The order of tasks for each participant ……………………………………………….. 14 Table 2: Two mental models and related strategies ……………………………………………. 39 Table 3: Participants’ demographic information ………………………………………………. 50 vi LIST OF FIGURES Figure 1: Participants’ questions in the factual tasks ………………………………………….. 20 Figure 2: Participants’ questions in the instrumental tasks …………………………………..... 22 Figure 3: Data from Google Activity in terms of task 3 ……………………………………….. 
23 Figure 4: Participants’ questions in the unanswerable tasks …………………………………… 26 Figure 5: Participant’s reaction toward Google Home’s preprogrammed answer …………….. 30 Figure 6: Participants’ reaction toward Google Home’s long response. ………………………. 31 Figure 7: The result of Google search(left) and Google Home(right) in response to the same question ………………………………………………………………………………………… 36 Figure 8: Participant’s interaction with Google Home in task 3 and the debriefing from the interview session …………………………………………………………….………………… 38 Figure 9: Google Home’s current error handling strategies …………………………………… 43 vii CHAPTER 1. INTRODUCTION Conversation is becoming a key interface of Human-computer interaction (HCI) as virtual assistants, such as Amazon Alexa, Google Assistant or Apple Siri, integrate into personal or home devices. Virtual assistants, so-called conversational agents, refer to a dialogue system by “combining automatic speech recognition and speech synthesis with natural language processing and dialogue management” (Roger K. Moore, 2017, p. 281). As virtual assistants become more intelligent, these applications take on a wide variety of roles and functions in our everyday life. It is expected that virtual assistants will become essential user interfaces to many services in the near future (R K Moore, 2013). In many cases, virtual assistants’ practical value is still in doubt and limited to a few simple tasks such as checking weather forecasts, playing music, and controlling other smart appliance (Cohen, Cheyer, Horvitz, El Kaliouby, & Whittaker, 2016; Gašić, Hakkani-Tür, & Celikyilmaz, 2017; Roger K. Moore, 2017). Since current virtual assistants are developed based on task-oriented functions (e.g., playing music, turning on lights, setting an alarm and so on), virtual assistants are used in very routine tasks (Wingfield, 2018). Moreover, more than half of people rarely or only occasionally use virtual assistants and many people have little knowledge about how to take advantage of virtual assistants’ ability in their life (Roger K. Moore, 2017; Richter, 2016). System designers keep updating virtual assistants’ abilities to make them more likely to have natural interactions with people. For instance, Amazon announced a new feature (i.e., Brief- mode1) to make interactions with virtual assistants seamless. If people activate this mode, 1About brief mode: https://www.theverge.com/2018/3/16/17128860/amazon-alexa-brief-mode-testing-responses- verbal-beeps 1 Amazon Alexa does not give a reply to people’s request to turn on the light by using a verbal confirmation, instead it just turns on the light. Furthermore, Google continued to develop the quality of its virtual assistant’s voice to make it more realistic2. It was reported that most people get used to using virtual assistants through playful interaction such as humorous responses (Luger & Sellen, 2016). Given the fact that anthropomorphic cues such as name, gender, voice or physical appearance cues enhance users’ engagement and usability in the interaction between people and virtual assistants (Kopp, Gesellensetter, Krämer, & Wachsmuth, 2005; Tegos et al., 2012; Niculescu et al., 2014), system designers might presume that designing virtual assistants to be more like humans would be helpful to encourage people to adopt this advanced technology. The basic idea to be like a human in conversation is a joint action by aligning each other’s knowledge and experience to achieve a goal (Clark, 1997). 
Since people use virtual assistants by taking turns talking to them, mimicking conversational manners well would be critical for making virtual assistants to be more human-like. Understanding people’s beliefs and expectations regarding virtual assistants is important for knowing how to improve a conversation between people and virtual assistants in order to engage inexperienced people in the interaction with virtual assistants. Recent research studies have examined how people use them in their everyday life, focusing on task-oriented usage and interaction with smart home applications (Clark, Dutta, & Newman, 2016; Cowan et al., 2017; McGregor & Tang, 2017; Porcheron, Fischer, Reeves, & Sharples, 2018; Rong, Fourney, Brewer, Morris, & Bennett, 2017). However, most studies investigated experienced users who already made use of virtual assistants in some way, which means that they were already educated about how to use virtual assistants through their experience. Moreover, people who tend to adopt 2 https://www.engadget.com/2018/03/27/google-text-to-speech-service-has-more-realistic-voices/ 2 advanced technology as soon as it becomes available or have a professional background regarding technology may be adept at exploiting virtual assistants’ abilities. Rather than a single task-oriented usage, in this paper, I focus more on the conversation between people and virtual assistants to understand how inexperienced people create a dialogue to seek certain information. Since the interaction between people and virtual assistants is based on conversation, analyzing their verbal strategies is critical for developing virtual assistants. Also, by focusing on potential users who haven’t experienced using virtual assistants, I study how they behave in their interactions with virtual assistants by applying their beliefs and expectations. In this study, I use the term, home virtual assistant (HVA), to specify a certain domain for a conversational interface, which is the focus of this study. In the previous research, virtual assistants were not categorized according to different domains or purposes. Different labels are often used (e.g., conversational agents, personal assistants, or chatbot) for the same services or devices, or they are used interchangeably. Given the fact that virtual assistants have been learning numerous abilities incessantly, it is necessary to distinguish virtual assistants used in the context of the home (i.e., home virtual assistants) from the mobile-driven context in a portable form. The main motivation to use mobile-driven virtual assistants was that they were determined to be time-saving (Luger & Sellen, 2016). While carrying out multi-tasking is the main motivation for the mobile-driven virtual assistants, home virtual assistants may be utilized in our everyday life for other purposes. Among home virtual assistants in the current market, I exclusively use the Google Home device to observe the extent of their conversation, not only one single turn-taking between people and HVAs. In a recent study from a market research firm, Google Home was seen as having the highest accuracy rate in answering the most questions (Enge, 2017). A single turn-taking is not enough to assess the process of how users learn and 3 change their approach so that using a device with better performance to give an answer for is necessary in this study. 
Moreover, tasks are designed in a scenario-based context, therefore, I am intentionally choosing a device that allows the participant to talk as much as they can. I conducted a qualitative study that is comprised of two phases in order to understand people’s mental models. Mental models refer to sets of beliefs and understandings that help users make decisions (Norman, 1983). Mental models form people’s thoughts and guide their interaction with the system. Understanding the mental model that people use in the interaction is important in designing a more effective system for people. I began with an observation to identify strategies that users use to complete the tasks. I asked participants to think-aloud while conducting the tasks. Then, I had a semi-structured interview to probe for a set of causal beliefs that impact their behavior during the tasks. In think-aloud, people verbalize everything that they are observing into words, which reveals the detail of human problem-solving processes that controls users’ reasoning (Johnson-Laird, 2010). People can verbalize the linguistic contents of their working memory that affect how knowledge is organized and used in our mind (D'Andrade, 1995;Norman, 2013). In this sense, I identified themes and patterns in participants’ utterance data that reveal their mental models. In this study, I examined how inexperienced people interact with HVAs and how their understanding of HVAs may change their interaction with them. I introduce tasks for seeking information through HVAs focusing on our everyday conversation and describe what mental models inexperienced people form while developing their verbal strategies to proceed with a conversation with Google Home. I also present design implications as to how to promote a conversation between people and HVAs and to encourage inexperienced people to be involved in the interaction with HVAs. 4 CHAPTER 2. RELATED WORKS As a way of understanding people’s interaction with HVAs, I reviewed literature that studied mental model’s approach to understand human behavior in the system, conversation patterns in human-to-human interaction and human-to-virtual assistant interaction, and the way to develop a task to investigate information seeking behavior. 2.1 MENTAL MODELS Research in HCI has commonly studied people’s mental models to understand how people understand a system’s operation, especially those designed with a complex structure. Mental models refer to a human’s reasoning mechanism that people use to interact with the world around them (Jones, Ross, Lynam, Perez, & Leitch, 2011); these models are sets of beliefs and understanding that help people’s decisions (Norman, 1983; Rader & Slaker, 2017). Mental models are formed by an implicit collection of beliefs about the system based on their experiences and interaction (French & Hancock, 2017). People obtain information from other people that are like themselves, from the media, from communications and interaction with experts, and from their own experiences (Wash & Rader, 2011). In this sense, “mental model tends to be functional rather than complete or accurate representation of reality” (Jones et al., 2011, p. 5); people continue to modify the mental model in order to solve a problem through interaction with the system (Norman, 1983). In interacting with technological artifacts, people formulate mental models of themselves and mental models of the things with which they are interacting (Young, 1983). 
5 Besides experience and background knowledge, the structure of the human information processing system is also a component of mental models (Norman, 1983). Encountering new knowledge in the world, users try to understand it by extrapolating from their own knowledge in their heads (Norman, 2013). They use their knowledge stored in their working memory, invaluable in the performance of everyday tasks, to make a decision to select and manipulate information (Jones et al., 2011). Working memory is available immediately, clear and complete without mental effort, but it also enables users to retrieve information from their long-term memory (Nersessian, 2002). Thus, working memory holds a lot of clues in regards to capturing a user’s understanding of system operations. Researchers have also investigated users’ mental models in the context of technologies with an opaque operation – web searching (Choo, Detlor, & Turnbull, 2000; Hölscher & Strube, 2000; Muramatsu & Pratt, 2001), and social media algorithms (Eslami et al., 2016; Rader & Gray, 2015). These operations provide little feedback and prevent a user from understanding the details of their functionality, which is how a particular input is processed and interpreted. Home virtual assistants like Google Home also have an opaque operational interface, obstructing an understanding of the system’s process. Koenemann and Belkin (1996) define an opaque level of interactivity as an interface that does not provide any indication of the underlying transformations that the system performs. The mental model approach is generally recognized as a useful method to unveil underlying users’ behaviors in the context where users encounter a complex or unfamiliar domain (DeVito, Gergle, & Birnholtz, 2017; French & Hancock, 2017; Kang, Dabbish, Fruchter, & Kiesler, 2015; Rader & Gray, 2015). This is because people develop an understanding of causal processes associated with mechanical systems by drawing upon their mental models, 6 formed through analogical thinking (Jones et al., 2011). To design a more effective interface for those opaque systems, most studies aim to understand users’ beliefs and patterns. In this study, I examine mental models, focusing on the conversational interface, without the consideration of visual feedback. 2.2 DIFFERENT CONVERSATIONAL PATTERNS IN HUMAN AND VIRTUAL ASSISTANT COMMUNICATION Understanding human-to-human communication is essential to analyze the interaction between people and virtual assistants. Based on the way people have a conversation, I can investigate the reverberation of current HVAs’ design on the interaction between humans and HVAs. Also, it allows me to understand how differently people talk with HVAs in comparison with human-to-human communication. The most common type of human-to-human communication is a conversation (Clark, 1997). When we say 'conversation', we picture a situation where more than one person express their thoughts, feelings, or any ideas to each other. Sometimes we often figuratively say, “This is not a conversation”, once we feel the interlocutors do not commit themselves to the conversation. In this sense, some extent of the effort should be included in the conversation. Clark (1997) argues human-to-human communication is a joint project, "in which people establish and carry out joint purposes they are willing and able to commit to" (Clark, 1997, p. 319). 
Each interlocutor does not simply act in response to the other; interlocutors also coordinate with each other by aligning their knowledge and experience (i.e., grounding; Clark, 1997). Clark argues people cannot have a conversation without certain pieces of common ground. We are usually aware of the other person: we can see their appearance or know their background, which helps us assume who they are. In exchanging words, we also form mutual knowledge and beliefs. Having a conversation in human-to-human communication is a continuous process of working toward a mutual purpose by updating common ground. Virtual assistants have been developed based on the metaphor of human conversation. People are expected to speak to virtual assistants in a conversational manner. However, people are required to know a certain way to interact with them. Considering the fact that "the common human conversation is face-to-face setting" (Clark, 1997, p. 9), people simultaneously interact with others: they signal whether they want to keep interacting with one another by using physical gestures or short responses. Since Google Home does not have any visual signifiers, except an LED light on the touch surface (see details: https://support.google.com/googlehome/answer/7073219?hl=en), it is difficult for people to formulate a question (Reeves, 2017). Moreover, people often use ambiguous expressions, which connote subtle nuances, in communication, which makes the interaction with virtual assistants worse (Rong et al., 2017). Previous research studied how people interact with virtual assistants in certain situations such as a museum (Kopp, Gesellensetter, Krämer, & Wachsmuth, 2005), an office (McGregor & Tang, 2017) or a home (Clark et al., 2016; Porcheron et al., 2018). Despite the fact that those studies provided rich empirical findings, they are limited to task-oriented interaction. Nevertheless, a few studies identified patterns in how people proceed in their conversations with a virtual assistant. Luger and Sellen (2016) found that people who have used virtual assistants for a fairly long time use a particular economy of language (i.e., limiting the types of language used and shortening utterances), which is perceived as a way to successfully interact with virtual assistants. Specifically, they report that people focused on keywords to form a simple sentence, and also enunciated their questions. Porcheron, Fischer, and Sharples (2017) also found that people keep asking the same question when they fail to receive an expected response. Even while repeating the same question, people changed their tone, made their voices louder, and emphasized key sounds. Similar speech modifications were observed in children's interactions with virtual assistants (Lovato & Piper, 2015). Given that people have little common ground with virtual assistants, they cannot help modifying the way they have a conversation with them. In this study, I do not analyze how people use virtual assistants to deal with routine tasks, which would reveal their usage but not a pattern of dialogue. Rather, I investigate in more detail how people develop a strategy while interacting with virtual assistants, in order to shed light on what future interactions with conversational agents might be like. 2.3 TASK DESIGN To examine people's strategies for having a conversation with HVAs, it is necessary to understand how to design tasks that allow me to observe their information-seeking interaction.
Information seeking is a process in which people purposefully engage in order to learn or solve a problem (Schamber & Marchionini, 1996). A task in information seeking research has been considered as an important tool to investigate users’ behavior on the web (Jinyoung Kim, McNally, Norooz, & Druin, 2017; Rieh, 2004). Depending on how to manipulate a task design, the cognitive process can be divergent so that a classification of the task is important to study the relationship between tasks and human information behavior (K.-S. Kim & Allen, 2002; Li & Belkin, 2008). 9 Kim and Allen (2002) classified two search tasks, a known-item search and subject search. While a known-item search task offers an explicit direction to find a piece of information that is known to exist, a subject search task just requires participants to find any pieces of information that are related to the subject and regarded as useful to them. They found that a known-item search task took less time to complete with higher search performance. However, providing too many clues and describing the steps for the search would restrict the participants’ behavior taking the same route, which means that a researcher may get less useful results to understand participants’ cognitive process. Tasks are also categorized depending on the users’ information needs (Jeonghyun Kim, 2006). Users may just seek specific information (i.e., factual tasks), aim to define a thing, event, reason, means, and so on (i.e., descriptive tasks), or search for an instrument to know how to use something (i.e., instrumental tasks). Users also explore the web to compare, evaluate, and infer information in order to generalize meaningful patterns (i.e., exploratory tasks). I introduce two additional types of tasks related to the exploratory tasks: a controversial task and a predictive task. People often have a conversation with their friends or family to exchange opinions on social issues or forecast a future event or trend. Both controversial tasks and predictive tasks are designed in an unanswerable way, but I pose a simple context that enables a participant to talk with Google Home. The controversial task includes only a controversial issue that contemporarily gives rise to public disagreement. In this task, participants were asked to check Google Home’s opinion about it and make their own decision. In the predictive task, I designed a scenario in which participants need to check the prediction of a certain situation. Through those types of tasks, I examined qualitative research to assess people’s attempt to have a conversation in order to understand their mental models. 10 CHAPTER 3. METHODS I conducted user study sessions in which I gave six tasks to 20 novice users to complete and asked them to think aloud as they completed those tasks. The study took place from November 2017 to January 2018 at the Behavior Information Technology Lab (BITLab) at Michigan State University. 3.1 PARTICIPANTS I recruited 20 native speakers of American English in the Midwest United States through a paid research pool organized by our institution for maximum variation. Google Home is too inflexible to understand different English accents such that people who are not native English speakers are struggling to deliver their commands (Paul, 2017). To work within this context, I limited my participants to native speakers. Each participant was given $20 compensation for their participation for a 50-60-minute session. 
Participants were screened based on four criteria in a recruiting survey. Eligible participants 1) were at least 18 years of age, 2) had no experience with any home virtual assistant devices, 3) had little or no experience with mobile-driven virtual assistants, and 4) were not experts in virtual assistant technology. Although my study is focused on Google Home, I categorized people who had any experience with other home virtual assistant devices as ineligible. This is because such users have pre-existing mental models formed through their interactions with the other devices that could affect their behavior. However, I restrictively allowed some mobile-driven virtual assistants’ users to participate. Given the fact that the usage between home virtual assistants and mobile-driven virtual assistants are different, people who have used only certain features that 11 require a particular economy of language (e.g., “directions to home”, “tomorrow’s weather”) a few times are permitted in the study. I also excluded participants who identified themselves as technology experts (job responsibilities include computer programming, quality assurance and testing, IT security, or network administration) or virtual assistant experts (work experience at any company related to virtual assistants). I believe that those expert groups have knowledge that helps them effortlessly use such a device, or at minimum, easily apply their knowledge to unexpected situations (Luger & Sellen, 2016). I also chose participants who passed through the recruiting survey (Appendix 1). As divided by age group (e.g., 18-27, 28-37, 38-47, 48-57 or 58+), I selected 20 participants from a wide variety of ages on a ‘first-come first-served’ basis (mean age = 40; age range = 19 - 71). In addition, I selected 10 females and 10 males. Eleven participants had never used any virtual assistant device. Nine participants had used mobile-driven virtual assistants, but they rarely used it in their daily life; three for a few times per month, four for about once per month and two for less than once per month. (more detailed information is shown in Appendix 2) 3.2 PROCEDURE Upon arriving at the lab, the participants were provided with a consent form. Once necessary consent had been granted, brief instructions (Appendix 3) on how to use Google Home were offered in case a participant did not know how to converse with Google Home. Each participant was given a different new Google account I created for the purpose of data collection in the study session. While I created new Google accounts for each participant, there may be some device identifier and history on Google’s servers that I did not have access to and cannot change. Also, each Google account may be subject to a certain geographical data since Google 12 Home was located in the lab, which is leading to provide location-based answers during the study sessions. Note that I reset Google Home to the factory defaults in every 5 accounts since Google Home allows to link to only 5 accounts. After providing Google Home’s information, each participant was given brief instructions to understand how to do a think-aloud and a training set to practice a think-aloud session (Appendix 4). After finishing all of the instrument sessions, I asked participants to think aloud while conducting the tasks. The think-aloud instruction in this study is based on the guidelines provided by Ericsson and Simon (Ericsson & Simon, 1980). Each task was written on a separate sheet of paper. 
To control for order effects, I randomized the order of tasks with the help of a Latin square. To be specific, the first, sixth, eleventh, and sixteenth participants began with the first task, and the second, seventh, twelfth, and seventeenth participants began with the second task (Table 1). I did not impose a time limit for the completion of each task; a task was finished when the participant desisted from doing it. After completing all tasks, I conducted a semi-structured interview with general questions, like "Tell me what you thought about the interaction with Google Home during the task", that sought to comprehend the overall perceptions but avoid prejudging the answer (Granger Morgan, Fischhoff, Bostrom, & Atman, 2002). Then, the participants were asked to elaborate on their behavior in order to recall the underlying reasons they might have had for formulating their strategies. Developing those follow-up questions based on the participants' answers, I probed for a set of causal beliefs that impacted their behavior during the tasks. During the interview, a video camera was set up in the lab to record the interviews and to capture participants' facial expressions while they were interacting with Google Home. Lastly, the participants were asked to complete a survey for demographic information (Appendix 5).

                          Order:  1       2       3       4       5       6
Participant
P1, P6, P11, P16                  Task 1  Task 2  Task 3  Task 4  Task 5  Task 6
P2, P7, P12, P17                  Task 2  Task 3  Task 4  Task 5  Task 1  Task 6
P3, P8, P13, P18                  Task 3  Task 4  Task 5  Task 1  Task 2  Task 6
P4, P9, P14, P19                  Task 4  Task 5  Task 1  Task 2  Task 3  Task 6
P5, P10, P15, P20                 Task 5  Task 1  Task 2  Task 3  Task 4  Task 6
Table 1: The order of tasks for each participant

3.3 TASK DESIGN I designed six tasks (Appendix 6) to learn about how people converse with Google Home. All tasks were provided with a scenario-based context, except the last one, which allowed participants to ask anything they wanted, so they could easily pretend to seek certain information (Nielsen, 2014). To make the tasks actionable, the prompts in all tasks were designed to ask participants to talk with Google Home (e.g., "please ask Google Home about it"). Further, no task contained information that provided an explicit direction to find a particular piece of information. In other words, tasks in this study were subject search types (K.-S. Kim & Allen, 2002). For instance, I did not ask participants to find travel information from a certain information source in the first task. By using subject search type tasks, it is possible to gain data that more closely resemble real-world information seeking (Rieh, 2002). Posing a specific direction to do a task may result in biased behavior that is restricted within limited results, so I intentionally used only subject search type tasks in this study. Tasks were developed based on the ability lists on Google Home's website4 to demonstrate users' practical actions as the usage would be in real life. To design unanswerable tasks, some tasks were intentionally designed not to refer to the ability lists on the website. The goal in this study is to understand how people perceive home virtual assistants rather than to test the performance of the system, and thus, I intentionally included unanswerable tasks to observe the way participants adjusted their approach to find the answer. I designed the unanswerable tasks by adapting a controversial task and a predictive task, and designed the answerable tasks to apply certain information needs, such as factual tasks or instrumental tasks (Jeonghyun Kim, 2006).
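The rotation in Table 1 is a simple cyclic one, with the open-ended final task (Task 6) fixed in the last position. As a minimal illustrative sketch, not part of the study materials and assuming exactly this cyclic scheme, such a task order could be generated as follows:

    # Illustrative sketch (not from the thesis): reproduces the cyclic rotation
    # in Table 1, assuming Task 6 (the open-ended task) is always given last.
    NUM_PARTICIPANTS = 20
    ROTATED_TASKS = [1, 2, 3, 4, 5]   # tasks counterbalanced with a Latin square
    FINAL_TASK = 6                    # open-ended task, fixed in last position

    def task_order(participant: int) -> list:
        """Return the task order for a 1-indexed participant (P1..P20)."""
        offset = (participant - 1) % len(ROTATED_TASKS)
        rotated = ROTATED_TASKS[offset:] + ROTATED_TASKS[:offset]
        return rotated + [FINAL_TASK]

    for p in range(1, NUM_PARTICIPANTS + 1):
        print(f"P{p}: {task_order(p)}")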
When asking for a definition, for example, it is easy to receive an answer by using only one question, so I excluded descriptive type tasks in this study. The tasks were given in written form, with fields such as travel, shopping, passport, Obamacare and sports. The final task was open to participants where I allowed them to ask anything they want. 3.4 ANALYSIS I recorded the interviews and collected log data in the task sessions through Google activity5 where Google accumulated users' verbalized sentences and displays its results. All recorded data was transcribed in the text data by Scribie (www.scribie.com). Based on the transcripts from Scribie, I conducted qualitative data analysis by using Nvivo. To identify strategies and mental models, I iteratively analyzed the data by using an inductive qualitative approach. I began the process of open coding by reading the full transcripts to identify similar patterns in participants’ questions and emergent causal explanations about participants’ beliefs 4 Tasks were developed in October and November in 2017. Google Home’s ability lists are available to see at https://support.google.com/chromecast/answer/7130274?hl=e n&ref_topic=7195641 5 Available at https://myactivity.google.com/ 15 and expectations toward Google Home. To be specific, I focused on a certain phrase and an underlying purpose in the question that the participants said in order to identify each strategy. I examined only tasks' transcripts to extract participants' strategies. In the case of mental models, I examined both tasks' transcripts and semi-structured interviews’ transcripts. With this approach, I investigated how frequently people talk about a certain phrase (wh- question, direct action, or only a word) or a certain concept in order to discern their strategy, focusing only on a single question at first. The unit of analysis for the strategy was one question that begins by uttering a trigger word, such as "Hey Google" or "Okay, Google". I then re-read the entire task transcripts to recognize whether each question implied a certain intention. For instance, I coded both “Hey, Google, do you know anything about Obamacare?” and “Hey, Google, do you have prices for televisions in East Lansing?” in the ‘Tell me anything you know’ strategy, although the second one does not contain the specified word “any”, because both sentences are used to try to extract any information related to a certain topic. To identify mental models from the transcripts, I focused on how participants explained the way they formulated the next question based on their previous experience and interaction with Google Home during the task. In addition, I examined participants' general thoughts and feelings and any causal explanations in the semi-structured interviews' transcripts. 16 CHAPTER 4 FINDINGS My findings below focus on patterns of participants’ strategies and their mental models that formed their strategies. Note that the goal of my study is not intended to evaluate Google Home’s success or failure in answering users’ questions. Even though I might assess success based on whether Google Home provides the right answer to what users are asking, my particular concern in this paper is to gain insight into how participants change the way they speak to Google Home based on its responses. 4.1. HOW DID PEOPLE SEEK SPECIFIC INFORMATION? In this section, I described how participants interact with Google Home to find information on answerable tasks. 
People tend to provide more information for their goals Participants spent the longest time to perform factual tasks to find out several pieces of information in comparison with the other types of tasks. They did not finish the factual tasks immediately, which means that participants applied a variety of strategies in the two factual tasks. The most commonly used strategy was a 'personal context' strategy to explain personal backgrounds or desire. Participants often explained what information they were looking for as the first question. This is because participants were not familiar with interacting with Google Home and they adapted their habitual behavior that demonstrated what they wanted to know exactly to the interlocutor: 17 “Yeah, I usually try to give some kind of background as far as where I'm coming from and what I'm doing. So I think that's more of habit.” - P11, interview “I don't know how Google Home works internally really, but I figured that would potentially help if I gave it context for what I wanted then maybe it would be able to find results more tailored to what I wanted specifically.” - P13, interview This is a common factor in human conversation that people share information and update their common ground to make clear the conversation. Some participants even gave more than one sentence to detail their needs: “Okay, Google. I'm thinking of going to Japan here real soon and I would like to find a few things or get some answers to a few things here, like about flights or accommodations. Maybe what I can do there, where I can stay, that's basically accommodations. Give me some ideas about the flights, how much it might cost, how long it would take going to Japan.” - P6, think aloud However, Google Home could not answer the question including a personal context such as their needs or future plans. Even worse, Google Home kept asking participants to set up a payment method if they said they wanted to buy a television (“It looks like you don't have a payment method or delivery address set up yet. You can do that in the Google Home app.” – Google Home). Once Google Home could not give an answer or provided an inaccurate answer, participants usually made their questions more specific to clarify what they meant. Participants surmised that the reason Google Home could not give the right answer was because of their questions: “I think I just asked what are the most recent flights, but probably I just didn't give it a specific day to search for. Hey, Google, when is the next flight going out to Japan?” - P12, think aloud 18 Based on their knowledge, they also specified their questions by adding a representative brand (e.g., “Okay, Google. Does Best Buy have Visio televisions?” – P2), or a certain place. They thought it would help Google Home understand more clearly what they were looking for. For example, P18 added a hotel brand to the question: “Yeah, that's a too wide-open question. Let's see. Yeah, I think without really having a specific destination in mind, if I was doing this normally, I would look at a map of Tokyo and identify specific places I want to go to Tokyo. Hey Google, what is the nearest hotel in Tokyo to the Imperial Palace?” - P18, think aloud An experience allowed them to form a simple question Using a simple imperative form of the question, instead of a wh-question, is another strategy to extract information from Google Home when participants could not receive an answer from Google Home (e.g., “Hey, Google. Tell me about Japan.”). 
One participant (P13) used only the imperative form after receiving the right answer. Another participant (P8) used this strategy for the first question since they had seen the simple imperative form of the question in a commercial for an HVA. They believed they were supposed to use this strategy to be successful and that it is a natural way to interact with Google Home: “I just feel like I've heard that said to Google Home before, maybe like commercials or whatever. That's the way that they want you to use it, or I shouldn't say that necessarily, but the way that they advertise, Google Company advertises it being used so that feels like the highest success rate for...” - P8, interview “Perhaps, or at least the easiest way to most likely get an answer.” - P13, interview Google Home increases people’s expectations due to its answer Participants tended to ask for additional information when Google Home gave a satisfactory answer. Participants wanted to know the exact flight time or the airline’s name 19 and the hotel’s price, amenities in the hotel, or more detailed information, conducting the first task (see details in Figure 1). For example, P1 asked about the entrance fee of a certain place that Google Home answered: P1: “Hey, Google. What other places of interest are there in Osaka Japan?” Google Home: “Here's a summary from the website Japan-top.com. Amongst Osaka's many attractions, these stand out: Dotonbori in Namba, and K-Comedy Theatre, Osaka Castle, Tenjin Matsuri, Sumiyoshi Taisha Shrine, Shinsaibashi and more. To read more, look for the link in your Google Home app.” P1: “Hey, Google. How much does it cost to visit Osaka Castle?” Furthermore, they also wanted to know specific information regarding the TV and the other option to issue a passport, after receiving the right answer (see details in Figure 1). P10: Hey Google, does the Royal Park Hotel have a swimming pool? P17: Hey Google, what is the prices of those rooms in Tokyo? P14: Hey, Google, for the flight from Lansing, Michigan, to Tokyo, Japan, what are my flight times? P9: Okay, Google. What airlines fly from Lansing to Tokyo? P2: Does Best Buy have flat screen televisions? P15: Okay, Google, does Best Buy have smaller TVs? Task 1: Task 2: Task 3: P4: Hey, Google, is there a place other than online I can go to get my passport? Figure 1: Participants’ questions in the factual tasks: After receiving the right answer, participants asked questions again to know additional information. 20 Because a majority of participants believed that Google Home would perform like a Google search, they expected Google Home might be able to provide detailed information, reflecting their previous experience with Google search, where they can find anything they want to know: “which is weird because if you were to Google that, that’s the first thing that would come up is all the hotels, how much they are, that kinda thing.” - P2, interview People tended to align with the language use from Google Home’s answer Participants also sought more detailed information, receiving an appropriate answer. To ask for detailed information, participants developed a question based on information that was given by Google Home. They tended to reuse a word that Google Home said (see details in Figure 2). People tended to align their questions by adopting the computer’s word choice like the interaction between humans (Branigan, Pickering, Pearson, & McLean, 2010). 21 P1: Hey, Google, what documents do I need to get a passport? 
Google Home: On the website usa.gov they say, "What you'll need to provide, Form DS-11, proof of US citizenship, government-issued photo ID, color passport photo, and fee payment." P1: Hey, Google, where do I get a passport photo? Example 1: Example 2: P13: Okay Google, how do I get a passport for the first time? Google Home: On the website travel.state.gov, they say, "Inside the United States, you must submit your completed application including citizenship evidence, photocopy of ID, photo, and fees in person at a passport acceptance facility. You may be eligible to apply in person at a passport agency if you are traveling soon." P13: Okay Google, where is the nearest passport acceptance agency to me? Figure 2: Participants’ questions in the instrumental tasks: Participants align the questions by using a word to which Google Home gave in order to seek additional information. Relatively easy to find information in terms of how to do something Unlike the other types of task, it was easier for participants to interact with Google Home on the instrumental task. I found similar patterns and strategies in the instrumental task (i.e., search for an instrument to know how to use something) with factual tasks (i.e., seek specific information). It seems obvious, since both tasks are designed to find out specific information, even though the instrumental task is specialized to find ‘how-to’ information. However, participants did not stumble over coming up with a new question to find information. Participants received a means to issue a passport, using a straightforward wh-question, which was the most common way to ask (e.g., “Okay Google, where could I get a passport?” - P16, 22 “Okay Google, how can I get a passport?” - P19). Even though some participants explained their needs to form the first question (e.g., “Okay, Google. I need a passport, so I can travel overseas. How do I do this?” – P6), they immediately asked again, leaving out their personal contexts: “..when I changed the wording of the question, it did give me, it found a website that had information. So that was, it was very straightforward. It understood those questions very easily.” – P20, interview Sometimes participants received an inadequate reply, however, they seemed to be more persistent in this task than the other tasks. For example, Google Home formed a hasty conclusion that participants wanted to know how to renew a passport. Then it gave the information how to renew the passport (see details Figure 3). Despite its blunder, participants readily rephrased their questions to be clearer, adding just an adjective (e.g., “new”, “first”). Figure 3: Data from Google Activity in terms of task 3: Google Home provided information regarding a passport renewal, although a participant asked, ‘where can I get a passport from’. It happened because Google Home is currently designed to just read a text on the website that has a relevant title. 23 4.2. HOW DID PARTICIPANTS INTERACT WITH GOOGLE HOME AFTER LEARNING ITS LIMITATIONS? In this section, I described how participants interact with Google Home to find information on unanswerable tasks. Different approaches in the first attempt More than half of the participants asked Google Home about general information by using a broad type of question in the controversial task (e.g., “Okay Google, can you give me any information about Obamacare?” – P6, “Hey, Google. Please tell me about Obamacare.” – P8, “Okay Google, what is Obamacare? 
– P16), although participants were asked to check Google Home’s opinion on Obamacare. Some participants explained that they wanted to know how to proceed with a conversation based on the broad question first: “First, I wanted to be just like, "What is it?" so that I would get an idea of where I could start to understand that part.” – P16, interview “I guess I thought that rather than starting with just the opinion it was important for me to ask questions about facts and kind of then create my own opinion before I asked their opinion.” – P11, interview Nevertheless, they could not gain enough information to develop the next question so that they asked Google Home’s opinion in the end. After learning of Google Home’s inability to answer in terms of its opinion on Obamacare, a few participants gave up on the task. However, most participants attempted to ask more questions, making use of different strategies. In case of the predictive task, most participants asked about Google Home’s anticipation regarding the NBA playoffs, while some participants sought to determine the current state of the Detroit Pistons by asking a broad question at the beginning (e.g., “how are the Detroit Pistons doing?” - P15). Since the predictive task was designed to be unanswerable, participants could not 24 receive any piece of information regarding Google Home’s prediction. To deal with this challenge, participants applied various strategies to extract information. People commonly use a broad question Although there was no dominant strategy for seeking information regarding the unanswerable tasks, the most commonly used strategy was ‘Tell me anything you know’ strategy (see detailed in figure 4). In other words, participants provided an open question to draw out any information Google Home knew. They took a step backward and tried to think broadly about how to formulate the right question by using this strategy. Some participants were doubtful about Google Home’s knowledge on a basic topic so that they wanted to make sure whether Google Home was able to provide information regarding basketball or NBA. Since Google Home could not show its limitations to form a common ground between participants and Google Home, participants adjusted their expectations of its ability by themselves: "Cause I think the first time I asked about the Detroit Pistons, it said, "I don't understand," or "I can't help you with that." Or I think I asked about the Pistons, and then I asked about the Detroit Pistons, so I'm like, "Alright. Does this thing even understand what basketball is?" So that's why I went that route to make sure it had some clue.” - P14, interview 25 Case 1: ‘Tell me anything you know’ strategy in the controversial task. P9: Okay. She's not sure about the answer. Okay, Google. What do you know about Obamacare? P12: So it said it didn't know when I asked for its opinion. Hey, Google, do you know anything about Obamacare? P13: Okay. What... Do I have anything else to say? Okay, Google, can you give me your opinions on anything? Case 2: ‘Tell me anything you know’ strategy in the predictive task P14: Hey, Google, do you know what the NBA is?” P5: Hey Google, do you know the National Basketball Association? P6: Okay, Google. What do you know about basketball? Figure 4: Participants’ questions in the unanswerable tasks: Participants broadly asked questions to extract any related information or check Google Home’s ability. 
A diverse range of strategies to gain related information

Participants also made their questions more specific, providing their personal information or knowledge. To deal with Google Home's inability to answer, participants asked about related information – the Pistons' current record, last year's results, remaining games, or the criteria for determining playoff teams – to give more specific context to Google Home. Because participants recognized Google Home's limited ability to predict a future event, they at least attempted to accumulate pieces of information to predict the possibility themselves:

"I want to just find out any information about the Pistons that I can, and from there maybe get some semblance of a prediction for how they'd be doing the playoffs." - P13, interview

Some participants (P9, P13, P14) believed using another name (the Affordable Healthcare Act) would help to get an answer from Google Home. P14 thought that since Obamacare is the informal name for the Affordable Healthcare Act, she needed to use the right word:

"Well, that doesn't help me. Obamacare is also the Affordable Healthcare Act, so let me try asking about that. Hey, Google, can you tell me about the Affordable Healthcare Act?" – P14, think-aloud

In the predictive task, some participants (P9, P10, P15, P19) rephrased their questions, asking about the odds of the Pistons making the playoffs. They believed that Google Home was able to figure out the odds by calculating data itself or by referring to a certain website. This perception resulted in relatively high expectations toward Google Home, so participants kept asking questions using different keywords and making their questions more specific:

"I guess it's kind of hard to predict on whether or not you make the playoffs, but I would think maybe that Google would have the ability to run some sort of analytics to say, "This is their odds of finishing in the top eight or not."…" – P9, interview

P18, who was at one time a sports journalist, made his questions more specific by drawing on his experience. He attempted to extract Google Home's prediction by using a hypothetical question (e.g., "Hey, Google, would the Detroit Pistons be in the playoffs if the playoffs started today?" – P18). Despite this participant's professional knowledge, Google Home did not provide any additional information. Another participant behaved as if he were teaching Google Home, explaining a common idea (e.g., "Okay. Okay, Google. Detroit Pistons are a basketball team. Do you know anything about them?" – P6). He believed Google Home would learn if he kept explaining certain information.

4.3. WHAT MADE PARTICIPANTS STOP USING IT

Participants usually gave up on tasks once they had no idea how to clarify their questions any further. Sometimes participants easily completed their tasks, either receiving an adequate answer or realizing Google Home's limitations. In those cases, they lowered their expectations regarding Google Home's performance, and thus, did not demand any further detailed information from Google Home. Nevertheless, the majority of participants thought having a conversation with Google Home was challenging. While Google Home was considered a good tool for asking a simple question about a trivial fact, participants mentioned that formulating the next question as a follow-up on Google Home's feedback was difficult.
In the end, they were discouraged from using Google Home to find information, even though they had attempted a variety of strategies to extract more information:

"It's difficult to tell it exactly the question that you want and have it understand you. Sometimes you get inaccurate information or... Not inaccurate, but not helpful information. Sometimes it doesn't understand you and you have to re-ask your question multiple times." - P19, interview

"Like the Obamacare one, I was trying to ask her what the political debates about it were, but she didn't have any information on that. Then I tried to rephrase it and use different words, but she still didn't get it. It was difficult to think of a new way to say it that she would understand." - P16, interview

One distinct thing that deterred participants in the first task was Google Home's preprogrammed answer for the flight reservation. Although it seemed convenient to find a ticket in response to the question, participants struggled to follow Google Home's joint action since it was able to reply to only a certain form of short answer (see details in Figure 5). The majority of participants also struggled to follow up on Google Home's long responses to unfamiliar or unrelated topics in order to develop the next question. It was difficult for them to memorize or extract specific information from Google Home's long answers (see details in Figure 6). Such preprogrammed answers can be considered Google Home's effort to exchange information for a successful interaction, but participants could not appropriately follow its way of communicating.

P14: Hey, Google, can you find me a flight from Lansing, Michigan, to Tokyo, Japan?
Google Home: Upcoming round-trip flights from Lansing to Tokyo start at $1,379. Did you have any dates in mind?
P14: Hey, Google, I would wanna go January 2nd through January 8th.
Google Home: Sorry, one more time.
P14: January 2nd through January 8th.
Google Home: Sorry, I'm not sure how to help with that.
P14: Okay. So I don't know how to talk to Google to do the followup that they asked me a question, so we'll try that again and I'll just start with "Hey Google," which is now awake. Hey, Google, can you find me a flight from Lansing, Michigan, to Tokyo, Japan, January 2nd to January 8th?
Google Home: I didn't catch that either. Try saying your dates like, "February 16th to February 20th."
P14: Hey Google, January 2nd to January 8th.
Google Home: I'm still not getting it. Please try again later.
……
P14: Hey, Google, to fly to Tokyo, Japan, from Lansing, Michigan, on January 2nd, what is my departing flight time?
Google Home: Round-trip flights from Lansing to Tokyo, leaving January 2nd and coming back January 8th, start at $1,682.
P14: It's interesting that Google Home can still remember the timeframe that I was looking at, the exact dates I wanted to travel,
Figure 5: Participant's reaction toward Google Home's preprogrammed answer: Since Google Home restricted participants to a certain type of short answer in order to provide additional information, the participant struggled to follow up on Google Home's question. Also, P14 was surprised that Google Home still remembered the exact dates, although P14 asked only about the flight time on the departure date.

P15: Okay Google, is there a restaurant in the airport in Tokyo, Japan?
Google Home: I found a few places. The first one is at Hanedakuko, 3 Chome-4-2, 2 in Ota. The second one is Tsurutontan Haneda Airport shop at 2-6-5, 4F in Ota.
P15: Okay Google, is there a restaurant in the airport in Tokyo, Japan?
Google Home: I found a few places. The first one is at Hanedakuko, 3 Chome-4-2, 2 in Ota. The second one is Tsurutontan Haneda Airport shop at 2-6-5, 4F in Ota. The third one is Takafuku at 2-6-5 in Ota.
P15: Wow. [chuckle] Okay Google. Gee, I don't know what to think here.

Figure 6: Participant's reaction toward Google Home's long response: Participants usually could not follow up on the information when Google Home provided 'text-to-speech' style answers.

Moreover, participants mentioned that the way Google Home deals with errors is too unspecific to let them proceed to a follow-up question. In other words, participants thought Google Home fails in its function of informing them how to reword their question:

"like I say, the "I'm still learning," is a little frustrating after a while because... The questions are not... We're not asking it really complex questions but it's... It goes over so many options" - P18, interview

Relying on speech alone to seek information also made the interaction harder. Compared with using Google search on a desktop computer or mobile device, participants explained that obtaining further information through only a conversational interface was too difficult. Since participants had no experience with HVAs, they were accustomed to reformulating search queries based on the search results on the screen. Thus, they found it difficult to develop another question while relying only on Google Home's answer:

"It's obviously not like Google where they give you 20 million choices and then you can scroll down and pick which ones sound the best. I don't know, I guess maybe if she gave you choices like, "Is this what you want? Is this what you want?" that would be cool…Cause at least actual Google does that. It doesn't... It might not know exactly what you want, but it at least gives you multiple suggestions and then you can pick which one that you're like, "Okay, that's sounds most like it might be helpful to me." - P2, interview

"Well, I wonder if there's a specific format to use to ask questions and if that's how it's programmed than the user needs to know how to ask the questions if that's what's wrong." - P1, interview

4.4 MENTAL MODELS OF HVAS

My findings indicate that participants formulated different strategies depending on Google Home's answers and on their different levels of expectations. A majority of participants used diverse strategies rather than adhering to a single strategy across the different types of tasks. Although participants tended to use more specific strategies in the answerable tasks (Tasks 1, 2, 3) and broader strategies in the unanswerable tasks (Tasks 4, 5), there was no distinguishable pattern between the individual tasks. Since mental models keep evolving to incorporate new knowledge, participants consistently adjusted the way they communicated with Google Home. I found two types of mental models that influenced the way people interacted with Google Home. Most participants made use of both models to find information. I briefly summarize the two models and their related strategies in Table 2. The strategies are not mutually exclusive, and sometimes a participant used more than two strategies in one question.

Common-sense models

The first model is the common-sense model, in which participants opt to make use of common sense in a human conversational manner (i.e., grounding) or of clichés that anyone easily understands. Although participants did not perceive Google Home as a human being, they believed that using human-to-human communication skills might facilitate a conversation with Google Home, especially when Google Home failed to answer them.
People have conversations to coordinate with each other by aligning their knowledge and experience (Clark, 1997). This led participants to bring a question with personal context at the beginning of the task in order to establish common ground:

"That's just naturally. I guess when I talk to people, I try to provide that context because if somebody comes with me with just a question goes, "I want to buy a TV." - P14, interview

People also tend to adopt the words or phrases used by their interlocutor in human-to-human communication (Branigan et al., 2010). In a similar way, participants developed their questions based on the words Google Home used. Participants typically used this model when they believed Google Home could understand general knowledge that everyone may know. They believed that mentioning a representative brand name or a place that everyone may know would make it easier to receive answers from Google Home. For example, P10 asked a question that included Las Vegas ("Hey Google, What does Las Vegas say the odds are for the Detroit Pistons basketball game to win the NBA playoffs?") because she believed this would help Google Home find the likelihood of the Pistons making the playoffs:

"Like the NBA. Who could predict that? But I thought Las Vegas could predict that. So when I asked for odds I thought maybe that would help [….] cause they set the odds for all the professional games. And then when people bet they know how much of a chance there is for their team." - P10, interview

Participants also included the representative retail shop – Best Buy – where they usually went to buy a TV:

"I feel like that's one of the biggest tech stores, and it's the one that I'm most familiar with. That's where I usually get my electronics from, and they're everywhere, so..." - P8, interview

"Because I know that you can buy a television at Best Buy. That's where I would go, so I probably would give it information.." – P5, interview

Machine-oriented models

The second model is the machine-oriented model, which is associated with high expectations of Google Home that usually led participants to push for a specific piece of information. People tend to focus on a certain entity (e.g., the Google search engine) during an interaction with a computer, and this source orientation influences their behavior (Solomon & Wash, 2014). Since none of the participants had experience with any type of HVA, they thought of Google's affiliated services in lieu of Google Home itself:

"when you think of Google Home, it's Google and Google has everything you ever wanted to know, so […] I'm surprised that it couldn't find any answer to that." – P9, interview

"So I'm thinking, "Okay, so this is a Google device. Obviously, it probably has a way to access the Google engine."" - P12, interview

This source orientation led participants to formulate specific questions that demanded further information, because they believed Google Home could find information through Google search, where they were used to finding anything they wanted. Some participants described their strategies for seeking information through Google Home as based on how they typed in Google search. They believed that recombining keywords by adding particular information would help them receive the right answer from Google Home:

"Yeah, I try to see what are the current debates or what is the political debate. 'Cause I thought that normally when I search stuff into Google, I'll use keywords like "political" or "current" to focus the search more" - P16, interview

"So the only way I can equate that is when I was first learning how to use Google Scholar knowing how to put in the search terms to find what I wanted and figure out what language it understood." – P14, interview

Although Google Home reminded participants of Google search, this conceptualization did not help them use Google Home. Google Home sometimes gave the same answer that is provided on the Google search result page (see details in Figure 7); however, this did not help them carry the conversation further. On the Google search page, people can see more than one result at a time, which lets them formulate their next query to find information. On the other hand, Google Home provided only one answer, which made it difficult to have a conversation.

P1: Hey, Google. Where is a good place to buy a television in East Lansing, Michigan?
Google Home: I found a few places. The first one is WLNS at 2820 East Saginaw Street in Lansing. The second one is WLNS-TV at 3660 Van Atta Road in Okemos. The third one is Best Buy at 2020 West Grand River Avenue in Okemos.

Figure 7: The result of Google search (left) and Google Home (right) in response to the same question: Google Home provides the same answer that people are given in Google search. However, Google search displayed more than one result, which allows people to select the information they need.

Participants also conceptualized Google Home as an artificial intelligence device that is able to learn from the pieces of information it receives from users. Participants attempted to train Google Home by presenting certain information when they believed Google Home had the ability to collect data from the verbal interaction between Google Home and users. In particular, people formed this model once Google Home provided an answer based on their previous questions. For instance, P14 believed Google Home operates using not only its own dataset but also data received from users, after seeing that Google Home retained the travel schedule during the conversation (see details in Figure 5). Reflecting on this experience, P14 said that using a broad question to extract what Google Home knows about a certain topic would help move a conversation with Google Home forward:

"it does seem to have some memory and some retention [….] so it did remember something. It did hold something. So had that question come before the NBA question, I guess I could have started with, "Do you know what the NBA is?" "Do you know who Detroit Pistons are?" And built it that way to see if something could stay in its memory to build the information that I wanted better." - P14, interview

P9 also pushed information that might help develop Google Home's ability in order to carry on further conversations. P9 thought Google Home was able to retain information from his previous questions and provide related information (see details in Figure 8).
Task 3 during the think-aloud session:
P9: Okay, Google, why is an LED TV better than a plasma TV?
Google Home: On the website pcmag.com, they say, "In plasma HDTVs, the phosphorus that create the image on the screen light up themselves and don't require backlighting. This doesn't mean it's more efficient than back-lit LCD screens, though. On the contrary, plasma panels are much heavier than both CCFL and LED back-lit LCDs and consume much more power."
P9: Okay, Google, can you recommend a TV for me to buy?
Google Home: On the website ebay.com, they say, "LCD technology is more expensive than plasma so a large plasma screen can be bought for the price of a much smaller LCD screen."
P9: That is not the question that I asked. Okay, Google, can you recommend a TV for me to buy?
Google Home: Here's a summary from the website bestbuy.com. Best Buy customers often prefer the following products when searching for LG plasma TV: LG 65 inches class LED 2160p Smart 4K Ultra HDTV with high dynamic range and more.

Semi-structured interview session:
P9: I feel like it learned. Or it understood what I was looking for based on a previous question, so I was, like I said, trying to coach it into understand what I want it to do

Figure 8: Participant's interaction with Google Home in task 3 and the debriefing from the interview session: P9 received a TV recommendation from Google Home based on the context of his previous question.

Both models consist of some degree of expectation about Google Home, formed through the interaction with Google Home. The two models indicate the underlying causal process of how Google Home works and why people used a certain strategy. In the next section, I demonstrate design implications based on the two mental models.

1) Common-sense model
Description of the model:
• Google Home should have common sense.
• Google Home has a conversation to coordinate with people by aligning each other's knowledge and experience.
Related strategies:
• 'Personal contexts' strategy: Participants provide context regarding personal information or their needs.
• 'Knowledge in my head' strategy: Participants specify a context by using their own knowledge or adding a representative brand or a certain place.
• 'Alignment with what Google Home just said' strategy: Participants align their queries with a word that Google Home used.

2) Machine-oriented model
Description of the model:
• Google Home operates like Google search, where you can find any information.
• Google Home is able to learn and give better answers by harnessing the data users provide.
Related strategies:
• 'General specific-wh-question' strategy: Participants rephrase a question by adding particular information.
• 'Teach you' strategy: Participants try to develop Google Home's ability by providing certain ideas.
• 'Tell me anything you know' strategy: Participants pose an open question to extract what Google Home knows about a certain topic.

Table 2: Two mental models and related strategies

CHAPTER 5. DISCUSSION

People without previous experience with home virtual assistants used a variety of strategies to seek information, and their understandings of HVAs changed throughout their interaction. Unlike human-to-human communication, interacting with HVAs is not communicating with a conversation partner that is capable of grounding, even though HVAs kept answering people's questions. Based on my findings, I present the most important factors for improving the design of HVAs to foster a more realistic conversation with them.

5.1 GIVING AN ANSWER IS NOT A SOLUTION FOR THE EFFECTIVE INTERACTION

Google Home responds to questions whenever it can detect related keywords in people's questions, even when its answer is not exactly related to people's intention. Considering the machine-oriented model, giving a relevant answer is not the best solution for having a conversation.
After receiving the answer, participants formed unwarranted expectations toward Google Home and overestimated its ability. As a result, people may be unable to make an accurate judgement of Google Home's limitations for having a conversation. Google Home is known as an HVA device that gives a correct answer more often than other HVA devices (Sterling, 2017). Google Home provided the most relevant information about a topic or listed the closest places, in the same manner as Google search, where people are given the most relevant information. For example, Google Home could find an upcoming flight schedule with the price once it was given a question about the flight. However, Google Home could not provide any additional information, such as a departure time, the airline, or an alternative route, information that is often provided in a Google search. In addition, Google Home provided related information from Wikipedia or certain websites in response to participants' questions. Participants naturally re-asked for more detailed information based on that information, although Google Home was limited in discovering further information.

Considering that people are accustomed to seeking information by exploring back and forth between hyperlinks through a visual interface such as a desktop or mobile device, Google Home should not provide an answer in the style of a Google search result. While people can reshape their search keywords after seeing more than one search result on a webpage, they cannot reformulate their questions based on the single answer from Google Home. Presenting the single most relevant answer, as Google search does, may raise accuracy and people's satisfaction for a while. Nevertheless, it makes people quickly become exhausted trying to continue the conversation, so that inexperienced people are not inclined to engage in the interaction with HVAs. In this sense, the system designer should consider not only how to provide a relevant answer, but also how to shape the answer to suit the HVA's interface, so as not to implant unwarranted expectations.
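To illustrate one possible direction, the sketch below shows how a voice assistant could offer a short spoken list of candidate answers and let the user pick one, rather than committing to a single result. This is a minimal illustration of the idea, not Google Home's actual behavior; the Candidate class, the search_candidates backend, and the prompt wording are hypothetical placeholders.

# A minimal sketch of voice-based disambiguation: instead of reading out a single
# "best" answer, the assistant offers a few candidates and asks the user to choose.
# search_candidates and the prompts are hypothetical, not Google Home's actual API.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    title: str   # short spoken label, e.g., "Best Buy in Okemos"
    detail: str  # fuller answer read out after the user picks this option


def search_candidates(query: str, limit: int = 3) -> List[Candidate]:
    """Placeholder for whatever backend search the HVA uses."""
    raise NotImplementedError


def answer_with_choices(query: str,
                        ask: Callable[[str], str],
                        say: Callable[[str], None]) -> Optional[str]:
    """Offer up to three candidates by voice and let the user pick one."""
    candidates = search_candidates(query, limit=3)
    if not candidates:
        say("I couldn't find anything for that. Could you rephrase your question?")
        return None
    if len(candidates) == 1:
        say(candidates[0].detail)
        return candidates[0].detail

    # Read a short, numbered menu instead of committing to a single answer.
    menu = ", ".join(f"option {i + 1}: {c.title}" for i, c in enumerate(candidates))
    reply = ask(f"I found a few possibilities: {menu}. Which one would you like to hear about?")

    # Accept either the option number or part of the title in the user's reply.
    for i, candidate in enumerate(candidates):
        if str(i + 1) in reply or candidate.title.lower() in reply.lower():
            say(candidate.detail)
            return candidate.detail

    say("Sorry, I didn't catch which option you meant.")
    return None

Reading out two or three labeled options keeps the turn short enough to remember while still giving listeners something to reformulate against, which is what participants such as P2 said they missed.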
The system designer should investigate how people associate a certain topic with other things which involve a representative place, product, individual, or group or a metaphor that people often use to describe it. Although what people say and how people describe a certain topic are chosen from a great range of possibilities (Monaghan, Goodman, & Robinson, 2012), HVAs should be designed to be able to recognize a high association between more than two ideas or things in order to support the common-sense model. Diversifying HVA’s error handling strategies In order to help people to reframe their questions, asking different types of clarification questions to get a better idea of what information the people actually need is also important. Current Google Home’s error handling strategies to give an excuse when it can’t answer are not enough for people to support their grounding behavior (see current Google Home’s error handling strategies in Figure 10). After receiving the answer indicating a failure, people sometimes realized they needed to speak a different way. However, they were struggling to develop a new way to ask a question since there is no hint to modify their questions, and thus, 42 they ceased from having a conversation with Google Home. In human discourse, people incessantly make sure each other’s intention if something on the interlocuter’s question is not clear to proceed the conversation. In this sense, Google Home also need to ask questions to clarify people’s intention rather than make rash judgments to provide incomplete answers. Examples of Google Home’s error handling strategies when Google Home cannot respond in people’s questions: • Sorry, I'm not sure how to help with that, but I'm still learning. • Sorry, I don't know how to help with that yet. • Sorry, I'm not sure how to help with that yet, but my team's helping me learn. Figure 9: Google Home’s current error handling strategies: When Google Home cannot provide answers, it did not explain the exact underlying reasons why it cannot respond in the user’s question. Instead, it just gave an excuse to the user. 5.3 LIMITATIONS There are several limitations to this study. Since I designed the tasks using specific topics and conducted a qualitative study using a think-aloud approach, participants’ behavior might be influenced by their familiarity with the topics. A different number of questions in each different task type also impacted my analysis. Compared with factual tasks, the other tasks had relatively fewer questions. Another limitation is that my study cannot be generalized to all inexperienced people, considering my sample size. Also, since I focused more on the diverse range of strategies that participants applied in the task, strategies in my findings cannot be considered as general patterns in the other potential users, because I included all emergent strategies, even though they are not mainly used in the task. In addition, I could not cover smart home usage in the task since my study was run in the lab, even though controlling other smart home devices is one of primary usage. 43 CHAPTER 6. CONCLUSION For this work, I examined a qualitative study to understand the interaction between people and HVAs, observing inexperienced people’s strategies of talk that proceed a conversation and the impacts on the change of interaction from their beliefs and expectations. Current virtual assistant technologies only allow people to use their abilities in routine tasks. 
5.3 LIMITATIONS

There are several limitations to this study. Since I designed the tasks around specific topics and conducted a qualitative study using a think-aloud approach, participants' behavior might have been influenced by their familiarity with the topics. The different number of questions in each task type also affected my analysis; compared with the factual tasks, the other tasks produced relatively few questions. Another limitation is that, given my sample size, my study cannot be generalized to all inexperienced people. Also, since I focused on the diverse range of strategies that participants applied in the tasks, the strategies in my findings cannot be considered general patterns among other potential users, because I included all emergent strategies, even those that were not used often in the tasks. In addition, I could not cover smart home usage in the tasks since my study was run in the lab, even though controlling other smart home devices is one of HVAs' primary uses.

CHAPTER 6. CONCLUSION

For this work, I conducted a qualitative study to understand the interaction between people and HVAs, observing the conversational strategies inexperienced people used to keep a conversation going and how their beliefs and expectations changed their interaction. Current virtual assistant technologies only allow people to use their abilities in routine tasks. Yet, people expect these devices to handle more detailed information and want these assistants to understand human conversational manners. In order to encourage inexperienced individuals to become involved in interaction with virtual assistants, understanding potential users' mental models is necessary for developing virtual assistants that are prevalent in our daily lives.

My findings reveal that inexperienced people attempted to exchange pieces of information to create a dialogue, while HVAs were not capable of grounding with people. People assumed that their participatory actions could clarify the miscommunication. However, due to their lack of experience interacting with HVAs, they adapted their habitual behaviors to demonstrate what they wanted to know. Also, inexperienced people overestimated HVAs' abilities once they were given a certain answer. I conclude that a single turn-taking is not enough to have a human-like conversation. Rather than focusing on giving a keyword-related answer, optimizing the answer for a voice-only interface so that grounding can occur between people and HVAs is critical to virtual assistants' development.

My research provides the HCI community with a first step toward understanding how the majority of inexperienced people perceive virtual assistants and thus make use of them to search for information. To better understand the general population's understanding of virtual assistants, subsequent work should further examine mental models across a number of different applications with embedded virtual assistants, such as Amazon Echo. Furthermore, my work suggests a need for more research into human-like virtual assistants that are able to manage multiple turn-takings in conversation and to control unmet expectations in developing affordable systems.

APPENDICES

APPENDIX A. Recruiting Survey Questions

Q1. What is your age in years? [short answer fill in the blank; must be at least 18 to be eligible]

Q2. What is your gender?
• Woman
• Man
• Fill in the blank _______________
• Prefer not to answer

Q3. Have you ever used any devices on the list below? Select all that apply:
• Google Home, Google Home mini, Google Home max, any device with Google Assistant inside [ineligible]
• Amazon Echo, Amazon Echo dot, Amazon Tap, Amazon Echo Show, Amazon Echo spot, any devices with Alexa inside [ineligible]
• Apple HomePod [ineligible]
• Any other home appliance operated by spoken language [ineligible]
• None of the above

Q4. Have you ever used any service on the list below? Select all that apply:
• Apple Siri [to Q4.1]
• Google Assistant, Google Now [to Q4.1]
• MS Cortana [to Q4.1]
• Samsung Bixby [to Q4.1]
• Any other service operated by spoken language on the smartphone [to Q4.1]
• None of the above

(If the subject answers anything except "None of the above" on Q4, one or two additional follow-up questions will be presented.)

(follow-up) Q4.1. How often do you usually use ?
• Several times per day [ineligible]
• About once per day [ineligible]
• A few times per week [ineligible]
• About once per week [ineligible]
• A few times per month
• About once per month
• Less than once per month

(follow-up) Q4.2. Which feature do you use?
Select all that apply:
• Get directions
• To call or text someone
• Set alarms or timers
• Set reminders or make a "To-Do" list
• Check weather information
• Play music
• Launch an app
• Check news [ineligible]
• Read emails or messages aloud to me [ineligible]
• To search for something on the internet [ineligible]
• Others ____________________ [ineligible]

Q5. Do you now, or have you ever worked in a job where your responsibilities included:
• Development or Quality Assurance for a computer program operated by a spoken language interface or dialog system (i.e., Facebook messenger bot, Slackbot), or any kind of virtual assistant
• Communicating with clients or customers via a computer program operated by a spoken language interface or dialog system (i.e., Facebook messenger bot, Slackbot)
• Working on the development of the conversational interface in a virtual assistant or a computer program operated by a spoken language interface or dialog system (i.e., Facebook messenger bot, Slackbot) for your organization?
Yes [ineligible]
No

Q6. Do you now, or have you ever worked in a job where your responsibilities included: computer programming, quality assurance and testing, IT security, or network administration?
Yes [ineligible]
No

Appendix B. Participants' information

Participant   Age   Gender   Experience in VA   Internet Literacy (1-5)
P1            71    F        No                 2.9
P2            38    F        No                 2.1
P3            20    F        Yes                2.9
P4            56    F        No                 2.1
P5            48    M        No                 2.5
P6            56    M        No                 3.6
P7            39    M        Yes                3.9
P8            19    M        Yes                2.8
P9            34    M        No                 4
P10           61    F        Yes                2.6
P11           35    F        Yes                2.8
P12           22    F        No                 3.1
P13           21    M        No                 3.4
P14           35    F        Yes                3.0
P15           61    F        No                 3.5
P16           19    F        Yes                3.1
P17           47    M        No                 1.8
P18           59    M        Yes                4.4
P19           36    M        Yes                3.1
P20           22    M        No                 3.5

Table 3: Participants' demographic information

Appendix C. A brief instruction of Google Home

• Instruction: Google Home is a smart speaker developed by Google. Google Home enables users to speak voice commands to interact with services through the intelligent personal assistant called Google Assistant. Google Home is able to detect your voice when you say "Hey Google" or "Okay, Google." During the test, you need to utter one of these two phrases; otherwise Google Home will not answer your question.
• Conduct a sample task to make sure the participant understands how to use it (Sample task: Please find out the name of the president of France. After finding it, please say "Thank you, Google.")

Appendix D. Think-aloud instruction

"In this session, you'll be asked to 'think aloud'. The most important thing for you to remember is that this is not a test – there is no right or wrong answer to ask a question to Google Home. The point of the study is to learn about how people talk with this kind of device to achieve what they want. Your honest responses will be extremely useful in helping us understand users' belief or knowledge of the Google Home. Also, participation in this study is completely voluntary. Although I don't know why this would happen, if you become uncomfortable in any way feel free to stop at any time, and you will still receive $20 for participating."

"So, during this session, I'm interested in two things: how you approach to get an outcome/answer what you want, and what you think about as you converse. I'll teach you how to do this and give you a chance to practice in just a second. It's very easy to do, and we get a lot of good information from these kinds of studies about people's thought processes while completing certain activities.
It’s basically like you’re talking to yourself, but loud enough for other people to hear.” “Now, when I say ‘think aloud’, what I mean is that you should say whatever is on your mind while you complete the exercises. I want you to tell me EVERYTHING that you are thinking from the time you begin an exercise until you finish it. I would like you to talk aloud as CONTINUOUSLY as possible – even if the only thing you are thinking is “I’m drawing a blank.” 51 “I want to hear about what you’re looking for or what you’re trying to do, even if it seems obvious. If you hesitate or are indecisive, describe what’s causing your hesitation. Don't try to plan out what to say or try to explain to me what you are thinking. Don’t hold back hunches, guesses, wild ideas, or negative comments – these things will all help us understand how the software is being used. Just try to act as if you are alone, speaking to yourself – only a little louder.” “You can also ask me questions, but I can’t explain anything about the system or task to you right now. If you do have questions, go ahead and verbalize them anyway so we can learn about the kinds of questions and problems people have, and I’ll answer your questions at the end of the session. And don't worry, if you forget to “think aloud” I will jump in and remind you to keep talking.” “Now, you try thinking aloud. Here's a problem: please think aloud while you answer the question, ' How many windows are there in your best friend's house or apartment?.” “Good!” “I demonstrate an example to use Google Home. Let’s say I want to know whether I need a jacket tomorrow.” ################################################################# Interviewer: Okay, let’s ask whether I need a jacket tomorrow or not. “Okay, google. Do you think do I need a jacket tomorrow? Google Home: Tomorrow in East Lansing expected a high of x degrees and a low of x degrees. Interviewer: Hmm..I do not still know whether I need a jacket or not. Let’s say, “Okay, google. Is it cold tomorrow? 52 Google Home: Tomorrow in East Lansing expected a high of x degrees and a low of x degrees. Interviewer: Oh, no it gives me the same answer. Maybe Google does not know the meaning of jacket. Let’s try another way. Today I wear a jacket, so maybe I can ask like this. “Okay google, Tomorrow is colder than today?” Google Home: Sorry I’m not sure how I can help. Interviewer: Okay, then..What was the low temperature… “Okay, google. What is the low temperature in tomorrow?” Google Home: degree Interviewer: Okay let’s ask “Okay, Google. X is cold? Google Home: Sorry I’m not sure how I can help. Interviewer: Hmm..maybe Google home can answer only high and low temperature. Okay, then maybe I can ask about cloth. “Ok Google. What should I wear tomorrow?” Google Home: Interviewer: Ok stop google. I’m done! ################################################################# 53 Appendix E. Post Survey Questions about Demographics Q1. What is the last grade or class you completed in school? • None, or grades 1-8 • Some high school • High school graduate or GED certificate • Technical, trade, or vocational school AFTER high school • Some college, no 4-year degree • 4-year college degree • Some postgraduate or professional schooling, no postgraduate degree • Postgraduate or professional degree, including master’s, doctorate, medical or law degree Q2. What was your total household income before taxes during the past 12 months? 
• Less than $25,000 • $25,000 to $34,999 • $35,000 to $49,999 • $50,000 to $74,999 • $75,000 to $99,999 • $100,000 to $149,999 • $150,000 to $199,999 54 • $200,000 or more Q3. Which categories below best describe you? Select all that apply: • White • Hispanic, Latino or Spanish • Black or African American • Asian • American Indian or Alaska Native • Middle Eastern or North African • Native Hawaiian or Other Pacific Islander • Some Other Race, Ethnicity or Origin (please specify) Q4. In politics, as of today, do you consider yourself a Republican, a Democrat or an independent? • Republicans • Independents [to Q4-1] • Democrats (follow-up) Q4-1. As of today, do you lean more to the Democratic Party or the Republican Party? • Republicans • Democrats 55 Based on Gallup. Party affiliation http://news.gallup.com/poll/15370/party-affiliation.aspx, October 2017. Question about Internet Literacy How familiar are you with the following Internet-related terms? Please rate your understanding of each term below from None (no understanding) to Full (full understanding): • Wiki [None, Little, Some, Good, Full] • Meme [None, Little, Some, Good, Full] • Phishing [None, Little, Some, Good, Full] • Bookmark [None, Little, Some, Good, Full] • Cache [None, Little, Some, Good, Full] • SSL [None, Little, Some, Good, Full] • AJAX [None, Little, Some, Good, Full] • RSS [None, Little, Some, Good, Full] Based on Hargittai, E., & Hsieh, Y. P. (2011). Succinct Survey Measures of Web-Use Skills. Social Science Computer Review, 30(1), 95–107. 56 Appendix F. Task lists Each task gives you a certain situation that you need to find some information, talking with Google Home. If you think you receive an acceptable answer that you are expecting, please say “Thanks, Google”, and then move to the next one. Task 1: (Answerable / factual) Assume that you are planning to take a trip to Japan in the near future. You would like to find useful information about flights, accommodation, and features of interests in a certain destination in Japan. Please ask Google Home about it. When you finally decide where to go, how to get there, what to do, and so on, please say “Thanks, Google.” (Please select any period you want to go) Task 2: (Answerable / factual) Assume that you want to purchase a new television. You want to find a place where you can buy one near your home. Please ask Google Home about it. When you decide the place where you can buy a new one, please say “Thanks, Google.” (Please assume that this place is your home) Task 3: (Answerable / instrumental) Assume that you need to issue a passport to travel overseas, but you do not know how to issue it and where to go for it. Please talk with Google Home to find out solutions. Task 4: (Unanswerable / exploratory – controversial) Assume that you just read news about Obamacare and you are not sure if you want to sign up for it. So, you want to check Google Home’s opinion on Obamacare in order to make a decision on whether you should enroll. Ask for Google Home’s opinion. Task 5: (Unanswerable / exploratory – predictive) 57 Assume that you are a big fan of the Detroit Pistons basketball team and you hope they will advance in the Playoffs this season. You may want to know Google Home’s prediction about it. Please ask Google Home about it. (NBA Playoffs are a tournament among the best 16 teams in the league) Task 6: Please ask anything you want to ask. 58 REFERENCES 59 REFERENCES Branigan, H. P., Pickering, M. J., Pearson, J., & McLean, J. F. (2010). 
Linguistic alignment between people and computers. Journal of Pragmatics, 42(9), 2355–2368. https://doi.org/10.1016/j.pragma.2009.12.012 Choo, C. W., Detlor, B., & Turnbull, D. (2000). Information seeking on the web: An integrated model of browsing and searching. First Monday, 5(2). https://doi.org/10.5210/fm.v5i2.729 Clark, M., Dutta, P., & Newman, M. W. (2016). Towards a Natural Language Programming Interface for Smart Homes. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, 49–52. https://doi.org/10.1145/2968219.2971443 Cohen, P., Cheyer, A., Horvitz, E., El Kaliouby, R., & Whittaker, S. (2016). On the Future of Personal Assistants. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems - CHI EA ’16 (pp. 1032–1037). https://doi.org/10.1145/2851581.2886425 Cowan, B. R., Pantidi, N., Coyle, D., Morrissey, K., Clarke, P., Al-Shehri, S., … Bandeira, N. (2017). “What Can I Help You With?”: Infrequent Users’ Experiences of Intelligent Personal Assistants. Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services - MobileHCI ’17, 1–12. https://doi.org/10.1145/3098279.3098539 DeVito, M. A., Gergle, D., & Birnholtz, J. (2017). Algorithms ruin everything: RIPTwitter, Folk Theories, and Resistance to Algorithmic Change in Social Media. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems - CHI ’17, 3163–3174. https://doi.org/10.1145/3025453.3025659 Enge, E. (2017). Digital Personal Assistants: Which Is Smartest? Retrieved December 29, 2017, from https://www.stonetemple.com/digital-personal-assistants-test Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87(3), 215–251. https://doi.org/10.1037/0033-295X.87.3.215 Eslami, M., Karahalios, K., Sandvig, C., Vaccaro, K., Rickman, A., Hamilton, K., & Kirlik, A. (2016). First I "like" it, then I hide it: Folk Theories of Social Feeds. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems - CHI ’16 (pp. 2371–2382). https://doi.org/10.1145/2858036.2858494 French, M., & Hancock, J. (2017). What’s the Folk Theory? Reasoning About Cyber-Social Systems. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2910571 Gašić, M., Hakkani-Tür, D., & Celikyilmaz, A. (2017). Spoken language understanding and interaction: Machine learning for human-like conversational systems. Computer Speech and 60 Language, 0, 5–7. https://doi.org/10.1016/j.csl.2017.05.006 Granger Morgan, M., Fischhoff, B., Bostrom, A., & Atman, C. J. (2002). Mental Models Interviews. Risk Communication. A Mental Models Approach, 63–83. https://doi.org/0521002567 Hölscher, C., & Strube, G. (2000). Web search behavior of internet experts and newbies. Computer Networks, 33(1), 337–346. https://doi.org/10.1016/S1389-1286(00)00031-1 Johnson-Laird, P. N. (2010). Mental models and human reasoning. Proceedings of the National Academy of Sciences of the United States of America, 107(43), 18243–50. https://doi.org/10.1073/pnas.1012933107 Jones, N. a., Ross, H., Lynam, T., Perez, P., & Leitch, A. (2011). Mental Model an Interdisciplinary Synthesis of Theory and Methods. Ecology and Society, 16(1), 46–46. https://doi.org/46 Kang, R., Dabbish, L., Fruchter, N., & Kiesler, S. (2015). “My data just goes everywhere”: User mental models of the internet and implications for privacy and security. In Symposium on Usable Privacy and Security (SOUPS) 2015 (pp. 39–52). Kim, J. (2006). 
Task as a predictable indicator for information seeking behavior on the Web. ProQuest Dissertations and Theses. Retrieved from http://search.proquest.com/docview/305277561?accountid=14643%5Cnhttp://mlbsfx.sibi.us p.br:3410/sfxlcl41?url_ver=Z39.88- 2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&genre=dissertations+%26+theses&sid =ProQ:ProQuest+Dissertations+%26+Theses+Global&atit Kim, J., McNally, B., Norooz, L., & Druin, A. (2017). Internet Search Roles of Adults in their Homes. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems - CHI ’17 (pp. 4948–4959). https://doi.org/10.1145/3025453.3025572 Kim, K.-S., & Allen, B. (2002). Cognitive and task influences on Web searching behavior. Journal of the American Society for Information Science and Technology, 53(2), 109–119. https://doi.org/10.1002/asi.10014 Kopp, S., Gesellensetter, L., Krämer, N. C., & Wachsmuth, I. (2005). A conversational agent as museum guide - Design and evaluation of a real-world application. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3661 LNAI, pp. 329–343). https://doi.org/10.1007/11550617_28 Li, Y., & Belkin, N. J. (2008). A faceted approach to conceptualizing tasks in information seeking. Information Processing and Management, 44(6), 1822–1837. https://doi.org/10.1016/j.ipm.2008.07.005 Lovato, S., & Piper, A. M. (2015). “Siri, is this you?”: Understanding Young Children’s Interactions with Voice Input Systems. In Proceedings of the 14th International Conference 61 on Interaction Design and Children - IDC ’15 (pp. 335–338). https://doi.org/10.1145/2771839.2771910 Luger, E., & Sellen, A. (2016). “Like Having a Really Bad PA”: The Gulf between User Expectation and Experience of Conversational Agents. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems - CHI ’16, 5286–5297. https://doi.org/10.1145/2858036.2858288 McGregor, M., & Tang, J. C. (2017). More to Meetings: Challenges in Using Speech-Based Technology to Support Meetings. Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW ’17, 2208–2220. https://doi.org/10.1145/2998181.2998335 Monaghan, L. F., Goodman, J. E., & Robinson, J. M. (2012). A cultural approach to interpersonal communication : essential readings. Wiley-Blackwell. Retrieved from https://books.google.com/books?id=4U7VLOg7J3kC&pg=PA158&lpg=PA158&dq=What+ we+say+and+how+we+say+it+are+chosen+from+a+great+range+of+possibilities&source= bl&ots=7NGxtxDloR&sig=PoUxnBajoMifVJRl2uCvDlCiy0Y&hl=en&sa=X&ved=0ahUK Ewihy6a3rrvaAhVK5oMKHahWBwIQ6AEINzAB#v=onepage&q=What we say and how we say it are chosen from a great range of possibilities&f=false Moore, R. K. (2013). Spoken language processing: Where do we go from here? Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://doi.org/10.1007/978-3-642-37346-6-10 Moore, R. K. (2017). Is spoken language all-or-nothing? Implications for future speech-based human-machine interaction. In Lecture Notes in Electrical Engineering (Vol. 999 LNEE, pp. 281–291). https://doi.org/10.1007/978-981-10-2585-3_22 Muramatsu, J., & Pratt, W. (2001). Transparent Queries: investigation users’ mental models of search engines. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 226–233). 
https://doi.org/http://doi.acm.org/10.1145/383952.383991 Nersessian, N. J. (2002). The cognitive basis of model-based reasoning in science. The Cognitive Basis of Science, 133–153. https://doi.org/10.1017/cbo9780511613517.008 Norman, D. A. (1983). Some Observations on Mental Models. In Mental Models (Vol. 7, pp. 7– 14). https://doi.org/Cited By (since 1996) 346\rExport Date 12 September 2012 Norman, D. A. (2013). The Design of Everyday Things. Revised and expanded edition. Human Factors and Ergonomics in Manufacturing. https://doi.org/10.1002/hfm.20127 Paul, S. (2017). Voice Is the Next Big Platform, Unless You Have an Accent | WIRED. Retrieved March 11, 2018, from https://www.wired.com/2017/03/voice-is-the-next-big- platform-unless-you-have-an-accent/ Porcheron, M., Fischer, J. E., Reeves, S., & Sharples, S. (2018). Voice Interfaces in Everyday 62 Life. Proceedings of the 2018 ACM Conference on Human Factors in Computing Systems. https://doi.org/doi.org/10.1145/3173574.3174214 Rader, E., & Gray, R. (2015). Understanding User Beliefs About Algorithmic Curation in the Facebook News Feed. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems - CHI ’15 (pp. 173–182). https://doi.org/10.1145/2702123.2702174 Rader, E., & Slaker, J. (2017). The Importance of Visibility for Folk Theories of Sensor Data. In Symposium on Usable Privacy and Security (SOUPS) 2017. Reeves, S. (2017). Some Conversational Challenges of Talking with Machines. In Companion of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW ’17 Companion (pp. 431–436). https://doi.org/10.1145/3022198.3022666 Richter, F. (2016). • Chart: What the Amazon Echo Is Actually Used For | Statista. Retrieved April 7, 2018, from https://www.statista.com/chart/6080/amazon-echo-usage/ Rieh, S. Y. (2002). Judgment of information quality and cognitive authority in the Web. Journal of the American Society for Information Science and Technology, 53(2), 145–161. https://doi.org/10.1002/asi.10017 Rieh, S. Y. (2004). On the Web at home: Information seeking and Web searching in the home environment. Journal of the American Society for Information Science and Technology. https://doi.org/10.1002/asi.20018 Rong, X., Fourney, A., Brewer, R. N., Morris, M. R., & Bennett, P. N. (2017). Managing Uncertainty in Time Expressions for Virtual Assistants. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems - CHI ’17 (pp. 568–579). https://doi.org/10.1145/3025453.3025674 Schamber, L., & Marchionini, G. (1996). Information Seeking in Electronic Environments. Journal of Education for Library and Information Science, 37(1), 81. https://doi.org/10.2307/40324289 Solomon, J., & Wash, R. (2014). Human-what interaction? Understanding user source orientation. In Proceedings of the Human Factors and Ergonomics Society (Vol. 2014– Janua, pp. 422–426). https://doi.org/10.1177/1541931214581088 Sterling, G. (2017). Study: Google Home has more answers than Amazon Echo - Marketing Land. Retrieved April 14, 2018, from https://marketingland.com/amazon-echo-google- home-go-head-head-knowledge-test-208589 Task Scenarios for Usability Testing. (2014). Retrieved March 11, 2018, from https://www.nngroup.com/articles/task-scenarios-usability-testing/ Wash, R., & Rader, E. (2011). Influencing mental models of security. In Proceedings of the 2011 63 workshop on New security paradigms workshop - NSPW ’11 (p. 57). https://doi.org/10.1145/2073276.2073283 Wingfield, N. (2018). 
Alexa, We’re Still Trying to Figure Out What to Do With You - The New York Times. Retrieved April 7, 2018, from https://www.nytimes.com/2018/01/15/technology/virtual-assistants-alexa.html 64