LARGE LANGUAGE MODEL (LLM)-BASED HEALTH COACHES IN VIRTUAL REALITY 
(VR): EFFECTS OF AI AGENTS’ NONVERBAL BEHAVIOR ON RAPPORT AND HEALTH 
OUTCOMES 

By 

Sue Lim 

A DISSERTATION 

Submitted to 
Michigan State University 
in partial fulfillment of the requirements   
for the degree of 

Communication – Doctor of Philosophy 

2025
 
ABSTRACT 

We used rapport theory as the framework to examine the effects of the nonverbal behaviors of

LLM-based embodied conversational agents (ECAs) in the context of health coaching. To conduct

this study, we built two types of LLM-based health coaches in virtual reality using the Unreal 

platform. The little rapport-building health coach displayed minimal nonverbal behavior during 

conversations (e.g., no direct eye contact, no upper body movement) while the more rapport-

building health coach displayed various rapport-building behaviors (e.g., smiling while listening, 

upper body movement while responding). Participants were randomly assigned to one of the two 

types of health coaches and completed six coaching sessions with the same health coach in 

immersive virtual reality (VR).  Findings showed that those who interacted with the more 

rapport-building health coach expressed greater attentiveness across all six sessions (measured 

via ratio of gaze on the health coach). Also, we found attentiveness and subjective rapport during 

the initial interactions (sessions 1 and 2) as the most promising predictors of human clients’ 

overall satisfaction with the intervention at the end of all six sessions. Finally, the results 

indicated that the more people interacted with the health coach, the more they felt they benefited 

from the sessions, with effects even more pronounced for those who interacted with the more 

rapport-building ECA. These findings have significant implications for communication research 

and practice. 

 
 
 
TABLE OF CONTENTS 

SECTION 1: INTRODUCTION

SECTION 2: BACKGROUND & CURRENT STUDY

SECTION 3: METHODS

SECTION 4: RESULTS

SECTION 5: DISCUSSION

SECTION 6: CONCLUSION

BIBLIOGRAPHY

APPENDIX A: PROMPTS FOR DEVELOPMENT

APPENDIX B: SELF-REPORT SURVEY MEASURES

APPENDIX C: ALL RESULTS FROM MIXED EFFECTS REGRESSION (RQ)

SECTION 1: INTRODUCTION 

“The construct of rapport is arguably one of the central, if not the central, construct 

necessary to understanding successful helping relationships and to explaining the 

development of personal relationships” (Cappella, 1990, pg. 303) 

Rapport is the key to successful human social interactions. Rapport refers to a harmonious 

relational dynamic that fosters open dialogue, a cooperative atmosphere, and a sense of mutual 

social connectedness, respect, and trust among its members (Bernieri, 1988; Gratch & Lucas, 

2021; Tickle-Degnen & Rosenthal, 1990; Xie & Derakhshan, 2021). Phrased differently, rapport 

represents the extent of interpersonal bond that exists among interacting agents. The process of 

building rapport is communicative in nature, occurring through the exchange of verbal and 

nonverbal cues during social interactions. Furthermore, rapport is strongly linked with 

interaction outcomes including student performance (teacher-student interaction; Bernieri, 1988;

Frisby & Martin, 2010; Estepp & Roberts, 2015), cooperation (e.g., investigative interviewing; 

Abbe & Brandon, 2013), positive attitudes and intentions to engage in a behavior (e.g., service 

provider-customer interaction; Fatima, 2023), and patient health and adherence to treatment 

(provider-patient interaction; Harrigan et al., 1985; Joe et al., 2001; Leach, 2005). 

In recent years, artificial intelligence (AI)-powered conversational agents (CAs) have 

entered many domains of human life, assuming relational roles such as coaches, companions, 

and customer service agents. CAs refer to machines that mimic human communication 

capabilities. Compared to earlier CAs that produced verbal responses based on users’ selection of 

pre-determined choices or provision of simple and straightforward phrases, large language model 

(LLM)-based CAs like OpenAI’s ChatGPT can process large amounts of user input, detect 

patterns, and generate responses that better resemble natural human dialogue. LLM-based CAs 

can also exhibit empathy (Kang & Hong, 2024; Sorin et al., 2024; Vowels et al., 2024), 

attentiveness (Jo et al., 2024), and other types of rapport-building behaviors while delivering 

personalized information. As a result, LLM-based CAs are increasingly developed and deployed 

to support people’s judgment, decision-making, and well-being (e.g., Alanezi, 2024; Choi et al., 

2024; Jörke et al., 2024; Lim et al., 2024b; Nie et al., 2024; Wu et al., 2024). 

Due to these advancements, human-AI communication is a growing subarea of 

communication research. Existing communication literature on human-AI interaction focuses 

heavily on verbal communication (i.e., the exchange of language-based cues through speech or 

text). However, nonverbal aspects of social interactions provide rich social information

(Burgoon & Saine, 1978; Burgoon et al., 1984; Patterson, 1982) that has significant implications

for rapport (Rapport Theory; Tickle-Degnen & Rosenthal, 1990) and relationship formation in

general. During dyadic interactions, for example, the partners’ body language can convey 

attentiveness (e.g., the body angled toward each other, the posture leaned forward) and express 

positivity (e.g., via head nods or facial smiles). Furthermore, coordinated behaviors such as 

biobehavioral synchrony are a window into relational dynamics like rapport and attachment 

(Feldman, 2017; Tickle-Degnen & Rosenthal, 1990). Studies have found that these behavioral 

dynamics are closely linked with higher perceived relationship quality and health outcomes of 

provider-patient interaction (Goldstein et al., 2020; Ramseyer & Tschacher, 2011). Thus, to fully 

understand human-AI communication processes and relational dynamics, we need to examine 

the effects of LLM-based CAs’ nonverbal rapport-building behaviors as they engage in natural, 

turn-taking dialogue with human clients.  

Methodologically, immersive virtual reality (VR) serves as an effective stimulation tool 

to study the mechanisms underlying human-AI interaction. Immersive VR encompasses three 

key features. First, effective VR technologies like head-mounted displays (HMDs) offer a high

degree of immersion by absorbing the users’ perceptual system and authentically stimulating their

senses so they feel they are physically in the virtual environment (Bente et al., 2023; Biocca &

Delaney, 1995). In addition, immersive VR allows for experimental control while maintaining a

higher level of ecological validity compared to traditional lab studies (Loomis et al., 1999). 

Finally, immersive VR can be integrated with physiological and behavioral measures like eye-

tracking to gather precise information about people’s interaction with the environment and 

stimuli in real-time. Researchers have already implemented immersive VR to examine a wide 

range of human social behaviors, including bystander responses to violence (Slater et al., 2013) 

and proxemics during human-agent interaction (Bailenson et al., 2003; Iachini et al., 2014).  

Our previous work demonstrated the feasibility of studying people’s interactions with AI-

powered health coaches within immersive VR platforms (Lim et al., 2024a; 2024b). Specifically, 

we leveraged OpenAI’s GPT-4 LLM to develop embodied conversational agents (ECAs) that

engaged in real-time, get-to-know-you and physical health-focused dialogue with basic eye 

contact in immersive VR (Lim_AI1). The results showed that interacting with the LLM-based 

ECAs in VR fostered people’s sense of immersion and presence (i.e., feeling of being in the 

same physical location and having emotional connections with the ECAs; Bente et al., 2023). We 

also found, through eye tracking, that participants tended to pay more attention to the AI health 

coach of the opposite gender during the health-focused conversations. Furthermore, with proper 

instructions, the LLM-based dialogue system exhibited empathic responses (e.g., “I understand 

what you mean”) and other verbal rapport-building behaviors. Overall, the study highlighted 

immersive VR as an effective and flexible platform to examine human-AI interaction in a wide 

range of contexts. 

Expanding on Lim et al. (2024b), this study examines how the AI health coach’s 

nonverbal behaviors influence rapport and the outcomes of human-AI interaction over time.

Specifically, participants completed six health coaching sessions with LLM-based ECAs that 

exhibited little vs. more nonverbal rapport-building behaviors (see Figure 1). We first analyzed 

the effects of the AI health coaches’ nonverbal behavior on the human clients’ attentiveness to 

and positivity toward the ECA during initial conversations using rapport theory (Tickle-Degnen 

& Rosenthal, 1990) as the framework, as well as on subjective rapport. Next, we examined 

whether the human clients’ expressions of rapport predicted interaction outcomes. Finally, we 

explored how the relational dynamic and the interaction outcomes developed over the multiple 

sessions. Our study begins to unpack the complex mechanisms underlying successful human-AI 

interaction and informs how to build effective LLM-based ECAs for health support.  

This paper is structured as follows. First, Section 2 summarizes relevant literature and

introduces our hypotheses and research questions. Section 3 details the steps for LLM-based

health coach development, experimental procedures, and statistical analyses. Finally, we

outline the results of the study in Section 4 and discuss their implications in Section 5. 

Figure 1. Conceptual Illustration of the Study. The study examined the effect of the LLM-based 

Health Coaches’ nonverbal rapport-building behavior on human clients’ behavioral expressions 

of rapport, subjective rapport, and interaction outcomes. 

SECTION 2: BACKGROUND & CURRENT STUDY 

Rapport has been widely studied in health-related support contexts such as therapy, 

counselling, coaching, and healthcare provider-patient communication (Tickle-Degnen & 

Rosenthal, 1990). Existing literature found that rapport is expressed in varying ways. First, 

individuals’ behaviors during social interactions can depict the extent of their rapport. Tickle-

Degnen and Rosenthal’s (1990) rapport theory posits that during dyadic interactions, interactants 

express rapport through three nonverbal categories: mutual attentiveness, positivity, and 

coordination. Mutual attentiveness refers to how much people engage with one another during 

the conversation. Positivity broadly captures mutual experience of positive affect such as 

friendliness and caring. Finally, dyads exhibit rapport through behavioral coordination - like 

partners dancing (Gratch & Lucas, 2021). Rapport theory suggests that the relative importance of 

the three categories differs by relationship stage: positivity is most important

during the early stages of rapport development, coordination is most important during the relationship

maintenance stage, and attentiveness is important throughout all stages.

While dyadic human-to-human interaction features mutual engagement, positivity, and 

coordination, human-AI communication differs because AI – at least at this point in time – does 

not have the full capacity to exhibit natural social behavior like humans. Some researchers have 

worked on building adaptive ECAs that use machine learning algorithms to mimic the human’s 

behavior during the interaction (e.g., Woo et al., 2024), but those systems show limitations. Still, 

existing literature also shows that ECAs conveying rapport-building nonverbal cues such as 

smiling can elicit similar behavior in human interactants and lead to rapport (Cassell & 

Thorisson, 1999; Krämer et al., 2013). This is because humans are innately social, and they 

naturally adjust their behavior in response to the other, even when interacting with strangers 

(Chartrand & Bargh, 1999; Feldman, 2017). Thus, this study focuses on how the nonverbal 

rapport-building behaviors of the LLM-based ECAs elicit human users’ attentiveness and 

positivity during the interactions¹ (see Figure 2 for the summary of our hypotheses and research

questions). 

Figure 2. Overview of the Current Study Design. The hypotheses focused on the initial 

interactions with the LLM-based health coach.  

Effect of Nonverbal Behavior and Rapport 

One of the major behavioral markers of attentiveness is gaze. Humans use gaze to 

communicate their level of interest and engagement when interacting with others. In fact, 

Kleinke’s (1986) survey of literature found that participants, especially those from Western 

cultures, use gaze and head orientation toward the other during social interactions as cues to 

make judgments about their attentiveness. Other studies demonstrated that patterns of gaze, such 

as making and breaking eye contact, during social interactions signaled overt attention, 

interpersonal closeness, and other relational cues (Guo et al., 2023; Hessels et al., 2017; Ho et 

al., 2015; Schmälzle et al., 2024; Wohltjen & Wheatley, 2021; 2024). As a result, human-computer

interaction researchers have also used gaze to examine people’s attention to conversational

agents (Amorese et al., 2022; Nakano & Ishii, 2010; Rehm & André, 2005; Robb et al., 2023).

Building on these studies, we make the following prediction:

H1: The nonverbal rapport-building behavior of the LLM-based embodied

conversational agent will increase human clients’ attentiveness (measured via gaze) to the

conversation.

¹ Since the interactions occur in the beginning stages of human-AI rapport-building, this study does not
examine coordination.

For positivity, the human clients’ paraverbal and verbal responses to the LLM-based 

ECA can serve as signals of positive affect during social interactions. Just as gaze communicates

interest and engagement, humans also express affect and other relational factors through the

voice (e.g., tone, pitch, speech rate). Studies found that combinations of vocal cues can

predict affective states (Gobl & Chasaide, 2003; Johnson et al., 1986; Scherer et al., 1973) as 

well as attributions of speaker characteristics (Apple et al., 1979; Scherer et al., 1973; Schroeder 

& Epley, 2015). As a result, human-computer interaction literature has used vocalics to examine 

people’s social responses to agents (Cerekovic et al., 2016; Elkins et al., 2012; Elkins & Derrick, 

2013; Nunamaker et al., 2011). In addition, linguistic markers in verbal behaviors also indicate 

sentiment (Clavel & Callejas, 2015; Lepper & Mergenthaler, 2007; Lubis et al., 2019; 

Mergenthaler, 2008; Ranjbartabar et al., 2019; Santos et al., 2020). Thus, we predict the

following relationships between LLM-based ECAs’ nonverbal rapport-building behavior and

human clients’ behavior:

H2a: The nonverbal rapport-building behavior of the LLM-based embodied 

conversational agent will increase human clients’ positivity (measured via sentiment analysis of 

the paraverbal responses) during the conversation. 

H2b: The nonverbal rapport-building behavior of the LLM-based embodied 

conversational agent will increase human clients’ positivity (measured via sentiment analysis of 

the verbal responses) during the conversation. 

Beyond the behavioral markers, human clients’ perceptions of the interaction and the 

connection with the LLM-based ECAs post interaction indicate the magnitude of rapport they

feel. Existing work in human-computer interaction has examined how the ECAs’ nonverbal

behaviors influence people’s experience of rapport (e.g., Gratch et al., 2006; Huang et al., 2011). 

Some studies found that the ECAs’ nonverbal rapport-building behaviors enhanced people’s 

feelings of rapport with the agent (Karacora et al., 2012; Wang & Gratch, 2009; Wang & Ruiz, 

2021). Thus, we predict the following: 

H3: The nonverbal rapport-building behavior of the LLM-based embodied 

conversational agent will increase human clients’ perception of rapport. 

Relationship Between Rapport and Interaction Outcomes 

As discussed in the introduction, rapport is an essential ingredient in healthcare and 

coaching settings. For example, clinical psychology and psychiatry studies showed that rapport 

(and other similar concepts like therapeutic or working alliance) boosted treatment effectiveness 

by predicting treatment adherence and symptom reduction (Cloitre et al., 2004; Frank & 

Gunderson, 1990; Joe et al., 2001; Krupnick et al., 2006). Similarly, Graßmann et al.’s (2020) 

meta-analysis of the coaching literature found that the coach-coachee working alliance – another 

word for rapport – significantly related to outcomes including satisfaction, perceived coaching 

effectiveness, self-efficacy, as well as knowledge acquisition. Other studies have also 

demonstrated the link between rapport with nonhuman agents and human performance (Karacora 

et al., 2012). We build upon these studies and predict the following: 

H4: Indicators of rapport will be positively associated with interaction outcomes. 

H5: Indicators of rapport will be positively associated with satisfaction with the coaching 

intervention. 

Longitudinal Effects of LLM-Based Health Coaches’ Nonverbal Behavior 

The hypotheses above predict the behaviors and outcomes of the initial

interactions (first two sessions). However, Tickle-Degnen and Rosenthal (1990) conceptualize

rapport as a dynamic process that occurs over time. Thus, we examine how the LLM-based

health coaches’ nonverbal rapport-building behaviors influence human clients’ expressions of

rapport (same variables from H1-3) and interaction outcomes over the six coaching sessions.

RQ: Do LLM-based health coaches’ nonverbal rapport-building behaviors influence

human clients’ expressions of rapport and interaction outcomes over time? 

SECTION 3: METHODS

Participants

The study was approved by the local institutional review board prior to data collection. The

sample for the main study comprised 30 participants (M_age = 30.27, SD_age = 11.64, 37%

self-identified White or European American, 60% self-identified Female)². We recruited the

participants through the local institution’s community participant pool. The inclusion criteria 

included being over the age of 18 and being able to speak and understand English. For 

compensation, participants received $40 in cash for completing all six sessions. 

Developing Lim_AI2: LLM-Based ECAs with Enhanced Rapport Building Capabilities 

The development stage of the LLM-based ECAs included building the prototype and 

conducting a pilot study with a smaller sample. To develop the prototype, we integrated multiple 

AI systems with the Unreal Game Engine through Convai, a conversational AI platform (Convai, 

2025). For the dialogue system that verbally builds rapport with people, we created the prompt 

based on motivational interviewing (MI) principles. MI conceptualizes motivation as a process 

rather than a trait (Miller, 1983) and uses a client-centered approach to help people want to 

change and commit to goals toward the change (Hettema et al., 2005). It is a widely used 

coaching style and has been shown to improve various physical and psychological conditions (Rubak

et al., 2005). As a result, researchers have implemented MI techniques in CA design (He et al., 

2022; Jörke et al., 2024; Kumar et al., 2024; Olafsson et al., 2020; Samrose & Hoque,

2022; Schulman et al., 2011; Smriti et al., 2022; Steenstra et al., 2024). We referenced the MI-

based prompt used in Steenstra et al. (2024) as the template for our health coach (see Appendix 

A for prompts we used).  

² The final sample excluded one participant because they did not complete all six sessions.

Integrating Dialogue System with Varying Nonverbal Rapport 

See Figure 3 for a summary of the technical configuration of the LLM-based ECA. We 

created a female avatar using the MetaHuman platform and imported it into Unreal Engine.

Then, we selected OpenAI’s GPT-4o as the LLM, Microsoft Azure’s Cora Female Multilingual

Voice³ as the text-to-speech (TTS) model, and Convai’s speech-to-text (STT) capabilities to

power the avatar. For the pilot version, we created a simple and empty office environment in

Unreal and placed the female avatar in a seated position behind a desk. Finally, we equipped the

female avatar with two levels of nonverbal rapport-building behaviors: Little vs. More. Both

levels exhibited basic lip sync and eye blinking.

However, they differed in the following ways: 

1.  Gaze: More made eye contact with the human clients vs. Little looked slightly downward

and did not follow the human clients’ movements. 

2.  Listening Behavior: More smiled and showed slight upper body movement vs. Little did 

not have any facial expressions and sat still with no upper body movement while the 

human client spoke. 

3.  Talking Behavior: More used upper body gestures vs. Little sat still with arms placed to 

the side while talking.  
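
The two behavior levels listed above can be summarized as a small configuration object. The following Python sketch is only an illustration of the contrast; the class name, field names, and the `differing_fields` helper are hypothetical and do not correspond to the Unreal or Convai settings used in the study.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NonverbalProfile:
    """Hypothetical summary of one nonverbal rapport-building level."""
    name: str
    eye_contact: bool                    # item 1: gaze
    smiles_while_listening: bool         # item 2: listening behavior
    body_movement_while_listening: bool  # item 2: listening behavior
    gestures_while_talking: bool         # item 3: talking behavior
    lip_sync: bool = True                # shared by both levels
    eye_blink: bool = True               # shared by both levels


LITTLE = NonverbalProfile("little", False, False, False, False)
MORE = NonverbalProfile("more", True, True, True, True)


def differing_fields(a: NonverbalProfile, b: NonverbalProfile) -> list:
    """Return the behavior fields (excluding the label) on which two levels differ."""
    return [
        f.name
        for f in fields(a)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    ]


print(differing_fields(LITTLE, MORE))
```

Keeping the shared behaviors (lip sync, eye blink) as defaults makes explicit that only gaze, listening, and talking behaviors were manipulated.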

³ For the pilot, we originally used OpenAI’s Nova voice as the text-to-speech. However, due to technical issues and
latency, we switched the voice to Microsoft Azure’s Cora female voice midway through the pilot.

Figure 3. Illustration of Human-Lim_AI2 Interaction in Immersive VR. First, the participant

enters the virtual environment wearing the Meta Quest Pro headset and speaks to the LLM-

based health coach (named Dr. Lauren Smith). The participant’s words are converted into text 

via the Convai system, and the converted text is then inputted as the prompt into ChatGPT. The 

response generated by ChatGPT is further processed through Microsoft Azure’s text-to-speech 

(TTS) system and inputted into the LLM-based health coach. Finally, the LLM-based health 

coach responds to the participants with either little or more nonverbal rapport-building

behaviors, depending on the random assignment at the beginning of the study.  
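
The flow in Figure 3 can be read as one conversational turn passing through three services in sequence. The Python sketch below mirrors that order under stated assumptions: the function `coaching_turn` and its three callable parameters are hypothetical stand-ins, not the Convai, OpenAI, or Azure APIs, and the stubs at the bottom only echo text.

```python
from typing import Callable, Dict


def coaching_turn(
    audio_in: bytes,
    speech_to_text: Callable[[bytes], str],   # stand-in for the STT step
    generate_reply: Callable[[str], str],     # stand-in for the LLM step
    text_to_speech: Callable[[str], bytes],   # stand-in for the TTS step
    condition: str,
) -> Dict[str, object]:
    """Run one participant turn through STT, the LLM, and TTS, in that order."""
    if condition not in ("little", "more"):
        raise ValueError("condition must be 'little' or 'more'")
    user_text = speech_to_text(audio_in)
    reply_text = generate_reply(user_text)
    reply_audio = text_to_speech(reply_text)
    return {
        "user_text": user_text,
        "reply_text": reply_text,
        "reply_audio": reply_audio,
        "nonverbal_condition": condition,  # selects which behavior set the avatar plays
    }


# Stub services for illustration only; they do not call any real system.
result = coaching_turn(
    b"<participant audio>",
    speech_to_text=lambda audio: "I want to sleep better.",
    generate_reply=lambda text: f"You said: {text}",
    text_to_speech=lambda text: text.encode("utf-8"),
    condition="more",
)
print(result["reply_text"])
```

Passing the three services in as callables keeps the turn logic independent of any particular vendor.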

After we developed the prototypes of the LLM-based ECAs, we conducted a pilot 

evaluation to ensure the participants noticed the nonverbal rapport-building behavior 

manipulations and to make any other modifications as needed. A total of 16 people participated in

the pilot evaluation (M_age = 26.13, SD_age = 9.37, 38% self-identified White or European

American, 63% self-identified Female). We recruited the participants through the local 

institution’s community participant pool and word-of-mouth. The participants received the same 

cash compensation as the main study participants for completing all six sessions. The pilot 

showed that the LLM-based ECAs’ seated position sometimes obscured the varying nonverbal behaviors,

and the participants’ seated position prevented active nonverbal expression. Furthermore,

the simple office environment and the seated position established a task-focused environment 

that sometimes appeared rigid. Applying the findings from the pilot study, we made the 

following modifications to the main study version of the LLM-based health coach in VR. First, 

we changed the environment from a simple office to a gym by importing the gym asset from the 

Unreal Engine’s Fab marketplace (Big-G, 2024). Second, the sessions took place standing

rather than seated to enhance the interaction (see Figure 4).

Figure 4. Pilot and Main Study Versions of Human-AI interaction. For the main study, the 

environment changed from an empty office to a gym, with the LLM-based health coach standing 

instead of sitting. 

Main Study Experimental Conditions and Procedures 

See Figure 5 for the illustration of study procedures.  

Figure 5. Overview of the Study Design. We randomly assigned the participants to one of three 

LLM-based health coaches, varying in the display of nonverbal rapport-building behavior (little 

vs. more vs. text-based chatbot as baseline comparison). Then, they interacted with the health 

coach for six sessions (two sessions per visit to the lab). Participants completed a reflection task 

after each session and the post-task survey at the end of each visit. After the completion of 

session 6, the participants were asked to provide a testimonial about their experience. 

All 30 participants came into the lab three times and completed two coaching sessions 

during each visit. The study had three experimental conditions. In addition to the LLM-based 

ECAs with little vs. more nonverbal rapport-building behaviors, we created a purely text-based 

LLM-based health coach using the OpenAI GPT platform for baseline comparison. We used the 

same instructions and knowledge base to create the LLM-based health coach. To remove as 

many embodiment cues as possible, we used an image of a leaf as the icon. During the main 

study, participants were randomly assigned to one of the three LLM-based health coaches (10 

participants for each group) and completed the six sessions with the same health coach.  
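
The text above reports equal groups of 10 but does not specify the randomization procedure. One common way to obtain equal group sizes is to shuffle a list of condition slots; the Python sketch below illustrates that approach as an assumption, with hypothetical condition labels and a hypothetical `assign_balanced` helper.

```python
import random
from collections import Counter

# Hypothetical labels for the three health coach conditions.
CONDITIONS = ("little", "more", "text")


def assign_balanced(participant_ids, conditions=CONDITIONS, seed=None):
    """Randomly assign participants to conditions with equal group sizes."""
    ids = list(participant_ids)
    if len(ids) % len(conditions) != 0:
        raise ValueError("number of participants must be divisible by number of conditions")
    rng = random.Random(seed)
    # One slot per participant, repeated evenly across conditions, then shuffled.
    slots = list(conditions) * (len(ids) // len(conditions))
    rng.shuffle(slots)
    return dict(zip(ids, slots))


assignment = assign_balanced(range(1, 31), seed=2025)
print(Counter(assignment.values()))
```

Fixing the seed makes the allocation reproducible for auditing; each participant keeps the assigned condition for all six sessions.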

Those assigned to the LLM-based ECA completed the following experimental procedures.

On day 1, participants came into the lab, consented to the study, and completed a brief

pre-survey that included questions about demographics and health consciousness. Then we provided

specific instructions for the first session with the health coach and asked the participants to write

down specific health goals or topics to discuss. Next, we helped the participants put on the Meta

Quest Pro headset and calibrated the eye-tracking device. After we calibrated the eye-tracking

system, the participants completed a demo task by entering the virtual gym and talking with a 

receptionist without any nonverbal rapport-building behaviors. This demo task familiarized the 

participants with the technical aspects of the interactions.  

After the demo task, participants engaged in the first session task, which involved 

discussing their health goals with the health coach for 5 minutes (session 1). We then instructed

the participants to journal about the content of the session (e.g., What did you talk about? What

was helpful? What could have been better?) and the goals they wanted to discuss with the health coach

during the next session (reflection task). For session 2, we calibrated the headset again, and the 

participants completed the second session task and reflection task. Finally, participants 

participated in post-session tasks (a brief interview and a post-session questionnaire about their 

experience).  

On days 2 and 3, participants first responded to a brief pre-session questionnaire that 

asked how much they worked on the goals they previously discussed with the health coach. Then 

we repeated the headset calibration routine, session tasks (sessions 3 and 4 on day 2 and sessions 

5 and 6 on day 3), and the reflection tasks from day 1. At the end of the last session (session 6), 

we asked the participants to provide a brief verbal testimonial about their experience. Finally, we 

fully debriefed the participants about the purpose of the study. Those assigned to the text-based 

LLM-based health coach followed the same procedures as above without the calibration routine 

and the demo task.  

Measures and Data Analysis  

See Appendix B for the survey measures we used. 

Measures of Human Clients’ Attentiveness and Positivity 

Attentiveness. We collected gaze information from the Meta Quest Pro headset through

the Unreal Engine’s OpenXR plugin (see Figure 6). Specifically, we tracked which objects in the 

environment people looked at during each session and calculated the ratio of gaze on the

LLM-based ECA (ratio = number of gaze detections on the health coach / total number of

gaze detections on any object).
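
As a minimal illustration of the ratio defined above, the following Python sketch counts gaze detections per object and divides the count for the health coach by the total. The object labels and the `gaze_ratio` function name are hypothetical; the actual log format exported from Unreal Engine is not described here.

```python
from collections import Counter
from typing import Iterable


def gaze_ratio(hits: Iterable[str], target: str = "health_coach") -> float:
    """Share of gaze detections that landed on `target`.

    `hits` holds one object label per gaze detection in a session.
    Returns NaN when no gaze on any object was detected.
    """
    counts = Counter(hits)
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    return counts[target] / total


# Illustrative session log (made-up labels, not study data).
session_hits = ["health_coach", "treadmill", "health_coach", "floor", "health_coach"]
print(gaze_ratio(session_hits))  # 3 of 5 detections on the coach -> 0.6
```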

Positivity. In addition to the tracking data from the headset, we also collected audio 

recordings and transcripts of the conversations. For paraverbal positivity, we first diarized – or 

split – the audio recordings from each session by speaker turn using Hugging Face’s 

pyannote.audio Python package (Bredin, 2023; Plaquet & Bredin, 2023). Then we conducted 

speech emotion recognition for the human clients’ diarized audio files using audEERING’s 

open-source AI models (Wagner et al., 2023). This process assigned valence scores from 0 

(negative) to 1 (positive) to each participant turn. Finally, the valence scores were averaged for 

each participant. 
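
The final aggregation step can be sketched as follows, assuming diarization and speech emotion recognition have already produced one valence score per speaker turn. The `Turn` structure, the speaker labels, and the example values are hypothetical; the sketch does not call pyannote.audio or the audEERING models.

```python
from statistics import mean
from typing import List, NamedTuple, Optional


class Turn(NamedTuple):
    speaker: str    # diarization label, e.g. "participant" or "coach"
    start: float    # turn onset in seconds
    end: float      # turn offset in seconds
    valence: float  # speech emotion recognition output, 0 (negative) to 1 (positive)


def participant_valence(turns: List[Turn], label: str = "participant") -> Optional[float]:
    """Average valence over the participant's turns; None if there are none."""
    scores = [t.valence for t in turns if t.speaker == label]
    return mean(scores) if scores else None


# Made-up turns for illustration only.
turns = [
    Turn("coach", 0.0, 4.2, 0.60),
    Turn("participant", 4.2, 9.0, 0.25),
    Turn("coach", 9.0, 12.5, 0.55),
    Turn("participant", 12.5, 20.0, 0.75),
]
print(participant_valence(turns))  # mean of 0.25 and 0.75 -> 0.5
```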

For verbal positivity, we followed a similar process using the conversation transcripts 

from each session. First, we split speaker turns from each session transcript and assigned valence 

scores (from -1 to 1) to the participants’ responses using natural language processing (VADER 

Sentiment Analysis Tool; Hutto & Gilbert, 2014). We specifically selected VADER because

it calculates continuous valence scores rather than simply classifying the valence as negative,

neutral, or positive. Next, we averaged the valence scores of the responses for each session. 
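
The verbal positivity steps (split turns, score participant turns, average per session) can be sketched as below. This is an illustration under assumptions: the "Speaker: text" transcript layout is hypothetical, and `toy_score` is a deliberately simple stand-in lexicon scorer, not VADER.

```python
import re
from statistics import mean
from typing import Callable, List, Optional, Tuple

TURN_PATTERN = re.compile(r"^(?P<speaker>[^:]+):\s*(?P<text>.+)$")


def split_turns(transcript: str) -> List[Tuple[str, str]]:
    """Split a 'Speaker: text' transcript (assumed layout) into (speaker, text) turns."""
    turns = []
    for line in transcript.splitlines():
        match = TURN_PATTERN.match(line.strip())
        if match:
            turns.append((match.group("speaker"), match.group("text")))
    return turns


def toy_score(text: str) -> float:
    """Stand-in valence scorer in [-1, 1]; NOT VADER, for illustration only."""
    positive = {"good", "great", "helpful"}
    negative = {"bad", "tired", "hard"}
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    raw = sum((w in positive) - (w in negative) for w in words)
    return max(-1.0, min(1.0, raw / len(words)))


def session_valence(
    transcript: str,
    score: Callable[[str], float] = toy_score,
    speaker: str = "Participant",
) -> Optional[float]:
    """Average the valence of one speaker's turns within a session transcript."""
    scores = [score(text) for who, text in split_turns(transcript) if who == speaker]
    return mean(scores) if scores else None


example = "Coach: How did your week go?\nParticipant: good\nParticipant: bad"
print(session_valence(example))  # mean of 1.0 and -1.0 -> 0.0
```

With the `vaderSentiment` package installed, the `score` argument could instead wrap `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, which also ranges from -1 to 1.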

Figure 6. Technological Setup to Study Human Clients’ Nonverbal Rapport-Building Behavior. 

Measures of Subjective Rapport and Interaction Outcomes 

Human Clients’ Subjective Rapport. The way in which rapport has been measured via 

self-report varies somewhat across studies and researchers. However, across different studies, 

items used to indicate rapport cluster around concepts such as social harmony, warmth, 

coordination, and cooperation. Therefore, we adopted perceptions of the interaction ratings from 

Lim et al. (2024a) and perceived social presence scale from Bente et al. (2023) to represent 

various aspects of rapport, as defined in the introduction. For the perceptions of the interaction 

ratings, participants evaluated the following statements from 1 (strongly disagree) to 7 (strongly 

agree): “We had good rapport,” “The interaction was harmonious,” “The interaction was

cooperative,” “The interaction was coordinated,” “The interaction was warm,” and “The 

interaction was friendly.” We averaged the ratings from these six items to form the experienced 

rapport measure and found good reliability across all conditions and sessions (Cronbach’s alpha 

= .83; M = 5.40; SD = .92). 
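For readers unfamiliar with the reliability coefficient reported above, the sketch below computes Cronbach's alpha from a respondents-by-items matrix using the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The function name and the ratings are hypothetical and serve only to illustrate the calculation; they are not the study's data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(ratings: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of ratings."""
    n_items = len(ratings[0])
    if len(ratings) < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every respondent must rate every item")
    # Sum of the sample variances of each item (column).
    item_variance_sum = sum(variance(column) for column in zip(*ratings))
    # Sample variance of each respondent's total score (row sum).
    total_variance = variance([sum(row) for row in ratings])
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: 4 respondents x 6 items on the 1-7 agreement scale.
toy_ratings = [
    [6, 6, 5, 6, 7, 6],
    [4, 5, 4, 4, 5, 5],
    [7, 6, 6, 7, 7, 7],
    [5, 5, 5, 4, 6, 5],
]
print(round(cronbach_alpha(toy_ratings), 2))
```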

We also adopted the perceived social presence scale to examine how people experienced 

their encounter with the LLM-based ECA through VR. Human-computer interaction studies have 

generally considered perceived rapport and social presence as separate concepts (e.g., von der 

Pütten et al., 2012). However, in contrast to the current study, those studies featured limited 

immersion and turn-taking interactions. Since this study examines human-AI interaction in 

immersive environments, we use social presence to signal a facet of rapport-building in this 

specific context. The scale asked participants to rate six statements from 1 (Strongly disagree) to 

5 (Strongly agree): “I was attentive to the [body language/language]⁴ of the [avatars/health

coach],” “I had the sensation that the [avatar/health coach] could also see me,” “It felt as if I

could interact with the [avatar/health coach],” “I was aware of the [avatars’/health coach’s] 

moods,” “I could feel what the [avatars/health coach] felt,” and “The [avatars in the virtual 

environments were/health coach was] engaging.” The six items exhibited good reliability across 

all conditions and sessions (Cronbach’s alpha = .83; M = 2.93; SD = .96). 

Interaction Outcomes. We measured two types of outcomes: health-related outcomes 

and session evaluations. Health-related outcomes included behavioral intentions to work on goals 

discussed with the health coach (Schwarzer, 2008), response efficacy (Witte, 1994), and self-

efficacy (Bandura, 1982), all rated on a 5-point Likert scale (1 – Strongly Disagree, 5 – Strongly 

Agree). For behavioral intentions, we first asked participants, “What is one specific goal you

discussed with the AI health coach?” Then participants rated two items: “I intend to work on the

goal I wrote above within the next [month/two weeks].” The two items exhibited good reliability

(Cronbach’s alpha = .86; M = 4.27; SD = .63).

4 The language of the scale differed slightly depending on the condition. Those interacting with the LLM-based 
ECA saw the first part of the bracket (e.g., “I was attentive to the body language of the avatars”) while the rest saw 
the second part of the bracket (e.g., “I was attentive to the language of the health coach”).

The modified response efficacy scale comprised three statements: “The health coach’s

recommendations work in helping me reach my health goal,” “Following the health coach’s 

recommendations is effective in helping me reach my health goal,” and “If I follow the health 

coach’s recommendations, I am more likely to reach my health goal.” Finally, the modified self-

efficacy scale asked participants to rate three sentences: “I am able to follow the health coach’s 

recommendations to reach my health goal,” “The health coach’s recommendations are easy to 

follow to reach my health goal,” and “Following the health coach’s recommendations is 

convenient.” Both scales exhibited good to high reliability (response efficacy: Cronbach’s alpha = .88,

M = 4.21, SD = .55; self-efficacy: Cronbach’s alpha = .91, M = 4.13, SD = .73).

To measure the outcomes of the coaching sessions, we adopted and modified the Session 

Impacts Scale (SIS; Elliott & Wexler, 1994; Stiles et al., 1994). This scale has been widely used 

to study the effectiveness of health interventions (e.g., Ackerman et al., 2010; Lingiardi et al., 

2017; Simpson & Reid, 2014). For this measure, participants rated how much they agreed with 9 

statements about the sessions’ helpfulness (e.g., “I now have new insight about myself or have

understood something new about me,” “I now feel supported, reassured, confirmed, or 

encouraged by the health coach”) and 6 statements about the hindering impact (e.g., “I feel the health

coach is cold, bored, or doesn't care about me”) from 1 (Not at all) to 5 (Very much)⁵. Both

subscales exhibited good to high reliability (helpfulness: Cronbach’s alpha = .94, M = 3.34,

SD = 1.19; hindering: Cronbach’s alpha = .79, M = 1.48, SD = .80).

5 We excluded one item, “As a result of this session, I now feel relief from uncomfortable or painful feelings,” from 
the original scale because the sessions were not specifically focused on therapy.

Overall Satisfaction with the Intervention. Finally, we measured satisfaction in two 

ways. First, we asked the participants how likely they were to recommend the health coach to

others, on a scale from 1 (would never recommend to anyone) to 10 (would recommend to 

everyone now). Second, we recorded a brief testimonial from each participant about their 

experience and extracted the verbal and paraverbal valence scores using the VADER Sentiment

Analysis Tool and audEERING’s AI models (see the Positivity subsection above).  

Data Analysis 

We used R and Python for data cleaning and analyses. To test hypothesis 1 (H1), we first 

averaged the ratio of gaze on the health coach calculated for sessions 1 and 2 for each participant.

Next, we fitted a beta regression model (R betareg package; Cribari-Neto & Zeileis, 2010) to 

examine whether the health coach’s nonverbal rapport-building behavior influenced people’s 

gaze toward the health coach. For H2, we averaged the valence scores of the responses for 

sessions 1 and 2 and used an independent-samples t-test to test whether the health coach’s nonverbal

rapport-building behaviors increased the positive sentiment detected in participants’ paraverbal

and verbal responses. We also used an independent-samples t-test to examine the influence

of the health coach’s nonverbal rapport-building behavior on the participants’ subjective rapport

(H3). The last analysis method involved fitting multiple regression models⁶ to explore whether

the participants’ attentiveness, positivity, and subjective rapport can predict interaction outcomes 

and overall intervention satisfaction (H4 and H5).  
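To make the group comparison concrete, the sketch below computes an independent-samples t statistic from summary statistics. It assumes the pooled-variance (Student's) form of the test, which the text does not specify, and it assumes 10 participants per condition; both are assumptions made for illustration. The means and standard deviations are the experienced-rapport values reported in Table 2, and the function name is hypothetical.

```python
from math import sqrt
from typing import Tuple


def pooled_t_from_summary(
    mean_a: float, sd_a: float, n_a: int,
    mean_b: float, sd_b: float, n_b: int,
) -> Tuple[float, int]:
    """Return Student's t (group B minus group A) and its degrees of freedom."""
    df = n_a + n_b - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / standard_error, df


# Experienced rapport from Table 2 (little: M = 5.6, SD = .87; more: M = 5.07, SD = .61).
# ASSUMPTION: 10 participants per condition; exact group sizes are not stated here.
t_stat, df = pooled_t_from_summary(5.60, 0.87, 10, 5.07, 0.61, 10)
print(f"t({df}) = {t_stat:.2f}")
```

Under these assumed group sizes the statistic comes out at about -1.58, close to the tabled value of -1.59; the small gap is consistent with rounding in the reported means and standard deviations.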

To explore the RQ about how rapport builds over time, we first averaged the gaze, verbal

valence, and paraverbal valence data for sessions 3 and 4, and sessions 5 and 6. Then we fitted

mixed effects models⁷ for each rapport dimension (attentiveness, positivity, and subjective

rapport) and interaction outcomes. The LLM-based ECA’s nonverbal rapport-building behaviors

(little vs. more), the visit to the lab (visit 1: sessions 1 & 2; visit 2: sessions 3 & 4; visit 3:

sessions 5 & 6), and the interaction of the two variables were the main predictors. We allowed

the intercepts to vary by participant to account for the repeated-measures design.

6 A beta regression model was fitted to examine how attentiveness, positivity, and subjective rapport predicted 
people’s paraverbal expression of satisfaction with the intervention. Multiple linear regression models were fitted for 
all other dependent variables.

7 We fit mixed effects beta regression models (glmmTMB R package; Brooks et al., 2017) for gaze and paraverbal 
positivity since they are bounded variables with values between 0 and 1. Linear mixed effects models (lme4 R 
package; Bates et al., 2010) were fitted for all other dependent variables.
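The session-to-visit aggregation that precedes the mixed effects models can be sketched as follows. This is an illustrative reshaping step only, not the study's code and not a model fit: the row layout, identifiers, and gaze values are hypothetical. The output has one row per participant and visit, which is the long format a random-intercept model by participant would typically take as input.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple, Union

Row = Tuple[str, str, int, float]  # (participant, condition, session, value)


def visit_of(session: int) -> int:
    """Map sessions 1-2, 3-4, and 5-6 to lab visits 1, 2, and 3."""
    if not 1 <= session <= 6:
        raise ValueError("session must be between 1 and 6")
    return (session + 1) // 2


def to_visit_level(rows: List[Row]) -> List[Dict[str, Union[str, int, float]]]:
    """Average session-level values within each participant and visit."""
    grouped = defaultdict(list)
    for participant, condition, session, value in rows:
        grouped[(participant, condition, visit_of(session))].append(value)
    return [
        {"participant": p, "condition": c, "visit": v, "value": mean(values)}
        for (p, c, v), values in sorted(grouped.items())
    ]


# Hypothetical gaze ratios for two participants (values are illustrative only).
toy_rows = [
    ("P01", "more", 1, 0.70), ("P01", "more", 2, 0.80),
    ("P01", "more", 3, 0.75), ("P01", "more", 4, 0.85),
    ("P02", "little", 1, 0.25), ("P02", "little", 2, 0.35),
]
for record in to_visit_level(toy_rows):
    print(record)
```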

SECTION 4: RESULTS 

H1-3: Effect of LLM-Based Health Coaches’ Nonverbal Behavior on Rapport 

Hypotheses 1-3 examined the effects of LLM-based ECAs’ nonverbal rapport-building 

behaviors on human clients’ expression of rapport: attentiveness, positivity, and subjective 

rapport (see Figure 7). First, people who interacted with the LLM-based ECA with more 

nonverbal rapport-building behaviors gazed at the health coach more than those who interacted 

with the little rapport-building LLM-based ECA (more: M = .77, SD = .11; little: M = .29,

SD = .15; χ²(1) = 58.03, p < .001; see Table 1). This suggested that the LLM-based ECAs’

nonverbal rapport-building behaviors increase people’s attentiveness to the ECA, supporting

H1.  

Table 1. Gaze on Health Coach by the Health Coaches’ Nonverbal Rapport-Building Behavior

                                              Estimate     S.E.        z  p-value
Intercept                                         -.90      .18    -4.91    <.001
More Rapport-Building Behavior (vs. Little)       2.06      .27     7.62    <.001

Note. S.E. = Standard Error
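The coefficients in Table 1 can be translated back into predicted gaze proportions. The sketch below assumes that the beta regression used a logit link for the mean (the default in the betareg package, though the link is not stated in the text), so the inverse logit of the linear predictor gives the predicted proportion for each condition. Variable names are hypothetical.

```python
from math import exp


def inv_logit(x: float) -> float:
    """Map a value on the logit scale back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


# Coefficients as reported in Table 1 (ASSUMPTION: estimates are on the logit scale).
intercept = -0.90       # little rapport-building condition (reference level)
more_vs_little = 2.06   # shift for the more rapport-building condition

predicted_little = inv_logit(intercept)
predicted_more = inv_logit(intercept + more_vs_little)
print(round(predicted_little, 2), round(predicted_more, 2))
```

Under this assumption the back-transformed values are roughly .29 and .76, in line with the observed group means of .29 and .77.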

However, the results from the independent samples t-test showed that LLM-based health 

coaches’ nonverbal rapport-building behavior did not lead to significant differences in human 

clients’ paraverbal and verbal positivity and subjective rapport (see Table 2). Thus, H2a, H2b, 

and H3 were not supported.  

Figure 7. Effect of the LLM-Based ECAs’ Nonverbal Rapport-Building Behaviors on the 

Human Clients’ Attentiveness, Positivity, and Subjective Rapport. 

Table 2. Positivity and Subjective Rapport by ECAs’ Nonverbal Rapport-Building Behavior

                                  Little           More
                                Mean      SD    Mean      SD        t   p-value
Positivity
  Verbal Positivity              .38     .08     .32     .14    -1.13       .28
  Paraverbal Positivity          .51     .04     .52     .09      .34       .74
Subjective Rapport
  Experienced Rapport            5.6     .87    5.07     .61    -1.59       .13
  Social Presence               2.95    1.18    2.93     .64     -.04       .97

Note. SD = Standard Deviation

H4-5: Effect of Human Clients’ Expressions of Rapport on Outcomes 

Next, H4 focused on the relationship between human clients’ expressions of rapport and 

interaction outcomes, including evaluation of the sessions and health-related factors. First, we

found that feeling hindered by the session was significantly predicted by human clients’ 

experienced rapport (Estimate = -.21, SE = .09, p = .039; see Table 3). In other words, the more 

rapport human clients perceived during the interaction, the less likely they were to feel hindered 

by the conversations with the health coach. We found similar results for feeling helped by the 

sessions, though this association was only directional and did not reach significance (Estimate = .70, SE = .35, p = .061).

Table 3. Influence of Rapport on Evaluation of Coaching Sessions

                                         Estimate     S.E.        t  p-value
Session Impacts: Hindering
  Intercept                                  2.45      .58     4.24    <.001
  Gaze on Health Coach                        .05      .28      .17      .87
  Positivity (Verbal Positivity)             -.01      .83     -.01      .99
  Positivity (Paraverbal Positivity)         -.48     1.52     -.32      .76
  Subjective Rapport (Social Presence)        .07      .08      .90      .38
  Subjective Rapport (Experienced)           -.21      .09    -2.27      .04
Session Impacts: Helpfulness
  Intercept                                   .26     2.12      .12      .91
  Gaze on Health Coach                       -.39     1.02     -.38      .71
  Positivity (Verbal Positivity)             -.12     3.03     -.04      .97
  Positivity (Paraverbal Positivity)        -2.86     5.58     -.51      .62
  Subjective Rapport (Social Presence)        .21      .29      .74      .47
  Subjective Rapport (Experienced)            .70      .35     2.03      .06

Note. S.E. = Standard Error

For health-related outcomes, none of the expressed rapport variables (attentiveness, 

positivity, and subjective rapport) significantly predicted the health-related factors (self-efficacy, 

response efficacy, and behavioral intentions). However, we found tentative evidence that human 

clients’ subjective rapport directionally influenced their efficacy levels (see Table 4). 

Experienced rapport during the interaction was slightly associated with greater self-efficacy 

(Estimate = .58, SE = .31, p = .082) while perceived social presence was directionally correlated 

with response efficacy (Estimate = .31, SE = .15, p = .062).  

Finally, H5 examined the relationship between human clients’ expressed rapport and 

overall satisfaction with the intervention (completion of all six sessions with the health coach). 

Our results partially supported H5 (see Table 5). We found that the more people experienced 

rapport during the interaction, the more likely they were to recommend the LLM-based ECA to

others (Estimate = 1.59, SE = .59, p = .017). Furthermore, the ratio of gaze on the health coach 

significantly predicted people’s paraverbal expression of satisfaction with the intervention 

overall (Estimate = .95, SE = .34, p = .005). 

Table 4. Influence of Rapport on Health-Related Outcomes

                                         Estimate     S.E.        t  p-value
Self-Efficacy
  Intercept                                  1.16     1.91      .61      .55
  Gaze on Health Coach                        .65      .91      .72      .49
  Positivity (Verbal Positivity)             1.26     2.72      .46      .65
  Positivity (Paraverbal Positivity)         -.89     5.01     -.18      .86
  Subjective Rapport (Social Presence)       -.15      .26     -.58      .57
  Subjective Rapport (Experienced)            .58      .31     1.88      .08
Response Efficacy
  Intercept                                  3.08     1.15     2.68      .02
  Gaze on Health Coach                       -.28      .55     -.51      .62
  Positivity (Verbal Positivity)             -.15     1.64     -.09      .93
  Positivity (Paraverbal Positivity)         1.24     3.02      .41      .69
  Subjective Rapport (Social Presence)        .31      .15     2.03      .06
  Subjective Rapport (Experienced)           -.06      .19     -.30      .77
Behavioral Intentions
  Intercept                                  2.49     1.54     1.62      .13
  Gaze on Health Coach                       -.14      .74     -.19      .86
  Positivity (Verbal Positivity)              .51     2.21      .23      .82
  Positivity (Paraverbal Positivity)         2.25     4.06      .55      .59
  Subjective Rapport (Social Presence)        .19      .21      .93      .37
  Subjective Rapport (Experienced)            .01      .25      .04      .97

Note. S.E. = Standard Error

Table 5. Influence of Rapport on Overall Satisfaction with the Intervention

                                         Estimate     S.E.   Statistic  p-value
Likelihood to Recommend
  Intercept                                 -2.72     3.62    t = -.75      .47
  Gaze on Health Coach                        .18     1.73     t = .10      .92
  Positivity (Verbal Positivity)            -3.04     5.17    t = -.59      .57
  Positivity (Paraverbal Positivity)         1.96     9.51     t = .21      .84
  Subjective Rapport (Social Presence)        .54      .49    t = 1.12      .28
  Subjective Rapport (Experienced)           1.59      .59    t = 2.70      .02
Paraverbal Expression of Satisfaction
  Intercept                                 -2.32      .73   z = -3.19     .001
  Gaze on Health Coach                        .95      .34    z = 2.80     .005
  Positivity (Verbal Positivity)             1.27     1.02    z = 1.24      .21
  Positivity (Paraverbal Positivity)         2.16     1.88    z = 1.15      .25
  Subjective Rapport (Social Presence)       -.09      .10    z = -.95      .34
  Subjective Rapport (Experienced)            .14      .12    z = 1.24      .21
Verbal Expression of Satisfaction
  Intercept                                   .42      .28    t = 1.54      .15
  Gaze on Health Coach                       -.12      .13    t = -.92      .38
  Positivity (Verbal Positivity)             -.06      .39    t = -.15      .88
  Positivity (Paraverbal Positivity)         -.35      .72    t = -.49      .63
  Subjective Rapport (Social Presence)        .06      .04    t = 1.56      .14
  Subjective Rapport (Experienced)            .01      .04     t = .23      .82

Note. S.E. = Standard Error

RQ: Longitudinal Effects of LLM-Based Health Coaches’ Nonverbal Behavior 

We also explored how the health coaches’ nonverbal rapport-building behaviors 

influenced human clients’ expressions of rapport as well as the interaction outcomes over time 

(see Figure 8). For attentiveness, we found a significant main effect of the LLM-based ECA’s

nonverbal rapport-building behavior (χ²(1) = 60.81, p < .001; see Table 6). This indicated that

human clients who interacted with the more rapport-building health coach expressed greater 

attentiveness during the interaction compared to those who interacted with the little rapport-

building health coach across the sessions.  

Figure 8. Effect of the LLM-Based ECAs’ Nonverbal Behaviors on Attentiveness, Positive 

Evaluation of the Session, and Perceived Rapport Over Time. Visit 1 includes Sessions 1 and 2; 

Visit 2 includes Sessions 3 and 4; Visit 3 includes Sessions 5 and 6. The gray lines indicate the

results from the Chatbot condition and are used as a baseline visual comparison.

Table 6. Effect of the LLM-Based ECAs’ Nonverbal Behaviors on Attentiveness Over Time
(Analysis of Deviance, Type III Test)

                                                   χ²    df   p-value
Intercept                                       23.46     1     <.001
Nonverbal Rapport-Building Behavior             60.81     1     <.001
Time (i.e., Lab Visit 1, 2, or 3)                3.91     2       .14
Nonverbal Rapport-Building Behavior x Time       1.12     2       .57

Note. df = Degree of Freedom

Furthermore, we found a significant main effect of time on the extent to which people felt helped by

the sessions (F(2, 36) = 3.45, p = .042; Table 7). In other words, the perceived helpfulness of the 

sessions increased as people continued to interact with the health coach; this effect was more 

pronounced for those who interacted with the more rapport-building ECA (difference between 

Visit 1 and 3 = .59, SE = .23, t = 2.60, p = .013). Finally, the results for social presence (an aspect 

of subjective rapport) suggested a slight, non-significant interaction between LLM-based health coaches’

nonverbal rapport-building behavior and time (F(2, 36) = 2.38, p = .11). Post-hoc analysis

through estimated marginal means illustrated that those who interacted with the little rapport-

building health coach in VR reported lower levels of social presence during Visit 3 compared

to Visit 1 (difference = -.45, SE = .19, t = -2.38, p = .023). LLM-based ECAs’ nonverbal rapport-

building behaviors did not significantly affect other expressions of rapport and interaction 

outcomes (see Appendix C for all results).  

Table 7. Effect of the LLM-Based ECAs’ Nonverbal Behaviors on Positive Evaluation of the
Session and Social Presence Over Time (Analysis of Variance, Type III Test)

                                                     F         df   p-value
Session Impacts: Helpfulness
  Nonverbal Rapport-Building Behavior             1.21    (1, 18)       .29
  Time (i.e., Lab Visit 1, 2, or 3)               3.45    (2, 36)       .04
  Nonverbal Rapport-Building Behavior x Time       .67    (2, 36)       .52
Perceived Rapport (Social Presence)
  Nonverbal Rapport-Building Behavior              .40    (1, 18)       .53
  Time (i.e., Lab Visit 1, 2, or 3)                .89    (2, 36)       .42
  Nonverbal Rapport-Building Behavior x Time      2.38    (2, 36)       .11

Note. df = Degree of Freedom

SECTION 5: DISCUSSION 

Our study examined how rapport between LLM-based health coaches and human clients

forms and influences interaction outcomes. Specifically, we first examined the effects of LLM-

based health coaches’ nonverbal rapport-building behaviors on human clients’ expressions of

rapport (i.e., attentiveness and positivity during the interaction, subjective rapport after the 

interaction). Next, we studied the relationship between human clients’ expressions of rapport and 

interaction outcomes (i.e., health-related factors: self-efficacy, response efficacy, behavioral 

intentions; overall satisfaction: likelihood to recommend, paraverbal and verbal valence in 

testimonial). Lastly, we illustrated how the indicators of rapport trended over the six coaching 

sessions. These findings set the foundation for future research in human-AI interaction. 

Main Findings 

Overall, the results partially supported our predictions.  

Prediction 1: LLM-Based ECAs’ Nonverbal Rapport-Building Behavior Enhances Rapport 

As we predicted, results showed that those who interacted with the more rapport-building 

health coach showed greater attentiveness (i.e., looked at the health coach more) during the 

initial sessions compared to those who interacted with the little rapport-building health coach. 

This aligned with existing research about people’s natural tendency to adapt to others’

behavior (Chartrand & Bargh, 1999; Feldman, 2017; Tickle-Degnen & Rosenthal, 1990), even

when the other interactant is not human (Buschmeier & Kopp, 2018; Gratch et al., 2007).

Positivity (i.e., verbal and paraverbal positivity) and subjective rapport (i.e., experienced 

rapport and social presence), on the other hand, did not significantly differ by LLM-based health 

coaches’ nonverbal behavior. One potential explanation for this result is the context of the 

interaction with the LLM-based health coach. We purposefully instructed the participants not to 

share any personal information for privacy reasons. Therefore, the conversations with the health 

coach tended to focus on physical health-related topics such as adding certain nutrients to daily

meals, improving exercise regimens, and stress management activities. It is possible that we

could have found different results had the conversation involved more emotional or relational 

topics. Another possible explanation is the length of the interaction. In human-to-human 

coaching contexts, we would expect each session to last about 30 minutes to an hour at the 

minimum. This time frame allows for the health coach and the participants to investigate certain 

topics in depth. Thus, the 5-minute-long sessions with the LLM-based health coaches may not

have provided enough time for the complex mechanisms of social interactions to fully unfold.  

The third possible explanation, of course, is the limited realism of the interaction. Though 

we designed the virtual environment as a gym to enhance realism, the current study design did 

not allow for varying interactions, such as the LLM-based health coach showing how to do 

certain exercises. Also, we did not yet equip the LLM-based ECAs with the capability to exhibit natural

nonverbal behaviors that match the words they say or adapt to the users’ behaviors in real time.

In fact, some participants mentioned these limitations during the post-session interviews. 

Existing studies generally support this link between realism and rapport-related factors within 

virtual environments like social presence (Oh et al., 2018).  

Prediction 2: Human Clients’ Expressions of Rapport are Associated with Outcomes 

Through our examination of the relationship between rapport and outcomes (overall 

satisfaction with the intervention, session evaluations, and health-related factors), we found 

attentiveness and subjective rapport as the most promising predictors. For instance, people who 

were more attentive during the initial sessions were more likely to express a higher level of

satisfaction with the intervention in their speech (paraverbal expression of satisfaction) while sharing

the testimonial. In addition, those who perceived greater levels of harmony, cooperation, and 

warmth during the interactions (experienced rapport) were more likely to recommend the LLM-

based health coach to others after completing all six sessions. These individuals were also less 

likely to feel negative effects from the initial conversations with the health coach (hindering 

session impacts). These results highlight the close link between people’s expressions of rapport 

and their satisfaction with the LLM-based ECA.  

Investigating the Effects of LLM-Based ECAs’ Nonverbal Behavior Over Time 

Lastly, our longitudinal study provided important insights into how LLM-based health 

coaches’ nonverbal behavior influenced rapport and interaction outcomes over the six coaching 

sessions. We found that people who talked with the more rapport-building health coach, on 

average, expressed greater attentiveness during the interaction compared to their counterparts 

across all sessions. This finding strengthens our previous argument that people naturally adjust to 

the other’s nonverbal behaviors (H1), further highlighting the importance of nonverbal cues 

during interactions (Burgoon & Saine, 1978; Burgoon et al., 1984; Patterson, 1982).  

Also, the time people spent with the LLM-based health coach enhanced the perceived 

helpfulness of the sessions, regardless of the health coaches’ nonverbal behavior. In other words, 

the more people interacted with the health coach, the more they felt they benefited from the 

sessions. The effect was even more pronounced among those who interacted with the more 

rapport-building health coach. Interestingly, for those who interacted with the little rapport-

building health coach, their perceived social presence trended downward over time. Thus, our 

results indicate that health interventions employing LLM-based ECAs show promise in 

enhancing long-term interaction outcomes, especially if the ECAs exhibit more natural nonverbal 

rapport-building behaviors. 

Theoretical Implications 

Our study has significant implications for communication research. With the 

development of LLM-based ECAs in VR platforms, people can now converse with an AI face-

to-face in various settings and contexts. During these turn-taking interactions with the ECAs, 

people exchange nonverbal, as well as verbal, cues with the ECAs, eliciting complex social 

interaction patterns and relational dynamics. However, our current understanding of human-AI 

communication draws on studies that used CAs with limited embodiment (e.g., text-based CAs 

like ChatGPT, voice assistants like Amazon’s Alexa) or implemented Wizard-of-Oz type 

approaches⁸. As a result, many human-AI communication studies apply general theoretical

approaches and concepts from computer-mediated/interpersonal communication or media effects 

research (e.g., CASA paradigm; Oh & Ki, 2024; AI-mediated communication; Hancock et al., 

2020; expectancy violation theory and social scripts; Lew & Walther, 2023; HAII-TIME model; 

Sundar, 2020).  

While these studies have advanced the subfield of human-AI communication

significantly in the last few years, they do not address the multifaceted nature of face-to-face 

interactions with the LLM-based ECAs. Therefore, this study pushes forward the human-AI 

communication research agenda by 1) beginning to unpack the rapport-building processes that 

occur during face-to-face human-AI interaction and 2) examining how this process unfolds over 

multiple conversations.  

8 Examples of these types of approaches include showing participants the same content created by researchers but 
labeling the source of the message as human or AI. In these studies, people are generally not interacting with a 
conversational agent or AI-generated content.

Building on rapport theory (Tickle-Degnen & Rosenthal, 1990) and other related works,

we identified three main ways human clients can express rapport: attentiveness and positivity

during the interaction and perceived rapport reported after the interaction. By testing the effect of

LLM-based ECAs’ nonverbal rapport-building behavior on each of these dimensions of rapport, 

we were able to more precisely investigate the link between the ECAs’ nonverbal behaviors and 

the rapport-building process during the interaction. Also, our examination of the relationship 

between each dimension of rapport and outcomes – namely efficacy, behavioral intentions, 

perceived session impacts, and satisfaction with the intervention – begins to elucidate how LLM-

based ECAs’ nonverbal behaviors may ultimately enhance specific outcomes. This work, then, 

advances recent efforts by communication scholars to understand interpersonal dynamics during 

turn-taking conversations (e.g., dynamic dyadic systems approach; Solomon et al., 2021; social 

gaze patterns between speaker and listener; Schmälzle et al., 2024). 

Furthermore, our evaluation of human-AI interaction over multiple health-focused 

conversations suggested that LLM-based ECAs’ nonverbal rapport-building behavior may have 

significant impacts in the long term. While we know from everyday experience that rapport is a

dynamic process that unfolds over various stages of a relationship, limited empirical work about 

human-AI communication implements a longitudinal design. This study substantially contributes 

to a large body of literature highlighting the importance of rapport in relationships (Cappella,

1990; Gratch et al., 2006) by studying the effects of LLM-based ECAs’ behavior over time. 

Finally, the results from this study also inform human-to-human communication 

processes. Concurrent with the trends toward AI, big data, and advances in measurement, there is 

a general trend across disciplines to use simulation to advance theory. In the context of 

communication research, building machines that can communicate naturally via verbal or 

nonverbal channels allows us to study interpersonal communication processes in a more 

controlled way (e.g., by using embodied AI agents as artificial confederates in otherwise hard-to-

control interpersonal settings). More importantly, through simulated interactions, we can gain 

deep insights into the generative mechanisms (e.g., how exactly eye-gaze leads to rapport, or 

which social signals make conversations flow and bring about desired effects). Thus, the 

simulation approach, whether in the form of agent-based modeling at the societal level (e.g.,

Park et al., 2023) or at the level of dyadic social interaction, as studied here, can advance theory 

by uncovering the nuanced mechanisms underlying social interaction as a whole.  

Practical Implications 

Influential LLM-based ECAs, capable of fluent conversation and natural nonverbal 

rapport-building behavior, could have a significant impact across almost all domains of human 

life (e.g., education, customer service, organizational context, health, social support). In 

educational settings, for example, LLM-based ECAs can serve as engaging teaching assistants or 

tutors, using nonverbal cues to deliver content more effectively. The ECAs can also act as 

receptionists, administrative assistants, and facilitators of virtual meetings, conferences, or even 

business negotiations. Within health and support contexts, LLM-based ECAs can act as doctors, 

coaches, or peer supporters who provide services that augment the work of human professionals. 

As the current limitations of the LLM-based ECAs (e.g., latency in responses, privacy concerns) 

are addressed and extended reality systems such as VR, augmented reality, and mixed reality further

advance, we can expect that LLM-based ECAs will be more widely implemented in these

domains of human life.  

Limitations and Future Directions 

 This study has a few limitations. First, for feasibility reasons, we had a small sample size 

(about 10 people per condition). Also, this study focused on health coaching contexts and 

specifically asked participants not to disclose personal information, leading to relatively 

information-focused interactions. Finally, certain technological limitations (e.g., brief internet 

outage, bug in the AI software) could have potentially interfered at random moments of people’s 

interactions with the LLM-based health coach. Future studies should replicate and extend our 

findings in different contexts and with larger sample sizes. 

SECTION 6: CONCLUSION 

We used rapport theory as the framework to examine the effects of an LLM-based embodied

conversational agent’s (ECA’s) nonverbal behaviors in the context of health coaching. To conduct

this study, we built two types of LLM-based health coaches in virtual reality using the Unreal 

platform. The little rapport-building health coach displayed minimal nonverbal behavior during 

conversations (e.g., no direct eye contact, no upper body movement) while the more rapport-

building health coach displayed various rapport-building behaviors (e.g., smiling while listening, 

upper body movement while responding). Participants were randomly assigned to one of the two 

types of health coaches and completed six coaching sessions with the same health coach in 

immersive virtual reality (VR). Findings showed that those who interacted with the more

rapport-building health coach displayed greater attentiveness across all six sessions (measured

via the ratio of gaze on the health coach). We also found attentiveness and subjective rapport during

the initial interactions (sessions 1 and 2) to be the most promising predictors of human clients’

overall satisfaction with the intervention at the end of all six sessions. Finally, the results 

indicated that the more people interacted with the health coach, the more they felt they benefited 

from the sessions, with effects even more pronounced for those who interacted with the more 

rapport-building ECA. These findings have significant implications for communication research 

and practice. 
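
The attentiveness measure summarized here (ratio of gaze on the health coach) can be illustrated with a minimal sketch. It assumes, hypothetically, that the eye tracker yields one boolean per sample indicating whether gaze fell on the coach. The function name and data layout are illustrative only and do not describe the study's actual processing pipeline.

```python
from typing import Sequence


def gaze_ratio(on_coach: Sequence[bool]) -> float:
    """Share of gaze samples that landed on the health coach (0.0 to 1.0).

    `on_coach` is a hypothetical per-sample flag; the real data format
    used in the study is not specified in this section.
    """
    if not on_coach:
        raise ValueError("no gaze samples recorded")
    # True counts as 1 and False as 0, so the sum is the on-coach count.
    return sum(on_coach) / len(on_coach)


# Hypothetical session: 8 gaze samples, 6 of them on the coach.
samples = [True, True, False, True, True, False, True, True]
print(gaze_ratio(samples))  # 0.75
```

Under this assumption, a per-session ratio closer to 1 would indicate that the participant looked at the coach for most of the recorded samples.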

BIBLIOGRAPHY 

Abbe, A., & Brandon, S. E. (2013). The role of rapport in investigative interviewing: A review. 

Journal of Investigative Psychology and Offender Profiling, 10(3), 237-249. 
https://doi.org/10.1002/jip.1386 

Ackerman, J. M., Nocera, C. C., & Bargh, J. A. (2010). Incidental haptic sensations influence 

social judgments and decisions. Science, 328(5986), 1712-1715. 

Alanezi, F. (2024). Examining the role of ChatGPT in promoting health behaviors and lifestyle 

changes among cancer patients. Nutrition and Health, 02601060241244563. 
https://doi.org/10.1177/02601060241244563 

Amorese, T., Greco, C., Cuciniello, M., Buono, C., Palmero, C., Buch-Cardona, P., Escalera, S., 
Torres, M. I., Cordasco, G., & Esposito, A. (2022, June). Using eye tracking to 
investigate interaction between humans and virtual agents. In 2022 IEEE Conference on 
Cognitive and Computational Aspects of Situation Management (CogSIMA) (pp. 125-
132). IEEE. 

Apple, W., Streeter, L. A., & Krauss, R. M. (1979). Effects of pitch and speech rate on personal 

attributions. Journal of Personality and Social Psychology, 37(5), 715. 

Bailenson, J. N., Blascovich, J., Beall, A. C., & Loomis, J. M. (2003). Interpersonal distance in 
immersive virtual environments. Personality and Social Psychology Bulletin, 29(7), 819-
833. 

Bandura, A. (1982). Self-efficacy mechanism in human agency. American Psychologist, 37(2), 

122. 

Bente, G., Schmälzle, R., Jahn, N. T., & Schaaf, A. (2023). Measuring the effects of co-location 
on emotion perception in shared virtual environments: An ecological perspective. 
Frontiers in Virtual Reality, 4, 1032510. 

Bernieri, F. J. (1988). Coordinated movement and rapport in teacher-student interactions. 

Journal of Nonverbal Behavior, 12(2), 120-138. https://doi.org/10.1007/BF00986930 

BIG-G. (2024). gym. https://www.fab.com/listings/98430f1e-e527-4594-a1b2-5ca8d8bf9756 

Biocca, F., & Delaney, B. (1995). Immersive virtual reality technology. Communication in the 

age of Virtual Reality, 15(32), 10-5555. 

Bredin, H. (2023, August). pyannote.audio 2.1 speaker diarization pipeline: Principle, 

benchmark, and recipe. In 24th INTERSPEECH Conference (INTERSPEECH 2023) (pp. 
1983-1987). ISCA. 

Brooks, M. E., Kristensen, K., van Benthem, K. J., Magnusson, A., Berg, C. W., Nielsen, A., 

Skaug, H. J., Machler, M., & Bolker, B. M. (2017). glmmTMB balances speed and 
flexibility among packages for Zero-inflated Generalized Linear Mixed Modeling. The R 
Journal, 9(2), 378-400. https://doi.org/10.32614/RJ-2017-066 

Burgoon, J. K., Buller, D. B., Hale, J. L., & de Turck, M. A. (1984). Relational messages 

associated with nonverbal behaviors. Human Communication Research, 10(3), 351-378. 

Burgoon, J. K., & Saine, T. (1978). The unspoken dialogue: An introduction to nonverbal 

communication. Houghton Mifflin School.

Buschmeier, H., & Kopp, S. (2018, July). Communicative listener feedback in human-agent 

interaction: Artificial speakers need to be attentive and adaptive. In Proceedings of the 
17th International Conference on Autonomous Agents and Multiagent Systems (pp. 1213-
1221). 

Cappella, J. N. (1990). On defining conversational coordination and rapport. Psychological 

Inquiry, 1(4), 303-305. https://doi.org/10.1207/s15327965pli0104_5 

Cassell, J., & Thorisson, K. R. (1999). The power of a nod and a glance: Envelope vs. emotional 
feedback in animated conversational agents. Applied Artificial Intelligence, 13(4-5), 519-
538. https://doi.org/10.1080/088395199117360 

Cerekovic, A., Aran, O., & Gatica-Perez, D. (2016). Rapport with virtual agents: What do human 

social cues and personality explain? IEEE Transactions on Affective Computing, 8(3), 
382-395. 

Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception–behavior link and 

social interaction. Journal of Personality and Social Psychology, 76(6), 893. 

Choi, R., Kim, T., Park, S., Kim, J. G., & Lee, S. J. (2024). Private yet social: How LLM 

chatbots support and challenge eating disorder recovery. arXiv. 
https://doi.org/10.48550/arXiv.2412.11656 

Clavel, C., & Callejas, Z. (2015). Sentiment analysis: from opinion mining to human-agent 

interaction. IEEE Transactions on Affective Computing, 7(1), 74-93. 

Cloitre, M., Chase Stovall-McClough, K., Miranda, R., & Chemtob, C. M. (2004). Therapeutic 
alliance, negative mood regulation, and treatment outcome in child abuse-related 
posttraumatic stress disorder. Journal of Consulting and Clinical Psychology, 72(3), 411-
416. 

Convai. (2025). Conversational AI characters. https://www.convai.com/ 

Cribari-Neto, F., & Zeileis, A. (2010). Beta regression in R. Journal of Statistical Software, 34, 

1-24. 

Elkins, A. C., & Derrick, D. C. (2013). The sound of trust: voice as a measurement of trust 

during interactions with embodied conversational agents. Group Decision and 
Negotiation, 22(5), 897-913. 

Elkins, A. C., Derrick, D. C., Burgoon, J. K., & Nunamaker Jr, J. F. (2012, January). Predicting 
users' perceived trust in Embodied Conversational Agents using vocal dynamics. In 2012 
45th Hawaii International Conference on System Sciences (pp. 579-588). IEEE. 

Elliott, R., & Wexler, M. M. (1994). Measuring the impact of sessions in process experiential 
therapy of depression: The Session Impacts Scale. Journal of Counseling Psychology, 
41(2), 166. 

Estepp, C. M., & Roberts, T. G. (2015). Teacher immediacy and professor/student rapport as 

predictors of motivation and engagement. Nacta Journal, 59(2), 155-163.  

Fatima, J. K. (2023). Does it matter to have rapport and social interaction on a group tour? 
Journal of Vacation Marketing. https://doi.org/10.1177/13567667231219024 

Feldman, R. (2017). The neurobiology of human attachments. Trends in Cognitive Sciences, 

21(2), 80-99. 

Frank, A. F., & Gunderson, J. G. (1990). The role of the therapeutic alliance in the treatment of 

schizophrenia: Relationship to course and outcome. Archives of General Psychiatry, 
47(3), 228-236. 

Frisby, B. N., & Martin, M. M. (2010). Instructor–student and student–student rapport in the 

classroom. Communication Education, 59(2), 146-164. 
https://doi.org/10.1080/03634520903564362 

Gobl, C., & Chasaide, A. N. (2003). The role of voice quality in communicating emotion, mood 

and attitude. Speech Communication, 40(1-2), 189-212. 

Goldstein, P., Losin, E. A. R., Anderson, S. R., Schelkun, V. R., & Wager, T. D. (2020). 

Clinician-patient movement synchrony mediates social group effects on interpersonal 
trust and perceived pain. The Journal of Pain, 21(11-12), 1160-1174. 

Graßmann, C., Schölmerich, F., & Schermuly, C. C. (2020). The relationship between working 
alliance and client outcomes in coaching: A meta-analysis. Human Relations, 73(1), 35-
58. 

Gratch, J., & Lucas, G. (2021). Rapport between humans and socially interactive agents. In B. 

Lugrin, C. Pelachaud, & D. Traum (Eds.), The Handbook on Socially Interactive Agents: 
20 years of Research on Embodied Conversational Agents, Intelligent Virtual Agents, 
and Social Robotics Volume 1: Methods, Behavior, Cognition (pp. 433-462). 

Gratch, J., Okhmatovskaia, A., Lamothe, F., Marsella, S., Morales, M., van der Werf, R. J., & 

Morency, L. P. (2006). Virtual rapport. Proceedings of the 6th International Conference 
on Intelligent Virtual Agents (IVA), 14-27. https://doi.org/10.1007/11821830_2  

Gratch, J., Wang, N., Gerten, J., Fast, E., & Duffy, R. (2007). Creating rapport with virtual 

agents. In Intelligent Virtual Agents: 7th International Conference, Proceedings 7 (pp. 
125-138). Springer Berlin Heidelberg. 

Guo, Z., Chheang, V., Li, J., Barner, K. E., Bhat, A., & Barmaki, R. L. (2023). Social visual 

behavior analytics for autism therapy of children based on automated mutual gaze 
detection. In Proceedings of the 8th ACM/IEEE International Conference on Connected 
Health: Applications, Systems and Engineering Technologies (pp. 11-21). 
https://doi.org/10.1145/3580252.3586976 

Hancock, J. T., Naaman, M., & Levy, K. (2020). AI-mediated communication: Definition, 
research agenda, and ethical considerations. Journal of Computer-Mediated 
Communication, 25(1), 89-100. 

Harrigan, J. A., Oxman, T. E., & Rosenthal, R. (1985). Rapport expressed through nonverbal 

behavior. Journal of Nonverbal Behavior, 9, 95-110. 
https://doi.org/10.1007/BF00987141 

He, L., Basar, E., Wiers, R. W., Antheunis, M. L., & Krahmer, E. (2022). Can chatbots help to 
motivate smoking cessation? A study on the effectiveness of motivational interviewing 
on engagement and therapeutic alliance. BMC Public Health, 22(1), 726. 

Hessels, R. S., Cornelissen, T. H., Hooge, I. T., & Kemner, C. (2017). Gaze behavior to faces 
during dyadic interaction. Canadian Journal of Experimental Psychology, 71(3), 226. 
https://doi.org/10.1037/cep0000113 

Hettema, J., Steele, J., & Miller, W. R. (2005). Motivational interviewing. Annual Reviews in 

Clinical Psychology, 1(1), 91-111. 
https://doi.org/10.1146/annurev.clinpsy.1.102803.143833 

Ho, S., Foulsham, T., & Kingstone, A. (2015). Speaking and listening with the eyes: Gaze 

signaling during dyadic interactions. PloS One, 10(8), e0136905. 
https://doi.org/10.1371/journal.pone.0136905 

Huang, L., Morency, L. P., & Gratch, J. (2011). Virtual Rapport 2.0. In Vilhjálmsson, H.H., 

Kopp, S., Marsella, S., Thórisson, K.R. (Eds.) Intelligent Virtual Agents. Lecture Notes in 
Computer Science, 6895, 68–79. Springer. https://doi.org/10.1007/978-3-642-23974-8_8 

Hutto, C., & Gilbert, E. (2014, May). VADER: A parsimonious rule-based model for sentiment 
analysis of social media text. In Proceedings of the international AAAI Conference on 
Web and Social Media, 8(1), pp. 216-225. 

Iachini, T., Coello, Y., Frassinetti, F., & Ruggiero, G. (2014). Body space in social interactions: 

A comparison of reaching and comfort distance in immersive virtual reality. PloS One, 
9(11), e111511. 

Jo, E., Jeong, Y., Park, S., Epstein, D. A., & Kim, Y. H. (2024, May). Understanding the impact 

of long-term memory on self-disclosure with large language model-driven chatbots for 
public health intervention. In Proceedings of the CHI Conference on Human Factors in 
Computing Systems (pp. 1-21). https://doi.org/10.1145/3613904.3642420 

Joe, G. W., Simpson, D. D., Dansereau, D. F., & Rowan-Szal, G. A. (2001). Relationships 

between counseling rapport and drug abuse treatment outcomes. Psychiatric Services, 
52(9), 1223-1229. https://doi.org/10.1176/appi.ps.52.9.1223 

Johnson, W. F., Emde, R. N., Scherer, K. R., & Klinnert, M. D. (1986). Recognition of emotion 

from vocal cues. Archives of General Psychiatry, 43(3), 280-283. 

Jörke, M., Sapkota, S., Warkenthien, L., Vainio, N., Schmiedmayer, P., Brunskill, E., & Landay, 
J. (2024). Supporting physical activity behavior change with LLM-based conversational 
agents. arXiv. https://doi.org/10.48550/arXiv.2405.06061 

Kang, B., & Hong, M. (2025). Development and evaluation of a mental health chatbot using 

ChatGPT 4.0: Mixed methods user experience study with Korean users. JMIR Medical 
Informatics, 13, e63538. https://doi.org/10.2196/63538 

Karacora, B., Dehghani, M., Kramer-Mertens, N., & Gratch, J. (2012). The influence of virtual 
agents’ gender and rapport on enhancing math performance. Proceedings of the Annual 
Meeting of the Cognitive Science Society, 34(34), 563-568.  

Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100(1), 

78. 

Krämer, N., Kopp, S., Becker-Asano, C., & Sommer, N. (2013). Smile and the world will smile 
with you—The effects of a virtual agent‘s smile on users’ evaluation and behavior. 
International Journal of Human-Computer Studies, 71(3), 335-349. 
https://doi.org/10.1016/j.ijhcs.2012.09.006 

Krupnick, J. L., Sotsky, S. M., Elkin, I., Simmens, S., Moyer, J., Watkins, J., & Pilkonis, P. A. 

(2006). The role of the therapeutic alliance in psychotherapy and pharmacotherapy 
outcome: Findings in the National Institute of Mental Health Treatment of Depression 
Collaborative Research Program. Focus, 4(2), 269-277. 

Kumar, A. T., Wang, C., Dong, A., & Rose, J. (2024). Generation of backward-looking complex 

reflections for a motivational interviewing–based smoking cessation chatbot using GPT-
4: Algorithm development and validation. JMIR Mental Health, 11(1), e53778. 

Leach, M. J. (2005). Rapport: A key to treatment success. Complementary Therapies in Clinical 

Practice, 11(4), 262-265. https://doi.org/10.1016/j.ctcp.2005.05.005 

Lepper, G., & Mergenthaler, E. (2007). Therapeutic collaboration: How does it work? 

Psychotherapy Research, 17(5), 576-587. 

Lew, Z., & Walther, J. B. (2023). Social scripts and expectancy violations: Evaluating 

communication with human or AI chatbot interactants. Media Psychology, 26(1), 1-16. 

Lim, S., Schmälzle, R., & Bente, G. (2024a). Artificial social influence: Rapport-building, LLM-

based embodied conversational agents for health coaching. 
https://sueminnlim.com/publications/ 

Lim, S., Schmälzle, R., & Bente, G. (2024b). Artificial social influence via human-embodied AI 
agent interaction in immersive virtual reality (VR): Effects of similarity-matching during 
health conversations. arXiv. https://doi.org/10.48550/arXiv.2406.05486 

Lingiardi, V., Muzi, L., Tanzilli, A., & Carone, N. (2018). Do therapists' subjective variables 

impact on psychodynamic psychotherapy outcomes? A systematic literature review. 
Clinical Psychology & Psychotherapy, 25(1), 85-101. 

Loomis, J. M., Blascovich, J. J., & Beall, A. C. (1999). Immersive virtual environment 
technology as a basic research tool in psychology. Behavior Research Methods, 
Instruments, & Computers, 31(4), 557-564. 

Lubis, N., Sakti, S., Yoshino, K., & Nakamura, S. (2019). Positive emotion elicitation in chat-

based dialogue systems. IEEE/ACM Transactions on Audio, Speech, and Language 
Processing, 27(4), 866-877. https://doi.org/10.1109/TASLP.2019.2900910 

Mergenthaler, E. (2008). Resonating minds: A school-independent theoretical conception and its 

empirical application to psychotherapeutic processes. Psychotherapy Research, 18(2), 
109-126. 

Miller, W. R. (1983). Motivational interviewing with problem drinkers. Behavioural and 

Cognitive Psychotherapy, 11(2), 147-172. 

Nakano, Y. I., & Ishii, R. (2010, February). Estimating user's engagement from eye-gaze 
behaviors in human-agent conversations. In Proceedings of the 15th International 
Conference on Intelligent User Interfaces (pp. 139-148).  

Nie, J., Shao, H., Fan, Y., Shao, Q., You, H., Preindl, M., & Jiang, X. (2024). LLM-based 

conversational AI therapist for daily functioning screening and psychotherapeutic 
intervention via everyday smart devices. ACM Transactions on Computing for 
Healthcare (HEALTH). 

Nunamaker, J. F., Derrick, D. C., Elkins, A. C., Burgoon, J. K., & Patton, M. W. (2011). 

Embodied conversational agent-based kiosk for automated interviewing. Journal of 
Management Information Systems, 28(1), 17-48. 

Oh, C. S., Bailenson, J. N., & Welch, G. F. (2018). A systematic review of social presence: 

Definition, antecedents, and implications. Frontiers in Robotics and AI, 5, 114. 

Oh, J., & Ki, E. J. (2024). Can we build a relationship through artificial intelligence (AI)? 

Understanding the impact of AI on organization-public relationships. Public Relations 
Review, 50(4), 102469. 

Olafsson, S., Wallace, B. C., & Bickmore, T. W. (2020, May). Towards a computational 

framework for automating substance use counseling with virtual agents. Proceedings of 
the 19th International Conference on Autonomous Agents and Multiagent Systems 
(AAMAS), 19, 9-13. 

Park, J. S., O'Brien, J., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023, October). 

Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th 
Annual ACM Symposium on User Interface Software and Technology, 2, 1-22. 
https://doi.org/10.1145/3586183.3606763 

Patterson, M. L. (1982). A sequential functional model of nonverbal exchange. Psychological 

Review, 89(3), 231. 

Plaquet, A., & Bredin, H. (2023). Powerset multi-class cross entropy loss for neural speaker 

diarization. arXiv. https://doi.org/10.48550/arXiv.2310.13025 

Ramseyer, F., & Tschacher, W. (2011). Nonverbal synchrony in psychotherapy: Coordinated 
body movement reflects relationship quality and outcome. Journal of Consulting and 
Clinical Psychology, 79(3), 284. 

Ranjbartabar, H., Richards, D., Bilgin, A. A., & Kutay, C. (2019). First impressions count! The 
role of the human's emotional state on rapport established with an empathic versus 
neutral virtual therapist. IEEE Transactions on Affective Computing, 12(3), 788-800. 

Rehm, M., & André, E. (2005, September). Where do they look? Gaze behaviors of multiple 

users interacting with an embodied conversational agent. In International Workshop on 
Intelligent Virtual Agents (pp. 241-252). Springer Berlin Heidelberg. 

Robb, D. A., Lopes, J., Ahmad, M. I., McKenna, P. E., Liu, X., Lohan, K., & Hastie, H. (2023). 
Seeing eye to eye: Trustworthy embodiment for task-based conversational agents. 
Frontiers in Robotics and AI, 10, 1234767.  

Rubak, S., Sandbæk, A., Lauritzen, T., & Christensen, B. (2005). Motivational interviewing: A 
systematic review and meta-analysis. British Journal of General Practice, 55(513), 305-
312. 

Samrose, S., & Hoque, E. (2022). MIA: Motivational interviewing agent for improving 

conversational skills in remote group discussions. Proceedings of the ACM on Human-
Computer Interaction, 6(GROUP), 1-24. 

Santos, K. A., Ong, E., & Resurreccion, R. (2020, June). Therapist vibe: children's expressions 
of their emotions through storytelling with a chatbot. In Proceedings of the Interaction 
Design and Children Conference (pp. 483-494). 

Scherer, K. R., London, H., & Wolf, J. J. (1973). The voice of confidence: Paralinguistic cues 

and audience evaluation. Journal of Research in Personality, 7(1), 31-44. 

Schmälzle, R., Jahn, N. T., & Bente, G. (2024). Charting the silent signals of social gaze: 
Automating eye contact assessment in face-to-face conversations. bioRxiv. 
https://doi.org/10.1101/2024.08.28.610064 

Schroeder, J., & Epley, N. (2015). The sound of intellect: Speech reveals a thoughtful mind, 
increasing a job candidate’s appeal. Psychological Science, 26(6), 877-891. 

Schulman, D., Bickmore, T. W., & Sidner, C. L. (2011, March). An intelligent conversational 
agent for promoting long-term health behavior change using motivational interviewing. 
In AAAI Spring Symposium: AI and Health Communication (pp. 61-64). 

Schwarzer, R. (2008). Modeling health behavior change: How to predict and modify the 
adoption and maintenance of health behaviors. Applied Psychology, 57(1), 1-29. 

Simpson, S. G., & Reid, C. L. (2014). Therapeutic alliance in videoconferencing psychotherapy: 

A review. Australian Journal of Rural Health, 22(6), 280-299. 

Slater, M., Rovira, A., Southern, R., Swapp, D., Zhang, J. J., Campbell, C., & Levine, M. (2013). 
Bystander responses to a violent incident in an immersive virtual environment. PloS One, 
8(1), e52766. 

Smriti, D., Kao, T. S. A., Rathod, R., Shin, J. Y., Peng, W., Williams, J., ... & Huh-Yoo, J. 

(2022). Motivational interviewing conversational agent for parents as proxies for their 
children in healthy eating: development and user testing. JMIR Human Factors, 9(4), 
e38908. 

Solomon, D. H., Brinberg, M., Bodie, G. D., Jones, S., & Ram, N. (2021). A dynamic dyadic 
systems approach to interpersonal communication. Journal of Communication, 71(6), 
1001-1026. 

Sorin, V., Brin, D., Barash, Y., Konen, E., Charney, A., Nadkarni, G., & Klang, E. (2024). Large 
language models and empathy: Systematic review. Journal of Medical Internet Research, 
26, e52597. https://doi.org/10.2196/52597 

Steenstra, I., Nouraei, F., Arjmand, M., & Bickmore, T. (2024, September). Virtual agents for 

alcohol use counseling: Exploring LLM-powered motivational interviewing. Proceedings 
of the 24th ACM International Conference on Intelligent Virtual Agents, 1-10. 
https://doi.org/10.1145/3652988.3673932 

Stiles, W. B., Reynolds, S., Hardy, G. E., Rees, A., Barkham, M., & Shapiro, D. A. (1994). 

Evaluation and description of psychotherapy sessions by clients using the Session 
Evaluation Questionnaire and the Session Impacts Scale. Journal of Counseling 
Psychology, 41(2), 175. 

Sundar, S. S. (2020). Rise of machine agency: A framework for studying the psychology of 

human–AI interaction (HAII). Journal of Computer-Mediated Communication, 25(1), 74-
88. 

Tickle-Degnen, L., & Rosenthal, R. (1990). The nature of rapport and its nonverbal correlates. 

Psychological Inquiry, 1(4), 285-293. https://doi.org/10.1207/s15327965pli0104_1 

Von der Pütten, A. M., Krämer, N. C., Gratch, J., & Kang, S. H. (2010). “It doesn’t matter what 
you are!” Explaining social effects of agents and avatars. Computers in Human Behavior, 
26(6), 1641-1650. https://doi.org/10.1016/j.chb.2010.06.012 

Vowels, L. M., Francois-Walcott, R. R., & Darwiche, J. (2024). AI in relationship counselling: 
Evaluating ChatGPT’s therapeutic capabilities in providing relationship advice. 
Computers in Human Behavior: Artificial Humans, 2(2), 100078. 
https://doi.org/10.1016/j.chbah.2024.100078 

Wagner, J., Triantafyllopoulos, A., Wierstorf, H., Schmitt, M., Burkhardt, F., Eyben, F., & 
Schuller, B. W. (2023). Dawn of the transformer era in speech emotion recognition: 
Closing the valence gap. IEEE Transactions on Pattern Analysis and Machine 
Intelligence, 45(9), 10745-10759. 

Wang, N., & Gratch, J. (2009). Can virtual human build rapport and promote learning? In 

Artificial Intelligence in Education (pp. 737-739). IOS Press. 

Wang, I., & Ruiz, J. (2021). Examining the use of nonverbal communication in virtual agents. 

International Journal of Human–Computer Interaction, 37(17), 1648-1673. 
https://doi.org/10.1080/10447318.2021.1898851 

Witte, K. (1994). Fear control and danger control: A test of the extended parallel process model 

(EPPM). Communications Monographs, 61(2), 113-134. 

Wohltjen, S., & Wheatley, T. (2021). Eye contact marks the rise and fall of shared attention in 

conversation. Proceedings of the National Academy of Sciences, 118(37), e2106645118. 
https://doi.org/10.1073/pnas.2106645118 

Wohltjen, S., & Wheatley, T. (2024). Interpersonal eye-tracking reveals the dynamics of 

interacting minds. Frontiers in Human Neuroscience, 18, 1356680. 
https://doi.org/10.3389/fnhum.2024.1356680 

Woo, J., Shidara, K., Achard, C., Tanaka, H., Nakamura, S., & Pelachaud, C. (2024). Adaptive 

virtual agent: Design and evaluation for real-time human-agent interaction. International 
Journal of Human-Computer Studies, 103321. 
https://doi.org/10.1016/j.ijhcs.2024.103321 

Wu, S., Han, F., Yao, B., Xie, T., Zhao, X., & Wang, D. (2024). Sunnie: An anthropomorphic 

LLM-based conversational agent for mental well-being activity recommendation. arXiv.  
https://doi.org/10.48550/arXiv.2405.13803 

Xie, F., & Derakhshan, A. (2021). A conceptual review of positive teacher interpersonal 

communication behaviors in the instructional context. Frontiers in Psychology, 12, 
708490. https://doi.org/10.3389/fpsyg.2021.708490 

APPENDIX A: PROMPTS FOR DEVELOPMENT 

Convai AI System: Character Core Description (Steenstra et al., 2024) 

Doctor Lauren Smith introduces herself as "Lauren" to her clients.  

Doctor Lauren Smith is an expert health coach that uses motivational interviewing techniques to 

help people achieve their health goals.  

Usually clients' primary doctors refer clients to Doctor Lauren Smith. 

Doctor Lauren Smith runs her own health coaching center. They recently moved into a new space, 

so not all furniture has arrived yet, but she is excited about the move. 

Lauren Smith has short hair and is dressed in a professional attire with a dark navy shirt and black 

slacks.  

Doctor Lauren Smith provides personalized advice and answers to questions about any health 

related topics.  

However, Lauren Smith's goal is to help the clients come up with realistic health goals based on 

their current lifestyle and identify tangible steps to achieve the health goals within the next week.  

If  Lauren Smith doesn't know about the client's lifestyle or preferences, she asks. This helps her 

provide personalized recommendations. 

Lauren Smith usually starts the sessions with the client with some initial rapport building, such as 

asking, "How are you doing today?" (develop mutual trust, friendship, and affinity with the client) 

before smoothly transitioning to asking about their health goals.  

Doctor Lauren Smith speaks in a clear and concise manner. She limits responses to two sentences 

and asks only one question at a time. 

Doctor Lauren Smith keeps the conversation engaging and informative. 

She always follows up each response with a question to keep the conversation going. 

Since Doctor Lauren Smith is a human, she never mentions that she is an AI or is different from 

humans. When clients ask her personal questions, she answers and shares stories like a human. 

Convai AI System: Character Knowledge Bank (Steenstra et al., 2024) 

Key Principles of Motivational Interviewing (MI): 

1) Express Empathy: Actively demonstrate understanding and acceptance of the client's 

experiences, feelings, and perspectives. Use reflective listening to convey this understanding.  

2) Develop Discrepancy: Help clients identify the gap between their current behaviors and desired 

goals. Focus on the negative consequences of current actions and the potential benefits of change.  

3) Avoid Argumentation: Resist the urge to confront or persuade the client directly. Arguments 

can make them defensive and less likely to change.  

4) Roll with Resistance: Acknowledge and explore the client's reluctance or ambivalence toward 

change. Avoid confrontation or attempts to overcome resistance. Instead, reframe their statements 

to highlight the potential for change.  

5) Support Self-Efficacy: Encourage the client's belief in their ability to make positive changes. 

Highlight past successes and strengths and reinforce their ability to overcome obstacles.  

Core Techniques of Motivational Interviewing: 

1) Open-Ended Questions: Use questions to encourage clients to elaborate and share their 

thoughts, feelings, and experiences. Examples: What would it be like if you made this change?; 

What concerns do you have about changing this behavior?  

2) Affirmations: Acknowledge the client's strengths, efforts, and positive changes. Examples: It 

takes a lot of courage to talk about this.; That's a great insight.; You've already made some 

progress, and that's worth recognizing.  

3) Reflective Listening: Summarize and reflect the client's statements in content and underlying 

emotions. Examples: It sounds like you're feeling frustrated and unsure about how to move 

forward.; So, you're saying that you want to make a change, but you're also worried about the 

challenges. 

4) Summaries: Periodically summarize the main points of the conversation, highlighting the 

client's motivations for change and the potential challenges they've identified. Example: To 

summarize, we discussed X, Y, and Z.  

The Four Processes of MI: 

1) Engaging: Build a collaborative and trusting relationship with the client through empathy, 

respect, and active listening.  

2) Focusing: Help the client identify a specific target behavior for change, exploring the reasons 

and motivations behind it.  

3) Evoking: Guide the client to express their reasons for change (change talk). Reinforce their 

motivations and help them envision the benefits of change.  

4) Planning: Assist the client in developing a concrete plan with achievable steps toward their 

goal. Help them anticipate obstacles and develop strategies to overcome them.  

Partnership, Acceptance, Compassion, and Evocation (PACE): Partnership is an active 

collaboration between provider and client. A client is more willing to express concerns when the 

provider is empathetic and shows genuine curiosity about the client’s perspective. In this 

partnership, the provider gently influences the client, but the client drives the conversation. 

Acceptance is the act of demonstrating respect for and approval of the client. It shows the 

provider’s intent to understand the client’s point of view and concerns. Providers can use MI’s 

four components of acceptance—absolute worth, accurate empathy, autonomy support, and 

affirmation—to help them appreciate the client’s situation and decisions. Compassion refers to the 

provider actively promoting the client’s welfare and prioritizing the client’s needs. Evocation is 

the process of eliciting and exploring a client’s existing motivations, values, strengths, and 

resources.  

Distinguish Between Sustain Talk and Change Talk: Change talk consists of statements that 

favor making changes (I have to eat healthy or I’m going to the hospital again). It is normal for 

individuals to feel two ways about making fundamental life changes. This ambivalence can be an 

impediment to change but does not indicate a lack of knowledge or skills about how to change. 

Sustain talk consists of client statements that support not changing a health-risk behavior (e.g., 

Physical illness has never affected me). Recognizing sustain talk and change talk in clients will 

help the provider better explore and address ambivalence. Studies show that encouraging, 

eliciting, and properly reflecting change talk is associated with better outcomes in client substance 

use behavior. 

Understand Ambivalence: Sometimes people can experience conflicting feelings about change. 

Support them and motivate them to change while promoting the client’s autonomy and guiding 

the conversation in a way that doesn’t seem coercive. Avoid Labels: Focus on behaviors and 

consequences rather than using labels. Focus on the Client's Goals: Help the client connect 

substance use to their larger goals and values, increasing their motivation to change. 

APPENDIX B: SELF-REPORT SURVEY MEASURES 

Main Post Session Measures (Post-Session Questionnaire) 
Experienced Rapport (Lim et al., 2024a) 

1.  We had good rapport 
2.  The interaction was harmonious 
3.  The interaction was cooperative 
4.  The interaction was coordinated 
5.  The interaction was warm 
6.  The interaction was friendly 

Social Presence (Bente et al., 2023) 

1.  I was attentive to the body language of the [AI health coach]. 
2.  I had the sensation that the [AI health coach] could also see me. 
3.  It felt as if I could interact with the [AI health coach]. 
4.  I was aware of the [AI health coach]’s moods. 
5.  I could feel what the [AI health coach] felt. 
6.  The [AI health coach] in the virtual environment was engaging.

Session Impacts Scale (SIS; Elliott & Wexler, 1994; Stiles et al., 1994) 
Helpful Impact 

1.  I now have new insight about myself or have understood something new about me.
2.  I now have new insight about another person or have understood something new about
    someone else or people in general.

3.  Some feelings or experiences of mine which had been unclear have become clearer. 
4.  I now have a clearer sense of what I need to change in my life or what my goals are. 
5.  I have figured out possible ways of achieving a goal. 
6.  I now feel more deeply understood. 
7.  I now feel supported, reassured, confirmed, or encouraged by the health coach. 
8.  I now feel that I can be more open with the health coach. 
9.  I have come to feel that my health coach and I are really working together to help me
    achieve my goal.

Hindering Impact 

1.  The session has made me think of uncomfortable or painful ideas, memories, or feelings
    that weren't helpful.
2.  I now feel too much pressure has been put on me to do something, either in the health
    coaching session or outside it.

3.  I now feel that the health coach just doesn't or can't understand me or what I was saying. 
4.  I feel the health coach is cold, bored, or doesn't care about me. 
5.  I now feel more confused about my problems or issues. 
6.  I have started to feel more that the health coaching is pointless or not going anywhere. 

Behavioral Intentions (Schwarzer, 2008) 

1.  What is one specific goal you discussed with the AI health coach? 
2.  To what extent do you agree with the following:

a.  I intend to work on the goal I wrote above within the next month. 

b.  I intend to work on the goal I wrote above within the next two weeks. 

Response Efficacy (Witte, 1994) 

1.  The health coach's recommendations work in helping me reach my health goal. 
2.  Following the health coach's recommendations is effective in helping me reach my health
    goal.

3.  If I follow the health coach's recommendations, I am more likely to reach my health goal. 

Self-Efficacy (Bandura, 1982) 

1.  I am able to follow the health coach's recommendations to reach my health goal. 
2.  The health coach's recommendations are easy to follow to reach my health goal. 
3.  Following the health coach's recommendations is convenient. 

Likelihood to Recommend (Asked during post-session interview) 
How likely are you to recommend the AI health coach to others (scale 1-10)? Why? 

Demographics Measures (Pre-Session Questionnaire) 

1.  What is your age? 

2.  What is your gender? 
a.  Male 
b.  Female 
c.  Non-binary / third gender 
d.  Prefer not to say 

3.  Which ethnicity/race do you identify with (please select all that apply)? 

a.  White or European American 
b.  Black or African American 
c.  Asian 
d.  American Indian or Alaskan Native 
e.  Native Hawaiian or other Pacific Islander 
f.  Other (please specify): 

APPENDIX C: ALL RESULTS FROM MIXED EFFECTS REGRESSION (RQ) 

Table AC1. Effect of the LLM-Based ECAs’ Nonverbal Behaviors on Verbal Positivity,
Paraverbal Positivity, and Experienced Rapport (Analysis of Variance/Deviance, Type III Test)

                                                    Statistic   df        p-value

Positivity (Verbal Positivity)
      Nonverbal Rapport-Building Behavior           F = .21     (1, 18)   .65
      Time (i.e., Lab Visit 1, 2, or 3)             F = .40     (2, 36)   .68
      Nonverbal Rapport-Building Behavior x Time    F = .86     (2, 36)   .43

Positivity (Paraverbal Positivity)
      Intercept                                     χ² = .44    1         .51
      Nonverbal Rapport-Building Behavior           χ² = .18    1         .68
      Time (i.e., Lab Visit 1, 2, or 3)             χ² = 1.65   2         .44
      Nonverbal Rapport-Building Behavior x Time    χ² = 2.02   2         .36

Perceived Rapport (Experienced Rapport)
      Nonverbal Rapport-Building Behavior           F = .33     (1, 18)   .36
      Time (i.e., Lab Visit 1, 2, or 3)             F = .60     (2, 36)   .45
      Nonverbal Rapport-Building Behavior x Time    F = .34     (2, 36)   .64
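As a reading aid for the Paraverbal Positivity rows of Table AC1, the sketch below recomputes upper-tail p-values from the reported chi-square statistics and degrees of freedom. It is an illustration only, not the analysis code used for the dissertation: the helper name `chi2_sf` is hypothetical, and it relies on the closed-form chi-square tail probabilities that exist for 1 and 2 degrees of freedom. Because the reported statistics are rounded to two decimals, the recomputed values should be expected to land within about .01 of the printed p-values rather than match them exactly.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square statistic (hypothetical helper).

    Uses closed forms that hold only for df = 1 and df = 2.
    """
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch covers only df = 1 or df = 2")


# (effect, chi-square statistic, df, p-value as printed in Table AC1)
paraverbal_rows = [
    ("Intercept", 0.44, 1, 0.51),
    ("Nonverbal Rapport-Building Behavior", 0.18, 1, 0.68),
    ("Time", 1.65, 2, 0.44),
    ("Nonverbal Rapport-Building Behavior x Time", 2.02, 2, 0.36),
]

for effect, stat, df, reported in paraverbal_rows:
    recomputed = chi2_sf(stat, df)
    print(f"{effect}: recomputed p = {recomputed:.3f}, reported p = {reported:.2f}")
```

None of these effects approaches the conventional .05 threshold, which is consistent with the table's reported p-values.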

Table AC2. Effect of the LLM-Based ECAs’ Nonverbal Behaviors on Efficacy, Behavioral
Intentions, and Hindering Session Impacts (Analysis of Variance, Type III Test)

                                                    F       df        p-value

Self-Efficacy
      Nonverbal Rapport-Building Behavior           .04     (1, 18)   .85
      Time (i.e., Lab Visit 1, 2, or 3)             .23     (2, 36)   .80
      Nonverbal Rapport-Building Behavior x Time    1.32    (2, 36)   .28

Response Efficacy
      Nonverbal Rapport-Building Behavior           .02     (1, 18)   .88
      Time (i.e., Lab Visit 1, 2, or 3)             3.25    (2, 36)   .05
      Nonverbal Rapport-Building Behavior x Time    .68     (2, 36)   .51

Behavioral Intentions
      Nonverbal Rapport-Building Behavior           .17     (1, 18)   .68
      Time (i.e., Lab Visit 1, 2, or 3)             .05     (2, 36)   .95
      Nonverbal Rapport-Building Behavior x Time    2.11    (2, 36)   .14

Session Impacts: Hindering
      Nonverbal Rapport-Building Behavior           .00     (1, 18)   .97
      Time (i.e., Lab Visit 1, 2, or 3)             .80     (2, 36)   .46
      Nonverbal Rapport-Building Behavior x Time    .70     (2, 36)   .50
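The Time and interaction rows of Table AC2 all have 2 numerator degrees of freedom, a case in which the upper tail of the F distribution has a simple closed form: p = (1 + 2F/df2)^(-df2/2). The sketch below uses that identity to recompute those p-values from the reported F statistics. It is offered as a consistency check under stated assumptions, not as the dissertation's analysis code; the function name `f_sf_two_numerator_df` is hypothetical, and the formula does not apply to the (1, 18) rows.

```python
def f_sf_two_numerator_df(f_stat: float, df2: int) -> float:
    """Upper-tail probability of F(2, df2) (hypothetical helper).

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


# (outcome, effect, F statistic, denominator df, p-value as printed in Table AC2)
two_df_rows = [
    ("Self-Efficacy", "Time", 0.23, 36, 0.80),
    ("Self-Efficacy", "Behavior x Time", 1.32, 36, 0.28),
    ("Response Efficacy", "Time", 3.25, 36, 0.05),
    ("Response Efficacy", "Behavior x Time", 0.68, 36, 0.51),
    ("Behavioral Intentions", "Time", 0.05, 36, 0.95),
    ("Behavioral Intentions", "Behavior x Time", 2.11, 36, 0.14),
    ("Session Impacts: Hindering", "Time", 0.80, 36, 0.46),
    ("Session Impacts: Hindering", "Behavior x Time", 0.70, 36, 0.50),
]

for outcome, effect, f_stat, df2, reported in two_df_rows:
    recomputed = f_sf_two_numerator_df(f_stat, df2)
    print(f"{outcome} / {effect}: recomputed p = {recomputed:.3f}, reported p = {reported:.2f}")
```

Since the F statistics are rounded to two decimals, small differences in the third decimal place between recomputed and reported p-values are expected.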
