EXAMINING THE EFFECTS OF SOCIAL MEDIA BOTS ON ONLINE DISCUSSIONS: EVIDENCE FROM AN OBSERVATIONAL AND AN EXPERIMENTAL STUDY By Ruth Jin-Hee Heo A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Communication – Doctor of Philosophy 2025 ABSTRACT This dissertation explores the impact of bots on online public discourse, specifically focusing on human users’ language and attitudes in response to online interactions. The findings from Study 1 suggest that while bots generally exhibited lower levels of politicization, polarization, and neutrality, they displayed higher levels of anger, disgust, fear, and joy. Some of these features, specifically, politicization and disgust of bots can still influence humans over time. Furthermore, Studies 2 and 3 compared the effects of LLM-generated content versus non- LLM-generated content on individuals’ attitudes. The results show that LLM-generated contents can subtly influence users, as individuals often struggle to distinguish between human-like and machine-generated content on social media. As bots become more sophisticated with technological advancements, they are increasingly capable of shaping human attitudes in ways that are nearly indistinguishable from human interactions. Given the pervasive use of such technologies on social media, understanding their relational impact on humans is becoming crucial. Copyright by RUTH JIN-HEE HEO 2025 TABLE OF CONTENTS INTRODUCTION………………………………………………………………………………1 LITERATURE REVIEW………………...………………………………………….………….....3 Social Media Bots and Their Impacts on Humans………...………………..……………...3 CASA Framework and Bots’ Social Influence on Humans…………..……...…………….4 Hypotheses and Research Questions……………………………...….………………….6 Overview of Studies……………………………...…………………….………………...10 STUDY 1………………………………………………………….…..…………………………16 Study 1 Research Design...……………………………………………………………….16 Study 1 Results…………………………………………………….…...………………...20 Study 1 Discussion………..……………………………………………………………...28 STUDY 2………………………………………………………………..………………….……31 Study 2 Research Design.......…………………………………………………...….…….31 Study 2 Results…………………………………………………………………....……...33 Study 2 Discussion……………………………………………………………………….36 STUDY 3…………………………………………………………………..………….…………43 Study 3 Research Design...………………………………………...………….………….43 Study 3 Results…………………………………………………….…..…….…………...46 Study 3 Discussion………………………………………………….…..….…………….49 GENERAL DISCUSSION……………………………………………………….…………….52 CONCLUSION………………………………………...….……………………………………..58 Limitations……………………………………………………………………………….59 Future Directions…………………………….…………………………………………...61 REFERENCES…………………………….……………………………………………..……...64 APPENDIX A: SUPPLEMENTARY TABLES AND FIGURES…..……………………….…..71 APPENDIX B: PILOT TEST RESULTS FOR BOT DETECTION………………..……………78 APPENDIX C: AN INSTRUCTION FOR SIMULATING TWEETS USING GPT-4………..…80 APPENDIX D: AN INSTRUCTION FOR PERSONA SIMULATION USING LLAMA-3….....81 iii INTRODUCTION Previous research has consistently shown that the presence of bots increased the likelihood of social media users’ exposure to triggering and/or biased content (Bail et al., 2020; Badawy et al., 2018). Specifically, bots played a role in the dissemination of misinformation and the exacerbation of conflicts during significant events, such as the 2016 U.S. 
presidential election (Bail et al., 2020; Badawy et al., 2018; Keller & Klinger, 2019; Shao et al., 2017), the COVID- 19 pandemic (Ferrara, 2020; Xu & Sasahara, 2021), and the Catalan referendum for independence (Stella et al., 2018). However, bots’ direct influences on humans remain largely speculative (e.g., Aldayel & Magdy, 2022; Bail et al., 2020; Caldarelli et al., 2020; Ferrara, 2020). In recent years, the emergence of large language models (LLMs) has transformed the landscape of social media platforms. LLMs enable users to deploy bots effectively and produce highly deceptive content in a rapid fashion (Zhang et al., 2024). For example, this includes the viral video that falsely depicted Volodymyr Zelensky, the president of Ukraine, announcing a surrender to Russia (Satariano & Mozur, 2023). LLMs not only disrupt the content produced in social media but also have contributed to the emergence of more nuanced and sophisticated bots on the platforms. With the rise of so-called LLM-powered bots, which exhibit increasingly human-like features, there is potential for these bots to exert social influence equivalent to that of humans. Nonetheless, a lack of evidence leads to questioning the extent of the impact of information provided by these bots, how this will influence humans, and what the consequences of this influence may be. The purpose of the dissertation is to examine the potential influence of social media bots’ presence on humans in the context of online discussions about genetically modified organisms 1 (GMOs). GMOs have emerged as a technological advancement aimed at enhancing the quality and quantity of agricultural crops (Rathod & Hedaoo, 2022; Sohi et al., 2013). Historically, GMOs have been a controversial topic, with divided opinions—some people emphasize their benefits, while others focus on the potential risks. This context provides a valuable opportunity to examine how bots may shape public discourse, especially in light of the ongoing controversies surrounding GMOs. This research comprises three studies, and Study 1 utilizes observational data to examine bots’ role in social media discourse by analyzing how their content influences that of humans. Using time series prediction, it analyzes linguistic elements in tweets to determine the extent to which bots’ linguistic features affect those of humans or the extent to which humans’ linguistic features affect those of bots. Study 2 further employs a controlled experiment to investigate the causal influence of LLM-generated content as well as non-LLM- generated content on human attitudes. Specifically, this study adapts methods involving LLMs to create personas based on respondents’ answers from a nationally representative poll (Hewitt et al., 2024). In Study 3, actual human subjects were recruited correspondingly to the same experimental setup of Study 2, which increases the generalizability of the results as well as reinforces the validity of Study 2’s results. The present research not only assesses the overall impact of bots but also validates the growing influence of LLM-generated content in the online environment. Given the concerns over the drastic development of the new technology, the findings could contribute to contemplating the ways to mitigate unforeseeable consequences driven by them. 2 Social Media Bots and their Impacts on Humans LITERATURE REVIEW Bots are automated software programs designed to simulate human behavior. 
While they can facilitate interactions with users, bots are often deployed to disseminate information according to specific agendas. In some cases, bots serve benign purposes, potentially promoting free speech, political discourse, and even social activism (Ferrara et al., 2016; Gorwa & Guilbeault, 2020; Savage et al., 2015). However, recent research has focused on the malicious intent of implementing bots for manipulating public discourses. Bots have been identified as key contributors to the spread of misinformation, hate speech, and identity theft (Stocking & Sumida, 2018). Specific to social media platforms, bots often masquerade as human users, thereby polluting public conversations. Their malicious influence on social media, specifically Twitter (now X), was repeatedly reported as bots disseminated politically charged posts to amplify a political agenda (Bail et al., 2020; Badawy et al., 2018), violent content that contained negative and inflammatory narratives to intentionally induce conflicts among politically divisive groups (Stella et al., 2018), and misinformation that interrupted credible information sources (Broniatowski et al., 2018; Shao et al., 2017; Xu & Sasahara, 2022). Despite predicaments caused by bots, such as repeated exposure to harmful, triggering content, it is yet unclear if their influence is actually strong enough to directly change one’s attitude toward a topic (e.g., Aldayel & Magdy, 2022; Bail et al., 2020). For example, Bail et al. (2020) examined the role of the Russian Internet Research Agency (IRA) in the context of the 2016 U.S. presidential election; results indicated bots had a role in polluting the public discourse over the election, but their actual influence on other users’ attitudes and behaviors has not been confirmed (Bail et al., 2020). Similarly, Aldayel and Magdy (2022) investigated bots’ effect on 3 users’ stances in the context of political and social events such as climate change, feminism, Brexit, and other movements. Their findings suggested that the relatively rare occurrences of social media bots and lack of evidence for a direct relationship between bots and users’ stances led to an inconclusive effect of bots. While these studies advise caution in interpreting bots' influence, emerging research suggests that bots powered by LLMs could have a more substantial impact on human opinion. This new powerful technology enables bots to be easily implemented, efficiently generate deceptive content, and actively interact with other users (Zhang et al., 2024). As this technology advances, the potential of bots to influence humans may increase, although the magnitude of the influence remains uncertain (Burtell & Woodside, 2023). CASA Framework and Bots’ Social Influence on Humans The computers-as-social-actors (CASA) theory (Reeves & Nass, 1996) offers some insights for gauging the effect of bots on humans. Originally, the CASA theory focused on human-computer interactions by examining specific cues emitted by computers. Unlike traditional media (e.g., televisions, newspapers, radio, etc.), people perceive computers as independent sources rather than mere channels or mediums in the communication process (see review in Sundar & Nass, 2000; 2001; Hocevar et al., 2017). As independent sources, computers interact with humans eliciting mindless social responses from users in ways similar to human-to- human interactions. 
Specifically, people were found to apply social rules in their interactions with computers in responding to specific cues from computers. Initially, Nass and Steuer (1993) proposed that cues related to the use of language (as communication tools), interactivity, the assignment of social roles typically played by humans, and human sounding speech were most likely to elicit automatic social responses. Then, later 4 studies identified and tested the cues specifically gender, ethnicity, reciprocal self-disclosure, and simple labels designated a computer as ‘specialist,’ all of which led users to apply social rules to computers, even when acknowledging that computers are as non-human (Leshner et al., 1998; Nass & Moon, 2000, e.g., Nass et al., 1997; Nass et al., 2000; Moon, 2000). For example, in a lab experiment, participants rated a male voiced computer as friendlier than a female voiced computer when evaluating their performance. Moreover, the same study found that a female voiced computer was perceived as more expert in matters of love and relationships compared to a male voiced computer. In other words, participants mindlessly gender-stereotyped computers based on the voice cue (Nass et al., 1997; Nass & Moon, 2000). These findings suggest that specific characteristics of computers affect people to apply mindless social responses to computers, treating computers as social actors akin to humans (Qiu & Benbasat, 2009, e.g., Nass et al., 1994, 1995; Nass & Moon, 2000). In recent years, similar mindless reactions have been consistently observed when people interact with emerging technologies, such as voice activated navigation systems (Nass et al., 2005), the virtual assistant Alexa (Schneider, 2020), smartphones (Carolus et al., 2019), and in a study exploring how humans responded to a “lost” robot (Srinivvasan & Takayama, 2016). These results highlight that technologies (a broad category of computers) that possess specific features enable active social interactions. In other words, it is plausible that humans may be influenced by bots to a similar extent as they are by other humans. This influence may be amplified by the advent of more sophisticated “human-like” technologies, such as bots powered by LLMs. LLMs meet the components identified by Nass and Steuer (1993) as contributing to mindless responses among people, as mentioned earlier. These LLM-enhanced bots, which 5 mimic human behavior more closely and exceed human skills in various areas, could intensify the social influence exerted by bots in online interactions. Hypotheses and Research Questions Early studies raised concerns over bots, as they tend to distribute politically and emotionally triggering content as well as biased information (i.e. misinformation), and the increasing sophistication of bots raises concerns about tainted public discourse. This tendency indicates that bots’ content will be distinct from that of humans. Moreover, bots are likely to generate more politically inciting (i.e. politicizing), biased (i.e. polarizing), and emotionally intense (i.e. negatively piqued) language compared to human users in a wide range of issues (e.g., Badawy et al., 2018; Bail et al., 2020, the U.S. Election; Broniatowski et al., 2018, vaccines; Xu & Sasahara, 2022, COVID-19 pandemic). In the context of GMOs, they also tend to display such tendencies in their tweets amid mixed views over the topic. 
Capturing this manifestation, the current dissertation breaks these tendencies down into four specific features– topical themes, politicization, polarization, and emotions (e.g., anger, fear, disgust, joy)–and assesses them respectively. In examining the linguistic features of bots’ content compared to those of humans, I propose the following hypotheses: H1: The topical themes of bots’ tweets will differ from those of humans’ tweets. H2: Bots’ tweets will be more likely than humans’ tweets to exhibit 1) politicization, 2) polarization, and 3) negative emotions, while exhibiting less 4) neutrality and 5) positive emotion. H2-1: Bots’ tweets will exhibit more politicization than humans’ tweets. H2-2: Bots’ tweets will exhibit more polarization than humans’ tweets. 6 H2-3: Bots’ tweets will exhibit more negative emotions, including a) anger, b) disgust, c) fear) than humans’ tweets. H2-4: Bots’ tweets will exhibit less neutrality and positive emotion (i.e., joy) than humans’ tweets. Despite the high likelihood of bots containing distinct linguistic features compared to those of humans, there are only speculations to support the direct impact of bots on humans (e.g., Bail et al., 2020; Aldayel & Magdy, 2022). That is, the inquiry of whether bots’ politicization, polarization, and emotions (e.g., anger, disgust, and fear) in their content directly influence the content of human users will be investigated. Thus, I pose the following research questions: RQ1: Does the politicization of bots’ tweets influence the politicization of humans’ tweets? RQ2: Does the polarization of bots’ tweets influence the polarization of humans’ tweets? RQ3: D Do the negative emotions in bots’ tweets—1) anger, 2) disgust, and 3) fear—influence the emotions in humans’ tweets? Previous research on the perceived communication quality of bots has shown that people often have positive perceptions of bots (Edwards et al., 2014; Edwards et al., 2016). Additionally, in some conversational contexts, interruptions made by artificial intelligence (AI) (e.g, LLM) have been found to improve relationships in human-human interactions (Hohenstein & Jung, 2019). However, when comparing content labeled with a human cue versus a bot cue, people tend to evaluate the content with a human label more positively, regardless of the actual source (Chu & Liu, 2024; Graefe et al., 2018; Karinshak et al., 2023). With regard to the quality of content produced by bots versus humans, some studies have compared AI-generated news articles with articles written by human journalists. Overall, people assessed both types of articles as equally descriptive, boring, and objective (Clerwall, 2014). In some cases, people rated AI- 7 generated texts as more effectively delivered and more believable (Graefe et al., 2018; Karinshak et al., 2023). For instance, when the source of the news was arbitrarily assigned, people often assessed AI-generated news as more credible and possessing greater journalistic expertise (Graefe et al., 2018). Moreover, people reported that AI-generated content seemed to have stronger arguments and more positive effects on attitudes compared to human-written content (Karinshak et al., 2023). However, when AI-generated stories were compared with human written stories, people found AI-stories similar or less engaging and similar or higher counterarguing from participants (Chu & Liu, 2024). In sum, it remains uncertain whether individuals are capable of distinguishing between LLM-generated content and non-LLM-generated content. 
LLM-generated content can vary depending on the original input provided to the model. Although the content is produced by an LLM, the underlying source may differ, originating from either bots or humans. Specifically, it is unclear whether people can differentiate between different types of LLM-generated content (e.g., LLM-generated bot content and. LLM-generated human content) and non-LLM-generated content (e.g., human content and bot content). Additionally, it is uncertain whether LLM- generated content can be distinguished from content generated by (real) humans or (real) bots. Therefore, I propose the following research questions: RQ4: Do people perceive LLM-generated content and non-LLM-generated content differently in terms of bot-likeness and human-likeness? RQ4-1: Do people perceive human content and LLM-generated bot content differently in terms of (a) bot-likeness and (b) human-likeness? RQ4-2: Do people perceive bot content and LLM-generated bot content differently in terms of (a) bot-likeness and (b) human-likeness? 8 RQ4-3: Do people perceive human content and LLM-generated human content differently in terms of (a) bot-likeness and (b) human-likeness? RQ4-4: Do people perceive bot content and LLM-generated human content differently in terms of (a) bot-likeness and (b) human-likeness? RQ5: Do people perceive LLM-simulated bot content and LLM-generated human content differently in terms of (a) bot-likeness and (b) human-likeness? RQ6: Do people perceive bot content and human content differently in terms of (a) bot-likeness and (b) human-likeness? Despite some research examining the effects of LLM-generated content versus non- LLM-generated content on perceptions of quality (Chu & Liu, 2024; Clerwall, 2014; Graefe et al., 2018; Karinshak et al., 2023), the impact of bots—specifically the extent to which they influence attitudes—remains unclear. Therefore, the following set of research questions is posed to compare LLM-generated content and non-LLM-generated content: RQ7: Do LLM-generated content and non-LLM-generated content affect individuals’ attitudes differently? RQ7-1: Do LLM-generated bot content and human content affect individuals’ attitudes differently? RQ7-2: Do LLM-generated bot content and bot content affect individuals’ attitudes differently? RQ7-3: Do LLM-generated human content and human content affect individuals’ attitudes differently? RQ7-4: Do LLM-generated human content and bot content affect individuals’ attitudes differently? 9 RQ8: Do LLM-generated human content and LLM-generated bot content affect individuals’ attitudes differently? RQ9: Do human content and bot content affect individuals’ attitudes differently? Overview of Studies Topical Context: Genetically Modified Organisms This study attempts to address existing gaps in research on social media bots, particularly their influence on public discussion regarding GMOs. GMOs have emerged as a technological advancement aimed at enhancing the quality and quantity of agricultural crops (Rathod & Hedaoo, 2022; Sohi et al., 2013). In the U.S., this biotechnology has led to an increase in crop yield, covering approximately 71.5 million hectares in 2019 (Catherine et al. 2024). Beyond quantity, GMOs also contribute to improved nutritional quality; for example, L-1 transgenic corn has shown increases in Vitamin C, beta-carotene, and folate compared to its non-GMO counterparts (Naqvi et al., 2009). 
Additionally, GMOs help reduce the use of pesticides and herbicides, thereby minimizing environmental hazards (Rathod & Hedaoo, 2022). However, public perception often associates GMOs with health and environmental risks (Catherine et al., 2024). According to a public opinion survey conducted by the Pew Research Center (Funk, 2020), nearly half of Americans (48%) view GMOs as having positive or neutral health impacts, while 51% perceive them negatively. Notably, 30% of those who reported a neutral stance still expressed concerns about potential negative consequences of GMOs. Furthermore, another survey indicated that 54% of participants admitted to knowing very little or nothing at all about GMOs, and 25% claimed they had never heard of them (Hallman et al., 2013; Wunderlich & 10 Gatto, 2015). Discussions on Twitter (X) reflected similar trends, with 54% of posts being neutral and 32% negative, while only 14% expressed positive sentiment (Sohi et al., 2023). Despite the consistent trends, studies by Jun et al. (2020) and Howell et al. (2018) revealed that public sentiment on Twitter (X) could shift significantly in response to real-world events, such as reports from the National Academies of Sciences, Engineering, and Medicine (NASEM) and the United States Department of Agriculture’s (USDA) closure of public consultations on GMO regulations. Additionally, recent issues related to pandemics and climate change may have further influenced public perceptions of GMOs. These overlapping events highlight the complexities underlying public discussions about GMOs, which are shaped by various social dynamics, including the role of bots. This research aims to uncover how public discourse has evolved and the specific role that bots play in this context. The current topic provides the context necessary to study the impact of bots’ social influence on GMO discussions. To systematically explore the impact of bots on humans, the proposed dissertation conducted a three-part study. Study 1: An Observational Study Study 1 utilized observational data from Twitter (X) to analyze bots’ direct influence on humans or the other way around. By examining linguistic components in social media posts, such as topical themes, politicization, polarization, and emotions, this study compared the nuances in content shared by bots and humans (H1, H2) and evaluated the direction of influence (RQ1-RQ3). 11 Studies 2 and 3: Controlled Online Experiments The linguistic relationships identified in Study 1, however, are insufficient to directly claim a causal influence of bots on humans without establishing nonspuriousness. To address this gap, Studies 2 and 3 aim to investigate the causal influence of bots on human attitudes through a controlled online experiment. Specifically, in response to growing concerns about LLM-powered bots, these studies incorporate content generated by LLMs, including both LLM-powered bot content and LLM-powered human content, to assess whether people have ability to discern the source of different types of contents (RQ4-RQ6) and their effects on human attitudes (RQ7- RQ9). For comparison, content from bots and humans identified in Study 1 was used to evaluate the impact of LLM-generated content versus non-LLM-generated content. Given the complexity of the information ecosystem on social media, it is crucial to integrate diverse content types that are likely to appear on these platforms and assess their impact on one’s attitudes. 
This second study simulated personas from existing research (i.e., McFadden et al., 2024) using an LLM (i.e., Llama-3) to participate in the experiment. Further, Study 3 recruited actual human subjects to verify the results.

Table 1
Details of the Main Variables

Study 1

Topics. Conceptual definition: Topics refer to frequently observed themes over time. Operational definition: BERTopic generates topical themes based on closely clustered keywords. Measurement level: Categorical. Unit of observation: Aggregated themes of tweets across time.

Bot status. Conceptual definition: Bot status indicates whether users are classified as bots or humans, with bots exhibiting human-mimicking behaviors that humans do not necessarily display. Operational definition: Bots are detected with temporal information using LLMs (i.e., GPT-4). Measurement level: Categorical. Unit of observation: An individual user.

Politicization. Conceptual definition: Politicization refers to the extent to which an issue or event becomes personalized, with narratives shifting from broader economic, social, and political analysis to focus on competing actors, while also involving political figures who represent opposing factions, driving controversy within the political arena (Chinn et al., 2020). Operational definition: Two dictionaries (i.e., Republican and Democrat) developed in Chinn et al. (2020) were implemented. Measurement level: Continuous. Unit of observation: The number of politicized words observed from an individual user.

Polarization. Conceptual definition: Polarization refers to the degree of political bias, typically categorized as right-wing, left-wing, or centrist, that is often reflected in media (Baly et al., 2020). Operational definition: A pretrained model from Hugging Face (bucketresearch/politicalBiasBERT) was used to calculate the political bias score of texts (Baly et al., 2020). The model initially adopted the labels left, center, and right, which were recoded into the numeric values -1, 0, and 1, respectively. Measurement level: Categorical. Unit of observation: Aggregated polarization of a user's tweets.

Emotions. Conceptual definition: Emotions in this study are based on Ekman's basic emotions, which include fear, anger, disgust, sadness, joy, and surprise. This study focuses on the expression of fear, anger, disgust, and joy, which are considered invariant across cultures and species (Ekman, 1992). Additionally, neutrality is included to account for objectivity in the linguistic features being examined. Operational definition: A pretrained model from Hugging Face (j-hartmann/emotion-english-distilroberta-base) was used to measure the level of each emotion (e.g., fear, anger, disgust, neutrality, joy) expressed in tweets on a scale from 0 to 1. Measurement level: Continuous. Unit of observation: Aggregated emotions of a user's tweets.

Study 2 and Study 3

Attitude. Conceptual definition: Attitudes indicate one's stance toward GMO editing (McFadden et al., 2024). Operational definition: Self-reported ratings of how likely individuals are to advocate for the position of the message, on a scale from 1 to 5. Measurement level: Continuous. Unit of observation: Individual participants.

Bot likeness. Conceptual definition: Bot likeness refers to the extent to which people perceive the user of the message as a bot. Operational definition: Self-reported ratings of how likely individuals are to perceive the content as created by bots, on a scale from 1 to 5. Measurement level: Continuous. Unit of observation: Individual participants.

Human likeness. Conceptual definition: Human likeness refers to the extent to which people perceive the user of the message as a human. Operational definition: Self-reported ratings of how likely individuals are to perceive the content as created by humans, on a scale from 1 to 5. Measurement level: Continuous. Unit of observation: Individual participants.

STUDY 1

Study 1 Research Design

Research Procedure

The dataset comprises 26,040,042 tweets extracted from July 2009 to September 2019.
A list of hashtags and keywords (Table 2) was used to target and include posts relevant to GMOs, which were extracted through the platform's application programming interface (API). Given the large size of the dataset, users were selected based on their activity levels. Specifically, the total number of posts by each user was calculated, and 800 users from each quartile of activity were selected; in total, 2,400 user accounts were selected, and a total of 1,449,994 tweets posted by these users were used for analysis. This approach maintains dataset representativeness while reducing its size. Before the analysis, the data were cleaned following the procedure outlined in the next section. Once cleaning was completed, the variables were created based on the measurement descriptions.

Table 2
Keywords and Hashtags used for Tweet Extraction

Keywords: gmo, gmos, gm food, gmfoods, gm foods, genetically modified, genetic modified, genetical modified, genetic modification, genetical modification, genetically modification, genetic engineering, genetical engineering, genetically engineering, genetical engineered, genetically engineered, transgenic, transgenics, transgenesis, transgenically, transgenes, transgene

Hashtags: #gmo, #gmos, #gmfood, #gmfoods, #gm_food, #gm_foods, #geneticallymodified, #genetically_modified, #geneticallymodifiedfood, #geneticallymodifiedfoods, #genetically_modified_food, #genetically_modified_foods, #geneticmodification, #genetic_modification, #geneticengineering, #geneticalengineering, #geneticallyengineering, #genetic_engineering, #geneticengineered, #genetically_engineered, #genetically, #transgenic, #transgenics, #transgenes, #transgene

Notes. Tweets written in English but published from all countries were extracted.

Data Processing

For the topical analysis, tweets were preprocessed; however, the preprocessing was minimized because BERT is a transformer-based model that is already trained to contextualize meanings between words. The extracted tweets were preprocessed using the Python package preprocessor. First, unnecessary features such as uniform resource locators (URLs), mentions, and reserved words (e.g., RT, FAV) were eliminated from the tweets; hashtags were kept, with the sign (#) removed, as they often capture meaningful content in tweets. Lastly, extra white spaces and punctuation were removed, and all letters were converted to lowercase for uniformity. For all analyses other than the topical analysis, the raw full texts were used.

Measurements

The conceptual and operational definitions for each variable are summarized in Table 1.

Bot Status. The present study used LLMs to detect bots, as tested in Heo et al. (2024). As recommended, temporal information, along with example cases, was provided to the LLMs to distinguish between bots and humans. Specifically, both correct and incorrect cases were presented as examples to enhance task performance. This approach was informed by multiple rounds of pilot studies, including coding by human experts (Appendix B). In the coding process, bots were assigned a value of 1, while humans were assigned a value of 0.

Topics. Latent topics were extracted using unsupervised topic modeling, specifically BERTopic (Grootendorst, 2022). Topic modeling was applied separately to bots and to human users.

Politicization. Two dictionaries (i.e., Republican and Democrat), developed by Chinn et al.
(2020) were implemented to tally the frequency of words found in each dictionary (Appendix 17 A). Chinn et al. created these dictionaries as part of their exploration into politicization, specifically within COVID-19 news coverage. By analyzing the frequency of political party or political affiliation terms within various texts, their study demonstrates how such language can mark politicization, highlighting the subtle ways language may influence audience perception and contribute to politicization of an issue. This approach to tallying frequencies based on party- affiliated language provides a quantitative lens for assessing political discourse across diverse media channels and contexts. Polarization. In investigating polarization, a pretrained model in Hugging Face, specifically bucketresearch/politicalBiasBERT, developed by Baly et al. (2020), was used to calculate the political bias score in texts. The model is designed to classify texts along a political spectrum, initially using labels–left, center, and right–and recoding them into numeric values of - 1, 0, or 1, respectively. Baly et al. trained this model to recognize nuanced linguistic patterns and biases across political ideologies, allowing it to assess ideological leanings in a wide range of textual content. This numeric recoding facilitates a more standardized, quantitative analysis of political bias, enabling comparisons of language patterns across different political orientations. Emotions. A sentence-level analysis will be conducted, which is an extension of the word–and lexical–level analysis that has been used in previous communication literature (Rudkowsky et al., 2018) using a pretrained model in Hugging Face (j-hartmann/emotion- english-distilroberta-base). Emotions, including anger, disgust, fear, joy, and neutrality, were evaluated on a scale from 0 (no emotion) to 1 (extreme emotion). 18 Table 3 Topical Themes of Bots and Humans Topic Keywords Bot Gene editing technology geneediting, im, think, those, genomeediting, crisprcas9, benefits, ag, my, talk GMO TweetZUP1 trending 1h, tweetzup, trending, page, been, has, for, the, gmo, popped Non-GMO snack Non-GMO snack Non-GMO snack Non-GMO snack Non-GMO snack Non-GMO snack Non-GMO snack Non-GMO snack Human everybody, perfect, glutendairy, snack, bars, kosher, healthy, keep, free, nongmo ideal, everybody, glutendairy, snack, bars, kosher, healthy, keep, free, nongmo ideal, everyone, glutendairy, snack, bars, kosher, healthy, keep, free, nongmo perfect, everyone, glutendairy, snack, bars, kosher, healthy, keep, free, nongmo ideal, everyone, glutendairy, snack, bars, kosher, healthy, keep, free, nongmo ideal, everybody, glutendairy, snack, bars, kosher, healthy, keep, free, nongmo everybody, perfect, glutendairy, snack, bars, kosher, healthy, keep, free, nongmo perfect, everyone, glutendairy, snack, bars, kosher, healthy, keep, free, nongmo The GMO controversy are, my, but, if, crops, gmos, not, so, we, organic AquAdvantage Salmon salmon, frankenfish, fish, consumption, aquabounty, fda, approved, wild, approves, animal H.112—Protecting the right to know GMOs vermont, vermonts, law, vt, effect, treading, passes, attorney, vts, h112 Physicians for social responsibility supports mandatory labeling of GMOs psr, responsibility, physicians, social, support, foods, for, labeling, of, gmo 1TweetZUP is a free tool that alerts users to trending topics and real-time events on Twitter (X). 
Table 3 (cont'd)

A bill placing a moratorium on the cultivation of GMOs: maui, hawaii, moratorium, kauai, county, hawaiis, island, judge, mauis, hawaiians
Glyphosate and health related concerns: glyphosate, probable, carcinogen, deterioration, glyphosates, urine, genotoxic, classifies, roundup, geopolitics
Genetically engineered bacteria: newsgenetically, lives, save, bacteria, can, science, engineered, inoperable, listerine, anaerobic
Non-GMO products: vegetariansafe, 5000mcg, biotin, nails, hair, skin, glutenfree, healthy, 100, here
GMOs fighting widespread bee-killing mites: bees, beekeepers, honey, 100000, bee, killing, neonicotinoid, permit, ants, flying
Conspiracy theories and public health beliefs: chemtrails, vaccines, fluoride, cdcwhistleblower, vaccine, skies, aluminum, radiation, sb277, water

Study 1 Results

After implementing the modified bot detection approach proposed by Heo et al. (2024), of the 2,400 user accounts, 1,742 (76%) were classified as bots whereas 448 (22%) were categorized as humans. The remaining 60 users (3%) were uncategorized; these users showed no particular distinguishing pattern, and the failure was attributed to a technical issue with GPT-4 when generating responses. A total of 1,449,994 tweets posted by 1,742 bots and 448 humans were used for the analysis. In order to test H1, BERTopic was used to extract representative themes across the tweets of each entity, and the 10 most dominant topics were selected for this report. In terms of bots' topics, the most prevalent topic included keywords such as 'geneediting,' 'genomeediting,' 'crisprcas9,' and 'benefits,' which referred to gene editing technologies; yet the representative documents of this topic mostly questioned GMOs. Moreover, the same keywords appeared repeatedly across different topics; terms including 'glutendairy,' 'snack,' 'bars,' 'kosher,' 'healthy,' 'keep,' and 'nongmo' were repeated in seven other categories. The keywords in bots' topics imply generic promotion of non-GMO foods. In the topics of humans' tweets, on the other hand, there were more political and social controversies over GMOs. The most prevalent topic included terms such as 'if,' 'crops,' 'gmos,' 'not,' 'so,' and 'organic,' and the relevant documents pointed out GMO labeling issues and called out companies that support GMO foods. Similarly, other topics depicted social issues surrounding GMOs. Notably, the issue of genetically modified salmon approved by the FDA was captured by keywords such as 'salmon,' 'frankenfish,' 'fish,' 'consumption,' 'aquabounty,' 'fda,' 'approved,' 'wild,' 'approves,' and 'animal.' Moreover, the issue of Maui County's moratorium on GMO crops in Hawaii was revealed by terms including 'maui,' 'hawaii,' 'moratorium,' 'kauai,' 'county,' 'hawaiis,' 'island,' 'judge,' 'mauis,' and 'hawaiians.' In sum, the topics of tweets posted by humans and bots were distinct: bots were more likely to promote non-GMO products, whereas humans were more interested in discussing real-life events related to GMOs (Table 3). Thus, the results support H1. Despite the distinct topics, it is important to note that some overlapping themes, such as questioning GMOs, suggest potential interactions between humans and bots. To test H2, the levels of politicization, polarization, and emotions including anger, disgust, fear, neutrality, and joy were compared between bots and humans. Mann-Whitney U tests were conducted because these variables did not meet parametric assumptions.
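To make this comparison concrete, the sketch below (not the study's actual analysis script) shows how per-tweet emotion scores can be obtained from the pretrained classifier named in Table 1 and then compared between bots and humans with Mann-Whitney U tests. The file name and the column names (text, is_bot) are illustrative assumptions.

```python
# A minimal sketch, assuming a tweet table with illustrative columns
# 'text' and 'is_bot' (1 = bot, 0 = human); not the actual analysis script.
import pandas as pd
from scipy.stats import mannwhitneyu
from transformers import pipeline

# Pretrained emotion classifier described in the Measurements section.
emotion_clf = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,          # return a score for every emotion label
    truncation=True,
)

def emotion_scores(texts):
    """Score each tweet on anger, disgust, fear, joy, neutral, etc. (0-1)."""
    results = emotion_clf(list(texts), batch_size=32)
    return pd.DataFrame([{r["label"]: r["score"] for r in row} for row in results])

tweets = pd.read_csv("gmo_tweets.csv")                     # hypothetical input file
scores = pd.concat([tweets, emotion_scores(tweets["text"])], axis=1)

# Compare bot and human score distributions for each emotion.
for emotion in ["anger", "disgust", "fear", "neutral", "joy"]:
    bots = scores.loc[scores["is_bot"] == 1, emotion]
    humans = scores.loc[scores["is_bot"] == 0, emotion]
    u_stat, p_val = mannwhitneyu(bots, humans, alternative="two-sided")
    print(f"{emotion}: U = {u_stat:.0f}, p = {p_val:.4f}")
```

The politicization counts and the politicalBiasBERT scores can be compared between the two groups with the same test in the same way.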
Initially, I posited that bots would be more likely than humans to post politicized and polarized tweets. However, the results indicate that humans used more politicized and more polarized language in their content than bots did (politicization: p = .0002; polarization: p < .001, Table 4). In terms of the negative emotions of anger, disgust, and fear, and in alignment with the hypothesis, anger, disgust, and fear were higher among bots than among humans (anger: p < .001; disgust: p < .001; fear: p < .001, Table 4). Furthermore, neutrality was higher for humans (neutrality: p < .001, Table 4), but bots showed a higher level of joy than humans (joy: p < .001, Table 4). That is, H2 is partially supported: unexpectedly, humans were more likely to use politicized, polarized, and neutral language, whereas bots expressed more anger, disgust, fear, and joy.

Table 4
Descriptive Statistics of Variables (Study 1)

Group (n): Polarization M (SD); Politicization M (SD); Anger M (SD); Disgust M (SD); Fear M (SD); Neutral M (SD); Joy M (SD)
Bot (n = 1,250,401): -0.09 (0.99); 0.00 (0.06); 0.15 (0.21); 0.01 (0.04); 0.46 (0.36); 0.16 (0.25); 0.10 (0.22)
Human (n = 112,845): -0.13 (0.99); 0.00 (0.05); 0.14 (0.21); 0.01 (0.08); 0.44 (0.37); 0.22 (0.29); 0.06 (0.15)
Bot + Human (n = 1,363,246): -0.09 (0.21); -0.09 (0.99); 0.15 (0.21); 0.01 (0.04); 0.46 (0.36); 0.16 (0.26); 0.10 (0.22)
Range: -1 – 1; 0 – 4; 0 – 1; 0 – 1; 0 – 1; 0 – 1; 0 – 1

Figure 1
Visualization of Linguistic Feature Trends Over Time, Aggregated by Month, Comparing Bots and Humans

To evaluate the causal relationships posed by the research questions (RQ1 through RQ3), vector autoregression (VAR) was used to run Granger causality tests and to generate impulse response functions (IRFs). Each linguistic score of bots and humans was aggregated by averaging the scores by month. Then, the prediction of humans' linguistic features from bots' features, and of bots' features from humans' features, was tested to infer the causal direction of the relationship. Granger causality tests specifically examine the lagged impact of each linguistic feature of one entity on the other over time. Furthermore, IRFs are employed to measure the magnitude and significance of the temporal responses of the linguistic features of each entity to the other. To incorporate the longitudinal effect of variables in the social media environment, a lag of 1 was selected. This decision was made given the short-lived nature of discussions on social media (Zhang et al., 2023). Further, the IRF horizon was set to 12 to project the dynamic effects of shocks over 12 months. Figure 1 shows how each monthly aggregated variable changed over time.

To specifically address the directional influence between bots and humans, Granger causality tests were conducted. When evaluating the directional relationship between the politicization of bots and humans (RQ1), the results indicated that the politicization of bots Granger-caused the politicization of humans (p = 0.03) (Table 5). Conversely, the influence from humans to bots was not statistically significant (p = 1) (Table 5). In terms of polarization (RQ2), neither direction demonstrated statistical significance (bot to human: p = 0.35; human to bot: p = 1) (Table 5). Furthermore, with respect to the directional relationship of anger (RQ3a), the findings revealed that humans' anger Granger-caused bots' anger (p = 0.006) (Table 5), but not vice versa (p = 1.00) (Table 5).
In terms of fear (RQ3c), although bots influenced humans more than the reverse, the relationships were nonsignificant (bots to humans: p = 0.98; humans to bots: p = 0.12) (Table 5). For disgust (RQ3b), both directions were significant at the 0.05 level (bots to humans: p = 0.00; humans to bots: p = 0.01) (Table 5).

Figure 2
Impulse Response Functions (IRFs) of Disgust, Analyzing the Reciprocal Impact Between Bots and Humans

Further, IRFs were produced to examine the projected effects over time. The results showed that increases in the lagged polarization and anger of bots decreased the corresponding levels for humans, yet these relationships were nonsignificant (polarization: B = -0.29, p = 0.06; anger: B = -0.07, p = 0.58). In contrast, increases in the lagged politicization, fear, and disgust of bots increased the corresponding levels for humans; these relationships were nonsignificant for politicization (B = 0.02, p = 0.86) and fear (B = 0.02, p = 0.87) but significant for disgust (B = 1.68, p = 0.00) (Figure 2). The opposite direction (i.e., humans to bots) was also tested. The results showed that increases in the lagged politicization, polarization, and anger of humans decreased the corresponding levels for bots, yet these relationships were nonsignificant (politicization: B = -0.08, p = 0.31; polarization: B = -0.03, p = 0.52; anger: B = -0.05, p = 0.36). In contrast, increases in the lagged fear and disgust of humans increased the corresponding levels for bots, yet these relationships were also nonsignificant (fear: B = 0.07, p = 0.24; disgust: B = 0.02, p = 0.31).

Table 5
Granger Causality Test (Based on VARs with Lag of 1)

Bots → Humans
Bot Politicization → Human Politicization: p = 0.028*
Bot Polarization → Human Polarization: p = 0.347
Bot Anger → Human Anger: p = 0.264
Bot Disgust → Human Disgust: p = 0.000***
Bot Fear → Human Fear: p = 0.977

Humans → Bots
Human Politicization → Bot Politicization: p = 0.260
Human Polarization → Bot Polarization: p = 0.351
Human Anger → Bot Anger: p = 0.006**
Human Disgust → Bot Disgust: p = 0.014*
Human Fear → Bot Fear: p = 0.119

*p < .05, **p < .01, ***p < .001, two-tailed p-values
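As a rough illustration of the time-series procedure described above (monthly aggregation, a lag-1 VAR, Granger causality tests in both directions, and IRFs traced over 12 months), the sketch below shows how the analysis for a single feature such as disgust could be run in statsmodels. The input file and column names are illustrative assumptions, not the study's actual data.

```python
# A minimal sketch, assuming per-tweet scores with illustrative columns
# 'created_at', 'is_bot', and 'disgust'; not the actual analysis script.
import pandas as pd
from statsmodels.tsa.api import VAR

scores = pd.read_csv("tweet_scores.csv", parse_dates=["created_at"])  # hypothetical file

# Aggregate by averaging the monthly scores, separately for bots and humans.
monthly = (
    scores.groupby([pd.Grouper(key="created_at", freq="M"), "is_bot"])["disgust"]
    .mean()
    .unstack("is_bot")
    .rename(columns={0: "human_disgust", 1: "bot_disgust"})
    .dropna()
)

# Fit a VAR with a lag of 1 month.
results = VAR(monthly[["bot_disgust", "human_disgust"]]).fit(1)

# Granger causality tests in both directions.
print(results.test_causality("human_disgust", ["bot_disgust"], kind="f").summary())
print(results.test_causality("bot_disgust", ["human_disgust"], kind="f").summary())

# Impulse response functions over a 12-month horizon.
irf = results.irf(12)
irf.plot(orth=False)
```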
Such results indicate the unique roles that bots and humans play in shaping public discourse. However, even though each entity generated distinct content, the presence of overlapping topics, specifically questioning GMOs, highlights a mutual interest that is possibly shared in public discourse.

Next, specific linguistic features were targeted to examine whether bots expressed more politicization, polarization, and negative emotions such as anger, fear, and disgust. Additionally, the study investigated whether bots showed less neutrality and fewer positive emotions such as joy. Unexpectedly, humans' tweets were more likely to include polarized and politicized language. However, negative emotions like anger, disgust, and fear were more prevalent in bot content than in human content. Although joy was expected to be more prominent in human tweets, it was found to be higher in bot content, whereas neutrality was higher in human tweets. These findings confirm previous studies that have identified bots as highly emotionally charged (particularly with negative emotions), although bots were found to express less politicized and polarized language than humans. This discrepancy may be linked to the results on topical frames, which suggest that bots tend to generate more promotional and advertising-related content. Notably, however, bots were found to generate more angry, disgusted, and fearful content than humans, which may reflect the role of bots in amplifying perceived uncertainties, threats, or risks related to GMOs.

Further, the direct impact of bots on humans was assessed, particularly in terms of how bot content may affect human content. The results showed that politicization and disgust expressed by bots had longitudinal effects on human content creation. Notably, the disgust expressed by bots had significant long-term impacts on humans, suggesting a strong dynamic response, as indicated by the IRFs. This effect was more pronounced than that of the other variables, highlighting the potential role of bots in shaping human content over time.

Table 6
Summary of Results for Hypotheses and Research Questions in Study 1

H1: The topical themes of bots' tweets will differ from those of humans' tweets.
Summary: H1 is supported; bots' tweets and humans' tweets exhibited distinct topical issues.

H2: Bots' tweets will be more likely than humans' tweets to exhibit 1) politicization, 2) polarization, and 3) negative emotions including a) anger, b) disgust, and c) fear, while exhibiting less 4) neutrality and 5) positive emotion (i.e., joy).
Summary: H2-1 is not supported; humans' tweets exhibited a greater level of politicization than bots' tweets. H2-2 is not supported; humans' tweets exhibited a greater level of polarization than bots' tweets. H2-3 is supported; bots' tweets showed higher levels of anger, disgust, and fear than humans' tweets. H2-4 is partially supported; bots' tweets showed a lower level of neutrality but a higher level of joy than humans' tweets.

RQ1: Does the politicization of bots' tweets influence the politicization of humans' tweets?
Summary: A Granger causality test indicates that the politicization of bots' tweets influenced that of humans. However, the IRF results did not further support this directional influence.

RQ2: Does the polarization of bots' tweets influence the polarization of humans' tweets?
Summary: A Granger causality test indicates that the polarization of bots' tweets did not influence that of humans (or vice versa). The IRF results also did not support either directional influence.

RQ3: Do the negative emotions in bots' tweets—1) anger, 2) disgust, and 3) fear—influence the emotions in humans' tweets?
Summary: RQ3-1: A Granger causality test indicates that the anger in humans' tweets influenced bots' tweets. However, the IRF results did not further support this directional influence. RQ3-2: A Granger causality test indicates that the disgust in bots' tweets influenced humans' tweets. The IRF results also supported this directional influence. RQ3-3: A Granger causality test indicates that the fear in bots' tweets did not influence humans' tweets or vice versa. Additionally, the IRF results did not further support this directional influence.

STUDY 2

Study 2 Research Design

Stimuli

To evaluate the impact of LLM-powered bots, GPT-4 was used to create bot content consisting of five tweets. In parallel, to address the growing trend of humans using LLMs to generate content, comparable LLM-generated human content was also created. Additionally, tweets from bot and human accounts identified in Study 1 were compiled, with five tweets selected from each category (human or bot). In each stimulus, five tweets were presented. A compilation of tweets was provided as a reference to facilitate the content creation process, using the few-shot prompting technique. Importantly, the length of tweets was limited to 280 characters, in accordance with X guidelines. Additionally, the prompt instructed GPT-4 to include a URL to verify its ability to reference an external source. However, due to potential confounding effects, the URLs were kept consistent across conditions. Details of the instructions are included in Appendix C, and the stimuli are provided in Appendix A.

Note. This study used repeated URL links across conditions for two reasons. First, the links were not interactively used by participants, as they were provided to the LLMs. Second, when creating the stimuli using GPT-4, it was found that GPT-4 was capable of incorporating URL links when relevant to the context. To maintain uniformity, the same URL links were embedded in each tweet and used consistently across all conditions.

Research Procedure

An online experiment was conducted using personas developed from data provided by McFadden et al. (2024). This approach, validated by Hewitt et al. (2024), demonstrated a strong correlation with actual treatment effects. While Hewitt et al. (2024) utilized GPT-4 in their experiments, the present study employed Llama-3. This decision was made because GPT-4 had already been used for content creation, potentially leading to data contamination through memorization (e.g., Li et al., 2024; Jiang et al., 2024). Initially, participant personas were created using Llama-3, based on demographic data from McFadden et al. (2024). The dataset, provided by the first author of the paper, included demographic information such as race, age, education, gender, partisanship, income, location, region, and previous experience in fields related to food, agriculture, health, or medicine. In alignment with the method implemented by Hewitt et al. (2024) to simulate personas, a list of demographic information was provided to Llama-3, which used it to construct a persona for each participant; the persona then responded to the questions after being exposed to a stimulus.
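As an illustration of this persona-simulation step, the sketch below assembles one persona prompt and queries a Llama-3 chat model through the transformers pipeline. The model checkpoint, prompt wording, and demographic fields are illustrative assumptions and do not reproduce the exact instruction used in this study.

```python
# A minimal sketch of persona simulation with Llama-3 (assumed checkpoint and
# prompt wording); the actual instruction used in the study is in Appendix D.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",   # assumed model checkpoint
)

def simulate_persona_response(demographics: dict, stimulus: str) -> str:
    """Answer the attitude question in character as a persona built from one respondent's demographics."""
    persona = ", ".join(f"{k}: {v}" for k, v in demographics.items())
    messages = [
        {"role": "system",
         "content": f"You are a survey respondent with the following profile: {persona}. "
                    "Stay in character when answering."},
        {"role": "user",
         "content": f"You just read these tweets:\n{stimulus}\n\n"
                    "What is your opinion about the safety of gene editing in the context of "
                    "food and agriculture? Answer with a number from 1 (extremely unsafe) "
                    "to 5 (extremely safe)."},
    ]
    output = generator(messages, max_new_tokens=50, do_sample=False)
    return output[0]["generated_text"][-1]["content"]   # the assistant's reply

# Example persona built from the kinds of fields available in McFadden et al. (2024).
example = {"age": 44, "gender": "female", "education": "bachelor's degree",
           "partisanship": "Democrat", "region": "Midwest"}
print(simulate_persona_response(example, "Tweet 1 ... Tweet 5"))
```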
(an actual script is available in Appendix D). The created personas were randomly assigned to one of the experimental conditions, where they were exposed to a stimulus and asked to indicate their stance on the issue presented. Participants As described earlier, the present study generated personas based on previous research. This section outlines the demographics of participants from McFadden et al. (2024), which the present study utilized for Llama-3. In total, 3,125 participants responded to the survey. More than half of participants identified as female (55%). The average age was 44.39 (SD=19.31). Of all the participants 71% were White, followed by Black (14%) and Hispanic (7%); additionally, 36% of them reported to be identified as Democrats and 28% as Republicans (see Table 8 for details). Measurements The conceptual and operational definitions for each variable are summarized in Table 1. Bot-likeliness. Participants will be asked to measure the perceived bot-likeness of the message on a 0-100 scale (0 = not at all to 100 = very much). 32 Human-likeness. Participants will be asked to measure the perceived human-likeness of the message on a 0-100 scale (0 = not at all to 100 = very much). Attitudinal Change. The same question used by McFadden et al. (2024) was employed to evaluate participants’ stance on the safety of gene editing in the context of food and agriculture. The question asked, "What is your opinion about the safety of gene editing in the context of food and agriculture?" Participants responded using a 5-point Likert scale, ranging from "extremely unsafe" (1) to "extremely safe" (5). Participants’ initial survey answers were subtracted from their responses from the present experiment to assess any changes in their attitudes. The higher scores indicate the extent to which participants changed their response in accordance with the given message. Study 2 Results In the experiment, personas were generated using demographic information from McFadden et al. (2024) with Llama-3, and these personas responded to the questions. In total, 3,125 personas were created, and corresponding responses were recorded. Occasionally, Llama-3 did not generate responses for specific questions; in these cases, available responses were considered and included in the analysis. To address the research questions, the study employed planned contrast analysis, which enabled comparisons between specific conditions. As the research questions required comparisons between two conditions, contrast weights of -1 and 1 were assigned to the selected conditions, with a weight of 0 assigned to the remaining conditions. The details are included in Table 9. RQ4 examines the impact of LLM-generated content compared to non-LLM-generated content on the extent to which people perceive the content as being created by bots or humans. When comparing the LLM-generated bot condition to the human condition (RQ4-1a), 33 participants perceived LLM-generated bot content as more bot-like than human content (t(1190.53) = 16.57, p = 0.00). Conversely, when comparing the LLM-generated bot condition with the human condition (RQ4-1b), participants in the LLM-generated bot condition perceived the content as more human-like than in the human condition (t(1293.04) = 2.71, p = 0.01). 
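Because each planned contrast here assigns nonzero weights to only two conditions, every comparison reported in this section reduces to a two-group test. The sketch below (with illustrative condition labels and column names) computes one such contrast as a Welch two-sample t-test, which is consistent with the fractional degrees of freedom reported for these comparisons.

```python
# A minimal sketch of a two-condition planned contrast (weights +1/-1, 0 elsewhere),
# computed as a Welch t-test; condition labels and column names are illustrative.
import pandas as pd
from scipy.stats import ttest_ind

responses = pd.read_csv("persona_responses.csv")   # hypothetical simulation output

def planned_contrast(df, dv, cond_a, cond_b):
    """Compare two conditions on one dependent variable, ignoring the other conditions."""
    a = df.loc[df["condition"] == cond_a, dv].dropna()
    b = df.loc[df["condition"] == cond_b, dv].dropna()
    return ttest_ind(a, b, equal_var=False)        # Welch correction (unequal variances)

# RQ4-1a style comparison: LLM-generated bot content vs. human content on bot-likeness.
t_stat, p_val = planned_contrast(responses, "bot_likeness", "llm_bot", "human")
print(f"t = {t_stat:.2f}, p = {p_val:.3f}")
```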
Next, the LLM-generated bot condition was compared to the bot condition (RQ4-2a), revealing that participants in the LLM-generated bot condition reported higher levels of bot- likeness compared to those in the bot condition, though the effect was not significant (t(1361.23) = 1.77, p = 0.08). When comparing the LLM-generated bot condition with the bot condition (RQ4-2b), the results showed that the bot condition had a higher mean score for human-likeness compared to the LLM-generated bot condition (t(1530.37) = -20.39, p = 0.00). The comparison of bot-likeness between the LLM-generated human condition and the human condition (RQ4-3a) revealed that the human condition showed a higher level of bot- likeness compared to the LLM-generated human condition (t(1005.08) = -2.02, p = 0.04). However, when comparing the LLM-generated human condition with the human condition (RQ4-3b), the mean score for human-likeness in the LLM-generated human condition was significantly higher than in the human condition (t(1432.47) = 66.23, p = 0.00). Lastly, when comparing the LLM-generated human condition with the bot condition (RQ4-4a), participants in the bot condition perceived the content as more bot-like than those in the LLM-generated human condition (t(1425.42) = -30.05, p = 0.00). When comparing perceived human-likeness between the LLM-generated human condition and the bot condition (RQ4-4b), participants in the LLM-generated human condition perceived the content as more human-like compared to those in the bot condition (t(1543.08) = 31.30, p = 0.00). 34 RQ5a examines whether there is a discrepancy in the perception of bot-likeness between LLM-generated human content and LLM-generated bot content. The results indicated that participants in the LLM-generated bot condition perceived the content as more bot-like compared to those in the LLM-generated human condition (t(1312.16) = 28.01, p = 0.001). RQ5b explores whether participants perceive the human-likeness of LLM-generated human content differently from LLM-generated bot content. The results showed that LLM-generated human content was perceived as more human-like compared to LLM-generated bot content (t(1499.02) = -50.23, p = 0.00). RQ6a investigates whether there is a perceived difference between the bot condition and the human condition. The results indicated that the bot condition was perceived as more bot-like compared to the human condition (t(1048.14) = 16.23, p = 0.001). RQ6b examines whether there is a difference in the perception of human-likeness between the bot condition and the human condition. The results showed that participants in the bot condition perceived the content as more human-like compared to those in the human condition (t(1385.17) = 28.19, p = 0.00). RQ7 examines the comparative effects of content generated by LLM-generated bots versus non-LLM-generated content on attitude change regarding gene editing in food and agriculture. First, the LLM-generated bot condition was compared with the human condition (RQ7-1). The results indicated that the human condition led to a greater attitude change compared to the LLM-generated bot condition (t(1530.56) = -4.11, p = 0.001). Next, the LLM-generated bot condition was compared with the bot condition (RQ7-2). The results showed a negligible difference in attitude change between the two conditions, which was not significant (t(1558.51) = 0.04, p = 0.97). 35 The attitude change induced by the LLM-generated human condition was compared with the human condition (RQ7-3). 
The results showed a slightly higher attitude change for the human condition compared to the LLM-generated human condition, but the difference was not significant (t(1491.15) = -1.54, p = 0.12). The LLM-generated human condition was compared with the bot condition (RQ7-4), and the results showed that LLM-generated human content induced a greater attitude change compared to bot content (t(1541.88) = 2.93, p < .01).

RQ8 explores the comparative impact of LLM-generated bot content versus LLM-generated human content. The results indicated that LLM-generated human content led to a greater attitude change among participants compared to LLM-generated bot content (t(1550.63) = -2.93, p < .01). Lastly, RQ9 examines the difference in attitude change between the human condition and the bot condition. The results showed that participants in the human condition experienced a greater attitude change compared to those in the bot condition (t(1542.14) = -4.10, p < .001).

Study 2 Discussion
Expanding on Study 1, which identified the influence of bots on humans on Twitter, Study 2 primarily focused on examining individuals' ability to discern LLM-generated content from non-LLM-generated content, as well as the extent to which they are influenced by LLM-generated content compared to non-LLM-generated content (the key results of Study 2 are summarized in Table 7).

An interesting finding emerged regarding LLM-generated human content, which was perceived as less likely to be created by bots than bot content but more likely to be created by bots than human content; LLM-generated human content was, however, perceived as more likely to be created by humans than both human content and bot content. Additionally, LLM-generated bot content was more likely to be recognized as bot-generated than human content, but was perceived as equally bot-generated as bot content; LLM-generated bot content was more likely to be perceived as human-generated than human content, but less likely than bot content. These findings suggest that LLM-generated human content may be perceived as more human-like and less bot-like than non-LLM-generated content, potentially deceiving individuals and influencing their perceptions, thereby affecting attitude change. Yet, people were generally not equipped to accurately identify the source of the content. In terms of attitude change, LLM-generated bot content had an impact equivalent to that of bot content, and its impact was significantly smaller than that of human content. However, although LLM-generated human content produced slightly less attitude change than human content, the difference was not significant, and it had a greater impact on attitude change than bot content. The results thus indicate that LLM-generated human content could affect people's attitudes as much as human content.

Additionally, this study evaluated how effectively individuals distinguished between LLM-generated bot content and LLM-generated human content, as well as the impact of each type of content on attitude change. Interestingly, participants were more likely to perceive LLM-generated bot content as being created by bots, and less likely to perceive it as human-generated, compared to LLM-generated human content. Moreover, participants' attitudes were more strongly swayed by LLM-generated human content than by LLM-generated bot content.
These findings underscore the potential risk of LLM-generated human content, which may deceive individuals more effectively than LLM-generated bot content.

Finally, the present study compared bot content and human content to assess how well individuals can discriminate between the two types, as well as how each type influences attitude change. Bot content was more likely to be perceived as bot-generated than human content, but it was still seen as more human-like than human content. Despite this perceptual ambiguity, human content led to a greater attitude change than bot content.

Table 7. Summary of Results for Hypotheses and Research Questions in Study 2

RQ4-1: Do people perceive human content and LLM-generated bot content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: LLM-generated bot content was perceived as more likely to be created by bots than human content. However, LLM-generated bot content was also more likely to be perceived as created by humans compared to human content.

RQ4-2: Do people perceive bot content and LLM-generated bot content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: Bot content was not perceived differently from LLM-generated bot content in terms of bot-likeness. However, bot content was more likely to be perceived as created by humans compared to LLM-generated bot content.

RQ4-3: Do people perceive human content and LLM-generated human content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: LLM-generated human content was perceived as more likely to be created by bots than human content. However, LLM-generated human content was also perceived as more likely to be created by humans than human content.

RQ4-4: Do people perceive bot content and LLM-generated human content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: Bot content was perceived as more likely to be created by bots than LLM-generated human content. Additionally, LLM-generated human content was perceived as more likely to be created by humans than bot content.

RQ5: Do people perceive LLM-generated bot content and LLM-generated human content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: LLM-generated bot content was perceived as more likely to be created by bots than LLM-generated human content. Additionally, LLM-generated human content was perceived as more likely to be created by humans than LLM-generated bot content.

RQ6: Do people perceive bot content and human content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: Bot content was perceived as more likely to be created by bots than human content. However, bot content was also perceived as more likely to be created by humans than human content.

RQ7-1: Do LLM-generated bot content and human content affect individuals' attitudes differently?
Summary: Human content led to a greater level of attitude change compared to LLM-generated bot content.

RQ7-2: Do LLM-generated bot content and bot content affect individuals' attitudes differently?
Summary: There was no difference in attitude change between LLM-generated bot content and bot content.

RQ7-3: Do LLM-generated human content and human content affect individuals' attitudes differently?
Summary: There was no difference in attitude change between LLM-generated human content and human content.

RQ7-4: Do LLM-generated human content and bot content affect individuals' attitudes differently?
Summary: LLM-generated human content led to a greater level of attitude change compared to bot content.

RQ8: Do LLM-generated human content and LLM-generated bot content affect individuals' attitudes differently?
Summary: LLM-generated human content led to a greater level of attitude change compared to LLM-generated bot content.

RQ9: Do human content and bot content affect individuals' attitudes differently?
Summary: Human content led to a greater level of attitude change compared to bot content.

Table 8. Sample Demographics in Study 2 and Study 3

                                                          Study 2, M (SD) or n (%)    Study 3, M (SD) or n (%)
Age                                                       44.39 (19.31)               45.24 (16.62)
Gender
  Female                                                  1713 (54.82)                701 (49.09)
  Male                                                    1412 (45.18)                682 (47.76)
  Other                                                                               45 (3.15)
Race
  White                                                   2227 (71.26)                939 (65.76)
  Black                                                   446 (14.27)                 169 (11.83)
  Hispanic                                                219 (7.01)                  137 (9.59)
  Non-Black/Non-White/Non-Hispanic                        233 (7.46)                  183 (12.82)
Partisanship
  Democrats                                               1126 (36.03)                664 (46.50)
  Republicans                                             864 (27.58)                 302 (21.15)
  Non-Republican/Non-Democrat                             1137 (36.38)                462 (32.35)
Education
  Less than high school degree                            118 (3.78)                  3 (0.21)
  High school degree (diploma or equivalent incl. GED)    787 (25.18)                 171 (11.97)
  Some college but no degree                              892 (28.54)                 298 (20.87)
  Associate's degree in college                           344 (11.01)                 168 (11.76)
  Bachelor's degree in college                            638 (20.42)                 537 (37.61)
  Graduate or professional degree (MS, PhD, JD, MD)       346 (11.07)                 251 (17.58)
Income
  Less than $10,000                                       217 (6.94)                  68 (4.76)
  $20,000 - $29,999                                       354 (11.33)                 103 (7.21)
  $30,000 - $39,999                                       332 (10.62)                 129 (9.03)
  $40,000 - $49,999                                       263 (8.42)                  119 (8.33)
  $50,000 - $59,999                                       335 (10.72)                 144 (10.08)
  $60,000 - $69,999                                       218 (6.98)                  109 (7.63)
  $70,000 - $79,999                                       274 (8.77)                  118 (8.26)
  $80,000 - $89,999                                       146 (4.67)                  82 (5.74)
  $90,000 - $99,999                                       174 (5.57)                  91 (6.37)
  $100,000 - $149,999                                     388 (12.42)                 232 (16.25)
Area
  Suburban                                                1570 (50.24)                754 (52.80)
  Urban                                                   842 (26.94)                 426 (29.83)
  Rural                                                   713 (22.82)                 248 (17.37)
Region
  South (Delaware, Maryland, Washington DC, Virginia, West Virginia, Kentucky, North Carolina, South Carolina, Tennessee, Georgia, Florida, Alabama, Mississippi, Arkansas, Louisiana, Texas, Oklahoma)
                                                          1339 (42.85)                545 (38.17)
  Midwest (Ohio, Michigan, Indiana, Wisconsin, Illinois, Minnesota, Iowa, Missouri, North Dakota, South Dakota, Nebraska, Kansas)
                                                          668 (21.38)                 273 (19.12)
  Northeast (Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut, New York, New Jersey, Pennsylvania)
                                                          575 (18.40)                 246 (17.23)
  West (Montana, Idaho, Wyoming, Colorado, New Mexico, Arizona, Utah, Nevada, California, Oregon, Washington, Alaska, and Hawaii)
                                                          543 (17.38)                 364 (25.49)

Table 9. Planned Contrast Analysis Weights, Condition Means (and Standard Deviations)

                        LLM-generated bot content    LLM-generated human content    Bot              Human
Contrast weights
  Contrast 1            1                            0                              0                -1
  Contrast 2            1                            0                              -1               0
  Contrast 3            0                            1                              0                -1
  Contrast 4            0                            1                              -1               0
  Contrast 5            1                            -1                             0                0
  Contrast 6            0                            0                              1                -1
Study 2, M (SD)
  Bot-likeness          61.45 (25.24)                28.78 (18.41)                  59.32 (20.08)    32.11 (38.59)
  Human-likeness        13.49 (24.25)                70.31 (20.13)                  37.21 (21.52)    10.72 (14.94)
  Attitude change       0.55 (1.15)                  0.71 (1.07)                    0.55 (1.19)      0.81 (1.33)
  n                     781                          781                            781              782
Study 3, M (SD)
  Bot-likeness          48.80 (27.31)                47.64 (27.03)                  48.82 (26.54)    48.28 (28.05)
  Human-likeness        58.42 (26.55)                57.93 (26.26)                  57.01 (25.91)    57.46 (27.26)
  Attitude change (a)   0.19 (0.78)                  0.20 (0.62)                    0.20 (0.67)      0.21 (0.73)
  n                     377                          365                            349              337

(a) The impact of attitude change for Study 3 was tested using multinomial logistic regression.

STUDY 3

Study 3 Research Design
Study 3 aimed to replicate Study 2 with human participants in order to verify the findings generated by Llama-3 in Study 2. Although other LLMs (e.g., GPT-4) have produced rigorous results in previous research (Hewitt et al., 2024), Llama-3 had not been actively employed in prior studies. By recruiting human participants in Study 3, it was possible both to assess the validity of the results generated by Llama-3 and to attempt to replicate the findings from Study 2.

Stimuli
In Study 2, only textual information was used as input for the LLMs. However, for Study 3, which involved human participants, the stimuli were designed as tweet-formatted images. While creating realistic tweet stimuli, additional external cues were incorporated. To control for confounding factors, these cues, such as username, user handle, profile image, post date and time, and number of likes, views, retweets, quotes, and saves, were kept constant across conditions (see Figure S-4).

Research Procedure
The research procedure in Study 3 was slightly modified to accommodate actual human participants. Participants were recruited via the online platform Prolific, and upon completion of the study, each participant received a $1 compensation. The study was advertised on Prolific, and individuals interested in participating joined voluntarily. To minimize bias, the true aim of the study was concealed at the outset in order to assess the genuine impact of LLM-generated and non-LLM-generated messages.

After consenting to participate, participants were first asked to complete a set of questions designed to assess their pre-existing stance on gene editing in the context of food and agriculture. These questions were mixed with bogus items to prevent participants from identifying the study's true purpose. Participants were then randomly assigned to one of the experimental conditions and asked to respond to the same attitude questions about gene editing, as well as questions regarding bot-likeness and human-likeness. Finally, participants completed demographic questions before the experiment concluded. At the end of the study, a debriefing message was provided to explain the true purpose of the experiment.

Participants
As mentioned, participants were recruited via Prolific. An initial power analysis indicated a required sample of 1,424, based on the small effect size of AI on attitudes reported by Huang and Wang (2023), but I oversampled to guard against poor-quality data. After removing responses that completed less than 60% of the questions or failed an attention-check question, 1,428 responses remained. Less than half of the participants identified as female (49%). The average age was 45.24 (SD = 16.62). Of all the participants, 66% were White, followed by Black (12%) and Hispanic (10%); additionally, 46% identified as Democrats and 21% as Republicans (see Table 8 for details).

Measurements
The conceptual and operational definitions for each variable are summarized in Table 1.
Bot-likeness. The same item was used as in Study 2.
Human-likeness. The same item was used as in Study 2.
Attitude Change. The same measure used in Study 2 was applied in Study 3 to calculate attitude change.
The key difference, however, was that in Study 3, because no prior survey measurement of attitudes was available, attitudes toward gene editing in food and agriculture were measured both immediately before and immediately after exposure to the stimuli.

Table 10. Multinomial Logistic Regression Model of Attitude Changes after Exposure to the Stimuli

Dependent variable category              Negative change (vs. no change)    Positive change (vs. no change)
Independent variables                    b (SE)                             b (SE)
(Reference group: Human content)
  LLM-generated bot content              0.18 (0.33)                        0.11 (0.19)
  LLM-generated human content            0.22 (0.33)                        0.16 (0.19)
  Bot content                            -0.08 (0.36)                       0.11 (0.19)
AIC                                      2046.97
†p < .1, *p < .05, **p < .01, ***p < .001, two-tailed.

Study 3 Results
In reporting the results for Study 3, a planned contrast analysis was conducted to address RQs 4 through 6, following the same methodology used in Study 2.

RQ4 explores the difference in bot-likeness and human-likeness perceptions between the LLM-generated conditions and the non-LLM-generated conditions. The difference in perceived bot-likeness was examined between the LLM-generated bot content and the human content (RQ4-1a). The results showed a minimal difference between these conditions (t(1420) = 0.72, p = 0.47). The LLM-generated bot condition was compared with the human condition to test the difference in perceived human-likeness (RQ4-1b), and the results were non-significant (t(1419) = 1.37, p = 0.17). The LLM-generated bot condition was also compared with the bot condition (RQ4-2a), and the results revealed no significant differences between the conditions (t(1420) = -0.02, p = 0.98). However, when comparing the LLM-generated bot condition to the bot condition in terms of perceived human-likeness (RQ4-2b), the results showed that the LLM-generated bot content was perceived as more human-like than the bot content (t(1419) = 2.02, p = 0.04).

When comparing the LLM-generated human condition with the human condition (RQ4-3a), participants perceived them as almost equally bot-like (t(1420) = -0.89, p = 0.38). Further, the difference in perceived human-likeness between the LLM-generated human condition and the human condition was minimal (RQ4-3b) (t(1419) = 0.68, p = 0.50). Regarding the comparison between the LLM-generated human condition and the bot condition (RQ4-4a), only a negligible difference was found in perceived bot-likeness (t(1420) = -1.63, p = 0.47), and when comparing the LLM-generated human condition with the bot condition for human-likeness (RQ4-4b), the results were also non-significant (t(1419) = 1.32, p = 0.19).

RQ5a explores the difference in perceived bot-likeness between the LLM-generated bot condition and the LLM-generated human condition, with the result showing no significant difference (t(1420) = 1.61, p = 0.11). RQ5b further examines the perceived human-likeness of the LLM-generated human condition versus the LLM-generated bot condition, revealing no significant difference (t(1419) = 0.69, p = 0.49). RQ6a investigates whether there is a discrepancy in perceiving the content as bot-generated between the bot condition and the human condition, but no significant difference was observed (t(1420) = 0.74, p = 0.46). RQ6b examines the difference in perceived human-likeness between the human condition and the bot condition, with no significant difference found (t(1419) = -0.64, p = 0.52).
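Unlike the Welch comparisons sketched earlier for Study 2, the Study 3 contrasts above are reported with a common (pooled) degrees of freedom. The following is a minimal sketch, under assumed column and file names, of a planned contrast computed with a pooled error term across all four conditions, which yields roughly N - k degrees of freedom; it is an illustration of the technique, not the script used in the dissertation.

import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_csv("study3_responses.csv")  # hypothetical file; columns "condition", "bot_likeness", ...

def planned_contrast(data, dv, weights):
    """Planned contrast over all conditions listed in `weights`
    (condition label -> weight; weights sum to zero), using a pooled error term."""
    groups = {c: data.loc[data["condition"] == c, dv].dropna() for c in weights}
    ns = {c: len(g) for c, g in groups.items()}
    means = {c: g.mean() for c, g in groups.items()}
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())  # pooled SS
    dof = sum(ns.values()) - len(groups)                                   # N - k
    mse = ss_within / dof
    estimate = sum(w * means[c] for c, w in weights.items())
    se = np.sqrt(mse * sum(w ** 2 / ns[c] for c, w in weights.items()))
    t = estimate / se
    p = 2 * stats.t.sf(abs(t), dof)
    return estimate, t, dof, p

# Contrast 1 (RQ4-1a): LLM-generated bot content (+1) vs. human content (-1), others weighted 0
weights = {"llm_bot": 1, "llm_human": 0, "bot": 0, "human": -1}
print(planned_contrast(df, "bot_likeness", weights))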
Figure 3. The Contrasted Estimated Marginal Means for Each Comparison of the Experimental Groups

For the analysis of attitude change (RQ7-RQ9), a multinomial regression was conducted, as the dependent variable was not normally distributed. Approximately 74% of participants showed no change in attitude, 5% moved in the opposite direction of the content, and 21% shifted their attitude in the direction suggested by the content. Multinomial regression was applied to compare the conditions in which participants shifted their attitudes (either positively or negatively) versus those who showed no change. The "no change" group was used as the reference to compare "positive change" (i.e., change in alignment with the content) and "negative change" (i.e., change in opposition to the content). The results of the multinomial regression are presented in Table 10. Additionally, to compare the impact of each experimental condition with one another, the emmeans package was used. This package allows for contrasting the estimated marginal means of each experimental condition across the groups whose attitudes changed either positively, negatively, or not at all. Figure 3 displays the contrasted estimated marginal means for each comparison of the experimental groups.

RQ7 evaluated attitude change induced by the LLM-generated conditions versus the non-LLM-generated conditions. The LLM-generated bot condition was compared with the human condition (RQ7-1), and the results showed no significant difference in any of the three categories of attitude change (positive change: estimate = -0.02, SE = 0.03, p = 0.99; negative change: estimate = -0.01, SE = 0.02, p = 0.99; no change: estimate = 0.02, SE = 0.03, p = 0.99). The impact of the LLM-generated bot condition was compared with the bot condition (RQ7-2), and the results showed no significant effect on attitude change for the positive change, negative change, or no change groups (positive change: estimate = 0.003, SE = 0.03, p = 1.00; negative change: estimate = -0.01, SE = 0.02, p = 0.99; no change: estimate = 0.01, SE = 0.03, p = 1.00). When the LLM-generated human condition was compared with the human condition (RQ7-3), there was no significant difference in attitude change for the positive, negative, or no change groups (positive change: estimate = 0.003, SE = 0.03, p = 1.00; negative change: estimate = -0.01, SE = 0.02, p = 0.99; no change: estimate = 0.01, SE = 0.03, p = 1.00). When the LLM-generated human condition was compared with the bot condition (RQ7-4), the effect was not statistically significant in any of the groups (positive change: estimate = -0.004, SE = 0.03, p = 1.00; negative change: estimate = -0.01, SE = 0.02, p = 0.98; no change: estimate = 0.02, SE = 0.03, p = 0.99).

For RQ8, which examines the effect of the LLM-generated human condition versus the LLM-generated bot condition on attitude change, both conditions exhibited equivalent non-significant effects across groups (positive change: estimate = -0.01, SE = 0.03, p = 1.00; negative change: estimate = -0.002, SE = 0.02, p = 1.00; no change: estimate = 0.01, SE = 0.03, p = 1.00). Finally, for RQ9, which investigates the effect of the human condition versus the bot condition on attitude change, both conditions exhibited equivalent non-significant effects across groups (positive change: estimate = 0.02, SE = 0.03, p = 0.99; negative change: estimate = 0.005, SE = 0.02, p = 1.00; no change: estimate = 0.01, SE = 0.03, p = 1.00).
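The attitude-change analysis just described (Table 10) can be illustrated with the following minimal sketch in Python. The dissertation contrasts estimated marginal means with the R package emmeans; the last lines below compute average predicted probabilities per condition only as a rough analogue of that step. Column names, the file name, and the coding of the outcome are assumptions.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# "signed_change" is assumed to be the pre/post difference already signed so that positive
# values indicate change in the direction advocated by the message (see the coding above).
df = pd.read_csv("study3_responses.csv")  # hypothetical file name

# Collapse attitude change into three categories. Code 0 = no change serves as the
# reference outcome, because MNLogit treats the lowest category code as the base category.
df["change_code"] = np.select(
    [df["signed_change"] == 0, df["signed_change"] < 0, df["signed_change"] > 0],
    [0, 1, 2],
)

# Multinomial logit with human content as the reference condition, as in Table 10.
model = smf.mnlogit("change_code ~ C(condition, Treatment(reference='human'))", data=df)
res = model.fit()
print(res.summary())  # b (SE) for negative and positive change relative to no change
print(res.aic)

# Rough analogue of contrasting marginal means: average predicted probability of each
# outcome category within each experimental condition.
probs = pd.DataFrame(res.predict(df))          # columns 0, 1, 2 = no change, negative, positive
probs["condition"] = df["condition"].values
print(probs.groupby("condition").mean())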
Study 3 Discussion
While targeting human participants, Study 3 aimed to replicate the findings from Study 2. Although most results were not fully reproduced, some aligned with those from the previous study. When participants were asked to evaluate whether the content they received appeared to be generated by humans or bots, they reported no significant difference between LLM-generated content and non-LLM-generated content. However, there was one exception: when comparing LLM-generated bot content with bot content, participants reported that the LLM-generated bot content was more likely to be created by humans. Even though this finding was not observed in Study 2, the current result shows the difficulty of identifying the source of a message from the content alone. Additionally, the different types of content did not affect participants' attitudes, which did not replicate the results from Study 2, where LLM-generated human content and human content were found to cause a greater attitude change compared to the other types of content (the key results of Study 3 are summarized in Table 11).

Overall, the mean scores for perceived bot-likeness ranged narrowly from 47.64 to 48.82, indicating no significant differences across the conditions. Similarly, the mean perception of human-likeness ranged from 57 to 58 across the conditions. This suggests that participants were generally unable to distinguish between LLM-generated and non-LLM-generated content, except when comparing LLM-generated bot content to bot content, where they perceived the LLM-generated bot content as more human-like. When comparing the LLM-generated bot content to the other LLM-generated content, no significant differences were observed in terms of perceived bot-likeness or human-likeness. This pattern was also consistent when comparing the bot content with the human content.

Regarding the impact of LLM-generated content on attitude change, no statistically significant differences were observed in any of the comparisons with non-LLM-generated content. This is broadly consistent with the results from Study 2, although Study 2 did report a significant effect of LLM-generated content on attitude change relative to non-LLM-generated content. Finally, when comparing the effects of LLM-generated content to one another, the extent to which participants changed their attitudes was nearly identical regardless of the type of LLM-generated content they were exposed to. Similarly, no significant difference in attitude change was observed between the bot content and the human content. These results indicate that the varied types of content did not necessarily lead to attitude change.

Table 11. Summary of Results for Hypotheses and Research Questions in Study 3

RQ4-1: Do people perceive human content and LLM-generated bot content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: Human content and LLM-generated bot content were not perceived differently in terms of bot-likeness or human-likeness.

RQ4-2: Do people perceive bot content and LLM-generated bot content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: Bot content and LLM-generated bot content were not perceived differently in terms of bot-likeness. However, LLM-generated bot content was perceived as more likely to be created by humans than bot content.

RQ4-3: Do people perceive human content and LLM-generated human content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: Human content and LLM-generated human content were not perceived differently in terms of bot-likeness or human-likeness.

RQ4-4: Do people perceive bot content and LLM-generated human content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: Bot content and LLM-generated human content were not perceived differently in terms of bot-likeness or human-likeness.

RQ5: Do people perceive LLM-generated bot content and LLM-generated human content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: LLM-generated bot content and LLM-generated human content were not perceived differently in terms of bot-likeness or human-likeness.

RQ6: Do people perceive bot content and human content differently in terms of (a) bot-likeness and (b) human-likeness?
Summary: Bot content and human content were not perceived differently in terms of bot-likeness or human-likeness.

RQ7-1: Do LLM-generated bot content and human content affect individuals' attitudes differently?
Summary: There was no difference in attitude change between LLM-generated bot content and human content.

RQ7-2: Do LLM-generated bot content and bot content affect individuals' attitudes differently?
Summary: There was no difference in attitude change between LLM-generated bot content and bot content.

RQ7-3: Do LLM-generated human content and human content affect individuals' attitudes differently?
Summary: There was no difference in attitude change between LLM-generated human content and human content.

RQ7-4: Do LLM-generated human content and bot content affect individuals' attitudes differently?
Summary: There was no difference in attitude change between LLM-generated human content and bot content.

RQ8: Do LLM-generated human content and LLM-generated bot content affect individuals' attitudes differently?
Summary: There was no difference in attitude change between LLM-generated human content and LLM-generated bot content.

RQ9: Do human content and bot content affect individuals' attitudes differently?
Summary: There was no difference in attitude change between bot content and human content.

GENERAL DISCUSSION
This dissertation investigates the impact of bots on humans. In Study 1, social media data (tweets) were used to examine the linguistic features of both bots and humans, focusing on how these entities differ in their linguistic styles and whether bots' linguistic features influence human language use. First, the findings from topic modeling indicate that the tweets of humans and bots were distinct. However, there was an overlapping topic questioning GMOs, suggesting a shared interest between bots and humans. The subsequent analysis was conducted to compare linguistic features between bots and humans, as well as to infer a causal relationship between them; the results showed that anger, disgust, fear, and joy were more prevalent among bots than among humans, while polarization, politicization, and neutrality were more pronounced among humans than among bots. Moreover, some features, such as politicization and disgust, directly influenced human linguistic behaviors.

To further investigate the influence of bots, Studies 2 and 3 explored recent advancements in technology, particularly LLMs, which enable bots to generate content that more closely resembles human-written text. This technological development creates opportunities for bots to be perceived as humans, potentially altering the information ecosystem on social media platforms.
However, the extent to which people can identify whether a message is written by a human or a bot, and whether this distinction influences their attitudes, remains unclear. Studies 2 and 3 sought to investigate these effects. The results from Studies 2 and 3 consistently demonstrated that participants were unable to accurately distinguish the source of content. In Study 2, LLM-generated human content had a stronger impact on participants' attitudes compared to both LLM-generated bot content and bot content. Additionally, human content led to a greater attitude change than LLM-generated bot content and (non-LLM) bot content. However, this effect was not observed in Study 3. The following sections break down these results and explore their broader implications.

In Study 1, an observational study was conducted to analyze the topics and linguistic features in tweets from both bots and humans. The results revealed distinct differences in the content produced by humans and bots. Humans focused more on real-life events, whereas bots tended to post promotional and generic content. However, some topics were shared between bots and humans, which indicates a shared interest and the possibility of further interactions. In terms of linguistic features, humans were more likely to use politicized, polarized, and neutral language, while bots displayed higher levels of anger, disgust, fear, and joy. This pattern aligns with previous literature indicating that bots use emotionally charged language, with negative emotions such as anger, disgust, and fear being more pronounced than among humans (Bail et al., 2020; Badawy et al., 2018; Keller & Klinger, 2019; Shao et al., 2017). This finding emphasizes the malicious role of bots in insinuating negativity. Yet, humans still displayed more politicized, polarized, and neutral language, which could be attributed to the generally low levels of these variables overall, specifically politicization averages of 0.002 for bots and 0.003 for humans. That being said, it is important to note that these differences could also be attributable to the large sample sizes and therefore require caution in interpretation.

Furthermore, Study 1 identified a direct influence of bots on humans in terms of language use, with bots' politicization and disgust impacting human discourse. This influence was evident over time, especially with regard to disgust, and was further reinforced by the impulse response functions (IRFs). Although politicization was less pronounced among bots compared to humans, the politicized tweets of bots still had a significant influence on those of humans, presenting a potential risk for shaping human attitudes. Moreover, among the various negative emotions, bots' disgust demonstrated a direct influence on that of humans. These findings align with previous literature that emphasizes the possible nefarious impact bots can have on human attitudes (Aldayel & Magdy, 2022; Bail et al., 2020), and they were confirmed longitudinally.

In Study 2, a controlled online experiment was conducted to assess the impact of bots on human attitudes, as well as the extent to which people could discern whether content was generated by bots or humans. Given the rise of LLMs, there was concern regarding the potential impact of LLM-generated content on users. As such, in addition to the human and bot content tested in Study 1, this experiment also included content generated by GPT-4. The experiment was conducted using Llama-3, with personas derived from demographic information provided by McFadden et al. (2024).
The results were somewhat mixed when participants identified the source of the content. Yet, LLM-generated human content was seen as more human-like than the other content types and perceived as less bot-like than LLM-generated bot content and bot content. These findings suggest that LLM-generated human content may have a greater impact on individuals than traditional bot content and even LLM-generated bot content. Nonetheless, as seen in the other comparisons, it is likely that people overall were unable to identify the source of content, and such results are consistent with previous studies suggesting that people struggle to differentiate AI-generated content from human content (Clerwall, 2014; Graefe et al., 2018; Karinshak et al., 2023). Furthermore, when examining attitude change, LLM-generated human content also had a greater influence on attitude change compared to bot content and LLM-generated bot content. This finding implies that LLM-generated content, specifically LLM-generated human content that mimics human language, could have a more significant impact on individuals than traditional bot content (Burtell & Woodside, 2023).

Study 3 aimed to replicate Study 2 with real human participants. While the majority of significant findings from Study 2 did not replicate in Study 3, some trends were consistent with the earlier results. Overall, the variation in perceptions of bot-likeness and human-likeness was minimal, with means for bot-likeness ranging from 47.64 to 48.82 on a 0-100 scale and means for human-likeness ranging from 57.01 to 58.42 on a 0-100 scale. The only notable difference was that LLM-generated bot content was more likely to be perceived as human-generated than bot content. These findings suggest that participants struggled to distinguish between content created by bots and humans, in alignment with Study 2. Yet, in Study 2, participants misattributed the source of the content, whereas in Study 3, there was only a weak tendency to classify the content as either human-generated or bot-generated. This lack of differentiation poses a potential risk, as inaccurate perceptions of content origin could affect how users process and respond to the information.

Interestingly, while most participants did not change their attitudes in response to the stimuli, 21% (301 out of 1,428) of participants did shift their stance in alignment with the message. Although the effect was not statistically significant, this finding suggests that certain individuals may still be influenced by LLM-generated content as well as non-LLM-generated content. However, this is inconclusive, since the non-significant results in Study 3 may be attributable to confounding variables that could not be controlled. Hence, it is important to be cautious when interpreting these results. Additionally, it is possible that the effects were attenuated because the study's design captured participants' attitudes immediately after exposure. Future directions that could address these limitations are presented in the following section.

Table 12. A Summary of Results

Study 1 demonstrated the role of bots, in contrast to humans, in public discourse on Twitter (X) regarding GMOs. Results showed that bots' posts were more generic and commercial compared to those of humans. In terms of linguistic features, bots elicited more anger and fear than humans, whereas politicization and polarization were more pronounced among humans.
When analyzing the direct impact of bots' linguistic features on humans, politicization in bots was found to directly influence politicization in humans, and disgust in bots also directly influenced disgust in humans.

Study 2 and Study 3 were conducted to further explore the causal influence of bots. Both studies incorporated the latest technology, specifically LLMs, which potentially enhance the impact of bots. Online experiments were conducted to compare the impact of LLM-empowered content versus non-LLM-empowered content on individuals' attitudes, as well as their ability to identify the source of the content. Study 2 found that participants perceived LLM-empowered human content as more human-generated and less bot-generated than the other content types, and that LLM-empowered human content was more effective at changing attitudes compared to traditional bot content and LLM-generated bot content. In contrast, Study 3 found that all content influenced attitudes to a similar degree, and participants had difficulty identifying the correct source.

Overall, the results are in alignment with the CASA framework (Reeves & Nass, 1996), as computers, specifically AI-enhanced bots in the present context, may induce social responses from users who mindlessly apply social rules while interacting with them. Although the attitude changes prompted by the content were inconsistent across the present studies, it is notable that participants were not proficient at identifying the correct source of the content, which implies the potential influence of bots on social media platforms. The CASA framework has been applied to numerous technologies, and recent developments in artificial intelligence, specifically those that possess language capabilities, facilitate interactivity, fill social roles, and operate with human-sounding speech (Nass & Steuer, 1993), could make AI-enhanced bots a powerful source of influence, promoting human-machine communication that closely resembles human-human communication. As more people adopt this new technology, it is imperative to continue studies exploring its potential benefits and risks for individuals and society. These efforts could help raise awareness among individuals and inform policymakers and governments in enacting proper interventions and/or regulations regarding AI use, particularly on social media.

CONCLUSION
This dissertation explores the impact of bots on online public discourse, specifically focusing on human users' language and attitudes in response to their possible online interactions. The findings suggest that bots and humans exhibited distinct topics, though there was some overlap regarding the reliability of GMO technology. A similar topic shared between bots and humans indicated common interests and the potential for further interactions. Moreover, bots were more likely to display negatively charged emotions in their tweets, and specific features, such as politicization and disgust, influenced the linguistic characteristics of humans' tweets, particularly over time. This highlights the importance of user vigilance, as well as governmental interventions and/or regulations, in mitigating the potential disruptions bots can cause in public discourse and the influence they may exert on humans' opinions. As the present research implies a potential risk posed by bots, it is imperative to conduct more research to explore effective and legitimate forms of interventions that could mitigate the harm.
Importantly, the present results align with prior research that primarily focused on politically charged topics, demonstrating that the presence of bots can affect public discourse on social media platforms regardless of the topic. Furthermore, the comparison between LLM-generated content and non-LLM-generated content demonstrates the potential for LLM-powered bots to deceptively influence users, as people are often unable to distinguish between human-like and machine-generated content on social media. As bots become more sophisticated with advances in technology, they are increasingly capable of influencing human attitudes in ways that are almost indistinguishable from human interaction. Given the proliferation of such technologies on social media, understanding their relational impact on humans becomes even more critical (Guzman & Lewis, 2019).

Limitations
This research is not without its limitations. Despite efforts to design the studies rigorously, several trade-offs were made to balance cost and efficiency. First, only a selected number of users were included in Study 1. This decision was made to reduce the computational costs associated with bot detection and the subsequent analyses. However, the users were randomly selected based on their level of activity (e.g., number of posts), and tweets were retrieved accordingly. While an effort was made to use a representative sample, it will be important to incorporate the full dataset in future research.

Second, the pretrained models used to assess polarization and detect discrete emotions may introduce limitations in interpreting the results. Although these models are initially fine-tuned through supervised learning to capture specific concepts, applying them in an unsupervised fashion to new data can lead to unintended biases or deviations from the original task, potentially distorting the intended interpretation. In contrast, a dictionary-based approach was employed to analyze politicization, which can enhance interpretability by focusing on specific, predefined terms. However, this approach struggles to capture the contextual nuances and subtleties of meaning that pretrained models are better equipped to handle. As such, future researchers should be aware of the inherent limitations of both approaches.

Regarding the online experiments, the third limitation is that the stimuli cannot be claimed to have been perfectly designed. The classification of messages as bot-generated or human-generated, as determined by human coders and a bot detection tool, showed only limited consensus. As a result, only those messages for which there was agreement between at least two human coders and the bot detection tool were included in the analysis. As outlined in Appendix B, achieving agreement among human coders in bot detection proved to be a challenging task. Nonetheless, the present study aimed to include only messages that had reached a reasonable level of consensus.

Fourth, the use of Llama-3 to generate personas and conduct the experiment may have influenced the findings, given that this method has not been widely validated in prior research. In contrast, Hewitt et al. (2024) used GPT-4 and found a high correlation between actual experimental results and those derived from LLM-simulated personas. Although their study did not employ Llama-3, the state-of-the-art performance of Llama-3 across various fields (Dubey et al., 2024) supports its inclusion in the current research, despite the limited validation of the method.
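For readers unfamiliar with the persona-simulation approach discussed in this limitation, the following is a minimal illustrative sketch of how a survey-conditioned persona response might be generated with an open-weights Llama-3 instruct checkpoint via Hugging Face transformers. It is not the instruction used in this research (that is provided in Appendix D); the model name, persona attributes, prompt wording, and decoding settings are all assumptions, and the gated checkpoint requires separate access and authentication.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint (gated; token setup omitted)
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16, device_map="auto")

# Illustrative persona attributes of the kind drawn from survey demographics
persona = {"age": 44, "gender": "female", "race": "White",
           "party": "Democrat", "prior_opinion": "somewhat unsafe"}

messages = [
    {"role": "system",
     "content": ("You are a survey respondent. Answer strictly as this person would: "
                 f"a {persona['age']}-year-old {persona['race']} {persona['gender']} who "
                 f"identifies as a {persona['party']} and previously rated gene editing in "
                 f"food and agriculture as '{persona['prior_opinion']}'.")},
    {"role": "user",
     "content": ("After reading the tweet shown above, what is your opinion about the safety "
                 "of gene editing in the context of food and agriculture? Reply with a number "
                 "from 1 (extremely unsafe) to 5 (extremely safe).")},
]

# Build the chat-formatted prompt and generate a short persona response
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=8, do_sample=True, temperature=0.7,
                     pad_token_id=tok.eos_token_id)
print(tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))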
Fifth, the non-significant results of Study 3 could be attributed to confounding factors. Despite efforts to control the cues provided in the stimuli, some of them may have been perceived as specific indicators for identifying content as either human-generated or bot-generated. The present research focused primarily on the content, but it is important to recognize that source cues might have influenced participants, leading them to perceive and process the message heuristically. Therefore, it is crucial to specifically control for such variables and investigate their effects.

Lastly, Study 3 was unable to replicate the results from Study 2. The primary difference between the two studies was the use of human participants in Study 3, but it is important to note that the observed differences could also be attributed to the short time lag between the pre- and post-attitude measures. In Study 2, pre-existing attitudes were drawn from McFadden et al. (2024), whereas in Study 3, participants' existing attitudes were measured immediately prior to exposure to the stimulus. Given the findings of previous persuasion studies, it is difficult to change one's attitude in a short period, and the brief interval between the pre- and post-exposure measurements in Study 3 may have influenced the results.

Future Directions
The present research project demonstrated the impact of bots on humans across three separate studies. Several key findings warrant further exploration. As noted in the limitations, although the current study selected users from a dataset, future research could expand the scope by including the entire dataset, incorporating all users who participated in the discussions. Although the current findings were derived from representative samples of long-term public discourse surrounding GMOs, analyzing the whole dataset would provide a more comprehensive understanding of the influence of bots on humans.

Furthermore, as noted earlier, unsupervised and supervised methods can yield different results. While recognizing the trade-offs associated with each approach, it is important to attempt to replicate the results using alternative methods to verify their robustness. For example, comparing results from both dictionary-based and pretrained models can help verify their consistency. In social science research, supervised methods are often preferred for their ability to maintain the interpretability of results. However, combining different methods can contribute to methodological advancements and provide a more comprehensive understanding.

As discussed earlier, the current research utilized specific language models for distinct purposes, such as GPT-4 for content creation and Llama-3 for the online experiment. This decision was made to avoid data contamination resulting from overreliance on a single model. However, this choice limits the generalizability of the findings. In future research, different language models should be tested to better understand the precise impact of LLMs on the influence of bots and on shaping individuals' attitudes. Additionally, employing various LLMs in online experiments could further expand our understanding of their potential in social science research.

Moreover, regarding the results from Study 2 and Study 3, further investigation is needed to understand how individuals' perceptions of the source of content affect their evaluation of messages and their attitudes. This point is exemplified by studies conducted by Wischnewski et al. (2021) and Wischnewski et al. (2024).
The present studies highlight the challenge of detecting bots, especially as technological advancements make identification even more difficult. A related issue arises when people mistakenly perceive users as bots, along with the potential impact of such misperceptions. Wischnewski et al. (2021) found that people are more likely to assume that users with opinion-incongruent posts are bots, a tendency mediated by perceived credibility. In other words, individuals who encounter posts that align with their own opinions are more likely to evaluate those posts as credible and attribute them to humans rather than bots. While the experimental conditions in that study compared opinion-congruent and opinion-incongruent content, other factors could bias users' perceptions, and it is important to identify conditions that may amplify these biases.

Further, the cognitive processing of messages from either bots or humans remains an important area for future research. In the current study, it is difficult to assess the amount of cognitive effort people devote to processing each message, particularly in relation to (perceived) source cues, which may influence the extent to which messages affect attitude change (Chaiken & Ledgerwood, 2012; Chaiken & Maheswaran, 1994). If people engage more deeply with LLM-generated messages than with non-LLM-generated content, this could amplify the impact on attitude change. While social media messages are often short-lived, it is critical to understand how cognitive processing might vary depending on the type of content, as this could shape long-term effects.

Moreover, the effects of repeated exposure to messages should also be examined. Given the transient nature of attention, specifically on social media, users' attention spans tend to be short. However, repeated exposure to the same messages over time may influence attitudes in ways that are not immediately evident (Bornstein, 1989; Schmidt & Eisend, 2015). Yan et al. (2023) reported that exposure to bots increased individuals' perceptual bias, leading them to overestimate others' vulnerability to bots and decreasing their self-efficacy in recognizing bots. Although that experiment also relied on relatively short exposure to the messages, it still showed a significant impact on subsequent evaluations. That being said, investigating the effects of repeated or extended exposure on message acceptance or rejection would be a valuable avenue for further research (e.g., Skurka & Keating, 2024).

Lastly, while this dissertation focused on GMOs, it would be insightful to replicate this research in different contexts. Exploring how bots influence perceptions and attitudes in other topical areas could offer broader implications for understanding the dynamics of online discourse.

REFERENCES
Aldayel, A., & Magdy, W. (2022). Characterizing the role of bots in polarized stance on social media. Social Network Analysis and Mining, 12(1), 30. https://doi.org/10.1007/s13278-022-00858-z
Badawy, A., Ferrara, E., & Lerman, K. (2018, August). Analyzing the digital traces of political manipulation: The 2016 Russian interference Twitter campaign. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (pp. 258-265). IEEE.
Bail, C. A., Guay, B., Maloney, E., Combs, A., Hillygus, D. S., Merhout, F., Freelon, D., & Volfovsky, A. (2020). Assessing the Russian Internet Research Agency's impact on the political attitudes and behaviors of American Twitter users in late 2017.
Proceedings of the National Academy of Sciences, 117(1), 243–250. https://doi.org/10.1073/pnas.1906420116 Baly, R., Martino, G. D. S., Glass, J., & Nakov, P. (2020). We can detect your bias: Predicting the political ideology of news articles. arXiv preprint arXiv:2010.05338. https://doi.org/10.48550/arXiv.2010.05338 Bornstein, R. F. (1989). Exposure and affect: Overview and meta-analysis of research, 1968– 1987. Psychological Bulletin, 106(2), 265-289. https://doi.org/10.1037/0033- 2909.106.2.265 Broniatowski, D. A., Jamison, A. M., Qi, S., AlKulaib, L., Chen, T., Benton, A., Quinn, S. C., & Dredze, M. (2018). Weaponized health communication: Twitter bots and Russian trolls amplify the vaccine debate. American Journal of Public Health, 108(10), 1378–1384. https://doi.org/10.2105/AJPH.2018.304567 Burtell, M., & Woodside, T. (2023). Artificial influence: An analysis of AI-driven persuasion. arXiv preprint arXiv:2303.08721. https://doi.org/10.48550/arXiv.2303.08721 Caldarelli, G., De Nicola, R., Del Vigna, F., Petrocchi, M., & Saracco, F. (2020). The role of bot squads in the political propaganda on Twitter. Communications Physics, 3(1), 81. https://doi.org/10.1038/s42005-020-0340-4 Carolus, A., Muench, R., Schmidt, C., & Schneider, F. (2019). Impertinent mobiles-Effects of politeness and impoliteness in human-smartphone interaction. Computers in Human Behavior, 93, 290-300. https://doi.org/10.1016/j.chb.2018.12.030 Catherine, K. N., Mugiira, B. R., & Muchiri, N. J. (2024). Public perception of genetically modified organisms and the implementation of biosafety measures in Kenya. Advances in Agriculture, 2024(1), 5544617. https://doi.org/10.1155/2024/5544617 Chaiken, S., & Ledgerwood, A. (2012). A theory of heuristic and systematic information processing. In P. A. M. Van Lange, E. T. Higgins, & A. W. Kruglanski (Eds.), Handbook of theories of social psychology (Vol. 1, pp. 246-266). Sage Publications. 64 Chaiken, S., & Maheswaran, D. (1994). Heuristic processing can bias systematic processing: Effects of source credibility, argument ambiguity, and task importance on attitude judgment. Journal of Personality and Social Psychology, 66(3), 460–473. https://doi.org/10.1037/0022-3514.66.3.460 Chinn, S., Hart, P. S., & Soroka, S. (2020). Politicization and polarization in climate change news content, 1985-2017. Science Communication, 42(1), 112-129. https://doi.org/10.1177/107554701990029 Chu, H., & Liu, S. (2024). Can AI tell good stories? Narrative transportation and persuasion with ChatGPT. Journal of Communication, 74(5), 347-358. https://doi.org/10.1093/joc/jqae029 Clerwall, C. (2017). Enter the robot journalist: Users’ perceptions of automated content. In The Future of Journalism: In an Age of Digital Media and Economic Uncertainty (pp. 165- 177). Routledge. Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., Goyal, A., Hartshorn, A., Yang, A., Mitra, A., Sravankumar, A., Korenev, A., Hinsvark, A., Rao, A., Zhang, A., …Zhao, Z. (2024). The Llama 3 Herd of Models (arXiv:2407.21783). arXiv. https://arxiv.org/abs/2407.21783 Edwards, C., Beattie, A. J., Edwards, A., & Spence, P. R. (2016). Differences in perceptions of communication quality between a Twitterbot and human agent for information seeking and learning. Computers in Human Behavior, 65, 666-671. https://doi.org/10.1016/j.chb.2016.07.003 Edwards, C., Edwards, A., Spence, P. R., & Shelton, A. K. (2014). Is that a bot running the social media feed? 
Testing the differences in perceptions of communication quality for a human agent and a bot agent on Twitter. Computers in Human Behavior, 33, 372-376. https://doi.org/10.1016/j.chb.2013.08.013 Ekman, P. (1992). Are there basic emotions? Psychological Review, 99(3), 550– 553. https://doi.org/10.1037/0033-295X.99.3.550 Ferrara, E. (2020). # covid-19 on twitter: Bots, conspiracies, and social media activism. arXiv preprint arXiv: 2004.09531. https://arxiv.org/abs/2004.09531 Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots. Communications of the ACM, 59(7), 96–104. https://doi.org/10.1145/2818717 Funk, C. (2020, March 18). About half of U.S. adults are wary of health effects of genetically modified foods, but many also see advantages. Pew Research Center. https://www.pewresearch.org/short-reads/2020/03/18/about-half-of-u-s-adults-are-wary- of-health-effects-of-genetically-modified-foods-but-many-also-see-advantages/ Gorwa, R., & Guilbeault, D. (2020). Unpacking the Social Media Bot: A Typology to Guide Research and Policy. Policy & Internet, 12(2), 225–248. https://doi.org/10.1002/poi3.184 65 Graefe, A., Haim, M., Haarmann, B., & Brosius, H. B. (2018). Readers’ perception of computer- generated news: Credibility, expertise, and readability. Journalism, 19(5), 595-610. https://doi.org/10.1177/1464884916641269 Grootendorst, M. (2022). BERTopic: Neural Topic Modeling With a Class-Based TFIDF Procedure. arXiv:2203.05794v0571. https://doi.org/10.48550/arXiv.2203.05794 Groshek, J. (2011). Media, instability, and democracy: Examining the Granger-causal relationships of 122 countries from 1946 to 2003. Journal of Communication, 61(6), 1161-1182. https://doi.org/10.1111/j.1460-2466.2011.01594.x Guzman, A. L., & Lewis, S. C. (2020). Artificial intelligence and communication: A human– machine communication research agenda. New Media & Society, 22(1), 70-86. https://doi.org/10.1177/146144481985869 Hallman, W. K., Cuite, C. L., & Morin, X. K. (2013). Public perceptions of labeling genetically modified foods (Working Paper 2013–1). Rutgers, The State University of New Jersey, School of Environmental and Biological Sciences. http://humeco.rutgers.edu/documents_PDF/news/GMlabelingperceptions.pdf Hartmann, J. (2022). Emotion English DistilRoBERTa-base. Hugging Face. https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/ Heo, R.J., Heo, J., Lee, S., & Peng, T.Q. (2024). Leveraging large language models to detect bots: Utilizing meta, temporal, and social interaction information [Manuscript submitted for publication]. Department of Communication. Michigan State University Hewitt, L., Ashokkumar, A., Ghezae, I., & Willer, R. (n.d.). Predicting results of social science experiments using large language models [Working paper]. https://docsend.com/view/ity6yf2dansesucf Hocevar, K. P., Metzger, M., & Flanagin, A. J. (2017). Source credibility, expertise, and trust in health and risk messaging. In Oxford research encyclopedia of communication. https://doi.org/10.1093/acrefore/9780190228613.013.287 Hohenstein, J., & Jung, M. (2020). AI as a moral crumple zone: The effects of AI-mediated communication on attribution and trust. Computers in Human Behavior, 106, 106190. https://doi.org/10.1016/j.chb.2019.106190 Howell, E. L., Wirz, C. D., Brossard, D., Jamieson, K. H., Scheufele, D. A., Winneg, K. M., & Xenos, M. A. (2018). National Academies of Sciences, Engineering, and Medicine report on genetically engineered crops influences public discourse. 
Politics and the Life Sciences, 37(2), 250-261. https://doi.org/10.1017/pls.2018.12 Huang, G., & Wang, S. (2023). Is artificial intelligence more persuasive than humans? A meta- analysis. Journal of Communication, 73(6), 552-562. https://doi.org/10.1093/joc/jqad024 66 Jiang, M., Liu, K. Z., Zhong, M., Schaeffer, R., Ouyang, S., Han, J., & Koyejo, S. (2024). Investigating data contamination for pre-training language models. arXiv preprint arXiv:2401.06059. https://doi.org/10.48550/arXiv.2401.06059 Jun, I., Zhao, Y., He, X., Gollakner, R., Court, C., Munoz, O., Bian, J., Capua, I., & Prosperi, M. (2020). Understanding perceptions and attitudes toward genetically modified organisms on Twitter. International Conference on Social Media and Society, 291–298. https://doi.org/10.1145/3400806.3400839 Karinshak, E., Liu, S. X., Park, J. S., & Hancock, J. T. (2023). Working with AI to persuade: Examining a large language model’s ability to generate pro-vaccination messages. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1), 1-29. https://doi.org/10.1145/3579592 Keller, T. R., & Klinger, U. (2019). Social bots in election campaigns: Theoretical, empirical, and methodological implications. Political Communication, 36(1), 171-189. https://doi.org/10.1080/10584609.2018.1526238 Leshner, G., Reeves, B., & Nass, C. (1998). Switching channels: The effects of television channels on the mental representations of television news. Journal of Broadcasting & Electronic Media, 42(1), 21-33. https://doi.org/10.1080/08838159809364432 Li, Y., Guo, Y., Guerin, F., & Lin, C. (2024, November). An open-source data contamination report for large language models. In Findings of the Association for Computational Linguistics: EMNLP 2024 (pp. 528-541). https://doi.org/10.18653/v1/2024.findings- emnlp.30 McFadden, B. R., Rumble, J. N., Stofer, K. A., & Folta, K. M. (2024). US public opinion about the safety of gene editing in the agriculture and medical fields and the amount of evidence needed to improve opinions. Frontiers in Bioengineering and Biotechnology, 12, 1340398. https://doi.org/10.3389/fbioe.2024.1340398 Moon, Y. (2000). Intimate exchanges: Using computers to elicit self-disclosure from consumers. Journal of Consumer Research, 26(4), 323-339. https://doi.org/10.1086/209566 Naqvi, S., Zhu, C., Farre, G., Ramessar, K., Bassie, L., Breitenbach, J., ... & Capell, T. (2009). Transgenic multivitamin corn through biofortification of endosperm with three vitamins representing three distinct metabolic pathways. Proceedings of the National Academy of Sciences, 106(19), 7762–7767. https://doi.org/10.1073/pnas.0901412106 Nass, C., & Moon, Y. (2000). Machines and mindlessness: Social responses to computers. Journal of Social Issues, 56(1), 81-103.https://doi.org/10.1111/0022-4537.00153 Nass, C., & Steuer, J. (1993). Voices, boxes, and sources of messages: Computers and social actors. Human Communication Research, 19(4), 504-527. https://doi.org/10.1111/j.1468- 2958.1993.tb00311.x 67 Nass, C., Isbister, K., & Lee, E. J. (2000). Truth is beauty: Researching embodied conversational agents. Embodied Conversational Agents, 2000, 374-402. Nass, C., Jonsson, I. M., Harris, H., Reaves, B., Endo, J., Brave, S., & Takayama, L. (2005, April). Improving automotive safety by pairing driver emotion and car voice emotion. In CHI'05 extended abstracts on Human factors in computing systems (pp. 1973-1976). https://doi.org/10.1145/1056808.1057070 Nass, C., Moon, Y., & Green, N. (1997). Are computers gender-neutral? 
Nass, C., Moon, Y., Fogg, B. J., Reeves, B., & Dryer, C. (1995, May). Can computer personalities be human personalities? In Conference Companion on Human Factors in Computing Systems (pp. 228-229).

Nass, C., Steuer, J., & Tauber, E. R. (1994, April). Computers are social actors. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 72-78).

Qiu, L., & Benbasat, I. (2009). Evaluating anthropomorphic product recommendation agents: A social relationship perspective to designing information systems. Journal of Management Information Systems, 25(4), 145-182. https://doi.org/10.2753/MIS0742-1222250405

Rathod, D., & Hedaoo, R. P. (2022). Assessment of knowledge and attitudes on genetically modified foods among students studying life sciences. Cureus, 14(12). https://doi.org/10.7759/cureus.32744

Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. Cambridge University Press.

Rudkowsky, E., Haselmayer, M., Wastian, M., Jenny, M., Emrich, Š., & Sedlmair, M. (2018). More than bags of words: Sentiment analysis with word embeddings. Communication Methods and Measures, 12(2-3), 140-157. https://doi.org/10.1080/19312458.2018.1455817

Satariano, A., & Mozur, P. (2023, February 7). The people onscreen are fake. The disinformation is real. The New York Times. https://www.nytimes.com/2023/02/07/technology/artificial-intelligence-training-deepfake.html

Savage, S., Monroy-Hernandez, A., & Höllerer, T. (2016, February). Botivist: Calling volunteers to action using online bots. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing (pp. 813-822).

Schmidt, S., & Eisend, M. (2015). Advertising repetition: A meta-analysis on effective frequency in advertising. Journal of Advertising, 44(4), 415-428. https://doi.org/10.1080/00913367.2015.1018460

Schneider, F. (2020). How users reciprocate to Alexa. In C. Stephanidis et al. (Eds.), HCI International 2020—Late breaking posters (pp. 376-383). Springer International Publishing. https://doi.org/10.1007/978-3-030-60700-5_48

Shao, C., Ciampaglia, G. L., Varol, O., Flammini, A., & Menczer, F. (2017). The spread of fake news by social bots. arXiv preprint arXiv:1707.07592. http://arxiv.org/abs/1707.07592

Skurka, C., & Keating, D. M. (2024). How repeated exposure to persuasive messaging shapes message responses over time: A longitudinal experiment. Human Communication Research, hqae008. https://doi.org/10.1093/hcr/hqae008

Sohi, M., Pitesky, M., & Gendreau, J. (2023). Analyzing public sentiment toward GMOs via social media between 2019-2021. GM Crops & Food, 14(1), 1-9. https://doi.org/10.1080/21645698.2023.2190294

Srinivasan, V., & Takayama, L. (2016, May). Help me please: Robot politeness strategies for soliciting help from humans. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 4945-4955). https://doi.org/10.1145/2858036.2858217

Stella, M., Ferrara, E., & De Domenico, M. (2018). Bots increase exposure to negative and inflammatory content in online social systems. Proceedings of the National Academy of Sciences, 115(49), 12435-12440. https://doi.org/10.1073/pnas.1803470115

Stocking, G., & Sumida, N. (2018, October 15). Social media bots draw public’s attention and concern. Pew Research Center. https://www.journalism.org/2018/10/15/social-media-bots-draw-publics-attention-and-concern/
Sundar, S. S., & Nass, C. (2000). Source orientation in human-computer interaction: Programmer, networker, or independent social actor. Communication Research, 27(6), 683-703. https://doi.org/10.1177/009365000027006001

Sundar, S. S., & Nass, C. (2001). Conceptualizing sources in online news. Journal of Communication, 51(1), 52-72. https://doi.org/10.1111/j.1460-2466.2001.tb02872.x

Swanson, N. R., & Granger, C. W. (1997). Impulse response functions based on a causal approach to residual orthogonalization in vector autoregressions. Journal of the American Statistical Association, 92(437), 357-367. https://doi.org/10.1080/01621459.1997.10473634

Wischnewski, M., Bernemann, R., Ngo, T., & Krämer, N. (2021, May). Disagree? You must be a bot! How beliefs shape Twitter profile perceptions. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (pp. 1-11). https://doi.org/10.1145/3411764.3445109

Wischnewski, M., Ngo, T., Bernemann, R., Jansen, M., & Krämer, N. (2024). “I agree with you, bot!” How users (dis)engage with social bots on Twitter. New Media & Society, 26(3), 1505-1526. https://doi.org/10.1177/14614448211072307

Wunderlich, S., & Gatto, K. A. (2015). Consumer perception of genetically modified organisms and sources of information. Advances in Nutrition, 6(6), 842-851. https://doi.org/10.3945/an.115.008870

Xu, W., & Sasahara, K. (2022). Characterizing the roles of bots on Twitter during the COVID-19 infodemic. Journal of Computational Social Science, 5(1), 591-609. https://doi.org/10.1007/s42001-021-00139-3

Yan, H. Y., Yang, K. C., Shanahan, J., & Menczer, F. (2023). Exposure to social bots amplifies perceptual biases and regulation propensity. Scientific Reports, 13(1), 20707. https://doi.org/10.1038/s41598-023-46630-x

Zhang, Y., Shah, D., Pevehouse, J., & Valenzuela, S. (2023). Reactive and asymmetric communication flows: Social media discourse and partisan news framing in the wake of mass shootings. The International Journal of Press/Politics, 28(4), 837-861. https://doi.org/10.1177/19401612211072793

Zhang, Y., Sharma, K., Du, L., & Liu, Y. (2024, May). Toward mitigating misinformation and social media manipulation in LLM era. In Companion Proceedings of the ACM on Web Conference 2024 (pp. 1302-1305). https://doi.org/10.1145/3589335.3641256

APPENDIX A: SUPPLEMENTARY TABLES AND FIGURES

Table S-13
Description of Key Terms

Username (handle): A public identifier of the user that is used to log in to the account and is visible when sending and receiving replies and Direct Messages.
Retweet (RT): Retweeting involves sharing someone else’s tweet with the given user’s own followers.
Followers: Accounts that follow the user.
Followees: Accounts that the user follows.
Mention: Mentioning enables a user to include other people’s usernames anywhere in the user’s own tweet.
Hashtags: Self-assigned topic categories for tweets that people include alongside their posts.
FAV: Favorite (FAV) is a feature that allows users to bookmark tweets and show appreciation for them.
BERTopic: Unsupervised topic modeling used to extract latent topics from documents.
VAR Model: Vector autoregression (VAR) models are used to analyze the dynamic relationships between multiple time series variables and how changes in one variable may influence others over time (Swanson & Granger, 1997).
IRF: Impulse response functions (IRFs) show how the variables in a model react over time to a shock to one or more of the model’s variables (Swanson & Granger, 1997; Zhang et al., 2023).
Granger Causality: Granger causality tests estimate whether lags of one variable can be used to predict another variable longitudinally (Groshek, 2011).
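To make the time-series terms above concrete, the following Python sketch shows how a VAR model, a Granger causality test, and IRFs of the kind described in Table S-13 can be estimated with statsmodels. The file name, the series names (bot_politicization, human_politicization), and the lag settings are illustrative assumptions, not the exact specification used in Study 1.

# Illustrative sketch (assumed variable names and lag settings).
import pandas as pd
from statsmodels.tsa.api import VAR

# Assumed input: a date-indexed DataFrame with one column per daily
# linguistic-feature series for bots and humans.
df = pd.read_csv("daily_features.csv", index_col="date", parse_dates=True)
series = df[["bot_politicization", "human_politicization"]].dropna()

model = VAR(series)
results = model.fit(maxlags=7, ic="aic")  # lag order selected by AIC, capped at 7 (assumption)

# Granger causality: do lags of the bot series help predict the human series?
gc_test = results.test_causality(
    caused="human_politicization",
    causing=["bot_politicization"],
    kind="f",
)
print(gc_test.summary())

# Impulse response functions: response over 10 periods to a one-unit shock.
irf = results.irf(10)
irf.plot(orth=False)

The same calls can be repeated with the roles of the two series reversed to examine influence in the opposite direction.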
Table S-14
Dictionaries for Politicization (a list of words per category)

Republicans: republican, republicans, gops, gop, conservatives, conservative
Democrats: democrat, democrats, liberal, liberals

Figure S-4
Stimulus for Each Condition (panels: LLM-generated bot, LLM-generated human, Bot, and Human)

APPENDIX B: PILOT TEST RESULTS FOR BOT DETECTION

Although the preliminary study suggested the potential of incorporating temporal information alongside random examples (Heo et al., 2024), the present study conducted a pilot experiment to validate the proposed method. In the first pilot study, a separate dataset of 800 user accounts (200 users from each quartile of activity) was extracted from the original dataset. Using the proposed method with GPT-4, which included temporal information and random examples, the results showed a skewed distribution: 79% of accounts were classified as bots, 2% as humans, and 19% as undecidable. This skew raised concerns and prompted further verification before proceeding with the method.

In the subsequent pilot study for the present research, 100 random users were selected from the final dataset of 2,400 users. Consistent with the prior study, GPT-4 was employed to detect bots using meta-information, text, and temporal information. To improve task performance, an ensemble approach was adopted in which a final label was assigned by majority voting across the decisions based on each information type. Additionally, randomly selected cases were introduced for few-shot prompting. Because the initial combination of temporal information and random cases had produced unexpected results, I also incorporated both correct and incorrect examples, anticipating that this would help contextualize the decision-making process. Specifically, two example cases were provided for each decision outcome, following a confusion matrix: (1) bots identified as bots, (2) bots identified as humans, (3) humans identified as bots, and (4) humans identified as humans. To establish ground truth, an expert (the author) classified each user as either a bot or a human.

The results revealed generally low F1 scores across all conditions. The condition in which temporal information was provided alongside random examples yielded the highest F1 score (0.44), followed by the condition with temporal information and correct/incorrect cases (F1 = 0.43). Other combinations of information were not effective, and the ensemble approach did not surpass the performance of the few-shot prompting conditions (see Table S-15). Given the low F1 scores, an additional analysis was performed using the widely used bot detection tool Botometer to compare its output against the ground-truth labels. Botometer produces a score ranging from 0 to 5, and two thresholds (2.5 and 3.0) were used to classify users as bots. However, Botometer’s classification also failed to generate high F1 scores, reaching 0.32 at the 3.0 threshold. Moreover, no true positives or false negatives were detected, further illustrating the challenges of bot detection.
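As a concrete illustration of the evaluation just described, the Python sketch below combines per-account decisions from the different information types by majority voting and scores the resulting labels against the expert ground truth. The data structures, the tie-breaking rule, and the use of scikit-learn metrics are assumptions for illustration rather than the exact pilot-study pipeline.

# Illustrative sketch of the majority-vote ensemble and its evaluation
# (assumed data structures; not the exact pilot-study code).
from collections import Counter
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Assumed per-account decisions ("bot" or "human") from each information type.
decisions = {
    "user_001": {"meta": "bot", "text": "human", "temporal": "bot", "social": "bot"},
    "user_002": {"meta": "human", "text": "human", "temporal": "bot", "social": "human"},
}
ground_truth = {"user_001": "bot", "user_002": "human"}  # expert labels

def majority_vote(labels):
    """Return the most common label; ties default to 'human' (an assumption)."""
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    if list(counts.values()).count(top_count) > 1:
        return "human"
    return top_label

y_true, y_pred = [], []
for user, votes in decisions.items():
    y_true.append(ground_truth[user])
    y_pred.append(majority_vote(list(votes.values())))

# Metrics reported with "bot" treated as the positive class.
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, pos_label="bot", zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, pos_label="bot", zero_division=0))
print("F1       :", f1_score(y_true, y_pred, pos_label="bot", zero_division=0))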
These results highlight the inherent complexity of bot detection. To assess the level of agreement among experts, three experts (including the author) coded 52 user accounts as either bots or humans. The average pairwise percent agreement was 70%, underscoring the difficulty of bot classification even among trained professionals. Despite the limitations of the current bot detection method, I proceeded with GPT-4, temporal information, and example cases. Although temporal information combined with random examples yielded the highest F1 scores, this approach produced a high number of false positives and a low rate of true negatives. To address this issue, I incorporated alternative examples (both correct and incorrect cases), which helped reduce false positives and improve the identification of true positives. This revised approach was ultimately adopted for the main study.

Table S-15
The Performance Metrics of Bot Detection Tools

Condition                                    Accuracy  Precision  Recall  F1
Meta                                         0.65      0.56       0.42    0.46
Text                                         0.60      0.33       0.33    0.33
Temporal                                     0.24      0.30       0.33    0.30
Social interaction                           0.36      0.53       0.54    0.60
Ensemble (majority voting)                   0.40      0.86       0.35    0.41
Temporal + Random examples                   0.33      0.40       0.36    0.44
Temporal + Correct and incorrect examples    0.45      0.31       0.69    0.43
Botometer (threshold 3)                      0.72      0.50       0.23    0.32
Botometer (threshold 2.5)                    0.85      0          .       .

APPENDIX C: AN INSTRUCTION FOR SIMULATING TWEETS USING GPT-4

Here’s a compilation of tweets about genetically modified organisms (GMOs). The goal is to create five new tweets opposing GMO issues based on this compilation. It should be realistic. When including links, please ensure they come from valid sources and support the content of the tweet. Aim to match the average length of the example tweets provided, but feel free to vary your tweets' lengths within the 280-character limit.

APPENDIX D: AN INSTRUCTION FOR PERSONA SIMULATION USING LLAMA-3

Demographic details of this person: Ethnicity/Race: [race], Age: [#], Education: [education], Gender: [gender], Partisanship: [partisanship], Income: [income], Area: [area], Region: [region], Experience in a food or agricultural field: [# of years], Experience in a health or medical field: [# of years].

You will be asked to read a message. Please read it carefully and then answer the following questions. Please respond by indicating the number that best reflects your opinion.

Message: [INSERTED STIMULUS BASED ON THE ASSIGNED CONDITION]

Question 1: What is your opinion about the safety of gene editing in the context of food and agriculture?
1. Extremely safe
2. Somewhat safe
3. Neither safe nor unsafe
4. Somewhat unsafe
5. Extremely unsafe
Question 1 Answer:

Question 2: What is your opinion about the safety of gene editing in the context of health and medical applications?
1. Extremely safe
2. Somewhat safe
3. Neither safe nor unsafe
4. Somewhat unsafe
5. Extremely unsafe
Question 2 Answer:

Question 3: Please indicate the extent to which you think this message was likely created by a human, using a scale from 0 to 100 (0 = not at all, 100 = very much).
Question 3 Answer:

Question 4: Please indicate the extent to which you think this message was likely created by a bot, using a scale from 0 to 100 (0 = not at all, 100 = very much).
Question 4 Answer:
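To illustrate how a persona template of the kind shown in Appendix D might be operationalized, the Python sketch below fills the template from a respondent's demographic fields and an assigned stimulus and queries a Llama-3 chat model through the Hugging Face transformers text-generation pipeline. The model identifier, sampling settings, and demographic values are assumptions for illustration, not the dissertation's exact configuration, and only the first question is included for brevity.

# Illustrative sketch: filling a persona prompt and querying a Llama-3 model.
from transformers import pipeline

PERSONA_TEMPLATE = (
    "Demographic details of this person: Ethnicity/Race: {race}, Age: {age}, "
    "Education: {education}, Gender: {gender}, Partisanship: {partisanship}, "
    "Income: {income}, Area: {area}, Region: {region}, "
    "Experience in a food or agricultural field: {food_years}, "
    "Experience in a health or medical field: {health_years}.\n\n"
    "You will be asked to read a message. Please read it carefully and then "
    "answer the following questions. Please respond by indicating the number "
    "that best reflects your opinion.\n\n"
    "Message: {stimulus}\n\n"
    "Question 1: What is your opinion about the safety of gene editing in the "
    "context of food and agriculture?\n"
    "1. Extremely safe\n2. Somewhat safe\n3. Neither safe nor unsafe\n"
    "4. Somewhat unsafe\n5. Extremely unsafe\n"
    "Question 1 Answer:"
    # Remaining questions omitted for brevity.
)

# Hypothetical respondent drawn from poll data (values are illustrative).
respondent = {
    "race": "White", "age": 42, "education": "Bachelor's degree",
    "gender": "Female", "partisanship": "Independent", "income": "$50,000-$74,999",
    "area": "Suburban", "region": "Midwest", "food_years": 0, "health_years": 3,
}
stimulus = "[INSERTED STIMULUS BASED ON THE ASSIGNED CONDITION]"

prompt = PERSONA_TEMPLATE.format(stimulus=stimulus, **respondent)

# Assumed open-weight chat model; repository access may require approval.
generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")
response = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.7)
print(response[0]["generated_text"])

In practice, the generated continuation would be parsed for the numeric answer and the procedure repeated across respondents and experimental conditions.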