SECURITY ENHANCEMENTS FOR LLM-BASED CHATBOTS

Large Language Model (LLM)-based chatbot systems have transformed various aspects of everyday life, including smart home devices, recommendation systems, and search engines. As the capacity and integration of LLMs expand, these systems facilitate daily communications, assist with specialized tasks such as medical documentation, and even operate real-world transactions like event bookings or financial management. However, the probabilistic nature of LLMs and their reliance on human-curated training data introduce substantial security and privacy concerns. This dissertation identifies vulnerabilities in LLM-based chatbot systems and proposes practical and effective attack and defense strategies. The research spans both the training phase and the inference phase of chatbot system development and deployment, guided by the dual principles of system-level practicality and model-level vulnerability.First, this dissertation investigates a vulnerability in which an adversary can corrupt training data and inject backdoors into end-to-end LLM-based chatbot systems. Prior backdoor attacks have targeted single-turn tasks or relied on access to explicit data labels. This dissertation introduces a novel, label-free multi-turn backdoor attack, embedding subtle triggers, such as natural interjections, across several conversational turns in the fine-tuning data. Experimental evaluations demonstrate that poisoning less than 2\% of the training data enables adversaries to control LLM outputs when specific triggers are present, while maintaining stealthiness. Second, to protect user data privacy during model training, a dynamic federated learning framework is proposed. Existing federated learning client selection methods often suffer from low communication efficiency, resulting in suboptimal training time and wasted resources. This dissertation introduces a new framework that incorporates bandwidth prediction and adaptive client scheduling into federated learning for LLM training. By leveraging a long-term observation window and predictive modeling of client network conditions, the framework effectively selects the most reliable clients, ensuring faster training convergence and improved time-to-accuracy without compromising user privacy. The proposed method outperforms prior approaches under fluctuating real-world network scenarios. Third, this dissertation investigates the vulnerability of chatbot systems that generate toxic responses in user interactions. Existing evaluation efforts primarily focus on single-turn prompts, overlooking the dynamic escalation of harmful content in multi-turn dialogues. We introduce an LLM-based red teaming tool that automatically and agnostically engages in multi-turn conversations with the target model to elicit harmful outputs. We find that seemingly non-toxic individual sentences can trigger toxic behavior in conversation, and these are often classified as safe by existing tools. Fourth, this dissertation investigates practical defenses that can be implemented when only API access is available. We propose a low-cost plugin, the Moving Target Defense, to enhance LLM-based chatbot systems against jailbreak attacks. This approach dynamically adjusts decoding hyperparameters at inference time, which are settings that control randomness in next-word prediction. By modifying these parameters, the system can reject jailbreak attacks while maintaining utility and responsiveness. Importantly, this defense requires no access to the underlying model weights or retraining, and can be deployed as a post-processing layer for existing LLM APIs. Experimental results show substantial improvements in defense effectiveness compared to static black-box defense strategies.Overall, these research threads reveal a broad and urgent attack surface for LLM-based chatbot systems, ranging from subtle poisoning of training data to inference-time adversarial exploitation. Through the development and evaluation of both attack and defense techniques, this dissertation not only uncovers current vulnerabilities, but also advances mitigation technologies suitable for real-world deployment.Finally, the dissertation concludes with a discussion of future research directions, emphasizing the ongoing evolution of LLM architectures, the challenging arms race between attacks and defenses, and the necessity for systematic evaluation frameworks. As LLM-based chatbots continue to improve across critical sectors, the foundational discoveries and proposed mechanisms in this work aim to support a more secure, AI-enabled future.

Read