Reverse Engineering of Deceptions: Attacks and Defenses for Deep Learning Models
The development of artificial intelligence has been so rapid that, in recent months, hardly a week passes without a new foundation model release from OpenAI, Anthropic, Google, xAI, and others. Ten years ago, most research still focused on deep neural networks such as AlexNet or on generative adversarial networks; now the conversation centers on large language models (LLMs), LLM-based agents, and test-time scaling. One thing, however, has remained unchanged: the persistent vulnerability of AI systems to adversarial and backdoor attacks, which threatens their reliability across applications. This dissertation addresses this enduring challenge by advancing the security and robustness of machine learning models through four interconnected contributions. First, it develops a reverse engineering framework that recovers original images from adversarial perturbations, enhancing the resilience of image classifiers. Second, it introduces a model parsing technique that infers victim model attributes from attack instances, shedding light on attack transferability and model weaknesses. Third, it examines data poisoning in diffusion models, uncovering bilateral effects: adversarial vulnerabilities on one hand and unexpected defensive benefits on the other, such as improved robustness in classifiers trained on generated data. Finally, it proposes machine unlearning for vision-language models, mitigating harmful outputs and overcoming the limitations of traditional safety fine-tuning, which relies heavily on spurious correlations. Across these contributions, the dissertation seeks to reverse engineer deceptions, uncover the true attributes and methods of adversaries, and defend accordingly. From image classification to image generation, and from classic neural networks to foundation models such as diffusion models and vision-language models, it spans a range of algorithms and model architectures. These advancements, grounded in rigorous experimentation across diverse datasets, collectively strengthen AI systems against adversarial threats and training-time backdoor injections, and offer practical tools for secure deployment in high-stakes domains. Beyond immediate applications, this research bridges the gap between the rapid evolution of AI capabilities and the foundational need for trust, laying the groundwork for future investigations into robust artificial intelligence in an era of ever-advancing foundation models.
- In Collections: Electronic Theses & Dissertations
- Copyright Status: Attribution 4.0 International
- Material Type: Theses
- Authors: Yao, Yuguang
- Thesis Advisors: Liu, Sijia
- Committee Members: Liu, Xiaoming; Tang, Jiliang; Ravishankar, Saiprasad
- Date Published: 2025
- Subjects: Computer science
- Program of Study: Computer Science - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 108 pages
- Permalink: https://doi.org/doi:10.25335/40kq-2043