TOWARD PRIVATE, SECURE, AND ROBUST AI-ENABLED VOICE SERVICES
Voice, as a primary way for people to communicate with each other and interact with computers and smart devices, is expected to be trustworthy and reliable. For example, modern authentication systems use voice as a biometric to verify a user's identity, and users give voice commands to control smart devices via speech-to-text services. Compared to other biometrics such as iris, fingerprint, and face ID, voice biometrics offer high usability because they require no complicated hardware beyond a microphone. Voice biometrics can also be adapted for remote call authentication. Furthermore, voice serves as a crucial interface for humans to interact with smart devices, representing the most intuitive method for giving commands to artificial intelligence (AI) agents. The prospect of smart devices and robots comprehending human speech holds great promise. However, recent studies have demonstrated the vulnerabilities of using voice interfaces to communicate, conduct speaker authentication, and deliver messages to smart devices. This dissertation aims to introduce the background of AI-enabled voice services; discover the vulnerabilities of modern voice models and systems; understand the root causes of these vulnerabilities; and provide security solutions to safeguard voice services.

First, we focus on speaker authentication security. In particular, we propose a secure and robust speaker verification system called SuperVoice. By examining the high-frequency energy in human speech, we identify characteristics that distinguish different speakers, as well as humans from machines. Exploiting this high-frequency energy, SuperVoice enhances speaker verification performance and defends against machine-played attacks such as replay attacks, adversarial attacks, and inaudible attacks. Moreover, we propose a backdoor attack called MasterKey against speaker authentication systems. In contrast to previous attacks, we focus on a practical real-world setting in which the attacker possesses no knowledge of the intended victim.

Second, we explore speech recognition security. Specifically, we design a new adversarial attack named SpecPatch against vulnerable speech recognition models. This attack alters the speech recognition model's output by injecting a short, imperceptible, noise-like sound. Compared to previous adversarial audio attacks, SpecPatch shows strong resistance under different types of distortion and is able to succeed even when the user is present. Furthermore, we propose PhantomSound, a query-efficient black-box attack against commercial speech recognition services, APIs, and voice assistants. Unlike existing black-box adversarial attacks on voice assistants, PhantomSound leverages a decision-based attack to produce effective adversarial audio samples and reduces the number of queries by optimizing the gradient estimation. We demonstrate the danger of PhantomSound on commercial speech recognition services and off-the-shelf smart voice assistants.

Third, we investigate voice privacy protection. To address the privacy leakage of voice communication, we create a system called NEC that uses an AI model to selectively jam a user's voice against unauthorized recording. NEC transmits speaker-specific noise via an inaudible channel to jam only that user's voice. We implemented NEC and demonstrated that it can prevent a user's speech from being recorded.

This dissertation comprehensively addresses the prevalent challenges and vulnerabilities in voice-enabled services. In an age where voice-enabled devices are becoming ubiquitous in homes and public spaces, ensuring the security of these devices is paramount. Our research helps safeguard the privacy and safety of the general public, who are often the targets of security breaches.
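To illustrate the kind of decision-based gradient estimation that query-efficient black-box attacks like PhantomSound build on, the sketch below estimates a gradient direction from hard labels only. This is a minimal toy, not the dissertation's implementation: the plane-crossing oracle stands in for a commercial speech API, and the dimensions, query budget, and step size are illustrative assumptions.

```python
import numpy as np

def make_oracle(target_dir, threshold=1.0):
    """Toy hard-label oracle standing in for a commercial speech API:
    returns True ("misrecognized") once the input crosses a plane."""
    def oracle(x):
        return float(x @ target_dir) > threshold
    return oracle

def estimate_gradient(oracle, x, num_queries=100, sigma=0.01, rng=None):
    """Monte-Carlo gradient-direction estimate using only hard labels.

    Each random unit perturbation is weighted +1 if the perturbed point
    is adversarial and -1 otherwise; the average points roughly along the
    local decision-boundary normal, wasting far fewer queries than naive
    coordinate-wise finite differences would.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(num_queries):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)          # random unit direction
        phi = 1.0 if oracle(x + sigma * u) else -1.0
        grad += phi * u
    return grad / num_queries

# Start just inside the benign region and estimate which way to push.
oracle = make_oracle(np.array([1.0, 0.0, 0.0]))
x = np.array([0.999, 0.0, 0.0])
g = estimate_gradient(oracle, x, rng=np.random.default_rng(42))
```

In a full attack, `x` would be an audio waveform and each oracle call a paid API query, so the attacker's cost is dominated by `num_queries`; the estimated direction then drives an iterative search toward a low-distortion adversarial example.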
In conclusion, our comprehensive analysis and proactive solutions to the challenges in AI-enabled voice interaction systems represent a leap forward. We offer a security perspective in a field that is critical to the technological advancement of our society. Our contributions may lay the groundwork for safer, more secure voice AI interactions, benefiting both the security community and society as a whole.
    
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Guo, Hanqing
- Thesis Advisors: Xiao, Li; Yan, Qiben
- Committee Members: Hunter, Eric; Cao, Zhichao
- Date Published: 2024
- Subjects: Computer science
- Program of Study: Computer Science - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 208 pages
- Permalink: https://doi.org/doi:10.25335/7p7x-tv25