Automated Speaker Recognition in Non-ideal Audio Signals Using Deep Neural Networks
Speaker recognition entails the use of the human voice as a biometric modality for recognizing individuals. While speaker recognition systems are gaining popularity in consumer applications, most of these systems are negatively affected by non-ideal audio conditions, such as audio degradations, multi-lingual speech, and audio of varying duration. This thesis focuses on developing speaker recognition systems robust to non-ideal audio conditions. Firstly, a 1-Dimensional Convolutional Neural Network (1D-CNN) is developed to extract noise-robust speaker-dependent speech characteristics from the Mel Frequency Cepstral Coefficients (MFCC). Secondly, the 1D-CNN-based approach is extended to develop a triplet-learning-based feature-fusion framework, called 1D-Triplet-CNN, for improving speaker recognition performance by judiciously combining MFCC and Linear Predictive Coding (LPC) features. Our hypothesis rests on the observation that MFCC and LPC capture two distinct aspects of speech: speech perception and speech production. Thirdly, a time-domain filterbank called DeepVOX is learned from vast amounts of raw speech audio to replace commonly used hand-crafted filterbanks, such as the Mel filterbank, in speech feature extractors. Finally, a vocal style encoding network called DeepTalk is developed to learn speaker-dependent behavioral voice characteristics to improve speaker recognition performance. The primary contribution of the thesis is the development of deep learning-based techniques to extract discriminative, noise-robust physical and behavioral voice characteristics from non-ideal speech audio. A large number of experiments conducted on the TIMIT, NTIMIT, SITW, NIST SRE (2008, 2010, and 2018), Fisher, VoxCeleb, and JukeBox datasets demonstrate the efficacy of the proposed techniques and their importance in improving speaker recognition performance in non-ideal audio conditions.
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Chowdhury, Anurag
- Thesis Advisors: Ross, Arun
- Committee Members: Liu, Xiaoming; Boddeti, Vishnu; Chai, Joyce; Salem, Fathi
- Date Published: 2021
- Subjects: Computer science
- Program of Study: Computer Science - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 167 pages
- Permalink: https://doi.org/doi:10.25335/hcmt-7t79