Vocal style factorization for effective speaker recognition in affective scenarios
The accuracy of automated speaker recognition is negatively impacted by changes in emotion in a person's speech. In this thesis, we hypothesize that speaker identity is composed of various vocal style factors that may be learned from unlabeled speech data and recombined using a neural network architecture to generate holistic speaker identity representations for affective scenarios. To this end, we propose the E-Vector neural network architecture, composed of a 1-D CNN for learning speaker identity features and a vocal style factorization technique for determining vocal styles. Experiments conducted on the MSP-Podcast dataset demonstrate that the proposed architecture improves speaker recognition accuracy in the affective domain over state-of-the-art baseline ECAPA-TDNN speaker recognition models. For instance, the true match rate at a false match rate of 1% improves from 27.6% to 46.2%. Additionally, we analyze the relationship between speaker recognition match scores and emotions to identify challenging affective scenarios.
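The headline metric above, true match rate (TMR) at a fixed false match rate (FMR), can be illustrated with a minimal sketch. This is not the thesis's evaluation code: the function name `tmr_at_fmr`, the convention that a higher score means a better match, and the rule that a match is declared when the score strictly exceeds the threshold are all assumptions made for illustration.

```python
def tmr_at_fmr(genuine_scores, impostor_scores, target_fmr=0.01):
    """Return (threshold, tmr) such that the empirical FMR is <= target_fmr.

    Hypothetical helper for illustration only. Assumes higher scores mean
    more similar voices and that a match is declared when score > threshold.
    """
    impostors = sorted(impostor_scores, reverse=True)
    # Largest number of impostor comparisons allowed to be accepted.
    max_false_matches = int(target_fmr * len(impostors))
    if max_false_matches >= len(impostors):
        # Every impostor may be accepted, so any score passes.
        threshold = float("-inf")
    else:
        # At most `max_false_matches` impostor scores are strictly greater
        # than the impostor score at this position in descending order.
        threshold = impostors[max_false_matches]
    # Fraction of genuine (same-speaker) comparisons that are accepted.
    true_matches = sum(score > threshold for score in genuine_scores)
    return threshold, true_matches / len(genuine_scores)


# Toy scores, not data from the thesis.
impostor = [i / 100 for i in range(100)]
genuine = [0.5, 0.985, 0.99, 1.0]
threshold, tmr = tmr_at_fmr(genuine, impostor, target_fmr=0.01)
```

Under these assumptions, the threshold is set from the impostor (different-speaker) scores alone, and the TMR is then read off the genuine (same-speaker) scores at that threshold.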
    
- In Collections
 - Electronic Theses & Dissertations

- Copyright Status
 - Attribution 4.0 International

- Material Type
 - Theses

- Authors
 - Sandler, Morgan Lee

- Thesis Advisors
 - Ross, Arun A.

- Committee Members
 - Kordjamshidi, Parisa
 - Yan, Qiben

- Date Published
 - 2023

- Subjects
 - Artificial intelligence
 - Computer science

- Program of Study
 - Computer Science - Master of Science

- Degree Level
 - Masters

- Language
 - English

- Pages
 - 55 pages

- ISBN
 - 9798379615116

- Permalink
 - https://doi.org/doi:10.25335/nrsx-g437