- Robust signal processing methods for miniature acoustic sensing, separation, and recognition
- Fazel, Amin
- Electronic Theses & Dissertations
One of several emerging areas where micro-scale integration promises significant breakthroughs is acoustic sensing. However, separation, localization, and recognition of acoustic sources using micro-scale microphone arrays pose a significant challenge due to fundamental limitations imposed by the physics of sound propagation. The smaller the distance between the recording elements, the more difficult it is to measure localization and separation cues, and hence the more difficult it is to recognize the acoustic sources of interest. The objective of this research is to investigate signal processing and machine learning techniques that can be used for noise-robust acoustic target recognition using miniature microphone arrays.

The first part of this research focuses on designing "smart" analog-to-digital conversion (ADC) algorithms that can enhance acoustic cues in sub-wavelength microphone arrays. Many source separation algorithms fail to deliver robust performance when applied to signals recorded using high-density sensor arrays where the distance between sensor elements is much less than the wavelength of the signals. This can be attributed to the limited dynamic range of the sensor (determined by analog-to-digital conversion), which is insufficient to overcome the artifacts due to large cross-channel redundancy, non-homogeneous mixing, and high dimensionality of the signal space. We propose a novel framework that overcomes these limitations by integrating statistical learning directly with the signal measurement (analog-to-digital) process, which enables high-fidelity separation of linear instantaneous mixtures. At the core of the proposed ADC approach is a min-max optimization of a regularized objective function that yields a sequence of quantized parameters which asymptotically tracks the statistics of the input signal. Experiments with synthetic and real recordings demonstrate consistent performance improvements when the proposed approach is used as the analog-to-digital front-end to conventional source separation algorithms.

The second part of this research focuses on investigating a novel speech feature extraction algorithm that can recognize auditory targets (keywords and speakers) using noisy recordings. The features, known as Sparse Auditory Reproducing Kernel (SPARK) coefficients, are extracted under the hypothesis that the noise-robust information in a speech signal is embedded in a subspace spanned by sparse, regularized, over-complete, non-linear, and phase-shifted gammatone basis functions. The feature extraction algorithm involves computing kernel functions between the speech data and a pre-computed set of phase-shifted gammatone functions, followed by a simple pooling technique ("MAX" operation). In this work, we present experimental results for a hidden Markov model (HMM) based speech recognition system whose performance has been evaluated on the standard AURORA 2 dataset. The results demonstrate that the SPARK features deliver significant and consistent improvements in recognition accuracy over the standard ETSI STQ WI007 DSR benchmark features. We have also verified the noise-robustness of the SPARK features for the task of speaker verification. Experimental results based on the NIST SRE 2003 dataset show significant improvements when compared to a standard Mel-frequency cepstral coefficient (MFCC) based benchmark.
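
The two steps named in the abstract for SPARK feature extraction (kernel evaluation against phase-shifted gammatone functions, then "MAX" pooling) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names, the 4th-order gammatone with Glasberg-Moore bandwidths, the degree-2 polynomial kernel, the 8 kHz sampling rate, and pooling over phase shifts only are all assumptions made for illustration; the sparsity and regularization steps mentioned in the abstract are omitted.

```python
import numpy as np

def gammatone(fc, phase, n_samples, fs=8000, order=4):
    """Unit-norm gammatone function (assumed form, not taken from the thesis):
    t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phase)."""
    t = np.arange(n_samples) / fs
    erb = 24.7 + 0.108 * fc          # equivalent rectangular bandwidth in Hz
    b = 1.019 * erb                  # common gammatone bandwidth scaling
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t + phase)
    return g / np.linalg.norm(g)

def spark_like_features(frame, center_freqs, phases, fs=8000, degree=2):
    """Kernel between one speech frame and each phase-shifted gammatone,
    followed by MAX pooling over the phase shifts (one value per frequency).
    The polynomial kernel (x . g)^degree is an illustrative assumption."""
    frame = np.asarray(frame, dtype=float)
    frame = frame / (np.linalg.norm(frame) + 1e-12)   # scale-invariant input
    feats = np.empty(len(center_freqs))
    for i, fc in enumerate(center_freqs):
        kernel_vals = [
            (frame @ gammatone(fc, p, len(frame), fs)) ** degree
            for p in phases
        ]
        feats[i] = max(kernel_vals)                   # the "MAX" pooling step
    return feats

# Hypothetical usage on a synthetic 25 ms frame (200 samples at 8 kHz).
center_freqs = [250.0, 500.0, 1000.0, 2000.0]
phases = [0.0, np.pi / 2]
frame = gammatone(1000.0, np.pi / 2, 200)
features = spark_like_features(frame, center_freqs, phases)
```

In this sketch the pooled value for a channel is largest when the frame resembles one of that channel's phase-shifted gammatone functions, so the synthetic frame above responds most strongly in the 1000 Hz channel.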